[Repost] Making GTFS query more convenient
url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/
This says exactly what I have been thinking.
I have been spending a lot of time parsing GTFS data. On the surface it is just a set of simple CSV files. But extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do. But it is actually non-trivial with GTFS.
One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence. But the actual service has a lot of variables. The schedule is often different on weekends compared to weekdays, and so is the exact route it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.
This is the reason why in GTFS one “route” may be associated with multiple “shapes”. To find out what shapes are associated with a route, we have to make a query like this:
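-- Assumes each GTFS file is loaded into a table of the same name (routes.txt as routes, and so on).
SELECT
  r.route_id, t.shape_id
FROM routes r
JOIN trips t ON t.route_id = r.route_id
JOIN shapes s ON s.shape_id = t.shape_id  -- keeps only trips whose shape exists in shapes
GROUP BY r.route_id, t.shape_id;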
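To make the one-route-to-many-shapes point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The miniature feed is invented for illustration (route `R1` with a full-length and a short-turn pattern), and it assumes the GTFS files were loaded into tables named `routes` and `trips`. The join to the shapes table is left out because `trips.shape_id` already carries the shape identifier.

```python
import sqlite3

# Hypothetical miniature feed: one route with a full-length and a short-turn pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
CREATE TABLE trips  (trip_id TEXT PRIMARY KEY, route_id TEXT,
                     service_id TEXT, shape_id TEXT);
INSERT INTO routes VALUES ('R1', '38');
INSERT INTO trips VALUES
  ('T1', 'R1', 'WEEKDAY', 'S_FULL'),
  ('T2', 'R1', 'WEEKDAY', 'S_FULL'),
  ('T3', 'R1', 'WEEKDAY', 'S_SHORT'),
  ('T4', 'R1', 'WEEKEND', 'S_FULL');
""")


def shapes_by_route(conn):
    """Return {route_id: [(shape_id, trip_count), ...]}, most used shape first."""
    rows = conn.execute("""
        SELECT r.route_id, t.shape_id, COUNT(*) AS n_trips
        FROM routes r
        JOIN trips t ON t.route_id = r.route_id
        GROUP BY r.route_id, t.shape_id
        ORDER BY r.route_id, n_trips DESC, t.shape_id
    """)
    result = {}
    for route_id, shape_id, n_trips in rows:
        result.setdefault(route_id, []).append((shape_id, n_trips))
    return result


shape_map = shapes_by_route(conn)
print(shape_map)
```

The trip count per shape is an addition to the original query. It gives a rough hint of which shape is the main pattern and which ones are occasional variants.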
Finding the stops is even more complex. Here we need to join one more table, stop_times. It is also the biggest table in GTFS, so this is also the most computation-intensive query to do.
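-- Assumes each GTFS file is loaded into a table of the same name.
SELECT
  t.shape_id, st.stop_id
FROM routes r
JOIN trips t ON t.route_id = r.route_id
JOIN stop_times st ON st.trip_id = t.trip_id
JOIN stops s ON s.stop_id = st.stop_id
GROUP BY t.shape_id, st.stop_id;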
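Grouping by shape and stop gives the set of stops but not their order. One possible way to get the sequential order is sketched below with Python's sqlite3 module: pick a single representative trip for a shape and read its stops by `stop_sequence`. This rests on an assumption that GTFS itself does not guarantee, namely that all trips sharing a `shape_id` serve the same stops. The function name `ordered_stops` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature feed with a full pattern (A-B-C-D) and a short turn (A-B).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('T1', 'R1', 'S_FULL'), ('T2', 'R1', 'S_FULL'), ('T3', 'R1', 'S_SHORT');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3), ('T1', 'D', 4),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3), ('T2', 'D', 4),
  ('T3', 'A', 1), ('T3', 'B', 2);
""")


def ordered_stops(conn, shape_id):
    """Stops of one representative trip of the shape, in stop_sequence order."""
    # Representative trip: the one with the most stop_times rows, ties broken by trip_id.
    row = conn.execute("""
        SELECT t.trip_id
        FROM trips t
        JOIN stop_times st ON st.trip_id = t.trip_id
        WHERE t.shape_id = ?
        GROUP BY t.trip_id
        ORDER BY COUNT(*) DESC, t.trip_id
        LIMIT 1
    """, (shape_id,)).fetchone()
    if row is None:
        return []  # unknown shape, or no stop_times for it
    stops = conn.execute(
        "SELECT stop_id FROM stop_times WHERE trip_id = ? ORDER BY stop_sequence",
        (row[0],),
    )
    return [stop_id for (stop_id,) in stops]


print(ordered_stops(conn, "S_FULL"))
print(ordered_stops(conn, "S_SHORT"))
```

If a feed has trips on the same shape that skip stops, the representative trip would have to be chosen more carefully, or the stop lists merged.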
Still, most people have a clear concept of what a transit line is and where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like the one below.
    GTFS                     More Useful
    Structure                Structure

    route                    line
      |                        |
      |                        V
      |                      route*
      |                        | \
      |      shape             |  +-> route_shape
      |       ^                |  |
      |      /                 |  +-> route_stops*
      |     /                  |
      V    /                   V
    trips                    trips
      |                        |
      |      stops             |      stops
      |       ^                |
      |      /                 |
      V     /                  V
    stop_times               stop_times
Here I shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is the service that people know of, like a numbered bus line or a metro line. Below that are routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find them by querying all unique shape_id values using the first SQL. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saver. There is one assumption my structure makes though: different lines do not share the same route. It should be a reasonable assumption. And if two lines do indeed share a route and shape, we should just replicate them as two separate entities.
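The pre-computation could look roughly like the sketch below, written with Python's sqlite3 module. The table layouts are assumptions, not something the text specifies: `route` holds one row per (line, shape) pair, and `route_stops` holds the stop list of one trip per shape, picked arbitrarily as the lowest `trip_id`. The sample rows and the function name `stops_on_line` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical miniature feed.
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('T1', 'R1', 'S_FULL'), ('T2', 'R1', 'S_FULL'), ('T3', 'R1', 'S_SHORT');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3), ('T1', 'D', 4),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3), ('T2', 'D', 4),
  ('T3', 'A', 1), ('T3', 'B', 2);

-- "route" in the sense used here: one row per (line, shape) pair.
CREATE TABLE route AS
  SELECT DISTINCT route_id AS line_id, shape_id AS route_id FROM trips;

-- Copy the stop list of one trip per shape (lowest trip_id, an arbitrary choice).
CREATE TABLE route_stops AS
  SELECT t.shape_id AS route_id, st.stop_sequence, st.stop_id
  FROM stop_times st
  JOIN trips t ON t.trip_id = st.trip_id
  WHERE t.trip_id IN (SELECT MIN(trip_id) FROM trips GROUP BY shape_id);
""")


def stops_on_line(conn, line_id):
    """Ordered stops of each route of a line, read without touching stop_times."""
    rows = conn.execute("""
        SELECT r.route_id, rs.stop_id
        FROM route r
        JOIN route_stops rs ON rs.route_id = r.route_id
        WHERE r.line_id = ?
        ORDER BY r.route_id, rs.stop_sequence
    """, (line_id,))
    result = {}
    for route_id, stop_id in rows:
        result.setdefault(route_id, []).append(stop_id)
    return result


print(stops_on_line(conn, "R1"))
```

Once the two tables are built, `stop_times` is only needed again when the feed is refreshed or when an application actually needs scheduled times.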
The original GTFS structure seems to have a transit-operator-centric view. It allows operators maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. Adding the route and route_stops tables as indicated makes querying and working with transit information much easier.