[Repost] Making GTFS query more convenient

URL: http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

This says exactly what I have been thinking.

I have been spending a lot of time parsing GTFS data. On the surface it is just a set of simple CSV files, but extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do, but it is actually non-trivial with GTFS.

One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence, but the actual service has a lot of variables. The schedule is often different on weekends compared to weekdays, and so is the exact route it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.

This is why in GTFS one “route” may be associated with multiple “shapes”. To find out which shapes are associated with a route, we have to make a query like this:

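-- Distinct shapes used by trips; add "WHERE route.route_id = ..." to limit to one route.
SELECT
 trips.shape_id
FROM route
 JOIN trips ON trips.route_id = route.route_id
 JOIN shape ON shape.shape_id = trips.shape_id
GROUP BY trips.shape_id;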
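As a minimal sketch of the query above, the following uses Python's sqlite3 module with a tiny in-memory data set. The table names follow this article (a real import may name them routes and trips after the GTFS files), the sample rows are made up for illustration, and shapes_for_route is a hypothetical helper. Since trips already carries shape_id, the sketch skips the join to the shape table.

```python
import sqlite3

# Tiny in-memory stand-in for a GTFS feed. Table names follow the article;
# the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE route (route_id TEXT PRIMARY KEY, route_short_name TEXT);
CREATE TABLE trips (
  trip_id TEXT PRIMARY KEY, route_id TEXT, service_id TEXT, shape_id TEXT
);
INSERT INTO route VALUES ('R10', '10'), ('R20', '20');
INSERT INTO trips VALUES
  ('t1', 'R10', 'weekday', 'full'),
  ('t2', 'R10', 'weekday', 'full'),
  ('t3', 'R10', 'weekend', 'short'),
  ('t4', 'R20', 'weekday', 'loop');
""")


def shapes_for_route(conn, route_id):
    """Return the distinct shape_ids used by any trip of the given route."""
    rows = conn.execute(
        """
        SELECT DISTINCT trips.shape_id
        FROM route
          JOIN trips ON trips.route_id = route.route_id
        WHERE route.route_id = ?
        ORDER BY trips.shape_id
        """,
        (route_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(shapes_for_route(conn, "R10"))  # ['full', 'short']
```

Route R10 has three trips but only two shapes: the full-length weekday route and the short weekend one.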

Finding the stops is even more complex. Here we need to join one more table, stop_times. It is also the biggest table in GTFS, so this is also the most computationally intensive query to run.

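-- Distinct stops served by each shape.
SELECT
 trips.shape_id, stop_times.stop_id
FROM route
 JOIN trips ON trips.route_id = route.route_id
 JOIN stop_times ON stop_times.trip_id = trips.trip_id
 JOIN stops ON stops.stop_id = stop_times.stop_id
GROUP BY trips.shape_id, stop_times.stop_id;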
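The grouped query returns the set of stops for each shape but not their order. One possible way to get them in sequence, sketched below with sqlite3, is to pick a single representative trip per shape and sort its stop_times rows by the GTFS stop_sequence field. This assumes that all trips sharing a shape_id serve the same stops in the same order, which GTFS does not guarantee. The sample data and the ordered_stops helper are hypothetical.

```python
import sqlite3

# Made-up sample data: two trips on the "full" shape, one on the "short" shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('t1', 'R10', 'full'), ('t2', 'R10', 'full'), ('t3', 'R10', 'short');
INSERT INTO stop_times VALUES
  ('t1', 'C', 3), ('t1', 'A', 1), ('t1', 'B', 2),
  ('t2', 'A', 1), ('t2', 'B', 2), ('t2', 'C', 3),
  ('t3', 'A', 1), ('t3', 'B', 2);
""")


def ordered_stops(conn, shape_id):
    """Stops of one representative trip of the shape, in stop_sequence order.

    Assumption: every trip with this shape_id has the same stop pattern,
    so any one trip (here the smallest trip_id) can stand in for all.
    """
    rows = conn.execute(
        """
        SELECT stop_times.stop_id
        FROM stop_times
        WHERE stop_times.trip_id = (
            SELECT MIN(trip_id) FROM trips WHERE shape_id = ?
        )
        ORDER BY stop_times.stop_sequence
        """,
        (shape_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(ordered_stops(conn, "full"))   # ['A', 'B', 'C']
print(ordered_stops(conn, "short"))  # ['A', 'B']
```

Restricting the scan to one trip also keeps the read of stop_times small, which matters given the size of that table.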

Still, most people have a clear concept of what a transit line is and where it runs. It shouldn’t be such a pain to compute. A more useful structure would look like the one below.

    GTFS             More Useful
  Structure           Structure
    route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here I shift the terminology a bit. The top-level entity is a line (i.e. GTFS’ route). This is the service that people know of, like a numbered bus line or a metro line. Below that are routes. These are the collection of alternative routes a line may run. Routes are not explicitly represented in GTFS; you can find them by querying all unique shape_id values using the first SQL query. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL query, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saving. There is one assumption my structure makes, though: different lines do not share the same route. It should be a reasonable assumption. And if lines do indeed share a route and shape, we should just replicate them as two separate entities.
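The pre-computation described above could be sketched roughly as follows, again with sqlite3 and made-up rows. The column layout of the route and route_stops tables (line_id, shape_id, stop_sequence, stop_id) is an assumption for illustration, not something the article specifies, and the sketch assumes that trips sharing a shape_id have identical stop patterns and stop_sequence numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal stand-ins for the GTFS trips and stop_times tables (sample data).
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('t1', 'R10', 'full'), ('t2', 'R10', 'full'), ('t3', 'R10', 'short');
INSERT INTO stop_times VALUES
  ('t1', 'A', 1), ('t1', 'B', 2), ('t1', 'C', 3),
  ('t2', 'A', 1), ('t2', 'B', 2), ('t2', 'C', 3),
  ('t3', 'A', 1), ('t3', 'B', 2);

-- The new route table: one row per alternative route (shape) of a line.
-- line_id is simply the GTFS route_id.
CREATE TABLE route AS
SELECT DISTINCT route_id AS line_id, shape_id FROM trips;

-- The new route_stops table: the stop pattern of each route, computed once.
CREATE TABLE route_stops AS
SELECT DISTINCT trips.shape_id, stop_times.stop_sequence, stop_times.stop_id
FROM trips
  JOIN stop_times ON stop_times.trip_id = trips.trip_id;

-- Schedule-free queries no longer need stop_times after this point.
DROP TABLE stop_times;
""")


def routes_of_line(conn, line_id):
    """Alternative routes (shape_ids) that a line may run."""
    rows = conn.execute(
        "SELECT shape_id FROM route WHERE line_id = ? ORDER BY shape_id",
        (line_id,),
    ).fetchall()
    return [row[0] for row in rows]


def stops_on_route(conn, shape_id):
    """Stops of a route in sequence, read from the pre-computed table."""
    rows = conn.execute(
        "SELECT stop_id FROM route_stops WHERE shape_id = ? ORDER BY stop_sequence",
        (shape_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(routes_of_line(conn, "R10"))   # ['full', 'short']
print(stops_on_route(conn, "full"))  # ['A', 'B', 'C']
```

Dropping stop_times in the sketch only demonstrates that the two lookups work without it; an application that also shows schedules would of course keep the table.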

The original GTFS structure seems to take a transit-operator-centric view. It allows operators maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. Adding the route and route_stops tables as indicated would greatly facilitate querying and working with transit information.
