url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

这简直说出了我的心声。

I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.

One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.

This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this

SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;

To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.

SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;

Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.

    GTFS             More Useful
  Structure           Structure     route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.

The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.

[转] Making GTFS query more convenient的更多相关文章

  1. Spring Boot Reference Guide

    Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch,  ...

  2. Using dojo/query(翻译)

    In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...

  3. Query classification; understanding user intent

    http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...

  4. The 5th tip of DB Query Analyzer

    The 5th tip of DB Query Analyzer             Ma Genfeng   (Guangdong UnitollServices incorporated, G ...

  5. Data access between different DBMS and other txt/csv data source by DB Query Analyzer

        1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...

  6. How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer

    How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...

  7. Install and run DB Query Analyzer 6.04 on Microsoft Windows 10

          Install and run DB Query Analyzer 6.04 on Microsoft Windows 10  DB Query Analyzer is presented ...

  8. DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published

        DB Query Analyzer 6.04 is distributed,78 articles concerned have been published  DB Query Analyz ...

  9. FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】

    官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...

随机推荐

  1. activity 和 生命周期 :流程

    activity是android的一个基本的组件.讨论生命周期,taskstack等等的话题的时候.就不得不去看一下android framework层的源码了. 生命周期,实际就是系统调用andro ...

  2. LINUX系统编程 由REDIS的持久化机制联想到的子进程退出的相关问题

    19:22:01 2014-08-27 引言: 以前对wait waitpid 以及exit这几个函数只是大致上了解,但是看REDIS的AOF和RDB 2种持久化时 均要处理子进程运行完成退出和父进程 ...

  3. python的复制,深拷贝和浅拷贝的区别

    在python中,对象赋值实际上是对象的引用.当创建一个对象,然后把它赋给另一个变量的时候,python并没有拷贝这个对象,而只是拷贝了这个对象的引用 一般有三种方法, alist=[1,2,3,[& ...

  4. 数据库update死锁

    比较常见的死锁场景,并发批量update时的一个场景: update cross_marketing set gmtModified = NOW(), pageview = pageview+ #ex ...

  5. toad 9.6和toad 12.1工具使用比较

    toad是我工作中使用最频繁的软件之一,阴错阳差的把2个版本都装到了电脑上,使用过程中逐渐发现2者的差异,特此做下记录,以便提示自己和其他有需要的人们.(随时更新中······)由于能力有限,难免会有 ...

  6. Java笔记1-Java相关概念和如何实现跨平台

    一.Java相关概念 1.Java语言的核心特点跨平台面向对象 2.Java的历史版本JDK1.0,JDK1.1,JDK1.2....JDK5.0,JDK6.0,JDK7.0,JDK8.0 注意:JD ...

  7. LeetCode() Ugly Number II 背下来!

    一个别人,非常牛逼的思路,膜拜了!orz!!!! vector <int> results (1,1); int i = 0, j = 0, k = 0; while (results.s ...

  8. 关于tomcat7下websocket不能使用

    tomcat启动时提示 信息: JSR 356 WebSocket (Java WebSocket 1.0) support is not available when running on Java ...

  9. IplImage 结构解读(转)

    typedef struct _IplImage { int nSize;                             /* IplImage大小 */ int ID;           ...

  10. JSBinding / Home

    Description JSBinding is a tool enabling you to run actual javascript in Unity3D. It contains Mozill ...