Thursday, August 12, 2010

Geospatial Analytics using Teradata: Part II - Railinc Source Systems

[This is Part II in a series of posts on Railinc's venture into geospatial analytics. See Part I.]

Before getting into the details of the various analytics that Railinc is working on, I need to explain the source data behind them. Railinc is basically a large data store of information received from various parties in the North American rail industry. We process, distribute, and store large volumes of data every day: roughly 3 million messages arrive from the industry daily, which can translate into 9 million records to process. The data falls into four categories:
  • Asset - rail cars and all attributes for those rail cars
  • Asset health - damage and repair information for assets
  • Movement - location and logistic information for assets
  • Industry reference - supporting data for assets including stations, commodities, routes, etc.
Assets (rail cars) are at the center of almost all of Railinc's applications. We keep the inventory of the nearly 2 million rail cars in North America. For the most part, the data we receive either has an asset component or in some way supports asset-based applications. The analytics that we are currently creating from this data fall into three main categories: 1) logistics, 2) management/utilization, and 3) health.

Logistics is an easy one because movement information makes up the bulk of the data we receive on a daily basis. If there is a question about the last reported location of a rail car, we can answer it. The key there is "last reported location." Currently we receive notifications from the industry whenever a predefined event occurs. These events tend to occur at particular locations (e.g., stations). In between those locations is a black hole for us. At least for now, that is. More and more rail cars are being equipped with GPS devices that can pinpoint a car's exact location at any point in time. We are now working with the industry to start receiving such data to fill in that black hole.
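To make "last reported location" concrete, here is a minimal sketch in Python. It is only an illustration and not how Railinc's systems are built: the MovementEvent record, its field names, and the sample cars and stations are all assumptions made up for the example. The idea is simply to keep, for each car, the most recent event we have been told about.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class MovementEvent:
    # All fields are hypothetical; real movement messages carry far more detail.
    car_id: str            # identifier of the rail car
    station: str           # location where the event was reported
    reported_at: datetime  # when the event was reported


def last_reported_locations(events: Iterable[MovementEvent]) -> Dict[str, MovementEvent]:
    """Return the most recent event seen for each car."""
    latest: Dict[str, MovementEvent] = {}
    for event in events:
        current = latest.get(event.car_id)
        # Keep the event only if it is newer than what we already hold.
        if current is None or event.reported_at > current.reported_at:
            latest[event.car_id] = event
    return latest


# Made-up sample data, purely for illustration.
events = [
    MovementEvent("CAR1", "Raleigh", datetime(2010, 8, 10, 8, 0)),
    MovementEvent("CAR1", "Atlanta", datetime(2010, 8, 11, 17, 30)),
    MovementEvent("CAR2", "Chicago", datetime(2010, 8, 9, 12, 0)),
]
latest = last_reported_locations(events)
```

Note what the result does not say: it tells us where CAR1 was when it last reported, not where it is now. That gap between events is the black hole described above.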

Management/utilization requires more information than just location, however. If a car is currently moving and is loaded with a commodity then it is making money for its owner; if it is sitting empty somewhere then it is not. Using information provided by Railinc, car owners and the industry as a whole can get a better view into how fleets of cars are being used.
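As a rough illustration of the utilization idea, the sketch below counts what share of a fleet is earning at a given moment. The CarStatus record and its fields are hypothetical, and the rule is an assumption taken straight from the sentence above: a car counts as earning only when it is both moving and loaded. Cases in between (a loaded car sitting still, an empty car on the move) are treated as not earning here, which a real analysis would probably want to refine.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CarStatus:
    # Hypothetical snapshot of a car; the field names are assumptions.
    car_id: str
    is_moving: bool
    is_loaded: bool


def is_earning(status: CarStatus) -> bool:
    """Assumed rule: a car earns money only while moving and loaded."""
    return status.is_moving and status.is_loaded


def utilization_rate(fleet: Iterable[CarStatus]) -> float:
    """Fraction of the fleet that is currently earning (0.0 for an empty fleet)."""
    cars = list(fleet)
    if not cars:
        return 0.0
    return sum(1 for car in cars if is_earning(car)) / len(cars)


# Made-up fleet snapshot, purely for illustration.
fleet = [
    CarStatus("CAR1", is_moving=True, is_loaded=True),
    CarStatus("CAR2", is_moving=False, is_loaded=False),
    CarStatus("CAR3", is_moving=True, is_loaded=False),
    CarStatus("CAR4", is_moving=False, is_loaded=True),
]
rate = utilization_rate(fleet)
```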

Finally, asset health analytics add another dimension to the view of the North American fleet. Railinc, through its sister organization TTCI, has access to events recorded by track-side detectors. These devices can detect, among other things, wheel problems at speed. TTCI performs some initial analysis on these events before forwarding them on to Railinc, which then creates alert messages that are sent to subscribers. Railinc will also collect data on repairs that are performed on rail cars. With a history of such events we can perform degradation analytics to help the industry better understand the life cycle of assets and asset components.

Railinc is unique in the rail industry in that it can be viewed as a data store for a wide variety of information. We are just now starting to tap into this data to get a new view into the industry. Future posts will examine some of these efforts.
