Showing posts with label analytics. Show all posts

Thursday, August 12, 2010

Geospatial Analytics using Teradata: Part II - Railinc Source Systems

[This is Part II in a series of posts on Railinc's venture into geospatial analytics. See Part I.]

Before getting into the details of the various analytics that Railinc is working on, I need to explain the source data behind them. Railinc is basically a large data store of information received from various parties in the North American rail industry. We process, distribute, and store large volumes of data every day: roughly 3 million messages arrive from the industry daily, which can translate into 9 million records to process. The data is categorized in four ways:
  • Asset - rail cars and all attributes for those rail cars
  • Asset health - damage and repair information for assets
  • Movement - location and logistic information for assets
  • Industry reference - supporting data for assets including stations, commodities, routes, etc.
Assets (rail cars) are at the center of almost all of Railinc's applications. We keep the inventory of the nearly 2 million rail cars in North America. For the most part, data we receive either has an asset component or in some way supports asset-based applications. The analytics that we are currently creating from this data fall into three main categories: 1) logistics, 2) management/utilization, and 3) health.

Logistics is an easy one because movement information makes up the bulk of the data we receive on a daily basis. If there is a question about the last reported location of a rail car, we can answer it. The key there is "last reported location." Currently we receive notifications from the industry whenever a predefined event occurs. These events tend to occur at particular locations (e.g., stations). In between those locations is a black hole for us. At least for now, that is. More and more rail cars are being equipped with GPS devices that can pinpoint a car's exact location at any point in time. We are now working with the industry to start receiving such data to fill in that black hole.
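To make "last reported location" concrete, here is a minimal Python sketch of the idea: keep only the most recent event per car. The `MovementEvent` record, its field names, and the sample car IDs and stations are all made up for illustration; they are not Railinc's actual message format or schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MovementEvent:
    car_id: str            # hypothetical equipment identifier
    station: str           # hypothetical reporting location
    reported_at: datetime  # when the event was reported


def last_reported_locations(events):
    """Return {car_id: most recent MovementEvent} for an iterable of events."""
    latest = {}
    for event in events:
        current = latest.get(event.car_id)
        # Keep the event with the newest timestamp for each car.
        if current is None or event.reported_at > current.reported_at:
            latest[event.car_id] = event
    return latest


# Made-up sample events, not real data.
events = [
    MovementEvent("ABCX 1001", "Station A", datetime(2010, 8, 10, 6, 0)),
    MovementEvent("ABCX 1001", "Station B", datetime(2010, 8, 11, 14, 30)),
    MovementEvent("ABCX 2002", "Station C", datetime(2010, 8, 9, 22, 15)),
]
latest = last_reported_locations(events)
```

Nothing here knows where a car is between events, which is exactly the black hole described above.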

Management/utilization requires more information than just location, however. If a car is currently moving and is loaded with a commodity then it is making money for its owner; if it is sitting empty somewhere then it is not. Using information provided by Railinc, car owners and the industry as a whole can get a better view into how fleets of cars are being used.
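One simple way to express that idea is as the share of a car's time spent both loaded and moving. The sketch below is only an illustration under that assumption: the interval layout `(car_id, hours, is_loaded, is_moving)` and the sample values are hypothetical, and a real utilization metric would likely weigh more factors.

```python
from collections import defaultdict


def utilization_by_car(intervals):
    """Share of each car's hours spent loaded and moving.

    intervals: iterable of (car_id, hours, is_loaded, is_moving) tuples.
    This layout is an assumption made for the example.
    """
    productive = defaultdict(float)
    total = defaultdict(float)
    for car_id, hours, is_loaded, is_moving in intervals:
        total[car_id] += hours
        if is_loaded and is_moving:
            productive[car_id] += hours
    # Skip cars with no recorded hours to avoid dividing by zero.
    return {car: productive[car] / total[car] for car in total if total[car] > 0}


# Made-up sample intervals covering one day per car.
intervals = [
    ("ABCX 1001", 18.0, True, True),    # loaded and moving
    ("ABCX 1001", 6.0, False, False),   # sitting empty
    ("ABCX 2002", 24.0, False, False),  # sitting empty all day
]
utilization = utilization_by_car(intervals)
```

In this made-up sample the first car earns for three quarters of the day and the second not at all.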

Finally, asset health analytics provide another dimension to the view of the North American fleet. Railinc, through its sister organization TTCI, has access to events recorded by track-side detectors. These devices can detect, among other things, wheel problems at speed. TTCI performs some initial analysis on these events before forwarding them to Railinc, which then creates alert messages that are sent to subscribers. Railinc will also collect data on repairs that are performed on rail cars. With a history of such events we can perform degradation analytics to help the industry better understand the life cycle of assets and asset components.

Railinc is unique in the rail industry in that it can be viewed as a data store of various information. We are just now starting to tap into this data to get a unique view into the industry. Future posts will examine some of these efforts.

Friday, June 11, 2010

Geospatial Analytics using Teradata: Part I

In October, I (along with a co-worker) will be giving a presentation at the Teradata PARTNERS conference. The topic will be how Railinc uses Teradata for geospatial analytics. Since I did not propose the paper, write the abstract, or even work on geospatial analytics, I will be learning a lot during this process. So, to help with that education, I will be sharing some thoughts in a series of blog posts.

To kick the series off, let me share the abstract that was originally proposed:
Linking location to information provides a new data dimension, a new precision, unlocking a huge potential in analytics. Geospatial data enables entirely new industry metrics, new insights, and better decision making. Railinc, as a trusted provider of IT services to the Rail Freight industry, is responsible for accurate and timely dissemination of more than 10 million rail events per day. This session provides an overview of how Railinc Business Analytics’ group has implemented Active Data Warehouse and Teradata GeoSpatial technologies to bring an unprecedented amount of new Rail Network insight. The real-time calculation of Geospatial metrics from rail events, has enabled Railinc to better assess; 1) Rail equipment utilization 2) repair patterns 3) geographic usage patterns, and other factors. All of which, afford insights that impact maintenance program decisions, component deployments, service designs and industry policy decisions.
Below is a first pass at an outline for the talk. It is preliminary and will most likely change over the coming months.
  1. Describe Railinc's Teradata installation
  2. Describe Railinc's source systems
    1. Rail car movement events
    2. Rail car inventory
    3. Rail car health
    4. Commodity
  3. Describe our ETL process
  4. Explain the FRA geospatial rail track data
    1. Track ownership complexity
  5. Tie 1-4 together
    1. Current state of car portal
    2. Car utilization analytics
    3. Traffic pattern analytics
  6. Lessons learned
    1. Study of different routing algorithms
    2. Data quality issues
Item 5 is the problem - how can we tie our source systems together with geospatial data in a compelling way? One idea is a portal that provides information about the current state of a rail car. How would geospatial data fit into this portal? Location is the most obvious answer, but is there something more interesting? What about an odometer reading for a rail car? Outside of the portal there are ideas around car utilization and traffic patterns. I like the last two but I need to learn more about them.
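To get a feel for the odometer idea, here is a rough Python sketch that sums great-circle (haversine) distances between a car's successive reported positions. This is my own simplification, not how the analytics are actually implemented: a real odometer would measure distance along the track network (for example, using the FRA track data), so straight-line legs will undercount. The function names and the input layout are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def odometer_miles(sightings):
    """Sum straight-line legs between consecutive sightings.

    sightings: list of (lat, lon) tuples in chronological order.
    Returns 0 when there are fewer than two sightings.
    """
    return sum(haversine_miles(*a, *b) for a, b in zip(sightings, sightings[1:]))
```

For example, `odometer_miles([(0.0, 0.0), (0.0, 1.0)])` gives the length of one degree of longitude along the equator, roughly 69 miles. The sparser the sightings, the more the estimate falls short of the distance actually travelled.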

These are some issues/questions I need to answer over the coming months. Along the way I plan on sharing information about implementation details, possible business cases, and any problems I come across.