Monday, August 30, 2010

NFJS 2010 in Raleigh

I attended the No Fluff Just Stuff tour in Raleigh this past weekend with a bunch of others from Railinc. After the event on Sunday I tweeted that I wasn’t all that impressed with this year’s sessions. Matthew McCullough responded asking for some details on my concerns. Rather than cram an answer into a terse tweet, I thought a longer response would be fairer.

First off, I really wasn’t planning on going this year. In 2007 and 2008 I went with a group from Railinc and had a pretty good time while learning about new things going on in the industry. (We didn’t go in 2009 for economic reasons.) This year, though, I looked at the schedule and didn’t see enough new content that interested me; I’ve seen, read, and heard plenty about REST, Grails, JRuby, Scala, and the like.

What changed my mind about going was the interest expressed by some other developers at Railinc. Since I had coordinated the 2007 and 2008 trips, I offered to coordinate this one as well, and with that much interest I figured I’d give it a shot myself. So, to be fair, I went in without expecting much.

Here were the key issues for me:
  1. Some of the sessions did not go in the direction I expected. To be fair, though, I was warned ahead of time to review the slides before choosing a session. The problem is that some presenters relied more on demos and less on slides, so in some cases it was hard to judge a session by its slide deck alone.
  2. Like I said above, I wasn’t planning on going in the first place because of the dearth of sessions that seemed interesting to me. I ended up attending some sessions simply because they were the least irrelevant options in their time slots. There were actually two sessions that I bailed on in the middle because I wasn’t getting any value from them.
  3. Finally, and this is completely subjective, some of the speakers just didn't do it for me. While you could tell that most (if not all) of the speakers were passionate about their topics, some were just annoying about it. For instance, some of the attendees I spoke to felt that the Git snobbery was a bit much. Some of it was just speaker style - some click with me, some don't.
Some things I heard from the other Railinc attendees:
  • Too much duplication across speakers
  • Not enough detail along the tracks
  • Some of the sessions were too introductory - you could have gotten the same information from a bit of Googling.
Granted, some of my concerns are subjective and specific to my own oddities. But I enjoyed the '07 and '08 events much more.

I did, however, enjoy Matthew's first session on Hadoop. I knew very little about the technology going in and Matthew helped crystallize some things for me. I also got some good information from Neal Ford's talks on Agile engineering practices and testing the entire stack.

I really like the No Fluff Just Stuff concept in general. I think it is an important event in the technology industry. The speakers are knowledgeable and passionate which is great to see. My mind is still open about going next year, but it will be a harder sell.

Wednesday, August 25, 2010

Not so Stimulating

I sent the following to the Raleigh News & Observer:
E. Wayne Stewart says that “enormous fiscal stimulus ... to finance World War II led the U.S. out of the Depression.” While it is true that aggregate economic indicators (e.g., unemployment and GDP) improved during the war, it was not a time of economic prosperity.

During World War II the U.S. produced a lot of war material, not consumer goods. It was a time when citizens went without many goods and raw materials due to war-time rationing. It was also a time when wages and prices were set by government planning boards. In short, it was a time of economic privation for the general public. It wasn't until after the war, when spending was drastically reduced, that the economy returned to a sense of normalcy.

The lesson we should learn is that, yes, it is possible for government to spend enough money to improve aggregate economic indicators. That same spending, however, can distort the fundamentals of the economic structure in ways that are not wealth-producing as determined by consumer preferences.
This argument, that government spending during WWII got us out of the Depression, is used by many to justify economic stimulus today. My response above draws on Robert Higgs and his analysis of the economy during the Depression and WWII.

For me, though, the biggest problem with the "just spend" argument is that it ignores the nuance and subtlety of a market-based, consumer-driven economy. It is like saying that to turn a 1000-word essay into a 2000-word essay, all you need to do is add 1000 words. No thought is given to the fact that those extra words need to fit into the overall essay in a coherent manner. A productive economy needs spending to occur in the proper places at the proper times, and it is the market process that does this most efficiently (not perfectly efficiently, but better than the alternatives).

Prediction Markets at a Small Company

Railinc has recently started a prediction market venture using Inkling software. We have been using it internally to predict various events including monthly revenue projections and rail industry traffic volume. In July, we also had markets to predict World Cup results. While this experience has been fun and interesting, I can't claim it has been a success.
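
For those unfamiliar with how these markets operate under the hood: Inkling's internal mechanics aren't public, but prediction-market software is commonly built on a market scoring rule, the best known being Hanson's logarithmic market scoring rule (LMSR). The sketch below is a minimal illustration of that general mechanism, not Inkling's actual implementation; the function names and the liquidity parameter b are my own.

```python
import math

def lmsr_cost(quantities, b=100.0):
    """Hanson's LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b)).

    quantities[i] is the number of shares sold so far for outcome i;
    b is a liquidity parameter (larger b = prices move more slowly).
    """
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_price(quantities, outcome, b=100.0):
    """Instantaneous price of an outcome; readable as its probability."""
    total = sum(math.exp(q / b) for q in quantities)
    return math.exp(quantities[outcome] / b) / total

def trade_cost(quantities, outcome, shares, b=100.0):
    """What a trader pays to buy `shares` of `outcome` right now."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)

# A fresh yes/no market starts at 50/50; buying "yes" pushes its price up.
q = [0.0, 0.0]
print(lmsr_price(q, 0))      # 0.5
print(trade_cost(q, 0, 30))  # ~16.1 -- cost of 30 "yes" shares
```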

The biggest problem we've had is participation. There is a core group of people who participate regularly, but it is small, and most of the company hasn't even asked for an account to access the software. When I first suggested this venture, I was skeptical that it would work at such a small company (just under 200 staff) precisely because of this problem. From the research I saw, other companies using prediction markets also had only a small percentage of employees participating. However, those companies were much larger than Railinc, so the total number of participants was much greater.

Another problem, related to participation, is the number of questions being asked. Since we officially started this venture, I've proposed all but one of the questions/markets. While I know a lot about the company, I don't know everything that is needed to make important business decisions. Which brings up another problem: in a company this small, do you really need such a specialized mechanism to gather actionable information from such a limited collective?

Even considering these problems, we are venturing forward and looking for ways to make prediction markets relevant at Railinc. One way to do this is through a contest. Starting on September 1, we will run a contest to determine our best predictor. At the Railinc holiday party in December, we will give an award to the person with the largest portfolio as calculated by Inkling. (The award will be similar to the door prizes we've given out at past holiday parties.) I've spent some time recently with Railinc's CIO discussing possible questions to ask during the contest. We came up with several categories of questions, including financials, headcount, project statistics, and sales. While I am still somewhat skeptical, we will see how it plays out.

We are also looking to work with industry economists to see if Railinc could possibly host an industry prediction market. This area could be a bit more interesting, in part, because of the potential size of the population. If we can get just a small percentage of the rail industry participating in prediction markets we could tap into a sizable collective.

Over the coming months we'll learn a lot about the viability of prediction markets at Railinc. Even if the venture fails internally, my hope is to make some progress with the rail industry.

Thursday, August 12, 2010

Geospatial Analytics using Teradata: Part II - Railinc Source Systems

[This is Part II in a series of posts on Railinc's venture into geospatial analytics. See Part I.]

Before getting into the details of the various analytics Railinc is working on, I need to explain the source data behind them. Railinc is essentially a large data store of information received from various parties in the North American rail industry. We process, distribute, and store large volumes of data daily: roughly 3 million messages from the industry, which can translate into 9 million records to process. The data falls into four categories:
  • Asset - rail cars and all attributes for those rail cars
  • Asset health - damage and repair information for assets
  • Movement - location and logistic information for assets
  • Industry reference - supporting data for assets including stations, commodities, routes, etc.
Assets (rail cars) are at the center of almost all of Railinc's applications. We keep the inventory of the nearly 2 million rail cars in North America. For the most part, the data we receive either has an asset component or in some way supports asset-based applications. The analytics we are currently creating from this data fall into three main categories: 1) logistics, 2) management/utilization, and 3) health.
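
To make those categories concrete, here is a rough sketch of how such records might be modeled. The field names are illustrative guesses on my part, not Railinc's actual schemas.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Asset:                 # a rail car and its attributes
    car_id: str              # e.g., reporting mark plus car number
    car_type: str
    owner: str

@dataclass
class AssetHealthEvent:      # damage and repair information for an asset
    car_id: str
    reported: datetime
    component: str           # e.g., "wheel", "coupler"
    description: str

@dataclass
class MovementEvent:         # location/logistics information for an asset
    car_id: str
    reported: datetime
    station: str             # events are reported at predefined locations
    loaded: bool

@dataclass
class StationRef:            # industry reference data supporting the above
    station: str
    latitude: float
    longitude: float
```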

Logistics is an easy one because movement information makes up the bulk of the data we receive daily. If there is a question about the last reported location of a rail car, we can answer it. The key phrase there is "last reported location." Currently we receive notifications from the industry whenever a predefined event occurs. These events tend to occur at particular locations (e.g., stations); in between those locations is a black hole for us. At least for now, that is. More and more rail cars are being equipped with GPS devices that can pinpoint a car's exact location at any point in time. We are now working with the industry to start receiving such data to fill in that black hole.
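
As a toy illustration of "last reported location" (building on the hypothetical MovementEvent record sketched above): because events arrive only at predefined locations, the best answer we can give between events is the most recent one.

```python
from typing import Iterable, Optional

def latest_by_car(events: Iterable[MovementEvent]) -> dict:
    """Reduce a stream of movement events to each car's most recent event."""
    latest = {}
    for ev in events:
        cur = latest.get(ev.car_id)
        if cur is None or ev.reported > cur.reported:
            latest[ev.car_id] = ev
    return latest

def last_reported_location(car_id: str, latest: dict) -> Optional[str]:
    ev = latest.get(car_id)
    return ev.station if ev else None  # None = the "black hole" between events
```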

Management/utilization requires more information than just location, however. If a car is currently moving and is loaded with a commodity then it is making money for its owner; if it is sitting empty somewhere then it is not. Using information provided by Railinc, car owners and the industry as a whole can get a better view into how fleets of cars are being used.
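
A crude utilization metric along these lines might weight the time a car spends loaded against its total observed time. Again, this is a sketch built on the hypothetical MovementEvent record above, not Railinc's actual calculation; it assumes a car's loaded/empty state holds until its next reported event.

```python
def loaded_time_fraction(events: list) -> float:
    """Fraction of a car's observed time spent loaded -- a rough proxy
    for 'making money for its owner'."""
    events = sorted(events, key=lambda ev: ev.reported)
    loaded = total = 0.0
    for prev, nxt in zip(events, events[1:]):
        span = (nxt.reported - prev.reported).total_seconds()
        total += span
        if prev.loaded:  # assume the state persists until the next event
            loaded += span
    return loaded / total if total else 0.0
```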

Finally, asset health analytics provide another dimension to the view of the North American fleet. Railinc, through its sister organization TTCI, has access to events recorded by track-side detectors. These devices can detect, among other things, wheel problems while a train is at speed. TTCI performs some initial analysis on these events before forwarding them to Railinc, which then creates alert messages that are sent to subscribers. Railinc will also collect data on repairs performed on rail cars. With a history of such events we can perform degradation analytics to help the industry better understand the life cycle of assets and asset components.
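
Degradation analytics could start as simply as fitting a trend to a component's detector readings over time and projecting when it will cross a condemning threshold. A minimal sketch, with made-up inputs and no claim to Railinc's actual models:

```python
from datetime import datetime
from statistics import linear_regression  # Python 3.10+

def projected_threshold_date(readings, threshold):
    """Fit a linear trend to (timestamp, measurement) detector readings,
    e.g. wheel-impact readings, and project when `threshold` is crossed.
    Needs at least two readings at distinct times."""
    t0 = readings[0][0]
    xs = [(t - t0).total_seconds() for t, _ in readings]
    ys = [v for _, v in readings]
    slope, intercept = linear_regression(xs, ys)
    if slope <= 0:
        return None  # no upward trend: no projected crossing under this model
    seconds_out = (threshold - intercept) / slope
    return datetime.fromtimestamp(t0.timestamp() + seconds_out)
```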

Railinc occupies a unique position in the rail industry in that it acts as a central store for so many kinds of industry data. We are just now starting to tap into that data to get a view of the industry no one else has. Future posts will examine some of these efforts.