One of the things that sucked up a week of my evenings this summer was the VAST contest. VAST is a visualization conference, and I had decided to submit an entry to a contest they were hosting. The premise of the contest is to use or develop tools to analyze a dataset and discover the threat hidden in it. They provided sample datasets, and our job was to look at them and find out what was going on, who the suspects were, and how the social network was organized.
It happens that two of the major pieces of software that I’ve been developing at the lab do exactly that, and I figured it would be a great opportunity to show off the tools and see how well they work. So I downloaded the dataset and promptly forgot about it until one week before the entries were due. Then I worked like mad and submitted at the last possible minute. I was at the lab till 1 or 2am each day that week, working on the datasets, tweaking my software, exploring, writing up my results, and putting together video explanations.
The contest was divided into four completely separate challenges. The first had to do with edits to a wiki page. We were given a fake wiki page and all the edits to it, and were told to look at the edits and determine who was on what team and whether any of the teams had malicious intent. I first used one of my programs to filter out a lot of the junk edits, grammar fixes, and spam, then filtered by number of contributions to find out who the key players were. Then I read through the conversations and split the teams up by who was arguing with whom, eventually coming up with a pair of teams. It was a lot more complex than that, but that was the gist.
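To give a feel for that filtering step, here is a minimal Python sketch. It is not the program I used for the contest; the edit records, the field names, and the thresholds are all invented for illustration. It drops tiny edits, then counts what is left per user.

```python
from collections import Counter

# Hypothetical edit records; the real contest data had its own format.
edits = [
    {"user": "alice", "chars_changed": 120},
    {"user": "bob",   "chars_changed": 85},
    {"user": "alice", "chars_changed": 2},    # typo fix, filtered out
    {"user": "carol", "chars_changed": 40},
    {"user": "alice", "chars_changed": 200},
    {"user": "bob",   "chars_changed": 60},
    {"user": "dave",  "chars_changed": 1},    # junk edit, filtered out
    {"user": "alice", "chars_changed": 95},
]

def key_players(edits, min_chars=20, min_edits=2):
    """Drop tiny edits, then keep users with at least min_edits left."""
    substantive = [e for e in edits if abs(e["chars_changed"]) >= min_chars]
    counts = Counter(e["user"] for e in substantive)
    # most_common() sorts by count, highest first
    return [(user, n) for user, n in counts.most_common() if n >= min_edits]

print(key_players(edits))
```

With this made-up data, alice and bob come out as the key players, and the one-off contributors drop away.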
The second challenge was migrant boats. We were given an XML file that contained fake Coast Guard interdictions, where boats bound for the Florida coast were stopped by the Coast Guard. There was a lot of metadata associated with the interdictions. For this one, I used a custom Google Map to plot the interdictions, with a slider bar that showed me where they were taking place over time. I also used color-coded markers to show me the kinds of boats used, the number of deaths, where they landed, and other interesting statistics.
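The slider boils down to picking out the interdictions that fall inside a date window. Here is a rough Python sketch of that idea, separate from the Google Map itself. The XML layout, the attribute names, and the sample values are assumptions I made up for the example; the contest file was structured differently.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical XML layout and values, for illustration only.
SAMPLE_XML = """
<interdictions>
  <interdiction date="2005-03-14" lat="24.6" lon="-81.2" vessel="raft"/>
  <interdiction date="2006-07-02" lat="25.1" lon="-80.4" vessel="go-fast"/>
  <interdiction date="2007-01-20" lat="24.9" lon="-80.9" vessel="raft"/>
</interdictions>
"""

def interdictions_between(xml_text, start, end):
    """Return (date, lat, lon, vessel) tuples inside [start, end], oldest first."""
    root = ET.fromstring(xml_text)
    points = []
    for node in root.iter("interdiction"):
        when = date.fromisoformat(node.get("date"))
        if start <= when <= end:
            points.append(
                (when, float(node.get("lat")), float(node.get("lon")), node.get("vessel"))
            )
    return sorted(points)

for point in interdictions_between(SAMPLE_XML, date(2006, 1, 1), date(2007, 12, 31)):
    print(point)
```

Moving the slider would just mean calling the function again with a different window and redrawing the markers.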
The third challenge was cell phone calls. For this one we were given a list of cell phone records that included from, to, tower, date, and duration. We had to figure out who was who by the calls they made, and determine the whole network and who was doing what just from that data. I came up with some interesting results using color-coded tables and my network graphing tool. I was also able to plot the calls on a timeline and show how some people appeared to be on conference calls because their calls overlapped a lot.
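The overlap check is simple enough to sketch in a few lines of Python. This is an illustration rather than my contest code: the records and field names are invented, and I am assuming the duration is in seconds. Two calls are flagged when they share a person and their time ranges intersect.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical call records; duration is assumed to be in seconds.
calls = [
    {"from": "A", "to": "B", "start": datetime(2008, 6, 1, 9, 0),  "duration": 600},
    {"from": "A", "to": "C", "start": datetime(2008, 6, 1, 9, 5),  "duration": 300},
    {"from": "D", "to": "E", "start": datetime(2008, 6, 1, 9, 2),  "duration": 120},
    {"from": "B", "to": "F", "start": datetime(2008, 6, 1, 10, 0), "duration": 60},
]

def overlapping_calls(calls):
    """Return (i, j, shared_people) for call pairs that share a person and overlap."""
    found = []
    for i, j in combinations(range(len(calls)), 2):
        a, b = calls[i], calls[j]
        shared = {a["from"], a["to"]} & {b["from"], b["to"]}
        if not shared:
            continue
        a_end = a["start"] + timedelta(seconds=a["duration"])
        b_end = b["start"] + timedelta(seconds=b["duration"])
        # Two intervals overlap when each one starts before the other ends.
        if a["start"] < b_end and b["start"] < a_end:
            found.append((i, j, sorted(shared)))
    return found

print(overlapping_calls(calls))
```

Comparing every pair is fine for a small list like this one. A bigger dataset would want the calls sorted by start time first.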
The final challenge was my favorite, and the one on which I spent the most time. I had to write a lot more software for this one, too. We were given a fake building and fake locations of the occupants of the building over time. We had to look at the data to determine what happened when, who was a suspect, who was a witness, who was a casualty, and anything else interesting. I wrote software that let the user choose which people to watch and over what time period, so you could scroll around and see interesting things. Here’s a picture of it:
If you want to see my whole entry for the contest, you can go here: http://www.bobbaddeley.com/vast08/. Each of the sections has my evaluation as well as a video of me describing how I approached the problem. In the end I didn’t win any awards, but I was the only applicant from PNNL, and I think I was the only team that was a single person. I think I’ll be a lot more prepared for next year, and I fully intend to win some awards.