I don't often watch American football, but when I do, I couldn't have picked a better, or more exciting game ... for Twitter (The game itself wasn't too bad either).
Given our recent trend of sports analytics, you might be inclined to think we're a bunch of jocks -- but while we might be lacking in physical agility, we make up for that in spades with Xcalar, our cloud analytics platform.
Stats-wise, last night was the sort of matchup even casual football fans seem to perk up at the mention of. Going in, the 49ers were set to be the next undefeated team in the NFL (sorry Pats) so this showdown between the Seahawks' top-ranking offense against the 49ers top-ranked defense is Monday Night Football at it's best. And what better way to measure the excitement of the fans than everyone's favorite soapbox sportsdesk -- Twitter.
Because we only had 3 hours to put this together, the dataflow ended up being fairly simple.
Bring in Twitter JSON data every 10 minutes via Twitter into S3
Convert Greenwich Mean Time into Pacific Standard Time via Python User Defined Function (UDF)
Pull out location from Twitter JSON data
Parse Tweets for whether they are Pro 49ers or Pro Hawks (we do this by searching up specific hashtags)
Sort Tweets by Quarter (this is to accommodate Tableau's data volume limitations)
Tweets by Quarter (Red - 49ers, Blue - Seahawks)
After processing, we passed the data to our partner, Tableau, for the nice visuals. We divided up the Tweets into the four quarters and overtime. We plotted twitter location on Tableau by quarter to see if there were any changes in activity around the country as the game moved from a SF lead (Q1, Q2) to a Seattle lead (Q3, Q4,OT).
Finally, we took the data, grouped it by 5 minute intervals, calculated Twitter support of either the 49ers or the Seahawks, and mapped it with scoring events.
A little after lunchtime yesterday, our colleague Tim let us know that the game was that evening(we honestly had no idea). With such little time, we wanted to demonstrate a few key things about why Xcalar is such a powerful platform for data analytics:
- NO INGESTION: There is no ingestion of data (no insert, no create table) - we simply point to where the data resides on the data lake.
- JSON DATA EXPANSION: Xcalar can intelligently parse out and expand data found inside JSON files without any special languages - just point and expand.
- SEPARATION OF STORAGE FROM COMPUTE: With Xcalar, the datamodel represents the computation and dataflow. This is separate from where the data resides. This is useful because during the game, we were able to drop new Tweet files into S3 without having to restart or reattach the data flow to the new data. Imagine if, on a turnover fumble, the team in possession had to start at the end of the field -- seems pretty wack to me.
- INTUITIVE UI: With Xcalar's intuitive and powerful IDE, we were able to do all the joins, filters, group-bys, and even Python UDFs within a single framework.
- SCALE: Scaling is not even an issue as we build these analytics. 200,000+ rows is just a warmup for Xcalar - not having to worry or tune DevOps accelerates time-to-insight and reduces the need for highly specialized talent. It's like Moneyball for analytics teams.