Twitter Reactions to the Candidates: Democratic Debate Edition
During political debates, there is always talk about trending topics on Twitter, but most of this commentary is little more than hand-waving. So during Tuesday evening’s #DemocraticDebate, the team here at Xcalar wanted to put real numbers behind the talk: to measure sentiment and Twitter activity in real time as it relates to the various candidates’ comments.
To do so, we grabbed Twitter feeds as they came in during the debate and mapped them to the points in the debate when each candidate was speaking. The results of our analytics on 2.6 million tweets follow for each of the twelve candidates.
Not a Democratic Debate fan, but interested in the tech? Jump to read how we did it.
***** Drop us a note (firstname.lastname@example.org) if you want the data, or better yet, a free Xcalar instance with the data and dataflows to play with and analyze at your leisure. *****
Though Amy Klobuchar’s Twitter engagement was modest, it spiked when she debated Warren over the details of #MedicareforAll and when she commented on the #wealthtax. Her comments on foreign policy and tech company oversight yielded the weakest Twitter velocity.
Andrew Yang’s Twitter following and response were strongest when he spoke on topics with which he is associated, specifically #automation, #freedomdividend, #UBI and #tech. On other domestic and foreign affairs, his responses garnered no meaningful reaction.
As expected, Bernie Sanders not only had above-average Twitter activity, but his peaks were high and distinct. Not surprisingly, Bernie’s comments on #Medicare and #IncomeInequality riled the Twittersphere. This was particularly evident when he and Biden exchanged heated words later in the debate. His stance on foreign affairs did not elicit any significant Twitter activity.
Entering this debate as one of the top two candidates, Elizabeth Warren unsurprisingly garnered significant Twitter activity throughout the three-hour debate. As expected, her comments on #MedicareForAll, #WealthTax and #TechCompanies sparked a significant number of Tweets. Along with Bernie Sanders, Warren sent the Twittersphere into a frenzy with her heated response to Biden’s claim of her “being vague”. Less impactful for Warren that evening was her commentary on assault weapons, and questions about her age were non-events.
Aside from a lull at the start of the third hour, Joe Biden’s comments generated significant and regular Twitter activity. After an initial spike from an expected response on his son’s business dealings, Joe Biden’s ability to spark Twitter conversations peaked when he spoke of #wealthtax, #foreignpolicy and #syria, and when he took on @POTUS head-on. Like Warren, Biden did not elicit any meaningful Twitter reaction when asked about his age and health. Twitter went into overdrive when he accused Warren and Sanders of “being vague”.
Julian Castro’s debate commentary elicited the weakest Twittersphere response. Most of his answers (on #impeachment, #wealthtax, #supremecourt, #opioid and #techcompanies) did not elicit any measurable Twitter response. The only topic that produced a meaningful “spike” in Twitter activity was his response on #assaultweapons and #buyback.
Kamala Harris’s Twitter activity was respectable. Her responses on #abortion, #healthcare, #internationaltrust and #tech elicited the greatest Twitter activity. Less exciting for the Harris Twittersphere were her views on #wealthtax.
Mayor Pete generated peaks in his Twittersphere activity when he directly took on @ewarren, @tulsi2020, @betoorourke and @biden. Less impactful were his comments on #supremecourt, #GM and #wealthtax.
Tom Steyer had a very peaky Twitter reaction, with his commentary on #incomeinequality, #wealthtax and #putin eliciting the greatest acceleration of Twitter volume. Surprisingly, so did his comment that he had many friends in South Carolina… (read what you will into that).
Tulsi Gabbard had an evening of fairly moderate Twitter activity. The interaction that garnered the most Twitter activity was her head-to-head debate with Buttigieg.
The Tweet Process
We pulled data from the Twitter API in real time. Pulls were taken every 1-2 minutes and stored as JSON files. Over the course of the 3-hour event, 100+ JSON files were created and downloaded.
As each JSON file was created, it was uploaded into S3. (see above)
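The batching step can be sketched roughly as follows. This is a minimal illustration, not the script we ran: the function names and the file-naming scheme are hypothetical, and both the Twitter API call and the S3 upload are deliberately left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def batch_filename(collected_at: datetime) -> str:
    """Build a sortable file name from the UTC collection time (hypothetical scheme)."""
    return collected_at.astimezone(timezone.utc).strftime("tweets_%Y%m%dT%H%M%SZ.json")


def write_batch(tweets: list, out_dir: Path, collected_at: datetime) -> Path:
    """Write one batch of tweet dicts to its own JSON file and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / batch_filename(collected_at)
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps emoji and non-Latin text readable in the file.
        json.dump(tweets, fh, ensure_ascii=False)
    return path
```

One file per pull keeps every batch independent, so a failed pull or upload affects only a minute or two of data.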
The Xcalar Process
Once the data was safely ensconced in S3, Xcalar started its magic. There are two things that make Xcalar different:
- We don’t ingest… We are not copying the data into a data warehouse; we are not storing it in a database; there is no “insert into”. For this #DemocraticDebate exercise we simply pointed to our S3 bucket filled with 100+ JSON files.
- We build our models off #BIGDATA. Many data platforms, ETL tools and the like build models off data samples. Not Xcalar. We bring in the whole shebang. And why is that a good thing? Because
- 1) you get real insight as you build your model,
- 2) anomalies are easier to catch,
- 3) why settle for amuse-bouches when you can enjoy a 5-star meal?
Our Xcalar Dataflow
- On the left-hand side of the dataflow is the pointer to the S3 folder with the 100+ JSON files (this is marked as a DATASET). You’ll see the 2,646,352 rows (see, we weren’t lying!)
- Next, we traversed down each JSON record to pull out the HASHTAGs and LOCATION data for each Tweet.
- Tweets store created_date in Greenwich Mean Time. We had to convert it into West Coast time (because that is how we were tracking the debate timeline).
- Using SQL, we then assigned each Tweet its corresponding segment from the DEBATETIMESEGMENT table (there were 100 time segments, one roughly every 2 minutes).
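For readers who want to try these steps outside Xcalar, here is a rough Python sketch of extraction, time conversion and segment assignment. It rests on several assumptions: that each record follows the classic Twitter v1.1 layout (`entities.hashtags[].text`, `user.location`, and a `created_at` string such as `Wed Oct 16 00:03:30 +0000 2019`), that a fixed UTC-7 offset is good enough for a single evening in October, and that `DEBATE_START` and `SEGMENT_SECONDS` are placeholders rather than the values in our DEBATETIMESEGMENT table. We did the segment assignment in SQL, so this is an equivalent illustration, not our dataflow.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-7 offset (Pacific Daylight Time, in effect in mid-October 2019).
# zoneinfo.ZoneInfo("America/Los_Angeles") would handle daylight saving in general.
PACIFIC = timezone(timedelta(hours=-7), "PDT")

# Hypothetical placeholders, not the real segment table.
DEBATE_START = datetime(2019, 10, 15, 17, 0, tzinfo=PACIFIC)
SEGMENT_SECONDS = 120


def extract_fields(tweet: dict) -> dict:
    """Pull hashtags, user location and local creation time out of one tweet record."""
    entities = tweet.get("entities") or {}
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    location = (tweet.get("user") or {}).get("location")
    # Assumed v1.1 timestamp format; %z parses the "+0000" UTC offset.
    created_utc = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "hashtags": hashtags,
        "location": location,
        "created_local": created_utc.astimezone(PACIFIC),
    }


def segment_index(created_local, start=DEBATE_START, seconds=SEGMENT_SECONDS):
    """Return the zero-based segment a tweet falls in, or None if it predates the start."""
    elapsed = (created_local - start).total_seconds()
    if elapsed < 0:
        return None
    return int(elapsed // seconds)
```

Integer division on elapsed seconds is the same bucketing a SQL range join against a segment table performs when every segment has the same length.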
Once we completed this step, we applied various Python scripts to parse each 280-character tweet, looking for keywords associated with each candidate. The keywords for each candidate were stored in a list. For example, for Tulsi Gabbard the list is something like candidate_list = [“TULSI2020
A similar list was created for each candidate, and the lists formed part of the Python code that ran on each row (see image to the left).
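A minimal sketch of this kind of keyword matching is shown below. It is an illustration only: apart from TULSI2020, the keywords are made up for the example and are not our actual lists, and `candidates_mentioned` is a hypothetical name.

```python
import re

# Illustrative keyword lists; only "TULSI2020" comes from the text above,
# the rest are invented placeholders.
CANDIDATE_KEYWORDS = {
    "Gabbard": ["TULSI2020", "TULSI", "GABBARD"],
    "Warren": ["EWARREN", "WARREN"],
}


def candidates_mentioned(text: str, keywords: dict = CANDIDATE_KEYWORDS) -> list:
    """Return the sorted names of candidates whose keywords appear as whole tokens."""
    # Upper-case the tweet and split it into alphanumeric tokens, so "#Tulsi2020"
    # and "@ewarren" match their keywords but partial words do not.
    tokens = set(re.findall(r"[A-Z0-9_]+", text.upper()))
    return sorted(name for name, words in keywords.items() if tokens.intersection(words))
```

Matching on whole tokens rather than substrings avoids false hits when a keyword happens to sit inside a longer word.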
So now the data was ready, all 2.6 million rows.
We work every day with millions, billions, petabytes of data, and we take that scale for granted here at Xcalar. So on Tuesday night, as we attempted to visualize the transformed data, we ran into a problem… a #BIGDATA problem. We wanted to build a geolocation dashboard in Tableau that would show Tweet locations throughout the debate, but 2.6 million wide records was more than it could take… it was crashing.
So to get around this problem, we did two things:
- We had to reduce the number of records to a volume that could be performant in Tableau, which was about 146,000 records. To do this, we kept only those tweets generated by iPads (yeah, I know, it’s problematic).
- We aggregated the data by candidate and by segment. Because we preprocessed the aggregation in Xcalar, Tableau only had to process 100 or so records (wide records, but nonetheless only 100 rows).
With these two reduced datasets in hand, we published the tables, which were then read directly by Tableau.
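The second reduction, one wide row per time segment with a count per candidate, can be sketched in plain Python as follows. The input shape (a segment index paired with the list of candidates a tweet mentions) and the function name are assumptions for illustration; in our case the aggregation ran inside Xcalar.

```python
from collections import Counter, defaultdict


def aggregate_by_segment(rows):
    """Collapse (segment, candidates) pairs into one wide row of counts per segment."""
    counts = defaultdict(Counter)
    for segment, candidates in rows:
        for name in candidates:
            counts[segment][name] += 1
    # One dict per segment: the segment index plus a column per candidate seen in it.
    return [{"segment": seg, **counts[seg]} for seg in sorted(counts)]
```

However many tweets go in, the output has at most one row per segment, which is what keeps the dashboard responsive.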
This is our geo-location map of Tweets throughout the #DemocraticDebate (remember this is only 146,000 of our 2.6M tweets)
and these are all the 2.6M tweets associated with all the candidates throughout the Democratic Debate (preaggregated)
Despite being Twitter newbies, we were able to put this together in less than 24 hours, most of which we spent learning how to get tweets out of Twitter at volume. Once that was learned, building the datasets was relatively straightforward.
If you are interested in getting access to this data, drop us a note (but beware: processing it in raw Python or in a cloud database will likely either crash or take forever). Better yet, drop us a note (email@example.com) and we’ll set you up with your own instance with the data and dataflows so you can build, analyze and play without worrying about performance.