Deconstructing Bechdel with Big Data
About 10 years ago, a New York Times article on parenting introduced me to the “Bechdel Test.” As the mother of a young daughter, a feminist, and an avid moviegoer, I was intrigued by this seemingly loose yet insightful test that measures the portrayal of women in film.
The Bechdel Test measures three things:
- Does the movie have two women in the main cast?
- Do these two women talk to each other?
- Do they talk about something besides a man?
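For readers who think in code, here is a minimal sketch of the three questions as a checklist. The 0-to-3 score, the class name, and the field names are my own illustrative assumptions, not the format of any of the data sets used in this post.

```python
from dataclasses import dataclass


@dataclass
class BechdelAnswers:
    """Hypothetical container for the three yes/no questions."""
    two_women: bool            # two women in the main cast
    talk_to_each_other: bool   # those two women talk to each other
    not_about_a_man: bool      # they talk about something besides a man


def bechdel_score(answers: BechdelAnswers) -> int:
    """Count the criteria met in order, stopping at the first failure (0 to 3)."""
    score = 0
    for met in (answers.two_women,
                answers.talk_to_each_other,
                answers.not_about_a_man):
        if not met:
            break
        score += 1
    return score


def passes_bechdel(answers: BechdelAnswers) -> bool:
    """A movie passes only when all three criteria are met."""
    return bechdel_score(answers) == 3
```

The criteria are checked in order because each one only makes sense if the previous one holds.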
But as a big data junkie, I was frustrated with the high-level, Bechdel Test-related headlines. What are the stories and details behind the headlines? What is the data saying about the various inputs into the movies (i.e., actors, directors, producers, etc.)? So, armed with Xcalar, I set off on an analytics mission to DECONSTRUCT BECHDEL. This post chronicles my data journey. (tl;dr? Jump to the Bechdel insight in STEP 4: THE RESULTS.)
THE DATA + TECH STACK
In addition to Bechdel data, I also needed movie data, actor data, and gender data. This supplemental data came from a variety of sources in a variety of forms.
- Data.world (movies from 2010-2014)
- Bechdeldata.com (movies 2015-present)
- IMDB – supplemental actor, gender data
- Kaggle – generic movie data
- OMDb – generic movie data
STEP 1: INGEST DIFFERENT FORMS OF DATA (csv, txt, xls, JSON)
There were more than nine data sets to integrate, in the form of CSVs, Excel spreadsheets, text files, and JSON files. Fortunately, thanks to its smart semantics ability, Xcalar was able to detect the fields and data types automatically. So ingestion of data, which is normally tedious, was a non-event (no more CREATE TABLE and INSERT statements!).
Why was this great? Because it was easy to quickly test whether the data brought in was relevant or useful. I must have connected to more than 40 datasets while building up the data model (can you imagine writing 40+ “Create Table” statements? Kill me).
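If you want to try schema-free ingestion without Xcalar, the idea can be roughly approximated in plain Python. This is only a sketch using the standard library, not how Xcalar does it; the helper names and the sample rows are made up for illustration.

```python
import csv
import io
import json


def infer_value(text):
    """Try int, then float; otherwise keep the trimmed string."""
    if text is None:          # short CSV rows yield None for missing fields
        return None
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_csv(text):
    """Read CSV text into a list of dicts, inferring a type for each value."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer_value(value) for key, value in row.items()}
            for row in reader]


def load_json(text):
    """JSON already carries its types, so parsing is enough."""
    return json.loads(text)


# Made-up sample data, not taken from the real data sets.
csv_sample = "title,year,bechdel_rating\nExample Movie A,2013,3\n"
json_sample = '[{"title": "Example Movie A", "runtime_minutes": 102.5}]'

movies = load_csv(csv_sample)
extra = load_json(json_sample)
```

No table definitions are needed up front: the column names come from the header row and the types from the values themselves.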
STEP 2: EXPLODE THE DATA
“The devil is in the details.” Data is rarely well-formed, normalized, and clean, and the Bechdel and movie data was no different. The seemingly straightforward MAIN_ACTORS column was actually an embedded list of multiple actors.
- MAIN_ACTORS comprised multiple actors…
- DIRECTORS also comprised multiple directors
- PRODUCERS, GENRES…. all packed multiple values within a single column.
So we had to explode each of these packed columns into individual columns within Xcalar.
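To give a feel for what exploding means, here is a rough sketch in plain Python. It assumes the packed values are comma-separated and emits one row per value (a long layout); both are my assumptions, since the actual delimiter and the exact output layout used in Xcalar are not shown here. The sample row is made up.

```python
def explode(rows, column, sep=","):
    """Return one output row per value packed into `column`.

    Rows whose packed column is empty or missing produce no output rows.
    """
    exploded = []
    for row in rows:
        packed = str(row.get(column) or "")
        values = [v.strip() for v in packed.split(sep) if v.strip()]
        for value in values:
            new_row = dict(row)        # copy so the input row is not mutated
            new_row[column] = value
            exploded.append(new_row)
    return exploded


# Made-up sample row for illustration.
packed_rows = [
    {"title": "Example Movie A", "MAIN_ACTORS": "Actor One, Actor Two, Actor Three"},
]
actor_rows = explode(packed_rows, "MAIN_ACTORS")
```

The same helper can be applied in turn to DIRECTORS, PRODUCERS, and GENRES by changing the `column` argument.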
STEP 2B: MAP TO GENDER
We also wrote a separate Python program to determine the genders of the actors from data found on IMDB. We then took this gender table and mapped it to the MAIN_ACTORS appearing in each movie.
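The mapping step is essentially a lookup join. The sketch below is not our original program; it assumes a gender table with hypothetical `name` and `gender` fields and matches on a lowercased, trimmed actor name. The sample names are placeholders.

```python
def build_gender_lookup(gender_rows):
    """Index the gender table by a normalized (lowercased, trimmed) name."""
    return {row["name"].strip().lower(): row["gender"] for row in gender_rows}


def attach_gender(actor_rows, lookup, actor_col="MAIN_ACTORS"):
    """Keep every actor row; names missing from the lookup become 'unknown'."""
    return [
        {**row, "GENDER": lookup.get(row[actor_col].strip().lower(), "unknown")}
        for row in actor_rows
    ]


# Placeholder data for illustration only.
gender_table = [
    {"name": "Actor One", "gender": "F"},
    {"name": "Actor Two", "gender": "M"},
]
movie_actor_rows = [
    {"title": "Example Movie A", "MAIN_ACTORS": "actor one"},
    {"title": "Example Movie A", "MAIN_ACTORS": "Actor Three"},
]
tagged = attach_gender(movie_actor_rows, build_gender_lookup(gender_table))
```

Keeping unmatched actors as "unknown" rather than dropping them makes it easy to see how much of the cast the gender table actually covers.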
STEP 3: RUN THE ANALYTICS
Once we had cleaned up and normalized the data, I mapped the top 4 actors in each movie to the Bechdel rating for the movie. I then repeated the mapping of the Bechdel rating for
- MPAA ratings (G, PG, PG-13, R, NR),
- movie genre,
- directors, and
The goal of these mappings was to see whether there was any correlation between these Hollywood players and the movies they created, and whether there was any trend over time.
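Each of these mappings boils down to the same calculation: group the movies by some attribute and compute the share that pass. Here is a minimal sketch of that grouping, assuming each row carries a boolean `passes` field (a hypothetical name) alongside the attribute. The sample rows are invented and say nothing about the real results.

```python
from collections import defaultdict


def pass_rate_by(rows, key, passed_field="passes"):
    """Return {group value: percent of rows in that group that pass}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        if row[passed_field]:
            passed[group] += 1
    return {group: 100.0 * passed[group] / totals[group] for group in totals}


# Invented rows for illustration only.
sample_movies = [
    {"title": "Example A", "mpaa": "PG", "passes": True},
    {"title": "Example B", "mpaa": "PG", "passes": False},
    {"title": "Example C", "mpaa": "R", "passes": True},
    {"title": "Example D", "mpaa": "R", "passes": True},
]
rates = pass_rate_by(sample_movies, "mpaa")
```

Swapping `key` for a genre, director, or release-year field gives the other breakdowns, including the trend over time.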
STEP 4: THE RESULTS
So let’s start with some easy insights. Below we can see the percentage of movies, by MPAA rating, that pass the Bechdel test. As a whole, 63% of movies pass the Bechdel Test (which means 37% do NOT pass).
Next, I was interested in seeing whether the gender of the director affected the types of movies that were made. To no surprise, though women make up less than 10% of all directors, almost 90% of the movies they make pass the Bechdel Test.
Next, we looked at how movie genre correlated with Bechdel test results. As we looked across the 1,700 movies we analyzed, there were some surprises in the pass rate by genre. With young children, I watch more than my share of animated films. I had assumed they would portray women in a positive light, so I was surprised that, as a category, animated films have a Bechdel pass rate below the industry average.
At the same time, as a child of the 80s, irrevocably scarred by slasher horror movies with hapless female victims, I was surprised to see that horror movies pass the Bechdel test 4 out of 5 times.
Building upon the earlier chart of director gender and Bechdel Test pass rates, I then looked at all the directors and all the movies they made from 2010 to now. Among the more prolific directors, a few stood out, specifically Ben Falcone and Anthony Russo, whose movies always passed the Bechdel Test. Surprisingly, only half of renowned director Steven Spielberg’s movies passed the Bechdel Test.
I also applied this same analysis to the top 4 actors of each movie, to see how often they starred in Bechdel-passing movies. There were again surprises. On the graph below, the horizontal axis shows how many films each actor starred in. The vertical axis indicates what percent of their movies passed the Bechdel test. Find any surprises? I certainly did!
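The points on a graph like this can be computed with a simple count per actor. The sketch below assumes one row per (movie, actor) pair with hypothetical `actor` and `passes` fields, plus an optional `min_films` threshold to focus on the more prolific names. The sample rows are placeholders, not real results.

```python
from collections import Counter


def actor_points(rows, min_films=1):
    """Return {actor: (film count, percent of those films that pass)}."""
    films = Counter()
    passed = Counter()
    for row in rows:
        films[row["actor"]] += 1
        if row["passes"]:
            passed[row["actor"]] += 1
    return {
        actor: (count, 100.0 * passed[actor] / count)
        for actor, count in films.items()
        if count >= min_films
    }


# Placeholder rows for illustration only.
appearances = [
    {"actor": "Actor One", "title": "Example A", "passes": True},
    {"actor": "Actor One", "title": "Example B", "passes": False},
    {"actor": "Actor Two", "title": "Example A", "passes": True},
]
points = actor_points(appearances, min_films=2)
```

Each value is an (x, y) pair for the graph: films on the horizontal axis, pass percentage on the vertical axis. The same function works for directors if the rows carry a director name instead.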
Curious for more?
Hollywood and the Bechdel Test is a gift that keeps on giving: more movies, new actors… If you are interested in getting the data and running your own analytics, drop us a note at firstname.lastname@example.org and we’ll share our data sets with you.