Deconstructing Bechdel with Big Data

About 10 years ago, a New York Times article on parenting introduced me to the “Bechdel Test.” As the mother of a young daughter, a feminist, and an avid moviegoer, I was intrigued by this seemingly loose yet insightful test of how women are portrayed in film.

 

Alison Bechdel illustration

Alison Bechdel, the American illustrator who created the Bechdel Test back in 1985.

The Bechdel Test asks three questions:

  1. Does the movie have two women in the main cast?
  2. Do these two women talk to each other?
  3. Do they talk about something besides a man?
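As a rough illustration, the three questions can be read as a cumulative checklist: question 2 only matters if the answer to question 1 is yes, and so on. The sketch below uses hypothetical helper names and an assumed 0 to 3 scoring. Neither is taken from the data sets used later in this post.

```python
def bechdel_score(two_women: bool, talk_to_each_other: bool, not_about_a_man: bool) -> int:
    """Count how many criteria are met in order, stopping at the first failure."""
    score = 0
    for criterion in (two_women, talk_to_each_other, not_about_a_man):
        if not criterion:
            break
        score += 1
    return score


def passes_bechdel(two_women: bool, talk_to_each_other: bool, not_about_a_man: bool) -> bool:
    """A movie passes only when all three criteria are met."""
    return bechdel_score(two_women, talk_to_each_other, not_about_a_man) == 3
```

Under this reading, a movie with two women who only talk to each other about a man scores 2 and does not pass.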

But as a big data junkie, I was frustrated with the high-level, Bechdel Test-related headlines. What are the stories and details behind the headlines? What does the data say about the various inputs into the movies (actors, directors, producers, and so on)? So, armed with Xcalar, I set off on an analytics mission to DECONSTRUCT BECHDEL. This post chronicles my data journey. (tl;dr? Jump to the Bechdel insight.)

THE DATA + TECH STACK

In addition to Bechdel data, I also needed movie data, actor data, and gender data. This supplemental data came from a variety of sources in a variety of forms.

The tech stack was mostly in the cloud, with the exception of Excel.

STEP 1: INGEST DIFFERENT FORMS OF DATA (csv, txt, xls, JSON)

There were more than 9 data sets to integrate, in the form of CSVs, Excel spreadsheets, text files, and JSON files. Fortunately, thanks to Xcalar’s smart semantics ability, the fields and data types were detected automatically. So data ingestion, which is normally tedious, was a non-event (no more CREATE TABLE and INSERT statements!).

Xcalar automatically determines the column names and data types, which makes data ingestion a breeze!

Why was this great? Because it made it easy to quickly test whether the data I brought in was relevant or useful. I must have connected to more than 40 datasets while building up the data model (can you imagine writing 40+ “CREATE TABLE” statements? Kill me).
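To give a feel for what automatic field and type detection involves, here is a minimal sketch in plain Python. It is not how Xcalar works internally. The function names, the int/float/str fallback order, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type (int, float, or str) that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return str
    for caster in (int, float):
        try:
            for v in non_empty:
                caster(v)
            return caster
        except ValueError:
            continue
    return str


def infer_schema(csv_text):
    """Read CSV text and map each header name to an inferred Python type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


# Hypothetical sample; the titles and values are made up for illustration.
sample = "title,year,score\nMovie A,2012,7.5\nMovie B,2015,6\n"
schema = infer_schema(sample)
```

Here `schema` maps `title` to `str`, `year` to `int`, and `score` to `float`, with no table definitions written by hand.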

STEP 2: EXPLODE THE DATA

MAIN_ACTORS column filled with multiple actors.

“The devil is in the details.” Data is rarely well-formed, normalized, and clean, and the Bechdel and movie data was no different. The seemingly straightforward MAIN_ACTORS column was actually an embedded list of multiple actors:

 

  • MAIN_ACTORS comprised multiple actors
  • DIRECTORS also comprised multiple directors
  • PRODUCERS, GENRES, and others all packed multiple values within a single column

So we had to explode each of these packed columns into individual columns within Xcalar.
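For readers who want to try something similar outside Xcalar, one common way to unpack such a column is to emit one row per packed value. The sketch below is plain Python, not the Xcalar dataflow. The comma delimiter, the `explode` helper, and the sample record are assumptions for illustration.

```python
def explode(rows, column, sep=","):
    """Yield one copy of each row per value packed into `column`."""
    for row in rows:
        for value in row[column].split(sep):
            value = value.strip()
            if value:  # skip empty fragments left by stray delimiters
                yield {**row, column: value}


# Hypothetical record; the names are placeholders, not real data.
movies = [{"TITLE": "Movie A", "MAIN_ACTORS": "Actor One, Actor Two,Actor Three"}]
exploded = list(explode(movies, "MAIN_ACTORS"))
```

Each resulting row keeps the movie's other fields, so later joins and counts can work per actor, per director, or per genre.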

Partial Dataflow of Movie Related Data in Xcalar

STEP 2B: MAP TO GENDER

We also wrote a separate Python program to determine the genders of the actors from data found on IMDb. We then took this gender table and mapped it to the MAIN_ACTORS appearing in each movie.
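The mapping step amounts to a lookup join between the exploded actor rows and the gender table. The sketch below shows one possible shape for it. It is not the program we wrote: the name normalization, the `GENDER` column, the `unknown` fallback, and the sample inputs are all assumptions.

```python
def normalize(name):
    """Lower-case and collapse whitespace so small formatting differences still match."""
    return " ".join(name.lower().split())


def attach_gender(actor_rows, gender_by_name, column="MAIN_ACTORS"):
    """Left-join a gender lookup onto actor rows; unmatched names get 'unknown'."""
    lookup = {normalize(name): gender for name, gender in gender_by_name.items()}
    return [
        {**row, "GENDER": lookup.get(normalize(row[column]), "unknown")}
        for row in actor_rows
    ]


# Hypothetical inputs for illustration only.
genders = {"Actor One": "F", "Actor Two": "M"}
cast = [
    {"TITLE": "Movie A", "MAIN_ACTORS": "actor  one"},
    {"TITLE": "Movie A", "MAIN_ACTORS": "Actor Three"},
]
tagged = attach_gender(cast, genders)
```

Keeping unmatched actors as `unknown` rather than dropping them makes it easy to see how much of the cast the gender table actually covers.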

 

STEP 3: RUN THE ANALYTICS

Once we had cleaned up and normalized the data, I mapped the top 4 actors in each movie to the movie’s Bechdel rating. I then repeated this mapping of Bechdel rating for:

  • MPAA ratings (G, PG, PG-13, R, NR),
  • movie genre,
  • directors, and
  • producers.

The goal of these mappings was to see whether there was any correlation between these Hollywood players and a movie’s Bechdel result, and whether there was any trend over time.
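Each of these mappings boils down to the same aggregation: group the rows by one attribute, then compute how many movies are in the group and what share of them pass. A minimal sketch follows, assuming a boolean `BECHDEL_PASS` column. That column name, the helper, and the sample rows are hypothetical.

```python
from collections import defaultdict


def pass_rate_by(rows, key, passed="BECHDEL_PASS"):
    """Return {group: (movie_count, percent_passing)} for each distinct value of `key`."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row[passed]:
            passes[row[key]] += 1
    return {group: (n, 100.0 * passes[group] / n) for group, n in totals.items()}


# Hypothetical rows for illustration only.
rows = [
    {"GENRE": "Horror", "BECHDEL_PASS": True},
    {"GENRE": "Horror", "BECHDEL_PASS": False},
    {"GENRE": "Animation", "BECHDEL_PASS": True},
]
rates = pass_rate_by(rows, "GENRE")
```

Returning the count alongside the percentage matters, because a 100% pass rate over one or two movies says far less than the same rate over twenty. Note that after exploding a packed column, a movie with several genres or actors is counted once in each of its groups.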

STEP 4: THE RESULTS

So let’s start with some easy insights. Below we can see the percentage of movies, by MPAA rating, that pass the Bechdel Test. As a whole, 63% of movies pass the Bechdel Test (which means 37% do NOT pass).

 

Next, I was interested in seeing whether the gender of the director affected the types of movies that were made. To no surprise, although women make up less than 10% of all directors, almost 90% of the movies they make pass the Bechdel Test.

Gender of Director and the Likelihood of Passing the Bechdel Test

 

Next, we looked at how movie genre correlated with Bechdel Test results. Across the 1,700 movies we analyzed, there were some surprises in the pass rate by genre. With young children, I watch more than my share of animated films, and I had assumed they would portray women in a positive light. I was surprised that, as a category, animated films have a Bechdel pass rate below the industry average.

At the same time, as a child of the 80s, irrevocably scarred by slasher horror movies with hapless female victims, I was surprised to see that horror movies pass the Bechdel Test 4 out of 5 times.

Movie Genre and Likelihood to Pass Bechdel

Building on the earlier chart of director gender and the likelihood of making Bechdel Test-passing movies, I then looked at all the directors and all the movies they made from 2010 to now. Among the more prolific directors, a few stood out, specifically Ben Falcone and Anthony Russo, whose movies always passed the Bechdel Test. Surprisingly, only half of renowned director Steven Spielberg’s movies passed the Bechdel Test.

Directors & % of Movies that Passed Bechdel

I also applied this same analysis to the top 4 actors of each movie, to see how often they starred in Bechdel-passing movies. There were again surprises. On the graph below, the horizontal axis shows how many films each actor starred in. The vertical axis indicates what percent of their movies passed the Bechdel Test. Find any surprises? I certainly did!

On the one hand, I was not surprised by where many of these actors ended up; on the other hand, I was very surprised by others. Does your favorite actor consistently make the Bechdel grade?

Curious for more?

Hollywood and the Bechdel Test are a gift that keeps on giving: more movies, new actors… If you are interested in getting the data and running your own analytics, drop us a note at info@xcalar.com and we’ll share our data sets with you.