With the popularity of Wonder Woman, Jessica Jones, Black Widow, and Captain Marvel, I wonder whether there is any sign of progress toward gender equality in the film industry. I am going to analyze the gender ratio of actors across 2,000 films from the past two decades. This should show how female roles in the film industry have changed and how that change relates to women's social status. Then we will look closely at the gender of the actors in Marvel movies, since they are representative of heroic movies.
The idea is to run a gender analysis on the actors' names and count how many female and male actors appear in each movie, year by year, using Python.
First, scrape the movie information using Octoparse.
I have used it many times because it is free and does not limit the number of pages you can scrape. The URLs of the yearly box office charts on Box Office Mojo follow a constant pattern, with a fixed hostname and a year tag at the end. For example, the URL of the box office chart is https://www.boxofficemojo.com/yearly/chart/?yr=2019&p=.htm for 2019 and https://www.boxofficemojo.com/yearly/chart/?yr=2018&p=.htm for 2018. Following this pattern, we can build a list of URLs for every year from 2000 to 2019.
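The URL list can be generated with a few lines of Python. This is a minimal sketch that only assumes the pattern shown in the two example URLs holds for every year; the names `BASE_URL` and `build_yearly_urls` are made up for illustration.

```python
# URL pattern taken from the two example URLs above; only the year changes.
BASE_URL = "https://www.boxofficemojo.com/yearly/chart/?yr={year}&p=.htm"


def build_yearly_urls(start_year=2000, end_year=2019):
    """Return one yearly box office URL per year, both ends inclusive."""
    return [BASE_URL.format(year=year) for year in range(start_year, end_year + 1)]


urls = build_yearly_urls()

# Print the list so it can be pasted into the scraper.
for url in urls:
    print(url)
```

The printed list has 20 entries, one per year, and can be pasted into the scraper as is.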

Load this list into Octoparse. It will automatically create a loop extraction list. Octoparse then guides you through creating a second extraction list for the movies in each year; click to extract the data fields, including Title, Actors, Distributors, Domestic_Total_Gross, and Foreign_Gross. About 20 minutes later, we have the details of all 2,000 films across 20 years.

Second, clean the data using Python so that the text is tokenized.
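As a rough sketch of what this cleaning step can look like: the code below assumes that the scraped Actors field is a single string with names separated by commas or semicolons. That format is an assumption, not something confirmed by the export, and `split_actors` and `first_name` are hypothetical helper names.

```python
import re


def split_actors(raw):
    """Split a raw actors string into a list of clean full names.

    Assumes names are separated by commas or semicolons (an assumption
    about the scraped format; adjust the pattern to match the real data).
    """
    if not raw:
        return []
    parts = re.split(r"[,;]", raw)
    # Collapse repeated whitespace and drop empty fragments.
    return [" ".join(part.split()) for part in parts if part.strip()]


def first_name(full_name):
    """Return the lowercased first token of a name, or '' if there is none."""
    tokens = full_name.split()
    return tokens[0].lower() if tokens else ""


actors = split_actors("Gal Gadot,  Chris Pine ,")
first_names = [first_name(name) for name in actors]
```

Lowercasing the first name makes the later dictionary lookup insensitive to how the name was capitalized on the page.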

Third, get the number of female and male actors in each year's movies. To do this, I loaded a gender dictionary that takes a first name and returns the gender.
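The counting step can be sketched as follows. The tiny `GENDER_BY_FIRST_NAME` dictionary here is a made-up stand-in for the much larger gender dictionary used in the analysis, and the function names and sample rows are illustrative only. Names that are not in the dictionary are counted as "unknown" rather than guessed.

```python
from collections import Counter

# Hypothetical miniature lookup table; a real gender dictionary is far larger.
GENDER_BY_FIRST_NAME = {
    "gal": "female",
    "brie": "female",
    "chris": "male",
    "samuel": "male",
}


def guess_gender(full_name, lookup=GENDER_BY_FIRST_NAME):
    """Look up the first name in the dictionary; return 'unknown' if absent."""
    tokens = full_name.split()
    if not tokens:
        return "unknown"
    return lookup.get(tokens[0].lower(), "unknown")


def count_genders_by_year(rows, lookup=GENDER_BY_FIRST_NAME):
    """Tally genders per year from (year, list_of_actor_names) pairs."""
    counts = {}
    for year, actor_names in rows:
        year_counts = counts.setdefault(year, Counter())
        for name in actor_names:
            year_counts[guess_gender(name, lookup)] += 1
    return counts


# Illustrative rows, not the scraped data.
sample_rows = [
    (2017, ["Gal Gadot", "Chris Pine"]),
    (2019, ["Brie Larson", "Samuel L. Jackson", "Zendaya"]),
]
counts = count_genders_by_year(sample_rows)
```

Keeping an explicit "unknown" bucket shows how many names the dictionary could not classify, which matters when comparing the female and male totals.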

After getting the list, I visualized the data as below.

The two lines move in the same direction. Both rise before 2010, peak in 2011, and have fallen since then. The number of actors is shrinking in general, which might indicate a downturn in the film industry. The gap between the two lines tends to close over time, yet it widened between 2011 and 2015. That said, gender disparity remains entrenched in the film industry. The number of male actors is more than double that of female actors, even though the trend leans toward a more equal number of female and male actors.
What about Marvel?

In contrast, both lines have moved upward since 2012, with a steep increase between 2012 and 2013. Heroic movies grew popular during the economic recovery. Moreover, the number of female actors jumps compared with the numbers before 2012. This may reflect the film industry's attempt to introduce more female actors into the hero series. The economic recovery in 2012 may have played an important role in balancing the numbers in heroic movies. The figure of the hero represents a national identity built on the ideas of "Freedom" and "Democracy." Women are starting to move the plot forward rather than support the leading actor. With Divergent (2014), Rogue One: A Star Wars Story (2016), The Hunger Games (2012), Lucy (2014), Mad Max: Fury Road (2015), and Wonder Woman, we are seeing different types of superhero women on the screen. The popularity of superheroines speaks to a shift in women's social status toward a central, redemptive role.
Since their introduction in the 1930s, superhero movies have become an icon of crime-fighting, social righteousness, self-sacrifice, and, most importantly, male empowerment. The figure of the superhero is so successful that people have absorbed the idea that a man is born to be a lifesaver. I can't say how grateful I am that there are fewer characters like Mary Jane, a delicate, beautiful, but weak lady who was meant to be caught by the villain and saved by Spider-Man. I expect to see more women of color who are as strong and independent as Furiosa in Mad Max: Fury Road and Captain Marvel, women who are their own heroes.