How Web Scraping Helps in the News Media
Friday, May 31, 2019
In the digital era, people who work in news media face increasing competitive pressure. Good content brings attention. Attention brings ads. Ads mean cash. Revenue from digital advertising has climbed sharply in recent years. Such profit-oriented practice distorts what it means to be a good news outlet. Think about YouTube influencers. There is nothing wrong with being an influencer; in fact, influencers foster freedom of expression. However, if an influencer passes the wrong message to their audience, there are social consequences and backlash.
Today, let’s use web scraping to extract content from news media, and then analyze the speech and language using Python. In the end, we will see how politically slanted coverage can mislead the audience.
Let’s take 5G as an example. In the beginning, it was just a technology; then the news media stepped in. A potentially revolutionary change triggered an international conflict between two superpowers. From the most advanced technology to “threats” and “theft”: how does the news media portray HUAWEI, and how does it lead Americans to align uniformly with that framing?
What is 5G and how fast is it?
5G is the next generation of cellular network connectivity. We all hear that 5G is a lot faster than 4G LTE, but how fast is it really? Let’s be more concrete. 4G LTE uses relatively low frequency bands; in comparison, 5G can use much higher frequencies, in the 30-300 GHz range. As a result, 5G can support 1,000 more devices per meter than 4G can. Speed-wise, 5G can be 20 times faster than 4G: in the time it takes to download one movie over 4G, you could download 20 movies over 5G.
What does the media say about HUAWEI?
I scraped HUAWEI-related news content from Reuters and CNN and analyzed the tone and word choice to see how biased a news outlet can be.
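As a rough illustration of the extraction step (not the exact workflow used for this post), here is a minimal Python sketch that pulls paragraph text out of an article page using only the standard library. The class name `ParagraphExtractor`, the function `extract_article_text`, and the sample HTML are assumptions made for this example; a real Reuters or CNN page would have to be downloaded first and would likely need site-specific handling.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []      # finished paragraph strings
        self._buffer = []         # text pieces of the paragraph being read
        self._in_p = False        # True while inside a <p> element
        self._in_skipped = False  # True while inside <script> or <style>

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_skipped = True
        elif tag == "p":
            if self._in_p:
                # A new <p> started before the previous one was closed.
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._in_skipped = False
        elif tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._in_skipped:
            self._buffer.append(data)


def extract_article_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


# Made-up sample page, standing in for a downloaded article.
sample_html = """
<html><body>
<h1>Sample headline</h1>
<p>Shares <b>fell</b> amid security concerns.</p>
<script>var tracking = 1;</script>
<p>Analysts see   more risk ahead.</p>
</body></html>
"""

print(extract_article_text(sample_html))
```

Keeping only paragraph text drops headlines, menus and scripts, which is usually enough for a word-level analysis; a visual tool such as Octoparse does the same job without code.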
These word diagrams are clustered by the number of occurrences. The most frequently used negative words in Reuters are “fell”, “concerns” and “risk”. In comparison, “concerns”, “risk” and “death” are the most frequently used negative words on CNN. This is understandable, since HUAWEI is depicted and regarded as a “security threat” to America. When you look closer at the word diagrams and pay attention to the differences, it is not hard to see that CNN is a bit more biased. Other frequent words in CNN’s coverage include “bad”, “fears”, “criminal” and “fraud”. In contrast, Reuters is more neutral, using words such as “dispute”, “fall”, “losses”, “lost” and “difficult”.
CNN’s audience pool is much larger than that of Reuters. According to the Columbia Journalism Review, which reviewed over 1.23 million published and shared articles, the most frequently shared news sources among both Twitter and Facebook users are The New York Times, CNN and Breitbart. News media that hold a neutral point of view, like Reuters, are shared by only a small fraction of the audience pool in comparison with CNN. As a result, the words CNN uses can make a huge difference in how such an international issue is perceived by a large population. If CNN associates HUAWEI with more critical, biased words like “bad” and “criminal”, the American public is likely to shift toward similar attitudes.
The chart above shows the most frequently shared media sources for Twitter users who retweeted either Trump or Clinton. Among users sharing CNN, more retweeted Clinton than Donald Trump. Fox News shows an even more distinct lean toward Donald Trump. However, news media with a neutral point of view, which hold smaller shares of the audience pool, do not appear in the chart.
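The counting behind such a word diagram can be sketched in a few lines of Python. This is a minimal example, not the analysis actually used for this post: the `NEGATIVE_WORDS` set is a hypothetical lexicon seeded with the words quoted above, the function name `negative_word_counts` is my own, and the two sample texts are made-up placeholders rather than real Reuters or CNN copy.

```python
import re
from collections import Counter

# Hypothetical lexicon, seeded with the negative words quoted in this post.
NEGATIVE_WORDS = {
    "fell", "concerns", "risk", "death", "bad", "fears", "criminal",
    "fraud", "dispute", "fall", "losses", "lost", "difficult",
}


def negative_word_counts(text, lexicon=NEGATIVE_WORDS):
    """Count how often each lexicon word occurs in the text."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in lexicon)


# Made-up placeholder texts, standing in for the scraped articles.
sample_articles = {
    "source_a": "Shares fell on trade concerns. The dispute adds risk and more risk.",
    "source_b": "Fears of fraud grow. Critics call the deal bad, even criminal. Bad news.",
}

for source, text in sample_articles.items():
    # most_common(3) lists the three most frequent negative words per source.
    print(source, negative_word_counts(text).most_common(3))
```

A fixed word list is the simplest possible approach. It ignores context and negation, so the counts are a starting point for comparison rather than a measure of bias on their own.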
I also looked at Trish Regan, who hosts a prime-time show on Fox Business Network. As a well-known media figure, she has been accused of bias. Prompted by her denial, I scraped her tweets about HUAWEI using Octoparse:
I got 800 tweets; the words used are shown below:
Frequently used words such as “wrong”, “brutal”, “hypocrisy” and “bad” are emotional and biased. Some of the noteworthy comments and tweets are like these:
There is nothing wrong with showing emotion and feelings, but it is wrong when a media host becomes an agitator by sharing biased ideas.
Social media platforms open up opportunities for free speech. However, filter bubbles incubate language polarization and hate speech like the examples below:
Twitter has strict rules regarding posted messages. When I scanned through the comments, I could see that many had been deleted for explicit vulgarity. Considering that it is impossible to wipe out every insinuating or defamatory message, we should walk away rather than join such a meaningless fight.
What to do to stop hate speech in social media?
- Understand the rationale behind the acts: knowing which language and speech stir up the fire helps us prevent them from spreading.
- Counter-speech research: the aim is to identify the perpetrators of harmful speech, not to fight against them.
Author: Ashley Ng
Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog here to discover practical tips and applications for web data extraction.