Machine learning in the fight against propaganda

A Data Pro and Data Science Society hack the news!

The global Hack the News Datathon, which kicked off on 21 January, brought together more than 250 AI and data science academics, professionals and aficionados from over 50 countries to help develop a tool that can automatically identify propaganda in the news. As a co-organiser of the project, A Data Pro contributed technical expertise and years of experience in the field, with four of our data scientists working on machine learning and on automating the identification process.

The event was co-organized by the Data Science Society, the Qatar Computing Research Institute (QCRI) and Hamad bin Khalifa University. It was hosted onsite in Sofia, Doha, Bangalore and Riyadh, as well as online via a dedicated platform. The datathon focused on detecting the use of propaganda and of specific propagandistic techniques, thus promoting actionable, reliable AI.

The Datathon sparked the Propaganda Analysis project, which uses machine learning to train models that automatically recognize bias in reporting. To do so, our own data scientists created a reference set of carefully annotated articles known as a Gold Standard Corpus (GSC). The GSC is a dataset of news articles in which A Data Pro’s experts have identified elements of propaganda or bias, according to specific criteria and guidelines from our partners at QCRI.

Once identified, these elements of biased reporting or speech are classified and fed to an algorithm, which learns from them to make predictions and decisions and so to automatically recognize similar fallacies in the future. This machine learning process is at the core of the project. The end goal is to use natural language processing (NLP) to automatically detect prejudiced articles and other written text, including forum and social media posts, and thereby help counter the phenomenon.
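To make the idea of "annotated examples in, trained classifier out" more concrete, here is a minimal Python sketch. It assumes simple sentence-level labels and uses a multinomial Naive Bayes classifier written with the standard library only. The example sentences, the labels and the `train` and `predict` functions are all hypothetical illustrations; this is not the model or data used in the Propaganda Analysis project.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data (NOT taken from the actual Gold Standard Corpus):
# each sentence carries a label assigned by a human annotator.
TRAIN = [
    ("they will destroy everything we love", "propaganda"),
    ("only a traitor would disagree with us", "propaganda"),
    ("the council approved the budget on tuesday", "neutral"),
    ("the report was published on monday", "neutral"),
]

def tokenize(text):
    # Naive whitespace tokenizer, lower-cased.
    return text.lower().split()

def train(examples):
    """Fit a multinomial Naive Bayes model: class counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior plus summed log likelihoods, with add-one (Laplace) smoothing
        # so that unseen words do not zero out the probability.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict(model, "only traitors disagree"))  # prints "propaganda"
```

Naive Bayes is used here only because it fits in a few lines; a real system would train on a full annotated corpus, typically label individual text fragments and techniques rather than whole sentences, and would likely use stronger models such as the neural networks mentioned below.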

“I expect that this datathon will start a very promising research direction in the fight against propaganda”, says Dr. Preslav Nakov, datathon co-organizer from QCRI. “Similarly to fighting spam, fighting disinformation is an adversarial problem, where malicious actors constantly change and improve their strategies.”

The project focused on several topics, such as the 2017 Qatar blockade, the so-called “Red Scare” in 1950s America, the 2017 Las Vegas shooting, articles pertaining to religion, Russian involvement in the US elections and the Cambridge Analytica scandal, among others.

The winners of the Hack the News Datathon Finals were announced on 29 January after three days of intense competition, with a prize fund of USD 10,000 divided among them. Third place went to team Lama from Turkey, and second place to Astea-Wombats from Bulgaria. The prestigious first place was awarded to team PIG (Propaganda Identification Group) from Germany, which used convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to identify propaganda.

The results and accomplishments of the project run by A Data Pro, QCRI and the Data Science Society will be presented at a dedicated workshop in Hong Kong in November 2019, as part of the joint Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).