Clickbait Classifier ‒ LSIR ‐ EPFL

Project Details:

Clickbait Classifier

Laboratory : LSIR

Semester

Proposal

KEYWORDS
Fake news; Machine Learning; Natural Language Processing; Python
CONTEXT
“Fake news” was not a term many people used four years ago, but it is now seen as one of the greatest threats to democracy and free debate. Fake news consists of news items, stories or hoaxes created to deliberately misinform or deceive readers. Usually, these stories are created to influence people’s views, push a political agenda or cause confusion, and they can often be a profitable business for online publishers. Opinions differ on how to identify the types of fake news. However, when it comes to evaluating content online, clickbait is the most common and apparent type.

Content marketing is all about attracting visitors in order to generate traffic. Yet over the past years, there has been a surge of marketers trying to find an easier route to bolster traffic by producing and promoting so-called ‘clickbait’. These are stories that are deliberately fabricated to gain more website visitors and increase advertising revenue for websites. Clickbait stories use sensationalist headlines to grab attention and drive click-throughs to the publisher’s website, normally at the expense of truth or accuracy.
GOAL
The goal of this project is to implement a machine learning approach to detecting clickbait. A text classifier should be built that takes the title of an article as input and returns either a probability or a single yes/no answer on whether the title is considered clickbait (a binary classification problem). The input features of the model should rely on the implicit characteristics of the article title and not on explicit factors such as click analytics and social media reactions to it.
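To make the expected interface concrete, here is a minimal sketch of a classifier that maps a title to a probability and to a binary decision using only features derived from the title text. The feature names, the weights, the bias and the function names are all hypothetical and hand-set for illustration; a real solution would learn its parameters from an annotated dataset and would likely use richer representations such as word embeddings.

```python
import math
import re

# Hypothetical feature set: everything is computed from the title text alone,
# with no click analytics or social media signals.
SECOND_PERSON = {"you", "your", "yours", "you're"}


def title_features(title: str) -> dict:
    """Extract a few simple, implicit features from an article title."""
    words = re.findall(r"[A-Za-z']+", title)
    lowered = [w.lower() for w in words]
    return {
        "starts_with_number": float(bool(re.match(r"\s*\d", title))),
        "has_exclaim_or_question": float("!" in title or "?" in title),
        "second_person": float(any(w in SECOND_PERSON for w in lowered)),
        "all_caps_words": float(
            sum(1 for w in words if len(w) > 1 and w.isupper())
        ),
    }


# Illustrative, hand-set parameters (NOT learned from data).
WEIGHTS = {
    "starts_with_number": 1.2,
    "has_exclaim_or_question": 1.0,
    "second_person": 1.0,
    "all_caps_words": 0.5,
}
BIAS = -1.5


def clickbait_probability(title: str) -> float:
    """Probabilistic output: logistic function of a weighted feature sum."""
    feats = title_features(title)
    score = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def is_clickbait(title: str, threshold: float = 0.5) -> bool:
    """Single-outcome output: threshold the probability."""
    return clickbait_probability(title) >= threshold
```

Calling `clickbait_probability(title)` gives the probabilistic answer, while `is_clickbait(title)` gives the single-outcome answer; the 0.5 threshold is an assumption and could be tuned on validation data.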

This is a machine learning and engineering project in the context of natural language processing and fake news detection. The candidate will build and train a machine learning model, becoming familiar with text classification, neural networks, word embeddings and text engineering. The final deliverable should be the code implementing the data parsing, feature preprocessing, model training/tuning and model evaluation, as well as a detailed report on the design choices (in terms of the model and its parameters), the interpretation of the results and the comparison of the implemented model with a simple baseline.
WORK PLAN
Study the state-of-the-art in text classification and word embeddings and decide the appropriate approach.
Collect/create an annotated dataset of both clickbait and non-clickbait article titles.
Implement any preprocessing and feature engineering needed in order to train your classifier.
Create a simple baseline classifier.
Build the classifier model and fine-tune it.
Evaluate the performance of the trained model on unseen test data.
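
The baseline and evaluation steps of the work plan could be sketched as follows. This is only an illustration under stated assumptions: the dataset is a list of (title, label) pairs with label 1 for clickbait and 0 otherwise, the baseline is a majority-class predictor (one possible choice of "simple baseline"), and all function names are hypothetical. The titles in the usage lines are invented placeholders, not real data.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle and split (title, label) pairs into train and held-out test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_baseline(train):
    """Baseline that always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda title: majority


def evaluate(predict, test):
    """Compute accuracy, precision, recall and F1 for the clickbait class (1)."""
    tp = fp = fn = tn = 0
    for title, label in test:
        pred = predict(title)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(test),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented placeholder titles, for illustration only.
toy_data = [
    ("You Won't Believe What Happened Next", 1),
    ("7 Tricks That Will Change Your Life", 1),
    ("This One Weird Tip Shocked Everyone", 1),
    ("Council publishes annual transport report", 0),
    ("Central bank keeps interest rate unchanged", 0),
    ("University opens new research laboratory", 0),
    ("Is This The Best Gadget Ever?", 1),
    ("Parliament approves annual budget", 0),
]
train, test = train_test_split(toy_data)
baseline = fit_majority_baseline(train)
baseline_metrics = evaluate(baseline, test)
```

The trained classifier would be passed to the same `evaluate` function on the same held-out test set, so that its metrics can be compared directly with those of the baseline.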

Contact:

Angelika Romanou

Panayiotis Smeros