A Search Engine for Journalists to Find Newsworthy Tweets

Project Details:

Laboratory: LSIR Semester Proposal

Description:

Today’s mainstream media empowers itself with social media! Many news articles now feature embedded tweets. However, these tweets are found purely by manual methods. In this cool project, we will build a search engine for journalists so that they spend less time finding cool tweets relevant to the news they are going to write.
 
This project has been covered before (see: https://www.epfl.ch/labs/lsir/teaching/completed-projects/whats-news-in-your-tweets/). This time we will try a new approach and possibly integrate the old one. TL;DR: you will find news articles (from public datasets) that have embedded tweets, use those tweets as positive examples and the remaining tweets that are relevant to the same news as negative examples, and build a classifier that learns the attributes of such tweets so that we can automate the process.
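The labeling idea can be sketched in a few lines. This is only an illustration in plain Python: the field names `tweet_id` and `text`, the helper `label_tweets`, and the toy tweets are all hypothetical, and the real dataset format may differ.

```python
def label_tweets(candidate_tweets, embedded_ids):
    """Return copies of the tweets with label 1 if embedded in the article, else 0."""
    embedded = set(embedded_ids)
    return [
        {**tweet, "label": 1 if tweet["tweet_id"] in embedded else 0}
        for tweet in candidate_tweets
    ]


# Hypothetical toy data: tweets relevant to one news article.
candidates = [
    {"tweet_id": "t1", "text": "Official statement on the flood response"},
    {"tweet_id": "t2", "text": "Roads are closed downtown because of the flood"},
    {"tweet_id": "t3", "text": "Thoughts with everyone affected by the flood"},
]
# Hypothetical: IDs of the tweets the article actually embedded.
embedded_in_article = ["t1"]

labeled = label_tweets(candidates, embedded_in_article)
```

The resulting list of dictionaries can be loaded into a pandas DataFrame if that is more convenient for training.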
 
Your work does not include collecting tweets (already covered by us) or building an actual search engine with full-stack development (future work).
 
Activities include:
  • Building a dataset of news articles that have embedded tweets.
  • Developing a method to find keywords that can be used to search for tweets related to a specific news article.
  • Building a binary classifier to find candidate tweets for embedding, and a regression model that a search engine can use to rank them.
  • Setting up an annotation task to evaluate the classifiers and the search engine.
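
As one possible starting point for the keyword step, the sketch below scores the terms of an article by TF-IDF against a small corpus and keeps the top ones. TF-IDF is an assumption here, not necessarily the method the project will adopt; the stopword list, the helper names `tokenize` and `top_keywords`, and the toy headlines are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be larger).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "for", "after", "as"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_keywords(article, corpus, k=3):
    """Return the k terms of `article` with the highest TF-IDF score over `corpus`."""
    docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)
    term_counts = Counter(tokenize(article))
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in docs if term in doc)
        # Smoothed inverse document frequency, so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = count * idf
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus of news headlines.
corpus = [
    "Flood waters rise in the city after heavy rain",
    "City council votes on the new budget",
    "Heavy rain expected in the city this weekend",
]
keywords = top_keywords(corpus[0], corpus)
```

Terms that appear in every headline, such as "city", score lower than terms specific to the article, which is the property a tweet search query needs.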
 
Requirements: Python and pandas.
   
   
Contact: Tuğrulcan Elmas