Assess credibility of web pages

Project Details

Laboratory: LSIR
Type: Semester / Master
Status: Completed

Description:

In their day-to-day online activities, users rely heavily on the results returned by the most popular search engines (e.g., Google, Bing), giving more credit to web pages that are ranked higher. Yet when users accept the information found on these pages without carefully evaluating its credibility, they can easily be misled by incorrect web content. As a result, it is becoming increasingly important to rank pages by their credibility and to help users evaluate the credibility of web content.

In this context, we are given a set of known web pages whose credibility scores are specified (a credibility score is a value, or an n-tuple of values, assigned to each page to indicate how credible it is). We then want to:

  1. infer the credibility of another set of unknown web pages based on (a) a set of defined web page features and (b) the set of known web pages;
  2. experimentally find the subset of features that have the highest impact on assessing the credibility of web pages.
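The two goals above can be illustrated with a minimal sketch. It assumes that each page has already been turned into a numeric feature vector scaled to [0, 1] and that credibility scores are scalars. For goal 1 it uses k-nearest-neighbour regression, and for goal 2 it uses greedy forward feature selection scored by leave-one-out error. These are illustrative choices, not the method prescribed by the project; the helper names (`predict`, `loo_error`, `forward_select`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A known page: (feature vector, credibility score). Scores are scalars here;
# an n-tuple score could be handled by predicting each component separately.
Page = Tuple[Sequence[float], float]


def distance(a: Sequence[float], b: Sequence[float], features: Sequence[int]) -> float:
    """Euclidean distance restricted to the selected feature indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))


def predict(known: List[Page], x: Sequence[float], features: Sequence[int], k: int = 3) -> float:
    """k-NN regression: average credibility score of the k closest known pages."""
    nearest = sorted(known, key=lambda page: distance(page[0], x, features))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def loo_error(known: List[Page], features: Sequence[int], k: int = 3) -> float:
    """Leave-one-out mean absolute error of the k-NN predictor on the known pages."""
    errors = []
    for i, (x, score) in enumerate(known):
        rest = known[:i] + known[i + 1:]  # hold out page i
        errors.append(abs(predict(rest, x, features, k) - score))
    return sum(errors) / len(errors)


def forward_select(known: List[Page], n_features: int, k: int = 3) -> List[int]:
    """Greedily add the feature that most lowers LOO error; stop when none helps."""
    selected: List[int] = []
    best_err = math.inf
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        err, f = min((loo_error(known, selected + [f], k), f) for f in candidates)
        if err >= best_err:
            break
        selected.append(f)
        best_err = err
    return selected


# Hypothetical toy data: feature 0 tracks the score, feature 1 is noise.
known: List[Page] = [
    ((0.1, 0.9), 0.1),
    ((0.2, 0.1), 0.2),
    ((0.3, 0.5), 0.3),
    ((0.5, 0.8), 0.5),
    ((0.6, 0.2), 0.6),
    ((0.8, 0.4), 0.8),
    ((0.9, 0.7), 0.9),
]

selected = forward_select(known, n_features=2)  # feature 0 is picked first here
print("selected features:", selected)
print("predicted credibility:", predict(known, (0.7, 0.3), selected))
```

Leave-one-out error is used as the selection criterion because the set of known pages is likely small; with more labelled pages, a held-out test set or k-fold cross-validation would be a cheaper choice.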

Prerequisites

  • familiarity with supervised/unsupervised learning methods
  • familiarity with machine learning tools (e.g., WEKA, PyML, PyBrain)
  • C/C++, Java, Python, SQL (proficiency in at least one of them and familiarity with the others)
Contact: Alexandra Olteanu