In order for AI to become a more effective tool for detecting hate speech, it must be able to understand content the way people do: holistically. When viewing a meme, for example, we don't consider the words and the photo independently of each other; we understand the combined meaning. This is extremely challenging for machines, because analyzing the text and the image separately is not enough: they must combine these modalities and understand how the meaning changes when they are presented together.
To address this challenge, the research community is focused on building tools that take the different modalities present in a piece of content and fuse them early in the classification process. This approach enables the system to analyze the different modalities together, as people do. In this project, the student will explore different fusion models for combining text and image, using the Hateful Memes dataset, which contains more than 10,000 multimodal examples. A minimal example of early fusion is sketched below.
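As a rough illustration only (not part of the project materials), the sketch below shows one simple form of early fusion in PyTorch: pre-extracted text and image features are projected into a shared space and concatenated into a single vector before classification, so the classifier sees both modalities jointly. All module names and dimensions (e.g. 768-d text features, 2048-d image features) are illustrative assumptions, not the project's prescribed architecture.

import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Toy early-fusion model: text and image features are concatenated
    before classification, so both modalities are analyzed together."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_classes=2):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Fused representation -> hateful / not-hateful logits.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_features, image_features):
        t = self.text_proj(text_features)    # (batch, hidden_dim)
        v = self.image_proj(image_features)  # (batch, hidden_dim)
        fused = torch.cat([t, v], dim=-1)    # early fusion by concatenation
        return self.classifier(fused)

# Example with random placeholder features; in practice these would come
# from pre-trained text and image encoders (e.g. a BERT-style model and a CNN backbone).
model = EarlyFusionClassifier()
text_feats = torch.randn(4, 768)
image_feats = torch.randn(4, 2048)
logits = model(text_feats, image_feats)  # shape: (4, 2)

Concatenation is only the simplest fusion strategy; the project is expected to compare it against alternatives explored in the reference papers below.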
Deliverables: codebase with documentation
REFERENCE PAPERS
- Multimodal Classification for Analysing Social Media, Chi Thang Duong, Rémi Lebret, Karl Aberer
- The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes, Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine [code]
PREREQUISITES
- Familiarity with Python
- Creativity, initiative, and a proactive spirit
- Knowledge of Linux and related tools
PREFERRED, BUT NOT REQUIRED
- Experience in Machine Learning
- Experience in Natural Language Processing
- Experience in Computer Vision
Send me your CV: [email protected]