Building a text embeddings similarity search engine

As part of the AI Data Platform project, this semester project aims to build a text-embedding similarity search engine that enhances image similarity search. The goal is to retrieve images from text queries. Since images are indexed with meta-tags or metadata such as tweets, we want to rank images by the similarity between a given text query and each image's meta-tags and/or metadata. BERT, introduced by Google in 2018, provides embeddings for words as well as sentences. In this project, the student will develop a semantics-oriented search engine that uses BERT embeddings to encode the text query and rank the images' meta-tags/metadata from most to least semantically similar to the query.
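
To give a rough idea of the ranking step described above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a BERT encoder so that the example runs without any model download, and the image identifiers, metadata strings, and function names are hypothetical. Only the overall shape (embed the query, embed each image's metadata, sort by cosine similarity) reflects the project idea.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT encoder: a sparse bag-of-words count vector.

    Assumption: in the project this would be replaced by a sentence
    embedding produced by BERT; only the ranking logic below carries over.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_u * norm_v)


def rank_images(query, image_metadata):
    """Return (image_id, score) pairs sorted from most to least similar."""
    query_vec = embed(query)
    scored = [
        (image_id, cosine_similarity(query_vec, embed(text)))
        for image_id, text in image_metadata.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical metadata: image identifiers mapped to their tags or tweet text.
images = {
    "img_001": "sunset over the beach with palm trees",
    "img_002": "city skyline at night",
    "img_003": "dog playing on the beach",
}

ranking = rank_images("beach sunset", images)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

The toy vectors only match exact words, so a query such as "seaside" would score zero against every image here. Replacing `embed` with dense BERT embeddings is what the project is expected to explore in order to capture semantically related wording.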

Deliverables: codebase with documentation

PREREQUISITES
  • Familiarity with Python
  • Creativity, initiative, and a proactive attitude
  • Knowledge of Linux and related tools
PREFERRED, BUT NOT REQUIRED
  • Experience in Machine Learning
  • Experience in Natural Language Processing

Send me your CV: [email protected].