Detecting latent training needs from digital traces

Many professions undergo changes that alter the skill sets they require. With the rapid pace of automation and digitalization in recent years, these changes have accelerated, to varying degrees, in many industries (e.g. software development, manufacturing).

The task of identifying the skills missing in a workforce, so that the gaps can be addressed, is known as Training Needs Analysis. Traditionally, it relies on questionnaires and performance evaluations carried out at different levels of an organisation or an industry, whose results are then aggregated.

However, this process is costly and time-consuming, and by the time the “new skills” are identified, they may already be obsolete. Our aim in this project is to use large, continually updated datasets to understand how new skills emerge in an industry and to predict them in advance, so that the skills the workforce will need can be identified sooner.

We are currently focused on the software industry, using Google Trends, Stack Overflow, job adverts and MOOCs as our data sources. The questions we are trying to answer include how Q&A platforms, job postings and online courses interact, and which signals in each data source could allow us to predict the emerging popularity, or lack thereof, of a new skill.
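
To make the idea of a predictive signal concrete, here is a minimal sketch of one possible analysis, a lead-lag correlation between two monthly time series for a single skill. This is an illustrative assumption and not a description of the project's actual method. The function names (`pearson`, `best_lag`) and the numbers are hypothetical, and the counts are made up rather than drawn from any of the data sources.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(leader, follower, max_lag):
    """Find the lag (0..max_lag) at which leader[t] best correlates with follower[t + lag]."""
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        # Drop the last `lag` points of the leader and the first `lag` of the follower
        # so that the two slices line up as (t, t + lag) pairs.
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Made-up monthly counts for one hypothetical skill (not real data).
questions = [1, 2, 4, 8, 12, 15, 16, 16, 15, 14, 13, 12]  # e.g. Q&A posts
job_ads = [0, 0, 1, 2, 4, 8, 12, 15, 16, 16, 15, 14]      # same shape, two months later

lag, r = best_lag(questions, job_ads, max_lag=4)
print(f"best lag: {lag} months, correlation: {r:.3f}")
```

In this toy example the second series is a copy of the first delayed by two months, so the search recovers a lag of two. Real series would be noisier, and a high correlation at some lag would only suggest, not establish, that one source anticipates another.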