Ehsan Abedi – Characterizing the learning dynamics in the function space in a class of approximate inference methods
About: RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan
RIKEN is Japan’s largest and most comprehensive research institute for basic and applied science, with several centers across the country. I did a six-month internship (March-August 2019) at the RIKEN Center for Advanced Intelligence Project (AIP), in the Approximate Bayesian Inference Team, located in Tokyo. AIP is a young yet active center, founded in 2016.
At first, I learned about different approximate inference methods and studied how the training dynamics of neural networks in function space can be analyzed using kernel methods. There is limited work on, and understanding of, the connection between training algorithms for deep neural networks (DNNs) and Gaussian processes (GPs). So we formed a group to facilitate further research on combining the strengths of DNNs and GPs. The outcome of our work appears in this paper, which has been accepted at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). The news was also posted on the EPFL news website. My main contribution to the paper was on the theory side, but I also ran some numerical experiments in Python.
I also learned how to better organize and document my daily work as well as the weekly group meetings. This is very important for accumulating knowledge efficiently over time. Sometimes things go wrong and you feel that progress has stopped, but every single day counts and you can still move forward. During my internship, I also had the opportunity to attend different talks by visitors and meet people in the field. Last but not least, I enjoyed Japanese art and culture a lot. 🙂
About: CSEM, Switzerland
Nanophotonics is not only an active field of research but also of increasing importance in the photonics industry, with a broad range of applications (including imaging, displays, sensors and security). The design of photonic structures at the nanoscale makes it possible to determine the parameters and tolerances prior to the fabrication of a prototype. CSEM benefits from a large portfolio of numerical methods for nanophotonics in order to cover a broad spectrum of devices. These methods are highly demanding in computational power and time: a thorough analysis of their efficiency and of possible improvements is therefore required in order to better meet the needs of the industry. The image shown here gives insight into the accuracy reached when simulating the frequency response of a plasmonic structure.
Gavin Lee – Data Analytics for Trading
About: Cargill, Switzerland
Cargill is an American privately held agricultural company based in Minnesota, USA. Its main businesses are trading and distributing grain and other commodities such as animal feed, livestock and metals. It has its central trading operations in the Geneva office. In an effort to diversify commodity trades, this internship project considers a trading framework commonly known as ‘spread trading’ or ‘pairs trading’. This framework aims to find assets which tend to ‘move together’ in some way and to exploit this relationship to detect when one is misaligned with respect to the other. We combine several distinct methods found in the financial literature and leverage machine learning techniques to select commodity pairs which may be profitable. Along with a thorough historical data analysis, these techniques enable more robust predictions about expected profits and provide relative confidence in the predictions made.
Figure: Example of a spread trading strategy implemented in Python.
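To make the idea of a ‘misaligned’ pair concrete, here is a minimal sketch of one common way to build a spread signal. It is an illustration under stated assumptions, not the method used during the internship: the hedge ratio is a plain least-squares slope fitted on the whole sample, the rolling z-score rule, the function names `hedge_ratio` and `spread_signals`, the parameters `window` and `entry`, and the price series are all made up for the example.

```python
from statistics import mean, pstdev


def hedge_ratio(x, y):
    """Least-squares slope of y on x, used to scale x in the spread."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def spread_signals(x, y, window=5, entry=1.5):
    """Position on the spread y - beta * x at each time step.

    +1 = buy the spread, -1 = sell the spread, 0 = stay flat.
    window and entry are illustrative values, not tuned parameters.
    """
    # Simplification: beta is fitted on the full sample (in-sample).
    beta = hedge_ratio(x, y)
    spread = [b - beta * a for a, b in zip(x, y)]
    signals = []
    for t, value in enumerate(spread):
        if t < window:
            signals.append(0)  # not enough history yet
            continue
        history = spread[t - window:t]
        sd = pstdev(history)
        # z-score of the current spread against its recent history
        z = (value - mean(history)) / sd if sd > 0 else 0.0
        if z > entry:
            signals.append(-1)  # spread unusually high: sell it
        elif z < -entry:
            signals.append(1)   # spread unusually low: buy it
        else:
            signals.append(0)
    return signals


# Made-up prices: y tracks 2 * x except for one jump at index 7.
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, 3.0, -0.1, 0.1]
y = [2 * a + n for a, n in zip(x, noise)]
print(spread_signals(x, y))
```

A realistic study would estimate the hedge ratio only on past data and test the pair for a stable relationship before trading it; both steps are left out here to keep the sketch short.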
Servan Grüninger – Data Analyst for Automated Case Classification
Suva, the Swiss National Accident Insurance Fund, is the largest accident insurer in Switzerland and coordinates the statistical evaluation of all accident insurers in Switzerland. To provide these services in a reliable and effective way, Suva is dependent on consistent, valid and readily accessible data sources.
As such, Suva invests a lot of resources in establishing and curating high-quality databases containing information about its insurance cases. Since a significant part of the available data sources is machine-readable, machine-learning techniques can be applied to build more efficient classification mechanisms that can assist human experts in the assessment of each case.
During my internship at Suva, I was tasked with the setup of a development environment based on the programming language Python.
Firstly, I provided a proof of concept for the use of the Python programming language as an analysis tool within the statistical analysis and consulting team.
Secondly, I implemented modern tools for natural language processing such as FastText and SpaCy to complement and improve the tools currently used for automated analysis of accident reports at Suva. This resulted in an easily accessible and modifiable toolkit for text embedding and in-depth text analysis using methods from natural language processing and machine learning.
Thirdly, I developed supervised classification models for specific use cases. The main goal was to explore automated case classification solutions that can be further developed and adopted within the statistical analysis and consulting team or the whole company.
The developed software solutions were shown to be competitive with currently employed methods within Suva with the added benefits of easier deployment, higher malleability and quicker development.
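As a rough illustration of what supervised classification of accident descriptions can look like, the sketch below trains a small multinomial naive Bayes classifier on word counts. This is a deliberately simple stand-in written with the Python standard library only; it is not the FastText or SpaCy pipeline described above, and the class name `NaiveBayes`, the labels and the example reports are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayes:
    """Multinomial naive Bayes on bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training reports with invented labels.
reports = [
    ("slipped on ice and broke wrist", "fall"),
    ("fell from ladder at work", "fall"),
    ("tripped on stairs and fell", "fall"),
    ("cut finger with kitchen knife", "cut"),
    ("hand cut by saw blade", "cut"),
    ("knife slipped and cut thumb", "cut"),
]
model = NaiveBayes().fit([t for t, _ in reports], [l for _, l in reports])
print(model.predict("fell on icy stairs"))
```

In practice, learned text embeddings usually replace raw word counts, but the overall workflow of fitting on labelled reports and predicting a class for a new report stays the same.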
David Cleres – Estimated Time of Arrival (ETA) estimation using Machine Learning and Route Optimization algorithms to predict local traffic in Switzerland’s major cities
About: Smood, Switzerland
I completed my engineering internship at Smood.ch, a Switzerland-based food delivery platform. Founded in 2012 in Geneva by Marc Aeschlimann, Smood.ch is today considered the leader in home delivery in French-speaking Switzerland, with more than 500 partner restaurants and 200 000 clients across Switzerland. The young company delivers in the canton, in Lausanne, Montreux, Fribourg, Zurich, Luzern, Winterthur, Zug, and Lugano. It has about 30 regular employees and also employs a large number of delivery men and women.
The project consisted of several tasks. First came an extensive literature review. We were looking for experimental results from academia or industry to validate and optimize the simulation technique that we planned to implement. The themes to be explored were Deep Learning on graphs, Estimated Time of Arrival (ETA) estimation using Machine Learning, and route optimization algorithms. The main challenge of my project was the lack of high-quality historical data, which made data cleaning very time-consuming. Secondly, I had a sub-project which consisted in determining whether the company would gain more visibility from an expensive advertising campaign on Geneva’s public transport or from branding the drivers’ cars (see Figure). This was very enriching, since I learned how to carry out a market analysis and how to present my findings convincingly.
The Smood team was very open-minded, young, fresh, (sometimes funny 😉 ), and always interested in external advice or input, even if the topics were not necessarily related to the subject of my internship! I had not experienced this to the same extent during my other internship. This made me feel exceptionally well integrated there. To summarise everything in one sentence: this internship gave me an excellent opportunity to apply the scientific knowledge I acquired in the CSE program to a real application, and also to use the soft skills that I developed through my engagement in student associations.
Sabir Oumaima – Modeling TV content consumption behavior
About: Nagra Kudelski, Switzerland
I spent my CSE internship in the Insight team at Nagra Kudelski from 02/08/2018 to 31/01/2019. Kudelski Group is a world leader in content protection for major PayTV operators around the world, and Insight is a data science initiative created by Kudelski in order to deliver data analytics applications to their PayTV clients.
During my internship, I had the opportunity to tackle different problems ranging from data analysis to machine learning. In the first part, I analyzed consumption data from an OTT platform (Over-The-Top, which refers to streaming services) during the 2018 World Cup in order to study viewing patterns during this major sports event. The objective of this study was to understand viewing behavior during the World Cup on OTT platforms in order to enable PayTV operators to retain viewers and better place their advertisements. The main conclusion of this study is that viewing behavior depends strongly on the type of device used by the viewers (PC, tablet or phone) and on the importance of the game: the teams playing, the stage of the tournament, the level of competition between the teams, etc. The second and main task I accomplished in this internship was developing a script that automatically quantifies the performance of an existing predictive model, which forecasts the number of subscribers to a PayTV service provided by a client of Nagra in different countries based on historical data. The output of my script is an HTML report that details different performance metrics for each country, as shown in Figure 1. The third task I worked on is related to the same project: using PyTorch, I implemented a neural network model that accomplishes the same task as the existing model but with less mathematical modelling effort. In the last part of my internship, I worked with the data science team on their current project, which consists in anomaly detection based on logs sent by set-top boxes and data provided by the customer care and call center of a major PayTV client. My main task was to analyze the data and generate graphical reports in order to detect any correlations between the different log messages and defects.
Figure 1 – Screenshot of one of the HTML performance reports. Each country name links to another HTML page that contains detailed performance metrics presented in plots (the blurred part is confidential information).
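The per-country evaluation described above can be sketched as follows. This is only an illustration: the text does not say which metrics the report contains, so the mean absolute error (MAE) and mean absolute percentage error (MAPE) are assumptions, and the function name `build_report`, the country names and the subscriber numbers are made up.

```python
from html import escape


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    errors = (abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * sum(errors) / len(actual)


def build_report(data):
    """Render one HTML table row per country.

    data maps a country name to a pair (actual, predicted) of sequences.
    """
    rows = []
    for country in sorted(data):
        actual, predicted = data[country]
        rows.append(
            f"<tr><td>{escape(country)}</td>"
            f"<td>{mae(actual, predicted):.1f}</td>"
            f"<td>{mape(actual, predicted):.2f}%</td></tr>"
        )
    header = "<tr><th>Country</th><th>MAE</th><th>MAPE</th></tr>"
    return "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>"


# Made-up subscriber counts for two hypothetical countries.
example = {
    "Country A": ([100, 200], [110, 190]),
    "Country B": ([50, 80, 100], [55, 80, 90]),
}
print(build_report(example))
```

Escaping the country names with `html.escape` keeps the generated page well-formed even if a label contains characters such as `&` or `<`.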
Antoine Hoffmann – Spreading dynamics and solidification of a liquid film under centrifugation
About: Saint-Gobain, France
Reise Wojciech – Data Science applied to Finance
About: Dominicé & Co, Switzerland
I did my internship at Dominicé & Co from July 2nd to December 21st, 2018. Dominicé & Co is an asset management firm administering five funds, two of them being world-renowned volatility funds. The first part of my internship consisted in acquiring the financial background necessary for the analysis of financial data. As an introduction, I was given a task in which the goal was to assess the predictive power of an in-house risk assessment metric and devise new strategies for trading volatility futures, using an approach that would account for different regimes. This required reading about volatility models and choosing the appropriate statistical tools. In the second part of my internship, I worked with tick data. My task was to devise and implement meaningful data treatment for this high-frequency data. Particular attention needs to be paid to performance, especially if the treatment is then to be used on real-time data. This internship proved particularly interesting and challenging. First, I was given the chance to work in an industry I did not know much about. Second, I was given much freedom and responsibility in the research I had to conduct, while always having the opportunity to discuss my work and ask for feedback.
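The text does not say which treatment was applied to the tick data. As one common example, the sketch below aggregates ticks into fixed-interval open/high/low/close bars in a single pass, which keeps the work per tick constant and therefore also suits a real-time feed. The function name `ticks_to_bars`, the tuple layout of a tick and the sample values are assumptions made for the illustration.

```python
def ticks_to_bars(ticks, interval=60):
    """Aggregate ticks into fixed-interval OHLC bars.

    ticks: iterable of (timestamp_in_seconds, price, volume), sorted by time.
    Returns a dict mapping each bar's start time to its open, high, low,
    close and total volume. Each tick is processed once, in O(1) time.
    """
    bars = {}
    for timestamp, price, volume in ticks:
        start = int(timestamp // interval) * interval  # start of the bar
        bar = bars.get(start)
        if bar is None:
            # First tick of a new bar sets all four prices.
            bars[start] = {
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": volume,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # latest tick seen so far in this bar
            bar["volume"] += volume
    return bars


# Made-up ticks: three in the first minute, one in the second.
sample = [(0, 10.0, 1), (30, 12.0, 2), (59, 11.0, 1), (60, 11.5, 3)]
for start, bar in ticks_to_bars(sample).items():
    print(start, bar)
```

Because the function only keeps one small record per bar, the same loop body can be applied tick by tick to a live stream instead of a stored list.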