Pierre-Luc Vaudry – Feeding the Machine: Data Collection and Other Challenges of Machine Learning for Spam Detection

Presented at the SERENE-RISC Workshop, October 2017

Spam detection software can use both handcrafted rules and machine learning techniques. At ZEROSPAM, we aim to reduce the need to create or edit rules manually to keep up with constantly evolving email-borne threats. At the same time, the performance of our machine learning tools could be improved by supplementing their text input with existing rules and other metadata. This talk will address data collection, a key step in any applied machine learning project. We will present our approach to the challenges posed by confidentiality and by implementation in a live production environment. We will also discuss how to define the performance metric, especially given that discarding legitimate mail (a false positive) and letting spam through undetected (a false negative) carry different costs. Real-life examples will be provided.
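
To make the cost asymmetry concrete, here is a minimal Python sketch of a cost-weighted error metric and of choosing a score threshold that minimizes it. It is an illustration under stated assumptions, not ZEROSPAM's actual metric: the function names, the label convention (1 = spam, 0 = legitimate mail) and the default 10:1 ratio of false-positive to false-negative cost are all hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming label 1 = spam and 0 = legitimate mail."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1  # legitimate mail wrongly discarded
        elif p == 0 and t == 1:
            fn += 1  # spam that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def weighted_cost(y_true: Sequence[int], y_pred: Sequence[int],
                  cost_fp: float = 10.0, cost_fn: float = 1.0) -> float:
    """Average misclassification cost per message.

    The default costs are hypothetical: they only encode the assumption that
    losing a legitimate message is worse than letting one spam through.
    """
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   cost_fp: float = 10.0, cost_fn: float = 1.0) -> tuple[float, float]:
    """Pick the spam-score threshold with the lowest weighted cost.

    A message is flagged as spam when its score >= threshold; float("inf")
    covers the option of flagging nothing. Ties keep the lowest threshold.
    """
    best: tuple[float, float] | None = None
    for thr in sorted(set(scores)) + [float("inf")]:
        y_pred = [1 if s >= thr else 0 for s in scores]
        cost = weighted_cost(y_true, y_pred, cost_fp, cost_fn)
        if best is None or cost < best[1]:
            best = (thr, cost)
    assert best is not None
    return best
```

With equal costs, the example scores `[0.9, 0.6, 0.7, 0.1]` for labels `[1, 1, 0, 0]` give a best threshold of 0.6; raising the false-positive cost to 10 pushes it up to 0.9, trading some missed spam for fewer discarded legitimate messages. This is the kind of trade-off a cost-aware metric makes explicit.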

About the speaker
Pierre-Luc Vaudry holds degrees in both computer science and linguistics from Université de Montréal. His recently completed PhD thesis is in the field of natural language processing. He was hired as a researcher by ZEROSPAM in March 2017. His role is to investigate how to make the best use of the latest developments in machine learning to improve the company's spam detection technology.

Running time: 12 minutes.