Synthetic Data Generation and Evaluation

Presented by Duc-Phong Le as part of the 2020 SERENE-RISC workshop, The State of Canadian Cybersecurity Conference: Human-Centric Cybersecurity.

About the presentation

Data privacy has recently become a hot topic in the news because of security failures and concerns about how companies use the personal data they collect about their customers or users. Facebook, for instance, faced scrutiny over its handling of consumer data in both the U.S. and the U.K.
In response to these issues, generating synthetic data is becoming a fundamental task in the daily operations of many organizations. Synthetic data is generated from an original dataset but is kept separate from it. The generated data should be realistic in certain respects, such as format, the distribution of attributes, and the relationships among attributes, and it should yield similar results when the same data analytics are performed on both datasets. In this presentation, we first review recent research on generating synthetic data and then present empirical methods for evaluating how similar the generated data is to the original.

About the speaker

Dr. Le is currently a Research Fellow and Research Team Lead at the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. Before that, he was a Scientist II at the Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, from July 2017 to March 2019; a Senior Security Analyst at Underwriters Laboratories from May 2016 to June 2017; a Research Scientist at the National University of Singapore (NUS), Singapore, from October 2010 to May 2016; and a Postdoctoral Fellow in the Algorithms research group at the University of Caen Basse-Normandie from 2009 to 2010. He received a PhD in Computer Science from the Université de Pau et des Pays de l'Adour in August 2009. His research interests include applied cryptography, elliptic curve cryptography, secure and efficient implementations, machine learning applied to cybersecurity problems, and blockchain.