Euvgeny Naumov (Delve Labs), presented at the SERENE-RISC Workshop in 2017
The rapid rise in the number and ubiquity of internet services and internet-facing devices has increased pressure to automate cybersecurity monitoring. However, a single automated scan can report thousands of vulnerabilities or more, and security teams must still confirm and address each one manually, a considerable burden, particularly when many reported findings turn out to be false positives. The ever-growing repertoire of machine learning methods can be brought to bear on this problem. Yet the complexity of the data, the often modest number of training examples, and the requirement for customer anonymity make it a challenging task for existing machine learning techniques. This talk describes the false positive problem outlined above and gives an introductory overview of the machine learning principles and methods that can be applied to solve it. In particular, it focuses on Bayesian methods and their applicability to challenging security data.