Up-high to down-low: Applying machine learning to an exploit database

Yisroel Mirsky, Noam Gross, Asaf Shabtai

Innovative Security Solutions for Information Technology and Communications …, 2015

Today machine learning is primarily applied to low level features such as machine code and measurable behaviors. However, a great asset for exploit type classifications is public exploit databases. Unfortunately, these databases contain only meta-data (high level or abstract data) of these exploits. Considering that classification depends on the raw measurements found in the field, these databases have been overlooked. In this study, we offer two usages for these high level datasets and evaluate their performance. The first usage is classification by using meta-data as a bridge (supervised), and the second usage is the study of exploits’ relations using clustering and Self Organizing Maps (unsupervised). Both offer insights into exploit detection and can be used as a means to better define exploit classes.