Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey

Asaf Shabtai, Robert Moskovitch, Yuval Elovici, Chanan Glezer

information security technical report 14 (1), 16-29, 2009

This research synthesizes a taxonomy for classifying detection methods of new malicious code by Machine Learning (ML) methods based on static features extracted from executables. The taxonomy is then operationalized to classify research on this topic and pinpoint critical open research issues in light of emerging threats. The article addresses various facets of the detection challenge, including: file representation and feature selection methods, classification algorithms, weighting ensembles, as well as the imbalance problem, active learning, and chronological evaluation. From the survey we conclude that a framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining low false positives (i.e. misclassifying benign files as malicious). The framework should include training of multiple classifiers on various types of features (mainly OpCode and byte …