Unknown malcode detection using opcode representation

Robert Moskovitch, Clint Feher, Nir Tzachar, Eugene Berger, Marina Gitelman, Shlomi Dolev, Yuval Elovici

Intelligence and Security Informatics: First European Conference, EuroISI …, 2008

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today’s signature-based anti-viruses are very accurate, but cannot detect new malicious code. Recently, classification algorithms were employed successfully for the detection of unknown malicious code. However, most of the studies use byte sequence n-grams representation of the binary code of the executables. We propose the use of (Operation Code) OpCodes, generated by disassembling the executables. We then use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation of a test collection of more than 30,000 files, in which we evaluated extensively the OpCode n-gram representation and investigated …