SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods

Aviad Cohen, Nir Nissim, Lior Rokach, Yuval Elovici

Expert Systems with Applications 63, 324-343, 2016

Office documents are used extensively by individuals and organizations. Most users consider these documents safe for use. Unfortunately, Office documents can contain malicious components and perform harmful operations. Attackers increasingly take advantage of naive users and leverage Office documents in order to launch sophisticated advanced persistent threat (APT) and ransomware attacks. Recently, targeted cyber-attacks against organizations have been initiated with emails containing malicious attachments. Since most email servers do not allow the attachment of executable files to emails, attackers prefer to use of non-executable files (e.g., documents) for malicious purposes. Existing anti-virus engines primarily use signature-based detection methods, and therefore fail to detect new unknown malicious code which has been embedded in an Office document. Machine learning methods have been …