Transferable Cost-Aware Security Policy Implementation for Malware Detection Using Deep Reinforcement Learning

Yoni Birman, Shaked Hindi, Gilad Katz, Asaf Shabtai

arXiv preprint arXiv:1905.10517, 2019

Malware detection is an ever-present challenge for all organizational gatekeepers, who must maintain high detection rates while minimizing interruptions to the organization’s workflow. To improve detection rates, organizations often deploy an ensemble of detectors. While effective, this approach is computationally expensive, since every file, even a clear-cut case, must be analyzed by all detectors. Moreover, with an ever-increasing number of files to process, the use of ensembles may incur unacceptable processing times and costs (e.g., cloud resources). In this study, we propose SPIREL, a reinforcement learning-based method for cost-effective malware detection. Our method enables organizations to directly associate costs with correct and incorrect classifications, computing resources, and run-time, and then dynamically establishes a security policy. Under this policy, each inspected file is assigned its own set of detectors and its own detection threshold. Our evaluation on two malware domains, Portable Executable (PE) and Android Application Package (APK) files, shows that SPIREL is both accurate and extremely resource-efficient: the proposed method either outperforms the best-performing baselines while achieving a modest improvement in efficiency, or reduces the required running time by ~80% while decreasing the accuracy and F1-score by only 0.5%. We also show that our approach is both highly transferable across different datasets and adaptable to changes in individual detector performance.
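
To make the cost trade-off described above concrete, the following is a minimal sketch of how classification outcomes and detector run-time might be folded into a single reward signal. It is an illustration under stated assumptions, not the reward function used by SPIREL: the names `CostPolicy` and `episode_reward`, the linear run-time charge, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CostPolicy:
    """Hypothetical organizational costs; every value here is an assumption."""
    correct_reward: float = 1.0    # reward for a correct final classification
    error_penalty: float = 10.0    # penalty for an incorrect final classification
    cost_per_second: float = 0.1   # charge per second of detector run-time


def episode_reward(policy: CostPolicy,
                   predicted_malicious: bool,
                   is_malicious: bool,
                   detector_seconds: Sequence[float]) -> float:
    """Net reward for one file: classification outcome minus run-time cost."""
    runtime_cost = policy.cost_per_second * sum(detector_seconds)
    if predicted_malicious == is_malicious:
        return policy.correct_reward - runtime_cost
    return -policy.error_penalty - runtime_cost


policy = CostPolicy()

# A clear-cut file classified correctly after one fast (hypothetical) detector.
cheap = episode_reward(policy, True, True, [2.0])

# The same correct verdict reached only after running a full ensemble.
full = episode_reward(policy, True, True, [2.0, 15.0, 40.0])

# A wrong verdict reached cheaply is still penalized heavily.
wrong = episode_reward(policy, False, True, [2.0])
```

With these illustrative values, the correct verdict reached with a single detector earns more than the same verdict reached with the full ensemble, and a cheap but wrong verdict earns the least. An agent trained on a signal of this shape would therefore be pushed to stop querying detectors once the evidence is sufficient.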