Defense methods against adversarial examples for recurrent neural networks

Ishai Rosenberg, Asaf Shabtai, Yuval Elovici, Lior Rokach

arXiv preprint arXiv:1901.09963, 2019

Adversarial examples are known to mislead deep learning models to incorrectly classify them, even in domains where such models achieve state-of-the-art performance. Until recently, research on both attack and defense methods focused on image recognition, primarily using convolutional neural networks (CNNs). In recent years, adversarial example generation methods for recurrent neural networks (RNNs) have been published, demonstrating that RNN classifiers are also vulnerable to such attacks. In this paper, we present a novel defense method, termed sequence squeezing, to make RNN classifiers more robust against such attacks. Our method differs from previous defense methods which were designed only for non-sequence based models. We also implement four additional RNN defense methods inspired by recently published CNN defense methods. We evaluate our methods against state-of-the-art attacks in the cyber security domain where real adversaries (malware developers) exist, but our methods can be applied against other discrete sequence based adversarial attacks, e.g., in the NLP domain. Using our methods we were able to decrease the effectiveness of such attack from 99.9% to 15%.