A Bar, B Shapira, L Rokach, M Unger
2016 IEEE International Conference on Big Data (Big Data), pp. 1130-1135
Attack propagation models within honeypot systems aim to provide insights into attack strategies that target multiple honeypots, rather than analyzing attacks on each honeypot separately. Traditional attack propagation models focus on building a single probabilistic model. This approach may be misleading, since it does not take into account contextual information such as the country from which an attack is initiated. In addition, with the massive increase in the volume of attacks on honeypots, a scalable modeling approach is required. In this work we present a novel attack propagation model that utilizes contextual information about the attacks by training multiple Markov chain models. We also add two further layers of analysis: first, a likelihood estimation procedure that can identify new and evolving attack patterns; and second, a method for generating simulated attack sequences that can be used for training or sensitivity analysis. Lastly, we present in detail a MapReduce design for all of the suggested algorithms in order to address scalability. We evaluate our methods on a massive dataset that includes approximately 170 million attacks on an operational honeypot system. Results indicate that contextual modeling is important for explaining attack propagation patterns that may vary by country. In addition, we show the effectiveness of the suggested method for generating simulated sequences by comparing the attack propagation patterns learned from the generated dataset with those learned from the original one. Finally, we demonstrate the scalability of all of the proposed algorithms on real and synthetic datasets that include over a billion records.
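The abstract combines four ideas: per-context Markov chains, likelihood scoring of new sequences, sampling of simulated sequences, and MapReduce-style training. The following is a minimal Python sketch of how these could fit together. It is not the authors' implementation. It assumes that each attack is a record of the form (country, ordered list of honeypot ids). The names `map_transitions`, `reduce_counts`, `ContextualMarkovModel`, the `<start>`/`<end>` states, and add-alpha smoothing are all hypothetical choices made for illustration. The map and reduce steps are only simulated in memory. Keying a single count table by context is equivalent to training one chain per country.

```python
import math
import random
from collections import defaultdict

# Hypothetical sentinel states marking where an attack sequence begins and ends.
START = "<start>"
END = "<end>"


def map_transitions(record):
    """Map step (simulated): emit ((context, src, dst), 1) for every transition."""
    context, sequence = record
    states = [START] + list(sequence) + [END]
    for src, dst in zip(states, states[1:]):
        yield (context, src, dst), 1


def reduce_counts(pairs):
    """Reduce step (simulated in memory): sum the emitted counts per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


class ContextualMarkovModel:
    """One first-order Markov chain per context, stored as a shared count table."""

    def __init__(self, records, alpha=1.0):
        self.alpha = alpha  # assumed add-alpha smoothing for unseen transitions
        self.counts = reduce_counts(p for r in records for p in map_transitions(r))
        self.row_totals = defaultdict(int)
        dests = set()
        for (ctx, src, dst), n in self.counts.items():
            self.row_totals[(ctx, src)] += n
            dests.add(dst)
        self.dests = sorted(dests)  # every observed destination, including END

    def prob(self, ctx, src, dst):
        """Smoothed P(dst | src) within the chain for context ctx."""
        num = self.counts.get((ctx, src, dst), 0) + self.alpha
        den = self.row_totals.get((ctx, src), 0) + self.alpha * len(self.dests)
        return num / den

    def log_likelihood(self, ctx, sequence):
        """Log-likelihood of a sequence; low values may flag new attack patterns."""
        states = [START] + list(sequence) + [END]
        return sum(math.log(self.prob(ctx, s, d)) for s, d in zip(states, states[1:]))

    def sample(self, ctx, rng, max_len=20):
        """Generate one simulated attack sequence by walking the context's chain."""
        seq, state = [], START
        for _ in range(max_len):
            weights = [self.prob(ctx, state, d) for d in self.dests]
            state = rng.choices(self.dests, weights=weights)[0]
            if state == END:
                break
            seq.append(state)
        return seq


if __name__ == "__main__":
    records = [
        ("US", ["hp1", "hp2", "hp3"]),
        ("US", ["hp1", "hp2"]),
        ("CN", ["hp3", "hp1"]),
    ]
    model = ContextualMarkovModel(records)
    # A sequence typical of US attacks scores higher under the US chain.
    print(model.log_likelihood("US", ["hp1", "hp2"]) > model.log_likelihood("US", ["hp3", "hp1"]))  # → True
    print(model.sample("US", random.Random(0)))
```

Because the counts are additive, the reduce step can be distributed by key across many workers. A threshold on the (possibly length-normalized) log-likelihood is one simple way to flag evolving patterns. Smoothing keeps unseen transitions scoreable, but it also lets sampling occasionally produce transitions that never occurred in the data. Setting `alpha` lower trades that noise against robustness.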