Yehonatan Cohen, Daniel Gordon, Danny Hendler
Knowledge-Based Systems Volume 142, 15 February 2018, Pages 241-255
We present ErDOS, an algorithm for the Early Detection Of Spamming accounts. The detection approach implemented by ErDOS combines content-based labelling with features based on inter-account communication patterns. We define new account features based on the ratio between the numbers of sent and received emails, the distribution of emails received from different accounts, and the topological features of the network induced by inter-account communication. We also present ErDOS-LVS, a variant of ErDOS that targets the detection of low-volume spammers. The empirical evaluation of both detectors is based on a real-life data set collected by an email service provider, which is much larger than the data sets used in previous outgoing-spam detection research. The evaluation establishes that both detectors provide effective early detection of the spammer population: they identify these accounts as spammers before a content-based detector does. Moreover, both detectors require only a single day of training data to provide a high-quality list of suspect accounts.
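To make the three feature families concrete, the sketch below computes illustrative per-account features from a list of (sender, recipient) email records. The exact feature definitions used by ErDOS are not given in this abstract, so every formula here is a hypothetical stand-in: the sent/received ratio uses +1 smoothing on the denominator, the "distribution of emails received from different accounts" is summarized by its Shannon entropy, and the topological features are reduced to in-degree and out-degree in the directed communication graph. The function name `account_features` and the feature keys are likewise assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def account_features(emails):
    """Compute illustrative per-account features from (sender, recipient) pairs.

    Hypothetical definitions: the abstract does not give ErDOS's exact formulas.
    """
    sent = Counter()
    received = Counter()
    # For each recipient, count how many emails arrived from each sender.
    received_from = defaultdict(Counter)
    # Directed communication graph stored as neighbor sets.
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)

    for sender, recipient in emails:
        sent[sender] += 1
        received[recipient] += 1
        received_from[recipient][sender] += 1
        out_neighbors[sender].add(recipient)
        in_neighbors[recipient].add(sender)

    accounts = set(sent) | set(received)
    features = {}
    for acct in accounts:
        s, r = sent[acct], received[acct]
        # Assumed feature: sent-to-received ratio, +1 smoothing avoids division by zero.
        ratio = s / (r + 1)
        # Assumed feature: Shannon entropy (bits) over the senders of this account's mail.
        counts = list(received_from[acct].values())
        total = sum(counts)
        entropy = 0.0
        if total:
            entropy = -sum((c / total) * math.log2(c / total) for c in counts)
        features[acct] = {
            "sent": s,
            "received": r,
            "send_receive_ratio": ratio,
            "received_entropy": entropy,
            # Assumed topological features: distinct recipients / distinct senders.
            "out_degree": len(out_neighbors[acct]),
            "in_degree": len(in_neighbors[acct]),
        }
    return features


if __name__ == "__main__":
    log = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("c", "a")]
    for name, feats in sorted(account_features(log).items()):
        print(name, feats)
```

In this toy log, account `a` sends to three accounts but hears back from only two, while `d` only receives. A spam-like account would typically show a high sent/received ratio and a large out-degree relative to its in-degree; a real detector would feed such features into a classifier trained on content-based labels, as the abstract describes.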