SherLock vs Moriarty: A Smartphone Dataset for Cybersecurity

Y Mirsky, A Shabtai, L Rokach, B Shapira, Y Elovici

Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security

In this paper we describe, and share with the research community,
a significant smartphone dataset obtained from an
ongoing long-term data collection experiment. The dataset
currently contains 10 billion data records collected from 30 users
over a period of 1.6 years and from an additional 20 users over
6 months (a total of 50 active users currently participating in
the experiment).
The experiment involves two smartphone agents: SherLock
and Moriarty. SherLock collects a wide variety of software and
sensor data at a high sample rate. Moriarty perpetrates various
attacks on the user and logs its activities, thus providing
labels for the SherLock dataset.
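
The labeling idea can be illustrated with a minimal sketch: each sensor record is tagged with whichever logged attack session was active at its timestamp, or marked benign otherwise. Everything named here is an assumption made for illustration, not the dataset's actual schema: the `AttackSession` fields, the `label_records` helper, the "benign" label, the example attack names, and the premise that logged sessions do not overlap in time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackSession:
    """One logged attack interval (hypothetical fields, not the real schema)."""
    start: float  # session start, seconds since epoch
    end: float    # session end, seconds since epoch
    kind: str     # free-text attack name


def label_records(record_times, sessions):
    """Return one label per record timestamp.

    A record gets the `kind` of the session whose [start, end] interval
    contains its timestamp, and "benign" otherwise. Assumes sessions do
    not overlap.
    """
    ordered = sorted(sessions, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    labels = []
    for t in record_times:
        # Index of the last session that started at or before t.
        i = bisect_right(starts, t) - 1
        if i >= 0 and t <= ordered[i].end:
            labels.append(ordered[i].kind)
        else:
            labels.append("benign")
    return labels


# Illustrative values only; these are not taken from the dataset.
sessions = [
    AttackSession(start=100.0, end=160.0, kind="contact_theft"),
    AttackSession(start=300.0, end=330.0, kind="photo_theft"),
]
record_times = [50.0, 120.0, 200.0, 310.0]
labels = label_records(record_times, sessions)
print(list(zip(record_times, labels)))
```

Sorting the sessions once and using a binary search keeps the lookup cheap per record, which matters when the number of sensor records is far larger than the number of attack sessions.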
The primary purpose of the dataset is to help security professionals
and academic researchers develop innovative
methods of implicitly detecting malicious behavior in smartphones,
specifically from data obtainable without superuser
(root) privileges. To demonstrate possible uses of the dataset,
we perform a basic malware analysis and evaluate a method
of continuous user authentication.