Data mining of online genealogy datasets for revealing lifespan patterns in human population

Michael Fire, Yuval Elovici

ACM Transactions on Intelligent Systems and Technology (TIST) 6 (2), 1-22, 2015

Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can help identify various patterns in the human population. In this study, we present methods and algorithms that can assist in identifying variations in lifespan distributions of the human population in the past centuries, in detecting social and genetic features that correlate with the human lifespan, and in constructing predictive models of human lifespan based on various features that can easily be extracted from genealogy datasets. We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 9 million connections, all of which were collected from the WikiTree website. Our findings indicate that significant but small positive correlations exist between the parents’ lifespan and their children’s …
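
As an illustration of the kind of parent-child lifespan correlation the abstract mentions, the following is a minimal sketch, not the authors' method. It assumes a hypothetical profile layout (a dict of records with `birth`, `death`, and `parent` fields) and uses a plain Pearson coefficient; the names `parent_child_lifespans` and `pearson` and the sample records are invented for the example and are not taken from the paper or from WikiTree.

```python
from math import sqrt


def parent_child_lifespans(profiles):
    """Return (parent_lifespan, child_lifespan) pairs from a profile dict.

    `profiles` is a hypothetical layout: id -> {"birth", "death", "parent"}.
    Profiles with missing dates or an unknown parent are skipped.
    """
    pairs = []
    for child in profiles.values():
        parent = profiles.get(child.get("parent"))
        if parent is None:
            continue
        dates = (child.get("birth"), child.get("death"),
                 parent.get("birth"), parent.get("death"))
        if any(d is None for d in dates):
            continue
        pairs.append((parent["death"] - parent["birth"],
                      child["death"] - child["birth"]))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented sample records, for illustration only.
sample_profiles = {
    "a": {"birth": 1800, "death": 1870, "parent": None},
    "b": {"birth": 1825, "death": 1890, "parent": "a"},
    "c": {"birth": 1810, "death": 1850, "parent": None},
    "d": {"birth": 1835, "death": 1880, "parent": "c"},
    "e": {"birth": 1790, "death": 1875, "parent": None},
    "f": {"birth": 1820, "death": 1892, "parent": "e"},
}

pairs = parent_child_lifespans(sample_profiles)
r = pearson([p for p, _ in pairs], [c for _, c in pairs])
```

On a real dataset the coefficient would be paired with a significance test, since the abstract describes the correlations as significant but small.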