FamiLinx is a scientific resource of curated genealogical and demographic data on tens of millions of people, mostly from the last 500 years. Unlike traditional studies, this resource is the product of an ultra crowd-sourcing approach: it is based on the collaborative work of genealogy enthusiasts around the world who documented and shared their family stories.
The starting point of FamiLinx was the public information on Geni.com, a genealogy-driven social network that is operated by MyHeritage. Geni.com allows genealogists to enter their family trees into the website and to create profiles of family members with basic demographic information such as sex, birth date, marital status, and location. The genealogists decide whether they want the profiles in their trees to be public or private. New or modified family tree profiles are constantly compared to all existing profiles, and if there is high similarity to existing ones, the website offers the users the option to merge the profiles and connect the trees.
With permission from MyHeritage, we downloaded the public profiles of individuals from Geni.com for future scientific studies. We used graph algorithms to clean the data and organize the pedigrees into formats that can be accessed quickly. We also employed natural language processing to tokenize the birth, residence, death, and burial locations of individuals and converted this information into longitude and latitude coordinates. The FamiLinx data is distributed as several text files. We encourage users to load these files into a database for ease of use. Users can create their own local copy with the download package.
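As a rough illustration of loading the text files into a database, the sketch below imports one delimited text file into SQLite using only the Python standard library. It assumes a tab-delimited file whose first row holds the column names. The actual file names, delimiter, and columns are not specified here, so check them against the download package. The function name `load_delimited_file` and the names `profiles.txt` and `profiles` used afterwards are hypothetical.

```python
import csv
import sqlite3


def load_delimited_file(conn, path, table, delimiter="\t"):
    """Load one delimited text file into a SQLite table and return the row count.

    Assumption (not confirmed by the FamiLinx documentation): the first row
    of the file is a header with one name per column.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        # Columns are created without declared types, so values are stored as text.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        # Rows whose field count differs from the header are skipped.
        rows = (row for row in reader if len(row) == len(header))
        cursor = conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', rows
        )
    conn.commit()
    return cursor.rowcount
```

A call might look like `load_delimited_file(sqlite3.connect("familinx.db"), "profiles.txt", "profiles")`, repeated once per file in the package. SQLite is used here only because it needs no server; any relational database would serve the same purpose.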
For privacy purposes, the resource does not contain any names, and any attempt to re-identify the users is strictly prohibited.
Green nodes denote individuals and red nodes denote marriages