SpeeDB: An Effective Filter for IBD Detection in Large Data Sets

Please contact Lin Huang <linhuang@cs.stanford.edu> for questions or comments

Why SpeeDB?

Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, comes at a high computational cost, resulting in prohibitively long running times when they are applied to large cohorts.

SpeeDB significantly increases the efficiency of IBD detection in large-scale unphased genotype data sets by rapidly screening out genomic regions that are unlikely to be IBD. The remaining regions can then be passed on to traditional IBD inference methods.

On HapMap data, SpeeDB reduces the total work that downstream accurate IBD detection tools must do to detect 4 cM IBD segments by 99.5% at a 99% sensitivity level; the time overhead of running SpeeDB is negligible compared with that of the downstream applications. Because only 0.5% of the original work remains, this corresponds to a 200x speedup with only a 1% loss in sensitivity. For close relatives, fourth cousins for example, the speedup is as high as 10,000x at 99% sensitivity.
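
The relationship between the pruned fraction of work and the quoted speedup can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of SpeeDB: the function names are hypothetical, and it assumes that the filter's own running time is negligible (as stated above), so that speedup = 1 / (1 - pruned fraction). The 99.99% figure it derives for the 10,000x case is implied by that formula and is not a number reported here.

```python
from fractions import Fraction


def speedup_from_pruning(pruned_fraction: str) -> Fraction:
    """Speedup of downstream work when a given fraction of it is pruned.

    Assumes the filter itself costs nothing (hypothetical simplification).
    The fraction is passed as a string so that it is represented exactly.
    """
    remaining = 1 - Fraction(pruned_fraction)
    if remaining <= 0:
        raise ValueError("pruned fraction must be below 1")
    return 1 / remaining


def pruning_from_speedup(speedup: int) -> Fraction:
    """Fraction of work that must be pruned to reach a given speedup."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1 - Fraction(1, speedup)


# 99.5% of the work pruned leaves 0.5%, i.e. a 200x speedup.
print(speedup_from_pruning("0.995"))        # 200

# A 10,000x speedup implies that 99.99% of the work was pruned.
print(float(pruning_from_speedup(10_000)))  # 0.9999
```

Exact fractions are used instead of floating-point numbers so that the result is exactly 200 rather than a value that is merely close to it.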

Highlighted features

  • Ultrafast

  • High pruning power

  • High sensitivity

Latest release

  • The initial release of SpeeDB is now available in a public GitHub repository. Please see the manual page to get started.

Publication

Related tools

  • PARENTE & PARENTE2: relatedness inference in large datasets of unphased genotypes

  • CARROT: a framework for relationship inference