Fast calculation of LD in large-scale cohorts

Tomahawk is a machine-optimized library for computing linkage disequilibrium (LD) from population-sized datasets. Tomahawk permits close to real-time analysis of regions of interest in datasets of many millions of diploid individuals on a standard laptop. All algorithms are embarrassingly parallel and have been successfully tested on chromosome-sized datasets with up to 10 million individuals.

Tomahawk uniquely constructs complete haplotype/genotype contingency matrices for each comparison, performs statistical tests on the output data, and provides a framework for querying the resulting data.
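
To make the contingency-matrix idea concrete, the following is a minimal sketch in Python of how the standard LD statistics D, |D'| and r² can be derived from a 2x2 table of haplotype counts for two biallelic sites. It uses the textbook definitions only and is not Tomahawk's implementation or API; the function name ld_from_haplotype_counts and its parameter names are hypothetical.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r2) from a 2x2 table of haplotype counts.

    Hypothetical helper for illustration; not part of Tomahawk.
    n_AB counts haplotypes carrying allele A at site 1 and B at site 2,
    n_Ab counts A with b, and so on.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("empty contingency table")

    # Haplotype and marginal allele frequencies.
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    # r2 is undefined when either site is monomorphic.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")

    # D: deviation of the observed haplotype frequency from independence.
    D = p_AB - p_A * p_B

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(D) / d_max
    r2 = D * D / denom
    return D, d_prime, r2


if __name__ == "__main__":
    # Two sites in complete LD: only AB and ab haplotypes are observed.
    print(ld_from_haplotype_counts(50, 0, 0, 50))
```

The sketch assumes phased haplotype counts. With unphased genotypes the four haplotype counts are not directly observable and have to be estimated first, which this example does not attempt.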

CLI Commands

Command     Description
aggregate   data rasterization framework for TWO files
calc        calculate linkage disequilibrium
scalc       calculate linkage disequilibrium for a single site
concat      concatenate TWO files from the same set of samples
import      import VCF/VCF.gz/BCF to TWK
sort        sort TWO file
view        TWO->LD/TWO view, TWO subset and filter
haplotype   extract per-sample haplotype strings in FASTA/binary format
decay       compute LD-decay over distance
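
As an illustration of what an LD-decay summary typically contains, here is a small Python sketch that averages r² over fixed-width bins of pairwise distance. It is a generic approach written under the assumption that pairwise results are available as (position, position, r²) tuples; it does not reproduce the decay command or the TWO file format, and the name mean_r2_by_distance is hypothetical.

```python
from collections import defaultdict


def mean_r2_by_distance(pairs, bin_size):
    """Average r2 per distance bin.

    Hypothetical helper for illustration; not part of Tomahawk.
    pairs    : iterable of (pos_a, pos_b, r2) for sites on one chromosome
    bin_size : bin width in base pairs (positive integer)
    Returns a dict mapping each bin's start distance to the mean r2 in it.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")

    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos_a, pos_b, r2 in pairs:
        # Assign the pair to the bin that contains its physical distance.
        bin_start = (abs(pos_b - pos_a) // bin_size) * bin_size
        sums[bin_start] += r2
        counts[bin_start] += 1

    # Report bins in order of increasing distance.
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    example = [(100, 150, 0.8), (100, 190, 0.6), (100, 1200, 0.2)]
    for start, mean in mean_r2_by_distance(example, 100).items():
        print(start, mean)
```

Fixed-width bins keep the sketch simple; for sparse long-range pairs, wider or logarithmic bins may give a smoother curve.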