Fast calculation of LD in large-scale cohorts
Tomahawk is a machine-optimized library for computing linkage disequilibrium (LD) from population-sized datasets. It permits close to real-time analysis of regions of interest in datasets of many millions of diploid individuals on a standard laptop. All algorithms are embarrassingly parallel and have been successfully tested on chromosome-sized datasets with up to 10 million individuals.
Tomahawk uniquely constructs complete haplotype/genotype contingency matrices for each comparison, performs statistical tests on the output data, and provides a framework for querying the resulting data.
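To make the role of the contingency matrix concrete, the following is a minimal Python sketch of how the standard LD statistics D, |D'|, and r² follow from a 2x2 haplotype count table for two biallelic sites. It is an illustration of the textbook definitions only, not Tomahawk's implementation or API; the names `LDStats` and `ld_from_haplotype_counts` and the count layout are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class LDStats:
    d: float        # coefficient of linkage disequilibrium, D
    d_prime: float  # |D'|, D normalized by its maximum attainable magnitude
    r2: float       # squared correlation between the two sites


def ld_from_haplotype_counts(n11: int, n10: int, n01: int, n00: int) -> LDStats:
    """Compute LD statistics from a 2x2 haplotype contingency table.

    Hypothetical layout (an assumption of this sketch): n11 counts haplotypes
    carrying allele 1 at both sites, n10 allele 1 at site A only, n01 allele 1
    at site B only, and n00 allele 1 at neither site.
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("contingency table is empty")

    p11 = n11 / total          # frequency of the 1-1 haplotype
    p_a = (n11 + n10) / total  # allele-1 frequency at site A
    p_b = (n11 + n01) / total  # allele-1 frequency at site B

    d = p11 - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one site is monomorphic, so LD is undefined; report zeros.
        return LDStats(d=0.0, d_prime=0.0, r2=0.0)

    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return LDStats(d=d, d_prime=d_prime, r2=r2)


if __name__ == "__main__":
    # Two sites in perfect LD: only the 1-1 and 0-0 haplotypes are observed.
    print(ld_from_haplotype_counts(50, 0, 0, 50))
    # prints LDStats(d=0.25, d_prime=1.0, r2=1.0)
```

Working from the full count table rather than a precomputed summary is what allows several statistics, and tests on the counts themselves, to be derived from a single pass over each pair of sites.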
CLI Commands
Command | Description |
---|---|
aggregate | data rasterization framework for TWO files |
calc | calculate linkage disequilibrium |
scalc | calculate linkage disequilibrium for a single site |
concat | concatenate TWO files from the same set of samples |
import | import VCF/VCF.gz/BCF to TWK |
sort | sort TWO file |
view | TWO->LD/TWO view, TWO subset and filter |
haplotype | extract per-sample haplotype strings in FASTA/binary format |
decay | compute LD-decay over distance |