Introduction¶
The primary goal of tomahawk
is to efficiently and conveniently compute
linkage-disequilibrium for a single site vs its local neighbourhood, a sliding
window across an interval, or all-vs-all pairwise genome-wide. Tomahawk is a C++
library with a standard CLI divided into various subroutines, as is the standard
for most bioinformatics tools.
-
tomahawk
is much more efficient than existing solutions, both in terms of memory usage and compute time.tomahawk
can easily compute genome-wide LD for cohorts of millions of samples over chromosome-scaled regions. -
tomahawk
is designed as a C++ API to simplify LD-based workflow: either directly by using the C++ API or using any of the available language bindings. Currently there are R bindings and Python3 bindings. -
tomahawk
has its own human-readable (.ld
) interchange format and a highly compressed binary (.two
) format for expedient analysis of the generated output data. Other text-based systems are extremely inefficient and do not support basic operations such as subsetting, searching, summarizing, and visualizing. -
tomahawk
accepts any valid htslib-compatible input variant call format file for import into the internal binarytomahawk
(.twk
) file format.