Introduction¶
The primary goal of tomahawk is to efficiently and conveniently compute
linkage-disequilibrium for a single site vs its local neighbourhood, a sliding
window across an interval, or all-vs-all pairwise genome-wide. Tomahawk is a C++
library with a standard CLI divided into various subroutines, as is the standard
for most bioinformatics tools.
-
tomahawkis much more efficient than existing solutions, both in terms of memory usage and compute time.tomahawkcan easily compute genome-wide LD for cohorts of millions of samples over chromosome-scaled regions. -
tomahawkis designed as a C++ API to simplify LD-based workflow: either directly by using the C++ API or using any of the available language bindings. Currently there are R bindings and Python3 bindings. -
tomahawkhas its own human-readable (.ld) interchange format and a highly compressed binary (.two) format for expedient analysis of the generated output data. Other text-based systems are extremely inefficient and do not support basic operations such as subsetting, searching, summarizing, and visualizing. -
tomahawkaccepts any valid htslib-compatible input variant call format file for import into the internal binarytomahawk(.twk) file format.