Introduction
gtars is a high-performance toolkit of genomic tools and algorithms, written in Rust for speed and reliability. It provides the core utilities for machine learning on genomic intervals used by the geniml Python package, and it can also be used as a standalone library for other downstream use cases.
Installation
Rust Library
gtars uses a feature-flag system so that you include only the modules you need. Add it to your Cargo.toml:
[dependencies]
# Install specific features
gtars = { version = "0.5", features = ["overlaprs", "tokenizers"] }
# Or install from GitHub instead (use only one gtars entry; a duplicate key is a TOML error)
# gtars = { git = "https://github.com/databio/gtars", features = ["overlaprs", "tokenizers"] }
Modules:
core - Core functionality and data structures
tokenizers - Genomic region tokenizers
io - I/O utilities
refget - Reference sequence access
overlaprs - Overlap operations
uniwig - Coverage computation
igd - Interval search
bbcache - BED file caching
scoring - Fragment scoring
fragsplit - Fragment splitting
Example combinations:
# For machine learning tasks
gtars = { version = "0.5", features = ["tokenizers", "core"] }
# For genomic analysis
gtars = { version = "0.5", features = ["overlaprs", "uniwig", "scoring"] }
# For data access
gtars = { version = "0.5", features = ["refget", "bbcache", "io"] }
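Feature flags like these are usually implemented with Rust's conditional compilation, so code behind a disabled feature is never compiled. The sketch below is a generic illustration of that mechanism, not the gtars source: the function names `overlap_support` and `compiled_modules` are hypothetical, and only the feature names `overlaprs` and `tokenizers` come from the list above.

```rust
// Generic sketch of Cargo feature gating. The function names are
// hypothetical and are NOT taken from the gtars source code.

/// Compiled only when the crate is built with the `overlaprs` feature.
#[cfg(feature = "overlaprs")]
pub fn overlap_support() -> &'static str {
    "overlaprs enabled"
}

/// Fallback compiled when the `overlaprs` feature is absent.
#[cfg(not(feature = "overlaprs"))]
pub fn overlap_support() -> &'static str {
    "overlaprs disabled"
}

/// Lists which of the two example features were enabled at compile time.
/// `cfg!` evaluates to a compile-time boolean, so disabled branches cost nothing.
pub fn compiled_modules() -> Vec<&'static str> {
    let mut modules = Vec::new();
    if cfg!(feature = "overlaprs") {
        modules.push("overlaprs");
    }
    if cfg!(feature = "tokenizers") {
        modules.push("tokenizers");
    }
    modules
}
```

Note that `cfg(feature = ...)` refers to the features of the crate being compiled. A downstream crate selects gtars modules through the `features` list in its own Cargo.toml, as in the combinations above, rather than through `cfg` checks in its code.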
Python Package
pip install gtars
See further documentation under Python bindings.
Command-Line Interface
Install from source:
git clone https://github.com/databio/gtars
cd gtars
cargo install --path gtars-cli
Or download precompiled binaries from the releases page.
Development
Run tests with cargo test from the workspace root. Please see CONTRIBUTING.md for development guidelines.
Module organization
gtars is organized into modules. The modules section gives an overview of each module.