Skip to content

gtars Module Vision

Conceptual Organization

The gtars modules follow a clear separation of concerns:

1. gtars-overlaprs

Core overlap computation infrastructure - All interval overlap algorithms live here - Single source of truth for overlap operations - Other modules should never reimplement overlap logic

2. gtars-scoring

Wraps overlaprs to produce X-by-peak matrices - Handles all matrix generation use cases: - Cell-by-peak (single-cell) - Sample-by-peak (bulk) - Pseudobulk-by-peak (aggregated) - Focuses on matrix data structures and I/O formats

3. gtars-tokenizers

Wraps overlaprs to produce tokens for ML - Converts genomic regions to vocabulary tokens - Designed for transformer models and ML pipelines - Manages token vocabularies and encoding strategies

Key Principle

All overlap computation flows through gtars-overlaprs. Higher-level modules add domain-specific value: - scoring adds matrix generation - tokenizers adds ML tokenization - Both delegate overlap computation to overlaprs