Skip to content

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.2.3]

  • Improved RegionSet, by adding a multiple new methods: to_bed, to_bed_gz, to_bigbed, identifier(), and others.
  • Fixed allowed fasta_digest to accept Path or bytes #93

[0.2.2]

[0.2.1]

  • allow comments at the beginning of fragment files
  • bump bigtools to 0.5.5, fixing #74 and #77

[0.2.0]

  • add position shift workflow for bam to bw (was previously added for bam to bed)
  • add scaling argument for bam to bw workflow #53
  • fix accumulation issue for bam workflow #56
  • fix wiggle file (core) beginning at 0 #43
  • fix npy file (end) using start instead of end #61
  • force zoom to 1 for bed/narrowPeak to bw #34
  • fix IGD overlap issue #45
  • add ga4gh refget digest functionality #58
  • fix wig and npy inconsistency #64
  • fix narrowPeak to bw zoom #34
  • fix bed to bw fileheader consistency issue #52
  • change npy metadata file structure #65

[0.1.2]

  • add position shift workflow for bam to bw (was previously added for bam to bed)
  • add scaling argument for bam to bw workflow #53
  • fix accumulation issue for bam workflow #56
  • fix wiggle file (core) beginning at 0 #43
  • fix npy file (end) using start instead of end #61
  • force zoom to 1 for bed/narrowPeak to bw #34
  • fix IGD overlap issue #45
  • add ga4gh refget digest functionality #58

[0.1.1]

  • hot fix for broken python bindings; remove IGD from the python bindings for now

[0.1.0]

  • Rust implementation of uniwig that expands on the C++ version
  • Uniwig now accepts a single sorted .bed file, .narrowPeak file, or .bam file.
  • Outputs now include .wig, .npy, .bedGraph, and .bw
  • Accumulations can now be counted via .narrowPeak scoring
  • Rust implementation of igd ported from the C version (experimental).
  • Region scoring matrix calculation for region clustering
  • Fragment file splitter for pseudobulking

[0.0.15]

  • added meta tokenization tools and a new MetaTokenizer struct that can be used to tokenize regions using the meta-token strategy.
  • added some annotations to the pyo3 #[pyclass] and #[pymethods] attributes to make the python bindings more readable.

[0.0.14]

  • renamed repository to gtars to better reflect the project's goals.

[0.0.13]

  • implemented a fragment file tokenizer that will generate .gtok files directly from fragments.tsv.gz files.
  • fix an off-by-one error in the region-to-id maps in the Universe structs. This was leading to critical bugs in our models.

[0.0.12]

  • optimize creation of PyRegionSet to reduce expensive cloning of Universe structs.

[0.0.11]

  • redesigned API for the tokenizers to better emulate the huggingface tokenizers API.
  • implemented new traits for tokenizers to allow for more flexibility when creating new tokenizers.
  • bumped the version pyo3 to 0.21.0
  • added rust-numpy dependency to the python bindings for exporting tokenized regions as numpy arrays.
  • overall stability improvements to the tokenizers and the python bindings.

[0.0.10]

  • update file format specifications

[0.0.9]

  • start working on the concept of a .gtok file-format to store tokenized regions
  • added basic readers and writers for this format

[0.0.8]

  • add a new ids_as_strs getter to the TokenizedRegionSet struct so that we can get the ids as strings quickly, this is meant mostly for interface with geniml.

[0.0.7]

  • move things around based on rust club feedback

[0.0.6]

  • update python bindings to support the module/submodule structure (https://github.com/PyO3/pyo3/issues/759#issuecomment-1828431711)
  • change name of some submodules
  • remove consts submodule, just add to base
  • expose a __version__ attribute in the python bindings

[0.0.5]

  • add many "core utils"
  • move gtokenizers into this package inside gtars::tokenizers
  • create tokenize cli
  • add tests for core utils and tokenizers
  • RegionSet is now backed by a polars DataFrame
  • new python bindings for core utils and tokenizers

[0.0.4]

  • add type annotations to the python bindings

[0.0.3]

  • work on python bindings initialization

[0.0.2]

  • prepare for first release

[0.0.1]

  • initial setup of repository
  • two main wrappers: 1) wrapper binary crate, and 2) wrapper library crate
  • gtars can be used as a library crate. or as a command line tool