Gtars digests using Python bindings
Introduction
The gtars Python bindings can compute MD5 or GA4GH sha512t24u digests of strings, or of the sequences in FASTA files.
Tutorial
Computing digests for all sequences in a FASTA file:
import os
import tempfile
from gtars.refget import digest_fasta
with tempfile.TemporaryDirectory() as temp_dir:
    # Write a small two-sequence FASTA file to digest
    fasta_content = (
        ">chr1 description\n"
        "ATGCATGCATGC\n"
        ">chr2\n"
        "GGGGAAAA\n"
    )
    fasta_path = os.path.join(temp_dir, "example.fa")
    with open(fasta_path, "w") as f:
        f.write(fasta_content)

    # Digest the file before the temporary directory is removed
    collection = digest_fasta(fasta_path)
    print(f"Collection-level digest: {collection.digest}")
    print(f"Number of sequences in collection: {len(collection)}")
    first = collection[0].metadata
    print(f"Metadata for first sequence: {first.name}, Length: {first.length}")
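To make the per-sequence metadata more concrete, here is a minimal standard-library sketch of the kind of information a FASTA digest pass collects: the name, length, and MD5 of each record. It is an illustration only, not the gtars implementation. The helper `read_fasta` is hypothetical, and both taking the name as the first token of the header line and uppercasing the sequence before hashing are assumptions.

```python
import hashlib


def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text (hypothetical helper)."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the name is the first whitespace-delimited header token
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


fasta_text = ">chr1 description\nATGCATGCATGC\n>chr2\nGGGGAAAA\n"

# Assumption: sequences are uppercased before hashing
records = [
    (name, len(seq), hashlib.md5(seq.upper().encode("ascii")).hexdigest())
    for name, seq in read_fasta(fasta_text)
]

for name, length, md5 in records:
    print(f"{name}\tlength={length}\tmd5={md5}")
```

For real files, `digest_fasta` remains the right tool; this sketch only shows what the name and length fields correspond to in the input.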
Computing a digest for a single sequence:
from gtars.refget import sha512t24u_digest
sequence_data = "AGCT"
digest = sha512t24u_digest(sequence_data)
print(f"SHA512t24u digest for '{sequence_data}': {digest}")
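For intuition about what a sha512t24u digest is, the GA4GH algorithm can be sketched with the Python standard library: take the SHA-512 hash of the input bytes, keep the first 24 bytes, and encode them as URL-safe base64. The function name `sha512t24u_sketch` is hypothetical. This is a sketch of the algorithm, not the gtars implementation, and any input normalization gtars may apply before hashing (such as changing case) is not modeled here.

```python
import base64
import hashlib


def sha512t24u_sketch(data: bytes) -> str:
    """Truncated SHA-512 digest, base64url-encoded (illustrative sketch)."""
    full = hashlib.sha512(data).digest()  # 64 raw bytes
    truncated = full[:24]                 # keep the first 24 bytes
    # 24 bytes encode to exactly 32 base64 characters, so there is no padding
    return base64.urlsafe_b64encode(truncated).decode("ascii")


print(sha512t24u_sketch(b"AGCT"))
```

Because 24 bytes is a multiple of 3, the resulting string is always 32 characters long and contains only URL-safe characters.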