Gtars digests
Introduction
The gtars.digests module computes MD5 and GA4GH sha512t24u digests, either for a single string or for every sequence in a FASTA file.
Tutorial
Compute digests for all sequences in a FASTA file:
from gtars.digests import digest_fasta
path_to_fasta_file = "../gtars/gtars/tests/data/base.fa"
df = digest_fasta(path_to_fasta_file)
View results:
# Each element is a DigestResult; "result" avoids shadowing the built-in chr()
for result in df:
    print(result)
# DigestResult for sequence chrX
# length: 8
# sha512t24u: iYtREV555dUFKg2_agSJW6suquUyPpMw
# md5: 5f63cfaa3ef61f88c9635fb9d18ec945
# DigestResult for sequence chr1
# length: 4
# sha512t24u: YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj
# md5: 31fc6ca291a32fb9df82b85e5f077e31
# DigestResult for sequence chr2
# length: 4
# sha512t24u: AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6
# md5: 92c6a56c9e9459d8a42b96f7884710bc
Access a particular digest type:
df[0].sha512t24u
# 'iYtREV555dUFKg2_agSJW6suquUyPpMw'
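To make the per-sequence results above more concrete, here is a minimal pure-Python sketch of how digests for each FASTA record could be computed by streaming the file line by line. It is not the gtars implementation. The names SeqDigest and digest_fasta_sketch are hypothetical, and the sketch assumes that sequences are uppercased before hashing and that the record name is the header text up to the first whitespace; gtars may handle either differently.

```python
import base64
import hashlib
from typing import Iterator, NamedTuple


class SeqDigest(NamedTuple):
    """Hypothetical stand-in for a per-sequence result; not gtars' DigestResult."""
    name: str
    length: int
    sha512t24u: str
    md5: str


def _finish(name, length, sha, md5) -> SeqDigest:
    # sha512t24u: first 24 bytes of SHA-512, base64url-encoded (32 characters).
    truncated = sha.digest()[:24]
    return SeqDigest(
        name,
        length,
        base64.urlsafe_b64encode(truncated).decode("ascii"),
        md5.hexdigest(),
    )


def digest_fasta_sketch(path: str) -> Iterator[SeqDigest]:
    """Yield one SeqDigest per FASTA record, hashing line by line."""
    name, length, sha, md5 = None, 0, None, None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield _finish(name, length, sha, md5)
                # Record name: header text up to the first whitespace.
                name = line[1:].split(maxsplit=1)[0] if line[1:].strip() else ""
                length, sha, md5 = 0, hashlib.sha512(), hashlib.md5()
            elif name is not None:
                # Assumption: sequences are uppercased before hashing.
                chunk = line.upper().encode("ascii")
                sha.update(chunk)
                md5.update(chunk)
                length += len(chunk)
    # Emit the final record once the file is exhausted.
    if name is not None:
        yield _finish(name, length, sha, md5)
```

Calling list(digest_fasta_sketch(path_to_fasta_file)) would return one SeqDigest per record. Feeding each line to the hashers incrementally keeps memory use flat even for chromosome-sized sequences, which is the main reason to stream rather than read each sequence into a single string.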
Compute a digest for a given sequence:
from gtars.digests import sha512t24u_digest, md5_digest
sha512t24u_digest("TCGA")
# 'ORLd3OQy8whca09ypkTExMc_ByFalnnO'
md5_digest("TCGA")
# '45d0ff9f1a9504cf2039f89c1ffb4c32'
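For reference, the GA4GH sha512t24u digest is defined as the SHA-512 hash of the input, truncated to its first 24 bytes and encoded with URL-safe base64, which always yields 32 characters. The sketch below reproduces that definition with the Python standard library only. The function names sha512t24u_sketch and md5_sketch are hypothetical, and the sketch assumes the string is hashed as-is in ASCII, with no case normalization; whether gtars normalizes its input first is not stated here.

```python
import base64
import hashlib


def sha512t24u_sketch(seq: str) -> str:
    """SHA-512, truncated to 24 bytes, base64url-encoded (GA4GH sha512t24u)."""
    full = hashlib.sha512(seq.encode("ascii")).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(full[:24]).decode("ascii")


def md5_sketch(seq: str) -> str:
    """Hex-encoded MD5 of the ASCII string."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


print(sha512t24u_sketch("TCGA"))
print(md5_sketch("TCGA"))
```

The URL-safe alphabet replaces "+" and "/" with "-" and "_", which is why digests such as the ones shown above can contain underscores and be used directly in URLs and file names.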