Refget Module API Reference

refget

Classes

AlphabetType

Bases: Enum

Represents the type of alphabet for a sequence.

Attributes
Dna2bit instance-attribute
Dna2bit: int
Dna3bit instance-attribute
Dna3bit: int
DnaIupac instance-attribute
DnaIupac: int
Protein instance-attribute
Protein: int
Ascii instance-attribute
Ascii: int
Unknown instance-attribute
Unknown: int
Functions
__str__
__str__() -> str

SequenceMetadata

Metadata for a biological sequence.

Attributes
name instance-attribute
name: str
length instance-attribute
length: int
sha512t24u instance-attribute
sha512t24u: str
md5 instance-attribute
md5: str
alphabet instance-attribute
alphabet: AlphabetType
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str

SequenceRecord

A record representing a biological sequence, including its metadata and optional data.

Attributes
metadata instance-attribute
metadata: SequenceMetadata
data instance-attribute
data: Optional[bytes]
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str

SeqColDigestLvl1

Level 1 digests for a sequence collection.

Attributes
sequences_digest instance-attribute
sequences_digest: str
names_digest instance-attribute
names_digest: str
lengths_digest instance-attribute
lengths_digest: str
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str

SequenceCollection

A collection of biological sequences.

Attributes
sequences instance-attribute
sequences: List[SequenceRecord]
digest instance-attribute
digest: str
lvl1 instance-attribute
lvl1: SeqColDigestLvl1
file_path instance-attribute
file_path: Optional[str]
has_data instance-attribute
has_data: bool
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str

RetrievedSequence

Represents a retrieved sequence segment with its metadata. Exposed from the Rust PyRetrievedSequence struct.

Attributes
sequence instance-attribute
sequence: str
chrom_name instance-attribute
chrom_name: str
start instance-attribute
start: int
end instance-attribute
end: int
Functions
__init__
__init__(sequence: str, chrom_name: str, start: int, end: int) -> None
__repr__
__repr__() -> str
__str__
__str__() -> str

StorageMode

Bases: Enum

Defines how sequence data is stored in the Refget store.

Attributes
Raw instance-attribute
Raw: int
Encoded instance-attribute
Encoded: int
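
The reference does not describe how `Encoded` storage is laid out. As a hedged illustration of why a bit-packed mode can be attractive, the sketch below packs a plain A/C/G/T sequence at 2 bits per base, which is what an alphabet such as `Dna2bit` makes possible. The functions `pack_2bit` and `unpack_2bit` and the bit order are hypothetical and are not taken from the library.

```python
# Hypothetical 2-bit code table; the real encoding may assign codes differently.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


packed = pack_2bit("ACGTAC")
print(len(packed))             # 2 bytes hold 6 bases
print(unpack_2bit(packed, 6))  # ACGTAC
```

The original length has to be kept alongside the packed bytes, since the final byte may be only partly filled.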

GlobalRefgetStore

A global store for refget sequences, allowing import, retrieval, and storage operations.

Functions
__init__
__init__(mode: StorageMode) -> None
import_fasta
import_fasta(file_path: Union[str, PathLike]) -> None

Imports a FASTA file into the GlobalRefgetStore.

get_sequence_by_id
get_sequence_by_id(digest: str) -> Optional[SequenceRecord]

Retrieves a sequence record by its SHA512t24u or MD5 digest.

get_sequence_by_collection_and_name
get_sequence_by_collection_and_name(collection_digest: str, sequence_name: str) -> Optional[SequenceRecord]

Retrieves a SequenceRecord from the store by its collection digest and sequence name.
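
One way to picture the two lookups above is a pair of dictionaries: one keyed by either digest of a sequence, the other keyed by (collection digest, sequence name). The sketch below is only a mental model. `Record`, `TinyIndex`, and the placeholder digests are hypothetical, and the store's internal layout is not described in this reference.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    """Hypothetical stand-in carrying a few SequenceMetadata-like fields."""
    name: str
    sha512t24u: str
    md5: str


class TinyIndex:
    def __init__(self) -> None:
        self._by_digest: Dict[str, Record] = {}
        self._by_collection: Dict[Tuple[str, str], Record] = {}

    def add(self, collection_digest: str, record: Record) -> None:
        # Register the record under both digests so either one resolves it.
        self._by_digest[record.sha512t24u] = record
        self._by_digest[record.md5] = record
        self._by_collection[(collection_digest, record.name)] = record

    def get_sequence_by_id(self, digest: str) -> Optional[Record]:
        return self._by_digest.get(digest)

    def get_sequence_by_collection_and_name(
        self, collection_digest: str, sequence_name: str
    ) -> Optional[Record]:
        return self._by_collection.get((collection_digest, sequence_name))


index = TinyIndex()
rec = Record(name="chr1", sha512t24u="placeholder-sha", md5="placeholder-md5")
index.add("placeholder-collection", rec)

print(index.get_sequence_by_id("placeholder-md5") is rec)                           # True
print(index.get_sequence_by_collection_and_name("placeholder-collection", "chr1"))  # the same record
print(index.get_sequence_by_id("unknown"))                                          # None
```

Both lookups return `None` for a missing key, mirroring the `Optional[SequenceRecord]` return type in the signatures above.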

get_substring
get_substring(seq_digest: str, start: int, end: int) -> Optional[str]

Retrieves a substring from an encoded sequence by its SHA512t24u digest.

Args:
    seq_digest (str): The SHA512t24u digest of the sequence to read from.
    start (int): The start index of the substring (inclusive).
    end (int): The end index of the substring (exclusive).

Returns:
    Optional[str]: The substring if the sequence is found, None otherwise.
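
The inclusive start and exclusive end match Python slice semantics, so a request for `[start, end)` yields `end - start` characters when the range lies inside the sequence. The sketch below shows one way a caller might wrap the `Optional[str]` result. `InMemoryStore` is a hypothetical stand-in that implements only the documented `get_substring` signature, and `fetch_region` is an illustrative helper. How the real store treats out-of-range coordinates is not stated here.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in exposing only the documented get_substring signature."""

    def __init__(self, sequences: Dict[str, str]) -> None:
        self._sequences = dict(sequences)

    def get_substring(self, seq_digest: str, start: int, end: int) -> Optional[str]:
        seq = self._sequences.get(seq_digest)
        if seq is None:
            return None
        return seq[start:end]  # half-open: start inclusive, end exclusive


def fetch_region(store, seq_digest: str, start: int, end: int) -> str:
    """Return the characters in [start, end), raising if the digest is unknown."""
    if start < 0 or end < start:
        raise ValueError(f"invalid range [{start}, {end})")
    substring = store.get_substring(seq_digest, start, end)
    if substring is None:
        raise KeyError(seq_digest)
    return substring


store = InMemoryStore({"example-digest": "ACGTACGT"})
print(fetch_region(store, "example-digest", 2, 5))  # GTA
```

Turning `None` into an exception at the call site keeps a missing digest from being confused with an empty substring.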

write_store_to_directory
write_store_to_directory(root_path: Union[str, PathLike], seqdata_path_template: str) -> None

Writes the GlobalRefgetStore to a directory.

load_from_directory classmethod
load_from_directory(root_path: Union[str, PathLike]) -> GlobalRefgetStore

Loads a GlobalRefgetStore from a directory path.

__str__
__str__() -> str
__repr__
__repr__() -> str

Functions

sha512t24u_digest

sha512t24u_digest(readable: Union[str, bytes]) -> str

Computes the GA4GH SHA512t24u digest for a given string or bytes.
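
For readers unfamiliar with the algorithm, SHA512t24u is the first 24 bytes of a SHA-512 hash, base64url-encoded. The standard-library sketch below reproduces that definition. `sha512t24u_reference` is a hypothetical name for this example, and encoding `str` input as UTF-8 is an assumption about how the library treats text.

```python
import base64
import hashlib
from typing import Union


def sha512t24u_reference(readable: Union[str, bytes]) -> str:
    """Truncated SHA-512 digest: first 24 bytes, base64url-encoded."""
    if isinstance(readable, str):
        readable = readable.encode("utf-8")  # assumption: text input is UTF-8
    truncated = hashlib.sha512(readable).digest()[:24]
    # 24 bytes encode to exactly 32 base64url characters, so no "=" padding appears.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


print(sha512t24u_reference(""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
print(len(sha512t24u_reference("ACGT")))  # 32
```

The result is always 32 characters drawn from the URL-safe base64 alphabet, which makes it usable directly in paths and URLs.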

md5_digest

md5_digest(readable: Union[str, bytes]) -> str

Computes the MD5 digest for a given string or bytes.

digest_fasta

digest_fasta(fasta: Union[str, PathLike]) -> SequenceCollection

Digests a FASTA file and returns a SequenceCollection object.