Refget Module API Reference
refget
Classes
AlphabetType
Bases: Enum
Represents the type of alphabet for a sequence.
Attributes
Dna2bit
instance-attribute
Dna2bit: int
Dna3bit
instance-attribute
Dna3bit: int
DnaIupac
instance-attribute
DnaIupac: int
Protein
instance-attribute
Protein: int
Ascii
instance-attribute
Ascii: int
Unknown
instance-attribute
Unknown: int
Functions
__str__
__str__() -> str
SequenceMetadata
Metadata for a biological sequence.
Attributes
name
instance-attribute
name: str
length
instance-attribute
length: int
sha512t24u
instance-attribute
sha512t24u: str
md5
instance-attribute
md5: str
alphabet
instance-attribute
alphabet: AlphabetType
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str
SequenceRecord
A record representing a biological sequence, including its metadata and optional data.
Attributes
metadata
instance-attribute
metadata: SequenceMetadata
data
instance-attribute
data: Optional[bytes]
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str
SeqColDigestLvl1
Level 1 digests for a sequence collection.
Attributes
sequences_digest
instance-attribute
sequences_digest: str
names_digest
instance-attribute
names_digest: str
lengths_digest
instance-attribute
lengths_digest: str
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str
SequenceCollection
A collection of biological sequences.
Attributes
sequences
instance-attribute
sequences: List[SequenceRecord]
digest
instance-attribute
digest: str
lvl1
instance-attribute
lvl1: SeqColDigestLvl1
file_path
instance-attribute
file_path: Optional[str]
has_data
instance-attribute
has_data: bool
Functions
__repr__
__repr__() -> str
__str__
__str__() -> str
RetrievedSequence
Represents a retrieved sequence segment with its metadata.
Exposed from the Rust PyRetrievedSequence struct.
Attributes
sequence
instance-attribute
sequence: str
chrom_name
instance-attribute
chrom_name: str
start
instance-attribute
start: int
end
instance-attribute
end: int
Functions
__init__
__init__(sequence: str, chrom_name: str, start: int, end: int) -> None
__repr__
__repr__() -> str
__str__
__str__() -> str
StorageMode
Bases: Enum
Defines how sequence data is stored in the Refget store.
Attributes
Raw
instance-attribute
Raw: int
Encoded
instance-attribute
Encoded: int
GlobalRefgetStore
A global store for refget sequences, allowing import, retrieval, and storage operations.
Functions
__init__
__init__(mode: StorageMode) -> None
import_fasta
import_fasta(file_path: Union[str, PathLike]) -> None
Imports a FASTA file into the GlobalRefgetStore.
get_sequence_by_id
get_sequence_by_id(digest: str) -> Optional[SequenceRecord]
Retrieves a sequence record by its SHA512t24u or MD5 digest.
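As a rough illustration of lookup by either digest type, the sketch below indexes each record under both its SHA512t24u and its MD5 digest in a plain dictionary. `ToyRecord` and `ToyStore` are hypothetical stand-ins written for this example; they are not the library's classes, and the real store's internals may differ.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a sequence record (not the library class)."""
    name: str
    data: bytes


class ToyStore:
    """Hypothetical in-memory store that indexes records under both digests."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, ToyRecord] = {}

    def add(self, record: ToyRecord) -> None:
        # SHA-512, truncated to 24 bytes, base64url-encoded.
        truncated = hashlib.sha512(record.data).digest()[:24]
        sha_key = base64.urlsafe_b64encode(truncated).decode("ascii")
        # Hex-encoded MD5.
        md5_key = hashlib.md5(record.data).hexdigest()
        self._by_digest[sha_key] = record
        self._by_digest[md5_key] = record

    def get_sequence_by_id(self, digest: str) -> Optional[ToyRecord]:
        # Returns None when the digest is unknown, mirroring the Optional return type.
        return self._by_digest.get(digest)


store = ToyStore()
store.add(ToyRecord(name="chr1", data=b"ACGT"))

md5_key = hashlib.md5(b"ACGT").hexdigest()
found = store.get_sequence_by_id(md5_key)
missing = store.get_sequence_by_id("not-a-digest")
```

Because the return type is Optional, callers should check for None before using the result.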
get_sequence_by_collection_and_name
get_sequence_by_collection_and_name(collection_digest: str, sequence_name: str) -> Optional[SequenceRecord]
Retrieves a SequenceRecord from the store by its collection digest and sequence name.
get_substring
get_substring(seq_digest: str, start: int, end: int) -> Optional[str]
Retrieves a substring from an encoded sequence by its SHA512t24u digest.
Args:
seq_digest - str - The SHA512t24u digest of the sequence to read from
start - int - The start index of the substring (inclusive)
end - int - The end index of the substring (exclusive)
Returns:
substring - str - The substring if found, None if not found
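The inclusive start and exclusive end follow the same half-open convention as Python slicing, so the result has length end - start. The sketch below mimics that behaviour with a plain dictionary; `TOY_SEQUENCES` and `toy_get_substring` are hypothetical names for illustration only, and the sketch makes no claim about how the real method treats out-of-range coordinates.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the store: digest -> full sequence.
TOY_SEQUENCES: Dict[str, str] = {"toy-digest": "ACGTACGT"}


def toy_get_substring(seq_digest: str, start: int, end: int) -> Optional[str]:
    """Return sequence[start:end] for a known digest, or None if it is unknown."""
    sequence = TOY_SEQUENCES.get(seq_digest)
    if sequence is None:
        return None
    # Half-open interval: start is included, end is excluded.
    return sequence[start:end]


segment = toy_get_substring("toy-digest", 2, 5)
print(segment)  # GTA
print(toy_get_substring("unknown-digest", 0, 4))  # None
```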
write_store_to_directory
write_store_to_directory(root_path: Union[str, PathLike], seqdata_path_template: str) -> None
Writes a GlobalRefgetStore object to a directory.
load_from_directory
classmethod
load_from_directory(root_path: Union[str, PathLike]) -> GlobalRefgetStore
Loads a GlobalRefgetStore from a directory path.
__str__
__str__() -> str
__repr__
__repr__() -> str
Functions
sha512t24u_digest
sha512t24u_digest(readable: Union[str, bytes]) -> str
Computes the GA4GH SHA512t24u digest for a given string or bytes.
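For reference, the GA4GH sha512t24u scheme takes the SHA-512 hash of the input, truncates it to the first 24 bytes, and base64url-encodes the result, giving a 32-character string. A minimal pure-Python sketch follows; `sha512t24u_sketch` is a hypothetical name, and the sketch hashes the input as given, whereas the library function may normalize it differently.

```python
import base64
import hashlib
from typing import Union


def sha512t24u_sketch(readable: Union[str, bytes]) -> str:
    """SHA-512, truncated to 24 bytes, base64url-encoded (illustrative only)."""
    data = readable.encode("utf-8") if isinstance(readable, str) else readable
    truncated = hashlib.sha512(data).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so there is no padding.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


digest = sha512t24u_sketch("ACGT")
print(digest, len(digest))
```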
md5_digest
md5_digest(readable: Union[str, bytes]) -> str
Computes the MD5 digest for a given string or bytes.
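An equivalent computation with the standard library might look like the sketch below. `md5_sketch` is a hypothetical name, and the hex-encoded output is an assumption about the format that the library function returns.

```python
import hashlib
from typing import Union


def md5_sketch(readable: Union[str, bytes]) -> str:
    """Hex-encoded MD5 of a string or bytes (illustrative only)."""
    data = readable.encode("utf-8") if isinstance(readable, str) else readable
    return hashlib.md5(data).hexdigest()


md5_value = md5_sketch("ACGT")
print(md5_value, len(md5_value))
```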
digest_fasta
digest_fasta(fasta: Union[str, PathLike]) -> SequenceCollection
Digests a FASTA file and returns a SequenceCollection object.
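To give a sense of the per-sequence metadata that such a digest step produces (name, length, sha512t24u, md5), here is a small pure-Python sketch. `toy_digest_fasta` is a hypothetical helper: it returns plain dictionaries rather than a SequenceCollection, does not compute collection-level digests, and hashes residues exactly as written, so its output is not guaranteed to match the library's.

```python
import base64
import hashlib
import os
import tempfile
from typing import Dict, List


def toy_digest_fasta(path: str) -> List[Dict[str, object]]:
    """Parse a FASTA file and compute per-sequence digests (illustrative only)."""
    sequences: Dict[str, List[str]] = {}  # name -> sequence lines, in file order
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                sequences[current] = []
            elif current is not None:
                sequences[current].append(line)

    records = []
    for name, chunks in sequences.items():
        data = "".join(chunks).encode("ascii")
        truncated = hashlib.sha512(data).digest()[:24]
        records.append({
            "name": name,
            "length": len(data),
            "sha512t24u": base64.urlsafe_b64encode(truncated).decode("ascii"),
            "md5": hashlib.md5(data).hexdigest(),
        })
    return records


# Write a tiny two-sequence FASTA file to a temporary location and digest it.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">chr1 example\nACGT\nACGT\n>chr2\nGG\n")
    fasta_path = tmp.name

records = toy_digest_fasta(fasta_path)
os.remove(fasta_path)

for record in records:
    print(record["name"], record["length"], record["sha512t24u"], record["md5"])
```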