Refget Module API Reference

refget

Type stubs and documentation for the gtars.refget module.

This file serves two purposes:

Type Hints: Provides type annotations for IDE autocomplete and static type checking tools like mypy.
Documentation: Contains Google-style docstrings that mkdocstrings uses to generate the API reference documentation website.

Note: The actual implementation is in Rust (gtars-python/src/refget/mod.rs) and compiled via PyO3. This stub file provides the Python interface definition and structured documentation that tools can parse properly.

Classes

AlphabetType

Bases: Enum

Represents the type of alphabet for a sequence.

Attributes

Dna2bit `instance-attribute`

Dna2bit: int

Dna3bit `instance-attribute`

Dna3bit: int

DnaIupac `instance-attribute`

DnaIupac: int

Protein `instance-attribute`

Protein: int

Ascii `instance-attribute`

Ascii: int

Unknown `instance-attribute`

Unknown: int

Functions

__str__

__str__() -> str

FaiMetadata

FASTA index (FAI) metadata for a sequence.

Contains the information needed to quickly seek to a sequence in a FASTA file, compatible with samtools faidx format.

Attributes:

offset (int) –

Byte offset of the first base in the FASTA file.
line_bases (int) –

Number of bases per line.
line_bytes (int) –

Number of bytes per line (including newline).
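These three FAI fields are enough to turn a 0-based base position into a byte offset in the FASTA file. The following is a minimal sketch of the standard samtools faidx arithmetic in plain Python. `base_byte_offset` is a hypothetical helper, not part of gtars.refget, and it assumes every line except the last holds exactly `line_bases` bases.

```python
def base_byte_offset(pos: int, offset: int, line_bases: int, line_bytes: int) -> int:
    """Return the byte offset of 0-based base `pos` in a FASTA file.

    Hypothetical helper: assumes uniform line lengths, as samtools faidx does.
    """
    if pos < 0:
        raise ValueError("pos must be non-negative")
    full_lines, col = divmod(pos, line_bases)
    # Skip whole lines (bases plus newline bytes), then move within the line
    return offset + full_lines * line_bytes + col


# Tiny FASTA: header ">chr1\n" is 6 bytes, then 4 bases per line (5 bytes with "\n")
fasta = b">chr1\nACGT\nACGT\nAC\n"
i = base_byte_offset(5, offset=6, line_bases=4, line_bytes=5)
print(i, chr(fasta[i]))  # 12 C
```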

Attributes

offset `instance-attribute`

offset: int

line_bases `instance-attribute`

line_bases: int

line_bytes `instance-attribute`

line_bytes: int

Functions

__repr__

__repr__() -> str

__str__

__str__() -> str

FaiRecord

A FASTA index record for a single sequence.

Represents one line of a .fai index file with sequence name, length, and FAI metadata for random access.

Attributes:

name (str) –

Sequence name.
length (int) –

Sequence length in bases.
fai (Optional[FaiMetadata]) –

FAI metadata (None for gzipped files).

Attributes

name `instance-attribute`

name: str

length `instance-attribute`

length: int

fai `instance-attribute`

fai: Optional[FaiMetadata]

Functions

__repr__

__repr__() -> str

__str__

__str__() -> str

SequenceMetadata

Metadata for a biological sequence.

Contains identifying information and computed digests for a sequence, without the actual sequence data.

Attributes:

name (str) –

Sequence name (first word of FASTA header).
description (Optional[str]) –

Description from FASTA header (text after first whitespace).
length (int) –

Length of the sequence in bases.
sha512t24u (str) –

GA4GH SHA-512/24u digest (32-char base64url).
md5 (str) –

MD5 digest (32-char hex string).
alphabet (AlphabetType) –

Detected alphabet type (DNA, protein, etc.).
fai (Optional[FaiMetadata]) –

FASTA index metadata if available.
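The two digest fields can be reproduced with the Python standard library. The following is a minimal sketch of the GA4GH sha512t24u scheme (SHA-512, truncated to the first 24 bytes, base64url-encoded) alongside a plain MD5 hex digest. It assumes the input is already normalized (for example, uppercase with line breaks removed); exactly which normalization gtars applies is not covered here, so treat this as illustrative rather than a reference implementation.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """GA4GH truncated digest: first 24 bytes of SHA-512, base64url-encoded."""
    digest = hashlib.sha512(data).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so no '=' padding appears
    return base64.urlsafe_b64encode(digest).decode("ascii")


def md5_hex(data: bytes) -> str:
    """MD5 digest as a 32-character lowercase hex string."""
    return hashlib.md5(data).hexdigest()


seq = b"ACGT"  # assumed already uppercase with no whitespace
print(sha512t24u(seq))
print(md5_hex(seq))
```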

Attributes

name `instance-attribute`

name: str

description `instance-attribute`

description: Optional[str]

length `instance-attribute`

length: int

sha512t24u `instance-attribute`

sha512t24u: str

md5 `instance-attribute`

md5: str

alphabet `instance-attribute`

alphabet: AlphabetType

fai `instance-attribute`

fai: Optional[FaiMetadata]

Functions

__repr__

__repr__() -> str

__str__

__str__() -> str

SequenceRecord

A record representing a biological sequence, including its metadata and optional data.

SequenceRecord can be either a "stub" (metadata only) or "full" (metadata + data). Stubs are used for lazy-loading where sequence data is fetched on demand.

Attributes:

metadata (SequenceMetadata) –

Sequence metadata (name, length, digests).
sequence (Optional[bytes]) –

Raw sequence data if loaded, None for stubs.
is_loaded (bool) –

Whether sequence data is loaded (True) or just metadata (False).

Attributes

metadata `instance-attribute`

metadata: SequenceMetadata

sequence `instance-attribute`

sequence: Optional[bytes]

is_loaded `property`

is_loaded: bool

Whether sequence data is loaded (True) or just metadata (False).

Functions

decode

decode() -> Optional[str]

Decode and return the sequence data as a string.

For Full records with sequence data, returns the decoded sequence. For Stub records without sequence data, returns None.

Returns:

Optional[str] –

Decoded sequence string if data is available, None otherwise.

__repr__

__repr__() -> str

__str__

__str__() -> str

SeqColDigestLvl1

Level 1 digests for a sequence collection.
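To make the relationship between the arrays, the level 1 digests, and the collection digest concrete, here is a minimal sketch based on my reading of the GA4GH seqcol specification. Each array is serialized as canonical JSON and digested with sha512t24u to produce a level 1 digest. The collection digest is then the sha512t24u of the canonical JSON of the level 1 object restricted to the "inherent" attributes, assumed here to be names, lengths, and sequences (check the seqcol schema in use). Canonicalization is approximated with `json.dumps(..., sort_keys=True, separators=(",", ":"))`, which is adequate for plain strings and integers but is not a full RFC 8785 implementation. The helper names and sequence digests are hypothetical, and this is not the gtars implementation.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj) -> bytes:
    # Approximation of RFC 8785 JCS for simple str/int/list/dict values
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def level1(names, lengths, sequences) -> dict:
    """Level 1: one digest per attribute array."""
    return {
        "names": sha512t24u(canonical_json(names)),
        "lengths": sha512t24u(canonical_json(lengths)),
        "sequences": sha512t24u(canonical_json(sequences)),
    }


def collection_digest(lvl1: dict, inherent=("names", "lengths", "sequences")) -> str:
    """Top-level digest over the inherent attributes only (attribute list is an assumption)."""
    return sha512t24u(canonical_json({k: lvl1[k] for k in inherent}))


names = ["chr1", "chr2"]
lengths = [8, 4]
# Hypothetical "SQ."-prefixed sequence digests, as seqcol arrays store them
sequences = ["SQ.placeholder_digest_for_chr1__", "SQ.placeholder_digest_for_chr2__"]

lvl1 = level1(names, lengths, sequences)
print(lvl1)
print(collection_digest(lvl1))
```

Because every array is digested in the order given, reordering the sequences changes the affected level 1 digests and therefore the collection digest.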

Attributes

sequences_digest `instance-attribute`

sequences_digest: str

names_digest `instance-attribute`

names_digest: str

lengths_digest `instance-attribute`

lengths_digest: str

Functions

__repr__

__repr__() -> str

__str__

__str__() -> str

SequenceCollectionMetadata

Metadata for a sequence collection.

Contains the collection digest and level 1 digests for names, sequences, and lengths. This is a lightweight representation of a collection without the actual sequence list.

Attributes:

digest (str) –

The collection's SHA-512/24u digest.
n_sequences (int) –

Number of sequences in the collection.
names_digest (str) –

Level 1 digest of the names array.
sequences_digest (str) –

Level 1 digest of the sequences array.
lengths_digest (str) –

Level 1 digest of the lengths array.
name_length_pairs_digest (Optional[str]) –

Ancillary digest (if computed).
sorted_name_length_pairs_digest (Optional[str]) –

Ancillary digest (if computed).
sorted_sequences_digest (Optional[str]) –

Ancillary digest (if computed).
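The sorted ancillary digests make comparisons independent of sequence order. Below is a minimal sketch of one way to compute `name_length_pairs` and `sorted_name_length_pairs`, following my reading of the seqcol specification: each pair is an object `{"length": ..., "name": ...}`, and the sorted variant digests each pair, sorts those pair digests, and digests the sorted list. The helper names are hypothetical, canonicalization uses the same `json.dumps` approximation of RFC 8785, and none of this is taken from gtars.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def name_length_pairs(names, lengths):
    return [{"length": ln, "name": nm} for nm, ln in zip(names, lengths)]


def name_length_pairs_digest(names, lengths) -> str:
    # Order-sensitive: digest of the pair array exactly as given
    return sha512t24u(canonical_json(name_length_pairs(names, lengths)))


def sorted_name_length_pairs_digest(names, lengths) -> str:
    # Order-insensitive: digest each pair, sort the pair digests, digest the sorted list
    pair_digests = sorted(sha512t24u(canonical_json(p)) for p in name_length_pairs(names, lengths))
    return sha512t24u(canonical_json(pair_digests))


a = (["chr1", "chr2"], [8, 4])
b = (["chr2", "chr1"], [4, 8])  # same pairs, different order
print(name_length_pairs_digest(*a) == name_length_pairs_digest(*b))  # False
print(sorted_name_length_pairs_digest(*a) == sorted_name_length_pairs_digest(*b))  # True
```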

Attributes

digest `instance-attribute`

digest: str

n_sequences `instance-attribute`

n_sequences: int

names_digest `instance-attribute`

names_digest: str

sequences_digest `instance-attribute`

sequences_digest: str

lengths_digest `instance-attribute`

lengths_digest: str

name_length_pairs_digest `instance-attribute`

name_length_pairs_digest: Optional[str]

sorted_name_length_pairs_digest `instance-attribute`

sorted_name_length_pairs_digest: Optional[str]

sorted_sequences_digest `instance-attribute`

sorted_sequences_digest: Optional[str]

Functions

__repr__

__repr__() -> str

__str__

__str__() -> str

SequenceCollection

A collection of biological sequences (e.g., a genome assembly).

SequenceCollection represents a set of sequences with collection-level digests following the GA4GH seqcol specification. Supports iteration, indexing, and len().

Attributes:

sequences (List[SequenceRecord]) –

List of sequence records.
digest (str) –

Top-level SHA-512/24u digest of the whole collection.
lvl1 (SeqColDigestLvl1) –

Level 1 digests for names, lengths, sequences.
file_path (Optional[str]) –

Source file path if loaded from FASTA.

Examples:

Iterate over sequences::

for seq in collection:
    print(f"{seq.metadata.name}: {seq.metadata.length} bp")

Access by index::

first_seq = collection[0]
last_seq = collection[-1]

Get length::

n = len(collection)

Attributes

sequences `instance-attribute`

sequences: List[SequenceRecord]

digest `instance-attribute`

digest: str

lvl1 `instance-attribute`

lvl1: SeqColDigestLvl1

file_path `instance-attribute`

file_path: Optional[str]

Functions

write_fasta

write_fasta(file_path: str, line_width: Optional[int] = None) -> None

Write the collection to a FASTA file.

Parameters:

file_path (str) –

Path to the output FASTA file.
line_width (Optional[int], default: None ) –

Number of bases per line (default: 70).

Raises:

IOError –

If any sequence doesn't have data loaded.

Example::

collection = load_fasta("genome.fa")
collection.write_fasta("output.fa")
collection.write_fasta("output.fa", line_width=60)
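`line_width` controls how each sequence is wrapped. The sketch below shows that wrapping in plain Python and, for a file written this way, what the matching `.fai` fields (name, length, offset, line_bases, line_bytes) would be. It assumes single-byte `\n` line endings and headers of the form `>name`. It is an illustration, not the gtars writer, and `write_fasta_with_fai` is a hypothetical helper.

```python
import io


def write_fasta_with_fai(records, out, line_width=70):
    """Write (name, sequence) pairs as FASTA and return samtools-style .fai rows.

    Hypothetical helper. Assumes '\n' line endings and '>name' headers.
    Each row is (name, length, offset, line_bases, line_bytes).
    """
    fai_rows = []
    pos = 0  # running byte position in the output
    for name, seq in records:
        header = f">{name}\n".encode("ascii")
        out.write(header)
        pos += len(header)
        line_bases = min(line_width, len(seq))  # length of the first sequence line
        fai_rows.append((name, len(seq), pos, line_bases, line_bases + 1))
        for i in range(0, len(seq), line_width):
            line = seq[i:i + line_width].encode("ascii") + b"\n"
            out.write(line)
            pos += len(line)
    return fai_rows


buf = io.BytesIO()
rows = write_fasta_with_fai([("chr1", "ACGTACGTAC"), ("chr2", "GGAA")], buf, line_width=4)
print(buf.getvalue().decode(), end="")
for row in rows:
    print("\t".join(map(str, row)))
```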

__len__

__len__() -> int

__getitem__

__getitem__(idx: int) -> SequenceRecord

__iter__

__iter__() -> Iterator[SequenceRecord]

__repr__

__repr__() -> str

__str__

__str__() -> str

RetrievedSequence

Represents a retrieved sequence segment with its metadata.

Returned by methods that extract subsequences from specific regions, such as substrings_from_regions().

Attributes:

sequence (str) –

The extracted sequence string.
chrom_name (str) –

Chromosome/sequence name (e.g., "chr1").
start (int) –

Start position (0-based, inclusive).
end (int) –

End position (0-based, exclusive).
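Because `start` is inclusive and `end` is exclusive, a region maps directly onto Python slicing, and its length is simply `end - start`. The small sketch below uses a made-up sequence and region. It also shows conversion to the 1-based, closed `chrom:start-end` notation used by tools such as samtools; that conversion is only an illustration, not something RetrievedSequence does itself.

```python
def region_to_one_based(chrom: str, start: int, end: int) -> str:
    """Convert a 0-based half-open region to 1-based closed 'chrom:start-end' text."""
    return f"{chrom}:{start + 1}-{end}"


chrom_seq = "ACGTACGTAC"  # made-up sequence for 'chr1'
start, end = 2, 6
sub = chrom_seq[start:end]  # bases at positions 2, 3, 4, 5
print(sub, len(sub) == end - start)  # GTAC True
print(region_to_one_based("chr1", start, end))  # chr1:3-6
```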

Attributes

sequence `instance-attribute`

sequence: str

chrom_name `instance-attribute`

chrom_name: str

start `instance-attribute`

start: int

end `instance-attribute`

end: int

Functions

__init__

__init__(sequence: str, chrom_name: str, start: int, end: int) -> None

__repr__

__repr__() -> str

__str__

__str__() -> str

StorageMode

Bases: Enum

Defines how sequence data is stored in the Refget store.

Variants

Raw: Store sequences as raw bytes (1 byte per base).

Encoded: Store sequences with 2-bit encoding (4 bases per byte).
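The space saving of Encoded mode comes from packing four bases into each byte. Below is a minimal sketch of 2-bit packing for a sequence containing only A, C, G, and T. The code assignment (A=0, C=1, G=2, T=3), the first-base-in-the-high-bits order, and the rejection of other characters are assumptions for illustration and need not match the gtars format. The original length must be stored separately because the last byte may be only partially filled.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed code assignment
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, 4 bases per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODES[base] << shift  # raises KeyError for non-ACGT characters
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is needed because the last byte may be padded."""
    return "".join(BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length))


seq = "ACGTACGTAC"
packed = pack_2bit(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
print(unpack_2bit(packed, len(seq)) == seq)  # True
```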

Attributes

Raw `instance-attribute`

Raw: int

Encoded `instance-attribute`

Encoded: int

FhrMetadata

FAIR Headers Reference genome metadata for a sequence collection.

Fields match the FHR 1.0 specification. All fields are optional. Note: schema_version is a number (int or float) per spec, passed as a Python numeric type and stored as serde_json::Number internally.

Attributes

genome `instance-attribute`

genome: Optional[str]

version `instance-attribute`

version: Optional[str]

masking `instance-attribute`

masking: Optional[str]

genome_synonym `instance-attribute`

genome_synonym: Optional[list[str]]

voucher_specimen `instance-attribute`

voucher_specimen: Optional[str]

documentation `instance-attribute`

documentation: Optional[str]

identifier `instance-attribute`

identifier: Optional[list[str]]

scholarly_article `instance-attribute`

scholarly_article: Optional[str]

funding `instance-attribute`

funding: Optional[str]

Functions

__init__

__init__(**kwargs: Any) -> None

from_json `staticmethod`

from_json(path: str) -> FhrMetadata

to_dict

to_dict() -> dict[str, Any]

to_json

to_json(path: str) -> None

__repr__

__repr__() -> str

RefgetStore

A global store for GA4GH refget sequences with lazy-loading support.

RefgetStore provides content-addressable storage for reference genome sequences following the GA4GH refget specification. Supports both local and remote stores with on-demand sequence loading.

Attributes:

cache_path (Optional[str]) –

Local directory path where the store is located or cached. None for in-memory stores.
remote_url (Optional[str]) –

Remote URL of the store if loaded remotely, None otherwise.
quiet (bool) –

Whether the store suppresses progress output.
storage_mode (StorageMode) –

Current storage mode (Raw or Encoded).

Note

Boolean evaluation: RefgetStore follows Python container semantics, so bool(store) is False for an empty store (just as for an empty list or dict). To check whether a store variable has been initialized (is not None), use `if store is not None:` rather than `if store:`.

Example::

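from gtars.refget import RefgetStore


def process(s):
    # Placeholder for your own code
    print("processing store")


store = RefgetStore.in_memory()  # Empty store
print(bool(store))  # False (empty container)
print(len(store))   # 0

# Wrong: checks emptiness, not initialization (skipped for an empty store)
if store:
    process(store)

# Right: checks if variable is set
if store is not None:
    process(store)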

Examples:

Create a new store and import sequences::

from gtars.refget import RefgetStore
store = RefgetStore.in_memory()
store.add_sequence_collection_from_fasta("genome.fa")

Open an existing local store::

store = RefgetStore.open_local("/data/hg38")
seq = store.get_substring("chr1_digest", 0, 1000)

Open a remote store with caching::

store = RefgetStore.open_remote(
    "/local/cache",
    "https://example.com/hg38"
)

Attributes

cache_path `instance-attribute`

cache_path: Optional[str]

remote_url `instance-attribute`

remote_url: Optional[str]

quiet `property`

quiet: bool

Whether the store is in quiet mode.

storage_mode `property`

storage_mode: StorageMode

Current storage mode (Raw or Encoded).

is_persisting `property`

is_persisting: bool

Whether the store is currently persisting to disk.

Example::

store = RefgetStore.in_memory()
print(store.is_persisting)  # False
store.enable_persistence("/data/store")
print(store.is_persisting)  # True

Functions

in_memory `classmethod`

in_memory() -> RefgetStore

Create a new in-memory RefgetStore.

Creates a store that keeps all sequences in memory. Use this for temporary processing or when you don't need disk persistence.

Returns:

RefgetStore –

New empty RefgetStore with Encoded storage mode.

Example::

store = RefgetStore.in_memory()
store.add_sequence_collection_from_fasta("genome.fa")

store_exists `classmethod`

store_exists(path: Union[str, PathLike]) -> bool

Check whether a valid RefgetStore exists at the given path.

Returns True if the path contains a store manifest file, indicating the store has been initialized. Returns False if the path does not exist or does not contain a store.

This avoids hardcoding knowledge of the store's internal file format in calling code.

Parameters:

path (Union[str, PathLike]) –

Path to the store directory.

Returns:

bool –

True if a store exists at the path, False otherwise.

Example::

from gtars.refget import RefgetStore
RefgetStore.store_exists("/data/hg38_store")  # True
RefgetStore.store_exists("/tmp/empty")  # False

on_disk `classmethod`

on_disk(cache_path: Union[str, PathLike]) -> RefgetStore

Create or load a disk-backed RefgetStore.

If the directory contains an existing store (rgstore.json), loads it. Otherwise creates a new store with Encoded mode.

Parameters:

cache_path (Union[str, PathLike]) –

Directory path for the store. Created if it doesn't exist.

Returns:

RefgetStore –

RefgetStore (new or loaded from disk).

Example::

store = RefgetStore.on_disk("/data/my_store")
store.add_sequence_collection_from_fasta("genome.fa")
# Store is automatically persisted to disk

open_local `classmethod`

open_local(path: Union[str, PathLike]) -> RefgetStore

Open a local RefgetStore from a directory.

Loads only lightweight metadata and stubs. Collections and sequences remain as stubs until explicitly accessed with get_collection()/get_sequence().

Expects: rgstore.json, sequences.rgsi, collections.rgci, collections/*.rgsi

Parameters:

path (Union[str, PathLike]) –

Local directory containing the refget store.

Returns:

RefgetStore –

RefgetStore with metadata loaded, sequences lazy-loaded.

Raises:

IOError –

If the store directory or index files cannot be read.

Example::

store = RefgetStore.open_local("/data/hg38_store")
seq = store.get_substring("chr1_digest", 0, 1000)

open_remote `classmethod`

open_remote(cache_path: Union[str, PathLike], remote_url: str) -> RefgetStore

Open a remote RefgetStore with local caching.

Loads only lightweight metadata and stubs from the remote URL. Data is fetched on-demand when get_collection()/get_sequence() is called.

By default, persistence is enabled (sequences are cached to disk). Call disable_persistence() after loading to keep only in memory.

Parameters:

cache_path (Union[str, PathLike]) –

Local directory to cache downloaded metadata and sequences. Created if it doesn't exist.
remote_url (str) –

Base URL of the remote refget store (e.g., "https://example.com/hg38" or "s3://bucket/hg38").

Returns:

RefgetStore –

RefgetStore with metadata loaded, sequences fetched on-demand.

Raises:

IOError –

If remote metadata cannot be fetched or cache cannot be written.

Example::

store = RefgetStore.open_remote(
    "/data/cache/hg38",
    "https://refget-server.com/hg38"
)
# First access fetches from remote and caches
seq = store.get_substring("chr1_digest", 0, 1000)
# Second access uses cache
seq2 = store.get_substring("chr1_digest", 1000, 2000)

set_encoding_mode

set_encoding_mode(mode: StorageMode) -> None

Change the storage mode, re-encoding/decoding existing sequences as needed.

When switching from Raw to Encoded, all Full sequences in memory are encoded (2-bit packed). When switching from Encoded to Raw, all Full sequences in memory are decoded back to raw bytes.

Parameters:

mode (StorageMode) –

The storage mode to switch to (StorageMode.Raw or StorageMode.Encoded).

Example::

from gtars.refget import RefgetStore, StorageMode
store = RefgetStore.in_memory()
store.set_encoding_mode(StorageMode.Raw)

enable_encoding

enable_encoding() -> None

Enable 2-bit encoding for space efficiency.

Re-encodes any existing Raw sequences in memory.

Example::

store = RefgetStore.in_memory()
store.disable_encoding()  # Switch to Raw
store.enable_encoding()   # Back to Encoded

disable_encoding

disable_encoding() -> None

Disable 2-bit encoding and store sequences as raw bytes.

Decodes any existing Encoded sequences in memory.

Example::

store = RefgetStore.in_memory()
store.disable_encoding()  # Switch to Raw mode

set_quiet

set_quiet(quiet: bool) -> None

Set whether to suppress progress output.

When quiet is True, operations like add_sequence_collection_from_fasta will not print progress messages.

Parameters:

quiet (bool) –

Whether to suppress progress output.

Example::

store = RefgetStore.in_memory()
store.set_quiet(True)
store.add_sequence_collection_from_fasta("genome.fa")  # No output

enable_persistence

enable_persistence(path: Union[str, PathLike]) -> None

Enable disk persistence for this store.

Sets up the store to write sequences to disk. Any in-memory Full sequences are flushed to disk and converted to Stubs.

Parameters:

path (Union[str, PathLike]) –

Directory for storing sequences and metadata.

Raises:

IOError –

If the directory cannot be created or written to.

Example::

store = RefgetStore.in_memory()
store.add_sequence_collection_from_fasta("genome.fa")
store.enable_persistence("/data/store")  # Flush to disk

disable_persistence

disable_persistence() -> None

Disable disk persistence for this store.

New sequences will be kept in memory only. Existing Stub sequences can still be loaded from disk if local_path is set.

Example::

store = RefgetStore.open_remote("/cache", "https://example.com")
store.disable_persistence()  # Stop caching new sequences

add_sequence_collection_from_fasta

add_sequence_collection_from_fasta(file_path: Union[str, PathLike], force: bool = False, namespaces: Optional[List[str]] = None) -> tuple[SequenceCollectionMetadata, bool]

Add a sequence collection from a FASTA file.

Reads a FASTA file, digests the sequences, creates a SequenceCollection, and adds it to the store along with all its sequences.

Parameters:

file_path (Union[str, PathLike]) –

Path to the FASTA file to import.
force (bool, default: False ) –

If True, overwrite existing collections/sequences. If False (default), skip duplicates.
namespaces (Optional[List[str]], default: None ) –

Optional list of namespace prefixes to extract aliases from FASTA headers. For example, ["ncbi", "refseq"] will scan headers for tokens like ncbi:NC_000001.11 and register them as aliases.

Returns:

tuple[SequenceCollectionMetadata, bool] –

A tuple containing:

- SequenceCollectionMetadata: Metadata for the collection.
- bool: True if the collection was newly added, False if it already existed.

Raises:

IOError –

If the file cannot be read or processed.

Example::

store = RefgetStore.in_memory()
metadata, was_new = store.add_sequence_collection_from_fasta("genome.fa")
print(f"{'Added' if was_new else 'Skipped'}: {metadata.digest}")

# Extract aliases from FASTA headers
metadata, was_new = store.add_sequence_collection_from_fasta(
    "genome.fa", namespaces=["ncbi", "refseq"]
)
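How the `namespaces` option finds aliases is only outlined above. The sketch below shows one plausible reading. It splits the header into the sequence name (first word) and the description (text after the first whitespace), then collects whitespace-separated tokens of the form `namespace:value` from the description whose namespace was requested. The tokenization rule and the `parse_header` helper are assumptions for illustration, not the gtars parser.

```python
def parse_header(header: str, namespaces):
    """Split a FASTA header into (name, description, aliases). Hypothetical helper."""
    text = header[1:] if header.startswith(">") else header
    parts = text.split(maxsplit=1)
    name = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else None
    aliases = {}
    # Assumption: only description tokens are scanned for 'namespace:value' aliases
    for token in (description or "").split():
        ns, sep, value = token.partition(":")
        if sep and value and ns in namespaces:
            aliases.setdefault(ns, []).append(value)
    return name, description, aliases


name, desc, aliases = parse_header(
    ">chr1 ncbi:NC_000001.11 refseq:NC_000001.11 Homo sapiens chromosome 1",
    namespaces=["ncbi", "refseq"],
)
print(name)     # chr1
print(aliases)  # {'ncbi': ['NC_000001.11'], 'refseq': ['NC_000001.11']}
```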

add_sequence_collection

add_sequence_collection(collection: SequenceCollection, force: bool = False) -> None

Add a pre-built SequenceCollection to the store.

Adds a SequenceCollection (created via digest_fasta() or programmatically) directly to the store without reading from a FASTA file.

Parameters:

collection (SequenceCollection) –

A SequenceCollection to add.
force (bool, default: False ) –

If True, overwrite existing collections/sequences. If False (default), skip duplicates.

Raises:

IOError –

If the collection cannot be stored.

Example::

from gtars.refget import RefgetStore, digest_fasta
store = RefgetStore.in_memory()
collection = digest_fasta("genome.fa")
store.add_sequence_collection(collection)

add_sequence

add_sequence(sequence: SequenceRecord, force: bool = False) -> None

Add a sequence to the store without collection association.

The sequence can be created using digest_sequence() and later retrieved by its digest via get_sequence().

Parameters:

sequence (SequenceRecord) –

A SequenceRecord created by digest_sequence().
force (bool, default: False ) –

If True, overwrite existing. If False (default), skip duplicates.

Raises:

IOError –

If the sequence cannot be stored.

Example::

from gtars.refget import RefgetStore, digest_sequence
store = RefgetStore.in_memory()
seq = digest_sequence(b"ACGTACGT")
store.add_sequence(seq)
retrieved = store.get_sequence(seq.metadata.sha512t24u)

list_collections

list_collections(page: int = 0, page_size: int = 100, filters: Optional[Dict[str, str]] = None) -> Dict[str, Any]

List collections with pagination and optional attribute filtering.

Parameters:

page (int, default: 0 ) –

0-indexed page number.
page_size (int, default: 100 ) –

Number of results per page.
filters (Optional[Dict[str, str]], default: None ) –

Optional attribute filters (AND logic). Keys are attribute names (names, lengths, sequences, name_length_pairs, sorted_name_length_pairs, sorted_sequences), values are digests.

Returns:

Dict[str, Any] –

Dict with "results" (a list of SequenceCollectionMetadata) and "pagination" (a dict with page, page_size, and total).

Example::

# Get first page of all collections
result = store.list_collections()
for meta in result["results"]:
    print(f"{meta.digest}: {meta.n_sequences} sequences")
print(f"Total: {result['pagination']['total']}")

# Filter by names digest
result = store.list_collections(filters={"names": "abc123"})

remove_collection

remove_collection(digest: str, remove_orphan_sequences: bool = False) -> bool

Remove a collection from the store.

Parameters:

digest (str) –

The collection's SHA-512/24u digest string.
remove_orphan_sequences (bool, default: False ) –

If True, also remove sequences no longer referenced by any remaining collection. Default: False.

Returns:

bool –

True if the collection was found and removed, False if not found.
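With `remove_orphan_sequences=True`, only sequences that no remaining collection references are deleted. That bookkeeping can be sketched as a set difference over a hypothetical mapping from collection digests to their sequence digests. This models the documented behavior and does not reflect how gtars stores the mapping internally.

```python
def orphaned_after_removal(collections: dict, removed: str) -> set:
    """Sequence digests referenced by `removed` but by no other collection."""
    remaining = set()
    for digest, seq_digests in collections.items():
        if digest != removed:
            remaining.update(seq_digests)
    return set(collections.get(removed, ())) - remaining


# Hypothetical collection and sequence digests
collections = {
    "collA": ["seq1", "seq2", "seq3"],
    "collB": ["seq2", "seq4"],
}
print(sorted(orphaned_after_removal(collections, "collA")))  # ['seq1', 'seq3']
```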

get_collection_metadata

get_collection_metadata(collection_digest: str) -> Optional[SequenceCollectionMetadata]

Get metadata for a collection by digest.

Returns lightweight metadata without loading the full collection. Use this for quick lookups of collection information.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.

Returns:

Optional[SequenceCollectionMetadata] –

Collection metadata if found, None otherwise.

Example::

meta = store.get_collection_metadata("uC_UorBNf3YUu1YIDainBhI94CedlNeH")
if meta:
    print(f"Collection has {meta.n_sequences} sequences")

get_collection

get_collection(collection_digest: str) -> SequenceCollection

Get a collection by digest with all sequences loaded.

Loads the collection and all its sequence data into memory. Use this when you need full access to sequence content.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.

Returns:

SequenceCollection –

The collection with all sequence data loaded.

Raises:

IOError –

If the collection cannot be loaded.

Example::

collection = store.get_collection("uC_UorBNf3YUu1YIDainBhI94CedlNeH")
for seq in collection.sequences:
    print(f"{seq.metadata.name}: {seq.decode()[:20]}...")

iter_collections

iter_collections() -> List[SequenceCollection]

Iterate over all collections with their sequences loaded.

This loads all collection data upfront and returns a list of SequenceCollection objects with full sequence data.

For browsing without loading data, use list_collections() instead.

Returns:

List[SequenceCollection] –

List of all collections with loaded sequences.

Example::

for coll in store.iter_collections():
    print(f"{coll.digest}: {len(coll.sequences)} sequences")

is_collection_loaded

is_collection_loaded(collection_digest: str) -> bool

Check if a collection is fully loaded.

Returns True if the collection's sequence list is loaded in memory, False if it's only metadata (stub).

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.

Returns:

bool –

True if loaded, False otherwise.

list_sequences

list_sequences() -> List[SequenceMetadata]

List all sequence metadata in the store.

Returns metadata for all sequences without loading sequence data. Use this for browsing/inventory operations.

Returns:

List[SequenceMetadata] –

List of metadata for all sequences in the store.

Example::

for meta in store.list_sequences():
    print(f"{meta.name}: {meta.length} bp")

get_sequence_metadata

get_sequence_metadata(seq_digest: str) -> Optional[SequenceMetadata]

Get metadata for a sequence by digest (no data loaded).

Use this for lightweight lookups when you don't need the actual sequence. Automatically strips "SQ." prefix from digest if present.

Parameters:

seq_digest (str) –

The sequence's SHA-512/24u digest, optionally with "SQ." prefix.

Returns:

Optional[SequenceMetadata] –

Sequence metadata if found, None otherwise.

get_sequence

get_sequence(digest: str) -> SequenceRecord

Retrieve a sequence record by its digest (SHA-512/24u or MD5).

Loads the sequence data if not already in memory. Supports lookup by either SHA-512/24u (preferred) or MD5 digest. Automatically strips "SQ." prefix if present (case-insensitive).
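Prefix handling can be sketched as below. `normalize_digest` is a hypothetical helper written to match the description above (a leading `SQ.` is removed regardless of case); it is not the function gtars uses. Because `.` is not a base64url or hexadecimal character, a bare digest can never start with `SQ.`, so the stripping is unambiguous.

```python
def normalize_digest(digest: str) -> str:
    """Strip a leading 'SQ.' prefix, case-insensitively (hypothetical helper)."""
    if digest[:3].upper() == "SQ.":
        return digest[3:]
    return digest


for d in (
    "SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2",
    "sq.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2",
    "aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2",
):
    print(normalize_digest(d))  # aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2 each time
```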

Parameters:

digest (str) –

Sequence digest (SHA-512/24u base64url or MD5 hex string), optionally with "SQ." prefix.

Returns:

SequenceRecord –

The sequence record with data.

Raises:

KeyError –

If the sequence is not found.

Example::

record = store.get_sequence("aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2")
print(f"Found: {record.metadata.name}")
# Also works with SQ. prefix
record = store.get_sequence("SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2")

get_sequence_by_name

get_sequence_by_name(collection_digest: str, sequence_name: str) -> SequenceRecord

Retrieve a sequence by collection digest and sequence name.

Looks up a sequence within a specific collection using its name (e.g., "chr1", "chrM"). Loads the sequence data if needed. Automatically strips "SQ." prefix from collection digest if present.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest, optionally with "SQ." prefix.
sequence_name (str) –

Name of the sequence within that collection.

Returns:

SequenceRecord –

The sequence record with data.

Raises:

KeyError –

If the sequence is not found.

Example::

record = store.get_sequence_by_name(
    "uC_UorBNf3YUu1YIDainBhI94CedlNeH",
    "chr1"
)
print(f"Sequence: {record.decode()[:50]}...")

iter_sequences

iter_sequences() -> List[SequenceRecord]

Iterate over all sequences with their data loaded.

This ensures all sequence data is loaded and returns a list of SequenceRecord objects with full sequence data.

For browsing without loading data, use list_sequences() instead.

Returns:

List[SequenceRecord] –

List of all sequences with loaded data.

Example::

for seq in store.iter_sequences():
    print(f"{seq.metadata.name}: {seq.decode()[:20]}...")

get_substring

get_substring(seq_digest: str, start: int, end: int) -> str

Extract a substring from a sequence.

Retrieves a specific region from a sequence using 0-based, half-open coordinates [start, end). Automatically loads sequence data if not already cached (for lazy-loaded stores). Automatically strips "SQ." prefix from digest if present.

Parameters:

seq_digest (str) –

Sequence digest (SHA-512/24u), optionally with "SQ." prefix.
start (int) –

Start position (0-based, inclusive).
end (int) –

End position (0-based, exclusive).

Returns:

str –

The substring sequence.

Raises:

KeyError –

If the sequence is not found.

Example::

# Get first 1000 bases of chr1
seq = store.get_substring("chr1_digest", 0, 1000)
print(f"First 50bp: {seq[:50]}")

stats

stats() -> dict

Returns statistics about the store.

Returns:

dict –

dict with keys:

- 'n_sequences': Total number of sequences (Stub + Full).
- 'n_sequences_loaded': Number of sequences with data loaded (Full).
- 'n_collections': Total number of collections (Stub + Full).
- 'n_collections_loaded': Number of collections with sequences loaded (Full).
- 'storage_mode': Storage mode ('Raw' or 'Encoded').

Note

n_collections_loaded only reflects collections fully loaded in memory. For remote stores, collections are loaded on-demand when accessed.

Example::

stats = store.stats()
print(f"Store has {stats['n_sequences']} sequences")
print(f"Collections: {stats['n_collections']} total, {stats['n_collections_loaded']} loaded")

write

write() -> None

Write the store using its configured paths.

Convenience method for disk-backed stores. Uses the store's own local_path and seqdata_path_template.

Raises:

IOError –

If the store cannot be written.

write_store_to_dir

write_store_to_dir(root_path: Union[str, PathLike], seqdata_path_template: Optional[str] = None) -> None

Write the store to a directory on disk.

Persists the store with all sequences and metadata to disk using the RefgetStore directory format.

Parameters:

root_path (Union[str, PathLike]) –

Directory path to write the store to.
seqdata_path_template (Optional[str], default: None ) –

Optional path template for sequence files (e.g., "sequences/%s2/%s.seq" where %s2 = first 2 chars of digest, %s = full digest). Uses default if not specified.

Example::

store.write_store_to_dir("/data/my_store")
store.write_store_to_dir("/data/my_store", "sequences/%s2/%s.seq")
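The `%s2` and `%s` placeholders can be expanded as in the sketch below. It assumes these two placeholders are the only ones and expands `%s2` before `%s` so the shorter token does not match inside the longer one. `expand_seqdata_path` is a hypothetical helper, not part of gtars.refget.

```python
def expand_seqdata_path(template: str, digest: str) -> str:
    """Fill '%s2' with the first two digest characters and '%s' with the full digest."""
    # Replace '%s2' first; otherwise '%s' would match inside it and leave a stray '2'
    return template.replace("%s2", digest[:2]).replace("%s", digest)


print(expand_seqdata_path("sequences/%s2/%s.seq", "aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2"))
# sequences/aK/aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2.seq
```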

get_collection_level1

get_collection_level1(digest: str) -> dict

Get level 1 representation (attribute digests) for a collection.

Parameters:

digest (str) –

Collection digest.

Returns:

dict –

dict with spec-compliant field names (names, lengths, sequences, plus optional name_length_pairs, sorted_name_length_pairs, sorted_sequences).

get_collection_level2

get_collection_level2(digest: str) -> dict

Get level 2 representation (full arrays, spec format) for a collection.

Parameters:

digest (str) –

Collection digest.

Returns:

dict –

dict with names (list[str]), lengths (list[int]), sequences (list[str]).

compare

compare(digest_a: str, digest_b: str) -> dict

Compare two collections by digest.

Parameters:

digest_a (str) –

First collection digest.
digest_b (str) –

Second collection digest.

Returns:

dict –

dict with keys: digests, attributes, array_elements.
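The comparison result describes which attributes, and how many array elements, two collections share. Below is a simplified sketch over two level 2 dicts (as returned by `get_collection_level2()`). The `a_only`/`b_only`/`a_and_b` and `*_count` keys follow my reading of the seqcol comparison format, and shared elements are counted as a multiset intersection. The `digests` key and the spec's `a_and_b_same_order` field are omitted, and the exact structure gtars returns may differ.

```python
from collections import Counter


def compare_level2(a: dict, b: dict) -> dict:
    """Simplified seqcol-style comparison of two level 2 collections (sketch)."""
    keys_a, keys_b = set(a), set(b)
    shared = sorted(keys_a & keys_b)
    return {
        "attributes": {
            "a_only": sorted(keys_a - keys_b),
            "b_only": sorted(keys_b - keys_a),
            "a_and_b": shared,
        },
        "array_elements": {
            "a_count": {k: len(a[k]) for k in sorted(keys_a)},
            "b_count": {k: len(b[k]) for k in sorted(keys_b)},
            # Multiset intersection: a duplicated element matches at most min(count_a, count_b) times
            "a_and_b_count": {k: sum((Counter(a[k]) & Counter(b[k])).values()) for k in shared},
        },
    }


a = {"names": ["chr1", "chr2", "chrM"], "lengths": [8, 4, 2], "sequences": ["SQ.x", "SQ.y", "SQ.z"]}
b = {"names": ["chr1", "chr2"], "lengths": [8, 4], "sequences": ["SQ.x", "SQ.y"]}
result = compare_level2(a, b)
print(result["array_elements"]["a_and_b_count"])  # {'lengths': 2, 'names': 2, 'sequences': 2}
```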

find_collections_by_attribute

find_collections_by_attribute(attr_name: str, attr_digest: str) -> List[str]

Find collections by attribute digest.

Parameters:

attr_name (str) –

Attribute name (names, lengths, sequences, name_length_pairs, sorted_name_length_pairs, sorted_sequences).
attr_digest (str) –

The digest to search for.

Returns:

List[str] –

List of collection digests that have the matching attribute.
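Without an attribute index, this lookup amounts to a scan over every collection's level 1 digests, the brute-force approach named under disable_attribute_index. Below is a minimal sketch over a hypothetical in-memory mapping from collection digest to level 1 dict; the placeholder digests are made up.

```python
def find_by_attribute(level1_by_collection: dict, attr_name: str, attr_digest: str) -> list:
    """Return collection digests whose level 1 `attr_name` digest equals `attr_digest`."""
    return [
        coll_digest
        for coll_digest, lvl1 in level1_by_collection.items()
        if lvl1.get(attr_name) == attr_digest
    ]


index = {
    "collA": {"names": "N1", "lengths": "L1", "sequences": "S1"},
    "collB": {"names": "N1", "lengths": "L2", "sequences": "S2"},
    "collC": {"names": "N2", "lengths": "L1", "sequences": "S3"},
}
print(find_by_attribute(index, "names", "N1"))    # ['collA', 'collB']
print(find_by_attribute(index, "lengths", "L1"))  # ['collA', 'collC']
```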

get_attribute

get_attribute(attr_name: str, attr_digest: str) -> Optional[list]

Get attribute array by digest.

Parameters:

attr_name (str) –

Attribute name (names, lengths, or sequences).
attr_digest (str) –

The digest to search for.

Returns:

Optional[list] –

The attribute array, or None if not found.

enable_ancillary_digests

enable_ancillary_digests() -> None

Enable computation of ancillary digests.

disable_ancillary_digests

disable_ancillary_digests() -> None

Disable computation of ancillary digests.

has_ancillary_digests

has_ancillary_digests() -> bool

Returns whether ancillary digests are enabled.

has_attribute_index

has_attribute_index() -> bool

Returns whether the on-disk attribute index is enabled.

enable_attribute_index

enable_attribute_index() -> None

Enable indexed attribute lookup (not yet implemented).

disable_attribute_index

disable_attribute_index() -> None

Disable indexed attribute lookup, using brute-force scan instead.

export_fasta_from_regions

export_fasta_from_regions(collection_digest: str, bed_file_path: Union[str, PathLike], output_file_path: Union[str, PathLike]) -> None

Export sequences from BED file regions to a FASTA file.

Reads a BED file defining genomic regions and exports the sequences for those regions to a FASTA file.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.
bed_file_path (Union[str, PathLike]) –

Path to BED file defining regions.
output_file_path (Union[str, PathLike]) –

Path to write the output FASTA file.

Raises:

IOError –

If files cannot be read/written or sequences not found.

Example::

store.export_fasta_from_regions(
    "uC_UorBNf3YUu1YIDainBhI94CedlNeH",
    "regions.bed",
    "output.fa"
)

substrings_from_regions

substrings_from_regions(collection_digest: str, bed_file_path: Union[str, PathLike]) -> List[RetrievedSequence]

Get substrings for BED file regions as a list.

Reads a BED file and returns a list containing the sequence of each region.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.
bed_file_path (Union[str, PathLike]) –

Path to BED file defining regions.

Returns:

List[RetrievedSequence] –

List of retrieved sequence segments.

Raises:

IOError –

If files cannot be read or sequences not found.

Example::

sequences = store.substrings_from_regions(
    "uC_UorBNf3YUu1YIDainBhI94CedlNeH",
    "regions.bed"
)
for seq in sequences:
    print(f"{seq.chrom_name}:{seq.start}-{seq.end}")
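BED intervals are 0-based and half-open, so each region maps directly onto a Python slice. The sketch below mimics what substrings_from_regions and export_fasta_from_regions do, using a hypothetical dict of in-memory sequences in place of a store. The `Region` fields mirror the RetrievedSequence attributes used in the example above, and the `chrom:start-end` FASTA header format is an assumption rather than what gtars necessarily writes.

```python
from typing import Dict, Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom_name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    sequence: str


def substrings_from_bed_lines(seqs: Dict[str, str],
                              bed_lines: Iterable[str]) -> List[Region]:
    out = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start_s, end_s = line.split("\t")[:3]
        start, end = int(start_s), int(end_s)
        if chrom not in seqs:
            raise IOError(f"sequence not found: {chrom}")
        # Half-open BED coordinates are exactly Python slice bounds.
        out.append(Region(chrom, start, end, seqs[chrom][start:end]))
    return out


def to_fasta(regions: List[Region]) -> str:
    # Header format chrom:start-end is an assumption for this sketch.
    return "".join(f">{r.chrom_name}:{r.start}-{r.end}\n{r.sequence}\n"
                   for r in regions)
```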

export_fasta

export_fasta(collection_digest: str, output_path: Union[str, PathLike], sequence_names: Optional[List[str]] = None, line_width: Optional[int] = None) -> None

Export sequences from a collection to a FASTA file.

Parameters:

collection_digest (str) –

Collection to export from.
output_path (Union[str, PathLike]) –

Path to write FASTA file.
sequence_names (Optional[List[str]], default: None ) –

Optional list of sequence names to export. If None, exports all sequences in the collection.
line_width (Optional[int], default: None ) –

Optional line width for wrapping sequences. If None, defaults to 80.
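To make the effect of line_width concrete, here is a minimal sketch of FASTA line wrapping. `format_fasta` is a hypothetical helper taking (name, sequence) pairs; it is not the gtars writer, and only the fallback width of 80 comes from the parameter description above.

```python
from typing import Iterable, Optional, Tuple


def format_fasta(records: Iterable[Tuple[str, str]],
                 line_width: Optional[int] = None) -> str:
    # None falls back to 80, matching the documented default.
    width = 80 if line_width is None else line_width
    if width <= 0:
        raise ValueError("line_width must be positive")
    parts = []
    for name, seq in records:
        parts.append(f">{name}\n")
        # Emit the sequence in chunks of at most `width` bases per line.
        for i in range(0, len(seq), width):
            parts.append(seq[i:i + width] + "\n")
    return "".join(parts)
```

With `line_width=4`, a 10-base sequence is written as lines of 4, 4, and 2 bases.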

export_fasta_by_digests

export_fasta_by_digests(seq_digests: List[str], output_path: Union[str, PathLike], line_width: Optional[int] = None) -> None

Export sequences by their digests to a FASTA file.

Parameters:

seq_digests (List[str]) –

List of sequence digests to export.
output_path (Union[str, PathLike]) –

Path to write FASTA file.
line_width (Optional[int], default: None ) –

Optional line width for wrapping sequences. If None, defaults to 80.

add_sequence_alias

add_sequence_alias(namespace: str, alias: str, digest: str) -> None

Add a sequence alias: namespace/alias maps to sequence digest.

get_sequence_metadata_by_alias

get_sequence_metadata_by_alias(namespace: str, alias: str) -> Optional[SequenceMetadata]

Resolve a sequence alias to sequence metadata (no data loading).

get_sequence_by_alias

get_sequence_by_alias(namespace: str, alias: str) -> Optional[SequenceRecord]

Resolve a sequence alias and return the loaded sequence record.

Returns None if the alias is not found.

get_aliases_for_sequence

get_aliases_for_sequence(digest: str) -> list[tuple[str, str]]

Reverse lookup: find all (namespace, alias) pairs pointing to this sequence digest.

list_sequence_alias_namespaces

list_sequence_alias_namespaces() -> list[str]

List all sequence alias namespaces.

list_sequence_aliases

list_sequence_aliases(namespace: str) -> Optional[list[str]]

List all aliases in a sequence alias namespace.

remove_sequence_alias

remove_sequence_alias(namespace: str, alias: str) -> bool

Remove a single sequence alias. Returns True if it existed.

load_sequence_aliases

load_sequence_aliases(namespace: str, path: str) -> int

Load sequence aliases from a TSV file (one tab-separated alias and digest per line).
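The expected file layout can be illustrated with a small parser. This is a hypothetical sketch, not the gtars loader. Skipping blank and `#` lines is an assumption, and so is reading the `int` return value as the number of aliases loaded. Conceptually, each parsed pair corresponds to one add_sequence_alias(namespace, alias, digest) call; the same layout applies to load_collection_aliases. The digests in the demo are placeholders.

```python
import io
from typing import Dict, TextIO


def parse_alias_tsv(handle: TextIO) -> Dict[str, str]:
    """Parse 'alias<TAB>digest' lines into a dict (hypothetical helper)."""
    aliases: Dict[str, str] = {}
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'alias<TAB>digest', got {line!r}")
        alias, digest = fields
        aliases[alias] = digest
    return aliases


example = io.StringIO("chr1\tdigestA\nchrM\tdigestB\n")
print(parse_alias_tsv(example))  # {'chr1': 'digestA', 'chrM': 'digestB'}
```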

add_collection_alias

add_collection_alias(namespace: str, alias: str, digest: str) -> None

Add a collection alias: namespace/alias maps to collection digest.

get_collection_metadata_by_alias

get_collection_metadata_by_alias(namespace: str, alias: str) -> Optional[SequenceCollectionMetadata]

Resolve a collection alias to collection metadata (no data loading).

get_collection_by_alias

get_collection_by_alias(namespace: str, alias: str) -> Optional[SequenceCollection]

Resolve a collection alias and return the loaded collection.

Returns None if the alias is not found.

get_aliases_for_collection

get_aliases_for_collection(digest: str) -> list[tuple[str, str]]

Reverse lookup: find all (namespace, alias) pairs pointing to this collection digest.

list_collection_alias_namespaces

list_collection_alias_namespaces() -> list[str]

List all collection alias namespaces.

list_collection_aliases

list_collection_aliases(namespace: str) -> Optional[list[str]]

List all aliases in a collection alias namespace.

remove_collection_alias

remove_collection_alias(namespace: str, alias: str) -> bool

Remove a single collection alias. Returns True if it existed.

load_collection_aliases

load_collection_aliases(namespace: str, path: str) -> int

Load collection aliases from a TSV file (one tab-separated alias and digest per line).

set_fhr_metadata

set_fhr_metadata(collection_digest: str, metadata: FhrMetadata) -> None

Set FHR metadata for a collection.

get_fhr_metadata

get_fhr_metadata(collection_digest: str) -> Optional[FhrMetadata]

Get FHR metadata for a collection. Returns None if missing.

remove_fhr_metadata

remove_fhr_metadata(collection_digest: str) -> bool

Remove FHR metadata for a collection.

list_fhr_metadata

list_fhr_metadata() -> list[str]

List all collection digests that have FHR metadata.

load_fhr_metadata

load_fhr_metadata(collection_digest: str, path: str) -> None

Load FHR metadata from a JSON file and attach it to a collection.

into_readonly

into_readonly() -> ReadonlyRefgetStore

Convert to a ReadonlyRefgetStore for concurrent read access.

Consumes this store (replacing it with an empty in-memory store) and returns a ReadonlyRefgetStore whose read methods all use &self (no mutable borrow), making it suitable for Arc<ReadonlyRefgetStore> in servers.

Call load_all_collections() or load_collection() before converting, since ReadonlyRefgetStore cannot lazy-load.

Returns:

ReadonlyRefgetStore –

An immutable store suitable for concurrent access.

Example::

from gtars.refget import RefgetStore

store = RefgetStore.open_remote("/cache", "https://example.com")
store.load_all_collections()
readonly = store.into_readonly()
coll = readonly.get_collection("abc123")

len

__len__() -> int

iter

__iter__() -> Iterator[SequenceMetadata]

str

__str__() -> str

repr

__repr__() -> str

ReadonlyRefgetStore

An immutable RefgetStore for concurrent read access.

All read methods use immutable references, making this suitable for concurrent access patterns (e.g., shared across threads in a server).

This type has NO write methods and NO constructors; it can only be obtained via RefgetStore.into_readonly().

Read methods that require preloaded data (e.g., get_collection()) will error if the data was not loaded before conversion.

Attributes:

cache_path (Optional[str]) –

Local directory path where the store is located or cached. None for in-memory stores.
remote_url (Optional[str]) –

Remote URL of the store if loaded remotely, None otherwise.
storage_mode (StorageMode) –

Current storage mode (Raw or Encoded).

Example::

from gtars.refget import RefgetStore

store = RefgetStore.open_remote("/cache", "https://example.com")
store.load_all_collections()
readonly = store.into_readonly()
coll = readonly.get_collection("abc123")

Attributes

cache_path `instance-attribute`

cache_path: Optional[str]

remote_url `instance-attribute`

remote_url: Optional[str]

storage_mode `property`

storage_mode: StorageMode

Current storage mode (Raw or Encoded).

Functions

list_collections

list_collections(page: int = 0, page_size: int = 100, filters: Optional[Dict[str, str]] = None) -> Dict[str, Any]

List collections with pagination and optional attribute filtering.

get_collection_metadata

get_collection_metadata(collection_digest: str) -> Optional[SequenceCollectionMetadata]

Get metadata for a collection by digest.

get_collection

get_collection(collection_digest: str) -> SequenceCollection

Get a collection by digest with all sequences loaded.

Requires that the collection was preloaded before conversion.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.

Returns:

SequenceCollection –

The collection with all sequence data loaded.

Raises:

IOError –

If the collection was not preloaded.

is_collection_loaded

is_collection_loaded(collection_digest: str) -> bool

Check if a collection is fully loaded.

get_collection_level1

get_collection_level1(digest: str) -> dict

Get level 1 representation (attribute digests) for a collection.

get_collection_level2

get_collection_level2(digest: str) -> dict

Get level 2 representation (full arrays) for a collection.

compare

compare(digest_a: str, digest_b: str) -> dict

Compare two collections by digest.

find_collections_by_attribute

find_collections_by_attribute(attr_name: str, attr_digest: str) -> List[str]

Find collections by attribute digest.

get_attribute

get_attribute(attr_name: str, attr_digest: str) -> Optional[list]

Get attribute array by digest.

has_ancillary_digests

has_ancillary_digests() -> bool

Returns whether ancillary digests are enabled.

has_attribute_index

has_attribute_index() -> bool

Returns whether the on-disk attribute index is enabled.

list_sequences

list_sequences() -> List[SequenceMetadata]

List all sequence metadata in the store.

get_sequence_metadata

get_sequence_metadata(seq_digest: str) -> Optional[SequenceMetadata]

Get metadata for a sequence by digest.

get_sequence

get_sequence(digest: str) -> SequenceRecord

Retrieve a sequence record by its digest.

Parameters:

digest (str) –

Sequence digest (SHA-512/24u or MD5).

Returns:

SequenceRecord –

The sequence record with data.

Raises:

KeyError –

If the sequence is not found.

get_sequence_by_name

get_sequence_by_name(collection_digest: str, sequence_name: str) -> SequenceRecord

Retrieve a sequence by collection digest and sequence name.

Parameters:

collection_digest (str) –

The collection's SHA-512/24u digest.
sequence_name (str) –

Name of the sequence within that collection.

Returns:

SequenceRecord –

The sequence record with data.

Raises:

KeyError –

If the sequence is not found.

get_substring

get_substring(seq_digest: str, start: int, end: int) -> str

Extract a substring from a sequence.

Parameters:

seq_digest (str) –

Sequence digest (SHA-512/24u).
start (int) –

Start position (0-based, inclusive).
end (int) –

End position (0-based, exclusive).

Returns:

str –

The substring sequence.

Raises:

KeyError –

If the sequence is not found.
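get_substring takes 0-based, half-open coordinates, which behave exactly like Python slicing. Region strings written for samtools or genome browsers are usually 1-based and inclusive, so they need converting first. The hypothetical helper below performs that conversion. The `chrom:start-end` format with optional thousands separators is an assumption about your input, not something gtars parses.

```python
import re
from typing import Tuple


def region_to_half_open(region: str) -> Tuple[str, int, int]:
    """Convert a 1-based, inclusive 'chrom:start-end' string into the
    0-based, half-open (start, end) pair that get_substring expects."""
    m = re.fullmatch(r"(.+):([\d,]+)-([\d,]+)", region)
    if m is None:
        raise ValueError(f"malformed region: {region!r}")
    chrom = m.group(1)
    start1 = int(m.group(2).replace(",", ""))
    end1 = int(m.group(3).replace(",", ""))
    if start1 < 1 or end1 < start1:
        raise ValueError(f"invalid interval: {region!r}")
    # Shift the start down by one; the inclusive 1-based end equals the exclusive 0-based end.
    return chrom, start1 - 1, end1
```

The resulting start and end can then be passed as get_substring(seq_digest, start, end).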

stats

stats() -> dict

Returns statistics about the store.

get_sequence_metadata_by_alias

get_sequence_metadata_by_alias(namespace: str, alias: str) -> Optional[SequenceMetadata]

Resolve a sequence alias to sequence metadata.

get_sequence_by_alias

get_sequence_by_alias(namespace: str, alias: str) -> Optional[SequenceRecord]

Resolve a sequence alias and return the loaded sequence record.

get_aliases_for_sequence

get_aliases_for_sequence(digest: str) -> list[tuple[str, str]]

Reverse lookup: find all (namespace, alias) pairs for this sequence.

list_sequence_alias_namespaces

list_sequence_alias_namespaces() -> list[str]

List all sequence alias namespaces.

list_sequence_aliases

list_sequence_aliases(namespace: str) -> Optional[list[str]]

List all aliases in a sequence alias namespace.

get_collection_metadata_by_alias

get_collection_metadata_by_alias(namespace: str, alias: str) -> Optional[SequenceCollectionMetadata]

Resolve a collection alias to collection metadata.

get_collection_by_alias

get_collection_by_alias(namespace: str, alias: str) -> Optional[SequenceCollection]

Resolve a collection alias and return the loaded collection.

get_aliases_for_collection

get_aliases_for_collection(digest: str) -> list[tuple[str, str]]

Reverse lookup: find all (namespace, alias) pairs for this collection.

list_collection_alias_namespaces

list_collection_alias_namespaces() -> list[str]

List all collection alias namespaces.

list_collection_aliases

list_collection_aliases(namespace: str) -> Optional[list[str]]

List all aliases in a collection alias namespace.

get_fhr_metadata

get_fhr_metadata(collection_digest: str) -> Optional[FhrMetadata]

Get FHR metadata for a collection.

list_fhr_metadata

list_fhr_metadata() -> list[str]

List all collection digests that have FHR metadata.

len

__len__() -> int

str

__str__() -> str

repr

__repr__() -> str

Functions

sha512t24u_digest

sha512t24u_digest(readable: Union[str, bytes]) -> str

Compute the GA4GH SHA-512/24u digest for a sequence.

This function computes the GA4GH refget standard digest (truncated SHA-512, base64url encoded) for a given sequence string or bytes.

Parameters:

readable (Union[str, bytes]) –

Input sequence as str or bytes.

Returns:

str –

The SHA-512/24u digest (32-character base64url string).

Raises:

TypeError –

If input is not str or bytes.

Example::

from gtars.refget import sha512t24u_digest
digest = sha512t24u_digest("ACGT")
print(digest)  # Output: aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
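SHA-512/24u is defined by the GA4GH refget specification rather than by gtars, so the digest can be reproduced with Python's standard library, which is handy for cross-checking results. The sketch below is not the Rust implementation, and encoding `str` input as ASCII is an assumption.

```python
import base64
import hashlib
from typing import Union


def sha512t24u(readable: Union[str, bytes]) -> str:
    """SHA-512, truncated to the first 24 bytes, then base64url-encoded."""
    if isinstance(readable, str):
        readable = readable.encode("ascii")  # assumption: str input is ASCII sequence text
    elif not isinstance(readable, bytes):
        raise TypeError("input must be str or bytes")
    digest = hashlib.sha512(readable).digest()[:24]
    # 24 bytes is a multiple of 3, so the output is exactly 32 characters with no '=' padding.
    return base64.urlsafe_b64encode(digest).decode("ascii")


print(sha512t24u("ACGT"))
```

For `"ACGT"` this reproduces the value shown in the example above.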

md5_digest

md5_digest(readable: Union[str, bytes]) -> str

Compute the MD5 digest for a sequence.

This function computes the MD5 hash for a given sequence string or bytes. MD5 is supported for backward compatibility with legacy systems.

Parameters:

readable (Union[str, bytes]) –

Input sequence as str or bytes.

Returns:

str –

The MD5 digest (32-character hexadecimal string).

Raises:

TypeError –

If input is not str or bytes.

Example::

from gtars.refget import md5_digest
digest = md5_digest("ACGT")
print(digest)  # Output: f1f8f4bf413b16ad135722aa4591043e
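The MD5 variant is a plain hexadecimal MD5 of the sequence bytes, so it can be cross-checked with `hashlib`. This is a sketch rather than the gtars code. Encoding `str` input as ASCII is an assumption, and this helper does not change case, so lowercase input gives a different digest.

```python
import hashlib
from typing import Union


def md5_hex(readable: Union[str, bytes]) -> str:
    if isinstance(readable, str):
        readable = readable.encode("ascii")  # assumption: str input is ASCII sequence text
    elif not isinstance(readable, bytes):
        raise TypeError("input must be str or bytes")
    # hexdigest() yields 32 lowercase hexadecimal characters.
    return hashlib.md5(readable).hexdigest()
```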

digest_fasta

digest_fasta(fasta: Union[str, PathLike]) -> SequenceCollection

Digest all sequences in a FASTA file and compute collection-level digests.

This function reads a FASTA file and computes GA4GH-compliant digests for each sequence, as well as collection-level digests (Level 1 and Level 2) following the GA4GH refget specification.

Parameters:

fasta (Union[str, PathLike]) –

Path to FASTA file (str or PathLike).

Returns:

SequenceCollection –

Collection containing all sequences with their metadata and computed digests.

Raises:

IOError –

If the FASTA file cannot be read or parsed.

Example::

from gtars.refget import digest_fasta
collection = digest_fasta("genome.fa")
print(f"Collection digest: {collection.digest}")
print(f"Number of sequences: {len(collection)}")
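The Level 1 / Level 2 relationship can be sketched in a few lines, following the GA4GH seqcol approach as described in that specification. Level 2 holds the arrays themselves. Level 1 replaces each array with the SHA-512/24u digest of its canonical JSON. The collection digest is the digest of the canonical Level 1 object. The sketch rests on several assumptions: only `names`, `lengths`, and `sequences` participate (mirroring the three fields of SeqColDigestLvl1), sequence digests carry an `SQ.` prefix, and `json.dumps` stands in for RFC 8785 canonicalization. It is not guaranteed to reproduce gtars's digests byte-for-byte.

```python
import base64
import hashlib
import json
from typing import Any, Dict, List


def sha512t24u(data: bytes) -> str:
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj: Any) -> bytes:
    # Approximates RFC 8785 canonicalization; adequate for ASCII strings and integers.
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")


def seqcol_digests(names: List[str], lengths: List[int],
                   sequences: List[str]) -> Dict[str, Any]:
    level2 = {"names": names, "lengths": lengths, "sequences": sequences}
    # Level 1: one digest per attribute array.
    level1 = {attr: sha512t24u(canonical_json(arr)) for attr, arr in level2.items()}
    # Collection digest: digest of the canonical Level 1 object.
    return {"level1": level1, "digest": sha512t24u(canonical_json(level1))}


seqs = ["ACGT", "GGCC"]
seq_digests = ["SQ." + sha512t24u(s.encode("ascii")) for s in seqs]  # assumed "SQ." prefix
result = seqcol_digests(["chr1", "chr2"], [len(s) for s in seqs], seq_digests)
print(result["digest"])
```

Because the arrays are digested in order, reordering the same sequences produces a different collection digest.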

compute_fai

compute_fai(fasta: Union[str, PathLike]) -> List[FaiRecord]

Compute FASTA index (FAI) metadata for all sequences in a FASTA file.

This function computes the FAI index metadata (offset, line_bases, line_bytes) for each sequence in a FASTA file, compatible with samtools faidx format. Only works with uncompressed FASTA files.

Parameters:

fasta (Union[str, PathLike]) –

Path to FASTA file (str or PathLike). Must be uncompressed.

Returns:

List[FaiRecord] –

List of FAI records, one per sequence, containing name, length, and FAI metadata (offset, line_bases, line_bytes).

Raises:

IOError –

If the FASTA file cannot be read or is compressed.

Example::

from gtars.refget import compute_fai
fai_records = compute_fai("genome.fa")
for record in fai_records:
    print(f"{record.name}: {record.length} bp")
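The three FAI fields can be derived in a single pass over an uncompressed FASTA, and together they turn "find base i" into arithmetic instead of a scan. The pure-Python sketch below is a hypothetical stand-in for compute_fai working on in-memory bytes. It assumes the samtools convention that every line of a record except the last has the same length, and it does not handle gzip input.

```python
from typing import List, NamedTuple


class FaiEntry(NamedTuple):
    # Hypothetical stand-in for FaiRecord + FaiMetadata, in .fai column order.
    name: str
    length: int
    offset: int
    line_bases: int
    line_bytes: int


def fai_from_bytes(data: bytes) -> List[FaiEntry]:
    entries: List[FaiEntry] = []
    current = None  # [name, length, offset, line_bases, line_bytes] of the open record
    pos = 0  # byte offset where the current line starts
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                entries.append(FaiEntry(*current))
            # The sequence name is the first whitespace-delimited word of the header.
            name = line[1:].split()[0].decode("ascii")
            current = [name, 0, pos + len(line), 0, 0]
        elif current is not None:
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0 and bases > 0:
                # The first sequence line fixes the line geometry for this record.
                current[3], current[4] = bases, len(line)
            current[1] += bases
        pos += len(line)
    if current is not None:
        entries.append(FaiEntry(*current))
    return entries


def base_offset(entry: FaiEntry, i: int) -> int:
    """Byte offset of 0-based base i: the seek that an FAI index enables."""
    return entry.offset + (i // entry.line_bases) * entry.line_bytes + i % entry.line_bases
```

Joining an entry's fields with tabs gives the corresponding line of a `.fai` file.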

load_fasta

load_fasta(fasta: Union[str, PathLike]) -> SequenceCollection

Load a FASTA file with sequence data into a SequenceCollection.

This function reads a FASTA file and loads all sequences with their data into memory. Unlike digest_fasta(), this includes the actual sequence data, not just metadata.

Parameters:

fasta (Union[str, PathLike]) –

Path to FASTA file (str or PathLike).

Returns:

SequenceCollection –

Collection containing all sequences with their metadata and sequence data loaded.

Raises:

IOError –

If the FASTA file cannot be read or parsed.

Example::

from gtars.refget import load_fasta
collection = load_fasta("genome.fa")
first_seq = collection[0]
print(f"Sequence: {first_seq.decode()[:50]}...")

digest_sequence

digest_sequence(data: bytes, name: Optional[str] = None, description: Optional[str] = None) -> SequenceRecord

Create a SequenceRecord from raw data, computing all metadata.

This is the sequence-level counterpart to digest_fasta() for collections. It computes the GA4GH sha512t24u and MD5 digests, detects the alphabet, and returns a SequenceRecord containing the computed metadata and the original data.

The input data is automatically uppercased to ensure consistent digest computation (matching FASTA processing behavior).

Parameters:

data (bytes) –

The raw sequence bytes (e.g., b"ACGTACGT").
name (Optional[str], default: None ) –

Optional sequence name (e.g., "chr1"). Defaults to "" if not provided.
description (Optional[str], default: None ) –

Optional description text for the sequence.

Returns:

SequenceRecord –

A SequenceRecord with computed metadata and the original data (uppercased).

Example::

from gtars.refget import digest_sequence

seq = digest_sequence(b"ACGTACGT")
print(seq.metadata.length)
# Output: 8

seq = digest_sequence(b"ACGT", name="chr1")
print(seq.metadata.name, seq.metadata.length)
# Output: chr1 4

# With description
seq2 = digest_sequence(b"ACGT", name="chr1", description="Chromosome 1")
print(seq2.metadata.description)
# Output: Chromosome 1
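The uppercasing step is what makes soft-masked (lowercase) input digest identically to its uppercase form. The sketch below mirrors that behavior with the standard library. `digest_sequence_sketch` is a hypothetical helper returning a plain dict rather than a SequenceRecord, and it omits alphabet detection.

```python
import base64
import hashlib
from typing import Dict, Optional


def digest_sequence_sketch(data: bytes, name: Optional[str] = None) -> Dict[str, object]:
    # Uppercase first so b"acgt" and b"ACGT" produce identical digests.
    seq = data.upper()
    return {
        "name": name if name is not None else "",  # "" when no name is given
        "length": len(seq),
        "sha512t24u": base64.urlsafe_b64encode(hashlib.sha512(seq).digest()[:24]).decode("ascii"),
        "md5": hashlib.md5(seq).hexdigest(),
    }
```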
