BBClient practical example

This tutorial walks through a practical Python workflow using bbclient to download, cache, and work with BED files from BEDbase.

Installation

pip install geniml

1. Create a BBClient instance

from geniml.bbclient import BBClient

# Store cached files in ./my_cache; omit cache_folder to use the default (~/.bbcache)
bbc = BBClient(cache_folder="my_cache")
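Choosing a cache folder comes down to resolving a directory path. The following is a minimal sketch of that logic. It assumes only what the comment above states: a default of ~/.bbcache that a custom path can override. resolve_cache_folder is a hypothetical helper and is not part of geniml.

```python
from pathlib import Path
from typing import Optional, Union

# Default location taken from the tutorial text (assumption: "~" means the user's home directory).
DEFAULT_CACHE_FOLDER = "~/.bbcache"


def resolve_cache_folder(cache_folder: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: return the directory a BED-file cache would use."""
    folder = Path(cache_folder) if cache_folder is not None else Path(DEFAULT_CACHE_FOLDER)
    # Expand "~" and make the path absolute so later path joins are unambiguous.
    return folder.expanduser().resolve()


# Absolute path of the default cache folder in the home directory
print(resolve_cache_folder())
# Absolute path of my_cache, relative to the current working directory
print(resolve_cache_folder("my_cache"))
```

Because relative paths such as "my_cache" resolve against the current working directory, running the same script from a different directory would use a different cache.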

2. Load a BED file from BEDbase

load_bed checks the local cache first. If the file is not cached, it downloads it from BEDbase and stores it in the cache. In either case, it returns a RegionSet object.

bed_id = "dcc005e8761ad5599545cc538f6a2a4d"

regionset = bbc.load_bed(bed_id)
print(regionset)
RegionSet with 42193 regions.
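The cache-first behaviour of load_bed follows a common look-up-then-fetch pattern. The sketch below illustrates only that pattern and is not geniml's implementation. load_cached, the simulated download function, and the flat <id>.bed.gz file naming are all hypothetical. The sketch also returns raw bytes rather than a RegionSet, and it simulates the download so it runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def load_cached(bed_id: str, cache_dir: Path, download: Callable[[str], bytes]) -> bytes:
    """Hypothetical cache-first loader: return cached bytes, downloading only on a miss."""
    cached_file = cache_dir / f"{bed_id}.bed.gz"
    if cached_file.exists():
        # Cache hit: no network access needed.
        return cached_file.read_bytes()
    # Cache miss: fetch the file, store it, then return it.
    data = download(bed_id)
    cached_file.parent.mkdir(parents=True, exist_ok=True)
    cached_file.write_bytes(data)
    return data


calls: List[str] = []


def fake_download(bed_id: str) -> bytes:
    # Simulated remote fetch that records each call it receives.
    calls.append(bed_id)
    return b"fake bed content"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = load_cached("dcc005e8761ad5599545cc538f6a2a4d", cache, fake_download)
    second = load_cached("dcc005e8761ad5599545cc538f6a2a4d", cache, fake_download)
    print(len(calls))  # 1: the second call was served from the cache
```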

3. Inspect the RegionSet

Once you have a RegionSet, you can inspect its contents and compute basic statistics.

# Number of regions
print(len(regionset))

# Mean width of regions
print(regionset.mean_region_width())

# Total nucleotide length covered
print(regionset.get_nucleotide_length())

# Largest end coordinate on each chromosome
print(regionset.get_max_end_per_chr())

# Unique identifier (digest) of the region set
print(regionset.identifier)
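To make these statistics concrete, here is a plain-Python sketch that computes comparable values from (chr, start, end) tuples. It makes three assumptions, and geniml's own methods may differ on any of them:

- Region width is end - start, following BED's half-open coordinates.
- Total nucleotide length is the sum of widths, with no merging of overlaps.
- Max end per chromosome is the largest end coordinate seen on that chromosome.

The helper region_stats and the toy regions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end) in BED half-open coordinates


def region_stats(regions: List[Region]) -> Dict[str, object]:
    """Hypothetical helper mirroring the RegionSet statistics shown above."""
    widths = [end - start for _, start, end in regions]
    max_end: Dict[str, int] = defaultdict(int)
    for chrom, _, end in regions:
        max_end[chrom] = max(max_end[chrom], end)
    return {
        "count": len(regions),
        "mean_width": sum(widths) / len(widths) if widths else 0.0,
        "total_length": sum(widths),  # overlapping bases are counted more than once
        "max_end_per_chr": dict(max_end),
    }


toy = [("chr1", 100, 250), ("chr1", 400, 450), ("chr2", 10, 110)]
stats = region_stats(toy)
print(stats)
# {'count': 3, 'mean_width': 100.0, 'total_length': 300, 'max_end_per_chr': {'chr1': 450, 'chr2': 110}}
```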

4. Iterate over regions

You can iterate over individual regions in the RegionSet:

for region in regionset:
    print(region.chr, region.start, region.end)
    break  # print just the first region
chr1 778544 778794
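The loop above shows that a RegionSet is iterable and that each region exposes chr, start, and end. That means ordinary Python tools should work on it. As a sketch, the following counts regions per chromosome with collections.Counter. The Region dataclass is a hypothetical stand-in so the example runs without downloading anything. With a real RegionSet, you would pass regionset instead of the toy list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Region:
    """Hypothetical stand-in exposing the same attributes used in the loop above."""
    chr: str
    start: int
    end: int


def regions_per_chromosome(regions: Iterable[Region]) -> Counter:
    # Count how many regions fall on each chromosome.
    return Counter(region.chr for region in regions)


toy = [
    Region("chr1", 778544, 778794),
    Region("chr1", 800000, 800200),
    Region("chrX", 5000, 5300),
]
counts = regions_per_chromosome(toy)
print(counts.most_common())  # [('chr1', 2), ('chrX', 1)]
```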

5. Find the cached file path

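After loading, the file is stored in the cache folder. Use seek to get its local path. The output below shows the default cache location (~/.bbcache). With cache_folder="my_cache", the file is stored under my_cache instead: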

path = bbc.seek(bed_id)
print(path)
/home/user/.bbcache/bedfiles/d/c/dcc005e8761ad5599545cc538f6a2a4d.bed.gz
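The example path suggests that cached files are sharded into subdirectories named after the first and second characters of the identifier: bedfiles/d/c/<identifier>.bed.gz. The sketch below reproduces that layout. It is inferred from the output above, not taken from geniml's source, and cached_bed_path is a hypothetical helper.

```python
from pathlib import Path


def cached_bed_path(cache_folder: str, bed_id: str) -> Path:
    """Hypothetical helper: build a sharded cache path from a BED identifier."""
    # Two levels of single-character subdirectories keep any one directory small.
    return Path(cache_folder) / "bedfiles" / bed_id[0] / bed_id[1] / f"{bed_id}.bed.gz"


path = cached_bed_path("my_cache", "dcc005e8761ad5599545cc538f6a2a4d")
print(path.as_posix())  # my_cache/bedfiles/d/c/dcc005e8761ad5599545cc538f6a2a4d.bed.gz
```

Sharding by identifier prefix is a common way to avoid directories containing tens of thousands of files, which some filesystems and tools handle poorly.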

6. Save the RegionSet to a local file

# Save as plain BED
regionset.to_bed("output.bed")

# Save as gzipped BED
regionset.to_bed_gz("output.bed.gz")
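The two output formats should differ only in compression. The .bed.gz variant stores the same tab-separated lines, compressed with gzip. The following standard-library sketch illustrates that difference. It assumes three-column BED output, although real files may include extra columns. write_bed is hypothetical and is not a geniml function.

```python
import gzip
import tempfile
from pathlib import Path
from typing import List, Tuple

Region = Tuple[str, int, int]


def write_bed(regions: List[Region], path: Path) -> None:
    """Hypothetical writer: plain text for .bed, gzip-compressed text for .bed.gz."""
    lines = "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
    if path.suffix == ".gz":
        with gzip.open(path, "wt") as handle:
            handle.write(lines)
    else:
        path.write_text(lines)


regions = [("chr1", 778544, 778794), ("chr2", 10, 110)]
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "output.bed"
    packed = Path(tmp) / "output.bed.gz"
    write_bed(regions, plain)
    write_bed(regions, packed)
    # Decompressing the gzipped file should yield exactly the plain-text content.
    plain_text = plain.read_text()
    unpacked_text = gzip.decompress(packed.read_bytes()).decode()
    print(plain_text == unpacked_text)  # True
```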

7. Cache a local BED file

If you have a BED file on disk and want to add it to the cache:

local_id = bbc.add_bed_to_cache("path/to/local.bed.gz")
print(local_id)  # returns the computed identifier
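The identifier is a digest of the region set (see step 3), so identical content should map to the same ID regardless of the file's name. This page does not describe the exact recipe geniml uses. The sketch below shows one hypothetical content-addressing scheme: MD5 over sorted, tab-joined region lines. MD5 was chosen only because the identifiers above are 32 hex characters long. This sketch will not reproduce real BEDbase identifiers.

```python
import hashlib
from typing import List, Tuple

Region = Tuple[str, int, int]


def toy_region_digest(regions: List[Region]) -> str:
    """Hypothetical content digest: NOT the scheme BEDbase or geniml uses."""
    # Sort first so the digest does not depend on the order of lines in the file.
    canonical = "\n".join(f"{c}\t{s}\t{e}" for c, s, e in sorted(regions))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


a = toy_region_digest([("chr1", 100, 200), ("chr2", 5, 50)])
b = toy_region_digest([("chr2", 5, 50), ("chr1", 100, 200)])
print(a == b, len(a))  # True 32
```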

8. Remove a file from cache

bbc.remove_bedfile_from_cache(bed_id)
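With the sharded layout seen in step 5, removing an entry amounts to deleting one file and, optionally, pruning shard directories left empty. The following hypothetical sketch shows that housekeeping. It is not geniml's code, and remove_cached_bed is illustrative only.

```python
import tempfile
from pathlib import Path


def remove_cached_bed(cache_folder: Path, bed_id: str) -> bool:
    """Hypothetical remover: delete a sharded cache entry; return True if it existed."""
    shard_dir = cache_folder / "bedfiles" / bed_id[0] / bed_id[1]
    target = shard_dir / f"{bed_id}.bed.gz"
    if not target.exists():
        return False
    target.unlink()
    # Prune shard directories that became empty, stopping below the bedfiles/ root.
    for directory in (shard_dir, shard_dir.parent):
        if any(directory.iterdir()):
            break
        directory.rmdir()
    return True


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    bed_id = "dcc005e8761ad5599545cc538f6a2a4d"
    entry = cache / "bedfiles" / "d" / "c" / f"{bed_id}.bed.gz"
    entry.parent.mkdir(parents=True)
    entry.write_bytes(b"placeholder")
    removed = remove_cached_bed(cache, bed_id)
    removed_again = remove_cached_bed(cache, bed_id)
    bedfiles_left = sorted(p.name for p in (cache / "bedfiles").iterdir())
    print(removed, removed_again, bedfiles_left)  # True False []
```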

Full example

from geniml.bbclient import BBClient

# Initialize client
bbc = BBClient(cache_folder="my_cache")

bed_id = "dcc005e8761ad5599545cc538f6a2a4d"

# Download from BEDbase and cache locally
regionset = bbc.load_bed(bed_id)

# Inspect
print(f"Regions:          {len(regionset)}")
print(f"Mean width:       {regionset.mean_region_width():.1f} bp")
print(f"Total coverage:   {regionset.get_nucleotide_length()} bp")
print(f"Identifier:       {regionset.identifier}")

# Find cached file
print(f"Cached at:        {bbc.seek(bed_id)}")

# Save locally
regionset.to_bed_gz("dcc005e8761ad5599545cc538f6a2a4d.bed.gz")

Info

Full bbclient reference is available in the BBClient documentation.