BED classifier tutorial

BED classifier is a utility that allows you to classify BED files based on the number of columns and the types of data contained within those columns.

Information on various data formats can be found here:

Additional detailed specifications for the Browser Extensible Data (BED) format can be found here:


The function get_bed_classification takes a path to a BED-like file or a dataframe and returns a BedClassification object with the following attributes:

class BedClassification(BaseModel):
    bed_compliance: str
    data_format: DATA_FORMAT
    compliant_columns: int
    non_compliant_columns: int

where DATA_FORMAT is defined as:

class DATA_FORMAT(str, Enum):
    UNKNOWN = "unknown_data_format"
    UCSC_BED = "ucsc_bed"
    BED_RS = "bed_rs"
    BED_LIKE = "bed_like"
    BED_LIKE_RS = "bed_like_rs"
    ENCODE_NARROWPEAK = "encode_narrowpeak"
    ENCODE_NARROWPEAK_RS = "encode_narrowpeak_rs"
    ENCODE_BROADPEAK = "encode_broadpeak"
    ENCODE_BROADPEAK_RS = "encode_broadpeak_rs"
    ENCODE_GAPPEDPEAK = "encode_gappedpeak"
    ENCODE_GAPPEDPEAK_RS = "encode_gappedpeak_rs"
    ENCODE_RNA_ELEMENTS = "encode_rna_elements"
    ENCODE_RNA_ELEMENTS_RS = "encode_rna_elements_rs"

Example usage of the BED classifier:

from bedboss.bedclassifier.bedclassifier import get_bed_classification

classification = get_bed_classification("path/to/bedfile.bed")

print(f"{classification.bed_compliance}, {classification.data_format}, {classification.compliant_columns}, {classification.non_compliant_columns}")

## Example 1
## > 'bed3+0', 'ucsc_bed', 3, 0

## Example 2
## > 'bed6+4', 'encode_narrowpeak', 6, 4

Data formats

Below, rs refers to relaxed_score, which indicates that a fifth column is present whose values are integers greater than 0. In contrast, the strict interpretation of column 5 is:

Column 5 - score - A score between 0 and 1000.
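The difference between the strict and relaxed interpretations can be sketched as a check on the values of column 5. This is an illustration only, not the classifier's actual implementation: the function name score_interpretation is hypothetical, and treating 0 and 1000 as inclusive bounds is an assumption.

```python
from typing import Iterable


def score_interpretation(scores: Iterable[object]) -> str:
    """Label column-5 values as 'strict', 'relaxed', or 'invalid'.

    Hypothetical helper for illustration; it is not part of bedboss.
    """
    values = list(scores)
    # Both interpretations require integers. bool is excluded explicitly
    # because it is a subclass of int in Python.
    if not values or not all(
        isinstance(v, int) and not isinstance(v, bool) for v in values
    ):
        return "invalid"
    # Strict: every score lies between 0 and 1000 (bounds assumed inclusive).
    if all(0 <= v <= 1000 for v in values):
        return "strict"
    # Relaxed: every score is an integer greater than 0, with no upper bound.
    if all(v > 0 for v in values):
        return "relaxed"
    return "invalid"


print(score_interpretation([0, 500, 1000]))  # strict
print(score_interpretation([5, 2500]))       # relaxed
```

Under these assumptions, a column that fits the strict range is reported as strict even though it would also pass the relaxed check, so the stricter label takes precedence.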


unknown_data_format

Classification was unable to determine the data format.

ucsc_bed

Conforms to UCSC BED.

bed_rs

Conforms to UCSC BED, but with a relaxed interpretation of the fifth column.

bed_like

Data is tab-delimited but contains columns that are not compliant with UCSC BED. Example: bedn+m, where n is the number of compliant columns, m is the number of non-compliant columns, and m > 0.

bed_like_rs

Data is tab-delimited and contains columns that are not compliant with UCSC BED, with a relaxed interpretation of the fifth column. Example: bedn+m, where n is the number of compliant columns, m is the number of non-compliant columns, m > 0, and column 5 is an integer greater than 0.

encode_narrowpeak

Conforms to ENCODE narrowPeak.

encode_narrowpeak_rs

Conforms to ENCODE narrowPeak, but with a relaxed interpretation of the fifth column.

encode_broadpeak

Conforms to ENCODE broadPeak.

encode_broadpeak_rs

Conforms to ENCODE broadPeak, but with a relaxed interpretation of the fifth column.

encode_gappedpeak

Conforms to ENCODE gappedPeak.

encode_gappedpeak_rs

Conforms to ENCODE gappedPeak, but with a relaxed interpretation of the fifth column.

encode_rna_elements

Conforms to ENCODE RNA elements.

encode_rna_elements_rs

Conforms to ENCODE RNA elements, but with a relaxed interpretation of the fifth column.
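
The bedn+m notation reported in bed_compliance can be unpacked with a small parser. This is a minimal sketch, assuming the label always has the form bed<n>+<m> as in the examples on this page. The function name parse_bed_compliance is hypothetical and is not part of bedboss.

```python
import re
from typing import Tuple

# "bed", the compliant column count, a literal "+", the non-compliant count.
_BED_COMPLIANCE = re.compile(r"bed(\d+)\+(\d+)")


def parse_bed_compliance(label: str) -> Tuple[int, int]:
    """Split a 'bedn+m' label into (compliant, non_compliant) column counts.

    Hypothetical helper for illustration; it is not part of bedboss.
    """
    match = _BED_COMPLIANCE.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a bedn+m label: {label!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_bed_compliance("bed3+0"))  # (3, 0)
print(parse_bed_compliance("bed6+4"))  # (6, 4)
```

For the two example outputs shown earlier, the parsed counts agree with the compliant_columns and non_compliant_columns attributes.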