How to assess universe fit to collection of BED files
Introduction
If you have a potential universe file, and a collection of BED files, this module will help you assess the fit of the proposed universe to your collection of BED files. We can assess fit either from CLI, or from within Python.
Command-line usage
Both overlap and distance based assessments can be run using: geniml assess ...
with appropriate flags.
geniml assess --assessment-method1 \
--assessment-method2 \
--...
--raw-data-folder tests/consesnus/raw/ \
--file-list tests/consesnus/file_list.txt \
--universe tests/consenus/universe/universe.bed \
--save-to-file \
--folder-out tests/consesnus/results/intersection/ \
--pref test \
--no-workers 1
--raw-data-folder
, takes the path to folder with files from the collection--file-list
, takes the path to file with list of files--universe
, takes the path to file with the assessed universe--save-to-file
, is a flag that specifies whether to out put table with each row containing file name and results of chosen metrics--folder-out
, takes the path to folder in which put the output file--pref
, takes a prefix of output file name--no-workers
, takes the number of workers that should be used--save-each
, is a flag that specifies whether to save between the closest peaks to file
Base-level overlap measure
First test checks how much of our file is present in the universe and how much additional information is present in the universe. We can check that by adding --overlap
to geniml assess ...
. In the result files it will output columns with: number of bp in universe but not in file, number of bp in file but not the universe, and number of bp both in universe and file.
We can also use it directly from Python like this:
from geniml.assess.intersection import run_intersection
run_intersection("test/consensus/raw/",
"tests/consensus/file_list.txt",
"tests/consensus/universe/universe.bed",
no_workers=1)
Or, we can calculate F10 score of the universe using:
from geniml.assess.intersection import get_f_10_score
get_f_10_score("test/consensus/raw/",
"tests/consensus/file_list.txt",
"tests/consensus/universe/universe.bed",
no_workers=1)
Region boundary distance measure
Next, we can calculate the distance between query and universe. To do that we can choose from :
- distance
- calculates distance from region in query to the nearest region in the universe
- distance-universe-to-file
- calculates distance from region in query to the nearest region in the universe accounting for universe flexibility
- distance-flexible
- calculates distance from region in universe to the nearest region in the query
- distance-flexible-universe-to-file
- calculates distance from region in universe to the nearest region in the query accounting for universe flexibility
All presented distance measures can be done using python, which will result in matrix where first column is file names and the second one is median of distances.
from geniml.assess.distance import run_distance
d_median = run_distance("tests/consensus/raw",
"tests/consensus/file_list.txt",
"tests/consensus/universe/universe.bed",
npool=2)
from geniml.assess.distance import get_closeness_score
closeness_score = get_closeness_score("tests/consensus/raw",
"tests/consensus/file_list.txt",
"tests/consensus/universe/universe.bed",
no_workers=2)
Universe likelihood
We can also calculate the likelihood of universe given collection of file. For that we will need likelihood model. We can do it either for hard universe:
from geniml.assess.likelihood import hard_universe_likelihood
lh_hard = hard_universe_likelihood("tests/consensus/lh_model.tar",
"tests/consensus/universe/universe.bed",
"tests/consensus/coverage", "all")
or with taking into account universe flexibility:
from geniml.assess.likelihood import likelihood_flexible_universe
lh_flexible = likelihood_flexible_universe("tests/consensus/lh_model.tar",
"tests/consensus/universe/universe.bed",
"tests/consensus/coverage", "all")