Assess Module
assess
Functions
run_all_assessment_methods
run_all_assessment_methods(raw_data_folder, file_list, universe, no_workers, folder_out, pref, save_each, overlap=False, distance_f_t_u=False, distance_f_t_u_flex=False, distance_u_t_f=False, distance_u_t_f_flex=False)
Assess universe fit to collection using overlap and distance metrics
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw_data_folder | str | path to raw files from the collection | required |
file_list | str | path to file with list of files in the collection | required |
universe | str | path to universe that is being assessed | required |
no_workers | int | number of workers for multiprocessing | required |
folder_out | str | output folder | required |
pref | str | prefix used for creating output files | required |
save_each | bool | whether to save the output of distance metrics for each region | required |
overlap | bool | whether to calculate overlap metrics | False |
distance_f_t_u | bool | whether to calculate distance from file to universe metrics | False |
distance_f_t_u_flex | bool | whether to calculate flexible distance from file to universe metrics | False |
distance_u_t_f | bool | whether to calculate distance from universe to file metrics | False |
distance_u_t_f_flex | bool | whether to calculate flexible distance from universe to file metrics | False |
get_rbs
get_rbs(f_t_u, u_t_f)
Calculate RBS
get_mean_rbs
get_mean_rbs(folder, file_list, universe, no_workers, flexible=False)
Calculate average RBS of the collection
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | path to folder with the collection | required |
file_list | str | path to file with list of files in the collection | required |
universe | str | path to the universe | required |
no_workers | int | number of workers for multiprocessing | required |
flexible | bool | whether to calculate the flexible version of the metric | False |
Returns:
Type | Description |
---|---|
average RBS |
get_rbs_from_assessment_file
get_rbs_from_assessment_file(file, cs_each_file=False, flexible=False)
Calculate RBS from file with results of metrics per file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | str | path to file with assessment results | required |
cs_each_file | bool | whether to report RBS for each file instead of the average for the collection | False |
flexible | bool | whether to use the flexible version of the metric | False |
get_f_10_score
get_f_10_score(folder, file_list, universe, no_workers)
Get F10 score for a universe and a collection of files
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | path to folder with the collection | required |
file_list | str | path to file with list of files in the collection | required |
universe | str | path to the universe | required |
no_workers | int | number of workers for multiprocessing | required |
Returns:
Type | Description |
---|---|
average F10 score |
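For orientation, here is a minimal sketch of how an F10 score could be computed. It assumes that F10 is the standard F-beta score with beta = 10, and that precision and recall are base-pair fractions (overlap divided by universe size and overlap divided by file size, respectively). Both points are assumptions, and the helper `f_beta` is hypothetical rather than part of this module.

```python
def f_beta(precision: float, recall: float, beta: float = 10.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        # Avoid division by zero when there is no overlap at all.
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical inputs: half of the universe overlaps the file,
# and the whole file is covered by the universe.
score = f_beta(precision=0.5, recall=1.0)
print(round(score, 4))
```

With beta = 10 the score is dominated by recall, so a universe that covers the file well scores high even when part of the universe is unused.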
get_f_10_score_from_assessment_file
get_f_10_score_from_assessment_file(file, f10_each_file=False)
Get F10 score from assessment output file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | str | path to file with assessment results | required |
f10_each_file | bool | whether to report F10 for each file instead of the average for the collection | False |
get_likelihood
get_likelihood(model_file, universe, cove_folder, cove_prefix='all', flexible=False, save_peak_input=False)
Calculate universe likelihood given collection
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_file | str | path to file with likelihood model | required |
universe | str | path to the universe | required |
cove_folder | str | path to the coverage folder | required |
cove_prefix | str | prefix used for generating coverage | 'all' |
flexible | bool | whether to calculate flexible likelihood | False |
save_peak_input | bool | whether to save the likelihood input of each region | False |
Returns:
Type | Description |
---|---|
|
filter_universe
filter_universe(universe, universe_filtered, min_size=0, min_coverage=0, filter_lh=False, model_file=None, cove_folder=None, cove_prefix=None, lh_cutoff=0)
Filter universe by region size, coverage by collection, likelihood
Parameters:
Name | Type | Description | Default |
---|---|---|---|
universe | str | path to input universe | required |
universe_filtered | str | path to output filtered universe | required |
min_size | int | minimum size of a region in the output universe | 0 |
min_coverage | int | minimum coverage of a universe region by the collection | 0 |
filter_lh | bool | whether to use likelihood to filter the universe | False |
model_file | str | path to collection likelihood model | None |
cove_folder | str | path to folder with coverage tracks | None |
cove_prefix | str | prefix used for creating tracks | None |
lh_cutoff | int | minimum likelihood input | 0 |
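The size criterion can be illustrated with a small sketch. It assumes the universe is a whitespace-separated BED file with chromosome, start and end in the first three columns, and that a region is kept when its length is at least `min_size`. The helper `filter_by_size` is hypothetical and covers only the size filter, not the coverage or likelihood filters.

```python
def filter_by_size(lines, min_size=0):
    """Keep BED lines whose region length (end - start) is at least min_size."""
    kept = []
    for line in lines:
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        if end - start >= min_size:
            kept.append(line)
    return kept


# Hypothetical universe with three regions of length 50, 200 and 10.
universe_lines = ["chr1\t0\t50", "chr1\t100\t300", "chr2\t10\t20"]
print(filter_by_size(universe_lines, min_size=100))
```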
cli
Functions
build_subparser
build_subparser(parser)
Builds argument parser.
Returns:
Type | Description |
---|---|
Argument parser |
distance
Functions
flexible_distance_between_two_regions
flexible_distance_between_two_regions(region, query)
Calculate distance between region and flexible region from flexible universe
Parameters:
Name | Type | Description | Default |
---|---|---|---|
region | | region from flexible universe | required |
query | int | analyzed region | required |
Returns:
Type | Description |
---|---|
distance |
distance_between_two_regions
distance_between_two_regions(region, query)
Calculate distance between region in database and region from the query
Parameters:
Name | Type | Description | Default |
---|---|---|---|
region | [int] | region from hard universe | required |
query | int | analysed region | required |
Returns:
Type | Description |
---|---|
distance |
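The difference between the hard and the flexible distance can be sketched as follows. This assumes that a hard-universe region is passed as a one-element list holding a single boundary position, and that a flexible region is a `[start, end]` interval inside which the distance is zero. The function names below are hypothetical stand-ins, not the module's implementation.

```python
from typing import Sequence


def point_distance(region: Sequence[int], query: int) -> int:
    """Hard universe: absolute distance between one boundary position and the query."""
    return abs(region[0] - query)


def flexible_point_distance(region: Sequence[int], query: int) -> int:
    """Flexible universe: zero inside [start, end], else distance to the nearer edge."""
    start, end = region
    if start <= query <= end:
        return 0
    return min(abs(start - query), abs(end - query))


print(point_distance([100], 130))                 # query lies 30 bp past the boundary
print(flexible_point_distance([100, 120], 110))   # query falls inside the interval
print(flexible_point_distance([100, 120], 130))   # query lies 10 bp past the interval end
```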
distance_to_closest_region
distance_to_closest_region(db, db_queue, i, current_chrom, unused_db, pos_index, flexible, uni_to_file)
Calculate distance from given peak to the closest region in database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | file | database file | required |
db_queue | list | queue of the last three positions in the database | required |
i | | analyzed position from the query | required |
current_chrom | str | chromosome currently analyzed from the query | required |
unused_db | list | list of positions from universe that were not compared to query | required |
pos_index | list | which indexes of a universe region to use to calculate distance | required |
flexible | bool | whether the universe is flexible | required |
uni_to_file | bool | whether to calculate distance from universe to file | required |
Returns:
Type | Description |
---|---|
peak distance to universe |
read_in_new_universe_regions
read_in_new_universe_regions(db, q_chrom, current_chrom, unused_db, db_queue, waiting, pos_index)
Read in new universe regions closest to the peak
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | file | universe file | required |
q_chrom | str | new peak's chromosome | required |
current_chrom | str | chromosome that was analyzed so far | required |
unused_db | list | list of positions from universe that were not compared to query | required |
db_queue | list | queue of the last three positions in the universe | required |
waiting | bool | whether iterating through the file without calculating distance, because the current chromosome is not present in the universe | required |
pos_index | list | which indexes of a universe region to use to calculate distance | required |
Returns:
Type | Description |
---|---|
whether iterating through a chromosome not present in the universe; current chromosome in query |
calc_distance_between_two_files
calc_distance_between_two_files(universe, q_folder, q_file, flexible, save_each, folder_out, pref, uni_to_file=False)
Main function for calculating distance between regions in the query file and regions in the database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
universe | str | path to universe | required |
q_folder | str | path to folder containing query files | required |
q_file | str | query file | required |
flexible | bool | whether the universe is flexible | required |
save_each | bool | whether to save calculated distances for each file | required |
folder_out | str | output folder | required |
pref | str | prefix used as the name of the folder containing calculated distance for each file | required |
uni_to_file | | whether to calculate distance from universe to file | False |
Returns:
Type | Description |
---|---|
file name; median of distances from starts to starts in universe; median of distances from ends to ends in universe |
run_distance
run_distance(folder, file_list, universe, no_workers, flexible=False, folder_out=None, pref=None, save_each=False, uni_to_file=False)
For group of files calculate distance to the nearest region in universe
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | path to folder containing query files | required |
file_list | str | path to file containing list of query files | required |
universe | str | path to universe file | required |
no_workers | int | number of parallel processes | required |
flexible | bool | whether the universe is flexible | False |
folder_out | str | output folder | None |
pref | str | prefix used for saving | None |
save_each | bool | whether to save calculated distances for each file | False |
uni_to_file | | whether to calculate distance from universe to file | False |
Returns:
Type | Description |
---|---|
mean of median distances from starts in query to the nearest starts in universe; mean of median distances from ends in query to the nearest ends in universe |
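The "mean of median distances" aggregation can be illustrated with the standard library. The sketch assumes that per-file distances are already available as lists of integers. The dictionary layout and the helper `mean_of_medians` are hypothetical and only demonstrate the two-step summary (median within each file, then mean across files).

```python
from statistics import mean, median


def mean_of_medians(distances_per_file):
    """Take the median distance within each file, then average those medians."""
    return mean(median(d) for d in distances_per_file.values())


# Hypothetical start-to-start distances for two query files.
start_distances = {
    "a.bed": [0, 10, 20],        # median 10
    "b.bed": [5, 5, 50, 100],    # median 27.5
}
print(mean_of_medians(start_distances))
```

Using the median within each file keeps a few distant outlier regions from dominating the per-file summary.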
intersection
Functions
chrom_cmp
chrom_cmp(a, b)
Return smaller chromosome name
relationship_helper
relationship_helper(region_a, region_b, only_in, overlap)
For two regions, calculate their overlap; for the earlier region, calculate how many base pairs are only in it
Parameters:
Name | Type | Description | Default |
---|---|---|---|
region_a | | region that starts first | required |
region_b | | region that starts second | required |
only_in | int | number of positions only in a so far | required |
overlap | int | number of overlapping positions so far | required |
two_region_intersection_diff
two_region_intersection_diff(region_d, region_q, only_in_d, only_in_q, inside_d, inside_q, overlap, start_d, start_q, waiting_d, waiting_q)
Check mutual position of two regions and calculate intersection and difference of two regions
Parameters:
Name | Type | Description | Default |
---|---|---|---|
region_d | list | region from universe | required |
region_q | list | region from query | required |
only_in_d | int | number of base pairs only in universe | required |
only_in_q | int | number of base pairs only in query | required |
inside_d | bool | whether there is still part of the region from universe to analyse | required |
inside_q | bool | whether there is still part of the region from query to analyse | required |
overlap | int | size of overlap | required |
start_d | int | start position of currently analyzed universe region | required |
start_q | int | start position of currently analyzed query region | required |
waiting_d | bool | whether waiting for the query to finish chromosome | required |
waiting_q | bool | whether waiting for the universe to finish chromosome | required |
read_in_new_line
read_in_new_line(region, start, chrom, inside, waiting, lines, c_chrom, not_e)
Read in a new line from query or universe file
calc_diff_intersection
calc_diff_intersection(db, folder, query)
Difference and overlap of two files on base pair level
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | str | path to universe file | required |
folder | str | path to folder with query file | required |
query | str | query file name | required |
Returns:
Type | Description |
---|---|
file name; bp only in universe; bp only in query; overlap in bp |
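A base-pair level comparison of this kind can be sketched with a two-pointer sweep. The example assumes both inputs are sorted, non-overlapping, half-open `(start, end)` intervals on a single chromosome. The function `bp_overlap_and_diff` is a hypothetical illustration and works on in-memory lists rather than on files.

```python
def bp_overlap_and_diff(universe, query):
    """Return (bp only in universe, bp only in query, overlapping bp)."""
    overlap = 0
    i = j = 0
    while i < len(universe) and j < len(query):
        u_start, u_end = universe[i]
        q_start, q_end = query[j]
        # Length of the shared part of the two current intervals (zero if disjoint).
        overlap += max(0, min(u_end, q_end) - max(u_start, q_start))
        # Advance whichever interval ends first.
        if u_end <= q_end:
            i += 1
        else:
            j += 1
    total_u = sum(end - start for start, end in universe)
    total_q = sum(end - start for start, end in query)
    return total_u - overlap, total_q - overlap, overlap


# Hypothetical regions: the query spans the gap between two universe regions.
result = bp_overlap_and_diff([(0, 10), (20, 30)], [(5, 25)])
print(result)
```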
run_intersection
run_intersection(folder, file_list, universe, no_workers)
Calculate the base pair intersection of universe and group of files
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | path to folder containing query files | required |
file_list | str | path to file containing list of query files | required |
universe | str | path to universe file | required |
no_workers | int | number of parallel processes | required |
save_to_file | str | whether to save median of calculated distances for each file | required |
folder_out | str | output folder | required |
pref | str | prefix used for saving | required |
Returns:
Type | Description |
---|---|
mean of fractions of intersection of file and universe divided by universe size; mean of fractions of intersection of file and universe divided by file size |
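The two returned fractions can be derived from per-file base-pair counts, as in the sketch below. It assumes each file contributes a tuple `(bp only in universe, bp only in query, overlap in bp)`, and that every file and the universe are non-empty. The helper `mean_overlap_fractions` is hypothetical.

```python
from statistics import mean


def mean_overlap_fractions(per_file):
    """Average overlap/universe-size and overlap/file-size across files."""
    of_universe = []
    of_file = []
    for only_u, only_q, overlap in per_file:
        of_universe.append(overlap / (overlap + only_u))  # universe size = overlap + only_u
        of_file.append(overlap / (overlap + only_q))      # file size = overlap + only_q
    return mean(of_universe), mean(of_file)


# Hypothetical counts for two files.
fractions = mean_overlap_fractions([(10, 10, 10), (0, 30, 10)])
print(fractions)
```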
likelihood
Classes
LhModel
LhModel(model, cove)
Object with combined information about lh model and coverage
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | ndarray | lh model array | required |
cove | ndarray | coverage array | required |
Functions
calc_likelihood_hard
calc_likelihood_hard(universe, chroms, model_lh, coverage_folder, coverage_prefix, name, s_index, e_index=None)
Calculate likelihood of universe for given type of model. To be used with binomial model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
universe | | path to universe file | required |
chroms | list | list of chromosomes present in model | required |
model_lh | ModelLH | likelihood model | required |
coverage_prefix | | prefix used in uniwig for creating coverage | required |
coverage_folder | | path to a folder with genome coverage by tracks | required |
name | str | suffix of model file name, which contains information about model type | required |
s_index | int | position in a universe line from which to take the assessed region's start | required |
e_index | int | position in a universe line from which to take the assessed region's end | None |
Returns:
Type | Description |
---|---|
likelihood of universe for given model |
hard_universe_likelihood
hard_universe_likelihood(model, universe, coverage_folder, coverage_prefix)
Calculate likelihood of hard universe based on core, start, end coverage model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | path to file containing model | required |
universe | str | path to universe | required |
coverage_prefix | | prefix used in uniwig for creating coverage | required |
coverage_folder | | path to a folder with genome coverage by tracks | required |
Returns:
Type | Description |
---|---|
likelihood |
likelihood_only_core
likelihood_only_core(model_file, universe, coverage_folder, coverage_prefix)
Calculate likelihood of universe based only on core coverage model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_file | str | path to file containing model | required |
universe | str | path to universe | required |
coverage_prefix | | prefix used in uniwig for creating coverage | required |
coverage_folder | | path to a folder with genome coverage by tracks | required |
Returns:
Type | Description |
---|---|
likelihood |
background_likelihood
background_likelihood(start, end, model_start, model_cove, model_end)
Calculate likelihood of background for given region
weigh_livelihood
weigh_livelihood(start, end, model_process, model_cove, model_out, reverse)
Calculate weighted likelihood of flexible part of the region
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start | int | start of the region | required |
end | int | end of the region | required |
model_process | array | model for analyzed type of flexible region | required |
model_cove | array | model for coverage | required |
model_out | array | model for flexible region that is not being analyzed | required |
reverse | bool | if model_process corresponds to the end, the weights have to be reversed | required |
Returns:
Type | Description |
---|---|
likelihood of flexible part of the region |
likelihood_flexible_universe
likelihood_flexible_universe(model_file, universe, cove_folder, cove_prefix, save_peak_input=False)
Likelihood of given universe under the model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_file | str | path to file with lh model | required |
universe | str | path to universe | required |
cove_folder | | path to a folder with genome coverage by tracks | required |
cove_prefix | | prefix used in uniwig for creating coverage | required |
save_peak_input | bool | whether to save universe with each peak lh | False |
Returns:
Type | Description |
---|---|
lh of the flexible universe |
utils
Functions
prep_data
prep_data(folder, file, tmp_file)
File sort and merge
check_if_uni_sorted
check_if_uni_sorted(universe)
Check if regions in file are sorted
process_line
process_line(line)
Helper for reading in bed file line
chrom_cmp_bigger
chrom_cmp_bigger(a, b)
Check, in natural order, whether one chromosome name is bigger than another
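Natural ordering means that numeric parts of a name compare as numbers, so that `chr2` sorts before `chr10`. One common way to get this behaviour is sketched below. The helpers `natural_key` and `is_bigger` are hypothetical, and the module's own comparison may differ in detail.

```python
import re


def natural_key(chrom: str):
    """Split 'chr10' into ['chr', 10, ''] so digit runs compare numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", chrom)]


def is_bigger(a: str, b: str) -> bool:
    """Return True if chromosome name a comes after b in natural order."""
    return natural_key(a) > natural_key(b)


print(is_bigger("chr10", "chr2"))
print(sorted(["chr10", "chr2", "chrX", "chr1"], key=natural_key))
```

Because `re.split` with a capturing group always alternates text and digit chunks, integers are only ever compared with integers and strings with strings.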
process_db_line
process_db_line(dn, pos_index)
Helper for reading in universe bed file line