
Assess Module

assess

Functions
run_all_assessment_methods
run_all_assessment_methods(raw_data_folder, file_list, universe, no_workers, folder_out, pref, save_each, overlap=False, distance_f_t_u=False, distance_f_t_u_flex=False, distance_u_t_f=False, distance_u_t_f_flex=False)

Assess how well a universe fits a collection using overlap and distance metrics

Parameters:

Name Type Description Default
raw_data_folder str

path to raw files from the collection

required
file_list str

path to file with list of files in the collection

required
universe str

path to universe that is being assessed

required
no_workers int

number of workers for multiprocessing

required
folder_out str

output folder

required
pref str

prefix used for creating output files

required
save_each bool

whether to save the output of distance metrics for each region

required
overlap bool

whether to calculate overlap metrics

False
distance_f_t_u bool

whether to calculate file-to-universe distance metrics

False
distance_f_t_u_flex bool

whether to calculate flexible file-to-universe distance metrics

False
distance_u_t_f bool

whether to calculate universe-to-file distance metrics

False
distance_u_t_f_flex bool

whether to calculate flexible universe-to-file distance metrics

False
get_rbs
get_rbs(f_t_u, u_t_f)

Calculate RBS from the file-to-universe (f_t_u) and universe-to-file (u_t_f) distances

get_mean_rbs
get_mean_rbs(folder, file_list, universe, no_workers, flexible=False)

Calculate average RBS of the collection

Parameters:

Name Type Description Default
folder str

path to folder with the collection

required
file_list str

path to file with list of files in the collection

required
universe str

path to the universe

required
no_workers int

number of workers for multiprocessing

required
flexible bool

whether to calculate the flexible version of the metric

False

Returns:

Type Description

average RBS

get_rbs_from_assessment_file
get_rbs_from_assessment_file(file, cs_each_file=False, flexible=False)

Calculate RBS from a file with per-file metric results

Parameters:

Name Type Description Default
file str

path to file with assessment results

required
cs_each_file bool

whether to report RBS for each file instead of the average for the collection

False
flexible bool

whether to use the flexible version of the metric

False
get_f_10_score
get_f_10_score(folder, file_list, universe, no_workers)

Get F10 score for a universe and a collection of files

Parameters:

Name Type Description Default
folder str

path to folder with the collection

required
file_list str

path to file with list of files in the collection

required
universe str

path to the universe

required
no_workers int

number of workers for multiprocessing

required

Returns:

Type Description

average F10 score
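F10 is read here as the F-beta score with beta = 10, which weights recall far more heavily than precision. A minimal sketch, assuming precision and recall are derived from the base-pair counts that calc_diff_intersection reports (bp only in universe, bp only in query, overlap). Which ratio plays the role of recall in the library is an assumption, as are the function names f_beta and mean_f10.

```python
from typing import Iterable, Tuple


def f_beta(only_in_universe: int, only_in_query: int, overlap: int, beta: float = 10.0) -> float:
    """F-beta score from base-pair counts (hypothetical helper; beta=10 gives 'F10')."""
    # Assumption: recall = share of query bp covered by the universe,
    # precision = share of universe bp covered by the query.
    if overlap == 0:
        return 0.0  # no shared bases, so precision and recall are both zero
    recall = overlap / (overlap + only_in_query)
    precision = overlap / (overlap + only_in_universe)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f10(per_file: Iterable[Tuple[int, int, int]]) -> float:
    """Average F10 over a collection; each item is (only_in_universe, only_in_query, overlap)."""
    scores = [f_beta(u, q, o) for u, q, o in per_file]
    return sum(scores) / len(scores)
```

With beta = 10, a universe that covers every query base but is ten times larger than needed still scores about 0.92, while the mirror case scores about 0.10.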

get_f_10_score_from_assessment_file
get_f_10_score_from_assessment_file(file, f10_each_file=False)

Get F10 score from assessment output file

Parameters:

Name Type Description Default
file str

path to file with assessment results

required
f10_each_file bool

whether to report F10 for each file instead of the average for the collection

False
get_likelihood
get_likelihood(model_file, universe, cove_folder, cove_prefix='all', flexible=False, save_peak_input=False)

Calculate universe likelihood given a collection

Parameters:

Name Type Description Default
model_file str

path to file with likelihood model

required
universe str

path to the universe

required
cove_folder str

path to the coverage folder

required
cove_prefix str

prefix used for generating coverage

'all'
flexible bool

whether to calculate flexible likelihood

False
save_peak_input bool

whether to save the likelihood input of each region

False

Returns:

Type Description
filter_universe
filter_universe(universe, universe_filtered, min_size=0, min_coverage=0, filter_lh=False, model_file=None, cove_folder=None, cove_prefix=None, lh_cutoff=0)

Filter universe by region size, coverage by the collection, and likelihood

Parameters:

Name Type Description Default
universe str

path to input universe

required
universe_filtered str

path to output filtered universe

required
min_size int

minimum size of the region in the output universe

0
min_coverage int

minimum coverage of a universe region by the collection

0
filter_lh bool

whether to use likelihood to filter the universe

False
model_file str

path to collection likelihood model

None
cove_folder str

path to folder with coverage tracks

None
cove_prefix str

prefix used for creating tracks

None
lh_cutoff int

minimum likelihood input

0
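The size and coverage thresholds amount to a per-region predicate. A minimal sketch, assuming each universe region is a (chrom, start, end) tuple and that a per-region coverage count is already available in a dict (both assumptions). The likelihood filter (filter_lh, lh_cutoff) is left out because this page does not describe how the likelihood input is computed, and whether the library compares with >= or > is also an assumption. filter_regions is a hypothetical name.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def filter_regions(regions: List[Region], coverage: Dict[Region, int],
                   min_size: int = 0, min_coverage: int = 0) -> List[Region]:
    """Keep regions at least min_size bp long and covered at least min_coverage times."""
    kept = []
    for region in regions:
        _, start, end = region
        size_ok = (end - start) >= min_size
        # regions missing from the coverage table are treated as uncovered
        cove_ok = coverage.get(region, 0) >= min_coverage
        if size_ok and cove_ok:
            kept.append(region)
    return kept
```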

cli

Functions
build_subparser
build_subparser(parser)

Builds argument parser.

Returns:

Type Description

Argument parser

distance

Functions
flexible_distance_between_two_regions
flexible_distance_between_two_regions(region, query)

Calculate distance between a region and a flexible region from a flexible universe

Parameters:

Name Type Description Default
region

region from flexible universe

required
query int

analyzed region

required

Returns:

Type Description

distance
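A flexible universe region can be thought of as allowing a boundary to lie anywhere within a range. A minimal sketch, assuming the flexible boundary is given as a (lo, hi) pair and the query as a single position: the distance is zero inside the range and the gap to the nearer edge otherwise. The representation, the unsigned result, and the name flexible_boundary_distance are assumptions, not the library's.

```python
from typing import Tuple


def flexible_boundary_distance(flex: Tuple[int, int], query: int) -> int:
    """Distance from a query position to a flexible boundary [lo, hi] (hypothetical helper)."""
    lo, hi = flex
    if lo <= query <= hi:
        return 0  # query falls inside the allowed range
    # otherwise measure to the nearer edge of the range
    return lo - query if query < lo else query - hi
```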

distance_between_two_regions
distance_between_two_regions(region, query)

Calculate distance between a region in the database and a region from the query

Parameters:

Name Type Description Default
region [int]

region from hard universe

required
query int

analyzed region

required

Returns:

Type Description

distance

distance_to_closest_region
distance_to_closest_region(db, db_queue, i, current_chrom, unused_db, pos_index, flexible, uni_to_file)

Calculate distance from a given peak to the closest region in the database

Parameters:

Name Type Description Default
db file

database file

required
db_queue list

queue of the last three positions in the database

required
i

analyzed position from the query

required
current_chrom str

current analyzed chromosome from query

required
unused_db list

list of positions from universe that were not compared to query

required
pos_index list

which indexes of the universe region to use for calculating distance

required
flexible bool

whether the universe is flexible

required
uni_to_file bool

whether to calculate distance from universe to file

required

Returns:

Type Description

peak distance to universe
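The core quantity here is, for one peak boundary, the distance to the nearest universe boundary on the same chromosome. The documented functions walk the sorted universe file while keeping a queue of the last three positions; the sketch below swaps in a binary search (bisect) over an in-memory sorted list to compute the same quantity. It assumes boundaries are plain integer positions and returns an absolute distance; whether the library keeps the sign is not stated on this page. distance_to_closest is a hypothetical name.

```python
import bisect
from typing import List


def distance_to_closest(sorted_positions: List[int], query: int) -> int:
    """Absolute distance from query to the nearest position in a sorted list."""
    if not sorted_positions:
        raise ValueError("no universe positions on this chromosome")
    i = bisect.bisect_left(sorted_positions, query)
    best = None
    # only the neighbours on either side of the insertion point can be closest
    for j in (i - 1, i):
        if 0 <= j < len(sorted_positions):
            d = abs(sorted_positions[j] - query)
            if best is None or d < best:
                best = d
    return best
```

The queue-based streaming approach avoids loading the whole universe into memory, which is the likely reason the library does not use a lookup like this one.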

read_in_new_universe_regions
read_in_new_universe_regions(db, q_chrom, current_chrom, unused_db, db_queue, waiting, pos_index)

Read in new universe regions closest to the peak

Parameters:

Name Type Description Default
db file

universe file

required
q_chrom str

new peak's chromosome

required
current_chrom str

chromosome that was analyzed so far

required
unused_db list

list of positions from universe that were not compared to query

required
db_queue list

queue of the last three positions in the universe

required
waiting bool

whether iterating through the file without calculating distance, because the current chromosome is not present in the universe

required
pos_index list

which indexes of the universe region to use for calculating distance

required

Returns:

Type Description

whether iterating through a chromosome not present in the universe; current chromosome in the query

calc_distance_between_two_files
calc_distance_between_two_files(universe, q_folder, q_file, flexible, save_each, folder_out, pref, uni_to_file=False)

Main function for calculating distance from regions in the query file to regions in the database

Parameters:

Name Type Description Default
universe str

path to universe

required
q_folder str

path to folder containing query files

required
q_file str

query file

required
flexible bool

whether the universe is flexible

required
save_each bool

whether to save calculated distances for each file

required
folder_out str

output folder

required
pref str

prefix used as the name of the folder containing calculated distance for each file

required
uni_to_file

whether to calculate distance from universe to file

False

Returns:

Type Description

file name; median of distances of starts to starts in universe; median of distances of ends to ends in universe

run_distance
run_distance(folder, file_list, universe, no_workers, flexible=False, folder_out=None, pref=None, save_each=False, uni_to_file=False)

For a group of files, calculate the distance to the nearest region in the universe

Parameters:

Name Type Description Default
folder str

path to folder containing query files

required
file_list str

path to file containing list of query files

required
universe str

path to universe file

required
no_workers int

number of parallel processes

required
flexible bool

whether the universe is flexible

False
folder_out str

output folder

None
pref str

prefix used for saving

None
save_each bool

whether to save calculated distances for each file

False
uni_to_file

whether to calculate distance from universe to file

False

Returns:

Type Description

mean of median distances from starts in query to the nearest starts in universe; mean of median distances from ends in query to the nearest ends in universe
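The return value is a two-level summary: a median per file, then a mean across files. A minimal sketch of that aggregation, assuming the per-file lists of start and end distances are already available in a dict keyed by file name (a hypothetical layout; summarize_distances is likewise a hypothetical name).

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def summarize_distances(per_file: Dict[str, Tuple[List[int], List[int]]]) -> Tuple[float, float]:
    """Mean over files of the median start distance and of the median end distance."""
    start_medians = [median(starts) for starts, _ in per_file.values()]
    end_medians = [median(ends) for _, ends in per_file.values()]
    return mean(start_medians), mean(end_medians)
```

Taking the median within each file first keeps a few very distant peaks in one file from dominating, and the outer mean then gives each file equal weight regardless of its peak count.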

intersection

Functions
chrom_cmp
chrom_cmp(a, b)

Return the smaller chromosome name

relationship_helper
relationship_helper(region_a, region_b, only_in, overlap)

For two regions, calculate their overlap; for the earlier region, calculate how many base pairs are only in it

Parameters:

Name Type Description Default
region_a

region that starts first

required
region_b

region that starts second

required
only_in int

number of positions only in region_a so far

required
overlap int

number of overlapping positions so far

required
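For two regions where region_a starts first, the bases of region_a before region_b begins belong only to region_a, and the shared span is the overlap. A minimal sketch on half-open [start, end) intervals; it returns the updated running counts instead of mutating them, and any tail of region_a past the end of region_b is not counted here (an assumption about how the streaming comparison continues). relationship is a hypothetical name.

```python
from typing import Tuple

Region = Tuple[int, int]  # half-open [start, end)


def relationship(region_a: Region, region_b: Region, only_in: int, overlap: int) -> Tuple[int, int]:
    """Update running counts for region_a (which starts first) against region_b."""
    a_start, a_end = region_a
    b_start, b_end = region_b
    if a_start > b_start:
        raise ValueError("region_a must start first")
    # bases of region_a that lie before region_b begins
    only_in += min(a_end, b_start) - a_start
    # bases shared by both regions (zero if they do not touch)
    overlap += max(0, min(a_end, b_end) - b_start)
    return only_in, overlap
```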
two_region_intersection_diff
two_region_intersection_diff(region_d, region_q, only_in_d, only_in_q, inside_d, inside_q, overlap, start_d, start_q, waiting_d, waiting_q)

Check the mutual position of two regions and calculate their intersection and difference

Parameters:

Name Type Description Default
region_d list

region from universe

required
region_q list

region from query

required
only_in_d int

number of base pairs only in universe

required
only_in_q int

number of base pairs only in query

required
inside_d bool

whether there is still part of the region from universe to analyse

required
inside_q bool

whether there is still part of the region from query to analyse

required
overlap int

size of overlap

required
start_d int

start position of currently analyzed universe region

required
start_q int

start position of currently analyzed query region

required
waiting_d bool

whether waiting for the query to finish chromosome

required
waiting_q bool

whether waiting for the universe to finish chromosome

required
read_in_new_line
read_in_new_line(region, start, chrom, inside, waiting, lines, c_chrom, not_e)

Read in a new line from query or universe file

calc_diff_intersection
calc_diff_intersection(db, folder, query)

Difference and overlap of two files at the base pair level

Parameters:

Name Type Description Default
db str

path to universe file

required
folder str

path to folder with query file

required
query str

query file name

required

Returns:

Type Description

file name; bp only in universe; bp only in query; overlap in bp
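At the base-pair level, all three numbers fall out of a single sweep over two sorted region lists. A minimal single-chromosome sketch, assuming half-open intervals that are sorted and non-overlapping within each list (the state that sorting and merging produces); the documented functions instead stream the files line by line. bp_diff_intersection is a hypothetical name, and its return order mirrors the documented one.

```python
from typing import List, Tuple


def bp_diff_intersection(universe: List[Tuple[int, int]],
                         query: List[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (bp only in universe, bp only in query, overlap bp) for one chromosome."""
    total_u = sum(e - s for s, e in universe)
    total_q = sum(e - s for s, e in query)
    overlap = 0
    i = j = 0
    while i < len(universe) and j < len(query):
        (us, ue), (qs, qe) = universe[i], query[j]
        # shared span of the current pair of intervals
        overlap += max(0, min(ue, qe) - max(us, qs))
        # advance whichever interval ends first
        if ue <= qe:
            i += 1
        else:
            j += 1
    return total_u - overlap, total_q - overlap, overlap
```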

run_intersection
run_intersection(folder, file_list, universe, no_workers)

Calculate the base pair intersection of a universe and a group of files

Parameters:

Name Type Description Default
folder str

path to folder containing query files

required
file_list str

path to file containing list of query files

required
universe str

path to universe file

required
no_workers int

number of parallel processes

required
save_to_file str

whether to save median of calculated distances for each file

required
folder_out str

output folder

required
pref str

prefix used for saving

required

Returns:

Type Description

mean of fractions of intersection of file and universe divided by universe size; mean of fractions of intersection of file and universe divided by file size
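Given per-file base-pair counts, the two returned averages are simple ratios. A minimal sketch, assuming universe size = overlap + bp only in universe and file size = overlap + bp only in query, with counts in the (only in universe, only in query, overlap) order documented for calc_diff_intersection. mean_intersection_fractions is a hypothetical name, and empty files or universes would need a guard against division by zero.

```python
from typing import List, Tuple


def mean_intersection_fractions(counts: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """counts holds (only_in_universe, only_in_query, overlap) per file."""
    by_universe, by_file = [], []
    for only_u, only_q, overlap in counts:
        by_universe.append(overlap / (overlap + only_u))  # intersection / universe size
        by_file.append(overlap / (overlap + only_q))      # intersection / file size
    n = len(counts)
    return sum(by_universe) / n, sum(by_file) / n
```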

likelihood

Classes
LhModel
LhModel(model, cove)

Object combining information about the lh model and coverage

Parameters:

Name Type Description Default
model ndarray

lh model array

required
cove ndarray

coverage array

required
Functions
calc_likelihood_hard
calc_likelihood_hard(universe, chroms, model_lh, coverage_folder, coverage_prefix, name, s_index, e_index=None)

Calculate the likelihood of a universe for a given type of model. To be used with the binomial model.

Parameters:

Name Type Description Default
universe

path to universe file

required
chroms list

list of chromosomes present in model

required
model_lh ModelLH

likelihood model

required
coverage_prefix

prefix used in uniwig for creating coverage

required
coverage_folder

path to a folder with genome coverage by tracks

required
name str

suffix of model file name, which contains information about model type

required
s_index int

index in the universe line from which to take the start position of the assessed region

required
e_index int

index in the universe line from which to take the end position of the assessed region

None

Returns:

Type Description

likelihood of universe for given model

hard_universe_likelihood
hard_universe_likelihood(model, universe, coverage_folder, coverage_prefix)

Calculate likelihood of a hard universe based on the core, start, and end coverage models

Parameters:

Name Type Description Default
model str

path to file containing model

required
universe str

path to universe

required
coverage_prefix

prefix used in uniwig for creating coverage

required
coverage_folder

path to a folder with genome coverage by tracks

required

Returns:

Type Description

likelihood

likelihood_only_core
likelihood_only_core(model_file, universe, coverage_folder, coverage_prefix)

Calculate likelihood of universe based only on core coverage model

Parameters:

Name Type Description Default
model_file str

path to file containing model

required
universe str

path to universe

required
coverage_prefix

prefix used in uniwig for creating coverage

required
coverage_folder

path to a folder with genome coverage by tracks

required

Returns:

Type Description

likelihood

background_likelihood
background_likelihood(start, end, model_start, model_cove, model_end)

Calculate likelihood of background for given region

weigh_livelihood
weigh_livelihood(start, end, model_process, model_cove, model_out, reverse)

Calculate weighted likelihood of flexible part of the region

Parameters:

Name Type Description Default
start int

start of the region

required
end int

end of the region

required
model_process array

model for analyzed type of flexible region

required
model_cove array

model for coverage

required
model_out array

model for flexible region that is not being analyzed

required
reverse bool

if model_process corresponds to the end, the weights have to be reversed

required

Returns:

Type Description

likelihood of flexible part of the region

likelihood_flexible_universe
likelihood_flexible_universe(model_file, universe, cove_folder, cove_prefix, save_peak_input=False)

Likelihood of a given universe under the model

Parameters:

Name Type Description Default
model_file str

path to file with lh model

required
universe str

path to universe

required
cove_folder

path to a folder with genome coverage by tracks

required
cove_prefix

prefix used in uniwig for creating coverage

required
save_peak_input bool

whether to save the universe with each peak's lh

False

Returns:

Type Description

lh of the flexible universe

utils

Functions
prep_data
prep_data(folder, file, tmp_file)

Sort and merge a file
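Sorting and merging yields non-overlapping regions per chromosome, which the sweep-style comparisons in this module rely on. A minimal in-memory sketch of the merge step, assuming half-open (chrom, start, end) tuples and that touching regions are joined; the library may delegate this to an external tool, and its exact merge rule (for example, for book-ended regions) is not stated here. sort_and_merge is a hypothetical name, and it orders chromosome names as plain strings.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)


def sort_and_merge(regions: List[Region]) -> List[Region]:
    """Sort by (chrom, start) and merge overlapping or touching regions."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # extend the previous region instead of starting a new one
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```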

check_if_uni_sorted
check_if_uni_sorted(universe)

Check if regions in file are sorted
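A minimal sketch that checks two properties every sorted BED file has, regardless of how chromosome names are ordered: each chromosome forms one contiguous block, and starts never decrease within a block. It deliberately does not check the order of the blocks themselves, since this page does not say which chromosome order the library expects. is_grouped_and_sorted is a hypothetical name.

```python
from typing import Iterable, Tuple


def is_grouped_and_sorted(regions: Iterable[Tuple[str, int, int]]) -> bool:
    """True if chromosomes form contiguous blocks with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, None
    for chrom, start, _end in regions:
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        elif start < prev_start:
            return False  # start went backwards within a chromosome
        else:
            prev_start = start
    return True
```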

process_line
process_line(line)

Helper for reading in bed file line
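A minimal sketch of parsing the first three BED columns, assuming whitespace-separated fields and ignoring any extra columns; the library's helper may return a different structure. parse_bed_line is a hypothetical name.

```python
from typing import Tuple


def parse_bed_line(line: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end) from a BED line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 BED columns, got: {line!r}")
    return fields[0], int(fields[1]), int(fields[2])
```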

chrom_cmp_bigger
chrom_cmp_bigger(a, b)

Natural-order check whether a chromosome name is bigger
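Natural ordering treats digit runs as numbers, so chr10 comes after chr2 rather than between chr1 and chr2. A minimal sketch of such a comparison; natural_key and natural_bigger are hypothetical names, and how the library ranks names such as chrX or chrM against numbered chromosomes is not stated on this page.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into alternating text and integer chunks for natural ordering."""
    # re.split with a capturing group keeps the digit runs; digits become ints
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def natural_bigger(a: str, b: str) -> bool:
    """True if chromosome name a comes after b in natural order."""
    return natural_key(a) > natural_key(b)
```

Because re.split with a capturing group always alternates text and digit chunks starting with text, two keys only ever compare str with str and int with int at the same position.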

process_db_line
process_db_line(dn, pos_index)

Helper for reading in universe bed file line