Assess Module

Modules

assess

Functions
run_all_assessment_methods
run_all_assessment_methods(raw_data_folder, file_list, universe, no_workers, folder_out, pref, save_each, overlap=False, distance_f_t_u=False, distance_f_t_u_flex=False, distance_u_t_f=False, distance_u_t_f_flex=False)

Assess universe fit to collection using overlap and distance metrics.

Args:
    raw_data_folder (str): path to raw files from the collection
    file_list (str): path to file with list of files in the collection
    universe (str): path to the universe being assessed
    no_workers (int): number of workers for multiprocessing
    folder_out (str): output folder
    pref (str): prefix used for creating output files
    save_each (bool): whether to save the output of distance metrics for each region
    overlap (bool): whether to calculate overlap metrics
    distance_f_t_u (bool): whether to calculate distance from file to universe metrics
    distance_f_t_u_flex (bool): whether to calculate flexible distance from file to universe metrics
    distance_u_t_f (bool): whether to calculate distance from universe to file metrics
    distance_u_t_f_flex (bool): whether to calculate flexible distance from universe to file metrics

get_rbs
get_rbs(f_t_u, u_t_f)

Calculate RBS

get_mean_rbs
get_mean_rbs(folder, file_list, universe, no_workers, flexible=False)

Calculate average RBS of the collection.

Args:
    folder (str): path to folder with the collection
    file_list (str): path to file with list of files in the collection
    universe (str): path to the universe
    no_workers (int): number of workers for multiprocessing
    flexible (bool): whether to calculate the flexible version of the metric

Returns: int: average RBS

get_rbs_from_assessment_file
get_rbs_from_assessment_file(file, cs_each_file=False, flexible=False)

Calculate RBS from a file with per-file metric results.

Args:
    file (str): path to file with assessment results
    cs_each_file (bool): whether to report RBS for each file instead of the average for the collection
    flexible (bool): whether to use the flexible version of the metric

get_f_10_score
get_f_10_score(folder, file_list, universe, no_workers)

Get F10 score for a universe and a collection of files.

Args:
    folder (str): path to folder with the collection
    file_list (str): path to file with list of files in the collection
    universe (str): path to the universe
    no_workers (int): number of workers for multiprocessing

Returns: int: average F10 score
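As an illustration of what an F10 score measures, here is a minimal sketch that assumes F10 is the standard F-beta score with beta = 10, computed from base-pair counts. The helper names `f_beta` and `mean_f10` are hypothetical, and the choice of which fraction plays the role of precision and which of recall is an assumption, not something this page confirms.

```python
def f_beta(only_universe_bp, only_file_bp, overlap_bp, beta=10.0):
    """F-beta from base-pair counts (hypothetical helper, not the module's API)."""
    universe_size = overlap_bp + only_universe_bp
    file_size = overlap_bp + only_file_bp
    if overlap_bp == 0 or universe_size == 0 or file_size == 0:
        return 0.0
    # Assumption: precision is the fraction of the universe covered by the file,
    # recall is the fraction of the file covered by the universe.
    precision = overlap_bp / universe_size
    recall = overlap_bp / file_size
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f10(per_file_counts):
    """Average F10 over (only_universe_bp, only_file_bp, overlap_bp) tuples."""
    scores = [f_beta(*counts) for counts in per_file_counts]
    return sum(scores) / len(scores)
```

With beta = 10, a universe that misses part of a file is penalized far more than a universe that contains extra regions, which is why the score favors universes that cover the collection well.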

get_f_10_score_from_assessment_file
get_f_10_score_from_assessment_file(file, f10_each_file=False)

Get F10 score from assessment output file.

Args:
    file (str): path to file with assessment results
    f10_each_file (bool): whether to report F10 for each file instead of the average for the collection

get_likelihood
get_likelihood(model_file, universe, cove_folder, cove_prefix='all', flexible=False, save_peak_input=False)

Calculate the likelihood of a universe given a collection.

Args:
    model_file (str): path to file with likelihood model
    universe (str): path to the universe
    cove_folder (str): path to the coverage folder
    cove_prefix (str): prefix used for generating coverage
    flexible (bool): whether to calculate flexible likelihood
    save_peak_input (bool): whether to save the likelihood input of each region

filter_universe
filter_universe(universe, universe_filtered, min_size=0, min_coverage=0, filter_lh=False, model_file=None, cove_folder=None, cove_prefix=None, lh_cutoff=0)

Filter universe by region size, coverage by the collection, and likelihood.

Args:
    universe (str): path to input universe
    universe_filtered (str): path to output filtered universe
    min_size (int): minimum size of a region in the output universe
    min_coverage (int): minimum coverage of a universe region by the collection
    filter_lh (bool): whether to use likelihood to filter the universe
    model_file (str): path to collection likelihood model
    cove_folder (str): path to folder with coverage tracks
    cove_prefix (str): prefix used for creating tracks
    lh_cutoff (int): minimum likelihood input
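The size criterion can be pictured with a small sketch. It is not the module's implementation: `filter_by_size` is a hypothetical helper, and it assumes BED-style lines (chromosome, start, end) and an inclusive cutoff (`end - start >= min_size`). Coverage and likelihood filtering are not shown.

```python
def filter_by_size(lines, min_size=0):
    """Yield BED lines whose region length (end - start) is at least min_size.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        if end - start >= min_size:
            yield line


universe_lines = ["chr1\t0\t50\n", "chr1\t100\t400\n", "chr2\t10\t110\n"]
kept = list(filter_by_size(universe_lines, min_size=100))
```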

cli

Functions
build_subparser
build_subparser(parser)

Builds argument parser.

Returns: Argument parser

distance

Functions
flexible_distance_between_two_regions
flexible_distance_between_two_regions(region, query)

Calculate distance between region and flexible region from flexible universe.

Args:
    region ([int, int]): region from flexible universe
    query (int): analyzed region

Returns: int: distance

distance_between_two_regions
distance_between_two_regions(region, query)

Calculate distance between region in database and region from the query.

Args:
    region ([int]): region from hard universe
    query (int): analyzed region

Returns: int: distance
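The two distance functions above differ in what the universe side looks like: a single position for a hard universe, an interval for a flexible one. The sketch below is one plausible reading of those argument shapes, not the module's code. The names `point_distance_hard` and `point_distance_flexible` are hypothetical, and treating any position inside the flexible interval as distance 0 is an assumption.

```python
def point_distance_hard(region, query):
    """Distance from a query position to a single universe position."""
    return abs(region[0] - query)


def point_distance_flexible(region, query):
    """Distance from a query position to an interval [low, high].

    Assumption: a position inside the interval has distance 0.
    """
    low, high = region
    if query < low:
        return low - query
    if query > high:
        return query - high
    return 0
```

For example, a query start at 230 is 30 bp away from the flexible boundary `[100, 200]`, while a start at 150 falls inside it and contributes no distance.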

distance_to_closest_region
distance_to_closest_region(db, db_queue, i, current_chrom, unused_db, pos_index, flexible, uni_to_file)

Calculate distance from given peak to the closest region in database.

Args:
    db (file): database file
    db_queue (list): queue of the last three positions in the database
    i: analyzed position from the query
    current_chrom (str): chromosome currently analyzed in the query
    unused_db (list): list of positions from the universe that were not compared to the query
    pos_index (list): which indexes of the universe region to use for calculating distance
    flexible (bool): whether the universe is flexible
    uni_to_file (bool): whether to calculate distance from universe to file

Returns: int: peak distance to universe

read_in_new_universe_regions
read_in_new_universe_regions(db, q_chrom, current_chrom, unused_db, db_queue, waiting, pos_index)

Read in new universe regions closest to the peak.

Args:
    db (file): universe file
    q_chrom (str): new peak's chromosome
    current_chrom (str): chromosome that was analyzed so far
    unused_db (list): list of positions from the universe that were not compared to the query
    db_queue (list): queue of the last three positions in the universe
    waiting (bool): whether to iterate through the file without calculating distance because the current chromosome is not present in the universe
    pos_index (list): which indexes of the universe region to use for calculating distance

Returns: tuple: (bool, str) - whether iterating through a chromosome not present in the universe; current chromosome in query

calc_distance_between_two_files
calc_distance_between_two_files(universe, q_folder, q_file, flexible, save_each, folder_out, pref, uni_to_file=False)

Main function for calculating distances from regions in the query file to regions in the database.

Args:
    universe (str): path to universe
    q_folder (str): path to folder containing query files
    q_file (str): query file
    flexible (bool): whether the universe is flexible
    save_each (bool): whether to save calculated distances for each file
    folder_out (str): output folder
    pref (str): prefix used as the name of the folder containing calculated distances for each file
    uni_to_file (bool): whether to calculate distance from universe to file

Returns: tuple: (str, int, int) - file name; median of distances from starts to starts in universe; median of distances from ends to ends in universe

run_distance
run_distance(folder, file_list, universe, no_workers, flexible=False, folder_out=None, pref=None, save_each=False, uni_to_file=False)

For a group of files, calculate the distance to the nearest region in the universe.

Args:
    folder (str): path to folder containing query files
    file_list (str): path to file containing list of query files
    universe (str): path to universe file
    no_workers (int): number of parallel processes
    flexible (bool): whether the universe is flexible
    folder_out (str): output folder
    pref (str): prefix used for saving
    save_each (bool): whether to save calculated distances for each file
    uni_to_file (bool): whether to calculate distance from universe to file

Returns: tuple: (float, float) - mean of median distances from starts in query to the nearest starts in universe; mean of median distances from ends in query to the nearest ends in universe
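The "mean of median distances" summary can be illustrated with a self-contained sketch. It is a simplification under stated assumptions: a single chromosome, in-memory lists of start positions rather than files, and a binary search instead of the streaming queue the functions above describe. `nearest_distance` and `mean_of_median_distances` are hypothetical names.

```python
from bisect import bisect_left
from statistics import mean, median


def nearest_distance(sorted_positions, query):
    """Distance from query to the closest value in a sorted, non-empty list."""
    i = bisect_left(sorted_positions, query)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(abs(query - p) for p in candidates)


def mean_of_median_distances(universe_starts, files_starts):
    """Median nearest distance per file, then the mean of those medians."""
    medians = [
        median(nearest_distance(universe_starts, q) for q in starts)
        for starts in files_starts
    ]
    return mean(medians)


universe_starts = [100, 500, 900]
files_starts = [[120, 480, 900], [50, 1000]]
summary = mean_of_median_distances(universe_starts, files_starts)
```

Taking the median within each file keeps a few distant outlier peaks from dominating that file's value; the mean across files then summarizes the collection.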

intersection

Functions
chrom_cmp
chrom_cmp(a, b)

Return smaller chromosome name

relationship_helper
relationship_helper(region_a, region_b, only_in, overlap)

For two regions, calculate their overlap; for the earlier region, calculate how many base pairs are only in it.

Args:
    region_a ([int, int]): region that starts first
    region_b ([int, int]): region that starts second
    only_in (int): number of positions only in a so far
    overlap (int): number of overlapping positions so far

two_region_intersection_diff
two_region_intersection_diff(region_d, region_q, only_in_d, only_in_q, inside_d, inside_q, overlap, start_d, start_q, waiting_d, waiting_q)

Check the relative position of two regions and calculate their intersection and difference.

Args:
    region_d (list): region from universe
    region_q (list): region from query
    only_in_d (int): number of base pairs only in universe
    only_in_q (int): number of base pairs only in query
    inside_d (bool): whether there is still part of the region from universe to analyze
    inside_q (bool): whether there is still part of the region from query to analyze
    overlap (int): size of overlap
    start_d (int): start position of currently analyzed universe region
    start_q (int): start position of currently analyzed query region
    waiting_d (bool): whether waiting for the query to finish chromosome
    waiting_q (bool): whether waiting for the universe to finish chromosome

read_in_new_line
read_in_new_line(region, start, chrom, inside, waiting, lines, c_chrom, not_e)

Read in a new line from query or universe file

calc_diff_intersection
calc_diff_intersection(db, folder, query)

Difference and overlap of two files on base pair level.

Args:
    db (str): path to universe file
    folder (str): path to folder with query file
    query (str): query file name

Returns: tuple: (str, int, int, int) - file name; bp only in universe; bp only in query; overlap in bp
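To make the three base-pair counts concrete, here is a minimal two-pointer sketch. It is not the module's streaming implementation: it assumes both inputs are already sorted, non-overlapping, half-open `(start, end)` intervals on a single chromosome held in memory, and `bp_diff_intersection` is a hypothetical name.

```python
def bp_diff_intersection(universe, query):
    """Return (only_in_universe, only_in_query, overlap) in base pairs.

    Assumes sorted, non-overlapping, half-open (start, end) intervals
    on one chromosome.
    """
    overlap = 0
    i = j = 0
    while i < len(universe) and j < len(query):
        u_start, u_end = universe[i]
        q_start, q_end = query[j]
        # Shared part of the two current intervals (0 if they are disjoint).
        overlap += max(0, min(u_end, q_end) - max(u_start, q_start))
        # Advance whichever interval ends first.
        if u_end <= q_end:
            i += 1
        else:
            j += 1
    total_universe = sum(end - start for start, end in universe)
    total_query = sum(end - start for start, end in query)
    return total_universe - overlap, total_query - overlap, overlap


counts = bp_diff_intersection([(0, 100), (200, 300)], [(50, 250), (400, 450)])
```

Dividing the overlap by the universe size or by the file size gives the two fractions that `run_intersection` reports as means over the collection.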

run_intersection
run_intersection(folder, file_list, universe, no_workers)

Calculate the base pair intersection of universe and group of files.

Args:
    folder (str): path to folder containing query files
    file_list (str): path to file containing list of query files
    universe (str): path to universe file
    no_workers (int): number of parallel processes
    save_to_file (str): whether to save median of calculated distances for each file
    folder_out (str): output folder
    pref (str): prefix used for saving

Returns: tuple: (float, float) - mean of fractions of intersection of file and universe divided by universe size; mean of fractions of intersection of file and universe divided by file size

likelihood

Classes
LhModel
LhModel(model, cove)

Object with combined information about lh model and coverage.

Args:
    model (ndarray): lh model array
    cove (ndarray): coverage array

Functions
calc_likelihood_hard
calc_likelihood_hard(universe, chroms, model_lh, coverage_folder, coverage_prefix, name, s_index, e_index=None)

Calculate the likelihood of a universe for a given type of model. To be used with the binomial model.

Args:
    universe (str): path to universe file
    chroms (list): list of chromosomes present in the model
    model_lh (ModelLH): likelihood model
    coverage_folder: path to a folder with genome coverage by tracks
    coverage_prefix: prefix used in uniwig for creating coverage
    name (str): suffix of the model file name, which contains information about the model type
    s_index (int): position in the universe line from which to take the start of the assessed region
    e_index (int): position in the universe line from which to take the end of the assessed region

Returns: float: likelihood of universe for given model

hard_universe_likelihood
hard_universe_likelihood(model, universe, coverage_folder, coverage_prefix)

Calculate the likelihood of a hard universe based on the core, start, and end coverage models.

Args:
    model (str): path to file containing model
    universe (str): path to universe
    coverage_folder: path to a folder with genome coverage by tracks
    coverage_prefix: prefix used in uniwig for creating coverage

Returns: float: likelihood

likelihood_only_core
likelihood_only_core(model_file, universe, coverage_folder, coverage_prefix)

Calculate the likelihood of a universe based only on the core coverage model.

Args:
    model_file (str): path to file containing model
    universe (str): path to universe
    coverage_folder: path to a folder with genome coverage by tracks
    coverage_prefix: prefix used in uniwig for creating coverage

Returns: float: likelihood

background_likelihood
background_likelihood(start, end, model_start, model_cove, model_end)

Calculate likelihood of background for given region

weigh_livelihood
weigh_livelihood(start, end, model_process, model_cove, model_out, reverse)

Calculate the weighted likelihood of the flexible part of the region.

Args:
    start (int): start of the region
    end (int): end of the region
    model_process (array): model for the analyzed type of flexible region
    model_cove (array): model for coverage
    model_out (array): model for the flexible region that is not being analyzed
    reverse (bool): whether to reverse the weights, which is needed when model_process corresponds to the end

Returns: float: likelihood of flexible part of the region

likelihood_flexible_universe
likelihood_flexible_universe(model_file, universe, cove_folder, cove_prefix, save_peak_input=False)

Likelihood of a given universe under the model.

Args:
    model_file (str): path to file with lh model
    universe (str): path to universe
    cove_folder: path to a folder with genome coverage by tracks
    cove_prefix: prefix used in uniwig for creating coverage
    save_peak_input (bool): whether to save the universe with the lh of each peak

Returns: float: lh of the flexible universe

utils

Functions
prep_data
prep_data(folder, file, tmp_file)

File sort and merge

check_if_uni_sorted
check_if_uni_sorted(universe)

Check if regions in file are sorted
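What "sorted" means for a universe file is not spelled out here. One common reading, sketched below as an assumption, is that each chromosome appears as one contiguous block and start positions never decrease within a block. `is_bed_sorted` is a hypothetical helper operating on BED-style lines, and it does not check any particular chromosome order.

```python
def is_bed_sorted(lines):
    """True if each chromosome is one contiguous block with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        elif start < prev_start:
            return False  # start position went backwards
        else:
            prev_start = start
    return True
```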

process_line
process_line(line)

Helper for reading in bed file line

chrom_cmp_bigger
chrom_cmp_bigger(a, b)

Natural-order check of whether one chromosome name is bigger than another
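Natural ordering means digit runs compare as numbers, so `chr2` sorts before `chr10`. The sketch below shows one way to get that behavior; it is an assumption about what "natural" means here, not the module's code, and `natural_key` and `chrom_is_bigger` are hypothetical names.

```python
import re


def natural_key(chrom):
    """Split a name into alternating text and integer parts, e.g. 'chr10' -> ['chr', 10, '']."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", chrom)]


def chrom_is_bigger(a, b):
    """True if chromosome name a sorts after b in natural order."""
    return natural_key(a) > natural_key(b)
```

Because the split always alternates text and digit parts, keys from different names line up by type and can be compared without errors.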

process_db_line
process_db_line(dn, pos_index)

Helper for reading in universe bed file line