Inference module
-
gadma.Inference.get_claic_score(func_ex, all_boot, p0, data, engine=None, args=(), eps=0.01, pts=None)
Returns the CLAIC score for a demographic model, computed with the specified value of eps.
- Parameters
func_ex – Custom function that evaluates the demographic model. Usually it is the model_func function from the code generated by GADMA. It is called as func_ex(p, ns, *args), where p holds the parameter values and ns the sample sizes.
all_boot – List of bootstrapped data for CLAIC evaluation.
p0 – Values of parameters for the func_ex demographic model.
data – Original data for CLAIC evaluation. This is the data that was used for demographic inference.
engine – Engine id for likelihood evaluations. Could be one of the following: dadi, moments.
args – Arguments of the func_ex function.
eps – Step size for Hessian and gradient calculations. Usually between 1e-5 and 1e-2. The smaller eps is, the more accurate the CLAIC value is.
pts – Deprecated parameter from GADMA version 1. If it is set, a warning is printed.
- Returns
None if the CLAIC could not be computed because the Hessian matrix is singular. This could be solved by increasing the value of eps.
- Note
There are differences between GADMA v1 and GADMA v2. There is some backward compatibility, but sometimes errors could be raised.
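The advice about a singular Hessian can be turned into a small retry loop. The sketch below is one possible way to do that and is not part of GADMA: the helper name claic_with_increasing_eps and the stand-in fake_score are hypothetical, and score_fn is assumed to wrap a call to get_claic_score with every argument except eps fixed.

```python
def claic_with_increasing_eps(score_fn, eps_values=(1e-5, 1e-4, 1e-3, 1e-2)):
    """Try step sizes from smallest to largest; stop at the first usable score.

    score_fn is any callable that takes eps and returns a CLAIC value or None.
    Returns (eps, score), or (None, None) if every step size failed.
    """
    for eps in sorted(eps_values):
        score = score_fn(eps)
        if score is not None:
            return eps, score
    return None, None


# Stand-in for a real scoring call: it pretends the Hessian is singular
# for step sizes below 1e-3. Purely illustrative, not GADMA behaviour.
def fake_score(eps):
    return None if eps < 1e-3 else 123.4


chosen_eps, claic = claic_with_increasing_eps(fake_score)
print(chosen_eps, claic)
```

With GADMA installed, score_fn could be something like lambda eps: get_claic_score(func_ex, all_boot, p0, data, engine='dadi', args=args, eps=eps), with the arguments prepared as described above. Starting from the smallest step follows the remark that smaller eps values give a more accurate CLAIC.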
-
gadma.Inference.load_data_from_dir(dirname, engine, projections=None, population_labels=None, outgroup=None)
Loads data of SFS type from a directory. The data are assumed to be mutually consistent: for example, they could be bootstrap replicates of one dataset. All data should have the same projections, population labels and so on.
- Parameters
dirname – Path to the directory with data.
engine – Engine id for data loading. Could be one of the following: dadi, moments.
projections – Sample sizes of the data. If None, they will be chosen automatically.
population_labels – Labels of populations in the data.
outgroup – If True, an outgroup is represented in the files, and the unfolded SFS will be loaded if an SFS is needed.
-
gadma.Inference.optimize_ga(data, model_func, engine, args=(), lower_bound=None, upper_bound=None, p_ids=None, X_init=None, Y_init=None, num_init=50, gen_size=10, mut_strength=0.2, const_mut_strength=1.1, mut_rate=0.2, const_mut_rate=1.2, eps=0.01, n_stuck_gen=100, n_elitism=2, p_mutation=0.3, p_crossover=0.3, p_random=0.2, ga_maxiter=None, ga_maxeval=None, local_optimizer='BFGS_log', ls_maxiter=None, ls_maxeval=None, verbose=1, callback=None, save_file=None, eval_file=None, report_file=None)
Runs the genetic algorithm optimizer to find the best parameter values for the model_func demographic model from data.
- Parameters
data – Data for demographic inference.
model_func – Function used for demographic inference. It simulates an SFS that is compared with data by log-likelihood. It is called as model_func(p, ns, *args), where p holds the parameter values, ns the sample sizes and args any other arguments.
engine – Engine id for demographic inference. Could be one of the following: ‘dadi’, ‘moments’.
args – Arguments for the model_func function. It is pts for the dadi engine and could be dt_fac (or nothing) for the moments engine.
lower_bound (list) – Lower bound for each demographic parameter.
upper_bound (list) – Upper bound for each demographic parameter.
p_ids (list) – Parameter identifiers for demographic parameters. Each identifier should start with one of the following letters: n or N for population sizes; t or T for times; m or M for migration rates; s or S for fractional parameters (between 0 and 1).
For example, valid identifiers are: [‘nu1F’, ‘nu2B’, ‘nu2F’, ‘m’, ‘Tp’, ‘T’]
X_init (list of lists) – List of initial parameter sets. The GA will be initialized with these values. It could be used to chain optimizations or to restart a run.
Y_init (list) – Log-likelihood values for the parameter sets in X_init.
num_init (int) – Number of initial points used to start the genetic algorithm.
gen_size (int) – Size of a generation in the genetic algorithm, that is, the number of individuals (solutions) at each step of the GA.
mut_strength (float) – Mean fraction of parameters for mutation in GA.
const_mut_strength (float) – Constant used to change mut_strength during the GA according to the one-fifth rule.
mut_rate (float) – Mean rate of mutation in the GA.
const_mut_rate (float) – Constant used to change mut_rate during the GA.
eps (float) – Constant for comparing the log-likelihoods of models. A model is better if its log-likelihood is greater than the log-likelihood of another model by eps.
n_stuck_gen (int) – Number of generations for the GA stopping criterion: the GA stops when it cannot improve the model during n_stuck_gen generations.
n_elitism (int) – Number of best models from the previous generation that are carried over to the next generation.
p_mutation (float) – Probability of mutation in one generation of the GA.
p_crossover (float) – Probability of crossover in one generation of the GA.
p_random (float) – Probability of a randomly generated individual in one generation of the GA.
ga_maxiter (int) – Maximum number of generations in GA.
ga_maxeval (int) – Maximum number of function evaluations in GA.
local_optimizer (str) – Name of the local optimizer to run on the best solution of the GA. Could be None or one of: ‘BFGS’, ‘BFGS_log’, ‘L-BFGS-B’, ‘L-BFGS-B_log’, ‘Powell’, ‘Powell_log’, ‘Nelder-Mead’, ‘Nelder-Mead_log’.
ls_maxiter (int) – Maximum number of iterations in local optimization.
ls_maxeval (int) – Maximum number of function evaluations in local optimization.
verbose (int) – Verbosity level of the output.
callback – Callback to call during optimization, invoked as callback(x, y).
save_file (str) – File in which to save the GA’s state at the current generation.
eval_file – File to save all evaluations during GA and local optimization.
report_file – File to write reports of GA and local optimization.
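To make the interplay of eps, n_stuck_gen and const_mut_strength more concrete, here is a minimal sketch of the three rules described in the parameter list. It is an illustration under stated assumptions, not GADMA's implementation: the function names are hypothetical, "greater by eps" is read as strictly greater than the old value plus eps, and the one-fifth rule is written in one common form (multiply by the constant on success, divide by its fourth root on failure), which may differ from what GADMA does internally.

```python
def is_better(ll_new, ll_old, eps=0.01):
    """Assumed reading of eps: the new log-likelihood must exceed the old by more than eps."""
    return ll_new > ll_old + eps


def update_mut_strength(mut_strength, improved, const_mut_strength=1.1):
    """One common form of the one-fifth rule (an assumption, see the text above).

    Grow the strength after a successful mutation, shrink it by the fourth
    root of the constant after a failed one. With one success in five
    attempts the value stays unchanged.
    """
    if improved:
        new_value = mut_strength * const_mut_strength
    else:
        new_value = mut_strength / const_mut_strength ** 0.25
    return min(new_value, 1.0)  # mut_strength is a fraction, so cap it at 1


def generation_of_stop(best_ll_per_generation, eps=0.01, n_stuck_gen=100):
    """Return the index of the generation where the stuck criterion fires, or None."""
    best = best_ll_per_generation[0]
    stuck = 0
    for generation, ll in enumerate(best_ll_per_generation[1:], start=1):
        if is_better(ll, best, eps):
            best = ll
            stuck = 0
        else:
            stuck += 1
        if stuck >= n_stuck_gen:
            return generation
    return None


# Toy history of best log-likelihoods: one real improvement, then changes below eps.
history = [-100.0, -90.0, -89.995, -89.999, -89.992]
print(generation_of_stop(history, eps=0.01, n_stuck_gen=3))
```

In this reading, a larger eps makes the GA treat more generations as stuck and therefore stop earlier, while a larger n_stuck_gen lets it run longer before giving up.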