Inference¶
GADMA’s inference can be run either from the command line or directly from Python.
Command-line¶
Usage of GADMA:
$ gadma --help
GADMA version 2.0.0 by Ekaterina Noskova (ekaterina.e.noskova@gmail.com)
Usage:
gadma -p/--params <params_file>
-e/--extra <extra_params_file>
Instead/With -p/--params and -e/--extra option you can set:
-o/--output <output_dir> output directory.
-i/--input <in.fs>/<in.txt> input file with AFS or in dadi format.
--resume <resume_dir> resume another launch from <resume_dir>.
--only_models flag to take models only from another launch (--resume option).
-h/--help show this help message and exit.
-v/--version show version and exit.
--test run test case.
In case of any questions or problems, please contact: ekaterina.e.noskova@gmail.com
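As a minimal sketch of how the options above combine, the Python snippet below assembles a gadma command line that uses -i and -o instead of a parameters file, as the help text allows. The helper names build_gadma_command and run_gadma, and the file names in.fs and gadma_output, are hypothetical and chosen only for illustration.

```python
import shlex
import subprocess


def build_gadma_command(input_file, output_dir, params_file=None):
    """Assemble a gadma command line from options listed in the help text."""
    command = ["gadma", "-i", input_file, "-o", output_dir]
    if params_file is not None:
        # -p/--params may be given together with -i and -o.
        command += ["-p", params_file]
    return command


def run_gadma(command):
    """Run the command; requires the gadma executable to be on PATH."""
    return subprocess.run(command, check=True)


# Hypothetical file names, used only for illustration.
command = build_gadma_command("in.fs", "gadma_output")
print(shlex.join(command))
```

Calling run_gadma(command) would start the actual launch; it is left out of the snippet so that the sketch can be read and tested without GADMA installed.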
Run optimizer from Python¶
The genetic algorithm pipeline from GADMA is available through gadma’s API by calling the gadma.Inference.optimize_ga function. It works like the usual optimization functions in dadi and moments. A short summary from the API:
gadma.Inference.optimize_ga(data, model_func, engine, args=(), lower_bound=None, upper_bound=None, p_ids=None, X_init=None, Y_init=None, num_init=50, gen_size=10, mut_strength=0.2, const_mut_strength=1.1, mut_rate=0.2, const_mut_rate=1.2, eps=0.01, n_stuck_gen=100, n_elitism=2, p_mutation=0.3, p_crossover=0.3, p_random=0.2, ga_maxiter=None, ga_maxeval=None, local_optimizer='BFGS_log', ls_maxiter=None, ls_maxeval=None, verbose=1, callback=None, save_file=None, eval_file=None, report_file=None)
Runs the genetic algorithm optimizer in order to find the best values of parameters for the model_func demographic model from data.
Parameters
data – Data for demographic inference.
model_func – Function to use for demographic inference. It simulates the SFS, which is compared with data by log-likelihood. It is called as model_func(p, ns, *args), where p is the values of parameters, ns is the sample size and args are other arguments.
engine – Engine id for demographic inference. Could be one of the following:
- ‘dadi’
- ‘moments’
args – Arguments for the model_func function. It is pts for the dadi engine and could be dt_fac (or nothing) for the moments engine.
lower_bound (list) – Lower bound for each demographic parameter.
upper_bound (list) – Upper bound for each demographic parameter.
p_ids (list) – Parameter identifiers for demographic parameters. Each identifier should start with one of the following letters:
- n or N for size of populations;
- t or T for time;
- m or M for migration rates;
- s or S for fractional parameters (between 0 and 1).
For example, valid identifiers are: [‘nu1F’, ‘nu2B’, ‘nu2F’, ‘m’, ‘Tp’, ‘T’]
X_init (list of lists) – List of initial parameter values. The GA will be initialized with those values. It can be used to combine optimizations or to restart one.
Y_init (list) – Values of log-likelihood for the values in X_init.
num_init (int) – Number of initial points to start the genetic algorithm.
gen_size (int) – Size of a generation of the genetic algorithm, that is, the number of individuals/solutions on each step of the GA.
mut_strength (float) – Mean fraction of parameters for mutation in GA.
const_mut_strength (float) – Constant used to change mut_strength during the GA according to the one-fifth rule.
mut_rate (float) – Mean rate of mutation in GA.
const_mut_rate (float) – Constant used to change mut_rate during the GA.
eps (float) – Constant for comparing the log-likelihoods of models. A model is better if its log-likelihood is greater than the log-likelihood of another model by epsilon.
n_stuck_gen (int) – Number of generations for the GA stopping criterion: the GA stops when it cannot improve the model during n_stuck_gen generations.
n_elitism (int) – Number of best models from previous generation in GA that will be taken to new iteration.
p_mutation (float) – Probability of mutation in one generation of GA.
p_crossover (float) – Probability of crossover in one generation of GA.
p_random (float) – Probability of a randomly generated individual in one generation of GA.
ga_maxiter (int) – Maximum number of generations in GA.
ga_maxeval (int) – Maximum number of function evaluations in GA.
local_optimizer (str) – Name of the local optimizer to run on the best solution of the GA. Could be None or one of:
* ‘BFGS’
* ‘BFGS_log’
* ‘L-BFGS-B’
* ‘L-BFGS-B_log’
* ‘Powell’
* ‘Powell_log’
* ‘Nelder-Mead’
* ‘Nelder-Mead_log’
ls_maxiter (int) – Maximum number of iterations in local optimization.
ls_maxeval (int) – Maximum number of function evaluations in local optimization.
verbose (int) – Verbosity of output.
callback – Callback to call during optimization. It is called as callback(x, y).
save_file (str) – File to save the GA’s state on the current generation.
eval_file – File to save all evaluations during GA and local optimization.
report_file – File to write reports of GA and local optimization.
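The parameters above can be put together as in the following minimal sketch. It only prepares keyword arguments for gadma.Inference.optimize_ga and checks the p_ids prefixes described above. The helpers check_p_ids, build_ga_kwargs and run_optimize_ga are hypothetical names, the bounds and the pts grid are illustrative values rather than recommendations, and passing pts as args=(pts,) is an assumption based on the model_func(p, ns, *args) call convention.

```python
VALID_PREFIXES = ("n", "t", "m", "s")


def check_p_ids(p_ids):
    """Return True if every identifier starts with n/N, t/T, m/M or s/S."""
    return all(p_id[:1].lower() in VALID_PREFIXES for p_id in p_ids)


def build_ga_kwargs(p_ids, lower_bound, upper_bound, pts):
    """Collect keyword arguments for optimize_ga with the dadi engine."""
    if not check_p_ids(p_ids):
        raise ValueError("each identifier must start with n, t, m or s")
    if not (len(p_ids) == len(lower_bound) == len(upper_bound)):
        raise ValueError("p_ids and bounds must have the same length")
    return dict(
        engine="dadi",
        args=(pts,),  # assumption: forwarded as model_func(p, ns, pts)
        lower_bound=list(lower_bound),
        upper_bound=list(upper_bound),
        p_ids=list(p_ids),
        local_optimizer="BFGS_log",  # the documented default
    )


def run_optimize_ga(data, model_func, **kwargs):
    """Call the optimizer; GADMA is imported lazily so the sketch loads without it."""
    from gadma import Inference
    return Inference.optimize_ga(data, model_func, **kwargs)


# Identifiers taken from the example above; bounds and pts are illustrative.
p_ids = ["nu1F", "nu2B", "nu2F", "m", "Tp", "T"]
lower = [1e-2, 1e-2, 1e-2, 0.0, 1e-3, 1e-3]
upper = [100.0, 100.0, 100.0, 10.0, 5.0, 5.0]
kwargs = build_ga_kwargs(p_ids, lower, upper, pts=[40, 50, 60])
print(sorted(kwargs))
```

With data and a model function at hand, run_optimize_ga(data, model_func, **kwargs) would start the optimization. The form of the returned value is not described in this section, so the sketch returns it unchanged.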