Inference

GADMA’s inference can be run both from the command line and directly from Python.

Command-line

Usage of GADMA:

$ gadma --help

GADMA version 2.0.0 by Ekaterina Noskova (ekaterina.e.noskova@gmail.com)
Usage:
            gadma   -p/--params <params_file>
                    -e/--extra <extra_params_file>


Instead/With -p/--params and -e/--extra option you can set:
            -o/--output <output_dir>        output directory.
            -i/--input <in.fs>/<in.txt>     input file with AFS or in dadi format.
            --resume <resume_dir>           resume another launch from <resume_dir>.
            --only_models                   flag to take models only from another launch (--resume option).

            -h/--help               show this help message and exit.
            -v/--version            show version and exit.
            --test                  run test case.

    In case of any questions or problems, please contact: ekaterina.e.noskova@gmail.com
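The options above can also be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags shown in the help message; the helper name build_gadma_command and the file names are hypothetical and not part of GADMA.

```python
import shlex


def build_gadma_command(params_file=None, extra_file=None,
                        output_dir=None, input_file=None):
    """Build an argument list for the `gadma` command-line tool.

    Only flags listed in `gadma --help` are used; arguments left as None
    are skipped.
    """
    cmd = ["gadma"]
    if params_file is not None:
        cmd += ["-p", params_file]
    if extra_file is not None:
        cmd += ["-e", extra_file]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if input_file is not None:
        cmd += ["-i", input_file]
    return cmd


# Hypothetical file and directory names, for illustration only.
command = build_gadma_command(params_file="params_file", output_dir="output_dir")
print(shlex.join(command))

# To actually launch the run (requires gadma to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.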

Run optimizer from Python

The genetic algorithm pipeline from GADMA is available through gadma’s API by calling the gadma.Inference.optimize_ga function. It works like the usual optimization functions in dadi and moments. A short excerpt from the API reference:

gadma.Inference.optimize_ga(data, model_func, engine, args=(), lower_bound=None, upper_bound=None, p_ids=None, X_init=None, Y_init=None, num_init=50, gen_size=10, mut_strength=0.2, const_mut_strength=1.1, mut_rate=0.2, const_mut_rate=1.2, eps=0.01, n_stuck_gen=100, n_elitism=2, p_mutation=0.3, p_crossover=0.3, p_random=0.2, ga_maxiter=None, ga_maxeval=None, local_optimizer='BFGS_log', ls_maxiter=None, ls_maxeval=None, verbose=1, callback=None, save_file=None, eval_file=None, report_file=None)

Runs the genetic algorithm optimizer to find the best parameter values of the demographic model model_func for the given data.
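A call might look like the sketch below. It assumes the dadi engine and dadi's built-in Demographics2D.split_mig model, whose parameters are (nu1, nu2, T, m). The bounds, grid sizes and the helper name run_inference are illustrative assumptions, and the sketch has not been checked against a particular GADMA release.

```python
# Illustrative settings; identifiers and bounds are assumptions chosen for
# dadi's split_mig model, whose parameters are (nu1, nu2, T, m).
P_IDS = ["nu1", "nu2", "T", "m"]
LOWER_BOUND = [1e-2, 1e-2, 1e-3, 1e-3]
UPPER_BOUND = [100.0, 100.0, 5.0, 10.0]
PTS = [40, 50, 60]  # dadi grid sizes, handed to model_func through `args`


def run_inference(fs_path):
    """Run GADMA's genetic algorithm on the spectrum stored in fs_path."""
    # Imported here so that this file loads even without dadi/gadma installed.
    import dadi
    import gadma

    data = dadi.Spectrum.from_file(fs_path)
    return gadma.Inference.optimize_ga(
        data=data,
        model_func=dadi.Demographics2D.split_mig,  # called as model_func(p, ns, pts)
        engine="dadi",
        args=(PTS,),
        p_ids=P_IDS,
        lower_bound=LOWER_BOUND,
        upper_bound=UPPER_BOUND,
        local_optimizer="BFGS_log",
        ga_maxiter=2,  # tiny limits so that a trial run finishes quickly
        ls_maxiter=2,
    )
```

With dadi and gadma installed, run_inference("data.fs") would start the optimization on a hypothetical spectrum file data.fs. For a real analysis, ga_maxiter and ls_maxiter would be raised or left at their default of None.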

Parameters
  • data – Data for demographic inference.

  • model_func – Function used for demographic inference: it simulates an SFS that is compared with data by log-likelihood. It is called as model_func(p, ns, *args), where p is the parameter values, ns is the sample size and args are the other arguments.

  • engine – Engine id for demographic inference. Could be one of the following: ‘dadi’ or ‘moments’.

  • args – Extra arguments passed to model_func: pts for the dadi engine, and optionally dt_fac (or nothing) for the moments engine.

  • lower_bound (list) – Lower bound for each demographic parameter.

  • upper_bound (list) – Upper bound for each demographic parameter.

  • p_ids (list) –

    Parameter identifiers for demographic parameters. Each identifier should start with one of the following letters:

    - n or N for population sizes;
    - t or T for times;
    - m or M for migration rates;
    - s or S for fractional parameters (between 0 and 1).

    For example, valid identifiers are: [‘nu1F’, ‘nu2B’, ‘nu2F’, ‘m’, ‘Tp’, ‘T’]

  • X_init (list of lists) – List of initial parameter vectors. The GA will be initialized with these values. It can be used to chain several optimizations or to restart a run.

  • Y_init (list) – Log-likelihood values for the parameter vectors in X_init.

  • num_init (int) – Number of initial points used to start the genetic algorithm.

  • gen_size (int) – Size of a generation in the genetic algorithm, that is, the number of individuals/solutions at each step of the GA.

  • mut_strength (float) – Mean fraction of parameters for mutation in GA.

  • const_mut_strength (float) – Constant used to change mut_strength during the GA according to the one-fifth rule.

  • mut_rate (float) – Mean rate of mutation in GA.

  • const_mut_rate (float) – Constant used to change mut_rate during the GA.

  • eps (float) – Constant used when comparing log-likelihoods of models. A model is considered better if its log-likelihood is greater than that of another model by epsilon.

  • n_stuck_gen (int) – Number of generations used as the stopping criterion: the GA stops when it cannot improve the model during n_stuck_gen generations.

  • n_elitism (int) – Number of best models from the previous generation that are carried over to the next generation of the GA.

  • p_mutation (float) – Probability of mutation in one generation of the GA.

  • p_crossover (float) – Probability of crossover in one generation of the GA.

  • p_random (float) – Probability of a randomly generated individual in one generation of the GA.

  • ga_maxiter (int) – Maximum number of generations in GA.

  • ga_maxeval (int) – Maximum number of function evaluations in GA.

  • local_optimizer (str) – Name of the local optimizer to run on the best solution of the GA. Could be None or one of:

    * ‘BFGS’
    * ‘BFGS_log’
    * ‘L-BFGS-B’
    * ‘L-BFGS-B_log’
    * ‘Powell’
    * ‘Powell_log’
    * ‘Nelder-Mead’
    * ‘Nelder-Mead_log’

  • ls_maxiter (int) – Maximum number of iterations in local optimization.

  • ls_maxeval (int) – Maximum number of function evaluations in local optimization.

  • verbose (int) – Verbosity level of the output.

  • callback – Callback invoked during optimization, called as callback(x, y).

  • save_file (str) – File in which to save the GA’s state at the current generation.

  • eval_file – File to save all evaluations during GA and local optimization.

  • report_file – File to write reports of GA and local optimization.
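Two of the conventions described above, the first-letter rule for p_ids and the interplay of eps and n_stuck_gen, can be sketched as follows. These helpers are illustrative only and are not part of gadma's API. In particular, treating an improvement as "greater by more than eps" is an assumption about how the comparison is made.

```python
# Illustrative helpers only; these are NOT part of gadma's API.

# First-letter convention for p_ids, as described in the parameter list.
P_ID_KINDS = {
    "n": "population size",
    "t": "time",
    "m": "migration rate",
    "s": "fraction (between 0 and 1)",
}


def classify_p_id(p_id):
    """Return the parameter kind implied by the first letter of p_id."""
    if not p_id:
        raise ValueError("empty parameter identifier")
    kind = P_ID_KINDS.get(p_id[0].lower())
    if kind is None:
        raise ValueError(f"identifier {p_id!r} must start with n, t, m or s")
    return kind


def is_stuck(best_per_generation, eps=0.01, n_stuck_gen=100):
    """Return True if the last n_stuck_gen generations brought no improvement.

    best_per_generation holds the best log-likelihood of each generation.
    A value counts as an improvement only if it exceeds the earlier best
    by more than eps (assumed reading of the eps parameter).
    """
    if n_stuck_gen < 1:
        raise ValueError("n_stuck_gen must be at least 1")
    if len(best_per_generation) <= n_stuck_gen:
        return False  # not enough history yet
    reference = max(best_per_generation[:-n_stuck_gen])
    recent_best = max(best_per_generation[-n_stuck_gen:])
    return recent_best <= reference + eps


print([classify_p_id(p) for p in ["nu1F", "nu2B", "nu2F", "m", "Tp", "T"]])
print(is_stuck([-100.0, -90.0, -90.0, -90.0], eps=0.01, n_stuck_gen=2))
```

A larger eps makes the stopping test stricter about what counts as progress, so the GA would stop sooner on a plateau. A larger n_stuck_gen gives it more generations to escape one.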