Inference module

gadma.Inference.get_claic_score(func_ex, all_boot, p0, data, engine=None, args=(), eps=0.01, pts=None)

Returns the CLAIC score for a demographic model, computed with the specified value of eps.

Parameters
  • func_ex – Custom function that evaluates the demographic model. Usually it is the model_func function from the code generated by GADMA. It is called as func_ex(p, ns, *args), where p is the parameter values and ns is the sample sizes.

  • all_boot – List of bootstrapped data for CLAIC evaluation.

  • p0 – Values of parameters for func_ex demographic model.

  • data – Original data for CLAIC evaluation. This is the data that was used for demographic inference.

  • engine – Engine id for likelihood evaluations. Could be one of the following: ‘dadi’ or ‘moments’.

  • args – Arguments of func_ex function.

  • eps – Step size for Hessian and gradient calculations. Usually between 1e-5 and 1e-2. The smaller eps is, the more accurate the CLAIC value is.

  • pts – Deprecated parameter from GADMA version 1. If it is set, a warning is printed.

returns: None if the CLAIC could not be computed because the Hessian matrix is singular. This can be solved by increasing the value of eps.

note: There are differences between GADMA v1 and GADMA v2. Some backward compatibility is provided, but errors may sometimes be raised.
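The advice about a singular Hessian can be sketched as a small retry loop. This is only an illustration: claic_with_retry and fake_score are hypothetical names that are not part of GADMA, and score_fn is assumed to be a wrapper that takes eps and returns either the CLAIC score or None.

```python
from typing import Callable, Iterable, Optional, Tuple


def claic_with_retry(
    score_fn: Callable[[float], Optional[float]],
    eps_values: Iterable[float] = (1e-5, 1e-4, 1e-3, 1e-2),
) -> Tuple[Optional[float], Optional[float]]:
    """Try step sizes from smallest to largest until a score is returned.

    score_fn is a hypothetical wrapper: it takes eps and returns the CLAIC
    score, or None when the Hessian is singular.
    """
    for eps in sorted(eps_values):
        score = score_fn(eps)
        if score is not None:
            return score, eps
    # Every step size failed.
    return None, None


# In real use score_fn would wrap the documented call, for example:
#   lambda eps: gadma.Inference.get_claic_score(
#       func_ex, all_boot, p0, data, engine="dadi", eps=eps)
# A stand-in that fails for eps below 1e-3 keeps this sketch runnable.
def fake_score(eps: float) -> Optional[float]:
    return None if eps < 1e-3 else 1234.5


score, used_eps = claic_with_retry(fake_score)
print(score, used_eps)
```

Starting from the smallest step keeps the result as accurate as possible, and eps is increased only when the previous attempt returned None.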

gadma.Inference.load_data_from_dir(dirname, engine, projections=None, population_labels=None, outgroup=None)

Loads data of SFS type from the directory. The data is assumed to be consistent, for example bootstraps of one dataset: all files should have the same projections, population labels and so on.

Parameters
  • dirname – Path to the directory with data.

  • engine – Engine id for data loading. Could be one of the following: ‘dadi’ or ‘moments’.

  • projections – Sample sizes of the data. If None, they are chosen automatically.

  • population_labels – Labels of populations in the data.

  • outgroup – If True, an outgroup is present in the files, and the unfolded SFS is loaded when an SFS is needed.

gadma.Inference.optimize_ga(data, model_func, engine, args=(), lower_bound=None, upper_bound=None, p_ids=None, X_init=None, Y_init=None, num_init=50, gen_size=10, mut_strength=0.2, const_mut_strength=1.1, mut_rate=0.2, const_mut_rate=1.2, eps=0.01, n_stuck_gen=100, n_elitism=2, p_mutation=0.3, p_crossover=0.3, p_random=0.2, ga_maxiter=None, ga_maxeval=None, local_optimizer='BFGS_log', ls_maxiter=None, ls_maxeval=None, verbose=1, callback=None, save_file=None, eval_file=None, report_file=None)

Runs the genetic algorithm (GA) optimizer to find the best parameter values of the model_func demographic model for the given data.

Parameters
  • data – Data for demographic inference.

  • model_func – Function used for demographic inference. It simulates an SFS that is compared with the data by log-likelihood. It is called as model_func(p, ns, *args), where p is the parameter values, ns is the sample sizes and args is any other arguments.

  • engine – Engine id for demographic inference. Could be one of the following: ‘dadi’ or ‘moments’.

  • args – Arguments for the model_func function. It is pts for the dadi engine and could be dt_fac (or nothing) for the moments engine.

  • lower_bound (list) – Lower bound for each demographic parameter.

  • upper_bound (list) – Upper bound for each demographic parameter.

  • p_ids (list) –

    Parameter identifiers for demographic parameters. Each identifier should start with one of the following letters:

      - n or N for population sizes;
      - t or T for times;
      - m or M for migration rates;
      - s or S for fractional parameters (between 0 and 1).

    For example, valid identifiers are: [‘nu1F’, ‘nu2B’, ‘nu2F’, ‘m’, ‘Tp’, ‘T’]

  • X_init (list of lists) – List of initial parameter vectors. The GA is initialized with these values. This can be used to chain optimizations or to restart a run.

  • Y_init (list) – Log-likelihood values for the points in X_init.

  • num_init (int) – Number of initial points used to start the genetic algorithm.

  • gen_size (int) – Size of a generation in the genetic algorithm, that is, the number of individuals (solutions) at each step of the GA.

  • mut_strength (float) – Mean fraction of parameters for mutation in GA.

  • const_mut_strength (float) – Constant used to change mut_strength during the GA according to the one-fifth rule.

  • mut_rate (float) – Mean rate of mutation in GA.

  • const_mut_rate (float) – Constant used to change mut_rate during the GA.

  • eps (float) – Constant for comparing the log-likelihoods of models. A model is better if its log-likelihood is greater than the log-likelihood of another model by eps.

  • n_stuck_gen (int) – Number of generations used as the stopping criterion: the GA stops when it cannot improve the model during n_stuck_gen generations.

  • n_elitism (int) – Number of best models from the previous generation that are carried over to the next generation.

  • p_mutation (float) – Probability of mutation in one generation of the GA.

  • p_crossover (float) – Probability of crossover in one generation of the GA.

  • p_random (float) – Probability of a randomly generated individual in one generation of the GA.

  • ga_maxiter (int) – Maximum number of generations in GA.

  • ga_maxeval (int) – Maximum number of function evaluations in GA.

  • local_optimizer (str) – Name of the local optimizer to run on the best solution of the GA. Could be None or one of: ‘BFGS’, ‘BFGS_log’, ‘L-BFGS-B’, ‘L-BFGS-B_log’, ‘Powell’, ‘Powell_log’, ‘Nelder-Mead’, ‘Nelder-Mead_log’.

  • ls_maxiter (int) – Maximum number of iterations in local optimization.

  • ls_maxeval (int) – Maximum number of function evaluations in local optimization.

  • verbose (int) – Verbosity level of the output.

  • callback – Callback to call during optimization. It is called as callback(x, y).

  • save_file (str) – File to save the GA’s state at the current generation.

  • eval_file – File to save all evaluations during GA and local optimization.

  • report_file – File to write reports of GA and local optimization.
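The first-letter convention described for p_ids can be checked before a list is passed to optimize_ga. The sketch below is not part of GADMA: classify_p_ids and the category names are hypothetical and only mirror the rule stated in the parameter description.

```python
from typing import Dict, List

# First-letter convention from the p_ids description (case-insensitive).
# The category names are illustrative labels, not GADMA identifiers.
_CATEGORY_BY_LETTER = {
    "n": "population size",
    "t": "time",
    "m": "migration rate",
    "s": "fraction",
}


def classify_p_ids(p_ids: List[str]) -> Dict[str, str]:
    """Map each identifier to its category, rejecting unknown prefixes."""
    result = {}
    for p_id in p_ids:
        # p_id[:1] is an empty string for an empty identifier, which
        # also falls through to the error below.
        category = _CATEGORY_BY_LETTER.get(p_id[:1].lower())
        if category is None:
            raise ValueError(
                f"identifier {p_id!r} must start with one of n, t, m, s"
            )
        result[p_id] = category
    return result


# The example identifiers from the p_ids description.
kinds = classify_p_ids(["nu1F", "nu2B", "nu2F", "m", "Tp", "T"])
for p_id, kind in kinds.items():
    print(p_id, "->", kind)
```

Checking identifiers up front gives a clear error message before any optimization work starts.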