Confidence intervals

GADMA contains special scripts for Confidence Intervals evaluation. To get CI one will need correctly bootstrapped data. If SNP’s that were used for AFS are unlinked, then usual bootstrap over them should be performed. However, if they are linked then block bootstrap should be used. It is done over the unlinked regions of genome.

When bootstrapped data is ready, one should run two scripts gadma-run_ls_on_boot_data and gadma-get_confidence_intervals in order to get CI. One can find example here.

Run local search on bootstrapped data

First script gadma-run_ls_on_boot_data runs local search from known optimum for initial AFS (the one that GADMA found) for each AFS from bootstrap. The usage is following:

$ gadma-run_ls_on_boot_data --help

usage: GADMA module for runs of local search on bootstrapped data.
Is needed for calculating confidence intervals.
       [-h] -b <dir> -d <filename> -o <dir> [-j N]
       [--opt log/powell] [-p <filename>]

optional arguments:
  -h, --help            show this help message and exit
  -b <dir>, --boots <dir>
                        Directory where bootstrapped data is
                        located.
  -d <filename>, --dem_model <filename>
                        File with demographic model. Should
                        contain `model_func` or `generated_model`
                        function. One can put there several extra
                        parameters and they will be taken
                        automatically, otherwise one will need to
                        enter them manually. Such parameters are:
                        1) p0 (or popt) - initial parameters values
                        2) lower_bound - list of lower bounds for
                        parameters values
                        4) upper_bound - list of upper bounds for
                        parameters values
                        5) par_labels/param_labels - list of
                        string names for parameters
                        6) pts - pts for dadi (if there is no pts
                        then moments will be run automatically).
  -o <dir>, --output <dir>
                        Output directory.
  -j N, --jobs N        Number of threads for parallel run.
  --opt log/powell      Local search algorithm, by now it can be:
                        1) `log` - Inference.optimize_log
                        2) `powell` - Inference.optimize_powell.
  -p <filename>, --params <filename>
                        Filename with parameters, should be valid
                        python file.
                        Parameters are presented in -d/--dem_model
                        option description upper.

After the run, there will be pandas table in output directory with two different extensions - result_table.pkl and result_table.csv. It contains parameters for each bootstrap. At this point of time it is possible to change its units and add new parameters with additional manipulations in Python and then run gadma-get_confidence_intervals to get CI.

Get Confidence Intervals from table

After the gadma-run_ls_on_boot_data the result table will be in output directory. One can change its columns due to what parameters should be used for CI. For example, it is possible to translate units from genetic and add such parameter as size of ancestral population (N_A). To do it user should write script.

To calculate confidence intervals:

$ gadma-get_confidence_intervals --help
usage: GADMA module for calculating confidence
        intervals from the result table of local
        search runs on bootstrapped data.
       [-h] [--log] [--tex] [--acc N] <filename>

positional arguments:
  <filename>  Filename (.csv or .pkl) with result from
              local search runs on bootstrapped data.
              Output of gadma-run_ls_on_boot_data.

optional arguments:
  -h, --help  show this help message and exit
  --log       If log then logarithm will be used to
              calculate confidence intervals.
  --tex       Tex output.
  --acc N     Accuracy of output (dafault: 5).