Output¶
Note
Output has changed a little in GADMA version 2.
GADMA puts all files to the directory that user set through -o/--output
command-line option:
$ gadma -o output_dir -i input_fs.fs
or through Output directory
option in the parameter file:
# param_file
...
Output directory : output_dir
...
Stdout and log file¶
GADMA prints its progress about every minute in stdout and in output_dir/GADMA.log
file:
[hhh:mm:ss]
All best logLL models:
GA number logLL AIC Model
All best AIC models:
GA number logLL AIC Model
--Best model by log likelihood--
Log likelihood: Best logLL
with AIC score: AIC_score
Model: representation of best logLL model
--Best model by AIC score--
Log likelihood: logLL
with AIC score: Best AIC score
Model: representation of best AIC model
Note
One can set Silence
option in the parameter file to True
to disable output in stdout, file py{output_dir/GADMA.log} will still have it.
Model representation¶
Every model is printed as a line of parameters, each parameter followed by its name in brackets.
Consider designations:
T
- time,s
- percent of split,nu
- size of population number py{i},d
- dynamic of changing of the size of population number py{i},m
- mutation rate from population py{i} to population py{j}.
Dynamic of population size change has string representations:
'Sud'
- sudden change of size for population number py{i};'Lin'
- linear change of size for population number py{i};'Exp'
- exponential change of size for population number py{i}.
Model is printed as sequence of time intervals and splits that are represented in the following way:
First period (
NAnc
— size of ancestry population):[Nanc = VALUE]Split:
If split divide population
X
of sizeNU
into two new populations by fractions1
:[ X pop split VALUE (s1) [VALUE_1(s1*NU), VALUE_2((1-s1)*NU)]If split divide population
X
of sizeNU
into two new populations without any fraction parameter (SettingSplit fractions
isFalse
):[ X pop split [VALUE_1(NU_1), VALUE_2(NU_2)]Usual time period:
If there is one population:
[ T_VALUE (t), [ NU_VALUE (nu) ], [D_VALUE (dyn)] ]If there are two populations:
[ T_VALUE (t), [ NU1_VALUE (nu1), NU2_VALUE (nu2)], [[None, M12_VALUE(m12)], [M21_VALUE (m21), None]], [D1_VALUE (dyn1), D2_VALUE (dyn2)]]}And similar if there are three populations.
At the end
theta
could be printed if length of sequence and mutation rate are known.
Also at the end of the string that corresponds to the model there is an information about model’s ancestry in the genetic algorithm:
- ‘c’ - for model, that is child of crossover,
- ‘r’ - if it was formed random way,
- ‘m’ - if it was mutated,
- ‘f’ - final model of genetic algorithm.
Note
‘m’ is added as many times as the model is mutated.
Example of the demographic model for two populations:
[Nanc = 7214] [ [ 7211(t1), [17004(nu11)], [Lin(dyn11)] ], [ 1 pop split 99.85% (s1) [16978.164(s1*nu11), 25.836((1-s1)*nu11)] ], [ 1365(t2), [12570(nu21), 8922(nu22)], [[0, 6.45e-05(m2_12)], [5.98e-05(m2_21), 0]], [Sud(dyn21), Lin(dyn22)] ] ] (theta = 2739.60)
Relative parameters¶
Sometimes it is more important to see parameters scaled to Nref = N_A
. To tell GADMA to show models with scaled parameters, option Relative parameters
should be set to True
. By default, it is False
. It is convenient when Theta0
is unknown.
Output directory content¶
For every repeat of the genetic algorithm GADMA creates a new folder in the output directory with corresponding number.
In every folder there is GADMA_GA.log
, where every iteration of the algorithm is saved, pictures and generated code of best models are saved in pictures
and code
directories of each run. eval_file
and save_file
have information about evaluations and optimization.
When the genetic algorithm finishes GADMA saves pictures and python code of obtained model in the corresponding folder.
When all GA are executed, the codes are saved in the root directory.
- <output_dir>
- 1
GADMA_GA.log
- pictures
- code
- dadi
- moments
- demes
final_best_logLL_model_dadi_code.py
final_best_logLL_model_moments_code.py
final_best_logLL_model_demes_code.yml
final_best_logLL_model.png
eval_file
save_file
- 2
GADMA_GA.log
- pictures
- code
- dadi
- moments
- demes
final_best_logLL_model_dadi_code.py
final_best_logLL_model_moments_code.py
final_best_logLL_model_demes_code.yml
final_best_logLL_model.png
eval_file
save_file
params
extra_params
GADMA.log
best_logLL_model.png
best_logLL_model_dadi_code.py
best_logLL_model_moments_code.py
best_logLL_model_demes_code.yml
best_aic_model.png
best_aic_model_dadi_code.py
best_aic_model_moments_code.py
best_aic_model_demes_code.yml
Generated code of models¶
By default, GADMA generates Python code only for final models for all available engines (dadi
, moments
). However, it can do it every N
iteration of the genetic algorithm. In this case option Print models code every N iteration
should be set in the parameter file. GADMA saves files with code to the output_dir/<GA_number>/python_code
directory. Both dadi
and moments
code are generated and saved in different folders there.
Moreover GADMA generates file with demographic history for demes
library. In contrast to python code of dadi
and moments
it is file in YAML format.
Each code contains the function of the model, which takes values of the parameters as input, and strings that load observed AFS, simulates expected AFS from the model’s function and calculates log-likelihood of two AFS’. The calculated log-likelihood is printed to stdout. For the moments
code, a picture is also drawn.
All code can be run in the following way:
$ python file_with_code.py
Example of generated code
import moments
import numpy as np
def model_func(params, ns):
t1, nu11, s1, t2, nu21, nu22, m2_12, m2_21 = params
sts = moments.LinearSystem_1D.steady_state_1D(np.sum(ns))
fs = moments.Spectrum(sts)
nu1_func = lambda t: 1.0 + (nu11 - 1.0) * (t / t1)
fs.integrate(tf=t1, Npop=lambda t: [nu1_func(t)], dt_fac=0.01)
fs = moments.Manips.split_1D_to_2D(fs, ns[0], ns[1])
nu2_func = lambda t: ((1 - s1) * nu11) + (nu22 - ((1 - s1) * nu11)) * (t / t2)
migs = np.array([[0, m2_12], [m2_21, 0]])
fs.integrate(tf=t2, Npop=lambda t: [nu21, nu2_func(t)], m=migs, dt_fac=0.01)
return fs
data = moments.Spectrum.from_file('YRI_CEU.fs')
ns = data.sample_sizes
p0 = [0.4998572004354712, 2.357114661127308, 0.9984806062829666,
0.09461000843655785, 1.7425719794077874, 1.2368394548443258,
0.9299543753642668, 0.8621179886837054]
model = model_func(p0, ns)
ll_model = moments.Inference.ll_multinom(model, data)
print('Model log likelihood (LL(model, data)): {0}'.format(ll_model))
theta = moments.Inference.optimal_sfs_scaling(model, data)
print('Optimal value of theta: {0}'.format(theta))
theta0 = 0.37976
Nanc = int(theta / theta0)
print('Size of ancestral population: {0}'.format(Nanc))
plot_ns = [4 for _ in ns] # small sizes for fast drawing
gen_mod = moments.ModelPlot.generate_model(model_func,
p0, plot_ns)
moments.ModelPlot.plot_model(gen_mod,
save_file='model_from_GADMA.png',
fig_title='Demographic model from GADMA',
draw_scale=True,
pop_labels=['YRI', 'CEU'],
nref=Nanc,
gen_time=None,
gen_time_units='generations',
reverse_timeline=True)