Hands on
Test case
GADMA has a test case for a simple demographic model for one population: just a constant size of 10,000 individuals in the population. To run this test case:
$ gadma --test
Example 2
Suppose that we have SNP data for two populations. Data is in dadi’s SNP file format in the snp_data.txt
. Suppose we want to get all output in directory called gadma_output
:
$ gadma -i snp_data.txt -o gadma_output
Example 3
Assume we have VCF file. It is possible to read SFS data directly from it. Some rules are applied to the VCF format for successful reading. VCF file should be set along with Popmap file that describes how samples map to the populations:
$ gadma -i vcf_data.vcf,popmap_file -o gadma_output
Example 4
We did not specify AFS size or labels for populations, they are taken automatically from the input file. We can see a parameter file of our run in the gadma_output/param_file
.
# gadma_output/param_file
...
Population labels : pop_name_1, pop_name_2
Projections: 18, 20
...
But we know that the spectrum should be 20 x 20
matrix! To specify the size of AFS we need to create a parameter file and set Projections
:
# param_file
Projections : 20, 20
The order of populations can be changed as:
# param_file
Projections : 20,20
Population labels : pop_name_2, pop_name_1
If we want to rename populations, we should change names in the snp_data.txt
file.
Now assume we want to get the simplest demographic model as fast as we can. We will tell GADMA that we need no other dynamics of population sizes except sudden (constant) population size change and that we want to use the moments library.
# param_file
Projections : 20,20
Population labels : pop_name_2, pop_name_1
Only sudden : True
Engine : moments
To run GADMA we need to specify the -p/--params
command-line option in the command line:
$ gadma -i snp_data.txt -o gadma_output -p params_file
Example 5
Consider some AFS file fs_data.fs
. There is a spectrum for three populations: YRI, CEU, CHB. However, the axes are mixed up: CHB, YRI, CEU. To run GADMA we should reorder them from most ancient to most recent:
# param_file
Population labels : YRI, CEU, CHB
We want to allow exponential growth (it is the default behaviour) and have some extra changes in the size of the ancient population. To do so we should specify Initial structure
. It is a list of three numbers: (1) the number of time intervals before the first split (we want here 2); (2) the number of time periods between the first and the second split events (at least 1); and (3) the number of time periods after the second split.
# param_file
Population labels : YRI, CEU, CHB
Initial structure : 2,1,1
Also we can put information about input file and output directory to our parameter file:
# param_file
Input data : fs_data.fs
Output directory : gadma_output
Population labels : YRI, CEU, CHB
Initial structure : 2,1,1
Now we can run GADMA in the following way:
$ gadma -p params
Example 6
We have our GADMA launch interrupted for some reasons. We want to resume it:
$ gadma --resume gadma_output
The directory gadma_output
is the output directory of the previous run. We can find the resumed run in gadma_output_resumed
Example 7
Our launch was finished, and we used dadi
with a default grid size which GADMA determines automatically if it is not specified by the user. We found out that it would be better to find some models using greater number of grid points in dadi scheme, but we want to take final models from the previous run:
# param_file
Pts : 40, 50, 60 #Greater value of grid size than it was
And run GADMA:
$ gadma --resume gadma_output --only_models -p params
Option --only_models
tells GADMA to take from gadma_output
final models only.
There is another way to do the same:
# param_file
Resume from : gadma_output
Only models : True
Pts : 40, 50, 60 #Greater value of grid size than it was
And run GADMA in the following way:
$ gadma -p params
Example 8
We can add a custom model using a parameter Custom filename
in the parameter file:
# param_file
Custom filename : YRI_CEU_demographic_model.py
Our custom file needs to contain a function with a fixed name model_func
. For example:
# YRI_CEU_demographic_model.py
def model_func(params, ns, pts)
nu1F, nu2B, nu2F, m, Tp, T = params
n1, n2 = ns
xx = yy = dadi.Numerics.default_grid(pts)
phi = dadi.PhiManip.phi_1D(xx)
phi = dadi.Integration.one_pop(phi, xx, Tp, nu=nu1F)
phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)
nu2_func = lambda t: nu2B*(nu2F/nu2B)**(t/T)
phi = dadi.Integration.two_pops(phi, xx, T, nu1=nu1F,
nu2=nu2_func, m12=m, m21=m)
sfs = dadi.Spectrum.from_phi(phi, (n1,n2), (xx,yy))
return sfs
In addition, we can easily specify values for lower and upper bounds through a parameter file. Let’s set lower and upper bounds for the model we defined above:
# param_file
Lower bounds : 1e-2, 1e-2, 1e-2, 0, 0, 0
Upper bounds : 100, 100, 100, 10, 3, 3
Example 9
Also, we can get the values of lower/upper bounds, both, or none of them in the parameter file automatically. For this, each identifier in the parameter file must be declared through a parameter Parameter identifiers
. Below is an identifier list:
# param_file
# if identifier starts with letter:
# T/t - time
# N/n - size of population
# M/m - migration
# S/s/F/f - split event, proportion in which population size
# is divided to form two new populations.
# G/g - selection
# H/h - dominance coefficient for selection.
For example, we set a lower bound for the model we defined above (see Example 7) and we want to get an upper bound automatically.
# param_file
Lower bound : 1e-2, 1e-2, 1e-2, 0, 0, 0
Upper bound : None
Parameter identifiers : nu1F, nu2B, nu2F, m, Tp, T
If the custom function has first line line par1, par2, ... = params
then option Parameter identifiers
could be missed and GADMA will take this option from the function.
Example YRI, CEU
GADMA has an example of the parameter file example_params
. To run GADMA with this parameters one should just run from GADMA’s home directory:
$ gadma -p example_params