10 Building reticulating trees with TreeMix




TreeMix uses allele frequencies to reconstruct patterns of population splits and mixtures. You can find the instruction manual here. Similar to how ADMIXTURE requires that you supply a K value for the number of gene pool clusters, TreeMix requires that you supply the number of anticipated migration events. Using this parameter, TreeMix estimates a tree together with given number of reticulations such that it maximizes the likelihood of the data.

TreeMix is already installed on our VM so there’s nothing for you to do — but for those of you who want to install it on your Mac or LINUX computer, follow these instructions:


Install the Boost Library:

On Ubuntu:

sudo apt-get install libboost-all-dev

On Mac:

wget http://mirror-hk.koddos.net/gnu/gsl/gsl-2.6.tar.gz
tar -xzf gsl-2.6.tar.gz
cd gsl-2.6
./configure
make
sudo make install


Install GNU Scientific Library:

On Ubuntu:

sudo apt-get install libgsl-dev

On Mac:

Download MacPorts, and run the following command:

sudo port install boost


Install TreeMix:

On either Mac or LINUX:

wget https://bitbucket.org/nygcresearch/treemix/downloads/treemix-1.13.tar.gz
tar -xzf treemix-1.13.tar.gz
cd treemix-1.13
./configure
make
sudo make install

Example Analysis:

Make a new directory called AFROAMER and change into that directory. Let’s download the genomes of Mexicans, Mayans, Spaniards, Han Chinese, Africans, and Caucasians from Utah. We should expect at least two rather obvious admixture events: that African Americans are a blend of African and European genes while Mexicans are a blend of European and Mayan genes.

                              SELECT i.family_id, i.individual_id, i.paternal_id, i.maternal_id, i.sex, i.phenotype 
FROM individuals i NATURAL JOIN groupings g 
WHERE i.family_id IN ('Mexican', 'Maya', 'UtahWhite', 'Spaniards', 'Han_N', 'Han_S') OR 
(g.group_name = 'Africa' AND i.family_id NOT IN ('Ethiopians', 'EthiopianJews', 'AfricanBarbados')) 
ORDER BY i.individuals_id;
                            

But to do this, we’ll need a series of one-liner queries:

\copy ( SELECT i.family_id, i.individual_id, i.paternal_id, i.maternal_id, i.sex, i.phenotype, i.snps FROM individuals i NATURAL JOIN groupings g WHERE i.family_id IN ('Mexican', 'Maya', 'UtahWhite', 'Spaniards', 'Han_N', 'Han_S') OR ( g.group_name = 'Africa' AND i.family_id NOT IN ('Ethiopians', 'EthiopianJews', 'AfricanBarbados') ) ORDER BY i.individuals_id ) TO 'AFROAMER.ped' (FORMAT CSV, DELIMITER ' ');
\copy (SELECT i.family_id, i.individual_id, i.family_id FROM individuals i NATURAL JOIN groupings g WHERE i.family_id IN ('Mexican', 'Maya', 'UtahWhite', 'Spaniards', 'Han_N', 'Han_S') OR ( g.group_name = 'Africa' AND i.family_id NOT IN ('Ethiopians', 'EthiopianJews', 'AfricanBarbados') ) ORDER BY i.individuals_id ) TO 'AFROAMER.pop' (FORMAT CSV, DELIMITER ' ');
\copy ( SELECT s.chromosome_id, s.snp_id, s.genetic_distance, s.base_pair FROM datasets d NATURAL JOIN datasets_snps ds NATURAL JOIN snps s WHERE d.dataset_name = 'HGDP' ORDER BY ds.seq_order ) TO 'AFROAMER.map' (FORMAT CSV, DELIMITER ' ');

We can now exit psql and return to the shell. As usual, fix the pedigree file and convert to binary.

perl -i -pe's/"//g' AFROAMER.ped
plink --file AFROAMER --make-bed -out AFROAMER

Now we can calculate allele frequencies, grouped by population:

plink --bfile AFROAMER --within AFROAMER.pop --freq -out AFROAMER

… which will be the primary input for TreeMix. This program needs the data in a compressed format, and we can use gzip to accomplish this:

gzip AFROAMER.frq.strat

Download a Python script, plink2treemix.py, to convert the PLINK data for use with TreeMix:

plink2treemix.py

wget https://bitbucket.org/nygcresearch/treemix/downloads/plink2treemix.py
python plink2treemix.py AFROAMER.frq.strat.gz AFROAMER.gz

Finally, we can now run the analysis. We will root our tree with the African San people. We will predict that there will be two migration events. And we will build the tree using blocks of 1,000 SNPs:

treemix -i AFROAMER.gz -root 'San' -m 2 -k 1000 -global -o AFROAMER

To visualize the results in R, we need to use the “plotting_funcs.R” library:

plotting_funcs.R

wget https://bitbucket.org/nygcresearch/treemix/raw/f38bfada3286027a09924d630efa3ad190bda380/src/plotting_funcs.R

If you’re running R on your local computer, you will need to fetch the AFROAMER.vertices.gz, AFROAMER.edges.gz, and AFROAMER.covse.gz files from the VM.

source("plotting_funcs.R")
plot_tree("AFROAMER")

Can you tell whether Mexicans have received a greater injection of Mayan genes, or whether African Americans have received a greater injection of European genes?




Leave a Reply

Your email address will not be published. Required fields are marked *