GCTA

a tool for Genome-wide Complex Trait Analysis

 

The following options perform an MLM based association analysis. The data management options described previously (such as --keep, --extract and --maf), the REML analysis options (such as --reml-priors, --reml-maxit and --reml-no-constrain) and the multi-threading option --thread-num remain valid for this analysis.


--mlma

This option will initiate an MLM based association analysis including the candidate SNP. The model is

y = a + bx + g + e

where y is the phenotype, a is the mean term, b is the additive effect (fixed effect) of the candidate SNP to be tested for association, x is the SNP genotype indicator variable coded as 0, 1 or 2, g is the polygenic effect (random effect), i.e. the accumulated effect of all SNPs (as captured by the GRM calculated using all SNPs), and e is the residual. For ease of computation, the genetic variance, var(g), is estimated under the null model, i.e. y = a + g + e, and then held fixed while testing each SNP for association with the trait. This analysis is similar to that implemented in other software tools such as EMMAX, FaST-LMM and GEMMA.
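In standard mixed-model notation, the test described above can be sketched as follows. The symbols A (the GRM), V, X, and the variance components \sigma_g^2 and \sigma_e^2 are notation assumed here for illustration; treating the residual variance as also fixed at its null-model estimate is likewise an assumption. This is the generic generalized least squares form, not a statement of GCTA's exact internal computation.

```latex
\begin{aligned}
\mathbf{y} &= a\mathbf{1} + b\mathbf{x} + \mathbf{g} + \mathbf{e}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \mathbf{A}\sigma_g^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2) \\
\mathbf{V} &= \mathbf{A}\hat\sigma_g^2 + \mathbf{I}\hat\sigma_e^2
\quad \text{(estimated under the null model } \mathbf{y} = a\mathbf{1} + \mathbf{g} + \mathbf{e}\text{)} \\
(\hat a, \hat b)^\top &= (\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y},
\qquad \mathbf{X} = [\mathbf{1}, \mathbf{x}] \\
\chi^2_1 &= \hat b^2 / \widehat{\operatorname{var}}(\hat b), \qquad
\widehat{\operatorname{var}}(\hat b) = \left[(\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1}\right]_{22}
\end{aligned}
```

Because V is held fixed across SNPs, only the terms involving x need to be recomputed for each candidate SNP, which is where the computational saving comes from.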

The results will be saved in the *.mlma file.

 

--mlma-loco

This option will implement an MLM based association analysis with the chromosome, on which the candidate SNP is located, excluded from calculating the GRM. We call it MLM leaving-one-chromosome-out (LOCO) analysis. The model is

y = a + bx + g- + e

where g- is the accumulated effect of all SNPs except those on the chromosome where the candidate SNP is located. var(g-) is re-estimated each time a chromosome is excluded from the GRM calculation. The MLM-LOCO analysis is computationally less efficient but more powerful than the MLM analysis including the candidate SNP (--mlma).

The results will be saved in the *.loco.mlma file.

 

--mlma-no-adj-covar

If covariates are included in the analysis, they will be fitted in the null model, which comprises the mean term (fixed effect), covariates (fixed effects), polygenic effects (random effects) and residuals (random effects). By default, to improve computational efficiency, the phenotype is adjusted for the mean and covariates, and the adjusted phenotype is then used for testing SNP association. If SNPs are correlated with the covariates, however, pre-adjusting the phenotype for the covariates will probably cause a loss of power. If this option is specified, the covariates will be fitted together with the SNP in the association test, at the cost of significantly reduced computational efficiency.

 

Examples

# MLM based association analysis - If you have already computed the GRM

gcta64 --mlma --bfile test --grm test --pheno test.phen --out test --thread-num 10 


# MLM based association analysis including the candidate SNP (MLMi)

gcta64 --mlma --bfile test --pheno test.phen --out test --thread-num 10

 

# MLM leaving-one-chromosome-out (LOCO) analysis

gcta64 --mlma-loco --bfile test --pheno test.phen --out test --thread-num 10
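The --mlma-no-adj-covar option described earlier has no example in this list, so the following is a minimal sketch of how it might be combined with a covariate file. The file name test.qcovar and the output prefix test_covar are hypothetical, and --qcovar is GCTA's quantitative covariate option from the REML options, which is not documented in this section. The guard simply prints the command when gcta64 or the input data are not available.

```shell
# Sketch only: test.qcovar and test_covar are hypothetical names.
# --qcovar supplies quantitative covariates; --mlma-no-adj-covar asks for the
# covariates to be fitted together with each SNP instead of pre-adjusting
# the phenotype.
cmd="gcta64 --mlma --bfile test --grm test --pheno test.phen --qcovar test.qcovar --mlma-no-adj-covar --out test_covar --thread-num 10"

# Run the command only if gcta64 and the PLINK input exist; otherwise show it.
if command -v gcta64 >/dev/null 2>&1 && [ -f test.bed ]; then
    $cmd
else
    echo "Would run: $cmd"
fi
```

Expect this run to be noticeably slower than the default, since the covariates are refitted for every SNP.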

 

Output file format

test.mlma or test.loco.mlma (columns are chromosome, SNP, physical position, reference allele (the coded effect allele), the other allele, frequency of the reference allele, SNP effect, standard error and p-value).

Chr  SNP     bp    ReferenceAllele  OtherAllele  Freq    b            se         p
1    qtl2_1  1001  L                H            0.366   0.0143857    0.0411682  0.726761
1    qtl2_2  1002  H                L            0.326   -0.0240756   0.0421248  0.56764
1    qtl2_3  1003  H                L            0.146   -0.0921772   0.0565541  0.103124
1    qtl2_4  1004  H                L            0.3865  -0.0771376   0.0394826  0.0507357
1    qtl2_5  1005  H                L            0.1665  0.00251276   0.0526821  0.961958
1    qtl2_6  1006  L                H            0.119   -0.0153568   0.059891   0.797632
1    qtl2_7  1007  L                H            0.1675  -0.0487809   0.0512279  0.340979
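Since the output is a plain whitespace-delimited table with a header row, it can be post-processed with standard tools. The sketch below filters SNPs by p-value with awk. It writes a small sample file using three rows copied from the example output. The tab separation and the threshold of 0.1 are assumptions chosen for illustration, not GCTA defaults.

```shell
# Build a small sample file in the documented column order
# (rows copied from the example output; tab separation is an assumption).
mlma_file=$(mktemp)
printf 'Chr\tSNP\tbp\tReferenceAllele\tOtherAllele\tFreq\tb\tse\tp\n' > "$mlma_file"
printf '1\tqtl2_3\t1003\tH\tL\t0.146\t-0.0921772\t0.0565541\t0.103124\n' >> "$mlma_file"
printf '1\tqtl2_4\t1004\tH\tL\t0.3865\t-0.0771376\t0.0394826\t0.0507357\n' >> "$mlma_file"
printf '1\tqtl2_5\t1005\tH\tL\t0.1665\t0.00251276\t0.0526821\t0.961958\n' >> "$mlma_file"

# Illustrative threshold only; a real GWAS would use a far stricter one.
threshold=0.1

# Skip the header (NR > 1), then print SNP, effect, SE and p-value
# for rows whose p-value (column 9) is below the threshold.
hits=$(awk -v t="$threshold" 'NR > 1 && ($9 + 0) < (t + 0) { print $2, $7, $8, $9 }' "$mlma_file")
echo "$hits"

rm -f "$mlma_file"
```

To apply the same filter to a real result file, point awk at test.mlma or test.loco.mlma instead of the temporary sample.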

 

 

References

 

An overview of the MLM based association methods: Yang J, Zaitlen NA, Goddard ME, Visscher PM and Price AL (2014) Mixed model association methods: advantages and pitfalls. Nat Genet, 46(2): 100-106. [PubMed ID: 24473328]

REML analysis and GCTA software: Yang J, Lee SH, Goddard ME and Visscher PM (2011) GCTA: a tool for Genome-wide Complex Trait Analysis. Am J Hum Genet, 88(1): 76-82. [PubMed ID: 21167468]

 


GCTA-MLMA: mixed linear model based association analysis