Still Chasing Ghosts: A New Genetic Methodology Will Not Find the “Missing Heritability”

One of the hopes and promises of the Human Genome Sequencing Project was that it would revolutionize the understanding, diagnosis, and treatment of most human disorders. It would do this by uncovering the supposed “genetic bases” of human behavior. With a few exceptions, however, the search for common gene variants (“polymorphisms”) associated with common diseases has borne little fruit. And when such associations have been found, the polymorphisms seem to have little predictive value and do little to advance our understanding of the causes of disease. In a 2012 study, for example, researchers found that incorporating genetic information did not improve doctors’ ability to predict disease risk for breast cancer, Type 2 diabetes, and rheumatoid arthritis [1].

And to date, not a single polymorphism has been reliably associated with any psychiatric disorder or with any aspect of human behavior within the “normal” range (e.g., differences in “intelligence”).

To some researchers this state of affairs has given rise to a conundrum known as the “problem of missing heritability.” If traits such as intelligence are reported to be 50% heritable, goes the theory, why have no genes associated with intelligence been identified? One possible solution to the problem of missing heritability is that the heritability estimates are wrong. Another proposal is that hundreds or even thousands of genes are involved, each gene of such small effect that it cannot be identified by the standard genome-wide association study (GWAS). These problems have spurred the development of a new methodology for identifying gene variants in human populations called genome-wide complex trait analysis (GCTA).

Enter Genome-wide complex trait analysis (GCTA)

The first results of a GCTA study were published in 2010 [2]. Since then its use has rapidly expanded, with the results of GCTA studies on everything from obesity to intelligence to autism regularly appearing in prestigious science journals [3-17]. Like a typical GWAS, GCTA involves scanning hundreds of thousands of polymorphisms (specifically, a common form of gene variant known as a single nucleotide polymorphism [SNP]) in thousands of persons. But instead of trying to identify individual polymorphisms more common among those who share a given trait, the goal is to determine whether this trait similarity can be associated with a large number of (unidentified) polymorphisms. In other words, an estimate is generated of how much of the genetic variance (i.e., heritability) of a trait can be accounted for by shared SNPs. These heritability estimates are termed “SNP-based” and differ from standard heritability estimates that rely upon assumptions of genetic relatedness, such as those from twin or family studies.
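To make the idea of "shared SNPs among unrelated persons" concrete, here is a minimal sketch of the first step such an analysis typically takes: computing a genomic relationship value for every pair of individuals from their SNP genotypes. It assumes simulated genotypes coded as 0/1/2 allele counts and one common definition of genomic relationship (each SNP centered and scaled by its sample allele frequency, then averaged over SNPs). The function names and the simulated data are hypothetical. The later step, in which trait variance is partitioned using this matrix, is omitted.

```python
import random


def simulate_genotypes(n_people, n_snps, rng):
    """Simulate unrelated people; each genotype is an allele count of 0, 1, or 2."""
    freqs = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        for _ in range(n_people)
    ]


def genomic_relationship_matrix(genotypes):
    """Average, over SNPs, the product of two people's standardized genotypes."""
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    for j in range(m):
        column = [person[j] for person in genotypes]
        p = sum(column) / (2 * n)  # sample allele frequency of SNP j
        if p == 0.0 or p == 1.0:
            continue  # a SNP with no variation carries no information
        scale = (2 * p * (1 - p)) ** 0.5
        for i in range(n):
            standardized[i].append((column[i] - 2 * p) / scale)
    used = len(standardized[0])
    grm = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            value = sum(x * y for x, y in zip(standardized[a], standardized[b])) / used
            grm[a][b] = value
            grm[b][a] = value
    return grm


rng = random.Random(1)
genotypes = simulate_genotypes(40, 400, rng)
grm = genomic_relationship_matrix(genotypes)
mean_diag = sum(grm[i][i] for i in range(40)) / 40
print(f"mean self-relationship: {mean_diag:.2f}")
```

In this simulation everyone is drawn from a single population, so the off-diagonal entries hover near zero. A SNP-based heritability estimate asks whether the small differences among those entries track differences in the trait.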

For example, the much-used twin study methodology is based upon the assumption that monozygotic (MZ) twins share 100% of their inherited genes, as compared to “fraternal” or dizygotic (DZ) twins, who share on average 50% of their inherited genes. If MZ twins show greater concordance for a trait of interest than DZ twins, this greater concordance is ascribed to greater genetic concordance, with the presumed genetic relationship of 1 to 0.5 serving as the basis of the heritability estimate. By contrast, GCTA does not rely upon genetic relatedness. In fact, it is critical to GCTA that those who are studied be unrelated.
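The twin logic described above is commonly summarized by Falconer's formula. The formula is not named in the text, so treating it as the estimator used in any particular study cited here is an assumption. Here r_MZ and r_DZ stand for the trait correlations within MZ and DZ pairs, and the numbers in the comment are purely hypothetical.

```latex
% Heritability as twice the difference between MZ and DZ trait correlations;
% the factor 2 comes from the presumed 1 : 0.5 genetic relationship.
h^2 = 2\,(r_{MZ} - r_{DZ})

% Hypothetical illustration: r_{MZ} = 0.80,\ r_{DZ} = 0.50
% h^2 = 2\,(0.80 - 0.50) = 0.60
```

Written this way, the formula attributes the entire MZ-DZ difference to genes, so any extra environmental similarity between MZ twins is counted as heritability.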

The twin study methodology has long been critiqued as being based upon a number of faulty assumptions, in particular the assumption that the environments (pre- and postnatal) of MZ twins are not more alike than those of DZ twins. Were the environments of MZ twins more alike than those of DZ twins (as numerous studies have indicated), trait similarities ascribed to the greater genetic similarity of MZ twins might in fact be due to greater environmental similarity, significantly inflating heritability estimates. Thus far, GCTA studies appear to have proven critics of the twin study methodology right, yielding significantly lower heritability estimates (e.g., a twin study of “callous-unemotional” behavior yielded a heritability estimate of 64%, whereas a GCTA yielded an estimate of 7% [9]). Long-time defenders of the accuracy of twin studies can now be found speculating, in light of GCTA findings, that “the estimates of … heritability from twin and family studies are biased upwards, for example, by not properly accounting for… (common) environmental factors” [7]. GCTA studies, however, just like their twin study predecessors, suffer from serious methodological problems that call into doubt the legitimacy of their findings. They, too, are likely to generate spurious associations and faulty estimates of genetic contributions to variation in traits.

GCTA studies are highly vulnerable to confounding by population stratification

Genetic studies (by whatever method) that have so far purported to identify SNPs associated with one or another trait have more often than not yielded false positives [18-20]. A prime cause of this has been the failure of researchers to account adequately for population stratification. Population stratification refers to the fact that frequencies of polymorphisms can differ in different populations and subpopulations (ethnic or geographical) due to unique ancestral patterns of migration, mating practices, and reproductive expansions and contractions. Nearly all outbred (i.e., nonfamilial) populations exhibit population stratification, including populations deemed relatively homogeneous (e.g., Icelanders). One well-known example of a false association between a polymorphism and a trait was the link between the dopamine receptor gene DRD2 and alcoholism. Initial studies suggested a strong association, but subsequent investigations found none when more effective controls for population stratification were imposed. In retrospect, it is clear why this initial result was vulnerable to confounding due to population stratification: DRD2 alleles vary widely by ethnic ancestry, and ethnic differences in alcoholism rates are pronounced.
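The mechanism behind such false positives can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the DRD2 studies. Two hypothetical subpopulations differ both in the frequency of one allele and, for purely environmental reasons, in the average value of a trait. The allele has no effect on the trait in either group, yet pooling the groups produces a clear correlation. All names, allele frequencies, and trait means are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_group(rng, n, allele_freq, trait_mean):
    """Genotypes (0/1/2) and traits are drawn independently within a group."""
    genotypes = [
        int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        for _ in range(n)
    ]
    traits = [rng.gauss(trait_mean, 1.0) for _ in range(n)]
    return genotypes, traits


rng = random.Random(0)
# Group A: allele is rare and the (environmentally set) trait mean is lower.
geno_a, trait_a = simulate_group(rng, 2000, allele_freq=0.2, trait_mean=0.0)
# Group B: allele is common and the trait mean is higher.
geno_b, trait_b = simulate_group(rng, 2000, allele_freq=0.8, trait_mean=1.0)

r_within_a = pearson(geno_a, trait_a)
r_within_b = pearson(geno_b, trait_b)
r_pooled = pearson(geno_a + geno_b, trait_a + trait_b)

print(f"within A: {r_within_a:+.3f}")
print(f"within B: {r_within_b:+.3f}")
print(f"pooled:   {r_pooled:+.3f}")
```

Within each group the correlation stays near zero, while the pooled correlation is clearly positive. The allele is merely a marker of group membership, and group membership is what predicts the trait.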

Recall that GCTA studies are supposed to involve unrelated persons. From the standpoint of population genetics, however, relatedness is not simply a matter of being someone’s second cousin. While the designers of the GCTA method are aware of the problem posed by population stratification and attempt to correct for it, there is growing evidence that the techniques they have employed are wholly inadequate and that GCTA itself is particularly vulnerable to confounding due to population stratification [21-23].

Consider a recent GCTA study by Plomin et al., who reported a SNP-based heritability estimate of 35% for “general cognitive ability” among UK 12-year-olds (as compared to a twin heritability estimate of 46%) [8]. According to the Wellcome Trust “genetic map of Britain,” striking patterns of genetic clustering (i.e., population stratification) exist within different geographic regions of the UK, including distinct genetic clusters composed of the residents of the South, South-East and Midlands of England; Cumbria, Northumberland and the Scottish borders; Lancashire and Yorkshire; Cornwall; Devon; South Wales; the Welsh borders; Anglesey in North Wales; Scotland and Ireland; and the Orkney Islands [8]. Now consider the title of a study from the University and College Union: “Location, Location, Location – the widening education gap in Britain and how where you live determines your chances” [9]. This state of affairs (not at all unique to the UK), combined with widespread geographic population stratification, is fertile ground for spurious heritability estimates.

Further problems of GCTA

While I have focused on population stratification, there are at least two other things to note about GCTA studies. First, GCTA assumes “additive genetic variance,” i.e., that each polymorphism contributes a tiny amount to heritability and that the “effects” of all the polymorphisms can simply be added together. This ignores widespread evidence that genes influence the effects of other genes in highly complex, non-additive ways (“G x G” interactions), and that the environment influences the manner in which genes are transcribed in equally complex ways (“G x E” interactions). Second, all GCTA estimates are derived from looking only at SNPs, but SNPs are only one form of genetic polymorphism. There are numerous other kinds of prevalent genetic variation, including copy number variations (multiple copies of segments of genes, of whole genes, and even of whole chromosomes). There is no rational scientific reason to assume that SNPs are the only relevant, or even the “most important,” form of genetic variation (other than the fact that SNP data are the easiest to obtain).
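The additivity assumption can be stated compactly. The notation below is an illustrative sketch rather than the exact specification used by any GCTA paper: y_i is the trait value of person i, x_ij is that person's allele count at SNP j, e_i is an environmental exposure, and the coefficients are hypothetical.

```latex
% Purely additive model: each SNP contributes independently of all the others.
y_i = \mu + \sum_{j=1}^{m} \beta_j x_{ij} + \varepsilon_i

% The same model with the terms that additivity leaves out:
% gene-by-gene (G x G) and gene-by-environment (G x E) interactions.
y_i = \mu + \sum_{j=1}^{m} \beta_j x_{ij}
        + \sum_{j<k} \gamma_{jk}\, x_{ij} x_{ik}
        + \sum_{j=1}^{m} \delta_j\, x_{ij} e_i
        + \varepsilon_i
```

Setting every \gamma_{jk} and \delta_j to zero recovers the first model, which is the simplification at issue here.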

The simplistic model of additive genetic variance upon which GCTA relies (and which assumes no epistasis and no gene x environment interactions) is out of touch with current understanding of the complex, multifactorial nature of most human traits that have a genetic component. Consider Type 1 diabetes (T1D). Fewer than 10% of individuals who possess gene variants associated with T1D progress to the clinical disease. As Knip and Simell put it:

The identification of exogenous factors triggering and driving β-cell destruction [which results in the clinical disease] offers a potential means for intervention aimed at the prevention of T1D. Therefore, it is important to pursue studies on the role of environmental factors in the pathogenesis of this disease. Environmental modification is likely to offer the most powerful strategy for effective prevention of T1D, because such an approach can target the whole population or at least that proportion of the population carrying increased genetic disease susceptibility; therefore, preventing both sporadic and familial T1D, if successful [24, p. 13].

Or consider the effects of developmental stress on brain development and behavior. Monkeys generated from stressed mothers show significantly reduced hippocampal neurogenesis and significantly reduced hippocampal volume with corresponding cognitive and behavioral effects. Likewise in humans, prenatal stress has been associated with a wide array of adverse developmental cognitive and behavioral outcomes [25].

What these examples show is that if we want to understand human traits that have a genetic component, we must turn away from an excessive and often exclusive focus upon genetic polymorphisms and take a more holistic approach, one in which disease and health are seen as attributes of plastic, adaptive organisms functioning within particular environments.

Advocates of GCTA, however, tell us that in order to find the multitude of polymorphisms of tiny effect underlying heritability estimates we must undertake ever larger studies involving hundreds of thousands of persons. These polymorphisms of tiny effect, however, are so many ghosts and the search for them is the last gasp of a failed paradigm. Do we really want to squander our time and resources chasing ghosts?

References:

1. Aschard, H., et al., Inclusion of gene-gene and gene-environment interactions unlikely to dramatically improve risk prediction for complex diseases. Am J Hum Genet, 2012. 90(6): p. 962-72.

2. Yang, J., et al., Common SNPs explain a large proportion of the heritability for human height. Nat Genet, 2010. 42(7): p. 565-9.

3. Yang, L., et al., Polygenic transmission and complex neurodevelopmental network for attention deficit hyperactivity disorder: genome-wide association study of both common and rare variants. Am J Med Genet B Neuropsychiatr Genet, 2013. 162B(5): p. 419-30.

4. Rietveld, C.A., et al., Molecular genetics and subjective well-being. Proc Natl Acad Sci U S A, 2013. 110(24): p. 9692-7.

5. Llewellyn, C.H., et al., Finding the missing heritability in pediatric obesity: the contribution of genome-wide complex trait analysis. Int J Obes (Lond), 2013.

6. Keller, M.F., et al., Using genome-wide complex trait analysis to quantify ‘missing heritability’ in Parkinson’s disease. Hum Mol Genet, 2012. 21(22): p. 4996-5009.

7. Vinkhuyzen, A.A., et al., Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion. Transl Psychiatry, 2012. 2: p. e102.

8. Plomin, R., et al., Common DNA markers can account for more than half of the genetic influence on cognitive abilities. Psychol Sci, 2013. 24(4): p. 562-8.

9. Viding, E., et al., Genetics of callous-unemotional behavior in children. PLoS One, 2013. 8(7): p. e65789.

10. Vrieze, S.I., et al., Three mutually informative ways to understand the genetic relationships among behavioral disinhibition, alcohol use, drug use, nicotine use/dependence, and their co-occurrence: twin biometry, GCTA, and genome-wide scoring. Behav Genet, 2013. 43(2): p. 97-107.

11. Power, R.A., et al., Estimating the heritability of reporting stressful life events captured by common genetic variants. Psychol Med, 2013. 43(9): p. 1965-71.

12. Trzaskowski, M., et al., First genome-wide association study on anxiety-related behaviours in childhood. PLoS One, 2013. 8(4): p. e58676.

13. Speed, D., et al., Improved heritability estimation from genome-wide SNPs. Am J Hum Genet, 2012. 91(6): p. 1011-21.

14. Yang, J., et al., Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. PLoS Genet, 2013. 9(3): p. e1003355.

15. Watson, C.T., et al., Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs. Sci Rep, 2012. 2: p. 770.

16. Klei, L., et al., Common genetic variants, acting additively, are a major source of risk for autism. Mol Autism, 2012. 3(1): p. 9.

17. Lee, S.H., et al., Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet, 2011. 88(3): p. 294-305.

18. Bosker, F.J., et al., Poor replication of candidate genes for major depressive disorder using genome-wide association data. Mol Psychiatry, 2011. 16(5): p. 516-32.

19. Chabris, C.F., et al., Most reported genetic associations with general intelligence are probably false positives. Psychol Sci, 2012. 23(11): p. 1314-23.

20. Ioannidis, J.P., Non-replication and inconsistency in the genome-wide association setting. Hum Hered, 2007. 64(4): p. 203-13.

21. Browning, S.R. and B.L. Browning, Population structure can inflate SNP-based heritability estimates. Am J Hum Genet, 2011. 89(1): p. 191-3; author reply 193-5.

22. Janss, L., et al., Inferences from genomic models in stratified populations. Genetics, 2012. 192(2): p. 693-704.

23. Browning, S.R. and B.L. Browning, Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort. Hum Genet, 2013. 132(2): p. 129-38.

24. Knip, M. and O. Simell, Environmental triggers of type 1 diabetes. Cold Spring Harb Perspect Med, 2012. 2(7): p. a007690.

25. Coe, C.L., et al., Prenatal stress diminishes neurogenesis in the dentate gyrus of juvenile rhesus monkeys. Biol Psychiatry, 2003. 54(10): p. 1025-34.