Jumping genes, gene loss and genome dark matter
New map of copy number variation in the human genome is a resource for human genetics
Jan Aerts, Wellcome Trust Sanger Institute
The team cautions that, among the common structural variants, they have not found large numbers of candidates that might alter susceptibility to complex diseases such as diabetes or heart disease. They suggest strategies for finding this 'dark matter' of genetic variation.
Human genomes differ because of single-letter variations in the genetic code and also because whole segments of the code might be deleted or multiplied in different human genomes. These larger, structural differences are called copy number variants (CNVs). The new research to map and characterize CNVs is of a scale and a power unmatched to date, involving hundreds of human genomes, billions of data points and many thousands of CNVs.
"This study is more than ten times as powerful as our first map, published three years ago," explains Dr Matt Hurles from the Wellcome Trust Sanger Institute and a leader on the project, "and much more detailed than any other. Importantly, we have also assigned the CNVs to a specific genetic background so that they can be readily examined in disease studies carried out by others, such as the Wellcome Trust Case Control Consortium.
"Nevertheless, we have not found large numbers of common CNVs that we can tie strongly to disease. There remains much to be discovered and much to understand and our freely available genotyped collection will drive that discovery."
The results show that any two genomes differ by more than 1000 CNVs, or around 0.8% of a person's genome sequence. Most of these CNVs are deletions, with a minority being duplications. Two consequences are particularly striking in this study of apparently healthy people. First, 75 regions have jumped around in the genomes of these samples. Second, more than 250 genes can lose one of the two copies in our genome without obvious consequences, and a further 56 genes can potentially fuse together to form new composite genes.
"This paper detailing common CNVs in different world populations, and providing the first glimpse into evolutionary biology of such class of human variation, is unquestionably one of the most important advances in human genome research since the completion of a reference human genome," says Professor James R. Lupski, Vice Chair of the Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas. "It complements the cataloguing of single nucleotide variation delineated in the HapMap Project and will both enable some new approaches to, and further augment other studies of, basic human biology relevant to health and disease."
The results also give, for the first time, a minimum measure of the rate of CNV mutation: at least one in 17 children will have a new CNV. In many cases, that CNV will have no obvious clinical consequences. However, for some the effects are severe. In those cases the data are captured in the DECIPHER database, a repository of clinical information about CNVs designed to aid the diagnosis of rare disorders in young children.
But CNVs are not only about the here and now; they are also ancient legacies of how our ancestors adapted to their environments. Among the most impressive variations between populations are CNVs that modify the activity of the immune system, which is known to be evolving rapidly in human populations, and of genes implicated in muscle function. The researchers propose that the consequences of these CNVs can be dissected in population studies.
The team scanned 42 million locations on the genomes of 40 people, half of European ancestry and half of West African ancestry. The scale of the method meant they could detect CNVs as small as 450 bases occurring in one in 20 individuals. However, the researchers concede that their map of common variants will not account for much of the 'dark matter' of the genome: the missing heritability where, despite diligent searches, genetic variants have not been found for common disease.
The research group have maximized the value of their research not only by mapping the CNVs but also by genotyping them, that is, assigning them to a specific genetic background, which makes them readily useful in wider genetic studies, such as those of the Wellcome Trust Case Control Consortium.