The landscape of human genetics is rapidly changing, fueled by the advent of massively parallel sequencing technologies [1]. New instruments from Roche (454), Illumina (GenomeAnalyzer), Life Technologies (SOLiD) and Helicos Biosciences (Heliscope) generate millions of short sequence reads per run, making it possible to sequence entire human genomes in a matter of weeks. These ‘next-generation sequencing’ (NGS) technologies have already been employed to sequence the constitutional genomes of several individuals [2–10]. Ambitious efforts like the 1000 Genomes Project and the Personal Genomes Project [11] hope to add thousands more. The first five cancer genomes to be published [12–17] revealed thousands of novel somatic mutations and implicated new genes in tumor development and progression. Our knowledge of the genetic variants that underlie disease susceptibility, treatment response and other phenotypes will continually improve as these studies expand the catalog of DNA sequence variation in humans.
The genomes of at least 10 individuals have been sequenced to high coverage using NGS technologies (Table 1). The first such genome (Watson) was sequenced to ~7.4× coverage on the 454 GS (Roche) platform [9], and included ~3.3 million single nucleotide polymorphisms (SNPs), of which 82% were already listed in the National Center for Biotechnology Information SNP database (dbSNP) [18]. Remarkably, the nine personal genomes that followed on NGS technologies [2–8] reported similar results in terms of SNPs: 3–4 million per genome, 80–90% of which overlapped dbSNP. This pattern is so robust, in fact, that many consider ~3 million SNPs with 80–90% dbSNP concordance (depending on the ethnicity of the sample) to be the ‘gold standard’ for SNP discovery in whole-genome sequencing (WGS). Another implication of this pattern is that each individual genome contains ~0.5 million novel SNPs, whose submission to public databases will cause exponential growth as WGS studies expand. Indeed, since the completion of the Watson genome in 2007, submissions to dbSNP have skyrocketed (Figure 1). As of February 2010, dbSNP had received over 100 million submissions for human, corresponding to 23.7 million unique sequence variants, of which more than half have been validated [18].
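The ~0.5 million figure follows directly from the SNP counts and dbSNP overlap rates quoted above. As a minimal sketch of that arithmetic (the function name `novel_snps` is hypothetical and used only for illustration), the number of novel SNPs is the total SNP count multiplied by the fraction not already in dbSNP:

```python
def novel_snps(total_snps: float, dbsnp_fraction: float) -> float:
    """Estimate SNPs absent from dbSNP, given a total SNP count and the
    fraction of those SNPs already listed in dbSNP."""
    if not 0.0 <= dbsnp_fraction <= 1.0:
        raise ValueError("dbsnp_fraction must be between 0 and 1")
    return total_snps * (1.0 - dbsnp_fraction)


# Watson genome figures quoted in the text: ~3.3 million SNPs, 82% in dbSNP.
watson_novel = novel_snps(3.3e6, 0.82)

# Bounds implied by 3-4 million SNPs per genome at 80-90% dbSNP overlap.
low = novel_snps(3.0e6, 0.90)
high = novel_snps(4.0e6, 0.80)

print(f"Watson: ~{watson_novel / 1e6:.2f} million novel SNPs")
print(f"Range:  ~{low / 1e6:.1f} to ~{high / 1e6:.1f} million novel SNPs")
```

With the quoted inputs, the Watson genome yields roughly 0.6 million novel SNPs, and the 3–4 million, 80–90% range brackets the ~0.5 million estimate at roughly 0.3 to 0.8 million per genome.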
Table 1: Complete individual genomes and cancer genomes sequenced on massively parallel sequencing instruments.
Figure 1: Growth of public database dbSNP from 2002 to 2010. Note exponential growth in submissions following the first genome sequenced on next-generation technology (Watson) in 2007.
NGS technologies show great promise for the study of the genetic underpinnings of human disease. WGS is particularly appealing because it can detect the full spectrum of genetic variants, including SNPs, indels, structural variants (SVs) and copy number variations (CNVs), that may contribute to a phenotype [19]. Indeed, the complete genome sequences of several human cancers, namely AML [12], breast cancer [16], melanoma [14], lung cancer [15] and glioblastoma [20], have dramatically expanded the catalog of acquired (somatic) changes that may contribute to tumor development and growth (Table 1). For Mendelian diseases, massively parallel sequencing of family pedigrees offers an effective means of identifying the variants and genes underlying inherited disease [21]. The recent sequencing and analysis of a proband with Charcot–Marie–Tooth syndrome [22] demonstrates that these technologies have the potential for diagnostics in a clinical setting.
The value of massively parallel sequencing instruments for research is clearly illustrated by the widespread adoption of these platforms throughout North America, Europe, Asia and the Pacific (Figure 2). The commoditization of NGS throughout the world suggests that a substantial portion of sequenced human genomes will be produced outside of major genome sequencing centers. Very soon, groups with little to no experience in working with massively parallel sequencing data will gain access to these powerful technologies. The challenges that they face in the production, management, analysis and interpretation of very large volumes of sequence data are daunting. Fortunately, major genome centers and other groups who pioneered both traditional and next-generation sequencing of human genomes have already addressed many of the key issues. Their strategies and methods for high-throughput sequencing of human genomes are the focus of this review.
Figure 2: Distribution of NGS instruments by country (March 2010). Courtesy of next-generation sequencing maps maintained by Nick Loman and James Hadfield.