1.  Symptoms and markers of symptom severity in asthma—content validity of the asthma symptom diary 
Background and objectives
The American Thoracic Society/European Respiratory Society (ATS/ERS) Task Force acknowledged the multi-faceted nature of asthma in its recent definition of asthma control as a summary term capturing symptoms, reliever use, frequency/severity of exacerbations, lung function, and future risk. Similarly, the Global Initiative for Asthma (GINA) defines the clinical manifestations of asthma (well-established markers of asthma severity) to include symptoms, sleep disturbances, limitations of daily activity, impairment of lung function, and use of rescue medications. The objectives of this qualitative work were to identify symptoms and markers of symptom severity relevant to patients with moderate to severe asthma and to evaluate the content validity of the asthma symptom diary (ASD).
A qualitative interview study was conducted using a purposive sample of symptomatic adult and adolescent (≥12 years) subjects with asthma. Concept elicitation (CE) interviews (n = 50) were conducted to identify core asthma symptoms and symptom-related clinical markers, followed by cognitive interviews (n = 24) to ensure patient comprehension of the items, instructions and response options. CE interviews were coded using ATLAS.ti for content analysis.
The study sample was diverse in symptom severity, level of symptom control, and sociodemographic and socioeconomic status. The most frequently reported symptoms in adults were chest tightness (n = 33/34; 97.1%), wheezing (n = 31; 91.2%), coughing (n = 30; 88.2%), and shortness of breath (n = 25; 73.5%); in adolescents they were wheezing (n = 14/16; 87.5%), coughing (n = 13; 81.3%), and chest tightness (n = 11; 68.8%). Adults identified chest tightness, followed by shortness of breath, as their most severe symptoms, while adolescents reported coughing and chest tightness as their most severe symptoms. Sleep awakenings and limitations in day-to-day activities were frequent symptom-related clinical markers. Because subjects reported day-to-day variability and differences between their daytime and nighttime symptoms, the ASD needs to be administered twice daily. Cognitive interviews indicated that subjects found the revised ASD items clear and easy to understand.
This study supports the content validity of the revised ASD, showing it to be consistent with patient experiences and ready for further psychometric testing.
PMCID: PMC4336744
Asthma; Symptoms; Patient-reported outcome; Instrument development; Qualitative
2.  Extending reference assembly models 
Genome Biology  2015;16(1):13.
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
PMCID: PMC4305238  PMID: 25651527
3.  The Challenge of Small-Scale Repeats for Indel Discovery 
Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation, among others. This mini-review highlights the challenges that small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, pose for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm, which is emerging as the most popular and promising approach for detecting indels. Taking the human exome as an example, we highlight how these repetitive elements can obscure indels or introduce errors when detecting them.
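The abstract names the de Bruijn graph paradigm without describing it. As a minimal sketch under simplifying assumptions (error-free sequence, a single strand, no read data), the Python below builds a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, and reports the (k-1)-mers that occur more than once. Each repeated node becomes a cycle or branch in the graph, which is one way to see why short tandem repeats complicate indel discovery. The function names `de_bruijn` and `repeated_nodes` are hypothetical and do not correspond to any particular indel caller.

```python
from collections import defaultdict


def de_bruijn(seq: str, k: int) -> dict[str, list[str]]:
    """Build a de Bruijn graph: each k-mer in seq adds an edge from its
    prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph: dict[str, list[str]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def repeated_nodes(seq: str, k: int) -> set[str]:
    """Return the (k-1)-mers occurring more than once in seq. Walking the
    sequence revisits each such node, creating a cycle or branch."""
    counts: dict[str, int] = defaultdict(int)
    for i in range(len(seq) - k + 2):  # number of (k-1)-mers in seq
        counts[seq[i:i + k - 1]] += 1
    return {node for node, c in counts.items() if c > 1}


if __name__ == "__main__":
    str_seq = "TTGCACACACAGG"  # toy sequence with a CA short tandem repeat
    print(repeated_nodes(str_seq, 4))  # small k: the repeat collapses into a loop
    print(repeated_nodes(str_seq, 8))  # larger k: the graph is a simple path
```

For the toy sequence, k = 4 leaves the nodes `CAC` and `ACA` repeated, while k = 8 (so that k - 1 exceeds the longest repeated substring, `CACACA`) yields no repeated nodes. This illustrates why alleles differing by one repeat unit are easier to separate with k-mers longer than the repeat.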
PMCID: PMC4306302
next-generation sequencing; sequence assembly; sequence analysis; variant detection; indel mutation; repetitive sequences; nucleic acid
4.  Asthma Outcomes: Composite Scores of Asthma Control 
Current asthma guidelines recommend assessing the level of a patient’s asthma control. Consequently, there is increasing use of asthma control as an outcome measure in clinical research studies. Several composite assessment instruments have been developed to measure asthma control.
National Institutes of Health (NIH) institutes and federal agencies convened an expert group to propose the most appropriate standardized composite score of asthma control instruments to be used in future asthma studies.
We conducted a comprehensive search of PubMed, using both the National Library of Medicine’s Medical Subject Headings (MeSH) and key terms to identify studies that attempted to develop and/or test composite score instruments for asthma control. We classified instruments as core (required in future studies), supplemental (used according to study aims and standardized), or emerging (requiring validation and standardization). This work was discussed at an NIH-organized workshop convened in March 2010 and finalized in September 2011.
We identified 17 composite score instruments with published validation information; all had comparable content. Eight instruments demonstrated responsiveness over time; 3 demonstrated responsiveness to treatment. A minimal clinically important difference has been established for 3 instruments. The instruments have demographic limitations; some are proprietary, and their use could be limited by cost.
Two asthma composite score instruments are sufficiently validated for use in adult populations, but additional research is necessary to validate their use in nonwhite populations. Gaps also exist in validating instruments for pediatric populations.
PMCID: PMC4269334  PMID: 22386507
Asthma Control Questionnaire; Asthma Control Test; Asthma Therapy Assessment Questionnaire; childhood Asthma Control Test
5.  Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica 
Genome Biology  2014;15(11):506.
The use of high-throughput genome-sequencing technologies has uncovered extensive structural variation in eukaryotic genomes, which contributes substantially to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate.
Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the ‘pan-genome’ of three divergent rice varieties and document several megabases of each genome absent in the other two.
Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0506-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4268812  PMID: 25468217
6.  Longitudinal Validation of a Tool for Asthma Self-Monitoring 
Pediatrics  2013;132(6):e1554-e1561.
To establish the longitudinal validity of a new tool, the Asthma Symptom Tracker (AST), which combines weekly use of the Asthma Control Test with a color-coded graph for visual trending.
Prospective cohort study of children aged 2 to 18 years admitted for asthma. Parents or children (n = 210) completed baseline AST assessments during hospitalization and then over the 6 months after discharge. The Asthma Control Questionnaire (ACQ) was administered concurrently with the first 5 AST assessments for comparison.
Test–retest reliability (intraclass correlation) was moderate, with a small longitudinal variation of AST measurements within subjects during follow-ups. Internal consistency was strong at baseline (Cronbach’s α 0.70) and during follow-ups (Cronbach’s α 0.82–0.90). Criterion validity demonstrated a significant correlation between AST and ACQ scores at baseline (r = −0.80, P < .01) and during follow-ups (r = −0.64, −0.72, −0.63, and −0.69). The AST was responsive to change over time; an increased ACQ score by 1 point was associated with a decreased AST score by 2.65 points (P < .01) at baseline and 3.11 points (P < .01) during follow-ups. Discriminant validity demonstrated a strong association between decreased AST scores and increased oral corticosteroid use (odds ratio 1.13, 95% confidence interval, 1.10–1.16, P < .01) and increased unscheduled acute asthma visits (odds ratio 1.23, 95% confidence interval, 1.18–1.28, P < .01).
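The internal-consistency figures above are values of Cronbach's α. As a hedged sketch, α for a respondents × items matrix of scores can be computed with the standard formula α = k/(k - 1) · (1 - Σσᵢ²/σ²_total), where k is the number of items, σᵢ² is the variance of item i, and σ²_total is the variance of respondents' total scores. The function name `cronbach_alpha` is illustrative, and the scores in the example are invented, not study data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent and one
    column per item.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))

    Sample variances are used throughout; using population variances
    consistently gives the same ratio. Raises ZeroDivisionError if every
    respondent has the same total score.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Three hypothetical respondents answering two items.
    print(cronbach_alpha([[1, 1], [2, 3], [3, 2]]))
```

In the example, each item has variance 1 and the totals (2, 5, 5) have variance 3, so α = 2 · (1 - 2/3) ≈ 0.67. Perfectly redundant items give α = 1.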
The AST is reliable, valid, and responsive to change over time, and can facilitate ongoing monitoring of asthma control and proactive medical decision-making in children.
PMCID: PMC4074668  PMID: 24218469
asthma control; pediatrics; self-monitoring; self-management
7.  Reducing INDEL calling errors in whole genome and exome sequencing data 
Genome Medicine  2014;6(10):89.
INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, INDEL variant calling remains error-prone, owing to library preparation, sequencing biases, and algorithm artifacts.
We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on coverage and sequence composition to rank high- and low-quality INDEL calls. We performed a large-scale validation experiment on 600 loci and found high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%).
Simulation and experimental data show that assembly-based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment-based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identify 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance of INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identify 6.3-fold more low-quality INDELs. Furthermore, accurate detection of heterozygous INDELs with Scalpel requires 1.2-fold higher coverage than detection of homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data.
Overall, we show that the accuracy of INDEL detection with WGS is much greater than with WES, even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may lower total project costs because of its greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.
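Because homopolymer A/T INDELs are singled out as a major source of low-quality calls, a caller's output can be screened for them. The Python below is a hypothetical helper, not part of Scalpel or the authors' classification scheme. It measures the length of the homopolymer run that contains a given 0-based reference position and flags the call when that run is made of A or T bases and is at least `min_run` long. The default `min_run = 6` is an illustrative assumption, not a threshold taken from the paper.

```python
def homopolymer_run(ref: str, pos: int) -> tuple[str, int]:
    """Return (base, run_length) for the homopolymer run containing the
    0-based position pos of the reference sequence ref."""
    base = ref[pos]
    start = pos
    while start > 0 and ref[start - 1] == base:  # extend run to the left
        start -= 1
    end = pos
    while end + 1 < len(ref) and ref[end + 1] == base:  # extend to the right
        end += 1
    return base, end - start + 1


def in_at_homopolymer(ref: str, pos: int, min_run: int = 6) -> bool:
    """Flag an INDEL placed at pos if it falls inside an A or T run of at
    least min_run bases (min_run is an illustrative default)."""
    base, length = homopolymer_run(ref.upper(), pos)
    return base in ("A", "T") and length >= min_run


if __name__ == "__main__":
    print(homopolymer_run("GCAAAAAAAGC", 4))    # ('A', 7)
    print(in_at_homopolymer("GCAAAAAAAGC", 4))  # True
```

How an INDEL's position maps onto `pos` depends on the variant representation (for example, whether the anchor base before the event is counted), so a real filter would need to normalize that convention first.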
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-014-0089-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4240813  PMID: 25426171
8.  High-coverage sequencing and annotated assemblies of the budgerigar genome 
GigaScience  2014;3:11.
Parrots belong to a group of behaviorally advanced vertebrates and have an advanced capacity for vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome.
We present a genomic resource for the budgerigar, an Australian parakeet (Melopsittacus undulatus), the most widely studied parrot species in neuroscience and behavior. The genomic sequence data include over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird, two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated for, and used in, both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing.
Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.
PMCID: PMC4109783  PMID: 25061512
Melopsittacus undulatus; Budgerigar; Parakeet; Next-generation sequencing; Hybrid assemblies; Optical maps; Vocal learning
9.  The advantages of SMRT sequencing 
Genome Biology  2013;14(6):405.
Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.
PMCID: PMC3706782
11.  The DNA Data Deluge 
IEEE spectrum  2013;50(7):26-33.
PMCID: PMC4048922
12.  Sixty years of genome biology 
Genome Biology  2013;14(4):113.
Sixty years after Watson and Crick published the double helix model of DNA's structure, thirteen members of Genome Biology's Editorial Board select key advances in the field of genome biology subsequent to that discovery.
PMCID: PMC3663092  PMID: 23651518
13.  Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies 
Briefings in Bioinformatics  2011;14(2):213-224.
Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at
PMCID: PMC3603210  PMID: 22199379
DNA Sequencing; genome assembly; assembly forensics; visual analytics
14.  Computational thinking in the era of big data biology 
Genome Biology  2012;13(11):177.
PMCID: PMC3580488  PMID: 23194371
15.  Cultivation and Complete Genome Sequencing of Gloeobacter kilaueensis sp. nov., from a Lava Cave in Kīlauea Caldera, Hawai'i 
PLoS ONE  2013;8(10):e76376.
The ancestor of Gloeobacter violaceus PCC 7421T is believed to have diverged from that of all known cyanobacteria before the evolution of thylakoid membranes and plant plastids. The long and largely independent evolutionary history of G. violaceus makes it an organism that retains ancestral features of early oxygenic photoautotrophs and in which cyanobacterial evolution can be investigated. No other Gloeobacter species has been described since the genus was established in 1974 (Rippka et al., Arch Microbiol 100:435). Gloeobacter-affiliated ribosomal gene sequences have been reported in environmental DNA libraries, but only the type strain's genome has been sequenced. Here we report the cultivation of a new Gloeobacter species, G. kilaueensis JS1T, from an epilithic biofilm in a lava cave in Kīlauea Caldera, Hawai'i. The strain's genome was sequenced from an enriched culture resembling a low-complexity metagenomic sample, using 9 kb paired-end 454 pyrosequences and 400 bp paired-end Illumina reads. The JS1T and G. violaceus PCC 7421T genomes show little gene synteny despite sharing 2842 orthologous genes, and comparing the genomes shows that they do not belong to the same species. Our results support establishing a new species to accommodate JS1T, for which we propose the name Gloeobacter kilaueensis sp. nov. Strain JS1T has been deposited in the American Type Culture Collection (BAA-2537), the Scottish Marine Institute's Culture Collection of Algae and Protozoa (CCAP 1431/1), and the Belgian Coordinated Collections of Microorganisms (ULC0316). The G. kilaueensis holotype has been deposited in the Algal Collection of the US National Herbarium (US# 217948). The JS1T genome sequence has been deposited in GenBank under accession number CP003587. The G+C content of the genome is 60.54 mol%. The complete genome sequence of G. kilaueensis JS1T may further understanding of cyanobacterial evolution and the shift from anoxygenic to oxygenic photosynthesis.
PMCID: PMC3806779  PMID: 24194836
16.  Genotyping in the Cloud with Crossbow 
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations, respectively; within Crossbow, they have been shown to analyze approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit demonstrates the use of Crossbow for identifying variants in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
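The division of labor described above follows the MapReduce pattern: a map step (alignment, done by Bowtie in Crossbow), a shuffle that groups intermediate records by key, and a reduce step (SNP calling, done by SOAPsnp). The in-process Python sketch below only mimics that data flow. The brute-force ungapped aligner, the grouping by single reference position, and the majority-vote caller are hypothetical stand-ins, not Bowtie, SOAPsnp, or Crossbow's actual code, and nothing here is distributed over Hadoop.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator, Optional


def map_read(read: str, ref: str, max_mismatches: int = 1) -> Iterator[tuple[int, str]]:
    """Map step: place the read at the first ungapped position with the
    fewest mismatches (at most max_mismatches), then emit (ref_pos, base)
    for each read base. Brute force is a toy stand-in for a real aligner."""
    best_pos: Optional[int] = None
    best_mm = max_mismatches + 1
    for p in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[p:p + len(read)]))
        if mm < best_mm:
            best_pos, best_mm = p, mm
    if best_pos is None:
        return  # unaligned read: emit nothing
    for offset, base in enumerate(read):
        yield best_pos + offset, base


def group_by_position(pairs: Iterable[tuple[int, str]]) -> dict[int, list[str]]:
    """Shuffle step: collect every emitted base under its reference position."""
    groups: dict[int, list[str]] = defaultdict(list)
    for pos, base in pairs:
        groups[pos].append(base)
    return groups


def call_position(pos: int, bases: list[str], ref: str,
                  min_depth: int = 3) -> Optional[tuple[int, str, str]]:
    """Reduce step: report (pos, ref_base, alt_base) when at least min_depth
    bases cover pos and the most common base differs from the reference.
    This majority vote is a toy rule, not SOAPsnp's statistical model."""
    if len(bases) < min_depth:
        return None
    alt, _ = Counter(bases).most_common(1)[0]
    return (pos, ref[pos], alt) if alt != ref[pos] else None


def call_snps(reads: Iterable[str], ref: str) -> list[tuple[int, str, str]]:
    """Chain map, shuffle, and reduce over all reads, sorted by position."""
    pairs = (pair for read in reads for pair in map_read(read, ref))
    groups = group_by_position(pairs)
    calls = (call_position(pos, bases, ref) for pos, bases in sorted(groups.items()))
    return [c for c in calls if c is not None]


if __name__ == "__main__":
    reference = "ACGTTGCAGTCA"
    # Three reads from a sample carrying G->A at reference position 5.
    print(call_snps(["ACGTTA", "GTTACA", "TACAGT"], reference))  # [(5, 'G', 'A')]
```

In a Hadoop deployment, map tasks would run on different nodes and the framework would perform the grouping; the separation into independent `map_read` and `call_position` calls is what makes that distribution possible.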
PMCID: PMC3465669  PMID: 22948728
short reads; read alignment; SNP calling; cloud computing; hadoop; software package
17.  Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species 
Bradnam, Keith R | Fass, Joseph N | Alexandrov, Anton | Baranay, Paul | Bechner, Michael | Birol, Inanç | Boisvert, Sébastien | Chapman, Jarrod A | Chapuis, Guillaume | Chikhi, Rayan | Chitsaz, Hamidreza | Chou, Wen-Chi | Corbeil, Jacques | Del Fabbro, Cristian | Docking, T Roderick | Durbin, Richard | Earl, Dent | Emrich, Scott | Fedotov, Pavel | Fonseca, Nuno A | Ganapathy, Ganeshkumar | Gibbs, Richard A | Gnerre, Sante | Godzaridis, Élénie | Goldstein, Steve | Haimel, Matthias | Hall, Giles | Haussler, David | Hiatt, Joseph B | Ho, Isaac Y | Howard, Jason | Hunt, Martin | Jackman, Shaun D | Jaffe, David B | Jarvis, Erich D | Jiang, Huaiyang | Kazakov, Sergey | Kersey, Paul J | Kitzman, Jacob O | Knight, James R | Koren, Sergey | Lam, Tak-Wah | Lavenier, Dominique | Laviolette, François | Li, Yingrui | Li, Zhenyu | Liu, Binghang | Liu, Yue | Luo, Ruibang | MacCallum, Iain | MacManes, Matthew D | Maillet, Nicolas | Melnikov, Sergey | Naquin, Delphine | Ning, Zemin | Otto, Thomas D | Paten, Benedict | Paulo, Octávio S | Phillippy, Adam M | Pina-Martins, Francisco | Place, Michael | Przybylski, Dariusz | Qin, Xiang | Qu, Carson | Ribeiro, Filipe J | Richards, Stephen | Rokhsar, Daniel S | Ruby, J Graham | Scalabrin, Simone | Schatz, Michael C | Schwartz, David C | Sergushichev, Alexey | Sharpe, Ted | Shaw, Timothy I | Shendure, Jay | Shi, Yujian | Simpson, Jared T | Song, Henry | Tsarev, Fedor | Vezzi, Francesco | Vicedomini, Riccardo | Vieira, Bruno M | Wang, Jun | Worley, Kim C | Yin, Shuangye | Yiu, Siu-Ming | Yuan, Jianying | Zhang, Guojie | Zhang, Hao | Zhou, Shiguo | Korf, Ian F
GigaScience  2013;2:10.
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
PMCID: PMC3844414  PMID: 23870653
Genome assembly; N50; Scaffolds; Assessment; Heterozygosity; COMPASS
18.  Hybrid error correction and de novo assembly of single-molecule sequencing reads 
Nature biotechnology  2012;30(7):693-700.
Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that uses shorter, high-identity sequences to correct the errors in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
PMCID: PMC3707490  PMID: 22750884
19.  The DNA60IFX contest 
Genome Biology  2013;14(6):124.
PMCID: PMC3706964  PMID: 23809492
20.  Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.) 
Genome Biology  2013;14(5):R41.
Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan.
The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment.
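The contig and scaffold N50 values above use the standard N50 statistic: the largest length L such that contigs (or scaffolds) of length at least L together contain at least half of the assembled bases. A minimal Python sketch follows. The optional `genome_size` argument computes the NG50 variant, which uses half of an estimated genome size (for example, the 929 Mbp estimate above) as the threshold instead. The function name `n50` is illustrative, and the lengths in the example are toy values.

```python
from typing import Optional


def n50(lengths: list[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of a list of contig or scaffold lengths.

    Sequences are sorted longest first and summed; the length at which the
    running total first reaches half of the total assembly size is the N50.
    If genome_size is given, half of genome_size is the threshold instead
    (NG50), and 0 is returned if the assembly never reaches it.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8: 10 + 9 + 8 = 27, half of 54
```

Because NG50 is measured against a fixed genome size, it allows assemblies of different total lengths to be compared more fairly than N50, which rewards assemblers that simply leave short sequences out.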
The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
PMCID: PMC4053705  PMID: 23663246
21.  Allergic Bronchopulmonary Aspergillosis Presenting as Chronic Cough in an Elderly Woman Without Previously Documented Asthma 
The Permanente Journal  2013;17(2):e103-e108.
This is a case report from a specialist point of view that includes a comprehensive review of the clinical course pre- and postconsultation along with a brief but pertinent review of the literature as it relates to this particular unusual and protracted case, which was ultimately successfully diagnosed and treated.
A nonsmoking woman in her mid-70s presents to the allergist for consultation regarding a chronic cough of almost 3 years' duration, with no specific diagnosis as to its etiology despite numerous diagnostic tests and therapeutic trials.
PMCID: PMC3662291  PMID: 23704852
22.  Current challenges in de novo plant genome sequencing and assembly 
Genome Biology  2012;13(4):243.
Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community.
PMCID: PMC3446297  PMID: 22546054
DNA sequencing; genome assembly; plant genomics
23.  De Novo Gene Disruptions in Children on the Autistic Spectrum 
Neuron  2012;74(2):285-299.
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frame shifts) are twice as frequent, 59 to 28. Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest they are especially dosage-sensitive targets of cognitive disorders.
PMCID: PMC3619976  PMID: 22542183
24.  Illuminating the genetics of complex human diseases 
BMC Proceedings  2012;6(Suppl 6):O4.
PMCID: PMC3467579
25.  The rise of a digital immune system 
GigaScience  2012;1:4.
Driven by million-fold improvements in biotechnology, biology is increasingly shifting towards high-resolution, quantitative approaches to study the molecular dynamics of entire populations. One exciting application enabled by this new era of biology is the “digital immune system”. It would work in much the same way as an adaptive, biological immune system: by observing the microbial landscape, detecting potential threats, and neutralizing them before they spread beyond control. With the potential to have an enormous impact on public health, it is time to integrate the necessary biotechnology, computational, and organizational systems to seed the development of a global, sequencing-based pathogen surveillance system.
PMCID: PMC3617452  PMID: 23587178

