Whole-genome sequencing is a potentially powerful tool for the diagnosis of genetic diseases. Here, we used sequencing-by-ligation to sequence the genome of an 11-month-old breast-fed girl with xanthomas and very high plasma cholesterol levels (1023 mg/dl). Her parents had normal plasma cholesterol levels and reported no family history of hypercholesterolemia, suggesting either an autosomal recessive disorder or a de novo mutation. Known genetic causes of severe hypercholesterolemia were ruled out by sequencing the responsible genes (LDLRAP, LDLR, PCSK9, APOE and APOB), and sitosterolemia was ruled out by documenting a normal plasma sitosterol:cholesterol ratio. Sequencing revealed 3 797 207 deviations from the reference sequence, of which 9726 were nonsynonymous single-nucleotide substitutions. A total of 9027 of the nonsynonymous substitutions were present in dbSNP or in 21 additional individuals from whom complete exonic sequences were available. The 699 novel nonsynonymous substitutions were distributed among 604 genes, 23 of which were single-copy genes that each contained 2 nonsynonymous substitutions consistent with an autosomal recessive model. One gene, ABCG5, had two nonsense mutations (Q16X and R446X). This finding indicated that the infant has sitosterolemia. Thus, whole-genome sequencing led to the diagnosis of a known disease with an atypical presentation. Diagnosis was confirmed by the finding of severe sitosterolemia in a blood sample obtained after the infant had been weaned. These findings demonstrate that whole-genome (or exome) sequencing can be a valuable aid to diagnose genetic diseases, even in individual patients.
Identification of a causal mutation provides the most definitive diagnosis for a genetic disease. For disorders caused by mutations in a limited number of genes, conventional Sanger sequencing can efficiently identify the culprit mutation. For many single-gene disorders, however, diagnosis by conventional resequencing of candidate genes is not feasible because the responsible gene(s) has not been identified, or because the disease is highly genetically heterogeneous. The use of massively parallel DNA sequencing technologies (1) has the potential to identify the causative mutation by resequencing the whole genome (or exome) of affected individuals.
Recently, Ng et al. (2) used exome resequencing to identify the gene responsible for Miller syndrome, a rare autosomal recessive disorder of unknown etiology. A single candidate gene (DHODH) was identified by sequencing the exomes of four affected individuals from three independent kindreds. Large-scale sequencing can also be useful in evaluating patients with a suspected genetic disease in whom the diagnosis is uncertain. Choi et al. (3) identified mutations associated with congenital chloride diarrhea in several patients from consanguineous kindreds who had a suspected diagnosis of Bartter syndrome, a renal salt-wasting disease.
The utility of whole-genome resequencing for genetic diagnosis was further illustrated by Lupski et al. (4), who sequenced the genome of a proband with Charcot–Marie–Tooth disease, an inherited peripheral neuropathy that can be caused by mutations in 39 separate loci (5). Although the entire genome was sequenced, the authors focused on only those genes known to cause the neuropathic condition. All affected individuals in the family of the proband were found to be compound heterozygotes for SH3TC2 mutations identified in the proband. One of the mutations (R954X) had been found previously in unrelated patients with the disease (6).
Severe monogenic hypercholesterolemia can be caused by defects in several different genes (7). Familial hypercholesterolemia (FH), an autosomal dominant disorder due to mutations in the gene encoding the low-density lipoprotein receptor (LDLR), is the most common and severe genetic form of hypercholesterolemia (8). The LDLR is a cell-surface receptor that mediates clearance of low-density lipoproteins (LDL) from the circulation. Defects in other proteins required for LDL clearance cause phenocopies of FH: missense mutations in apolipoprotein (apo) B (9), a ligand for the LDLR, or in PCSK9 (10), a proprotein convertase that promotes degradation of the LDLR, cause autosomal dominant hypercholesterolemia. Mutations in ARH (encoded by LDLRAP), an adaptor protein required for LDLR internalization (11), or inactivating mutations in APOE (12), another LDLR ligand, produce recessive forms of hypercholesterolemia. All these disorders present with severe hypercholesterolemia, xanthomas (cholesterol deposits in the skin and tendons) and premature coronary artery disease (7).
Hypercholesterolemia can also be caused by mutations in proteins required for the elimination of sterols. A rare, recessive disorder, sitosterolemia, is caused by mutations in either of two ATP-binding cassette (ABC) hemitransporters, ABCG5 or ABCG8, that heterodimerize to form a duplex that promotes sterol excretion (13). Mutations in either ABCG5 or ABCG8 result in increased fractional absorption and decreased biliary excretion of neutral sterols (14). Plant sterols (e.g. sitosterol, campesterol, stigmasterol), in addition to cholesterol, accumulate in the blood and body tissues of patients with this disorder. Normally plant sterols are selectively eliminated from the body in the intestine by secretion into the gut lumen or in the liver via secretion into the bile. Cholesterol levels are more variable in sitosterolemia than in other genetic hyperlipidemias, but can be extremely elevated in some patients (15). The diagnosis of this disease rests on documenting elevated plasma levels of plant sterols, which are normally very low (<1 mg/dl). Patients with sitosterolemia have levels of plasma sitosterol that are >50-fold elevated (15).
In the present study, we used whole-genome sequencing to identify the molecular defect in an infant with severe hypercholesterolemia of unknown etiology.
The proband for this study was a healthy 11-month-old girl who presented with subcutaneous xanthomas over the Achilles tendon (Fig. 1A). Her plasma cholesterol level was 1023 mg/dl with an LDL-C of 837 mg/dl, HDL-C of 54 mg/dl and triglyceride of 120 mg/dl. Thyroid function and liver function tests were normal as was her urinalysis. A pedigree of the proband's family, with plasma lipid and lipoprotein levels determined 1 month after presentation, is shown in Figure 1B. The proband was born to unrelated Romanian parents after an unremarkable pregnancy. The plasma lipid levels of both parents were within the normal range. The child was exclusively breast-fed for the first 6 months of life and then was slowly weaned off breast milk. At the time of diagnosis, breast milk comprised 80% of her dietary intake.
Since the inheritance pattern of the hypercholesterolemia was most consistent with the disorder being recessive, we first ruled out sitosterolemia by measuring plasma levels of plant sterols using gas chromatography-mass spectrometry (GC/MS). Although the absolute level of sitosterol (2.37 mg/dl) exceeded the normal range (0.2–1.0 mg/dl), it was well below the levels seen in sitosterolemic individuals (14–65 mg/dl) (16). Plant sterol levels are expressed as a fraction of the plasma cholesterol level since phytosterols are transported as constituents of cholesteryl-rich lipoproteins. The sitosterol:cholesterol and the campesterol:cholesterol ratios of the proband were within the normal range and similar to the levels of both parents. Next, we sequenced the exons of LDLRAP, the gene defective in autosomal recessive hypercholesterolemia (11) and identified no mutations. To rule out incomplete penetrance of a dominant form of hypercholesterolemia, we also sequenced the coding regions of LDLR and PCSK9 (17,18) and found no mutations (the oligonucleotides used to sequence the exons in these genes are available in the referenced papers).
To provide a more comprehensive screen for possible mutations causing the hypercholesterolemia in the child, whole-genome sequencing was performed by Complete Genomics, Inc. (Mountain View, CA, USA), using a sequencing-by-ligation method as described in detail previously (19). Sequencing of the proband's DNA sample yielded 138 gigabases (Gb) of mappable sequence, for an average fold coverage of 49× per base. A total of 3 797 207 deviations from the reference sequence were noted (Fig. 2A), of which 3 295 207 were single-nucleotide substitutions (including 2 245 981 heterozygous and 1 049 225 homozygous) and 502 000 were insertions, deletions or more complex rearrangements. Since most Mendelian disorders are caused by rare single-nucleotide substitutions in coding regions and exon splice junctions, we restricted our initial search to nonsynonymous variants and splice junction variants that were not present in the public repository of sequence variants (dbSNP) or in 16 exomes (2,20) and 5 genomes (J.C.C. and H.H.H., unpublished data) from 21 individuals who did not have hypercholesterolemia.
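As a rough consistency check on the reported depth, mean fold coverage can be approximated as the number of mappable bases divided by the length of the reference. The sketch below assumes a reference length of about 2.85 Gb; that figure is an illustrative assumption, not a value given in this report, and the function name is hypothetical.

```python
def mean_fold_coverage(mappable_bases: float, reference_length: float) -> float:
    """Approximate average depth as total mapped bases / reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return mappable_bases / reference_length


MAPPABLE_BASES = 138e9             # 138 Gb of mappable sequence, as reported
ASSUMED_REFERENCE_LENGTH = 2.85e9  # assumption: approximate reference length

coverage = mean_fold_coverage(MAPPABLE_BASES, ASSUMED_REFERENCE_LENGTH)
print(f"approximate mean coverage: {coverage:.1f}x")
```

Under this assumed reference length, the estimate falls close to the reported 49×; a different choice of reference length would shift it proportionally.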
Of the 9726 nonsynonymous single-nucleotide variants identified in the proband, 8626 were present in dbSNP and a further 401 were present in at least one of the 21 sequenced individuals, leaving 699 novel nonsynonymous SNPs that were predicted to change an amino acid or a consensus splice site in 604 genes. Under an autosomal recessive model, mutations would be expected in both parental alleles. Two or more novel nonsynonymous variants were present in 42 genes, but 19 of these genes were known to be present in multiple copies in the genome. Next, we queried whether there were any nonsense mutations among the 23 single-copy genes that were found to contain two or more novel, nonsynonymous variants. A single gene, ABCG5, contained two different nonsense mutations: Q16X and R446X (Fig. 2B). Sanger sequencing confirmed both mutations in the proband and that her mother and father were heterozygotes for the Q16X and R446X mutations, respectively (data not shown). Both these mutations are incompatible with the expression of a functional protein. The ABCG5-R446X mutation was observed in a 10-year-old girl with sitosterolemia (21), whereas the Q16X mutation has not been reported previously.
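The filtering logic described above can be sketched in a few lines. This is a minimal illustration on toy data, not the pipeline actually used: the `Variant` record, the function names and the placeholder genes (`GENE_A` to `GENE_D`) are all hypothetical, and only the two ABCG5 mutations and the variant counts are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str        # protein-level change, e.g. "Q16X"
    in_dbsnp: bool     # already catalogued in dbSNP
    in_controls: bool  # seen in any of the unaffected comparison individuals
    nonsense: bool     # introduces a premature stop codon


def recessive_candidates(variants, multicopy_genes):
    """Keep single-copy genes carrying two or more novel variants."""
    novel_by_gene = defaultdict(list)
    for v in variants:
        if not v.in_dbsnp and not v.in_controls:
            novel_by_gene[v.gene].append(v)
    return {
        gene: hits
        for gene, hits in novel_by_gene.items()
        if len(hits) >= 2 and gene not in multicopy_genes
    }


def genes_with_two_nonsense(candidates):
    """Among candidate genes, return those with at least two nonsense variants."""
    return sorted(
        gene for gene, hits in candidates.items()
        if sum(v.nonsense for v in hits) >= 2
    )


# Toy input: only the ABCG5 entries come from the report.
variants = [
    Variant("ABCG5", "Q16X", False, False, True),
    Variant("ABCG5", "R446X", False, False, True),
    Variant("GENE_A", "A10T", True, False, False),   # dropped: in dbSNP
    Variant("GENE_B", "P5L", False, False, False),   # dropped: single novel variant
    Variant("GENE_C", "G7S", False, False, False),
    Variant("GENE_C", "R9H", False, False, False),   # dropped: multi-copy gene
    Variant("GENE_D", "V3M", False, False, False),
    Variant("GENE_D", "L8F", False, True, False),    # dropped: seen in controls
]
candidates = recessive_candidates(variants, multicopy_genes={"GENE_C"})
hits = genes_with_two_nonsense(candidates)
print(hits)

# Arithmetic check of the counts reported in the text.
novel_count = 9726 - 8626 - 401   # novel nonsynonymous variants
single_copy_count = 42 - 19       # single-copy genes with two or more novel variants
print(novel_count, single_copy_count)
```

Requiring two novel variants per gene is only a proxy for the recessive model; confirming that the two variants lie on different parental alleles still needs parental genotypes, as was done here by Sanger sequencing.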
Thus, the patient had two inactivating mutations in ABCG5, which is consistent with her having sitosterolemia. Elevated plasma levels of plant sterols are the sine qua non of this disorder, and yet the proband had normal phytosterol levels at presentation (Fig. 1A). Another blood sample was obtained when the proband was 2 years of age.
Figure 3A shows the plasma cholesterol levels of the proband measured at various intervals after the initial blood sample (Sample 1) was obtained. The patient was initially treated with high-dose statins plus ezetimibe. On this regimen, her cholesterol fell dramatically, but she developed hepatitis, so both medications were withdrawn and her cholesterol level increased to >600 mg/dl. Ezetimibe was reinstituted and her plasma cholesterol level progressively fell. A low dose of statin (rosuvastatin, 2.5 mg/day) was then added. She was slowly weaned off breast milk and stopped breastfeeding 1 year after the hypercholesterolemia was first detected. The second blood sample (Sample 2) was obtained 4 months after she stopped breastfeeding and assayed for campesterol and sitosterol levels together with an aliquot of frozen plasma (maintained at −80°C) from the original sample (Sample 1) (Fig. 3B). GC/MS revealed that plasma plant sterol concentrations increased markedly in the 1 year interval between the two samples. Her plasma sitosterol level was 8.4 mg/dl and campesterol level was 4.05 mg/dl while her plasma cholesterol level had fallen to 151 mg/dl. Her sitosterol:cholesterol ratio had increased >10-fold from 0.44 to 5.5 µg/mg. In contrast, the proband's cholesterol level, which was very high in her first sample, was now within the normal range. These results are consistent with the molecular diagnosis of sitosterolemia.
The major finding of this study is that whole-genome sequencing identified the culprit mutations and provided a definitive diagnosis of sitosterolemia in a severely hypercholesterolemic patient who did not have the classical hallmark feature of the disease. Although the patient's genome had more than 3.5 million sequence variations, the substantial majority of these could be excluded from consideration as disease-causing mutations because they are common among healthy individuals. The autosomal recessive inheritance pattern of hypercholesterolemia in the proband's family imposed the further requirement that both alleles contain a rare mutation, which greatly reduced the number of candidate genes. Thus, whole-genome sequencing will be useful not only in revealing the etiology of unknown genetic diseases but also in diagnosing patients with atypical presentations of known diseases.
Why did the patient not have sitosterolemia at presentation (15)? A likely contributing factor is that she was being breast-fed when the original sample was obtained. Since plant sterols are derived entirely from the diet (22), the plasma of the mother has low levels of these sterols. The only source of phytosterols in the milk is from plasma, so the child was exposed to much lower levels of plant sterols while being breastfed than when consuming a diet containing fruits and vegetables (23).
A more perplexing question is why plasma levels of cholesterol in the proband were so high while the child was breastfeeding, at a time when the circulating levels of plant sterols were not elevated. The accumulation of noncholesterol sterols has been implicated in contributing to the very low rates of cholesterol synthesis seen in patients with sitosterolemia (24–26). Previously, we showed that stigmasterol, a plant sterol that accumulates in sitosterolemia, inhibits processing of a transcription factor, sterol regulatory element binding protein-2, which is required for the upregulation of cholesterol synthesis in response to reductions in cellular cholesterol levels (27). Since ABCG5 and ABCG8 have been found in all animals that synthesize cholesterol, but not in those that do not, we proposed that the rigorous exclusion of sterols other than cholesterol may be essential for the maintenance of normal cholesterol homeostasis (25). Interestingly, the proband in the present study developed severe hypercholesterolemia before accumulating very high levels of plant sterols. This finding suggests that the severe hypercholesterolemia observed in sitosterolemic individuals may reflect a failure to excrete cholesterol into bile, rather than disruption of cholesterol homeostasis by plant sterols.
Fortuitously, the proband was treated with ezetimibe, a lipid-lowering agent that reduces the absorption of plant sterols as well as cholesterol. Sitosterolemia has been shown previously to be effectively treated by blocking the absorption of neutral sterols using ezetimibe (28). When absorption of dietary sterols is blocked in a mouse model of sitosterolemia, cholesterol homeostasis is restored coincident with a reduction in plant sterol levels (24). The plasma cholesterol level progressively fell into the normal range with ezetimibe treatment, although the plant sterol levels remained elevated despite the patient consuming a low-cholesterol, low-plant-sterol diet. Persistent elevation of plant sterols is characteristic of this disorder, even after more prolonged treatment with a sterol absorption inhibitor (28).
Although a growing number of reports attest to the utility of both whole-genome and whole-exome sequencing for the identification of mutations in patients with heritable diseases, the choice of which strategy to pursue remains a subject of debate. Whole-genome sequencing is potentially comprehensive and obviates the potential problems associated with selection and incomplete capture of exons, but is substantially more expensive than exome sequencing using current generation parallel sequencing protocols and instrumentation. Since most known single-gene disorders are caused by mutations that alter the protein-coding sequences of genes (29), exome sequencing should suffice in most instances for the identification of mutations underlying these diseases. Nonetheless, the cost advantage of exome sequencing is likely to diminish as sequencing costs continue to decline and throughput increases.
The proband for this study was an apparently healthy infant girl who came to clinical attention at the age of 11 months when she was found to have subcutaneous xanthomas over her Achilles tendon. Her parents provided written informed consent for genetic analyses to determine the basis for her hypercholesterolemia, and the family was enrolled into the study.
Whole-genome sequencing was performed by Complete Genomics Inc. using a sequencing-by-ligation method (19). Briefly, sequencing libraries were generated by fragmenting genomic DNA followed by recursive cutting with type IIS restriction enzymes. Directional adaptors were ligated to the fragments and the resulting circular substrates were replicated with Phi29 polymerase by rolling circle replication (RCR) to yield hundreds of copies of single-stranded DNA. The resulting concatamers were adsorbed to grid-patterned arrays, and combinatorial probe-anchor ligation chemistry was used to read 10 bases adjacent to each of the 8 anchor sites, providing 30–35 base paired-end reads per concatamer. The resulting mate-paired reads were aligned to the reference genome (NCBI Build 36.1) or recruited by the mapped mate-pair reads for local de novo assembly as well as for determining genotyping calls for each reference position for each genome. Data were provided as lists of sequence variants (SNPs and short indels) relative to the reference genome.
The exons and flanking intronic sequences of LDLRAP, LDLR, PCSK9 and APOE were sequenced by BigDye Terminator chemistry on an ABI3730XL automated sequencer (Applied Biosystems) using the protocol provided by the manufacturer.
Lipids were assayed in whole plasma and in lipoprotein fractions using enzymatic assays.
Sterols, including cholesterol, campesterol and sitosterol, were measured in petroleum ether extracts of plasma as described previously (30).
This work was supported by the National Institutes of Health (HL082896, HL72304 and HL20848). J.R. is supported by a training grant, 5TLIDK081181. Funding to pay the Open Access Charge was provided by the Howard Hughes Medical Institute.
We would like to thank Erica Solis and Fang Xu for excellent technical assistance.
Conflict of Interest statement. None declared.