Maternal plasma DNA is a mixture of maternal and fetal DNA. The fraction of fetal DNA ranges from a few percent or lower early in pregnancy to as high as ~50% [2,7], and it generally increases with gestational age. The fetal genome is assembled from the four parental chromosomes, or haplotypes, through random assortment and recombination during meiosis, so in any genomic region the fetus carries one of the two maternal haplotypes and one of the two paternal haplotypes. Three haplotypes therefore exist in maternal plasma per genomic region: the maternal haplotype that is transmitted to the fetus, the maternal haplotype that is not transmitted, and the paternal haplotype that is transmitted. Let ε be the fetal DNA fraction. Maternal DNA contributes 1 - ε of each maternal haplotype, and fetal DNA contributes ε of each transmitted haplotype. Hence, if the relative copy number of the untransmitted maternal haplotype is 1 - ε, then the relative copy number of the transmitted maternal haplotype is 1, and the relative copy numbers of the transmitted and untransmitted paternal haplotypes are ε and 0, respectively (Figure 1). Within each pair of parental haplotypes, the transmitted haplotype is therefore over-represented relative to the untransmitted one. One can measure the relative amounts of the parental haplotypes by counting the alleles specific to each parental haplotype (referred to as ‘markers’). From these counts one can deduce which haplotype of each parent was inherited, and hence reconstruct the full inherited fetal genome.
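The copy-number bookkeeping above can be made concrete with a short sketch. The class and function names below are hypothetical and serve only to illustrate the arithmetic. The model assumes that maternal DNA (fraction 1 - ε) carries both maternal haplotypes and that fetal DNA (fraction ε) carries the transmitted maternal and paternal haplotypes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaplotypeDosage:
    """Relative copy numbers of the four parental haplotypes in maternal plasma."""
    maternal_transmitted: float
    maternal_untransmitted: float
    paternal_transmitted: float
    paternal_untransmitted: float


def expected_dosage(eps: float) -> HaplotypeDosage:
    """Expected relative copy numbers for a fetal DNA fraction eps (hypothetical helper).

    Maternal DNA (fraction 1 - eps) carries both maternal haplotypes; fetal DNA
    (fraction eps) carries the transmitted maternal and transmitted paternal ones.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("fetal DNA fraction must lie in [0, 1]")
    return HaplotypeDosage(
        maternal_transmitted=(1.0 - eps) + eps,  # maternal copy plus fetal copy
        maternal_untransmitted=1.0 - eps,        # maternal copy only
        paternal_transmitted=eps,                # fetal copy only
        paternal_untransmitted=0.0,              # not present in plasma
    )


d = expected_dosage(0.10)
# Excess of the transmitted over the untransmitted maternal haplotype: eps
# (up to floating-point rounding).
print(d.maternal_transmitted - d.maternal_untransmitted)
```

The signal used for inference is the small excess ε of the transmitted haplotype within each parental pair. This is why the counting requirement grows as the fetal DNA fraction shrinks.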
Figure 1 Molecular counting strategies for measuring the fetal genome noninvasively from maternal blood only. Genome-wide, chromosome length haplotypes of the mother are obtained using direct deterministic phasing. The inheritance of maternal haplotypes is revealed …
Strictly speaking, the markers that define each maternal haplotype are the alleles present in one maternal haplotype but absent from both the other maternal haplotype and the two paternal haplotypes. However, two unrelated individuals rarely share the same long-range haplotype, that is, a haplotype much longer than the typical population haplotype block (~100 kb). Over such lengths, the transmitted paternal haplotype does not systematically match one maternal haplotype rather than the other. Its alleles at these loci therefore do not interfere with measuring the representation of the maternal haplotypes, as long as the haplotype under consideration is sufficiently long (>1 Mb). Thus all maternal heterozygous loci can be used to define the two maternal haplotypes (Figure 1), and the relative representation of the two maternal haplotypes can be measured without knowledge of the paternal haplotypes. This relative representation is the difference between the counts of markers specific to each haplotype. Even if the transmitted maternal haplotype is only slightly over-represented, it can be identified as long as counting is deep enough for the expected count difference to exceed the counting noise, which is governed by Poisson statistics. Table S1 and Figure S1 estimate the counting requirement as a function of measurement confidence and fetal DNA fraction over the clinically observed range. The number of markers that define each parental haplotype grows with haplotype length. As a result, the longer the phased haplotypes, the fewer counts per individual marker are needed, on average, to confidently determine the over-represented parental haplotypes.
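The Poisson counting argument can be sketched as follows. The sketch uses a normal approximation that we assume here; it is not necessarily how Table S1 was computed, and the function names and the threshold z = 3 are hypothetical. Let the transmitted-haplotype counts be A ~ Poisson(λ) and the untransmitted counts B ~ Poisson(λ(1 - ε)). The expected difference is λε and its variance is λ(2 - ε). Requiring the difference to exceed z standard deviations gives a total of N ≥ z²(2 - ε)²/ε² counts. For ε = 5% and z = 3, this is roughly 13,700 counts pooled over all markers in the segment.

```python
import math
from typing import Optional


def required_total_counts(eps: float, z: float = 3.0) -> int:
    """Total marker counts (summed over both maternal haplotypes) needed for the
    expected count difference to exceed z standard deviations of Poisson noise.

    With transmitted counts A ~ Poisson(lam) and untransmitted counts
    B ~ Poisson(lam * (1 - eps)): E[A - B] = lam * eps and
    Var[A - B] = lam * (2 - eps). Requiring E[A - B] >= z * sd(A - B) gives
    lam >= z**2 * (2 - eps) / eps**2, i.e. N = lam * (2 - eps) total counts.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("fetal DNA fraction must lie in (0, 1)")
    return math.ceil(z ** 2 * (2.0 - eps) ** 2 / eps ** 2)


def call_overrepresented(count_a: int, count_b: int, z: float = 3.0) -> Optional[str]:
    """Return 'A' or 'B' for the over-represented maternal haplotype, or None
    when the count difference is within z standard deviations of Poisson noise."""
    total = count_a + count_b
    if total == 0:
        return None
    # For independent Poisson counts, Var[A - B] = E[A] + E[B], estimated by A + B.
    score = (count_a - count_b) / math.sqrt(total)
    if score > z:
        return "A"
    if score < -z:
        return "B"
    return None


# Counts are pooled over all markers in a haplotype segment, so the average
# requirement per marker shrinks as the segment (and its marker count) grows.
needed = required_total_counts(0.05)
for n_markers in (500, 1_000, 2_000):
    print(n_markers, needed / n_markers)
```

The usage loop illustrates the final point of the paragraph above. The total requirement is fixed by ε and the confidence level, so a longer phased haplotype with more markers spreads that requirement over more positions.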
If the paternal haplotypes are known, the inherited paternal haplotypes can be determined directly by comparing the summed counts of alleles specific to each paternal haplotype (Figure S2). This reveals the entire inherited fetal genome. Figure S3 and the accompanying supplemental text show how this could be achieved using sequencing data from a synthetic mixture of DNA from a mother and daughter within a fully phased family trio [12]. However, paternal information cannot always be obtained. The incidence of non-paternity is estimated at 3% to 10% [13,14], which makes this a particularly delicate issue. In the absence of paternal information, the paternally inherited haplotypes can be reconstructed via linkage to observed non-maternal (i.e. paternal-specific) alleles.
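When the paternal haplotypes are known, the comparison of summed counts for one region might look like the sketch below. The dictionary layouts (genomic position to allele, or to per-allele read counts) and the `min_margin` decision threshold are hypothetical. The sketch relies on the model above: the transmitted paternal haplotype is present at relative dose ε and the untransmitted one at 0. The untransmitted haplotype's specific alleles should therefore appear only as sequencing noise.

```python
from typing import Dict, Optional, Tuple


def infer_paternal_haplotype(
    maternal_genotypes: Dict[int, Tuple[str, str]],
    paternal_hap_a: Dict[int, str],
    paternal_hap_b: Dict[int, str],
    plasma_allele_counts: Dict[int, Dict[str, int]],
    min_margin: int = 5,
) -> Optional[str]:
    """Compare summed plasma counts of alleles specific to each paternal haplotype
    within one region. min_margin is a hypothetical decision threshold."""
    sum_a = sum_b = 0
    for pos, maternal in maternal_genotypes.items():
        a, b = paternal_hap_a.get(pos), paternal_hap_b.get(pos)
        if a is None or b is None or a == b:
            continue  # father homozygous or unphased here: uninformative
        counts = plasma_allele_counts.get(pos, {})
        if a not in maternal:  # allele carried only by paternal haplotype A
            sum_a += counts.get(a, 0)
        if b not in maternal:  # allele carried only by paternal haplotype B
            sum_b += counts.get(b, 0)
    if sum_a - sum_b >= min_margin:
        return "A"
    if sum_b - sum_a >= min_margin:
        return "B"
    return None


# Toy region: at position 1 the mother is C/C and paternal haplotype A carries T.
mat = {1: ("C", "C"), 2: ("G", "G"), 3: ("A", "T")}
pat_a = {1: "T", 2: "G", 3: "A"}
pat_b = {1: "C", 2: "A", 3: "T"}
plasma = {1: {"C": 90, "T": 10}, 2: {"G": 95, "A": 0}, 3: {"A": 50, "T": 50}}
print(infer_paternal_haplotype(mat, pat_a, pat_b, plasma))
```

Position 3 is skipped because both paternal alleles are also maternal, so neither is paternal-specific there. In practice the loci passed in would be restricted to a single segment between paternal cross-overs.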
We verified this approach on samples collected from two pregnancies. Pregnant woman P1 carried a female fetus with a normal karyotype. Pregnant woman P2 carries a ~2.85 Mb heterozygous deletion on chromosome 22 that is associated with DiGeorge syndrome. To obtain phased maternal chromosomes, we performed ‘direct deterministic phasing’ (DDP) [15] on 3 or 4 maternal metaphase cells obtained by culturing maternal whole blood (Table S2, Figure S4). DDP involves microfluidic separation and amplification of individual metaphase chromosomes from single cells, followed by genome-wide genotyping of the amplified material. It enables each chromosome in the genome to be phased along its full length. Genomic DNA from cord blood collected at delivery was also genotyped to serve as the true reference for fetal genotypes. The true inheritance of maternal haplotypes was determined by aligning the fetal homozygous SNPs, as genotyped from cord blood, against the two maternal haplotypes defined by the phased maternal heterozygous SNPs. At a maternal heterozygous site where the fetus is homozygous, the fetal allele must have come from the transmitted maternal haplotype. The analysis here concerns the ~1 million genomic positions present on the Omni1-Quad genotyping array. Phase information for the remaining genomic positions, particularly those that carry rare variants of clinical importance, can be obtained with broader array coverage or by direct sequencing of the amplified chromosome material, as demonstrated previously [15].
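The cord-blood comparison described above can be sketched as follows. Maternal haplotypes A and B and the fetal genotypes are represented as hypothetical dictionaries keyed by genomic position. Each informative SNP is labelled with the maternal haplotype that carries the fetal allele. Consecutive informative SNPs with different labels bracket a maternal cross-over.

```python
from typing import Dict, List, Tuple


def true_maternal_inheritance(
    hap_a: Dict[int, str],
    hap_b: Dict[int, str],
    fetal_genotypes: Dict[int, Tuple[str, str]],
) -> Dict[int, str]:
    """Label each informative SNP with the maternal haplotype ('A' or 'B')
    that carries the fetal allele.

    Informative SNPs are maternal heterozygous sites where the fetus is
    homozygous: the fetal allele there must come from the transmitted
    maternal haplotype.
    """
    calls: Dict[int, str] = {}
    for pos in sorted(hap_a):
        a, b = hap_a[pos], hap_b.get(pos)
        if b is None or a == b:
            continue  # not a phased maternal heterozygous site
        fetal = fetal_genotypes.get(pos)
        if fetal is None or fetal[0] != fetal[1]:
            continue  # fetal genotype missing or heterozygous: uninformative
        if fetal[0] == a:
            calls[pos] = "A"
        elif fetal[0] == b:
            calls[pos] = "B"
        # any other allele (e.g. a genotyping error) is skipped
    return calls


def crossover_intervals(calls: Dict[int, str]) -> List[Tuple[int, int]]:
    """Intervals between consecutive informative SNPs whose labels differ;
    each interval bounds the position of one maternal cross-over."""
    positions = sorted(calls)
    return [
        (left, right)
        for left, right in zip(positions, positions[1:])
        if calls[left] != calls[right]
    ]
```

The width of each returned interval reflects how precisely a cross-over can be placed with the available SNP density. Denser markers narrow it.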
Figure 2 Noninvasively determining genome-wide fetal inheritance of maternal haplotypes via haplotype counting of maternal plasma DNA with at least 99.8% accuracy over 99% of the genome in three maternal plasma samples (A-C). Each point on a black line represents …
Maternal cell-free DNA samples were shotgun sequenced on the Illumina platform to a final depth of ~52.7× (151 Gb), ~20.8× (59.7 Gb), and ~10.7× (30.8 Gb) haploid genome coverage for P1T1 (P1, 1st trimester), P1T2 (P1, 2nd trimester), and P2T3 (P2, 3rd trimester), respectively (Table S2). To determine fetal inheritance of maternal haplotypes, we divided each chromosome into bins of 2.5–3.5 Mb for autosomes and 5–7.5 Mb for chromosome X (Table S2), with sliding steps of 100 kb. Within each bin, we compared the counts of alleles specific to each of the two haplotypes. Bin sizes were chosen according to the estimated sampling requirement (Table S1), based on the sequencing depth, the density of markers, and the fetal DNA fraction. The fetal DNA fraction, estimated by comparing the relative representation of the maternal haplotypes, was ~5%, ~18%, and ~43% for P1T1, P1T2, and P2T3, respectively. Because the array has lower SNP density on chromosome X, larger bins were required for that chromosome. Across the entire genome, the over-represented maternal haplotype was apparent and corresponded to the maternal haplotype transmitted to the fetus (Figure 2). Taking into account the uncertainty surrounding cross-over regions (median ~350–450 kb per cross-over, Figure S5), maternal inheritance of at least 99% of the SNPs could be deduced with at least 99.8% accuracy for all samples. Lower sequencing depth also sufficed to deduce the inherited maternal haplotypes (Figure S6), albeit with lower resolution of cross-overs (Figure S5).
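The binning procedure above can be sketched as a sliding-window comparison of haplotype-specific counts. The 3 Mb default window falls within the reported 2.5–3.5 Mb autosomal range, and the 100 kb step is taken from the text. The Poisson z-score rule with z = 3 and the function names are hypothetical, because the exact decision rule is not given here. A simple fetal-fraction estimator is also included. It uses the 1 : (1 - ε) ratio of transmitted to untransmitted maternal haplotype counts, which is one way of "comparing relative representation" under the copy-number model above.

```python
import bisect
import math
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def sliding_window_calls(
    positions: Sequence[int],
    counts_a: Sequence[int],
    counts_b: Sequence[int],
    window: int = 3_000_000,
    step: int = 100_000,
    z: float = 3.0,
) -> List[Tuple[int, int, Optional[str]]]:
    """Call the over-represented maternal haplotype in sliding windows.

    positions must be sorted; counts_a[i] and counts_b[i] are the plasma read
    counts of the allele on maternal haplotype A and B at marker i.
    Returns (window_start, window_end, call) with call 'A', 'B' or None.
    """
    # Prefix sums give each window's total in O(1) after one pass.
    pre_a = [0, *accumulate(counts_a)]
    pre_b = [0, *accumulate(counts_b)]
    calls: List[Tuple[int, int, Optional[str]]] = []
    if not positions:
        return calls
    start = positions[0]
    while start <= positions[-1]:
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, start + window)
        a = pre_a[hi] - pre_a[lo]
        b = pre_b[hi] - pre_b[lo]
        call = None
        if a + b > 0:
            score = (a - b) / math.sqrt(a + b)  # Poisson z-score of the difference
            if score > z:
                call = "A"
            elif score < -z:
                call = "B"
        calls.append((start, start + window, call))
        start += step
    return calls


def estimate_fetal_fraction(transmitted: int, untransmitted: int) -> float:
    """Estimate eps from the 1 : (1 - eps) dosage of transmitted vs untransmitted
    maternal haplotypes, using counts pooled over confidently called regions."""
    if transmitted <= 0:
        raise ValueError("transmitted-haplotype count must be positive")
    return 1.0 - untransmitted / transmitted
```

Windows that straddle a cross-over mix counts from both haplotypes and tend to return `None`. That is one reason the regions around cross-overs carry extra uncertainty.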
The paternally inherited haplotypes were reconstructed by detecting paternal-specific alleles and then imputing alleles at linked positions. We used the population haplotypes documented by the 1000 Genomes Project [16] as reference haplotypes for imputation. Imputation accuracy depends on the density of markers, and the number of non-maternal alleles identified depends on the sequencing depth and the fetal DNA fraction. At the final sequencing depth, we detected ~66–70% of the paternal-specific alleles at least once (Table S2, Figure S7). Approximately 3.4–5.6% of the non-maternal alleles were sequencing noise. Using the non-maternal markers, we deduced ~70% of the paternally inherited haplotypes with ~94–97% accuracy via imputation (Figure 3). The loci that could not be confidently imputed fall into three kinds of region: those where paternal-specific alleles were not detected, those that lack paternal-specific alleles, and those where the observed paternal alleles are associated with more than one haplotype in the population. In principle, these regions could be fully determined by deeper sequencing and by applying the counting principle directly to the local regions, or to the individual alleles at every genomic position, as shown below.
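The sketch below illustrates the two steps above: detecting non-maternal alleles, then imputing linked positions from reference haplotypes. It is deliberately simplified. It keeps only reference haplotypes that carry every observed paternal-specific allele in a short region, and it imputes a position only when all of them agree. This is not the statistical imputation used in the study. Dedicated imputation tools use probabilistic models that tolerate recombination and genotyping error, and this sketch does neither. The `min_reads` noise filter, the data layouts and the function names are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def paternal_specific_alleles(
    maternal_genotypes: Dict[int, Tuple[str, str]],
    plasma_allele_counts: Dict[int, Dict[str, int]],
    min_reads: int = 2,
) -> Dict[int, str]:
    """Non-maternal alleles seen in plasma with at least min_reads reads.

    min_reads is a hypothetical noise filter; sites with more than one
    qualifying non-maternal allele are treated as ambiguous and skipped.
    """
    observed: Dict[int, str] = {}
    for pos, counts in plasma_allele_counts.items():
        maternal = maternal_genotypes.get(pos)
        if maternal is None:
            continue
        candidates = [
            allele for allele, n in counts.items()
            if allele not in maternal and n >= min_reads
        ]
        if len(candidates) == 1:
            observed[pos] = candidates[0]
    return observed


def impute_paternal_haplotype(
    reference_haplotypes: Sequence[Dict[int, str]],
    observed_paternal: Dict[int, str],
    target_positions: Sequence[int],
) -> Dict[int, Optional[str]]:
    """Impute target positions from reference haplotypes that carry every
    observed paternal-specific allele in the region.

    A position is left as None when no reference haplotype matches or when
    the matching haplotypes disagree there (more than one population haplotype).
    """
    matches = [
        hap for hap in reference_haplotypes
        if all(hap.get(pos) == allele for pos, allele in observed_paternal.items())
    ]
    imputed: Dict[int, Optional[str]] = {}
    for pos in target_positions:
        alleles = {hap.get(pos) for hap in matches}
        imputed[pos] = alleles.pop() if len(alleles) == 1 else None
    return imputed
```

The `None` results correspond to the failure modes listed above. When no paternal-specific allele is observed, nothing constrains the match. When several population haplotypes share the observed alleles, the matches disagree at the target position.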
Figure 3 Reconstruction of paternally inherited chromosomes noninvasively based on imputation using observed non-maternal alleles. The paternally inherited haplotypes were reconstructed by detection of paternal specific alleles, followed by imputation at linked …