In animals, maternal gene products deposited into eggs regulate embryonic development before activation of the zygotic genome1. In plants, an analogous period of prolonged maternal control over embryogenesis is thought to occur based on gene-expression studies2–6. However, other gene-expression studies and genetic analyses show that some transcripts must derive from the early zygotic genome7–14, implying that the prevailing model does not fully explain the nature of zygotic genome activation in plants. To determine the maternal, paternal and zygotic contributions to the early embryonic transcriptome, we sequenced the transcripts of hybrid embryos from crosses between two polymorphic inbred lines of Arabidopsis thaliana and used single-nucleotide polymorphisms (SNPs) diagnostic of each parental line to quantify parental contributions. Although some transcripts appeared to be either inherited from primarily one parent or transcribed from imprinted loci, the vast majority of transcripts were produced in near-equal amounts from both maternal and paternal alleles, even during the initial stages of embryogenesis. Results of reporter experiments and analyses of transcripts from genes that are not expressed in sperm and egg indicate early and widespread zygotic transcription. Thus, in contrast to early animal embryogenesis, early plant embryogenesis is mostly under zygotic control.
The prevailing model for the maternal-to-zygotic transition in plants proposes that most early embryonic mRNAs are maternally derived transcripts, resulting either from maternal inheritance or from higher transcriptional activity of maternally derived genes until the globular stages (in which the embryo proper has between ~32 to >100 cells)2,6,15. Because fundamental patterning events, including apical–basal and radial axis formation16,17, occur during the preglobular stages, this model implies that key cell-specification decisions are mostly under maternal control. However, this model is difficult to reconcile with other studies that report equivalent maternal and paternal expression of interrogated genes in preglobular stages8,9,11 and zygotic-recessive behavior of mutants with preglobular developmental phenotypes12–14.
To globally determine the origins of embryonic transcripts, we crossed polymorphic Col-0 and Cvi-0 Arabidopsis thaliana accessions and performed RNA-Seq on poly(A)+ RNA isolated from hybrid embryos with either one-to-two, eight, or ~32 cells in the embryo proper (hereafter referred to as 1-cell/2-cell, 8-cell and ~32-cell embryos). To control for inherent expression differences between Col-0 and Cvi-0 loci, the same procedure was performed using embryos derived from reciprocal crosses. Illumina sequencing of the six samples yielded 73,955,956 reads that both perfectly and uniquely matched the transcribed regions of 23,874 genes (Supplementary Table 1). Overall, transcript levels from the same stage but different reciprocal crosses were highly correlated (r ≥ 0.96; Supplementary Fig. 1). Both the sequencing depth and reproducibility indicated that our results would be informative for inferring the maternal, paternal and zygotic contributions to the early embryonic transcriptome.
The prevailing model for the maternal-to-zygotic transition predicted that at the early embryonic stages transcripts would derive primarily from maternal alleles, and then at the globular stage they would derive more evenly from both alleles because at this stage the zygotic genome would be active. In contrast to this expectation, we found equal amounts of paternally and maternally derived reads at all three stages, including the 1-cell/2-cell and 8-cell embryos (Fig. 1a). When examining transcripts with at least five reads overlapping SNPs in each cross, most were expressed equally from the maternal and paternal alleles, even at the earliest stage (Fig. 1b; Supplementary Dataset 1; note that in the Col-0 × Cvi-0 cross, the maternal parent is Col-0, as the convention is to write the maternal parent first). In one cross, small but discernable sub-populations of transcripts were derived predominantly from either maternal or paternal alleles. However, in the reciprocal cross those same transcripts tended to derive from the opposite parent, which showed that rather than arising from parent-of-origin effects, these sub-populations mostly arose from genotypic effects, i.e., from preferential expression from either the Col-0 or Cvi-0 alleles. Thus, when considering results of both crosses together, no overall maternal (or paternal) bias was observed, and at each stage the distribution of maternal-to-paternal ratios resembled that predicted for equal contribution from both alleles, with most of the variability explained by stochastic counting statistics, as modeled by the binomial distribution (Fig. 1b). The remainder of the variability was attributed to both additional noise (both experimental and biological) and a small subset of transcripts with parent-of-origin effects, for which the number that were maternally enriched approximately equaled the number that were paternally enriched.
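The binomial expectation invoked above can be illustrated with a minimal sketch. It assumes that, under equal allelic contribution, each SNP-overlapping read is maternal with probability 0.5, and it asks how often counting noise alone would make a transcript look at least four-fold biased toward either parent. The function names and the four-fold default are illustrative choices, not the authors' code.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k maternal reads among n SNP-overlapping reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_fold_bias(n_reads: int, fold: float = 4.0) -> float:
    """Probability that sampling noise alone yields a >= `fold` bias toward
    either parent, assuming equal contribution from both alleles (p = 0.5)."""
    total = 0.0
    for maternal in range(n_reads + 1):
        paternal = n_reads - maternal
        # Comparing counts directly avoids dividing by zero when one parent
        # contributes no reads at all.
        if maternal >= fold * paternal or paternal >= fold * maternal:
            total += binomial_pmf(maternal, n_reads)
    return total


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, prob_fold_bias(n))
```

With only five reads, a four-fold apparent bias arises by chance in a large fraction of transcripts, whereas with 100 reads it is very rare. This is why the spread of the observed ratios has to be judged against the read depth of each transcript.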
Because most transcripts were derived from the maternal and paternal genomes in near-equal amounts, we hypothesized that either equal amounts of each transcript were inherited upon fertilization or the zygotic genome was activated much earlier than previously proposed. Supporting the latter possibility, the fold-change distributions centered at 0.0 (log2), showing no hint of the maternal bias that would be expected if the egg, with its larger cytoplasm, contributed more RNA than the sperm (Fig. 1b). In addition, 543 transcripts had at least four-fold higher RPM (reads per million genome- and cDNA-matching reads) values at the 8-cell stage than at the 1-cell/2-cell stages, which suggested active transcription of the corresponding genes between the 1-cell/2-cell and 8-cell stages (since the alternative model of differential mRNA stability would require a large decrease in total mRNA between these two stages; Supplementary Dataset 1). Furthermore, 1,138 transcripts that were previously called undetectable in both egg and sperm microarray datasets18,19 were among the top 50% most abundant transcripts in our 1-cell/2-cell datasets (Supplementary Dataset 1). These results, taken together with the observation that transcripts for RNA Polymerase II subunits were among the most abundant in the early embryo (Supplementary Table 2), suggested that during the initial stages of embryogenesis many transcripts are transcribed from both maternal and paternal alleles, and that this transcription, combined with turnover of inherited transcripts, quickly overwrites most parent-of-origin biases present when the egg and sperm first fuse.
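The RPM comparison between stages can be sketched as follows. This is a minimal illustration, assuming per-transcript read counts are held in plain dictionaries and that a transcript absent at the earlier stage but present at the later one counts as increased. The function names, the dictionary layout and the handling of zero counts are assumptions, not details taken from the study.

```python
def rpm(read_count: int, total_matching_reads: int) -> float:
    """Reads per million genome- and cDNA-matching reads."""
    return read_count * 1_000_000 / total_matching_reads


def increased_transcripts(early, late, early_total, late_total, min_fold=4.0):
    """Return IDs of transcripts whose RPM at the later stage is at least
    `min_fold` times their RPM at the earlier stage.

    `early` and `late` map hypothetical transcript IDs to raw read counts;
    `early_total` and `late_total` are the library sizes used to normalize.
    """
    hits = []
    for transcript_id, late_count in late.items():
        late_rpm = rpm(late_count, late_total)
        early_rpm = rpm(early.get(transcript_id, 0), early_total)
        # Assumption: a transcript with no reads at the early stage and some
        # reads at the late stage is treated as increased.
        if late_rpm > 0 and late_rpm >= min_fold * early_rpm:
            hits.append(transcript_id)
    return sorted(hits)


if __name__ == "__main__":
    early = {"geneA": 10, "geneB": 10}
    late = {"geneA": 50, "geneB": 15, "geneC": 3}
    print(increased_transcripts(early, late, 1_000_000, 1_000_000))
```

A real analysis would probably also impose a minimum read count so that transcripts near the detection limit do not pass the cutoff by chance.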
To test directly whether maternally and paternally inherited genes are transcribed in very early embryos, we used the LhG4/pOp transactivation system20. In this system, one parent contained a transgene encoding the LhG4 transcription factor under the control of either the RPS5A or the UBI3 promoter (chosen because they generate products that can be found in the early embryo9,21), while the other parent harbored a nuclear-localized green fluorescent protein reporter transgene under the control of the artificial pOp promoter to which LhG4 binds and transcriptionally activates (pOp::GFP). Because one gamete contributed the reporter gene and the other gamete contributed its activator, any reporter expression dependent on the activator could not occur until after the zygotic genome was transcriptionally active. No GFP signal was observed in embryos carrying the pOp::GFP reporter gene but no activator, which confirmed the absence of leaky expression in either the gametes or embryo (Fig. 2). GFP signal was detected in zygotes within four to eight hours after fertilization from crosses that brought the reporter together with its activator, regardless of whether the reporter was inherited through the egg or sperm (Fig. 2). The GFP signal in the early zygote was not as strong as that in the endosperm, which might indicate a slight delay in activation of the genome of the very early zygote or might result from more robust transcription in the endosperm. Nonetheless, when considered together our results demonstrated that both maternally and paternally inherited chromosomes are transcriptionally active at least in 1-cell embryos and most likely before.
Having established that most early embryonic transcripts derived equally from the maternal and paternal genomes, we turned our attention to the few that might be preferentially inherited or preferentially expressed from the maternal or paternal alleles. For most early embryonic transcripts the genotype of the allele had a larger effect on transcript levels than did parent-of-origin (Figs. 1b and 3a). Indeed, hundreds of transcripts were preferentially expressed from either the Col-0 or Cvi-0 allele irrespective of their parent-of-origin (Supplementary Dataset 2), presumably as direct consequences of either DNA-sequence or epigenetic differences between the Col-0 and Cvi-0 alleles. This prevalence of genotypic/epigenotypic effects over parent-of-origin effects led to significant negative correlations between the maternal-to-paternal ratios from the two reciprocal crosses at each stage (Fig. 3a, Pearson’s r < −0.43, P < 10^−15). Nonetheless, at each stage a small subset of transcripts passed our cutoffs for classification as maternally or paternally enriched, i.e., four-fold maternally or paternally enriched in each cross, and not exceeding the false discovery rate (FDR) threshold of 0.05 (Pearson’s chi-square tests, Benjamini and Hochberg FDR). With these cutoffs, 77 and 45 transcripts were designated maternally and paternally enriched, respectively, in at least one stage (Fig. 3a; Supplementary Dataset 3). For five transcripts, an independent assay involving diagnostic cleavage of an amplified polymorphic sequence was performed, and for all five, the parent-of-origin effects were confirmed (Supplementary Fig. 2).
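The classification cutoffs described above can be sketched in a few lines. The text does not spell out how the chi-square tests were set up, so this sketch makes several assumptions: each cross is tested separately for goodness-of-fit to a 1:1 maternal-to-paternal ratio (one degree of freedom), all p-values are adjusted together with the Benjamini and Hochberg procedure, and a transcript must pass both the four-fold and the FDR cutoff in both reciprocal crosses. All function names and the input layout are hypothetical.

```python
from math import erfc, sqrt


def chi_square_1to1(maternal: int, paternal: int) -> float:
    """P-value of a Pearson chi-square goodness-of-fit test against 1:1."""
    n = maternal + paternal
    if n == 0:
        return 1.0  # no SNP-overlapping reads, hence no evidence of bias
    expected = n / 2
    stat = ((maternal - expected) ** 2 + (paternal - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(counts, fold=4.0, fdr=0.05):
    """counts maps a transcript ID to ((mat, pat), (mat, pat)): maternal and
    paternal SNP read counts from the two reciprocal crosses."""
    ids = list(counts)
    pvals = [chi_square_1to1(*counts[t][cross]) for t in ids for cross in (0, 1)]
    qvals = benjamini_hochberg(pvals)
    calls = {}
    for k, transcript_id in enumerate(ids):
        significant = qvals[2 * k] <= fdr and qvals[2 * k + 1] <= fdr
        pairs = counts[transcript_id]
        if significant and all(m >= fold * p for m, p in pairs):
            calls[transcript_id] = "maternal"
        elif significant and all(p >= fold * m for m, p in pairs):
            calls[transcript_id] = "paternal"
        else:
            calls[transcript_id] = "unclassified"
    return calls
```

Requiring the same direction of bias in both reciprocal crosses is what separates parent-of-origin effects from genotypic effects: a transcript favoring the Col-0 allele looks maternal in one cross and paternal in the other, so it fails the test.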
Although the maternal enrichment of some transcripts might be due to their transport from maternal sporophytic tissues, such a mechanism cannot explain paternal enrichment. Some transcripts had strong parent-of-origin effects in 1-cell/2-cell embryos, suggesting that they were preferentially inherited from one parent (Fig. 3b). Confirmation that these biases are indeed due to inheritance would substantially add to the single previously documented example of an inherited transcript in plants22. Other transcripts had stronger parent-of-origin effects at later stages, suggesting that they were preferentially expressed from either the maternally or the paternally inherited allele (Fig. 3b). This potential parent-of-origin–specific expression implied a form of imprinting in the Arabidopsis embryo. Genome-wide approaches similar to ours but looking much later after fertilization greatly expanded the list of genes with parent-of-origin–specific expression in the endosperm but did not identify such genes in embryos23,24. Thus, the imprinting-like phenomenon that we observed in early embryos is short-lived, which suggests that it differs from the more persistent imprinting previously characterized in either endosperm or mammals25. For example, it might involve alternative chromatin states inherited from the egg and sperm, which after several cell divisions equilibrate between the two alleles. Such a mechanism would not necessarily require DNA methylation, although we note that in the only previous report of imprinting in plant embryos (at the MEE1 locus of maize), methylation marks are lost from maternal alleles at the initial stages of embryogenesis, but then re-established to match the paternal alleles at later stages26.
Our results showing that during the initial stages of Arabidopsis embryogenesis both the maternal and paternal genomes are active and make essentially equivalent contributions to the embryonic transcriptome are in stark contrast to a recent report that ~88% of the 2-cell/4-cell embryonic transcriptome is derived from the maternal genome2. That study, published while our manuscript was in preparation, used SNPs in the transcriptomes of hybrid embryos derived from crosses between the Ler-1 (maternal parent) and Col-0 (paternal parent) accessions to estimate maternal and paternal genomic contributions. Because results from reciprocal crosses were not reported, the previous study could not distinguish parent-of-origin effects from genotypic effects. However, because an analysis unable to account for genotypic effects would be expected to misidentify transcripts as paternally derived as frequently as it misidentifies them as maternally derived, we considered other possibilities for the discrepancy between their results and ours. Our pilot studies had shown that early embryos must be extensively washed to prevent seed-coat RNA contamination, and indeed we found evidence that their embryo RNA samples contained large amounts of seed-coat mRNA (Supplementary Fig. 3). Because the seed coat is a maternal tissue, this contamination explained why they observed such a large bias in maternal RNAs.
Genes zygotically required for the initial zygotic division have been identified in Arabidopsis9,12,13. Moreover, paternal gene expression has been detected for 24 endogenous genes and a transgene in maize zygotes8,27, and de novo transcription has been demonstrated recently in tobacco zygotes28. Rather than interpreting these findings as exceptions to the model or as differences between species, our transcriptome-wide analyses and reporter data strongly support the proposal of an alternative model for the maternal-to-zygotic (or, to put it more precisely, the maternal/paternal-to-zygotic) transition in plants. In this model, both maternal and paternal gene products are inherited in the zygote upon fertilization and contribute to the earliest stages of embryogenesis. Within the first few hours after fertilization, the zygotic genome is activated, and equal transcription of both maternal and paternal alleles generates most of the early embryonic transcriptome at the 1-cell stage or before. Our model indicates that although the maternal, paternal and zygotic gene products control early-zygote development, the zygotic products are the ones that primarily control late-zygote development and the fundamental cell-specification events of the subsequent preglobular stages.
Col-0 and Cvi-0 seed stocks were obtained from the Arabidopsis Biological Resource Center. Plants were grown at 22 °C in a Conviron growth chamber with a 16-h light/8-h dark cycle. Flowers were emasculated one day before crossing, and pools of twenty 1-cell/2-cell, 8-cell and ~32-cell embryos were hand-dissected approximately 40, 64 and 78 hours after pollination, respectively.
Embryo dissections, RNA isolation, linear amplification of poly(A) RNA and strand-specific RNA-Seq were as described21, except that embryos were extensively washed prior to RNA isolation. After removing adaptor sequences, reads were mapped to both the A. thaliana genome and transcript models (Col-0 genome reference, TAIR10) with the Bowtie short-read aligner29. Reads were also mapped to a ‘pseudo’ Cvi-0 genome and Cvi-0 transcript models, in which SNPs in the Col-0 genome and transcript models were replaced with Cvi-0 variants (ftp://ftp.arabidopsis.org/home/tair/Sequences/Ecker_Cvi_snps.txt) generated by the 1,001 Genomes Project. Reads that both perfectly and uniquely matched either the genomes or transcript models were retained, and those overlapping transcribed regions of annotated genes were evaluated for overlap with SNPs. To normalize for differences in library sizes, the numbers of reads representing each transcript were divided by the total number of reads matching the genome and transcript models. SNP-overlapping reads were assigned to one of the parental genomes and tallied for each transcript, combining tallies for multiple SNPs within the same transcript.
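The final tallying step can be sketched as below. This is a simplified illustration, not the pipeline used in the study: it assumes that alignment has already been done and that each SNP-overlapping read is reduced to a tuple of transcript ID, SNP position and observed base. The SNP table layout, the identifiers and the function name are all hypothetical.

```python
from collections import defaultdict


def tally_parental_reads(read_snp_overlaps, snps, maternal_accession="Col-0"):
    """Assign SNP-overlapping reads to a parent and tally them per transcript.

    read_snp_overlaps: iterable of (transcript_id, position, base) tuples.
    snps: dict mapping (transcript_id, position) -> (Col-0 base, Cvi-0 base).
    maternal_accession: accession used as the female parent in this cross.
    """
    tallies = defaultdict(lambda: {"maternal": 0, "paternal": 0})
    for transcript_id, position, base in read_snp_overlaps:
        alleles = snps.get((transcript_id, position))
        if alleles is None:
            continue  # position is not a known SNP
        col_base, cvi_base = alleles
        if base == col_base:
            accession = "Col-0"
        elif base == cvi_base:
            accession = "Cvi-0"
        else:
            continue  # matches neither parental allele; skipped in this sketch
        role = "maternal" if accession == maternal_accession else "paternal"
        # Tallies for multiple SNPs in one transcript are combined here.
        tallies[transcript_id][role] += 1
    return dict(tallies)


if __name__ == "__main__":
    snps = {("geneA", 120): ("A", "G"), ("geneA", 340): ("C", "T")}
    overlaps = [("geneA", 120, "A"), ("geneA", 120, "G"), ("geneA", 340, "T")]
    print(tally_parental_reads(overlaps, snps, maternal_accession="Col-0"))
    print(tally_parental_reads(overlaps, snps, maternal_accession="Cvi-0"))
```

Running the same tally with the maternal accession switched gives the counts for the reciprocal cross, which is the comparison needed to separate genotypic effects from parent-of-origin effects.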
Statistical analyses were performed and associated graphics were generated with the R statistical computing base package30. The SciPy tools for Python were used to calculate chi-square test statistics and associated probabilities.
The pBIN+LhG4-GW vector was generated by digesting both the pBIN+LhG4 vector and the attR1/attR2-containing vector GWRFa::pET42a with KpnI and ligating together the appropriate digestion fragments. RPS5A and UBI3 promoter fragments were amplified with appropriate primer pairs, cloned into pENTR-D/TOPO (Invitrogen) and then recombined with pBIN+LhG4-GW to generate pRPS5A::LhG4-GW and pUBI3::LhG4-GW, respectively. The pV-TOP-GFP vector was generated by double-digesting pV-TOP(E3) and NLSGFPF1R1::pCR8 with HindIII and PmeI, and ligating together the appropriate digestion fragments to replace the GUS reporter with a nuclear-localized GFP reporter.
NLSGFPF1R1::pCR8 was created by amplifying pCGTAG with appropriate primers and cloning into pCR8/GW-TOPO (Invitrogen). pOp::GFP was then generated by digesting pV-TOP-GFP and GWF1R3::pENTR with SalI, and ligating together the appropriate digestion fragments. GWF1R3::pENTR was created by amplifying the attR1/attR2 Gateway cassette from GWRFa::pET42a with a series of overlap extension PCRs to remove internal SalI sites and add SalI sites on the ends of amplicons, which were then cloned into pENTR/D-TOPO (Invitrogen). Oligonucleotides used to generate the above constructs are listed in Supplementary Table 3. All constructs were transformed into Col-0 by Agrobacterium-mediated transformation. Because the progeny of T1 lines gave more robust GFP signal than did progeny of established lines, T1 lines were used for all reporter crosses. Embryos from crosses using at least thirteen pOp::GFP, four pRPS5A::LhG4 and six pUBI3::LhG4 independent T1 lines were examined.
For confocal scanning laser microscopy, developing seeds were mounted in 50 mM potassium phosphate buffer, pH 7.2, with 5% glycerol. A 488 nm laser on a Zeiss LSM 510 confocal microscope was used to excite GFP and seed autofluorescence, and images were collected at 505–530 nm and 644–719 nm, respectively.
Transcripts containing a SNP that created or disrupted a restriction site in the corresponding cDNA were selected for analysis. Random hexamers (Invitrogen) were used for reverse transcription with Superscript III (Invitrogen) and primers flanking the SNPs were used for amplification (Supplementary Table 3). The amplified DNA was digested with the appropriate restriction enzyme (New England Biolabs; Supplementary Table 4). Digestion products were resolved on 2% agarose gels stained with ethidium bromide, and bands were quantified with Quantity One 1-D analysis software (Bio-Rad).
We thank Joe Ecker and the 1,001 Genomes Project for generating the list of Cvi-0 SNPs, the John Harada and Robert Goldberg labs for making the Arabidopsis seed development LCM microarray datasets publicly available, Ian Moore for the transactivation vectors, Jeff Long for the UBI3 promoter fragment, the Whitehead Genome Technology Core for sequencing, and David Meinke for providing a curated list of preglobular zygotic-recessive mutants prior to publication. This work utilized the W.M. Keck Biological Imaging Facility at the Whitehead Institute and was supported by NIH grant GM067031 (D.P.B.) and NIH Postdoctoral Fellowship GM084656 (M.D.N.). D.P.B. is an Investigator of the Howard Hughes Medical Institute.
Contributions
M.D.N. designed and performed the experiments. M.D.N. and D.P.B. interpreted the results and wrote the manuscript.
Raw and processed mRNA-Seq datasets have been deposited into NCBI GEO (GSE33713) available at http://www.ncbi.nlm.nih.gov/geo/.
Competing financial interests
The authors declare no competing financial interests.
The file contains Supplementary Tables 1–5 and Supplementary Figures 1–3. Supplementary Datasets 1–3 are provided as separate files.