Identification and analysis of genes encoding P4H in maize
By querying DNA databases and performing bioinformatic analyses, nine non-redundant putative maize P4H genes were obtained and were named according to their order on the chromosomes. The proteins encoded by the diverse transcripts contained 298–308 amino acids and highly conserved domains, such as the two histidine residues and one aspartate residue (including a His-X-Asp motif) that bind the Fe2+ atom at the catalytic site, and a lysine residue that binds the C-5 carboxyl group of 2-oxoglutarate (Fig. 1). Interestingly, there were fewer P4H genes in maize than in arabidopsis (Vlad et al., 2007), even though the maize genome, at 2·3 Gb (Schnable et al., 2009), is much larger than that of arabidopsis.
Fig. 1. Multiple alignments of zmP4H proteins obtained with ClustalX. Gaps were introduced for optimal alignment. The three Fe2+-binding amino acids, two histidines and one aspartate, the lysine that binds the C-5 carboxyl group of 2-oxoglutarate and the serine ...
The predicted exon/intron structures were determined by comparing the coding regions of all the zmP4H genes with the B73 genomic sequence. All the coding sequences of the zmP4H genes were disrupted by introns, with the number varying from four to seven (Fig. 2). The smallest intron was only 45 bp and, surprisingly, the largest intron, found in zmP4H2, was 18·8 kb long and appeared to contain a transposon insertion. The lengths of the genomic sequences of these genes ranged from 2 to 20·6 kb, and the large intron accounts for the length of the longest gene, zmP4H2.
Fig. 2. Gene structures of zmP4H genes. Introns and exons are as indicated. The numbers above the introns indicate the intron lengths.
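The way intron lengths follow from exon positions on the genomic sequence can be illustrated with a minimal sketch. The coordinates below are hypothetical and are not taken from any zmP4H gene; the function name is ours, and exon positions are assumed to be 1-based and inclusive.

```python
def intron_lengths(exons):
    """Return intron lengths for exons given as 1-based inclusive (start, end) pairs."""
    ordered = sorted(exons)
    # An intron spans the bases strictly between one exon's end and the next exon's start.
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Hypothetical exon coordinates on a genomic sequence (illustration only).
toy_exons = [(1, 120), (166, 300), (1101, 1250)]
print(intron_lengths(toy_exons))
```

Summing the exon and intron lengths gives the genomic span of the gene, which is how one very long intron can dominate the length of a gene.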
The zmP4H polypeptides share 67–100 % identity, indicating high conservation. This contrasts with the highly divergent arabidopsis proteins, which range from 18 to 82 % identity (Vlad et al., 2007). To evaluate the evolutionary relationships among the genes, phylogenetic trees were constructed with the deduced amino acid sequences of the zmP4H proteins (Fig. 3). Based on the phylogenetic analysis, the zmP4H proteins can be grouped into three classes, A, B and C, comprising five, three and one member, respectively. The mammalian HIF-P4Hs, as expected, clustered in a group distinct from the plant P4Hs.
Fig. 3. Neighbor–Joining phylogenetic tree of zmP4H members with deduced P4H amino acid sequences from arabidopsis and humans. The unrooted tree was generated using the MEGA4·0 program by the Neighbor–Joining method. Bootstrap values are ...
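A pairwise identity value of the kind quoted above can be computed from two aligned sequences as in the following minimal sketch. It assumes that identity is the fraction of matching residues among columns where neither sequence has a gap; other definitions exist, and the text does not state which one was used. The sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (not zmP4H sequences).
print(round(percent_identity("MKT-LHV", "MRTALHV"), 1))
```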
To investigate further the relationship between the genetic divergence within the zmP4H family and gene duplication in maize, the chromosomal location of each zmP4H gene was determined. The results showed that the zmP4H genes were located on five chromosomes (Fig. 4): three on chromosome 5 (zmP4H6, zmP4H7 and zmP4H8), three on chromosome 1 (zmP4H1, zmP4H2 and zmP4H3) and one each on chromosomes 2 (zmP4H4), 4 (zmP4H5) and 6 (zmP4H9).
Fig. 4. Chromosomal distribution of zmP4H genes in maize. The chromosome number is indicated at the top of each chromosome.
AS events in the zmP4H gene family
From the large number of sequences collected from public databases, it was noted that four members of the P4H family showed AS, i.e. zmP4H2, zmP4H4, zmP4H6 and zmP4H8 (Table 1). We then investigated whether AS events exist in the waterlogging-tolerant inbred line HZ32. To confirm the existence of different transcripts derived from a single gene, RT–PCR was performed with primers located in the first and the last exons, which can reveal transcripts generated by AS in HZ32. The results showed that some zmP4H genes produced more isoforms than those predicted from EST alignments, suggesting that the limited number of available ESTs/cDNAs did not capture all the AS events for these genes. The abundance of the AS transcripts varied widely. Although the AS transcripts were successfully cloned from HZ32, most of them were not visible, because of their low abundance, when the RT–PCR products were checked on a 2 % agarose gel containing ethidium bromide. Taking zmP4H8 as an example, one pair of primers was designed by aligning the zmP4H8 transcript from GenBank with the B73 genome sequence, with the forward primer located in the first exon and the reverse primer in the last exon. Sequencing of clones from the PCR products showed that zmP4H8 and zmP4H8-1 were not obtained from HZ32, whereas four other variants were identified, i.e. zmP4H8-2, zmP4H8-3, zmP4H8-4 and zmP4H8-5. When the RT–PCR products were checked on the agarose gel (Supplementary Data Fig. S1), only zmP4H8-4 (1100 bp) and zmP4H8-5 (1219 bp) were visible, while zmP4H8-2 (600 bp) and zmP4H8-3 (750 bp) were not, even though up to 40 cycles of PCR were performed and non-specific bands were obtained.
Table 1. The sequence resources of zmP4H genes used to analyse the AS events
In total, combining the results of amplification in HZ32 and public data from GenBank, we discovered 19 alternatively spliced zmP4H transcripts, representing two, three, three, five and six transcripts for zmP4H2, zmP4H4, zmP4H5, zmP4H6 and zmP4H8, respectively (Fig. 5); the sources of these transcripts are listed in Table 1. The zmP4H sequences were deposited in GenBank and were assigned the accession numbers shown in Supplementary Data Table S4. The following analyses of the AS transcripts were based on the data from both GenBank and the amplification from HZ32 in this study.
Fig. 5. AS forms of zmP4H and their exon/intron composition. ZmP4H cDNAs are aligned with the genomic zmP4H DNA. Exons and introns are indicated by boxes and lines, respectively. The arrowheads represent the stop codons.
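The total of 19 transcripts can be checked against the variant names reported for each gene in this section. The short tally below uses only those names; the dictionary layout and variable names are ours.

```python
# Transcript names as reported in the text for each gene showing AS.
variants = {
    "zmP4H2": ["zmP4H2", "zmP4H2-1"],
    "zmP4H4": ["zmP4H4", "zmP4H4-1", "zmP4H4-2"],
    "zmP4H5": ["zmP4H5", "zmP4H5-1", "zmP4H5-2"],
    "zmP4H6": ["zmP4H6", "zmP4H6-1", "zmP4H6-2", "zmP4H6-3", "zmP4H6-4"],
    "zmP4H8": ["zmP4H8", "zmP4H8-1", "zmP4H8-2", "zmP4H8-3", "zmP4H8-4", "zmP4H8-5"],
}

# Number of transcripts per gene and the overall total.
counts = {gene: len(names) for gene, names in variants.items()}
total = sum(counts.values())
print(counts, total)
```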
Sequencing of the RT–PCR products revealed two transcripts for zmP4H2 in HZ32, while there was only one transcript in GenBank. Compared with the normal transcript of zmP4H2, zmP4H2-1 retained an extra 135 bp fragment (Fig. 6A). The resulting change in the reading frame introduces a premature stop codon (Fig. 6B) and the deduced protein lacks 95 amino acids, resulting in the loss of the functional domain.
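How a retained fragment can truncate the deduced protein is illustrated by the toy translation below. This is a minimal sketch using the standard genetic code; the sequences are hypothetical and much shorter than zmP4H2, and the retained fragment is simply assumed to carry a stop codon in the reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon reached
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequences for illustration only.
normal_cds = "ATGGCTCATGTTGATAAATGA"
retained_fragment = "GGTTAAGTC"  # assumed to contain an in-frame TAA
variant_cds = normal_cds[:9] + retained_fragment + normal_cds[9:]

print(translate(normal_cds), translate(variant_cds))
```

The variant transcript is longer than the normal one, yet its deduced protein is shorter, which is the pattern described for the intron-retaining transcripts in this section.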
Fig. 6. Identification of two AS transcripts from the zmP4H2 gene in HZ32. (A) Schematic diagram showing zmP4H2 pre-mRNA and splicing patterns that produce zmP4H2 and zmP4H2-1 transcripts. Also shown are the primer-binding sites for PCR. (B) Comparison of zmP4H2 ...
zmP4H4
RT–PCR in HZ32 indicated that there was only one mature transcript for zmP4H4, rather than the three variants identified in B73 (Table 1). The protein deduced from zmP4H4, the only mature transcript identified in HZ32, was a normal P4H protein with a typical domain structure. The other two variants were interrupted by unspliced introns: zmP4H4-2 retained intron 5, and zmP4H4-1 retained introns 1, 4 and 5. Although intron retention lengthened the transcripts of zmP4H4-1 and zmP4H4-2, the transcripts contained premature termination codons, producing proteins of 213 and 67 amino acids, respectively.
Only one zmP4H5 transcript, identified from hybrid 35A19, is deposited in GenBank; however, two other variants, zmP4H5-1 and zmP4H5-2, were identified in HZ32. Both zmP4H5-1 and zmP4H5-2 have a non-canonical 5′ splice site shifted 256 bp upstream of the normal site in exon 1 and a non-canonical 3′ splice site shifted 24 bp downstream in exon 2. zmP4H5-2 also lacks exon 3 compared with zmP4H5-1. The truncated protein products of zmP4H5-1 and zmP4H5-2 were only 21 amino acids long because of the alternative donor site in exon 1.
There are two variants of zmP4H6 in public databases from different lines, zmP4H6 from hybrid 35A19 and zmP4H6-1 from B73. zmP4H6-1 retained intron 3 compared with zmP4H6. In HZ32, four variants were identified, i.e. zmP4H6 and three other variants: zmP4H6-2, zmP4H6-3 and zmP4H6-4. zmP4H6-2 has a 3′ AS site at intron 4, 101 bp downstream of the normal splice site. Intron 6 was retained in zmP4H6-3. zmP4H6-4 showed a complicated AS pattern, involving skipping of exon 6, 5′ AS shifted 45 bp upstream in exon 5 and 3′ AS shifted 53 bp downstream in exon 6. The truncated protein product of zmP4H6-2, resulting from the alternative acceptor site, lacked 135 amino acids. Although intron retention lengthened the transcripts of zmP4H6-1 and zmP4H6-3, the transcripts contained premature termination codons and produced proteins of 175 and 89 amino acids, respectively. zmP4H6-4 encoded a protein of 146 amino acids, which was 152 amino acids shorter than that of zmP4H6.
Six variants were identified for zmP4H8 based on the data from GenBank and the amplification from HZ32. The two sequences from the public databases were identified in hybrid 35A19 (zmP4H8) and in B73 (zmP4H8-1); however, neither of them could be amplified from HZ32. In contrast, four other transcripts were obtained from HZ32 (zmP4H8-2, zmP4H8-3, zmP4H8-4 and zmP4H8-5), which have not been previously deposited in public databases. Compared with zmP4H8, zmP4H8-1 retained intron 7 and had a non-canonical 5′ splice site shifted 256 bp downstream of the normal site in exon 6. zmP4H8-2 was produced by complicated AS, involving skipping of exons 4, 5 and 6, retention of intron 7, 5′ AS 34 bp upstream in exon 3, and 3′ AS 35 bp downstream in exon 7. In zmP4H8-3, exon 4 was skipped, intron 7 was retained, 5′ AS was shifted 112 bp upstream in exon 3, and 3′ AS was shifted 111 bp downstream in exon 5. Compared with zmP4H8, zmP4H8-4 was produced by retention of intron 7. zmP4H8-5 was produced by retention of introns 6 and 7. The deduced proteins of only three transcripts, zmP4H8-1, zmP4H8-4 and zmP4H8-5, were unaffected by AS: the same amino acid sequences as the reference protein were predicted because the AS involved only the 3′-untranslated region of these transcripts. The variants zmP4H8-2 and zmP4H8-3 were produced by complex AS events, including all four types of AS events. The protein sequence deduced from zmP4H8-2 was only 185 amino acids long, lacking 123 amino acids. The transcript of zmP4H8-3 encoded a protein lacking 119 amino acids.
Splice site strength
In this study, we analysed sequence elements located at the 5′ and 3′ splice sites in all the zmP4H transcripts (Supplementary Data Table S4). Among the 122 splice sites analysed, eight were unconventional. In total, 106 of the 122 junction sites contained short direct nucleotide repeats (Table 2). The small repeats existed at the junction sites of the normal transcripts, which were all of the GT–AG type (Supplementary Data Fig. S2). However, for six of the eight unconventional sites, splicing occurred within the short direct repeats, so the exact splice sites could not be identified; all the possibilities were deduced (Fig. 7).
Table 2. Small nucleotide repeats at exon/intron junctions
Fig. 7. All the possible junction sites deduced at the non-canonical splice sites. Highlighted sequences are located in the 5′ exon and 3′ exon. The sequences in black and underlined are the small direct nucleotide repeats.
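Why a short direct repeat makes the exact splice site ambiguous can be shown with a minimal sketch. When the same few nucleotides occur at both boundaries of the removed segment, several different cut positions give an identical spliced product, and only the GT–AG rule can single one out. The toy locus and function names below are hypothetical and are not taken from the zmP4H data; coordinates are 0-based and half-open.

```python
def equivalent_junctions(genome, start, end):
    """List every (start, end) cut of the same length that gives the same spliced product.

    The segment genome[start:end] is the one removed by splicing.
    """
    length = end - start
    spliced = genome[:start] + genome[end:]
    return [(s, s + length)
            for s in range(len(genome) - length + 1)
            if genome[:s] + genome[s + length:] == spliced]


def is_gt_ag(genome, start, end):
    """True if the removed segment begins with GT and ends with AG."""
    return genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"


# Hypothetical toy locus: the 4-nt repeat CCAG ends both exon 1 and the intron.
toy_genome = "ATGCCAG" + "GTAAGTTTTCCAG" + "CTTGA"
candidates = equivalent_junctions(toy_genome, 7, 20)
canonical = [cut for cut in candidates if is_gt_ag(toy_genome, *cut)]
print(candidates, canonical)
```

In this toy case one of the equivalent cuts obeys the GT–AG rule, so the junction can be assigned. When none of the candidates is of the GT–AG type, as at an unconventional site, every candidate remains possible and all of them have to be listed.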
Changes in mRNA levels of zmP4H under waterlogging
We first examined the effect of waterlogging on the expression of those genes without AS. Only one transcript could be amplified from each of zmP4H1, zmP4H3, zmP4H4, zmP4H7 and zmP4H9 in HZ32, under both control and waterlogging conditions. As indicated by RT–PCR (Fig. 8), there was significant induction in the transcript level of a marker gene for waterlogging, Adh1, indicating that our experimental conditions were valid. The differences in expression of the zmP4H genes between the control and waterlogging conditions were not significant, except for zmP4H4. Very low expression was detected for zmP4H4 under control conditions, and zmP4H4 was significantly induced by waterlogging (Fig. 8). This analysis revealed that zmP4H1, zmP4H3, zmP4H7 and zmP4H9 did not alter their expression level in response to waterlogging (Fig. 8). To verify the RT–PCR results for these five zmP4H genes, real-time PCR was carried out in an independent assay. actin-1 was used as a housekeeping gene for the normalization of the expression levels of the zmP4H genes. Real-time PCR confirmed the RT–PCR expression pattern for all five zmP4H genes (Table 3).
Fig. 8. The expression pattern of zmP4H genes in HZ32. RT–PCR analysis was performed with control and treated roots of three-leaf seedlings. actin-1 and γ-tubulin were used as an internal control for all the following RT–PCR experiments. ...
Table 3. Relative expression profiles of zmP4H genes analysed by real-time PCR in the roots of seedlings of HZ32
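Normalization of real-time PCR data to a housekeeping gene such as actin-1 is often done with the 2^-ΔΔCt calculation, sketched below. This section does not state which quantification method was used, so the method is an assumption here; it also assumes roughly 100 % amplification efficiency. The Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt calculation."""
    # Normalize the target to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier after treatment.
fold = relative_expression(24.0, 18.0, 27.0, 18.0)
print(fold)
```

A value above 1 indicates induction under treatment, a value below 1 indicates reduction, and a value near 1 indicates no change.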
RT–PCR of the AS transcripts of the zmP4H genes was complicated. Most of the AS transcripts were expressed at a very low level under both control and waterlogging conditions, which made them difficult to visualize when the RT–PCR products were checked on an agarose gel with ethidium bromide, even though they could be cloned. Real-time PCR would be a better approach for analysing changes in the genes that undergo AS; however, there were two impediments: (1) the homology between the members of this gene family was very high, with 67–100 % identity; and (2) each gene generated more than one transcript by AS, and these were identical to the normal transcript over most of the CDS, making it difficult to design primers specific to each transcript. Hence, we performed RT–PCR only for those transcripts for which it was possible.
We first examined the effect of waterlogging stress on the expression of zmP4H2 in HZ32 (Fig. 9). The first set of primers, P2-NM-RT-1 and -2, was specific for zmP4H2 because P2-NM-RT-2 bridged exon 6 and exon 7; with these primers, zmP4H2 was found to decrease under waterlogging. The second set of primers, P2-AS-RT-1 and -2, could amplify only zmP4H2-1, and the results indicated that zmP4H2-1 also decreased under waterlogging. Furthermore, another pair of primers, P2-5 and -3, located in exon 1 and exon 7, was designed to amplify both zmP4H2 and zmP4H2-1. Both transcripts were detectable under control and waterlogging conditions. Compared with the control, the zmP4H2/zmP4H2-1 ratio was reduced under waterlogging stress.
Fig. 9. The expression patterns of variants of zmP4H2 genes in HZ32. RT–PCR analysis was performed with control and treated roots of three-leaf seedlings. Lanes 1 and 2: fragments amplified by primers for both zmP4H2 and zmP4H2-1 for 34 cycles. Lanes ...
zmP4H5 was expressed at a very low level: real-time PCR amplifying all the variants of zmP4H5 together indicated that its abundance was approx. 25-fold lower than that of actin under both control and waterlogging conditions. Because of this low expression, RT–PCR was not suitable for studying the expression pattern of zmP4H5 under waterlogging. Real-time PCR was also difficult for this gene because no primers could be designed that distinguished the variants from one another or that were specific to the functional form.
Only one transcript, zmP4H6 (600 bp), was visible in the control using primers designed to amplify all four variants of zmP4H6 (Fig. 10). zmP4H6 was reduced under waterlogging (Fig. 10). Under control conditions, there was only one visible band representing zmP4H8-4 (approx. 900 bp), while under waterlogging, as shown in Fig. 10, zmP4H8-4 was induced and another band could be clearly seen, representing zmP4H8-5 (approx. 1·0 kb), which indicated that zmP4H8-5 was induced in response to waterlogging.
Fig. 10. The expression pattern of zmP4H6 and zmP4H8 genes in HZ32. RT–PCR analysis was performed with control and treated roots of three-leaf seedlings. actin-1 was used as an internal control. Adh1 was used as the marker gene for waterlogging. M: DL ...