Identification of Maize Genes Encoding Canonical C-terminal SUN-Domain (CCSD) Proteins
A reference genome sequence was recently produced for the inbred line B73 (B73 RefGen_v1 [53]). The SUN genes described here refer to B73 sequences where possible, although many of the public cDNA and EST sequences in GenBank are from multiple other inbred lines of maize. We identified SUN-domain protein genes in a model plant genetic system by using a BLAST homology search of the maize genome queried with a fungal SUN-domain protein Sad1p, from S. pombe
]. The two different putative maize SUN-domain protein genes we initially identified, ZmSUN1 and ZmSUN2, were each predicted to encode ~50-kDa proteins. When the predicted protein sequences were used to query the Conserved Domain Database (version 2.21, NCBI), each revealed the presence of a single conserved domain, the SUN/Sad1_UNC superfamily (pfam07738), near the C-terminus of the proteins. These maize genes are homologous to recently characterized plant SUN-domain protein genes from Arabidopsis
]) and rice (OsSad1
]). Experimental evidence from heterologous expression assays with fluorescent protein fusions indicates that these Arabidopsis and rice CCSD proteins are localized at the NE. The presence of a C-terminal SUN domain and the NE localization are among the defining features of animal and fungal SUN proteins [9]. Plant genomes therefore appear to encode canonical C-terminal SUN-domain (CCSD) type proteins, an observation that is not surprising given the conserved role of these proteins in basic eukaryotic processes such as meiosis, mitosis, and nuclear positioning [8
Discovery of Maize Genes Encoding PM3-type SUN-domain Proteins
Additional bioinformatic analyses revealed that the maize genome encodes not only CCSD-type SUN-domain proteins but also a unique family of SUN-domain protein genes not previously described. Members of this second group of genes (ZmSUN3, ZmSUN4, and ZmSUN5) encode slightly larger proteins with three transmembrane domains, a single SUN domain that is not at the C-terminus but rather in the middle of the protein, and a highly conserved domain of unknown function that we refer to as the PM3-associated domain (PAD). When used to query the Conserved Domain Database, these predicted proteins also revealed the presence of the SUN/Sad1_UNC superfamily, pfam07738. Homologous protein sequences with similar secondary structure and motif arrangement were found to be prevalent within plant genomes. We refer to this group, therefore, as the PM3-type (Plant-prevalent Mid-SUN 3 transmembrane) SUN-domain proteins, as represented by the founding members ZmSUN3, ZmSUN4, and ZmSUN5. A summary of the five maize SUN-domain protein genes is provided in Table 1, and the properties and motifs of the CCSD and PM3 subfamilies of these proteins are summarized in Table 2.
Table 1 Maize genes encoding SUN-domain proteins.
Table 2 Properties and motifs of maize SUN-domain proteins.
Conservation of Two Classes of SUN-domain Proteins in Plants
We next carried out a phylogenetic analysis of CCSD and PM3-type SUN-domain protein sequences from maize, sorghum, rice, Arabidopsis, and moss (Physcomitrella patens). Protein sequence alignments were used to produce an unrooted phylogenetic tree, shown in Figure 1. From this tree, we observed two different types of groupings. The first, a clear separation of the CCSD (green shaded area, Figure 1) and PM3 (yellow shaded area, Figure 1) subfamilies, suggests an ancient divergence of these two classes. These data also suggest that the PM3 proteins originated early in the life of the plant kingdom, predating the origin of flowering plants. The second, four orthologous groups observed within the grass species (SUN Orthologous Grass Groups, labeled SOGG1-SOGG4 in Figure 1), may reflect functional divergence within each subfamily. If so, the members of each SOGG would be predicted to share expression patterns or genetic functions. Interestingly, the two plants outside the grass family, Arabidopsis and the nonvascular moss P. patens, also have genes predicted to encode at least two CCSD and at least two PM3 proteins, but their relationship to the SOGGs is not resolved by this phylogenetic analysis. Plant genomes therefore appear to encode two different multigene subfamilies of SUN-domain proteins, the CCSD and PM3 types.
Figure 1 Phylogenetic relationships among selected SUN-Domain proteins in the plant kingdom. An unrooted phylogenetic tree of SUN-domain proteins is shown, deduced from full-length cDNAs from maize (Zea mays, Zm), Arabidopsis (At), rice (Os), Sorghum bicolor (more ...)
Shared Gene Structures Reflect an Early Divergence of the Two Types of Maize SUN-domain Proteins
The 2.3-Gb maize genome is partitioned among 10 structurally diverse chromosomes, which are predicted to encode over 32,000 genes [53]. The genetic map of maize is subdivided into approximately 100 10- to 15-cM bins [56]. The genome is complex and dynamic because of extensive and recent large segmental duplications [53] and a major expansion of long terminal repeat sequences over the last few million years. Current breeding lines and natural accessions of maize harbor large amounts of sequence diversity and many structural polymorphisms [53
Using full-length cDNAs (listed in Table 1) together with the B73 reference genome, we were able to define the structures of five maize SUN-domain genes, as shown in Table 1 and Figure 2. Three of these genes (ZmSUN1, 2, and 3) are distributed as unlinked loci that map to two different chromosomes; ZmSUN4 and ZmSUN5 reside in adjacent genetic bins. In determining whether the CCSD or PM3 genes were located in any of the known blocks of genome duplication, we found that the high degree of sequence similarity between the SOGG3 genes ZmSUN3 and ZmSUN4 suggests they arose as part of a gene-duplication event that is known to have resulted in many closely related gene pairs in maize [56]. Indeed, these two genes reside within a large syntenic duplicated block on chromosomes 3 (bin 3.06) and 8 (bin 8.06). This observation is consistent with the phylogenetic results that revealed the presence of four orthologous SUN-domain protein groups, SOGG1 (ZmSUN1), SOGG2 (ZmSUN2), SOGG3 (ZmSUN3 and ZmSUN4), and SOGG4 (ZmSUN5). Surprisingly, we have not observed duplicate genes for ZmSUN1, ZmSUN2, or ZmSUN5, so these may exist as single copies in the B73 maize genome.
Figure 2 Genomic structures for the two subfamilies of maize SUN-domain protein genes. The locations of exons, start (ATG), and stop (TGA, TAA) codons are shown for each gene. The diagrams were drawn from predictions made by the SPIDEY program http://www.ncbi.nlm.nih.gov/spidey/ (more ...)
An analysis of intron and exon structures within the maize SUN genes showed that the gene structures are conserved within each class. The CCSD genes had two or three exons, and the SUN domain was split between exons. In contrast, the PM3 genes had four or five exons and a SUN domain that was encoded within the largest exon. Comparative analysis of the maize ZmSUN gene structures revealed that the CCSD genes shared an ancestral intron that interrupts the SUN domain (between K364 and V365 in the ORF of ZmSUN1 and between K338 and D339 in the ORF of ZmSUN2; Figure 2). This ancestral intron position may be a hallmark of this class of SUN genes, as it is also found in the Arabidopsis, rice, sorghum, and moss homologs. ZmSUN1 and ZmSUN2 share a large intron, greater than 3 kb in size, whereas the PM3 genes all possess small introns ranging from 19 to 483 nucleotides in size.
Properties of Maize SUN-domain Proteins
Using the full-length cDNAs listed in Table 1, we predicted the encoded proteins for five different maize SUN-domain proteins. Their features and primary motifs are summarized in Table 2 and diagrammed in Figure 3. A multiple sequence alignment of CCSD-type proteins reveals divergence at the N-terminal region and conservation at the C-terminal region, which encompasses the SUN domain (Additional file 1, Figure S1). Several previously characterized fungal and animal SUN-domain protein structures (Figure 3) are also shown for comparison. The SUN-domain proteins of human, mouse, worm, and fission yeast differ in size and in the number of transmembrane and coiled-coil motifs, but all have a single C-terminal SUN domain, considered a diagnostic feature for this family of NE-associated proteins. The plant proteins that most closely resemble the founding members of the SUN-domain protein family are those encoded by the CCSD genes. The plant CCSD proteins are remarkably conserved in size and overall structure, having one transmembrane domain followed by one coiled-coil domain and an overall size of about 50 kDa (Figure 3). Relatively little is known about the CCSD proteins in plants. Fluorescent protein fusion assays with AtSUN1, AtSUN2, and OsSad1 demonstrate localization to the NE [18]. In addition, the CCSD proteins probably share some functions with their animal counterparts, but this has not yet been demonstrated.
Figure 3 Conservation of functional domains in plant and animal SUN-domain proteins. Comparative diagrams of SUN-domain proteins depicting protein sizes and domain locations (see Table 2). The positions of transmembrane (red), coiled-coil (blue), SUN (yellow), (more ...)
Even less is known about the PM3 proteins, and their functions are completely uncharacterized. They are significantly larger than plant CCSD proteins (Figure 3). Their shared structural features are an N-terminal transmembrane domain, an internal SUN domain, a PAD, one or more predicted coiled-coil motifs, and two closely spaced C-terminal transmembrane domains (Table 2, Figure 3). This collection of features defines them structurally, but the central location of the SUN domain is not unique to plants. Other, nonplant mid-SUN-domain proteins, largely uncharacterized, from various species including fungi, flies, worms, and mammals can be identified by sequence-search analyses (data not shown). Whether or not these proteins reside or function in the NE remains to be determined.
In addition to their differences in size and SUN-domain location, these protein subfamilies are distinct in other interesting ways (Table 2). The CCSD-type proteins have a basic isoelectric point, whereas the PM3-type proteins have an acidic one (Table 2). In addition, the PM3 proteins have a relatively large number of cysteine residues that may play important roles in intra- or intermolecular disulfide bridge formation. Furthermore, a multiple sequence alignment reveals that the PM3 proteins all have the highly conserved region that we call the PAD (Figure 4; Additional file 2, Figure S2). This region of approximately 38 residues appears diagnostic for plant PM3 proteins and is located about 80-90 residues after the SUN domain. Alignment of the SUN domain and the PAD from 11 plant proteins revealed a high degree of amino acid conservation.
Figure 4 Multiple sequence alignment of PM3 SUN domains and PAD regions. Multiple sequence alignments from ClustalW2 for isolated domains of PM3 proteins from five plant species. Box shade alignment displays show conserved residues (identical black, similar grey) (more ...)
Despite the similarity of domain architecture and sequence similarity within conserved domains, the remainder of the protein regions exhibit considerable sequence divergence between the SOGG3 and SOGG4 members in any given species. Overall, these analyses show that the maize genome encodes at least two multigene families of SUN-domain proteins. Each of these two subfamilies comprises at least two genes. ZmSUN1 and ZmSUN2 are CCSD-type and are most closely related to plant SUN-domain homologs AtSUN1, AtSUN2, and OsSad1. ZmSUN3, 4, and 5 are PM3-type and probably represent a previously unknown class of SUN-related proteins in plants.
mRNA Expression Profiling of ZmSUN Protein Genes
The conservation of the SUN-domain protein genes in plants suggests that they potentially have functions similar to those of their animal counterparts, for example nuclear positioning and motility within the cell, bridging the cytoplasm to the cortical layer of the nucleoplasm, and contributing to meiotic chromosome segregation through telomere tethering before synapsis and recombination [8]. Maize SUN-domain genes that function in basic somatic cell processes such as mitosis, nuclear architecture, and chromosome tethering might be expected to show ubiquitous expression, whereas those that function in meiosis or pollen-nuclear migration or nuclear fusion at fertilization might show a more limited expression profile, being active in reproductive organs such as flowers, egg and pollen mother cells, and gametophytic tissues such as pollen grains. To begin to examine these possibilities, we looked at gene expression at the mRNA abundance level using three different sources of information: NCBI's UniGene; microarray expression data from anthers, which contain male meiotic cells; and Solexa transcriptome profiling data derived from maize inbred line B73 tissues.
Four of the five genes (all but ZmSUN3) are represented by consensus UniGene models in NCBI (Table 1), and three of these, ZmSUN1, ZmSUN2, and ZmSUN4, are accompanied by quantitative EST profile information expressed as transcripts per million, which we converted to transcripts per ten million (TPdM). The EST data were pooled according to tissue type, and only relatively deeply sequenced libraries (10,000-15,000 or more) showed evidence of expression, as summarized in Additional file 3, Figure S3. The CCSD genes, ZmSUN1 and ZmSUN2, appeared to be expressed at relatively low levels (200-2,000 TPdM) in several tissues, including ear, endosperm, embryo, meristem, pollen, and tassel. Only one PM3-type SUN-domain gene, ZmSUN4, currently has corresponding EST profile data available from NCBI. It too shows relatively low expression levels (~400-3,000 TPdM) in a variety of tissues, such as embryo, pericarp, and shoot. These values are roughly 10% of those for UniGene EST data from two control housekeeping genes, alpha tubulin 4 (tua4, Zm.87258) and cytoplasmic GAPDH (Zm.3765), which are expressed in 17 of the 19 tissues at levels from ~2,200 to 21,000 TPdM.
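The unit conversion and the comparison with the housekeeping controls described above amount to simple arithmetic, sketched here in Python. The function names are illustrative assumptions and are not part of the original analysis; the only numbers used are the upper ends of the ranges quoted in the text.

```python
def tpm_to_tpdm(tpm: float) -> float:
    """Convert transcripts per million (TPM) to transcripts per ten million (TPdM)."""
    return tpm * 10.0


def fraction_of_control(gene_tpdm: float, control_tpdm: float) -> float:
    """Express a gene's abundance as a fraction of a control gene's abundance."""
    if control_tpdm <= 0:
        raise ValueError("control abundance must be positive")
    return gene_tpdm / control_tpdm


# 20 TPM corresponds to 200 TPdM, the lower end of the CCSD range in the text.
ccsd_low_tpdm = tpm_to_tpdm(20)

# Upper end of the CCSD range (2,000 TPdM) against the upper end of the
# housekeeping range (21,000 TPdM), both taken from the text.
ratio = fraction_of_control(2000, 21000)
print(f"{ccsd_low_tpdm:.0f} TPdM; {ratio:.1%} of control")
```

With these inputs the ratio is a little under one tenth, in line with the "roughly 10%" statement.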
Given the role of SUN-domain proteins in meiotic telomere behavior in a variety of nonplant eukaryotic species, we next examined microarray data from mRNA expression profiles of male reproductive organs, specifically 1- to 2-mm anthers. Anthers in this size range are from tassels that had not yet emerged and contain meiocytes before or during meiotic prophase. Microarray probes (60-mer oligonucleotides, as described in [61]) that showed 100% match with our B73 gene models were available for each gene, and their relative expression values are plotted in Figure 5. From these analyses, we observed that the relative expression levels of ZmSUN5
were highest in meiosis-stage anthers, whereas ZmSUN1
were the lowest there, and ZmSUN4
was intermediate in the overall range (~80 to 3,000 TPdM).
Figure 5 Expression of ZmSUN genes in meiosis-stage anthers. Relative expression levels shown by maize SUN-domain protein genes obtained from published microarray experiments (Gene Expression Omnibus [73,79]). The cDNAs were from meiosis-stage anthers 1 mm, 1.5 (more ...)
Ascribing the meiotic telomere clustering functions to any one of the five SUN genes may prove difficult, at least partly because the anther is made up of several different cell types that include not only cells in meiosis but also a layer of epidermal, intermediate, and tapetal cells. The expression or function of plant SUN genes could be partitioned among these cell types, whereas these methods produced only a single value over the entire anther [61]. Another consideration is that even single cells may contain multiple SUN proteins with different, related, or even cooperative functions, such as NE rearrangements, interaction with nuclear pores, or paternal storage of gene products for postmeiotic functions such as pollen mitosis, pollen tube growth, nuclear migration, and fertilization.
Solexa Transcriptome Expression Profiling
Expression levels for the two Solexa-based sequencing-by-synthesis methods we used, Solexa dual-tag-based (STB) and Solexa whole transcriptome (SWT) (http://www.illumina.com/technology/sequencing_technology.ilmn), are also reported in transcripts per 10 million and were derived from experiments on pooled samples of six major tissues of the B73 cultivar. Both the Solexa technology and the EST UniGene data provide discrete counts of sequenced molecules, but the Solexa data are based on millions, not thousands, of reads per experiment, providing better representation of genes such as the ZmSUN genes that were expressed at low levels in each organ. The two platforms gave similar results for pooled tissue samples, as summarized in Figure 6 and tabulated in Additional file 4, Table S1. Most of the SUN genes were expressed at low levels across multiple tissues; expression was similar within tissue types, regardless of developmental stage. The ZmSUN gene expression levels were about 2% of those of the moderately expressed housekeeping control gene, cytoplasmic glyceraldehyde 3-phosphate dehydrogenase (GAPDH, Figure 6).
Figure 6 Expression profiling of ZmSUN genes by Solexa tag-based and whole-transcriptome sequencing. mRNA from various B73 tissues was subjected to two Solexa sequencing platforms, Solexa whole-transcriptome (SWT) and Solexa dual-tag based (STB). The vertical (more ...)
To show more clearly the variation in expression levels among the SUN genes, we replotted the same data as semi-log2 (Figure ). The overall expression pattern is consistent with basic functions for SUN-domain proteins in most cell types. A notable exception to the widespread pattern of expression was that of ZmSUN5, which showed a very distinct and much more restricted pollen-related pattern of expression (Figure pollen). Such an expression profile predicts that ZmSUN5 should be required for specialized processes such as nuclear migration down the pollen tube and possibly double fertilization. An interesting and related observation is that fertilization involves nuclear fusion, as does karyogamy, which in yeast involves active nuclear migration and SUN-domain proteins [9
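The effect of the semi-log2 replotting mentioned above can be illustrated with a minimal Python sketch. The function name is an assumption, and the three input values are stand-ins borrowed from the EST ranges quoted earlier in the text (200, 2,000, and 21,000 TPdM); they are not Solexa measurements.

```python
import math


def log2_scale(values):
    """Return the base-2 logarithm of each strictly positive expression value."""
    return [math.log2(v) for v in values]


# Stand-in TPdM values taken from the EST ranges in the text (not Solexa data).
levels = [200, 2000, 21000]
log_levels = log2_scale(levels)

# On a linear axis the lowest value sits at under 1% of the highest;
# on a log2 axis it sits at roughly half of it, so low expressers stay visible.
linear_fraction = levels[0] / levels[-1]
log_fraction = log_levels[0] / log_levels[-1]
print(f"linear: {linear_fraction:.3f}, log2: {log_fraction:.3f}")
```

Zero counts have no logarithm, so a tissue with no detected transcripts would need separate handling before such a transformation.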
The present report represents the first description of relative mRNA expression levels of all members of a SUN gene family in any plant species and may therefore prove useful to investigators of the functions of plant SUN-domain proteins. Despite some variation in the data across different expression platforms, as summarized above, a consistent trend for most of the ZmSUN genes is that they are expressed in many different tissues at relatively low levels, a finding similar to that of Graumann et al
] for the CCSD-type AtSUN2 gene. In addition, we observed a distinct exception to this overall pattern with ZmSUN5, whose expression appears to be highly specific to pollen. Given the lack of information on PM3-type SUN proteins, we set out to characterize this group further in plants. We chose to examine a PM3-type gene that was expressed in many cell types, including those in meiosis-stage anthers, and that might have roles in meiotic telomere functions.
Isolation and Characterization of a Maize PM3-type SUN-Domain Protein Gene from a Meiotic cDNA Library
The role of SUN genes in telomere-associated recombination and crossover control has been established for animals and yeast and is likely to exist in plants as well [33]. In this regard, we find it intriguing that two different laboratories [65] recently and independently mapped a recombination control QTL in maize to bin 3.06, where ZmSUN3 resides. We screened a meiosis-enriched cDNA library for ZmSUN3 and its closely related duplicate ZmSUN4 using a 639-bp PCR product corresponding to a region of the SUN domain of ZmSUN3 at a stringency of Tm-15°C. The probe has a high degree of similarity to both ZmSUN3 and ZmSUN4, yet it is not similar enough to ZmSUN5 or either of the CCSD-type genes to detect them. From approximately 500,000 plaques, we isolated two full-length cDNA clones of ZmSUN4 with identical insert sequences. The detection of ZmSUN4 but not ZmSUN3 is consistent with the relative expression levels for ZmSUN3 in meiosis-stage anthers (Figure 5).
The full-length cDNA sequence for ZmSUN4
] and the deduced protein sequence and motifs are illustrated in Figure 7. The predicted protein sequence from the ZmSUN3 gene is also shown (Figure 7) and reveals that the B73 SUN3 and W23 SUN4 are 88% identical. This relatively high level of protein similarity reflects their divergence after a maize genome duplication event estimated to have occurred about 5-12 mya [53]. The extent to which these proteins have diverged functionally remains unknown.
Figure 7 ZmSUN4 cDNA and protein features. (A) ZmSUN4 (genotype W23) full-length cDNA, showing the 5' and 3' UTRs, open reading frame (ORF), and poly-A tail. A diagram of the protein indicates domain locations as described in Figure 3. (B) Annotated protein sequence (more ...)
The W23 ZmSUN4 full-length cDNA is 2,158 bp in length and has a predicted open reading frame (ORF) of 1,920 bp encoding a 639-residue protein with a predicted molecular mass of ~71 kDa and an acidic isoelectric point of 5.2. This full-length ZmSUN4 cDNA predicts a protein with all of the motifs and arrangements (Table Figure ) that are typical of the entire class of PM3 proteins.
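The relationship between the ORF length, the residue count, and the approximate molecular mass quoted above can be checked with a short Python sketch. Two assumptions are made here that the text does not state: that the 1,920-bp ORF length includes the stop codon, and that an average residue mass of about 110 Da (a common rule of thumb) is adequate for a rough estimate. The function names are illustrative only.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per residue (assumption)


def residues_from_orf(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length is assumed to include the stop codon."""
    if orf_bp < 6 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and hold at least 2 codons")
    return orf_bp // 3 - 1  # the final codon is the stop codon


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


n_residues = residues_from_orf(1920)  # 639
mass_kda = approx_mass_kda(n_residues)
print(f"{n_residues} residues, about {mass_kda:.0f} kDa")
```

The rough estimate of about 70 kDa is close to the ~71 kDa predicted from the actual sequence and to the ~70-kDa band reported on western blots; a sequence-based calculation would be needed for anything more precise.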
Localization of a Maize PM3-type Protein
To test for the presence and localization of ZmSUN3/4 proteins in planta, we developed peptide antibodies for western blotting and immunolocalization, and the results are summarized in Figures 8 and 9. The peptides used and the corresponding ZmSUN3/4 sequences are shown in Figure 8. Our survey of a variety of tissues for the presence of PM3-type proteins with antisera to zms3gsp1a (Figure 8) revealed only one band of about 70 kDa in all of the tissues surveyed, including leaf, root, silk, husk, earshoot, embryo, preemergence (meiotic) tassels, and emerged (postmeiotic) tassels. This broad detection is consistent with the mRNA expression profiles for ZmSUN3 and ZmSUN4 (Figures 5 and 6).
Figure 8 Western blot of proteins ZmSUN3 and ZmSUN4. (A) Two peptide antibodies were made against synthetic peptides within (zms3gsp1a) and just after (zms3gsp2) the SUN-domain of the maize ZmSUN3 protein. The corresponding regions in ZmSUN3 and ZmSUN4 are aligned, (more ...)
Figure 9 Immunolocalization of PM3 SOGG3 Proteins at the nuclear periphery. Combined antisera (zms3gsp1a and zms3gsp2) or preimmune control sera were used to stain formaldehyde-fixed uninucleate pollen mother cells. The immune complex was visualized by deconvolution (more ...)
Our examination of proteins from isolated male flowers at meiotic stages of development detected high-molecular-weight bands that were considerably larger than the predicted protein sizes. Given the number of cysteine residues and the possibility of disulfide bridges, we examined the effect of prolonged boiling times in the presence of reducing agents (0.1 M 2-mercaptoethanol, 10% SDS) on the detectable band patterns. These high-molecular-weight bands were not detected in the protein samples from the other, nonanther tissues examined (Figure 8). The basis for this difference is not known, but it may result from more highly cross-linked SUN3/4 protein in the extracts from anthers than in those from the other tissues. After 10 or more minutes of boiling, the antibodies detected a single band of about 70 kDa (Figure 8), similar to those detected in the multitissue survey blot (Figure 8). Therefore, ZmSUN3, ZmSUN4, or both appear to be present in meiosis-stage anthers.
Our examination of formaldehyde-fixed cells, shown in Figure 9, revealed the strongest staining around the nuclear periphery but also detected considerable speckled cytoplasmic staining in a postmeiotic uninucleate pollen mother cell. The cytoplasmic staining may reflect nonspecific background or true signal from ER-localized PM3-type SUN-domain protein. Interestingly, we have yet to detect staining in meiotic prophase nuclei with these antibodies, possibly because of difficulty in the preservation conditions or in detecting the epitope in prophase nuclei, or possibly because of an absence of PM3-type SUN-domain proteins in meiotic cells. The results of negative control experiments, using preimmune sera and secondary antibody only, are shown in Figure 9 at image scaling comparable to that used for the anti-PM3-antibody staining (Figure 9). The lack of staining in the controls suggests that the staining patterns noted with the anti-PM3 sera were specific.
These data provide the first direct evidence of a PM3 SUN-domain protein localized to the nuclear periphery and suggest that the SUN-domain proteins of this subfamily can reside in the NE, like the CCSD proteins. Together, these observations suggest that plant nuclei contain multiple different SUN-domain proteins.
Models of the Topology of Plant SUN-domain Proteins
The two structural classes of plant SUN-domain proteins found in maize, and shown to occur commonly in many plant species, may have different functions. If they serve as physical connectors that transduce forces from the cytoplasm to the nucleus, determining their topologies and dispositions relative to the membranes of the NE will be an important step toward elucidating their biological roles. Several models of different topological arrangements for generalized CCSD and PM3 SUN proteins in the plant NE are presented in Figure 10.
Figure 10 Maize SUN topology models relative to the membranes of the nuclear envelope. Possible protein arrangement models with the SUN (yellow) domain in the perinuclear space are shown for the CCSD (A-B) and PM3 (C-F) proteins. Models do not attempt to depict (more ...)
If CCSD SUN proteins adopt a configuration like that of plant, animal, or fungal SUN proteins, the most likely arrangement would be that depicted by topology model "A" in Figure 10. In this configuration, the N-terminus would be in the nucleoplasm, possibly interacting with chromatin, inner-nuclear-membrane-associated proteins, or telomeres, and the SUN domain would be positioned within the perinuclear space. Connections to the cytoplasm would require interactions with other proteins embedded in the outer nuclear membrane. The configuration depicted in topology model "B" would suggest an opposite set of interactions. Given the structure of the NE, the two models are not necessarily exclusive, as the two membranes are continuous and fused around nuclear pore complexes.
For the PM3 SUN proteins, four different models (Figure 10) are presented for consideration because three transmembrane domains are involved. The C-terminal transmembrane domains are close together and unlikely, although not necessarily unable, to traverse the entire lumenal space. Only models with the last two transmembrane domains in the same membrane are therefore presented. Of these, topology models "D" and "E" are intriguing in that they predict a single protein bridge with both nucleoplasmic and cytoplasmic segments. Topology model "C" could have two different nucleoplasmic segments and thereby serve as a scaffold for multiple nuclear molecules or complexes, including chromatin and nonchromatin nuclear proteins, other NE proteins, or telomeric DNA. Similarly, topology model "F" depicts a protein with two cytoplasmic segments that might be capable of interacting with two cytoplasmic partners, while requiring additional protein interaction to form a functional nucleoplasmic-cytoplasmic bridge.
In nonplant systems, SUN proteins are linked to the cytoplasm by an interaction with KASH-domain proteins that traverse the outer nuclear membrane. The KASH-domain proteins connect to various cytoskeletal components to function as cargo-specific cytoskeletal adaptor proteins [13]. As a family, the KASH-domain proteins have limited homology over a small portion of their entire protein sequence, and no plant KASH-domain protein homologs have been identified by sequence analyses thus far. Genetic or protein interaction screens may be required to identify SUN-interacting partners and their function in plants.