Spermatogenesis is a highly orchestrated developmental process by which spermatogonia develop into mature spermatozoa. This process involves many testis- or male germ cell-specific gene products whose expression is strictly regulated. In the past decade the advent of high-throughput gene expression analytical techniques has made functional genomic studies of this process, particularly in model animals such as mice and rats, feasible and practical. These studies have just begun to reveal the complexity of the genomic landscape of the developing male germ cells. Over 50% of the mouse and rat genomes are expressed during testicular development. Among transcripts present in germ cells, 40% – 60% are uncharacterized. A number of genes, and consequently their associated biological pathways, are differentially expressed at different stages of spermatogenesis. Developing male germ cells present a rich repertoire of genetic processes. Tissue-specific as well as spermatogenesis stage-specific alternative splicing of genes exemplifies the complexity of genome expression. In addition to this layer of control, the discovery of abundant antisense transcripts, expressed pseudogenes, non-coding RNAs (ncRNAs) including long ncRNAs, microRNAs (miRNAs) and Piwi-interacting RNAs (piRNAs), and retrogenes points to the presence of multiple layers of expression and functional regulation in male germ cells. It is anticipated that application of systems biology approaches will further our understanding of the regulatory mechanism of spermatogenesis.†
Spermatogenesis is a highly orchestrated developmental process by which spermatogonia develop into mature spermatozoa. The whole process can be subdivided into three major phases, namely spermatogoniogenesis, meiosis, and spermiogenesis. Spermatogenesis begins shortly after birth, when the prospermatogonia (also known as gonocytes) give rise to type A spermatogonia. Type A spermatogonia can either self-renew or differentiate into type B spermatogonia, which then develop into spermatocytes and enter meiosis. During meiosis, two rounds of cell division, without intervening DNA replication, reduce the chromosome number to haploid. The resulting spermatids undergo extensive chromatin condensation and morphological changes to emerge as mature spermatozoa (Dym, 1994; Hecht, 1998). This complex process involves many testis- or male germ cell-specific gene products whose expression is strictly regulated. In addition, the proper development of male germ cells depends on testicular somatic cells such as Sertoli and Leydig cells, which provide growth factors and maintain the micro-environment necessary for the maturation and development of germ cells. As a result, knowing the changes in gene expression and genetic programs in the developing germ cells would not only provide clues to the regulatory mechanism of spermatogenesis at the molecular level, but also aid the identification of candidate genes for contraceptive development.
With the advent of large-scale gene expression analytical techniques in the past decade, functional genomic studies of developmental processes such as spermatogenesis have become feasible and practical. More than a dozen gene expression profiling experiments on mammalian spermatogenesis, primarily in mice and rats, have been published within the past few years (Rockett et al., 2001; Wang et al., 2001; Fujii et al., 2002; Li et al., 2002; Sha et al., 2002; Tanaka et al., 2002; Anway et al., 2003; Pang et al., 2003; Schultz et al., 2003; Yu et al., 2003; Almstrup et al., 2004; Guo et al., 2004; Schlecht et al., 2004; Shima et al., 2004; Wu et al., 2004; Pang et al., 2006; Johnston et al., 2008). Only a limited number of human studies have been reported (Sha et al., 2002; Chalmel et al., 2007). In the rodent studies, the transcriptomes of whole testes (harvested from animals of different ages during the first wave of spermatogenesis) or of specific types of germ cells and testicular cells isolated from the testes were examined to different extents with a diverse selection of gene expression profiling tools. For instance, the differential display PCR technique was used to study the overall gene expression changes during mouse spermatogenesis (Almstrup et al., 2004), and subtractive hybridization was adopted to enrich mouse transcripts that are preferentially expressed in adult over juvenile mouse testes (Fujii et al., 2002). A comparison of gene expression patterns between meiotic and post-meiotic male germ cells was performed using the NIA 15k cDNA microarray platform (Pang et al., 2006). In two separate studies (Schultz et al., 2003; Shima et al., 2004), the research groups utilized the same oligonucleotide gene chip array platform to carry out time-course analyses of the mouse testicular transcriptome. 
In the latter study, gene expression profiles of Sertoli cells, peritubular myoid cells, type A/B spermatogonia, pachytene spermatocytes and round spermatids were examined in parallel. Similar experiments were also performed by other research groups on rat testes and male germ cells (Schlecht et al., 2004; Johnston et al., 2008). Using sequencing-based Serial Analysis of Gene Expression (SAGE) technique, Wu and others reported a comprehensive analysis of the changes in transcriptomes among mouse type A spermatogonia, pachytene spermatocytes, and round spermatids (Wu et al., 2004; Lee et al., 2009).
It is understood that the different gene expression profiling platforms described above have their own advantages and shortcomings. Nevertheless, some remarkable conclusions can be drawn from these studies which revealed the dynamic changes in the transcriptome of developing male germ cells.
One of the outcomes of these studies is the revelation of a large number of genes that are differentially expressed during spermatogenesis. It was estimated that ~50–58% of the mouse genome would be expressed during testis development from birth to adulthood (Schultz et al., 2003; Shima et al., 2004). A similar number (54%) was obtained from the study of rat spermatogenesis (Johnston et al., 2008). Meanwhile, around one-third of the mouse genome is differentially expressed during testicular development (Shima et al., 2004). Some of these genes are either male germ cell-specific or testis-predominant. Specifically, 2.3% of the rat testicular transcriptome is testis-specific (Johnston et al., 2008), and ~4% of the mouse genome is transcribed only in male germ cells (Schultz et al., 2003).
Despite the identification of the many genes that are involved in mammalian spermatogenesis, little is known about their function. This is exemplified by the percentage of uncharacterized/novel transcripts identified in the different studies. For instance, ~60% of the germ cell-enriched transcripts detected in the study by Schultz et al. (2003) represent uncharacterized transcripts. Half of the differentially expressed genes between meiotic and post-meiotic male germ cells are uncharacterized (Pang et al., 2006), and more than 40% of the germ cell transcripts identified by SAGE analysis (Wu et al., 2004) are uncharacterized.
Major peak expression of testicular transcripts was found to occur in three different phases which correspond to the mitotic phase (0–8 days after birth when spermatogonial proliferation predominates), the initiation of meiosis (14 days after birth when early pachytene spermatocytes appear) and the entry into spermiogenesis (20 days after birth when round spermatids first appear) (Wrobel and Primig, 2005). Interestingly, there is an elevation of transcript abundance in meiotic and post-meiotic male germ cells. The number of unique genes expressed in these cells is also significantly higher than that in spermatogonia (Shima et al., 2004). In contrast, many unique spermatogonial genes are down-regulated as male germ cells enter meiosis (Shima et al., 2004; Wu et al., 2004). Similar observation was made in a microarray validation experiment (Pang et al., 2006), in which the authors found that 80% of the genes showing differential expression between meiotic and post-meiotic male germ cells are absent or expressed at relatively low level in type A spermatogonia.
Transcript over-expression (as evaluated by the measurement of polyadenylated RNA level) in meiotic and post-meiotic male germ cells, with respect to other testicular cells, was first evident in an early study with rats (Morales and Hecht, 1994). Such a phenomenon may be a bystander effect resulting from the open chromatin structure in the specified cell types, which leads to an overall activation of the transcriptional machinery (Kleene, 2001). Alternatively, it may be a mechanism that maintains transcript availability in anticipation of the cessation of gene transcription caused by chromatin condensation during spermiogenesis (Sassone-Corsi, 2002). The expression of an increased number of unique genes may imply a concomitant increase in the demand for specific gene activities for the initiation and maintenance of meiosis-related events as well as the preparation for spermatozoon formation.
Another interesting phenomenon is that most of the transcripts that are first expressed during or after meiosis tend to be testis- or male germ cell-specific (Schultz et al., 2003; Chalmel et al., 2007). In contrast, most of the genes active in spermatogonia (and Sertoli cells) are also expressed in non-reproductive tissues (Chalmel et al., 2007).
The profiling of gene expression at genomic scale allows us to examine the biochemical characteristics of a specific type of cell or tissue at an unprecedented level. This can be achieved by applying gene ontology analyses to the preferentially expressed genes to identify the key biological processes with which they are associated. Such biological process representation analysis has been adopted in several of the male germ cell transcriptome studies (Wrobel and Primig, 2005; Lee et al., 2006; Pang et al., 2006; Chalmel et al., 2007). Certain categories of biological processes are distinctively associated with mitotic, meiotic and post-meiotic male germ cells, respectively. For instance, biological processes such as integrin signaling, ribosome biogenesis and assembly, carbohydrate metabolism, protein biosynthesis, RNA processing, cell cycle, DNA replication, chromosome organization and biogenesis, and germ cell development are preferentially associated with type A spermatogonia. Surprisingly, genes involved in embryonic development and gastrulation are also found to be prevalent in these cells. On the other hand, biological processes pertaining to reproduction and spermatogenesis are commonly seen in meiotic and post-meiotic male germ cells. Biological processes such as meiotic cell cycle, chromatin structure and dynamics, chromosome segregation, cytoskeleton, and protein degradation by ubiquitin cycle are over-represented in pachytene spermatocytes. Meanwhile, genes involved in protein turnover, signal transduction, energy metabolism, intracellular transport, ubiquitin cycle, proteolysis, peptidolysis, and fertilization are more prevalent in round spermatids.
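Over-representation of a biological process among preferentially expressed genes is commonly assessed with a one-sided hypergeometric test. The following is a minimal sketch of that calculation; the cited studies may have used other statistics, and the function name and all numbers in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric tail probability P(X >= hits).

    population: genes on the platform (background)
    annotated:  background genes carrying the ontology term
    sample:     preferentially expressed genes in the cell type
    hits:       preferentially expressed genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 150 annotated to a term,
# 800 genes enriched in one cell type, 20 of which carry the term.
p = enrichment_p_value(20000, 150, 800, 20)
print(f"P(X >= 20) = {p:.3g}")
```

In practice one such test is run per ontology term, so the resulting p-values would need a multiple-testing correction before any category is called over-represented.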
It has been observed that genes displaying higher expression level in testis (notably genes related to male meiotic and post-meiotic functions) are often located in the autosomes (Khil et al., 2004). In contrast, genes expressed at earlier stages of spermatogenesis are abundant on the X chromosome (Wang et al., 2001; Khil et al., 2004). This observation was further verified by a recent cross-species male germ cell transcriptome study (Chalmel et al., 2007). The authors noticed an enrichment of loci in the rodent mitotic and somatic expression clusters on X chromosome, but no X-linkage was observed for mouse and rat genes that showed peak expression in male meiotic germ cells. The same is true for the human homologs of mouse meiotic genes: none of them are found on the X chromosome. A similar phenomenon is also observed in a particular subset of genes, the X chromosome-derived autosomal retrogenes and their X-linked progenitor genes. Although not all testis-specific autosomal genes are X-derived retrogenes or retrogenes at all, the absence of X-linkage in general is believed to be the consequence of the selective force imposed by meiotic sex chromosome inactivation (MSCI) which will be further discussed in the following section.
A cross-species comparison of testicular and male germ cell transcriptomes revealed hundreds of genes which show concordant meiotic and post-meiotic expression profiles among human, mouse and rat, implying the existence of a “conserved” transcriptome of mammalian spermatogenesis (Wrobel and Primig, 2005; Chalmel et al., 2007). Such findings suggest that rodent models are relevant for elucidating the human spermatogenic process and its defects. The similar differential expression patterns of the testicular genes across species also suggest that similar regulatory mechanisms control their transcription.
The analyses of mammalian testicular/male germ cell transcriptomes provide a rich source of information for the characterization of gene functions in male germ cell development. Besides the changes in expression patterns of protein-encoding genes, the testis is also known to be a hub of non-coding transcripts such as antisense transcripts as well as small and non-coding RNAs, which are recognized to play an important role in mammalian testis development (Amaral and Mattick, 2008; Hayashi et al., 2008). Also, many male germ cell-specific transcripts undergo alternative splicing or are derived from sex-linked progenitor genes through retroposition, generating testis-specific isoforms of gene products that meet the specific needs of the spermatogenic process. With the availability of the transcriptome data, the expression of, and the generation of alternative products from, the dynamic male germ cell transcriptome can now be better understood.
One of the surprising findings of the Human Genome Project is that the number of expressed transcripts is higher than the number of genes predicted from the genomic sequence. For example, the estimated number of expressed transcripts in humans is 150,000, compared to 32,000 genes predicted from the human genome sequence (Ben-Dov et al., 2008). This disparity suggests that there are underlying mechanisms that generate gene complexity and a diverse proteome. Such diversity could be achieved by multiple transcription start sites (Quelle et al., 1995), alternative pre-mRNA splicing (Ast, 2004), pre-mRNA editing (Keegan et al., 2001) and post-translational protein modifications (Banks et al., 2000). Among these mechanisms, alternative splicing is currently regarded as a major contributor to transcriptome diversification. Introns in eukaryotic pre-mRNAs are spliced out in various ways, resulting in several different mRNAs and protein products from a single gene, as illustrated in Figure 1. A recent genome-wide analysis shows that up to 60% of human genes have alternatively spliced forms, suggesting that alternative splicing is one of the most significant components of the functional complexity of the human genome (Maniatis and Tasic, 2002; Xu et al., 2002a; Boue et al., 2003).
The testis is a rich source for identifying gene regulation mechanisms because germ cell expansion and differentiation require many cellular changes and regulatory steps (Yeo et al., 2004). Recent genome-wide analysis of expressed sequence tags (ESTs) showed that the testis has the greatest enrichment of tissue-specific splicing (Xu et al., 2002b). The multiple specialization events carried out by a limited transcriptome during spermatogenesis provide a good model for studying alternative splicing and for identifying novel gene variants in germ cell development. In developing germ cells, the lengths of mRNA transcripts often vary as the cells mature (Eddy, 2002). Perhaps the best known example of alternative splicing in male germ cell development is cAMP-responsive element modulator (CREM) (Foulkes et al., 1992; Sanborn et al., 1997; Behr and Weinbauer, 2001; Blocher et al., 2005; Jaspers et al., 2007). The CREM regulatory pathway involved in the regulation of spermatogenesis illustrates the power and versatility of alternative splicing. CREM belongs to the CREB (cAMP Responsive Element Binding protein) family of transcription factors. After binding to the promoter, CREM either activates or represses transcription (Foulkes et al., 1992; Delmas et al., 1993). When CREM acts as a transcriptional activator, it leads to gene transcription driven by cAMP-responsive elements in the promoter region. CREM is expressed primarily in the testes of males, and its activity is controlled at the level of RNA processing. During spermatogenesis CREM pre-mRNA is alternatively processed through different exon combinations and polyadenylation site selection.
The molecular basis of how alternative splice sites are selected is still largely unclear. Two general models for control of alternative splicing have been suggested. The first model envisions that alternative splicing events are controlled by specific alternative splicing factors. Examples of this model include sex-lethal (Sxl) in Drosophila melanogaster and NOVA-1, a human neuron-specific alternative splicing factor (Nagoshi et al., 1988; Jensen et al., 2000). Only a small number of testis-specific splicing factors have been identified so far, such as cytotoxic granule-associated RNA binding protein-like 1 (TIAR1), RNA binding motif protein, Y-linked family members (RBMY), and fus-like protein (FUS). Different mechanisms have been proposed for the regulation of alternative splicing during spermatogenesis. For instance, TIAR1 is implicated in splice site control in spermatogonia and is known to be required for germ cell survival (Forch et al., 2002); RBMY interacts with serine-arginine-rich protein (SR protein) to affect splicing and is specifically expressed in the nuclei of spermatogonia and spermatocytes (Venables et al., 2000); and male FUS knock-out mice are infertile and have defective synapsis in prophase of meiosis I (Kuroda et al., 2000). A partial list of transcripts known to be alternatively spliced during male germ cell development is shown in Table 1.
Given that alternative RNA splicing is observed in a high proportion of genes and is involved in the regulation of spermatogenesis, it is not surprising that alterations in RNA splicing can cause disease or be modified in disease. It has been estimated that splicing defects account for nearly 15% of disease-causing mutations in humans, most of which can be attributed to point mutations in the 5′ and 3′ splice sites (Krawczak et al., 1992). Unusual splice isoforms have also been described in various cancers. Thus, characterization of specific splicing isoforms and their underlying regulation at different germ cell stages during spermatogenesis may provide new therapeutic targets and diagnostics for some male reproductive disorders.
Despite the enormous potential to generate innumerable splice isoforms during germ cell development, only a small fraction of the variants have been observed, owing to the limitations of conventional experimental techniques. With the completion of the sequencing projects and full-length cDNA libraries, global identification of alternative splicing has become possible. Alignment of ESTs or full-length cDNAs to genomic regions creates various splicing gene models by assessing splice-site consensus sequences and comparing transcripts originating from the same genomic location (Xu et al., 2002b; Carninci et al., 2005). The availability of genome sequences and large collections of transcripts provides a rich source for identifying alternative splicing events by computational methods (Kim and Lee, 2008). Tools have recently been developed that can automate much of this process (Noh et al., 2006; Birzele et al., 2008; Castrignano et al., 2008). Although computational prediction provides a systematic way to identify alternative splicing, it also has limitations, such as the quality of ESTs, insufficient numbers of transcript sequences (which yield poor gene coverage and under-representation of alternatively spliced isoforms), expression biases that affect abundance, and the inability to distinguish cell type-specific from tissue-specific alternative splicing. Therefore, to validate and identify expressed alternative splicing reliably on a genome-wide scale, high-throughput expression experiments should be applied. The use of exon microarrays or sequencing-based methods such as SAGE allows unbiased interrogation of alternative splicing events without prior knowledge of the transcripts (Frey et al., 2005; Chan et al., 2006a; Cuperlovic-Culf et al., 2006; Calarco et al., 2007; Ruzanov et al., 2007). 
The identification of splice variants of Crem in the germ cell SAGE libraries demonstrates that SAGE offers an efficient approach to investigate the existence of alternatively processed transcripts (Wu et al., 2004). In this section, we will concentrate on the application of SAGE to identify alternative splicing events in male germ cell development.
SAGE provides a high-throughput means of generating tissue specific gene expression information that is not prone to potential orientation errors and sequence contaminations inherent in EST data. One of the compounding features is that SAGE shows extreme sensitivity for identifying 3′ end transcript population heterogeneity. Alternative splicing is often regulated in a temporal or tissue-specific fashion giving rise to different protein isoforms in different tissues or at different developmental stages (Matter et al., 2002). Specific splicing of a gene at different stages of germ cell development can be identified by mapping and comparing the SAGE tags of a gene to the genomic sequence.
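The sensitivity of SAGE to 3′ end heterogeneity follows from how a tag is defined: it is the short sequence immediately downstream of the 3′-most Nla III site (CATG) in a transcript, so isoforms that differ at their 3′ ends can yield different tags. The sketch below illustrates this sequence-to-tag conversion under simplifying assumptions: a 10-bp tag length is assumed, the function name is hypothetical, and the two toy isoform sequences are invented for illustration.

```python
def sage_tag(transcript, anchor="CATG", tag_length=10):
    """Return the tag directly downstream of the 3'-most anchoring site.

    Returns None if the transcript has no anchoring site or if fewer than
    tag_length bases follow the last site.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the Nla III site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Two invented isoforms of the same gene that differ only at the 3' end.
long_isoform = "GGCATGAAACCCGGGTTTCATGACGTACGTAGCTTAAA"
short_isoform = "GGCATGAAACCCGGGTTTAAA"

print(sage_tag(long_isoform))   # tag taken after the second CATG
print(sage_tag(short_isoform))  # tag taken after the only CATG
```

Because the two isoforms produce different tags, their relative abundance at each germ cell stage can be followed separately in the tag counts.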
On the basis of the male germ cell SAGE libraries (Wu et al., 2004; Lee et al., 2009), a total of 29,654 SAGE tags were mapped to at least one germ cell stage. Two key steps were applied to find alternative splicing variants of a gene. The first step was to find tags that are not expressed in all stages, i.e., stage-specific tags. This was followed by tag-to-gene mapping, in which the tag sequence was aligned to the genomic sequence for Unigene cluster retrieval and the location of the tags was compared with different alternative splicing models. To screen potential tags representing alternative transcripts at different germ cell stages, a Venn diagram based on tags adjacent to the most 3′ end of Nla III site followed by polyadenylation signal expression was first constructed to identify stage-specific tag expression (Sequence to Tag conversion). Stage-specific tags from the Venn diagram were then mapped to corresponding Unigene clusters based on SAGEmap available at NCBI (http://www.ncbi.nlm.nih.gov/SAGE), which provides tag-to-gene assignment information. To increase the reliability of subsequent mapping, only tags with three or more counts were considered. Tags mapped to unknown or hypothetical transcripts, partial cDNAs, IMAGE clones or EST clones were eliminated. Tags mapped to the same gene at different germ cell stages were then regrouped. Alternatively spliced transcripts arising from 3′ end variations were predicted by aligning the tag sequences to the alternative splicing models in ECgene, which combines genome-based EST clustering (instead of pair-wise alignment of ESTs) with transcript assembly sub-clustering procedures (Kim et al., 2004; Lee et al., 2007). Figure 2 shows the flow of the process for the identification of alternative splice variants.
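The screening steps described above (count threshold, Venn partition by stage, regrouping of tags by gene) can be sketched in a few lines. This is an illustrative reconstruction rather than the authors' pipeline: the function names, stage labels, tag sequences, and gene names are hypothetical, and applying the three-count threshold before building the Venn partition is an assumption about the order of operations.

```python
MIN_COUNT = 3  # from the text: only tags with three or more counts are kept


def expressed_tags(library, min_count=MIN_COUNT):
    """Tags in one stage library that pass the count threshold."""
    return {tag for tag, count in library.items() if count >= min_count}


def venn_partition(libraries, min_count=MIN_COUNT):
    """Map each set of stages to the tags expressed in exactly those stages."""
    membership = {}
    for stage, library in libraries.items():
        for tag in expressed_tags(library, min_count):
            membership.setdefault(tag, set()).add(stage)
    partition = {}
    for tag, stages in membership.items():
        partition.setdefault(frozenset(stages), set()).add(tag)
    return partition


def regroup_by_gene(partition, tag_to_gene, n_stages):
    """Collect, per gene, the tags that are not expressed in every stage."""
    genes = {}
    for stages, tags in partition.items():
        if len(stages) == n_stages:
            continue  # expressed in all stages, not informative here
        for tag in tags:
            gene = tag_to_gene.get(tag)  # unmapped tags are dropped
            if gene is not None:
                genes.setdefault(gene, []).append((tag, sorted(stages)))
    return genes


# Toy data: tag counts per stage (Spga, Spcy, Sptd are placeholder labels).
libraries = {
    "Spga": {"AAAAAAAAAA": 5, "CCCCCCCCCC": 4, "GGGGGGGGGG": 1},
    "Spcy": {"CCCCCCCCCC": 3, "TTTTTTTTTT": 7},
    "Sptd": {"CCCCCCCCCC": 9, "TTTTTTTTTT": 2, "GGGGGGGGGG": 6},
}
tag_to_gene = {
    "AAAAAAAAAA": "GeneX",
    "TTTTTTTTTT": "GeneX",
    "GGGGGGGGGG": "GeneY",
}

partition = venn_partition(libraries)
candidates = regroup_by_gene(partition, tag_to_gene, n_stages=len(libraries))
for gene, tags in sorted(candidates.items()):
    print(gene, sorted(tags))
```

In this toy example GeneX is represented by different tags in different stages, which would make it a candidate for 3′ end variation; confirming that would still require aligning the tags to splicing models, a step not covered by the sketch.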
The numbers of genes with 3′ end alternative splicing variants (3′ alternative splicing) expressed in type A spermatogonia, pachytene spermatocytes and round spermatids are 74, 58, and 62, respectively (Chan et al., 2006a). Two hundred and seven genes with 3′ alternative splicing are expressed in both type A spermatogonia and pachytene spermatocytes, 249 in both type A spermatogonia and round spermatids, and 158 in both pachytene spermatocytes and round spermatids. There are 73 genes with different 3′ alternative splicing in all three stages. Novel variants of genes involved in developmental and transcriptional control were identified, such as heat shock protein 4 (Hspa4), H3 histone, family 3B (H3f3b) and ubiquitin protein ligase E3A (Ube3a). Taken together, SAGE not only provides a rapid global survey of the gene expression profile of the germ cell transcriptome, but also allows the identification of novel alternative splicing variants that contribute to the progression of spermatogenesis. Further functional study of the variants will provide new insights into germ cell development during spermatogenesis.
It is well-known that global inhibition of the initiation of mRNA translation occurs in pachytene spermatocytes and round spermatids (Kleene, 2001, 2003). In fact, translation repression is one of the characteristic features of mammalian spermatogenesis (Kleene, 2001). However, little is known about the mechanisms of translational regulation in spermatogenic cells. Antisense transcription may provide one explanation for this phenomenon.
In spite of the fact that many reports documented the occurrence of antisense transcripts (Knee et al., 1997; Korneev et al., 1999; Hastings et al., 2000; Nemes et al., 2000; Li et al., 2002; Kiyosawa et al., 2003; Runkel et al., 2003; Spiess et al., 2003; Vu et al., 2003; Hernandez et al., 2004; Ihalmo et al., 2004; Robb et al., 2004; Solda et al., 2005; Korneev et al., 2008; Orfanelli et al., 2008; Seim et al., 2008), the widespread occurrence of antisense transcripts in humans and mice has only been known in the last decade. Computational analyses estimate 8 to 20% of human and mouse genes form sense-antisense transcript pairs (Fahey et al., 2002; Lehner et al., 2002; Okazaki et al., 2002; Shendure and Church, 2002; Carmichael, 2003; Yelin et al., 2003; Chen et al., 2004; Kampa et al., 2004; Rosok and Sioud, 2004). A study of 10 human chromosomes indicated about 61% of surveyed loci had antisense transcripts (Cheng et al., 2005). A more recent study of five different human cell types found evidence for antisense transcripts in 2900 to 6400 genes (He et al., 2008). Previous efforts to delineate mouse spermatogenic cell gene expression either only focused on a few genes or used the microarray platform which does not permit detection of antisense transcripts (Rockett et al., 2001; Wang et al., 2001; Fujii et al., 2002; Li et al., 2002; Sha et al., 2002; Tanaka et al., 2002; Anway et al., 2003; Pang et al., 2003; Schultz et al., 2003; Yu et al., 2003; Almstrup et al., 2004; Guo et al., 2004; Schlecht et al., 2004; Shima et al., 2004; Pang et al., 2006; Johnston et al., 2008). An antisense transcript, SPEER-2, was observed in late pachytene spermatocytes and early round spermatids in the mouse (Spiess et al., 2003). A systematic search for antisense transcripts in developing spermatogenic cells was only reported recently (Wu et al., 2004; Chan et al., 2006b; Chan et al., 2007).
Using orientation-specific reverse transcription-polymerase chain reaction (RT-PCR), 19 of 64 (~31%) genes found to be differentially expressed at different stages of spermatogenesis were shown to have antisense transcripts (Wu et al., 2004; Chan et al., 2006b; Chan et al., 2007). The antisense amplicons of these 19 genes can be divided into three main groups based on comparison with their sense genes (Chan et al., 2006b) (Fig. 3). In Group 1, the antisense amplicons were 100% complementary to the sense transcripts. This group could be divided into two subgroups. Subgroup 1A genes, which include a disintegrin and metalloprotease domain 5 (Adam5), diazepam binding inhibitor-like 5 (DbiL5), DnaJ (Hsp40) homolog, subfamily B, member 3 (DnajB3), four and a half LIM domains 4 (Fhl4), glucokinase activity related sequence 2 (Gk-rs2), phosducin-like 2 (Pdcl2), peptidylprolyl isomerase C (Ppic), isoform of protein phosphatase 1, catalytic subunit (Ppp1cc), protamine 2 (Prm2), sperm autoantigenic protein 17 (Sap17), associated molecule with the SH3 domain of STAM (Sh3-Stam), and testis specific gene 1 (Tsg1), give rise to antisense amplicons that are 100% complementary to a single exon of the sense gene. Subgroup 1B gene antisense amplicons are 100% complementary to sense transcripts comprising more than one spliced exon (two exons for t-complex-associated testis expressed 3 [Tcte3] and three exons for sperm specific lactate dehydrogenase 3, C chain [Ldh3c]). The antisense transcripts of these genes are short (< 300 bp) and cannot be extended beyond the exons identified in the antisense amplicons. Molecular cloning of the antisense transcripts of Tcte3 yielded two transcripts of different lengths, both of which lack a poly(A) tail. Protamine 1 (Prm1) is the sole member of Group 2; its antisense amplicon is complementary to the exons as well as intron of the sense gene. 
Molecular cloning identified three antisense transcripts spanning the neighboring Prm1 and Prm2 loci, with sequences overlapping with exonic, intronic, and intergenic sequences of the sense genes. Antisense amplicons of Group 3 genes, which include calmodulin 2 (Calm-2), heat shock 10 kD protein 1 (chaperonin 10) (Ch10), ubiquitin A-52 residue ribosomal protein fusion product 1 (Uba52), and ubiquitin B (Ubb), are not complementary to the sense transcripts. Instead, they are complementary to the pseudogenes on a different chromosome. Thus, in germ cells, anti-sense transcripts can be derived from a wide variety of origins, including processed sense transcripts, intronic and exonic sequences of a single gene or multiple genes, intergenic sequences, and pseudogenes.
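The grouping criteria described above amount to a small decision rule, which the following sketch makes explicit. It is only an illustration of the classification logic as stated in the text: the `Amplicon` fields and the `classify` function are hypothetical names, and the exon count given for Prm1 is a placeholder, since the text says only that its amplicon covers exons and an intron.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    gene: str
    sense_exons_covered: int   # sense-gene exons the amplicon is complementary to
    covers_intron: bool        # also complementary to intronic sequence
    matches_pseudogene: bool   # complementary to a pseudogene on another chromosome


def classify(amplicon):
    """Assign an antisense amplicon to Group 1A, 1B, 2 or 3."""
    if amplicon.matches_pseudogene and amplicon.sense_exons_covered == 0:
        return "3"    # not complementary to the sense transcript itself
    if amplicon.covers_intron:
        return "2"    # exonic plus intronic complementarity
    if amplicon.sense_exons_covered == 1:
        return "1A"   # fully complementary to a single exon
    if amplicon.sense_exons_covered > 1:
        return "1B"   # fully complementary to a spliced, multi-exon transcript
    return "unclassified"


examples = [
    Amplicon("Prm2", 1, False, False),
    Amplicon("Tcte3", 2, False, False),
    Amplicon("Ldh3c", 3, False, False),
    Amplicon("Prm1", 2, True, False),   # exon count is a placeholder
    Amplicon("Ubb", 0, False, True),
]
for amplicon in examples:
    print(amplicon.gene, classify(amplicon))
```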
Analyses of the spermatogenic antisense transcripts reveal a variety of potential mechanisms by which antisense transcripts are generated (Chan et al., 2006b, 2007). Total complementarity of the antisense amplicons in Group 1B to the mature mRNA suggests that they are synthesized after the sense transcripts are processed. Similar observations have been made previously with antisense transcripts of cardiac troponin 1 (Podlowski et al., 2002; Bartsch et al., 2004), hemoglobin β (Volloch et al., 1996; Bonafoux et al., 2004), and rat urocortin (Shi et al., 2000). It has been postulated that these antisense transcripts are transcribed from the sense mRNA in the cytoplasm by RNA-dependent RNA polymerase (RdRP) activity (Bartsch et al., 2004; Rosok and Sioud, 2004; Cheng et al., 2005). Even though a cytoplasmic, microsome-bound RdRP was reported to be partially purified from rabbit reticulocyte lysates (Downey et al., 1973), no subsequent purification of the enzyme protein was reported. The presence of RdRP in eukaryotes remains an unconfirmed observation.
Group 2 antisense transcripts are apparently the products of post-transcriptional splicing of a larger antisense transcript, which is processed by a splicing mechanism similar to that of the sense transcripts (Mount, 1982; Burset et al., 2000). A similar observation was reported in a more recent study showing that about 1% of antisense fragments in two cell lines examined exhibit splicing (He et al., 2008). Several antisense transcripts have also been reported to arise in a similar manner previously (Knee et al., 1997; Hastings et al., 2000; Nemes et al., 2000; Li et al., 2002; Vu et al., 2003; Hernandez et al., 2004; Robb et al., 2004). The larger antisense transcripts spanning the Prm1 and Prm2 loci comprise different exonic, intronic, and intergenic sequences of the sense genes. Similar to sense transcripts, these antisense transcripts appear to have arisen through alternative splicing, a phenomenon previously demonstrated in other antisense transcripts (Knee et al., 1997; Li and Murphy, 2000; Li et al., 2002; Alfano et al., 2005).
Pseudogenes can be another source of antisense transcripts, as illustrated by the Group 3 antisense amplicons. The pseudogenes that give rise to antisense transcripts of the Group 3 genes are present in the introns of actively transcribed genes (the host genes) (Chan et al., 2007). Apparently, these pseudogenes were retrotransposed and inserted into the introns of the host genes. The direction of transcription of the functional “parent” gene is opposite to that of the host gene. The orientation of the pseudogene is the same as that of the parent gene. Antisense amplicons of the pseudogenes were apparently derived from spliced introns of the transcripts of the host genes (see Fig. 4). The Uba52 pseudogene resides in the intron of chromobox homolog 1 (Cbx1); the Calm2 pseudogene is in the intron of protein kinase, cAMP regulatory subunit beta (Prkar2b); the Ch10 pseudogene is in the intron of trans-activating transcription factor 3 (Sp3); and the Ubb pseudogene resides in the intron of cation channel sperm associated 2 (Catsper2). The expression patterns of the parent gene and the “host” gene during spermatogenesis are comparable, with the exception of Prkar2b and Calm2: Prkar2b is only expressed in type A spermatogonia while Calm2 is mainly expressed in pachytene spermatocytes. The different temporal expression of these two genes may imply no interaction between them. Sp3, the host gene of the Ch10 pseudogene, is also only expressed in type A spermatogonia. Expression of Cbx1 falls with differentiation while Catsper2 is preferentially expressed in spermatocytes and spermatids (unpublished observations, Chan, W. Y. and Wu, S.M.). In contrast to the relative abundance of this type of antisense transcript in germ cells, there is only one reported example of a pseudogene-derived antisense transcript in somatic cells. The antisense transcript of neural nitric oxide synthase (nNOS) was reported to be transcribed from a pseudogene and to regulate the synthesis of nNOS (Korneev et al., 1999).
Judging from the proportion of antisense transcripts derived from pseudogenes identified in the male germ cells (Chan et al., 2006b, 2007) and the number of pseudogenes identified in the mouse and human genome (Zhang et al., 2003; Zhang and Gerstein, 2004; Harrison et al., 2005; Khelifi et al., 2005), pseudogenes may be a rich source of antisense transcripts.
Expression studies of antisense transcripts in male germ cells showed that the sense and the antisense transcripts can be regulated independently (Chan et al., 2006b, 2007). The testicular level of the sense transcripts is higher than that of the antisense transcripts in all cases, while the relative expression in non-testicular tissues is variable, implying tissue-specific regulation of antisense transcription. Similar observations have been reported for other genes (Katayama et al., 2005; Nishida et al., 2005; Korneev et al., 2008), particularly imprinted genes (Li et al., 2002; Yamasaki et al., 2003; Lavorgna et al., 2004; Landers et al., 2005; Berteaux et al., 2008). Stage-specific expression of antisense transcripts has also been observed. A recent report showed that 41 genes have spermatogonia-specific antisense transcripts, 29 genes have spermatocyte-specific antisense transcripts and 17 genes have spermatid-specific antisense transcripts (Chan et al., 2007). Thus, the regulation of antisense transcription may be dependent on the stage of development during spermatogenesis.
There is increasing evidence that antisense transcripts are intentional transcripts and not the products of leaky transcription of the non-coding strand (Dahary et al., 2005; He et al., 2008; Morris et al., 2008). A role for anti-sense transcription in disease processes has also been suggested (Tufarelli et al., 2003; Reis et al., 2004, 2005; Seitz et al., 2005; Orfanelli et al., 2008).
Several of the antisense transcripts described in male germ cells, including those derived from pseudogenes, have open reading frames (Chan et al., 2006b). Although it is not clear at present whether they encode any functional proteins, one cannot exclude the possibility that they do, similar to the protein-encoding antisense transcripts previously reported (Knee et al., 1997; Hastings et al., 2000; Runkel et al., 2003; Spiess et al., 2003). The significance of the majority of antisense transcripts is currently unknown, though several biological functions, including transcriptional and translational regulation, have been proposed (Blin-Wakkach et al., 2001; Kiyosawa and Abe, 2002; Ogawa and Lee, 2002; Shendure and Church, 2002; Imamura et al., 2004; Lavorgna et al., 2004; Mattick, 2004; Shibata and Wutz, 2008). Among the proposed functions is antisense transcript-mediated transcriptional silencing through epigenetic modifications at the chromatin level (Weinberg et al., 2006; Morris et al., 2008; Pandey et al., 2008). Antisense transcripts have been suggested to play important roles in the regulation of monoallelic expression in X chromosome inactivation and genomic imprinting (Kiyosawa and Abe, 2002; Ogawa and Lee, 2002; Brown and Chow, 2003; Ogawa et al., 2008). Antisense transcripts derived from pseudogenes suggest a novel function for these otherwise functionless genes in the genome. Aside from their potential protein-coding capability, the antisense transcripts may serve to regulate translation of the sense mRNA or trigger RNAi by forming an RNA duplex with their own sense transcript or with the sense transcript of the functional gene, as illustrated in Figure 5. Identification of antisense transcripts derived from pseudogenes embedded in the introns of anti-parallel genes also suggests a novel function for introns. These introns may serve as messengers mediating the interaction between the anti-parallel gene pair.
Antisense transcription may play an important role in regulating gene expression during spermatogenesis. It is tempting to speculate that this rigorous control of gene expression is necessary to ensure the accuracy, or to facilitate the functioning, of the biological processes occurring during spermatogenesis, such as genome-wide methylation–demethylation, genomic imprinting, and monoallelic gene expression. The complex nature of the antisense transcripts suggests that they are well suited to provide an additional layer of rigorous regulation of gene expression during male germ cell development.
Past studies largely focused on protein-coding genes, which account for only around 5 to 10% of the human genome (Waterston et al., 2002; Lunter et al., 2006). However, recent large-scale transcriptome studies by whole genome tiling arrays (Johnson et al., 2005; Willingham and Gingeras, 2006), massively parallel signature sequencing (MPSS) (Jin et al., 2008), cap analysis of gene expression (CAGE) on the 5′ end of transcripts (Yasuda and Hayashizaki, 2008), SAGE on the 3′ end of polyadenylated transcripts (Chan et al., 2006a, b), and high-throughput cDNA sequencing by the FANTOM project (Bono et al., 2002) show that the number of expressed sequences far exceeds the number of genes annotated from genomic sequences. A large number of transcripts do not appear to encode proteins. It is increasingly recognized that, in addition to protein-encoding mRNA, genome transcription generates a large number of non-coding RNAs (ncRNAs) (Washietl et al., 2005). Years ago the non-coding sequences in the genome were referred to as “junk”, and RNAs that do not encode protein were often considered to result from “leakage” of the transcription machinery. However, present research has highlighted that ncRNAs can have a wide range of functions and can be divided into different classes by size or function (see Fig. 6). These include relatively well-known long members such as transfer RNA (tRNA) and ribosomal RNA (rRNA). The small members of ncRNAs are relatively recent discoveries. The family of ncRNAs has grown substantially and has witnessed a spectacular expansion in their roles in an array of biological processes including developmental timing, metabolism, cell cycle progression, gene silencing, and programmed cell death. Recent findings also suggest that some ncRNAs participate in epigenetic regulation such as genomic imprinting, DNA methylation, and histone modification (Rinn et al., 2007; Wutz and Gribnau, 2007; Royo and Cavaille, 2008).
The number of species and functions of ncRNA keeps expanding and therefore deciphering the roles of these non-coding RNA genes has emerged as one of the hottest topics in molecular biology today. This section will focus on the biological importance of different ncRNA species in male germ cell development.
MicroRNAs (miRNAs) are small ncRNAs. They are 21 to 23 nt in length, and are generated by a multi-step process in which Dicer, the RNaseIII-containing enzyme, processes miRNA precursors into mature miRNAs (Carmell and Hannon, 2004; Kim, 2005). The mature miRNA is then incorporated into the effector RNA-induced silencing complex (RISC), which contains Argonaute (Ago) proteins (Liu et al., 2004). Depending upon the degree of complementarity with their RNA target, miRNAs silence gene expression either by repressing translation or by triggering mRNA degradation in RNA interference (RNAi) (Bartel, 2004). Although over 420 miRNAs have been identified in mouse so far, their biological roles are still poorly understood. Despite their relatively recent discovery, it is already clear that miRNAs play important roles in male germ cell development. It has been shown that the miR-17–92 and miR-290–295 clusters are highly expressed throughout development of primordial germ cells (PGCs)/spermatogonia and are required for cell cycle progression in PGCs (Hayashi et al., 2008). Conditional germ cell-specific Dicer-knockout mice revealed that PGCs and spermatogonia exhibit poor proliferation, with suppressed retrotransposon activity in PGCs but not in spermatogonia. Spermatogenesis was retarded at an early stage of proliferation and differentiation (Mallardo et al., 2008). Dicer is not only important at the spermatogonial stem cell stage but also appears to be expressed at all stages of the seminiferous epithelial cycle. It interacts with a germ cell-specific chromatoid body component, the RNA helicase MVH (Kotaja et al., 2006). Deletion of Dicer1 in PGCs results in similar observations; the mutant animals are subfertile and exhibit spermatogenic defects (Maatouk et al., 2008). Though not directly demonstrated, it is tempting to speculate that these effects of Dicer ablation are a consequence of the disruption of miRNA processing.
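The dependence of the silencing mode on complementarity can be made concrete with a small sketch. The code below is illustrative only: it scores ungapped Watson-Crick pairing between a miRNA and a candidate target site and ignores G:U wobble pairs, bulges, and seed-region weighting. The function names and the `cutoff` value are assumptions introduced here, not values taken from the cited studies.

```python
# Watson-Crick pairs only; G:U wobble pairs are ignored in this simplified sketch.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(rna.upper()))


def complementarity(mirna: str, target_site: str) -> float:
    """Fraction of miRNA positions that pair with the target site.

    Both sequences are written 5'->3', so the miRNA is compared against
    the reversed site to model antiparallel pairing.
    """
    mirna = mirna.upper().replace("T", "U")
    site = target_site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("this sketch assumes an ungapped site of equal length")
    paired = sum(PAIRS.get(m) == s for m, s in zip(mirna, reversed(site)))
    return paired / len(mirna)


def predicted_mode(mirna: str, target_site: str, cutoff: float = 0.95) -> str:
    """Guess the silencing mode; `cutoff` is an arbitrary illustrative threshold."""
    if complementarity(mirna, target_site) >= cutoff:
        return "mRNA degradation"
    return "translational repression"


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # arbitrary 22-nt example sequence
    perfect_site = reverse_complement(mirna)
    print(predicted_mode(mirna, perfect_site))
```

Real target prediction relies on more than overall pairing, so a score of this kind is best read as a way to visualize the principle rather than as a predictor.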
Occurrence of testis- or germ cell-specific mRNAs suggests potential presence of miRNA preferentially or specifically expressed in these cells. In fact, miRNA expression profiling assays revealed that 60% of the testis-expressed miRNAs are ubiquitously expressed and the remaining are either preferentially (35%) or exclusively (5%) expressed in the testis (Ro et al., 2007). Further evidence of the importance of miRNA in germ cell development is provided by the observation of aberrant expression of miRNA in germ cell tumor (GCT) (Gillis et al., 2007; Voorhoeve et al., 2007).
Piwi-interacting RNAs (piRNAs) are small ncRNAs (26–31 nt in length, depending on subfamily members) that were identified independently in testes by several laboratories (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006; Lau et al., 2006; Watanabe et al., 2006). piRNAs were cloned after co-purification of small RNAs with spermatogenesis-specific Argonaute protein family members (including Mili, Miwi and Miwi2) and show a distinctive localization pattern in the genome. They constitute the biggest small RNA family; 50,000 species have been identified and the total population is estimated to be around 200,000 (Betel et al., 2007). piRNAs associate with either Mili, which is found in earlier germ cells (spermatogonia to pachytene spermatocytes), or Miwi, which is present in more mature germ cells after the pachytene spermatocyte stage (Kuramochi-Miyagawa et al., 2001; Deng and Lin, 2002). Mili-associated piRNAs typically have 26–28 nucleotides while Miwi-associated piRNAs are larger, usually containing 29 to 31 nucleotides (Aravin et al., 2006). Recent studies suggested that piRNAs play a role in a number of biological processes including transcriptional suppression pathways (Aravin et al., 2007; Brennecke et al., 2007; Gunawardane et al., 2007; Houwing et al., 2007), histone modification (Yin and Lin, 2007), and silencing of repressor through an RNAi-mediated mechanism (Pal-Bhadra et al., 2002; Pal-Bhadra et al., 2004; Pelisson et al., 2007).
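The size ranges quoted for small ncRNAs (21 to 23 nt for miRNAs, 26–28 nt for Mili-associated piRNAs, and 29 to 31 nt for Miwi-associated piRNAs) lend themselves to a simple length profile of a small RNA library. The following is a minimal sketch under the assumption that reads are available as plain sequence strings; the labels and function names are hypothetical. Length alone does not establish the identity of a small RNA, so the output describes size classes only.

```python
from collections import Counter

# Size windows taken from the ranges quoted in the text; labels are illustrative.
SIZE_CLASSES = [
    (21, 23, "miRNA-sized"),
    (26, 28, "Mili-associated piRNA-sized"),
    (29, 31, "Miwi-associated piRNA-sized"),
]


def size_class(read: str) -> str:
    """Assign a read to a size class based on its length only."""
    n = len(read)
    for low, high, label in SIZE_CLASSES:
        if low <= n <= high:
            return label
    return "other"


def size_profile(reads) -> Counter:
    """Count how many reads fall into each size class."""
    return Counter(size_class(read) for read in reads)


if __name__ == "__main__":
    # Synthetic reads of fixed lengths, used only to exercise the functions.
    toy_reads = ["A" * 22, "A" * 27, "A" * 27, "A" * 30, "A" * 40]
    for label, count in size_profile(toy_reads).items():
        print(label, count)
```

A profile of this kind is typically a first look at a library; assigning reads to a class with confidence would also require genome mapping and annotation.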
Recent studies revealed the presence of long ncRNAs (>200 nt in length) (Kampa, 2004; Carninci et al., 2005). The expression of long ncRNA is prevalent throughout the mammalian transcriptome and the configuration is comparable to the protein coding RNAs (Carninci et al., 2005; Engström et al., 2006). The regulatory element, instead of the nucleotide sequence of the long ncRNA, is often conserved (Ponjavic et al., 2007). The lack of sequence conservation was interpreted by some to suggest they may simply be “transcription noise”. However, some long ncRNAs such as Air and Xist, even though poorly conserved (Nesterova et al., 2001), have well characterized functions (Brockdorff et al., 1992; Sleutels et al., 2002). In fact, long ncRNAs have been found to participate in a variety of cellular processes, including transcriptional regulations in normal and abnormal cellular development (Feng et al., 2006; Goodrich and Kugel, 2006; Pennacchio et al., 2006; Calin et al., 2007; Visel et al., 2008), post-transcriptional regulation including splicing (Beltran et al., 2008), translation (Wang et al., 2005), and siRNA processing (Wang et al., 2005; Golden et al., 2008) and epigenetic regulations (Sanchez-Elsner et al., 2006; Rinn et al., 2007). Examples of long ncRNA with developmental roles include HOTAIR (Rinn et al., 2007), Xist (Duret et al., 2006), PINC, Evf2 (Feng et al., 2006). Recent array studies also identified novel long ncRNAs actively involved in embryonic stem cell pluripotency and differentiation (Dinger et al., 2008), brain development (Mercer et al., 2008) and male germ cell cancer (Perez et al., 2008). Given the extensive functional coverage of long ncRNAs in cellular regulation and development, it is highly likely that long ncRNAs may also play a critical role in male germ cell development and provide insights on cellular transitions and maintenance in each key germ cell stage. 
One of the stumbling blocks in long ncRNA research, however, is that the biological functions of most long ncRNAs remain unknown. Unlike protein-coding RNAs, ncRNAs comprise a heterogeneous group of genes that fulfill diverse biological roles. Conventional prediction methods based on primary sequence or structural features cannot be applied. Given the vast number of ncRNAs, novel high-throughput screening approaches are needed to elucidate their biological functions.
Male germ cells enter meiosis after mitotic expansion. Shortly after the zygotene-pachytene transition, the X and Y chromosomes are condensed and compartmentalized into a cytological structure called “sex body” or “XY body” (McKee and Handel, 1993). Meiotic sex chromosome inactivation (MSCI) describes the process of transcriptional silencing of the sex chromosomes that occurs during the meiotic phase (pachytene stage) of spermatogenesis as a result of the condensation of sex chromosomes (Turner, 2007). Originally thought to be limited to male meiosis only, the repression of sex-linked gene transcription is now known to extend to the post-meiotic phase. As males carry only one copy of X chromosome, it is conceivable that the loss of X-linked gene activities would jeopardize the survival of half of the species that possess differentiated sex chromosomes. To maintain the availability of sex-linked gene activities throughout and beyond male meiosis, two major mechanisms are believed to have evolved to overcome the transcriptional silencing event. The most acknowledged strategy is the generation of autosomal copies of X-linked genes (i.e., X-derived autosomal retrogenes) by retroposition which is believed to be important to the generation of functional gene duplicates in new genomic positions (Long et al., 2003). The second strategy, which was illustrated recently, is to increase X-linked gene copy numbers by gene amplification (Mueller et al., 2008). In this section, we will focus our discussion on the involvement of retrogenes in male germ cell development.
During retroposition, a mature transcript of the progenitor gene is reverse-transcribed into cDNA and integrated into the genome. Most retroposition events generate nonfunctional retroposed gene copies (processed pseudogenes) because they lack promoter elements for transcription and may have accumulated degenerative mutations (e.g., introduction of stop codons and frameshift mutations). Some of them, however, are transcribed and retain protein-coding potential and are thus named retrogenes. Several genome-wide searches of retrogene movement among chromosomes have been published recently (Emerson et al., 2004; Vinckenbosch et al., 2006; Shiao et al., 2007; Potrzebowski et al., 2008). These studies predicted that at least 100 progenitor-retrogene pairs, for which the retrogenes do not share chromosome linkage with the progenitors, exist in the genome of human, mouse, dog or opossum (Emerson et al., 2004; Potrzebowski et al., 2008). Further analyses of the chromosomal locations revealed a large excess of retrogenes originating from the X chromosome compared with autosomes. This excess of X-derived copies is not observed for processed pseudogenes, suggesting that functional retrogenes showing X-to-autosome movement are preserved by natural selection, which is believed to be driven by MSCI. Interestingly, the majority of X-derived retrogenes are specifically or predominantly expressed in the testis (particularly in meiotic and post-meiotic male germ cells), and they display functional roles in spermatogenesis after leaving the X chromosome when compared to autosome-derived retrogenes (Potrzebowski et al., 2008). In contrast, their X-linked progenitor genes demonstrate broad expression patterns in different somatic tissues but are down-regulated starting from male meiosis.
Such reciprocal spermatogenic expression pattern is coherent with the compensation hypothesis (McCarrey and Thomas, 1987) that autosomal retrogenes are evolved to compensate for the loss of their X-linked progenitors during male meiosis. Since autosomes would not be subjected to MSCI effect, genes encoding products essential to male meiotic and post-meiotic activities would be fixed on autosomes on evolutionary scale. In this regard, any loss-of-function mutation to these retrogenes would be expected to incur spermatogenic defects, a phenomenon that has been demonstrated in several rodent studies (Bradley et al., 2004; Miki et al., 2004; Rohozinski and Bishop, 2004; Rohozinski et al., 2006).
The prevalence of X-derived autosomal retrogene expression in the testis leads to the interesting question of how their tissue-specificity is regulated. The majority of retroposition events are known to occur in the male germline (Khil et al., 2005). It has been predicted that most retrogenes, especially younger ones, would initially be transcribed in the testis because of the facilitated transcription in this tissue (Vinckenbosch et al., 2006; Shiao et al., 2007) and thus they would evolve a functional role in the testis first. Despite these observations, little is known about the mechanism of X-derived autosomal retrogene transcription and how exactly the testis-specificity of these genes is accomplished. In humans, a significant excess of retrogenes are found to locate close to other genes or within introns, suggesting retrogenes can exploit the open chromatin of neighboring genes to drive their transcription. In this regard, genomic regions surrounding retrogenes are found to be transcriptionally more active than those surrounding transcriptionally silent retrocopies (Vinckenbosch et al., 2006). Retrogenes can also achieve competence in transcription by “hitchhiking” the regulatory elements of host genes (e.g., by promoter element acquisition or chimeric transcript formation) or acquiring untranslated exons de novo (Vinckenbosch et al., 2006; Shiao et al., 2007).
At the molecular level, there are examples that consistently point to the involvement of epigenetic factors, notably DNA methylation, in the regulation of tissue specificity of X-derived autosomal retrogenes. Hypermethylation of CpG dinucleotides has been observed in retrogenes such as Pdha2 (Iannello et al., 1997), Pgk2 (Geyer et al., 2004) and Ard1b (Pang et al., in press) in somatic cells/tissues that do not show endogenous expression of the respective genes. In contrast, these genes are exclusively or predominantly expressed in the testis, where the extent of DNA hypomethylation correlates with the upregulation of gene expression (Pang et al., in press). It is not yet certain if DNA methylation is commonly employed to regulate X-derived autosomal retrogene transcription in the testis. A detailed examination of the genomic landscapes of all X-derived autosomal retrogenes may thus provide important clues on the involvement of epigenetic factors, including differentially methylated regions (Yagi et al., 2008) and histone modifications (Berger, 2007), in the regulation of their expression.
Systems biology examines the networks behind complex functionality. This requires a shift toward quantitative biology. The process of knowledge generation in systems biology involves several key steps as illustrated in Figure 7. Experimental paradigm in systems biology involves integration of experimental and various supporting evidence from the literature. This is an iterative, integrated process of data mining and collection. To facilitate data exchange and processing, the information has to be in digital format and able to be processed by the computer using different computational algorithms (in silico). Data mining and integration agglomerate sufficient detail for the generation of mathematical model, which allows the generation of various hypotheses about the characteristics and behavior of the system to be derived. These hypotheses can then be cross-validated in silico or in in vitro or in vivo experiments. The ultimate goal is to develop global representations of the biological system of the cell and the entire organism.
The first step of systems biology research is data capture. Genomic studies survey large sets of genes or their products, in the form of RNA expression (transcriptome), protein expression (proteome), and protein–protein or protein–DNA interactions (interactome), using high-throughput genomic assays. This has spawned new disciplines, including various “omics” approaches to capture expression information. Methods such as microarray and SAGE discussed in the earlier sections are among the popular tools to survey the expression from the genomes. The generation of various datasets from omics approaches is the prerequisite for getting to the basic level of the biological system - the data level. Most of the biological information gathered in past decades is qualitative and descriptive. Because of the complexity of biology and the various types of information involved, creating a consolidated database is the essential step for the subsequent model generation. Development of high-throughput technologies in recent years has facilitated exponential growth in the size and detail of various databases, which often reach petabyte scale (Cochrane et al., 2008). Specialized databases have been developed. The knowledge (data) bases can be generally classified into primary databases, theme-based databases, and pathway and interaction databases. Some of the frequently used databases in systems biology are shown in Table 2.
Primary databases are usually the “hub” of basic biological information. These include the microarray databases Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) and ArrayExpress at the European Bioinformatics Institute (EMBL-EBI), the SAGE database (SAGEmap) (Lash et al., 2000), the microRNA database (miRBase) (Griffiths-Jones et al., 2008), the mammalian non-protein-coding RNA database (RNAdb) (Pang et al., 2007), etc. They serve as repositories for expression datasets from high-throughput genomic assays.
Other than sequence context, functional context-based databases are evolved with the advent of omics data. A microarray experiment may result in hundreds of differentially expressed genes that are subject to interpretation and further analysis. As analyzing these lists gene-by-gene is tedious and error prone, the concept of Gene Ontology (GO) has emerged. The main objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products across different species. The GO terms can then be associated to various datasets and provide insights in terms of unified biological vocabularies.
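One common way to associate GO terms with a list of differentially expressed genes is an over-representation test. The sketch below uses a one-sided hypergeometric test, which is an assumption chosen here for illustration and is not a method described in this article. The gene names, the term label, and the function names are all hypothetical, and a real analysis would also correct for multiple testing across terms.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def go_enrichment(gene_list, annotations):
    """Return {term: p-value} for terms carried by genes in gene_list.

    annotations maps every background gene to its set of GO terms.
    Genes absent from annotations are ignored.
    """
    background = set(annotations)
    study = set(gene_list) & background
    N, n = len(background), len(study)
    terms = set().union(*(annotations[g] for g in study)) if study else set()
    results = {}
    for term in terms:
        K = sum(term in annotations[g] for g in background)
        k = sum(term in annotations[g] for g in study)
        results[term] = hypergeom_pvalue(k, n, K, N)
    return results


if __name__ == "__main__":
    # Toy background of 10 hypothetical genes; 3 carry a toy "meiosis" term.
    toy = {f"gene{i}": set() for i in range(10)}
    for g in ("gene0", "gene1", "gene2"):
        toy[g] = {"toy_term_meiosis"}
    print(go_enrichment(["gene0", "gene1", "gene2"], toy))
```

In the toy example all three study genes carry the term, so the p-value equals the probability of drawing exactly those three genes from the ten, which is 1/120.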
While most of the databases mentioned cover a wide variety of information in different organisms, some databases specialize in a certain biological phenomenon or species. GermSAGE is a comprehensive transcriptome database for mouse male germ cells derived from SAGE experiments (Lee et al., 2009). A total of 452,095 tags derived from type A spermatogonia, pachytene spermatocytes and round spermatids were included. It allows browsing, comparing and searching male germ cell transcriptome data at different stages with customizable search parameters. The data can be visualized in a tabulated format or further analyzed by aligning with various annotations available in the UCSC genome browser. This flexible platform is useful for gaining a better understanding of the genetic networks that regulate spermatogonial cell renewal and differentiation, and will allow novel gene discovery. Another germ cell database, GermOnline, mainly stores microarray data related to the study of mitosis, meiosis, germline development and gametogenesis across species (Gattiker et al., 2007).
To extract information from the datasets, bioinformatics methods are used to identify trends and correlations among different genes from the perspective of expression level, sequence context or other intrinsic biological features, including potential protein domains and cis- or trans-regulatory elements. Through the integration of informatics and results from different omics studies at different levels, data on individual components or functions are amassed.
Moving from knowledge of individual components to the functional relationships between them is the key to understanding an entire biological system. It is necessary to collect information on the interactions between the components in the system at different levels, and on the dynamics of those interactions, to obtain insights into the properties of complex systems such as regulation, control and adaptation. This integration step is facilitated by the availability of pathway and interaction databases.
The explosion in the amount of biological data and information in recent years has made the construction of complex network of biological associations possible. Examples of public resources for finding associations among gene sets and known biological pathways include BioCarta which is useful for charting pathways, BIND which is a molecular interaction database, Reactome which provides information on core human pathways, etc. Some of the pathway and interaction databases are listed in Table 2.
With sufficient biological knowledge, data can be assimilated for the next step - modeling and hypothesis generation. A model is the attempt to create an abstract representation from different knowledge or data resources, such as experimental observations on the structure and function of a particular biological element. Decoding the complexity of male germ cell development by SAGE data represents such an approach (Lee et al., 2006). Instead of resorting to descriptive analysis, the dynamics contained in the dataset was retrieved by applying clustering algorithm similar to microarray studies. The clustered gene sets demonstrating preferential expression for the three germ cell stages were exported for gene network analysis. Based on the unique, curated database of protein-protein and protein-DNA interactions, transcriptional factors, signaling, metabolism and bioactive molecules, a network specific for each germ cell stage was created, which allowed data visualization on a list of relevant biological networks (Fig. 8A–C). The figures not only link the gene members in the clusters, but also provide the functionality to identify co-regulated genes ranked by statistics and scores. This provides a foundation for the generation of novel biological hypotheses for studying spermatogenesis.
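The step of grouping genes by preferential expression across the three germ cell stages can be illustrated with a deliberately simple stand-in. The sketch below does not reproduce the clustering algorithm used in the cited study; instead it normalizes raw SAGE tag counts to tags per million and assigns a tag to a stage when its normalized level exceeds that of the next highest stage by a chosen fold. The stage keys, the `fold` and `pseudo` parameters, and the function names are assumptions made for illustration.

```python
STAGES = ("spermatogonia", "spermatocytes", "spermatids")


def tags_per_million(counts, library_sizes):
    """Scale raw tag counts by library size so stages are comparable."""
    return {s: counts[s] * 1_000_000 / library_sizes[s] for s in STAGES}


def preferential_stage(counts, library_sizes, fold=2.0, pseudo=1.0):
    """Return the stage with preferential expression, or None.

    A pseudocount avoids division by zero; `fold` is an arbitrary
    illustrative threshold, not a value from the cited study.
    """
    tpm = tags_per_million(counts, library_sizes)
    ranked = sorted(STAGES, key=lambda s: tpm[s], reverse=True)
    top, second = ranked[0], ranked[1]
    if (tpm[top] + pseudo) / (tpm[second] + pseudo) >= fold:
        return top
    return None


def group_by_stage(tag_counts, library_sizes, fold=2.0):
    """Collect tags into per-stage lists; unassigned tags are skipped."""
    groups = {s: [] for s in STAGES}
    for tag, counts in tag_counts.items():
        stage = preferential_stage(counts, library_sizes, fold)
        if stage is not None:
            groups[stage].append(tag)
    return groups


if __name__ == "__main__":
    # Synthetic library sizes and counts, used only to exercise the functions.
    sizes = {s: 100_000 for s in STAGES}
    toy_counts = {
        "tagX": {"spermatogonia": 50, "spermatocytes": 5, "spermatids": 5},
        "tagY": {"spermatogonia": 10, "spermatocytes": 10, "spermatids": 10},
    }
    print(group_by_stage(toy_counts, sizes))
```

Per-stage lists produced this way could then be passed to a network or pathway tool, which is the role the stage-specific clusters play in the analysis described above.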
Spermatogenesis is essential for the perpetuation of the human race. Germ cells are the only cells in our body that undergo mitosis and meiosis. It is not surprising that germ cells express the largest number of genes. It is also not surprising that genetic processes that occur in different somatic cells are also operative in germ cells. The advances brought forth by the Human Genome Project give us tools to examine the complex genomic and genetic landscape of the germ cells. The discoveries of the biological activities of alternative splice variants, anti-sense transcripts, pseudogenes, retrogenes, miRNA, piRNA, long ncRNAs, etc., in male germ cells represent just the tip of the iceberg. It is almost certain that application of systems biology approaches to study male germ cells will bring more surprises in the foreseeable future.
Grant sponsor: NIH (Intramural Research Program), Eunice Kennedy Shriver National Institute of Child Health and Human Development.
†This article is a US Government work and, as such, is in the public domain in the United States of America.
Tin-Lap Lee, Section on Developmental Genomics, Laboratory of Clinical Genomics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.
Alan Lap-Yin Pang, Section on Developmental Genomics, Laboratory of Clinical Genomics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.
Owen M. Rennert, Section on Developmental Genomics, Laboratory of Clinical Genomics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.
Wai-Yee Chan, Section on Developmental Genomics, Laboratory of Clinical Genomics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, Department of Pediatrics, Georgetown University College of Medicine, Washington, DC.