The use of two different combination functions to investigate the network supports an investigative methodology in which hypotheses are generated through systematic network exploration. The top 1000 edges as scored by either function generate a network composed of 945 genes and 1,743 total edges. This collection of high-scoring edges is organized as 92 pairs, 15 triplets, seven small clusters (<10 nodes), one large ‘yarnball’ (551 nodes), and three medium-sized clusters (comprising 27 to 51 nodes) (). One of the medium-sized sub-networks (45 nodes and 107 edges in total; circled in ) is analyzed in detail here, illustrating a typical use of the Hanalyzer.
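The edge-selection and clustering step described above can be sketched in a few lines. This is a minimal illustration, not the Hanalyzer's own code: it assumes each combination function's output is available as a dictionary mapping an undirected gene pair (stored as a sorted tuple, so that `(a, b)` and `(b, a)` are the same key) to a score. The names `summarize`, `size_class` and the `large_min` cut-off of 100 nodes are hypothetical; the text only gives the observed sizes (<10 nodes for small clusters, 27 to 51 for medium ones, 551 for the yarnball).

```python
from collections import defaultdict


def top_edges(scores, k):
    """Return the k highest-scoring edges from a {(gene_a, gene_b): score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def connected_components(edges):
    """Connected components of an undirected graph given as (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def size_class(n_nodes, large_min=100):
    """Bucket a component by node count; large_min is an assumed cut-off."""
    if n_nodes == 2:
        return "pair"
    if n_nodes == 3:
        return "triplet"
    if n_nodes < 10:
        return "small"
    return "medium" if n_nodes < large_min else "large"


def summarize(average_scores, logit_scores, k=1000):
    """Union the top-k edges of each metric and describe the resulting network."""
    edges = top_edges(average_scores, k) | top_edges(logit_scores, k)
    components = connected_components(edges)
    classes = defaultdict(int)
    for component in components:
        classes[size_class(len(component))] += 1
    n_genes = sum(len(c) for c in components)
    return len(edges), n_genes, dict(classes)


# Toy scores (hypothetical): with k=2 the union holds 3 edges over 5 genes,
# forming one triplet (A, B, C) and one pair (D, E).
average = {("A", "B"): 0.9, ("B", "C"): 0.8, ("D", "E"): 0.1}
logit = {("D", "E"): 0.95, ("A", "B"): 0.5, ("F", "G"): 0.2}
print(summarize(average, logit, k=2))
```

Because the network is the union of two top-k lists, its edge count falls between k and 2k; in the text, 1,743 edges from two lists of 1000 implies 257 edges ranked highly by both functions.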
Sub-network explanation guided by the Average combination network
That sub-network contains 50 edges from the Average combination graph, involving 20 nodes (); 15 edges are asserted solely by the Average metric and 35 by both the Average and Logit measures. Browsing the annotations associated with these 20 genes and their protein products quickly revealed that the theme common to this sub-network is muscle (Table S1). Nineteen of the 20 nodes have at least one reference to ‘muscle’ within their annotations or description; the most informative descriptive terms are the GO Biological Process terms “muscle contraction” GO:0006936 (and its children, including “regulation of muscle contraction” GO:0006937) and “muscle development” GO:0007517, which together annotate 15 of the 20 nodes. Notably, most of the nodes (13 of 20) in this network belong to one of three well-characterized muscle protein families (actin, myosin and troponin), suggesting that this network is involved in force generation and the structural integrity of muscle.
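Counting how many nodes are covered by a GO term "and children", as in the paragraph above, means treating an annotation to any descendant term as an annotation to the parent. The sketch below assumes a hand-written mapping from each term to its parent terms and uses hypothetical gene names (`gene_1` and so on) and a placeholder term `unrelated_term`; the single link from GO:0006937 to GO:0006936 simply mirrors the parent/child relationship stated in the text and is not a full copy of the ontology.

```python
def ancestors(term, parents):
    """All ancestors of `term`, following the term -> parent-terms mapping."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:  # guards against revisiting shared ancestors
                found.add(parent)
                stack.append(parent)
    return found


def genes_covered(annotations, targets, parents):
    """Genes annotated with any target term or with any descendant of one."""
    covered = set()
    for gene, terms in annotations.items():
        if any(t in targets or ancestors(t, parents) & targets for t in terms):
            covered.add(gene)
    return covered


# Hypothetical toy data: "regulation of muscle contraction" is treated as a
# child of "muscle contraction", as described in the text.
parents = {"GO:0006937": ["GO:0006936"]}
annotations = {
    "gene_1": {"GO:0006937"},
    "gene_2": {"GO:0007517"},
    "gene_3": {"unrelated_term"},
}
targets = {"GO:0006936", "GO:0007517"}
print(sorted(genes_covered(annotations, targets, parents)))
```

Propagating annotations up the hierarchy in this way is what allows a single broad term such as "muscle contraction" to summarize genes that were annotated only with more specific descendants.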
The single apparent exception to this muscle theme was Thbs4 (Thrombospondin 4, MGI:1101779). A direct PubMed search identified a role for Thbs4 (also known as TSP-4) in muscle formation. Thbs4 is secreted by developing tendon mesenchyme cells and is part of a local signaling process, involving the protein ankyrin repeat domain 1 (Ankrd1; MGI:1097717), that couples tendon morphogenesis to muscle formation [86] (note that Ankrd1 was called “muscle ankyrin repeat protein”, or marp, in that paper). Thbs4 is expressed at high levels, in a pattern complementary to that of Ankrd1, from myogenesis through late embryogenesis, and its expression is still observed postnatally [86].
This network is intriguing both for its strong muscle theme and because the expression profile of its nodes is strikingly specific to the mandible (). The expression of this group of 20 genes is consistently and exclusively up-regulated in the mandibular sample as development progresses from E10.5–12.5. The literature indicates that this expression profile is consistent with tongue muscle development: the tongue is the largest single muscle mass in the head and is located within the mandible. At approximately E11, the migration of myogenic cells from the occipital somites into the tongue primordia is considered complete, and myoblasts continue to proliferate and differentiate until around E15, when they fuse and withdraw from the cell cycle [87]. Desmin (Des, MGI:94885) mRNA is detected as early as E10, consistent with its role as a marker of early steps in skeletal myogenesis such as myoblast determination [88]. In addition, Thbs4 has been shown to promote myogenic differentiation specifically in the tongue, which, lacking cartilage, links its muscle groups through a tendinous scaffold [86].
This same group of genes is also up-regulated at the later E12–12.5 time point in the maxilla sample, consistent with the later onset of differentiation in all other skeletal muscles relative to the tongue. Skeletal muscle development is staggered, with the tongue maturing approximately 1.5 days earlier (in mice) than all other skeletal muscles. The more advanced stage of tongue muscle development at birth is thought to reflect the mammalian requirement for suckling immediately after birth [88]. The lack of significant muscle in the frontonasal prominence accounts for the low expression of these genes in that tissue. The systematically reported and easily explored collection of relevant background knowledge made interpreting this complex body of evidence about the developmental function of a large group of interacting genes much more straightforward than it would have been with any other approach with which we are familiar.
Hypothesis generation guided by the Logit combination network
Once the well-understood aspects of the sub-network had been explored and a biological explanation for the observations constructed, the analyst added the edges asserted only by the Logit metric to the visualization of the sub-network. Including the Logit-asserted edges introduced an additional 25 nodes (45 in total) and expanded the network to 107 edges (). These 107 edges consist of 48 Logit-only edges, 18 Average-only edges (including three additional Average edges that link into the network via nodes introduced by the Logit edges) and 41 edges asserted by both the Logit and Average metrics. The nodes of this larger network display the same striking mandible-specific expression pattern as the Average combination network, suggesting that these additional nodes may also be implicated in tongue development ().
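The provenance bookkeeping in the paragraph above (Logit-only, Average-only, or both, plus the nodes that appear only once Logit edges are shown) can be expressed as simple set operations. This is a hedged sketch: it assumes the top-scoring edges of each metric are available as sets of sorted gene-pair tuples, and that the analyst's initial Average-only view is given as a set of node names. The function names and the toy genes `a` to `e` are hypothetical.

```python
from collections import Counter


def asserted_by(edge, average_edges, logit_edges):
    """Label an edge by which combination metric placed it in the top set."""
    in_average, in_logit = edge in average_edges, edge in logit_edges
    if in_average and in_logit:
        return "both"
    if in_average:
        return "average_only"
    if in_logit:
        return "logit_only"
    raise ValueError(f"{edge} was not asserted by either metric")


def expansion_report(component_edges, average_edges, logit_edges, initial_nodes):
    """Tally edge provenance and list nodes absent from the initial view."""
    provenance = Counter(
        asserted_by(e, average_edges, logit_edges) for e in component_edges
    )
    all_nodes = {node for edge in component_edges for node in edge}
    return provenance, all_nodes - set(initial_nodes)


# Hypothetical toy sub-network: ("d", "e") is an Average edge that only joins
# the network through node "d", which a Logit-only edge introduces.
average = {("a", "b"), ("b", "c"), ("d", "e")}
logit = {("a", "b"), ("c", "d")}
provenance, new_nodes = expansion_report(average | logit, average, logit, {"a", "b", "c"})
print(provenance, sorted(new_nodes))
```

The toy `("d", "e")` edge mirrors the situation described in the text, where three Average edges become visible only after Logit edges connect their endpoints to the sub-network.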
Although nine of these additional nodes expand the core cluster described above, the majority of the new nodes form two new clusters tethered to the initial group by one to four edges. Browsing the collated annotations associated with these additional nodes gave rapid insight into common functional themes. These annotations indicated that the two additional clusters represent myogenic differentiation (eight nodes) and synapse interactions (six nodes) ( and Table S2). Within the synapse cluster, the most informative annotations are the KEGG pathway “Neuroactive ligand-receptor interaction” KEGG:mmu04080 and the GO Cellular Component term “postsynaptic membrane” GO:0045211, which together annotate all six members of this cluster. All eight nodes within the myogenic differentiation (transcription) cluster are, unsurprisingly, annotated with the GO Biological Process term “transcription” GO:0006350, and five of these nodes also have a documented muscle-related knockout phenotype. The specific genes and interactions in each of these three clusters are explored in turn, and several are selected for experimental validation.
The first cluster investigated we called the core cluster. Of the nine additional nodes contributing to this core (muscle structure) cluster, four (Cdh15, Nrk, Fndc5 and E430002G05Rik; MGI:106672, MGI:1351326, MGI:1917614 and MGI:2445082, respectively) lack any annotation from our reading experts suggesting a role in muscle or, more generally, in craniofacial development. Supplementary investigation of the literature and of publicly available expression data was required to establish the muscle association of these four genes.
In contrast to the other ‘unannotated’ nodes, Cdh15 (also known as M-Cadherin, M denoting muscle [89]) is a very well studied gene with many associated publications (23 references tied to its MGI record alone [accessed 4/23/2008]). Cdh15 has long been known to be expressed in myogenic cells and to have a role in skeletal muscle differentiation, as indicated by its low-level expression in skeletal myoblasts followed by increased expression in myotube-forming cells [89]. However, its precise role during muscle development and regeneration has yet to be determined, and a recently described Cdh15-null mouse has an apparently normal muscle phenotype, suggesting functional compensation by other cadherins [90].
The absence of annotations linking Cdh15 with muscle development, despite this literature, highlights the persistent problem of organism-specific gene name normalization. While Cdh15 is the only official gene symbol, there are two approved names for its protein product, Cadherin 15 and M-Cadherin (myotubule) [Data from HUGO, www.genenames.org Accessed 5/1/2008]; to confuse matters further, both names are used only in the human records for this gene (both GenBank [NM_004933] and Entrez Gene [ID: 1013] use “Homo sapiens cadherin 15, M-cadherin (myotubule) (CDH15), mRNA” as their definition).
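One common mitigation for the normalization problem described above is an alias index that maps every official symbol, approved name and synonym, case-insensitively, to a single official symbol. The sketch below is illustrative only and is not how the Hanalyzer's reading experts work; the alias list for Cdh15 is taken from the names given in the text, and the function names are hypothetical.

```python
def build_alias_index(records):
    """Map each case-folded symbol, name or synonym to its official symbol."""
    index = {}
    for official, aliases in records.items():
        for name in (official, *aliases):
            key = name.casefold()
            if key in index and index[key] != official:
                raise ValueError(f"ambiguous alias {name!r}")
            index[key] = official
    return index


def normalize(mention, index):
    """Return the official symbol for a text mention, or None if unknown."""
    return index.get(mention.strip().casefold())


# Aliases drawn from the names quoted in the text; real indexes would be
# built from resources such as HGNC or MGI synonym lists.
records = {"Cdh15": ["CDH15", "Cadherin 15", "M-cadherin"]}
index = build_alias_index(records)
print(normalize("M-Cadherin", index))
```

Case folding deliberately merges the human (CDH15) and mouse (Cdh15) symbol conventions, which is what is wanted here; the ambiguity check guards against the opposite failure, where one alias would silently point to two different genes.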
The literature indicates that the Ste20-type kinase NIK-related kinase (Nrk) is predominantly expressed in developing skeletal musculature from E10.5 through E17 of mouse embryogenesis; however, Nrk expression is not detected in any adult tissue, including skeletal muscle [91]. Limited RNA expression data obtained from GenePaint.org [92] also appear to show Nrk expression in the E14.5 tongue (GenePaint set ID: MH1818, section Embryo_C1818_1_4B).
In the developing embryo, the recently characterized fibronectin type III domain containing 5 gene (Fndc5, also known as PeP and Pxp; data from iHop [93]) is almost exclusively expressed in developing skeletal muscle [94]. Absent at E7, Fndc5 expression is first detected in whole embryos at E11, and at E13.5 it is specifically observed in the tongue and other skeletal muscles [94]. A role during myoblast differentiation is indicated by a two-fold increase in expression during the transition from myoblasts to myotubes, after which expression stabilizes and continues throughout adulthood [94].
Finally, investigation of the RIKEN clone E430002G05Rik yielded little informative annotation. A single GeneRIF in the associated Entrez Gene entry (GeneID: 210622) pointed to the only publication from which we obtained any information about this gene. That publication [95] identified mRNAs affected in the mdx mouse model of Duchenne muscular dystrophy (DMD). E430002G05Rik was identified as a down-regulated transcript in the mdx mouse and subsequently named RAMP (Regeneration-associated muscle protease homolog) [95]. RAMP was observed to be predominantly expressed in normal adult skeletal muscle and brain, and to be specifically up-regulated in regenerating skeletal muscle fibers after injury [95]. The absence of any annotation regarding development prompted us to select this gene for further experimental validation.
We called the second cluster explored the transcription factor cluster. Although Pitx3, Rxrg and Zim1 (MGI:1100498, MGI:98216 and MGI:1341879, respectively) are well annotated as transcription factors, the information provided by the reading experts on these genes did not suggest roles in muscle development (Table S2), prompting further investigation.
Pitx3 is well characterized and annotated with respect to its role in lens formation during eye development [96], [97]. However, a literature search revealed that tongue-specific expression of Pitx3 (also known as Ptx3) during development (first detected at E11.5) was documented over a decade ago [98], whereas its specific role in myogenesis and myoblast differentiation has been reported only more recently [99].
Although Rxrg (retinoid X receptor gamma) is known and annotated principally for its role in mediating the effects of retinoic acid, an extensive literature also associates it with myoblast differentiation. This association was not asserted by any of the reading experts, even though a PubMed search with the query “rxr muscle” returned 117 papers (accessed 4/25/2008), again suggesting difficulties in species-specific gene name normalization. As early as 1993, RXRs were identified as positive regulators of skeletal muscle development through direct interactions with Myogenin and MyoD promoter elements [100], [101], and the role of Rxrg in muscle continues to be explored, with the most recent associated publication identifying a role in lipogenesis and SREBP1c regulation in skeletal muscle [102]. A high-throughput study identifying transcription units involved in brain development [103] indirectly documented the tongue-specific expression profile of Rxrg in E13.5 mice (image MGI:3507450), with the same expression pattern persisting weakly in E14.5 mice (GenePaint.org set ID: C1279, section Embryo_C1279_6_3D).
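The PubMed hit count quoted above can be reproduced programmatically through NCBI's E-utilities ESearch service, which returns the number of matching records when asked for zero IDs. This is a minimal sketch assuming the documented ESearch parameters `db`, `term`, `retmode` and `retmax`; it only builds the request URL and parses the JSON reply, and the helper names are hypothetical. A live query today would return a different count from the 2008 figure.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(query):
    """Build an ESearch URL that asks PubMed only for the total hit count."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": 0}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def hit_count(response_text):
    """Extract the total hit count from an ESearch JSON response."""
    # ESearch reports the count as a string inside "esearchresult".
    return int(json.loads(response_text)["esearchresult"]["count"])


print(esearch_count_url("rxr muscle"))
```

To run the query, the URL can be fetched with `urllib.request.urlopen(...)` and the decoded body passed to `hit_count`; NCBI's usage guidelines (request rate limits, optional `tool` and `email` parameters) should be followed for repeated searches.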
Much less is known about the zinc-finger gene Zim1. In mouse, this gene is part of an imprinted cluster that includes Zim2 (MGI:1923887) and Peg3 (MGI:104748) [104], but no human Zim1 ortholog has been identified to date. It has therefore been proposed that Zim1 is a recent addition to the mouse genome, derived via a local duplication of Zim2. In mice, Zim1 is maternally imprinted and is expressed only during embryogenesis, notably in the limb bud, and it has therefore been suggested to have a role in limb development [105]. Limited and unannotated RNA expression information was available from additional studies in the mouse [103]; however, these did not address Zim1 expression in the developing face. We therefore selected Zim1 for experimental validation, as there was only limited knowledge of this gene and its function in mouse facial and muscle development.
Although Hoxa2 (MGI:96174) is well studied in craniofacial development, we also selected it for further analysis because its expression is not normally associated with branchial arch 1, which gives rise to the mandible. Indeed, Hoxa2 has a strong anterior limit of expression in the neural crest cells originating in rhombomere 4 that generate the mesenchyme of the second branchial arch. Moreover, the absence of Hox gene expression in more rostral tissues, including the first branchial arch, has been postulated to have enabled the evolution of the vertebrate head [106], [107], [108], [109], [110], [111], [112]. We therefore decided to explore this potential novel domain of Hoxa2 expression in more detail.
The third cluster explored we called the synapse cluster. All the nodes contributing to the synapse cluster are unambiguously implicated in neuromuscular signaling. However, two additional nodes (Ablim3 and Apobec2; MGI:2442582 and MGI:1343178, respectively) do not fit neatly into any cluster and instead appear to straddle the synapse interaction and muscle structure clusters. Ablim3 annotation includes both the GO Molecular Function term “actin binding” GO:0003779 and the KEGG pathway “Axon guidance” KEGG:mmu04360. By contrast, the annotation associated with Apobec2 strongly indicates a role in RNA editing and processing but gives no indication of a role in muscle (Table S2).
The Apobec2-associated literature revealed little consensus regarding its function. Apobec2 has been documented as an ancestral, cardiac and skeletal muscle-specific member of the Apobec family implicated in muscle regeneration [113]. It has also been described as a ubiquitously expressed protein with cytidine deaminase RNA editing activity [114]. Apobec2 knockout mice appear viable and fertile [113], but no examination of the tongue was reported. Apobec2 was selected for further biological investigation because of the sparse knowledge currently associated with it and its possible function in tongue muscle development.