Identification of a topology-independent, AR-dependent gene expression program
We had previously identified an AR-dependent gene expression signature by comparing normal male foreskin fibroblasts to those cultured from diverse sites in the genitals of patients with documented CAIS. Subsequent reports that cultured fibroblasts retain topographic transcriptional memory (gene expression signatures that reflect their site of biopsy) raised the concern that the AR-dependent gene expression signature we identified could have been affected by topological differences among the fibroblast samples used. To test for this potential confounder, we repeated microarray experiments on seven independent strains of normal male scrotal fibroblasts (S1, S4, S5, S8, S9, S11, and S12) and duplicate samples of four labia majora fibroblast strains derived from 46, XY individuals with CAIS due to proven inactivating mutations of the AR gene. Both fibroblast types were derived from identical anlagen: the labioscrotal swellings. The SAM procedure revealed 612 transcripts representing 440 unique genes that differed significantly in expression level between the groups at a false discovery rate of 0.038 (Figure ).
Figure 1 Transcripts with significant differences of expression levels between normal scrotum and CAIS labia majora. Transcript levels of the 612 transcripts identified by SAM analysis as differing between fibroblasts derived from normal male scrotum (green) and labia majora [...]
The new topology-controlled data set showed some similarities to the AR-dependent gene set we had identified previously, with 42 unique transcripts found in both gene sets, including 34 transcripts that were up-regulated in normal male-derived fibroblasts and 8 that were up-regulated in the CAIS female-derived fibroblasts. Genes up-regulated in normal male fibroblasts in both data sets included TBX3 (T-box 3), CBX6 (Chromobox homologue 6), IGFBP5 (Insulin-like growth factor binding protein 5), and EGFR (epidermal growth factor receptor), while several genes from the previous set were no longer identified as significantly different, such as TBX2 (T-box 2), TBX5 (T-box 5), BMP4 (bone morphogenetic protein 4), HOXA13 (Homeobox A13), WNT2 (Wingless-type MMTV integration site family, member 2), and FOXF2 (Forkhead box F2). The substantial change in the gene lists strongly suggested that topology influenced the gene expression signatures identified in our original series of experiments.
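The comparison of the two gene lists amounts to a set intersection. The following is a minimal sketch that uses only the genes named in the paragraph above, not the full lists; the variable names are illustrative assumptions, and the totals 34 and 8 are the counts reported in the text.

```python
# Only the genes named in the text are used here; the full gene lists
# are not reproduced, so these sets are deliberately incomplete.
named_in_both = {"TBX3", "CBX6", "IGFBP5", "EGFR"}
named_previous_only = {"TBX2", "TBX5", "BMP4", "HOXA13", "WNT2", "FOXF2"}

previous_named = named_in_both | named_previous_only
new_named = set(named_in_both)  # named genes still significant after controlling topology

overlap = previous_named & new_named  # retained in the topology-controlled set
lost = previous_named - new_named     # no longer significant

# Counts reported in the text for the complete overlap of the two lists.
up_in_normal_male = 34
up_in_cais = 8
total_shared = up_in_normal_male + up_in_cais

print("retained:", sorted(overlap))
print("lost:", sorted(lost))
print("shared transcripts in total:", total_shared)
```

With the complete lists, the same two set operations would yield the shared and the dropped transcripts directly.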
Topology-independent AR gene expression program classifies diverse AIS samples
To evaluate the relevance of the topology-controlled, AR-dependent gene list, we tested its ability to classify 72 microarray experiments performed on fibroblast samples derived from 51 individuals that included normal males, normal females, and individuals with PAIS and CAIS (Fig. and table ). AIS samples were graded according to the system suggested by Sinnecker [10], wherein phenotypically male genitalia are scored AIS 1, while female external genitalia are scored AIS 5 (Fig. ). Stringent filtering conditions of the combined data sets reduced the number of transcripts from 612 to 259; however, altering the stringency of the filtering conditions and the number of transcripts used did not significantly change the clustering pattern of the individual samples (data not shown).
Figure 2 Cluster analysis of normal male fibroblasts from scrotum and foreskin as well as 46, XY individuals with PAIS and CAIS. Hierarchical clustering analysis of 72 microarray experiments of cultured genital fibroblasts using the SAM derived gene list. The [...]
Clustering separated the 72 experiments into two major subgroups. The right-hand major branch included predominantly patients with female external genitalia, while most of the patients in the left-hand major branch had normal male or highly virilized external genitalia (Fig. and ). All but one of the samples with CAIS (AIS 5) and a normal 46, XX female clustered in the right ("female") branch (Fig. ). Interestingly, the one exception expressed a wild-type AR in a portion of cells due to somatic mosaicism (ARD465). Two skin fibroblast samples from normal males derived from regions without an obvious androgen-induced sexual dimorphism (abdomen, forearm) also clustered in the right ("female") branch. The left ("male") major branch contained all genital skin fibroblasts derived from normal male controls and 8 of 10 microarray experiments reflecting patients with higher degrees of virilization due to partial AIS (AIS 2). This cluster also included a fibroblast sample from an individual with 5α-reductase type II deficiency, a defect that results in ambiguous genitalia due to lack of conversion of testosterone to dihydrotestosterone. This individual presumably possesses a wild-type AR, meaning that androgen signaling pathways remained intact. Three of the four labioscrotal fibroblast samples from individuals with AIS 3 phenotypes with significant genital ambiguity clustered in the "female" major branch and the remaining one in the "male" branch (Fig. ). Of note, clustering did not appear to be influenced by array type or RNA reference type, indicating that normalization procedures did not influence data quality.
Structure within the cluster dendrogram suggested that there was some residual influence of topology on gene expression in the samples. In the right-hand "female" branch, fibroblast samples derived from AIS gonads clustered separately from all skin-derived samples (Fig. ). Similarly, the left-hand "male" branch showed a subcluster that contained all the foreskin-derived fibroblasts, including the AIS 2 fibroblasts originating from the foreskin. Interestingly, this branch also contained two strains of labia minora fibroblasts from two 46, XX individuals, one of whom had ambiguous genitalia (Prader stage 3) due to 21-hydroxylase deficiency (female pseudohermaphroditism), while the other was a normal female. Since the labia minora are analogous to the urethral folds that participate in penile morphogenesis, this finding suggests that, for these two samples, topographic origin influenced expression within the selected set of genes more than AR signaling did. In some cases, the anatomic origin of the biopsy was not well documented (table ). This might explain why some samples did not cluster as expected (e.g., ARD380 and ARD659, Fig. ), although other factors might have contributed to these findings.
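To illustrate how hierarchical clustering sorts samples into two major branches, here is a minimal sketch. It assumes average linkage with a Pearson correlation distance, which is one common choice for expression data and not necessarily the setting used for the analysis described here. The sample names and expression values are hypothetical toy data, not measurements from this study.

```python
from math import sqrt

def pearson_distance(x, y):
    """Return 1 - Pearson correlation (assumes neither profile is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the members of two clusters.
                total = sum(
                    pearson_distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log-ratio profiles over four genes (toy values only).
toy_profiles = {
    "male_1": [2.0, 1.5, -1.0, -2.0],
    "male_2": [1.8, 1.2, -0.8, -1.9],
    "female_1": [-1.5, -2.0, 1.0, 2.2],
    "female_2": [-1.7, -1.6, 1.3, 1.9],
}

merges = average_linkage(toy_profiles)
left_branch, right_branch, root_distance = merges[-1]  # the final merge joins the two major branches
print(sorted(left_branch), sorted(right_branch))
```

The last merge corresponds to the root of the dendrogram, so its two members are the major branches; earlier merges describe the substructure within each branch.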
We also wanted to reconsider whether the new topology-independent AR gene list classified samples better than our previous gene list, which did not control for the locations from which samples were harvested. After removing the microarray experiments that were used to define either the previous [8] or the new gene set by SAM, the remaining samples were clustered. As expected, with both gene lists the samples sorted into two main branches that separated primarily male and female samples (Fig. and ). Moreover, when we considered only topology-controlled samples originating from the labioscrotal swellings, excluding the mosaic samples and those with insufficiently described biopsy localization, the mean AIS grades of the two branches differed significantly with both gene lists (new gene list: AIS grades 2.5 ± 0.55 (male) and 4.17 ± 1.12 (female), p < 0.001 by t-test; previous gene list (Holterhus et al. 2003): AIS grades 1.6 ± 0.79 (male) and 3.3 ± 1.6 (female), p < 0.01 by t-test). However, in contrast to the new gene set, the previous gene set that did not account for topology misclassified many individuals. It resulted in incorrect female classification of most of the highly virilized individuals with AIS 2 (3 of 4 individuals = 75%) and of a large fraction of the normal male scrotal fibroblast controls (3 of 7 individuals = 43%) (Fig. and ). The new gene set misclassified only one individual with AIS 2 (ARD306).
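The misclassification percentages quoted above for the previous gene set follow from simple proportions. A short sketch of the arithmetic, using only the counts given in the text (the function and variable names are illustrative):

```python
def misclassification_rate(misclassified: int, total: int) -> float:
    """Percentage of individuals assigned to the wrong major branch."""
    return 100.0 * misclassified / total

# (misclassified, total) per group for the previous gene set, as reported in the text.
previous_gene_set = {
    "AIS 2": (3, 4),
    "normal male scrotal controls": (3, 7),
}

rates = {
    group: round(misclassification_rate(wrong, total))
    for group, (wrong, total) in previous_gene_set.items()
}

for group, rate in rates.items():
    print(f"{group}: {rate}% classified as female")
```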
Figure 3 Experiment clustering with the topography-controlled (A) versus the previous (B) gene set. Hierarchical clustering analyses of microarray experiments using the new topology-independent AR gene list (A) and our previous gene list that did not [...]
Biological processes in the AR-dependent gene expression program
We performed a systematic analysis for enrichment of genes belonging to defined biological processes and cellular pathways using the PANTHER classification system [11]. PANTHER classifies genes by their functions, based on published experimental evidence and on evolutionary relationships. The 612 significant transcripts corresponded to 527 named transcripts, of which PANTHER recognized 440 unique gene IDs. Several related biological processes were significantly over-represented in the AR-dependent gene list, including "control of cell proliferation and differentiation" (p = 0.00001), "developmental processes" (p = 0.00013) and "cell cycle control" (p = 0.00041) (table ). Analysis of cellular pathways also revealed several interesting signaling pathways, including "angiogenesis" (p = 0.00001) and "WNT-signaling" (p = 0.00002) (table ). These processes and pathways were reflected in the major branches of the cluster dendrogram, revealing differential expression in the phenotypically male and female samples (Figures and ). For instance, samples in the "male" branch showed high expression of CCN1 (cyclin 1), CCND1 (cyclin D1), IGF2 (insulin-like growth factor 2), IGFBP5 (Insulin-like growth factor binding protein 5), MYC (V-myc myelocytomatosis viral oncogene homolog), MAFF (V-maff musculoaponeurotic fibrosarcoma oncogene homolog F), EGFR (epidermal growth factor receptor), PTPN3 (protein tyrosine phosphatase), MET (hepatocyte growth factor receptor) and several other genes important in cell growth and proliferation (Fig. and ). Transcripts expressed at high levels in the "female" branch included ANAPC7 (anaphase promoting complex subunit 7), FZD8 (frizzled homolog 8) and FZD6 (frizzled homolog 6) (Fig. ).
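The idea behind an over-representation analysis can be sketched with a hypergeometric upper-tail test. This is one common formulation and is not necessarily the statistic PANTHER applied to produce the p-values above. Apart from the 440 recognized genes, every count in the example (reference size, category size, number of hits) is a hypothetical placeholder.

```python
from math import comb

def hypergeom_upper_tail(hits: int, category_size: int, list_size: int, reference_size: int) -> float:
    """P(X >= hits) when list_size genes are drawn without replacement from
    reference_size genes, of which category_size belong to the category."""
    upper = min(category_size, list_size)
    total = comb(reference_size, list_size)
    favourable = sum(
        comb(category_size, i) * comb(reference_size - category_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total

# Hypothetical counts for illustration only.
reference_size = 20000  # assumed number of genes in the reference list
category_size = 500     # assumed number of reference genes annotated to the process
list_size = 440         # unique gene IDs recognized by PANTHER (from the text)
hits = 30               # assumed number of list genes annotated to the process

expected = list_size * category_size / reference_size
p_value = hypergeom_upper_tail(hits, category_size, list_size, reference_size)
print(f"expected by chance: {expected:.1f}, observed: {hits}, p = {p_value:.2e}")
```

A category counts as over-represented when the observed number of hits is well above the number expected by chance and the resulting p-value is small, ideally after correction for the number of categories tested.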
The AR-dependent gene set showed enrichment for a number of genes related to maintenance and modification of tissue shape and structural identity (Tables and , and Fig. ). Genes that were up-regulated in the predominately male branch were SDC1 (syndecan 1), FMOD (fibromodulin) and ADAMTS2 (A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 2). Those up-regulated mainly in the female branch included ADAM12 (A disintegrin and metalloproteinase 12), TNC (tenascin C), CSPG2 (chondroitin sulfate proteoglycan 2, versican), FBN1 (fibrillin 1), ELN (elastin), COL5A2 (collagen, type V, alpha2), ECM2 (extracellular matrix protein 2, female organ and adipocyte specific), DAG1 (dystroglycan 1), and SPOCK (sparc/osteonectin, cwcv kazal-like domains proteoglycan, testican).
Confirmation of selected transcripts by RT-PCR
Four transcripts were selected for confirmation of the microarray data by semi-quantitative RT-PCR: TNC (tenascin), FZD8 (frizzled 8), ADAM12 (ADAM metallopeptidase 12), and CSPG2 (chondroitin sulfate proteoglycan 2, versican). RNA from the labia majora fibroblasts derived from CAIS-affected individuals (ARD402, ARD411, ARD682, ARD1097), as well as from four samples of the normal male scrotal fibroblasts (S4, S5, S8, S9), was used for analysis. In agreement with the microarray data, TNC, FZD8, ADAM12 and CSPG2 showed significantly higher expression in the labia majora fibroblasts than in the scrotal fibroblasts (Fig. ).
Figure 4 Verification of selected genes by RT-PCR. The ratio of transcript levels of TNC (Tenascin), FZD8 (Frizzled 8), ADAM12 (ADAM metallopeptidase 12), and CSPG2 (Chondroitin sulfate proteoglycan 2, versican) comparing CAIS cell lines and normal scrotal cell [...]