1.  Cynomolgus monkey testicular cDNAs for discovery of novel human genes in the human genome sequence 
BMC Genomics  2002;3:36.
In order to contribute to the establishment of a complete map of transcribed regions of the human genome, we constructed a testicular cDNA library for the cynomolgus monkey, and attempted to find novel transcripts for identification of their human homologues.
The full-insert sequences of 512 cDNA clones were determined. Ultimately we found 302 non-redundant cDNAs carrying open reading frames of 300 bp or longer. Among them, 89 cDNAs were found not to be annotated previously in the Ensembl human database. After searching against the Ensembl mouse database, we also found that 69 putative coding sequences have no homologous cDNAs in the annotated human and mouse genome sequences in Ensembl.
We subsequently designed a DNA microarray including 396 non-redundant cDNAs (with and without open reading frames) to examine the expression of the full-sequenced genes. With the testicular probe and a mixture of probes of 10 other tissues, 316 of 332 effective spots showed intense hybridized signals and 75 cDNAs were shown to be expressed very highly in the cynomolgus monkey testis, but not ubiquitously.
In this report, we determined 302 full-insert sequences of cynomolgus monkey cDNAs with open reading frames long enough to discover novel transcripts as human homologues. Among the 302 cDNA sequences, human homologues of 89 cDNAs have not been predicted in the annotated human genome sequence in Ensembl. Additionally, we identified 75 dominantly expressed genes in testis among the full-sequenced clones by using a DNA microarray. Our cDNA clones and analytical results will be valuable resources for future functional genomic studies.
PMCID: PMC140308  PMID: 12498619
2.  Outliers involving the Poly(A) effect among highly-expressed genes in microarrays 
BMC Genomics  2002;3:35.
The Poly(A) effect is a cross-hybridization artifact in which poly(T)-containing molecules, which are produced by the reverse transcription of a poly(A)+ RNA mixture, bind promiscuously to the poly(A) stretches of the DNA in microarray spots. It is customary to attempt to block such hybridization by adding poly(A) to the hybridization solution. This note describes an experiment intended to evaluate circumstances under which the blocking procedure may not have been successful.
The experiment involves a spot-by-spot comparison between the hybridization signals obtained by hybridizing a microarray to: (1) end-labeled oligo(dT), versus, (2) cDNA prepared from muscle tissue. We found that the blocking appears to be successful for the vast majority of microarray spots, as evidenced by the weakness of the correlation between signals (1) and (2). However, we found that for microarray spots having oligo(dT) hybridization levels greater than a certain threshold, the blocking might be ineffective or incomplete, as evidenced by an exceptionally strong signal (2) whenever signal (1) is greater than the threshold.
The Poly(A) effect may be more subtle than simply a hybridization signal that is proportional to the poly(A) content of each microarray spot. It may instead be present only in spots that hybridize oligo(dT) greater than some threshold level. The strong signal generated at these "outlier" spots by cDNA probes might be due to the formation of hybridization heteropolymers.
PMCID: PMC140022  PMID: 12479797
3.  DNA sequence conservation between the Bacillus anthracis pXO2 plasmid and genomic sequence from closely related bacteria 
BMC Genomics  2002;3:34.
Complete sequencing and annotation of the 96.2 kb Bacillus anthracis plasmid, pXO2, predicted 85 open reading frames (ORFs). Bacillus cereus and Bacillus thuringiensis isolates that ranged in genomic similarity to B. anthracis, as determined by amplified fragment length polymorphism (AFLP) analysis, were examined by PCR for the presence of sequences similar to 47 pXO2 ORFs.
The two most distantly related isolates examined, B. thuringiensis 33679 and B. thuringiensis AWO6, produced the greatest number of ORF sequences similar to pXO2; 10 detected in 33679 and 16 in AWO6. No more than two of the pXO2 ORFs were detected in any one of the remaining isolates. Dot-blot DNA hybridizations between pXO2 ORF fragments and total genomic DNA from AWO6 were consistent with the PCR assay results for this isolate and also revealed nine additional ORFs shared between these two bacteria. Sequences similar to the B. anthracis cap genes or their regulator, acpA, were not detected among any of the examined isolates.
The presence of pXO2 sequences in the other Bacillus isolates did not correlate with genomic relatedness established by AFLP analysis. The presence of pXO2 ORF sequences in other Bacillus species suggests the possibility that certain pXO2 plasmid gene functions may also be present in other closely related bacteria.
PMCID: PMC140023  PMID: 12473162
4.  The catalytic domains of thiamine triphosphatase and CyaB-like adenylyl cyclase define a novel superfamily of domains that bind organic phosphates 
BMC Genomics  2002;3:33.
The CyaB protein from Aeromonas hydrophila has been shown to possess adenylyl cyclase activity. While orthologs of this enzyme have been found in some bacteria and archaea, it shows no detectable relationship to the classical nucleotide cyclases. Furthermore, the actual biological functions of these proteins are not clearly understood because they are also present in organisms in which there is no evidence for cyclic nucleotide signaling.
We show that the CyaB-like adenylyl cyclase and the mammalian thiamine triphosphatases define a novel superfamily of catalytic domains, called the CYTH domain, that is present in all three superkingdoms of life. Using multiple alignments and secondary structure predictions, we define the catalytic core of these enzymes to contain a novel α+β scaffold with 6 conserved acidic residues and 4 basic residues. Using contextual information obtained from the analysis of gene neighborhoods and domain fusions, we predict that members of this superfamily may play a central role in the interface between nucleotide and polyphosphate metabolism. Additionally, based on contextual information, we identify a novel domain (called CHAD) that is predicted to functionally interact with the CYTH domain-containing enzymes in bacteria and archaea. The CHAD is predicted to be an alpha helical domain, and contains conserved histidines that may be critical for its function.
The phyletic distribution of the CYTH domain suggests that it is an ancient enzymatic domain that was present in the Last Universal Common Ancestor and was involved in nucleotide or organic phosphate metabolism. Based on the conservation of catalytic residues, we predict that CYTH domains are likely to chelate two divalent cations, and exhibit a reaction mechanism that is dependent on two metal ions, analogous to nucleotide cyclases, polymerases and certain phosphoesterases. Our analysis also suggests that the experimentally characterized members of this superfamily, namely adenylyl cyclase and thiamine triphosphatase, are secondary derivatives of proteins that performed an ancient role in polyphosphate and nucleotide metabolism.
PMCID: PMC138802  PMID: 12456267
5.  Domain-oriented functional analysis based on expression profiling 
BMC Genomics  2002;3:32.
Co-regulation of genes may imply involvement in similar biological processes or related function. Many clusters of co-regulated genes have been identified using microarray experiments. In this study, we examined co-regulated gene families using large-scale cDNA microarray experiments on the human transcriptome.
We present a simple model, which, for each probe pair, distills expression changes into binary digits and summarizes the expression of multiple members of a gene family as the Family Regulation Ratio. The set of Family Regulation Ratios for each protein family across multiple experiments is called a Family Regulation Profile. We analyzed these Family Regulation Profiles using Pearson Correlation Coefficients and derived a network diagram portraying relationships between the Family Regulation Profiles of gene families that are well represented on the microarrays. Our strategy was cross-validated with two randomly chosen data subsets and was proven to be a reliable approach.
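The model above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: a probe pair is scored 1 when its absolute log expression ratio reaches a hypothetical threshold and 0 otherwise, and the Family Regulation Ratio is taken to be the fraction of family members scored 1. The function names are likewise hypothetical.

```python
import math


def family_regulation_ratio(log_ratios, threshold=1.0):
    """Fraction of family members whose expression change is scored 1.

    Assumed definition: a member is scored 1 when |log ratio| >= threshold.
    """
    if not log_ratios:
        raise ValueError("a family needs at least one member")
    bits = [1 if abs(x) >= threshold else 0 for x in log_ratios]
    return sum(bits) / len(bits)


def family_regulation_profile(experiments, threshold=1.0):
    """One Family Regulation Ratio per experiment, for a single family."""
    return [family_regulation_ratio(e, threshold) for e in experiments]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical families measured in the same three experiments.
family_a = [[2.0, 0.1, -1.5, 0.3], [0.2, 0.1, 0.0, 0.4], [1.2, 1.8, -2.0, 1.1]]
family_b = [[1.4, 0.2], [0.1, 0.3], [2.2, -1.9]]
profile_a = family_regulation_profile(family_a)
profile_b = family_regulation_profile(family_b)
similarity = pearson(profile_a, profile_b)
```

Under these assumptions, pairs of families with a high `similarity` value would be the ones linked in a network diagram of the kind the abstract describes.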
This work will help us to understand and identify the functional relationships between gene families and the regulatory pathways in which each family is involved. Concepts presented here may be useful for objective clustering of protein functions and deriving a comprehensive protein interaction map. Functional genomic approaches such as this may also be applicable to the elucidation of complex genetic regulatory networks.
PMCID: PMC137579  PMID: 12456268
6.  Optimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysis 
BMC Genomics  2002;3:31.
T7 based linear amplification of RNA is used to obtain sufficient antisense RNA for microarray expression profiling. We optimized and systematically evaluated the fidelity and reproducibility of different amplification protocols using total RNA obtained from primary human breast carcinomas and high-density cDNA microarrays.
Using an optimized protocol, the average correlation coefficient of gene expression of 11,123 cDNA clones between amplified and unamplified samples is 0.82 (0.85 when a virtual array was created using repeatedly amplified samples to minimize experimental variation). Less than 4% of genes show changes in expression level by 2-fold or greater after amplification compared to unamplified samples. Most changes due to amplification are not systematic both within one tumor sample and between different tumors. Amplification appears to dampen the variation of gene expression for some genes when compared to unamplified poly(A)+ RNA. The reproducibility between repeatedly amplified samples is 0.97 when performed on the same day, but drops to 0.90 when performed weeks apart. The fidelity and reproducibility of amplification is not affected by decreasing the amount of input total RNA in the 0.3–3 micrograms range. Adding template-switching primer, DNA ligase, or column purification of double-stranded cDNA does not improve the fidelity of amplification. The correlation coefficient between amplified and unamplified samples is higher when total RNA is used as template for both experimental and reference RNA amplification.
T7 based linear amplification reproducibly generates amplified RNA that closely approximates the original sample for gene expression profiling using cDNA microarrays.
PMCID: PMC137577  PMID: 12445333
7.  Mutation screening of two candidate genes from 13q32 in families affected with Bipolar disorder: human peptide transporter (SLC15A1) and human glypican5 (GPC5) 
BMC Genomics  2002;3:30.
Multiple candidate regions as sites for Schizophrenia and Bipolar susceptibility genes have been reported, suggesting heterogeneity of susceptibility genes or oligogenic inheritance. Linkage analysis has suggested chromosome 13q32 as one of the regions with evidence of linkage to Schizophrenia and, separately, to Bipolar disorder (BP). SLC15A1 and GPC5 are two of the candidate genes within an approximately 10-cM region of linkage on chromosome 13q32. In order to identify a possible role for these candidates as susceptibility genes, we performed mutation screening on the coding regions of these two genes in 7 families (n = 20) affected with Bipolar disorder showing linkage to 13q32.
Genomic organization revealed 23 exons in the SLC15A1 gene and 8 exons in the GPC5 gene. Sequencing of the exons did not reveal mutations in the GPC5 gene in the 7 families affected with BP. Two polymorphic variants were discovered in the SLC15A1 gene. One was a T-to-C substitution in the third position of the codon encoding alanine, at position 1403 of the mRNA in exon 17, and the other was an A-to-G substitution in the untranslated region, at position 2242 of the mRNA in exon 23.
Mutation analysis of 2 candidate genes for Bipolar disorder on chromosome 13q32 did not identify any potentially causative mutations within the coding regions or splice junctions of the SLC15A1 or GPC5 genes in 7 families showing linkage to 13q32. Further studies of the regulatory regions are needed to completely exclude these genes as causative for Bipolar disorder.
PMCID: PMC140024  PMID: 12392603
Bipolar disorder; Mutation screening; SLC15A1; GPC5
8.  A reference database for tumor-related genes co-expressed with interleukin-8 using genome-scale in silico analysis 
BMC Genomics  2002;3:29.
The EST database provides a rich resource for gene discovery and in silico expression analysis. We report a novel computational approach to identify co-expressed genes using the EST database, and its application to IL-8.
IL-8 is represented in 53 dbEST cDNA libraries. We calculated the frequency of occurrence of all the genes represented in these cDNA libraries, and ranked the candidates based on a Z-score. Additional analysis suggests that most IL-8 related genes are differentially expressed between non-tumor and tumor tissues. To focus on IL-8's function in tumor tissues, we further analyzed and ranked the genes in 16 IL-8 related tumor libraries.
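A Z-score ranking of this kind can be sketched as follows. The abstract does not state which Z-score was used, so this sketch assumes a simple binomial form: for each gene, the number of target-containing libraries in which it occurs is compared with the number expected from its frequency across all libraries. The function names and the toy library contents are hypothetical.

```python
import math


def cooccurrence_z(k, n, p):
    """Binomial Z-score: k observed hits in n libraries, background rate p."""
    if not 0.0 < p < 1.0:
        raise ValueError("background rate must lie strictly between 0 and 1")
    return (k - n * p) / math.sqrt(n * p * (1.0 - p))


def rank_coexpressed(libraries, target):
    """Rank genes by how often they share a library with the target gene.

    libraries: list of sets of gene names, one set per cDNA library.
    Returns (gene, z) pairs sorted from highest to lowest Z-score.
    """
    target_libs = [lib for lib in libraries if target in lib]
    n = len(target_libs)
    candidates = set().union(*target_libs) - {target}
    scores = {}
    for gene in candidates:
        k = sum(gene in lib for lib in target_libs)
        p = sum(gene in lib for lib in libraries) / len(libraries)
        if 0.0 < p < 1.0:  # genes present in every library are uninformative
            scores[gene] = cooccurrence_z(k, n, p)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: six hypothetical libraries, two of which contain the target.
toy_libraries = [
    {"IL8", "A"},
    {"IL8", "A", "B"},
    {"B"},
    {"C"},
    {"B", "C"},
    {"C"},
]
ranking = rank_coexpressed(toy_libraries, "IL8")
```

In the toy data, gene `A` occurs only alongside the target and therefore ranks above gene `B`, which is spread evenly across libraries.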
This method generated a reference database for genes co-expressed with IL-8 and could facilitate further characterization of functional association among genes.
PMCID: PMC131052  PMID: 12377104
9.  Assessment of differential gene expression in human peripheral nerve injury 
BMC Genomics  2002;3:28.
Microarray technology is a powerful methodology for identifying differentially expressed genes. However, when thousands of genes in a microarray data set are evaluated simultaneously by fold changes and significance tests, the probability of detecting false positives rises sharply. In this first microarray study of brachial plexus injury, we applied and compared the performance of two recently proposed algorithms for tackling this multiple testing problem, Significance Analysis of Microarrays (SAM) and Westfall and Young step down adjusted p values, as well as t-statistics and Welch statistics, in specifying differential gene expression under different biological states.
Using SAM based on t statistics, we identified 73 significant genes, which fall into different functional categories, such as cytokines / neurotrophin, myelin function and signal transduction. Interestingly, all but one gene were down-regulated in the patients. Using Welch statistics in conjunction with SAM, we identified an additional set of up-regulated genes, several of which are engaged in transcription and translation regulation. In contrast, the Westfall and Young algorithm identified only one gene using a conventional significance level of 0.05.
In coping with multiple testing problems, Family-wise type I error rate (FWER) and false discovery rate (FDR) are different expressions of Type I error rates. The Westfall and Young algorithm controls FWER. In the context of this microarray study, it is, seemingly, too conservative. In contrast, SAM, by controlling FDR, provides a promising alternative. In this instance, genes selected by SAM were shown to be biologically meaningful.
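The practical difference between controlling FWER and controlling FDR can be illustrated with two standard textbook procedures. The sketch below uses the Bonferroni correction (FWER) and the Benjamini-Hochberg step-up procedure (FDR); these are stand-ins chosen for brevity and are not the Westfall and Young or SAM algorithms used in the study. The p-values are invented for illustration.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """FWER control: reject only p-values at or below alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    rank whose sorted p-value is at or below (rank / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            reject[i] = True
    return reject


# Invented p-values for five genes.
example_pvals = [0.001, 0.012, 0.02, 0.03, 0.5]
fwer_calls = bonferroni_reject(example_pvals)
fdr_calls = benjamini_hochberg_reject(example_pvals)
```

With these invented values the FWER procedure calls one gene significant while the FDR procedure calls four, which mirrors the qualitative pattern reported above.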
PMCID: PMC137578  PMID: 12354329
10.  Cross-species hybridisation of pig RNA to human nylon microarrays 
BMC Genomics  2002;3:27.
The objective of this research was to investigate the reproducibility of cross-species microarray hybridisation. Comparisons between same- and cross-species hybridisations were also made. Nine hybridisations between a single pig skeletal muscle RNA sample and three human cDNA nylon microarrays were completed. Three replicate hybridisations of two different amounts of pig RNA, and of human skeletal muscle RNA were completed on three additional microarrays.
Reproducibility of microarray hybridisations of pig cDNA to human microarrays was high, as determined by Spearman and Pearson correlation coefficients and a Kappa statistic. Variability among replicate hybridisations was similar for human and pig data, indicating that the reproducibility of results was not compromised in cross-species hybridisations. The concordance between data generated from hybridisations using pig and human skeletal muscle RNA was high, further supporting the use of human microarrays for the analysis of gene expression in the pig. No systematic effect of stripping and re-using nylon microarrays was found, and variability across microarrays was minimal.
The majority of genes generated highly reproducible data in cross-species microarray hybridisations, although approximately 6% were identified as highly variable. Experimental designs that include at least three replicate hybridisations for each experimental treatment will enable the variability of individual genes to be considered appropriately. The use of cross-species microarray analysis looks promising. However, additional validation is needed to determine the specificity of cross-species hybridisations, and the validity of results.
PMCID: PMC130049  PMID: 12354330
11.  In silico and in situ characterization of the zebrafish (Danio rerio) gnrh3 (sGnRH) gene 
BMC Genomics  2002;3:25.
Gonadotropin releasing hormone (GnRH) is responsible for stimulation of gonadotropic hormone (GtH) in the hypothalamus-pituitary-gonadal axis (HPG). The regulatory mechanisms responsible for brain specificity make the promoter attractive for in silico analysis and reporter gene studies in zebrafish (Danio rerio).
We have characterized a zebrafish [Trp7, Leu8] or salmon (s) GnRH variant, gnrh3. The gene includes a 1.6 Kb upstream regulatory region and displays the conserved structure of 4 exons and 3 introns, as seen in other species. An in silico defined enhancer at -976 in the zebrafish promoter, containing adjacent binding sites for Oct-1, CREB and Sp1, was predicted in 2 mammalian and 5 teleost GnRH promoters. Reporter gene studies confirmed the importance of this enhancer for cell specific expression in zebrafish. Interestingly the promoter of human GnRH-I, known as mammalian GnRH (mGnRH), was shown capable of driving cell specific reporter gene expression in transgenic zebrafish.
The characterized zebrafish Gnrh3 decapeptide exhibits complete homology to the Atlantic salmon (Salmo salar) GnRH-III variant. In silico analysis of mammalian and teleost GnRH promoters revealed a conserved enhancer possessing binding sites for Oct-1, CREB and Sp1. Transgenic and transient reporter gene expression in zebrafish larvae confirmed the importance of the in silico defined zebrafish enhancer at -976. The capability of the human GnRH-I promoter to direct cell specific reporter gene expression in zebrafish supports orthology between GnRH-I and GnRH-III.
PMCID: PMC126252  PMID: 12188930
12.  Mouse ribonuclease III. cDNA structure, expression analysis, and chromosomal location 
BMC Genomics  2002;3:26.
Members of the ribonuclease III superfamily of double-stranded(ds)-RNA-specific endoribonucleases participate in diverse RNA maturation and decay pathways in eukaryotic and prokaryotic cells. A human RNase III orthologue has been implicated in ribosomal RNA maturation. To better understand the structure and mechanism of mammalian RNase III and its involvement in RNA metabolism we determined the cDNA structure, chromosomal location, and expression patterns of mouse RNase III.
The predicted mouse RNase III polypeptide contains 1373 amino acids (~160 kDa). The polypeptide exhibits a single C-terminal dsRNA-binding motif (dsRBM), tandem catalytic domains, a proline-rich region (PRR) and an RS domain. Northern analysis and RT-PCR reveal that the transcript (4487 nt) is expressed in all tissues examined, including extraembryonic tissues and the midgestation embryo. Northern analysis indicates the presence of an additional, shorter form of the transcript in testicular tissue. Fluorescent in situ hybridization demonstrates that the mouse RNase III gene maps to chromosome 15, region B, and that the human RNase III gene maps to a syntenic location on chromosome 5p13-p14.
The broad transcript expression pattern indicates a conserved cellular role(s) for mouse RNase III. The putative polypeptide is highly similar to human RNase III (99% amino acid sequence identity for the two catalytic domains and dsRBM), but is distinct from other eukaryotic orthologues, including Dicer, which is involved in RNA interference. The mouse RNase III gene has a chromosomal location distinct from the Dicer gene.
PMCID: PMC122089  PMID: 12191433
13.  High-resolution physical map for chromosome 16q12.1-q13, the Blau syndrome locus 
BMC Genomics  2002;3:24.
The Blau syndrome (MIM 186580), an autosomal dominant granulomatous disease, was previously mapped to chromosome 16p12-q21. However, inconsistent physical maps of the region, and consequently an unknown order of microsatellite markers, prevented us from further refining the genetic locus for the Blau syndrome. To address this problem, we constructed our own high-resolution physical map for the Blau susceptibility region.
We generated a high-resolution physical map that provides more than 90% coverage of a refined Blau susceptibility region. The map consists of four contigs of sequence tagged site-based bacterial artificial chromosomes with a total of 124 bacterial artificial chromosomes, and spans approximately 7.5 Mbp; however, three gaps still exist in this map with sizes of 425, 530 and 375 kbp, respectively, estimated from radiation hybrid mapping.
Our high-resolution map will assist genetic studies of loci in the interval from D16S3080, near D16S409, and D16S408 (16q12.1 to 16q13).
PMCID: PMC122098  PMID: 12186634
14.  Pervasive properties of the genomic signature 
BMC Genomics  2002;3:23.
The dinucleotide relative abundance profile can be regarded as a genomic signature because, despite diversity between species, it varies little between 50 kilobase or longer windows on a given genome. Both the causes and the functional significance of this phenomenon could be illuminated by determining if it persists on smaller scales. The profile is computed from the base step "odds ratios" that compare dinucleotide frequencies to those expected under the assumption of stochastic equilibrium (thorough shuffling). Analysis is carried out on 22 sequences, representing 19 species and comprising about 53 million bases altogether, to assess stability of the signature in windows ranging in size from 50 kilobases down to 125 bases.
Dinucleotide relative abundance distance from the global signature is computed locally for all non-overlapping windows on each sequence. These distances are log-normally distributed with nearly constant variance and with means that tend to zero slower than reciprocal square root of window size. The mean distance within genomes is larger for protist, plant, and human chromosomes, and smaller for archaea, bacteria, and yeast, for any window size.
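The computation described above can be sketched in a few lines of Python. The sketch makes assumptions the abstract does not confirm: the odds ratio for a dinucleotide XY is taken as its observed frequency divided by the product of the two mononucleotide frequencies, the distance is taken as the mean absolute difference over the 16 dinucleotides, and the strand-symmetrized variant often used for genomic signatures is omitted for brevity. Function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = [a + b for a, b in product(BASES, repeat=2)]


def odds_ratios(seq):
    """Dinucleotide odds ratios f_XY / (f_X * f_Y) for one sequence."""
    seq = seq.upper()
    mono = {b: 0 for b in BASES}
    di = {d: 0 for d in DINUCLEOTIDES}
    for base in seq:
        if base in mono:
            mono[base] += 1
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:  # skips pairs containing ambiguous bases such as N
            di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no countable dinucleotides")
    rho = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        rho[pair] = (count / n_di) / expected if expected > 0 else 0.0
    return rho


def signature_distance(rho_a, rho_b):
    """Mean absolute difference between two odds-ratio profiles."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCLEOTIDES) / 16


def window_distances(seq, window):
    """Distance from the global signature for each non-overlapping window."""
    global_rho = odds_ratios(seq)
    distances = []
    for start in range(0, len(seq) - window + 1, window):
        local_rho = odds_ratios(seq[start:start + window])
        distances.append(signature_distance(local_rho, global_rho))
    return distances
```

Calling `window_distances(sequence, 125)` on a sequence of interest would give one distance per 125-base window, the smallest window size mentioned above; the distribution of those values is what the results paragraph summarizes.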
The imprint of the global signature is locally pervasive on all scales considered in the sequences (either genomes or chromosomes) that were scanned.
PMCID: PMC126251  PMID: 12171605
15.  Multigene family isoform profiling from blood cell lineages 
BMC Genomics  2002;3:22.
Analysis of cell-selective gene expression for families of proteins of therapeutic interest is crucial when deducing the influence of genes upon complex traits and disease susceptibility. Presently, there is no convenient tool for examining isoform-selective expression for large gene families. A multigene isoform profiling strategy was developed and used to investigate the inwardly rectifying K+ (Kir) channel family in human leukocytes. Comprised of seven subfamilies, Kir channels have important roles in setting the resting membrane potential in excitable and non-excitable cells.
Gene sequence alignment allowed determination of "islands" of amino acid homology, and sub-family "centred" priming permitted simultaneous co-amplification of each family member. Validation and cross-priming analysis was performed against a panel of cognate Kir channel clones. Radiolabelling and diagnostic restriction digestion of pooled PCR products enabled determination of distinct Kir gene expression profiles in pure populations of human neutrophils, eosinophils and lung mast cells, with conservation of Kir2.0 isoforms amongst the leukocyte subsets. We also identified a Kir2.0 channel product, which may potentially represent a novel family member.
We have developed a novel, rapid and flexible strategy for the determination of gene family isoform composition in any cell type with the additional capacity to detect hitherto unidentified family members and verified its application in a study of Kir channel isoform expression in human leukocytes.
PMCID: PMC122081  PMID: 12167175
16.  Comparative analysis of somitogenesis related genes of the hairy/Enhancer of split class in Fugu and zebrafish 
BMC Genomics  2002;3:21.
Members of a class of bHLH transcription factors, namely the hairy (h), Enhancer of split (E(spl)) and hairy-related with YRPW motif (hey) (h/E(spl)/hey) genes are involved in vertebrate somitogenesis and some of them show cycling expression. By sequence comparison, identified orthologues of cycling somitogenesis genes from higher vertebrates do not show an appropriate expression pattern in zebrafish. The zebrafish genomic sequence is not available yet but the genome of Fugu rubripes was recently published. To allow comparative analysis, the currently known Her proteins from zebrafish were used to screen the genomic sequence database of Fugu rubripes.
20 h/E(spl)/hey-related genes were identified in Fugu, which is twice the number of corresponding zebrafish genes known so far. A novel class of c-Hairy proteins was identified in the genomes of Fugu and Tetraodon. A screen of the human genome database with the Fugu proteins yielded 10 h/E(spl)/hey-related genes. By analysing the upstream sequences of the c-hairy class genes in zebrafish, Fugu and Tetraodon highly similar sequence stretches were identified that harbour Suppressor of hairless paired binding sites (SPS). This motif was also discovered in the upstream sequences of the her1 gene in the examined fish species. Here, the Su(h) sites are separated by longer intervening sequences.
Our study indicates that not all her homologues in zebrafish have been isolated. Comparison to the human genome suggests a selective duplication of h/E(spl) genes in pufferfish or loss of members of these genes during evolution to the human lineage.
PMCID: PMC126015  PMID: 12160468
17.  "In-gel" purified ditags direct synthesis of highly efficient SAGE Libraries 
BMC Genomics  2002;3:20.
SAGE (serial analysis of gene expression) is a recently developed technique for systematic analysis of eukaryotic transcriptomes. The most critical step in the SAGE method is large scale amplification of ditags, which are then concatemerized for the construction of representative SAGE libraries. Here, we report a protocol for purifying these ditags via an 'in situ' PAGE purification method. This generates ditags free of linker contaminations, making library construction simpler and more efficient.
Ditags used to generate SAGE libraries were demarcated 'in situ' on preparative polyacrylamide gels using XC and BPB dyes, which precisely straddle the ditag band when a 16% PAGE gel (19:1 acrylamide:bis, 5% cross linker) is used to resolve the DNA bands. Here, the ditag DNA was directly excised from gel without visualization via EtBr or fluorescent dye staining, resulting in highly purified ditag DNA free of contaminating linkers. These ditags could be rapidly self ligated even at 4°C to generate concatemers in a controlled manner, which in turn enabled us to generate highly efficient SAGE libraries. This reduced the labor and time necessary, as well as the cost.
This approach greatly simplified the ditag purification procedure for constructing SAGE libraries. Since the traditional post-run staining with EtBr or fluorescent dyes routinely results in cross contamination of a DNA band of interest by other DNA in the gel, the dry gel DNA excision method described here may also be amenable to other molecular biology techniques in which DNA purity is critically important.
PMCID: PMC122078  PMID: 12153707
18.  Defining signal thresholds in DNA microarrays: exemplary application for invasive cancer 
BMC Genomics  2002;3:19.
Genome-wide or application-targeted microarrays containing a subset of genes of interest have become widely used as a research tool with the prospect of diagnostic application. Intrinsic variability of microarray measurements poses a major problem in defining signal thresholds for absent/present or differentially expressed genes. Most strategies have used fold-change threshold values, but variability at low signal intensities may invalidate this approach, and it does not provide information about false positives and false negatives.
We introduce a method to filter false positives and false negatives from DNA microarray experiments. This is achieved by evaluating a set of positive and negative controls by receiver operating characteristic (ROC) analysis. As an advantage of this approach, users may define thresholds on the basis of sensitivity and specificity considerations. The area under the ROC curve allows quality control of microarray hybridizations. This method has been applied to custom made microarrays developed for the analysis of invasive melanoma derived tumor cells. It demonstrated that ROC analysis yields a threshold that reduces the number of misclassified genes in microarray experiments.
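A minimal sketch of ROC-based threshold selection from control spots follows. It assumes that higher signal indicates a present gene, that a spot is called present when its signal is at or above the threshold, and that the threshold is picked by maximizing Youden's J (sensitivity + specificity - 1). The last choice is an assumption made for illustration; the abstract leaves the trade-off to the user. The control values and function names are hypothetical.

```python
def roc_points(positives, negatives):
    """(threshold, sensitivity, specificity) for every observed signal value."""
    points = []
    for t in sorted(set(positives) | set(negatives)):
        sensitivity = sum(x >= t for x in positives) / len(positives)
        specificity = sum(x < t for x in negatives) / len(negatives)
        points.append((t, sensitivity, specificity))
    return points


def area_under_curve(positives, negatives):
    """AUC as the probability that a positive control outscores a negative
    control, counting ties as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def best_threshold(positives, negatives):
    """Threshold maximizing Youden's J over the observed signal values."""
    return max(roc_points(positives, negatives), key=lambda r: r[1] + r[2] - 1)


# Hypothetical control-spot signals from one hybridization.
positive_controls = [5.0, 6.0, 7.0, 8.0]
negative_controls = [1.0, 2.0, 3.0, 6.5]
auc = area_under_curve(positive_controls, negative_controls)
threshold, sens, spec = best_threshold(positive_controls, negative_controls)
```

An AUC close to 1 would indicate that the controls separate well in that hybridization, while a value near 0.5 would flag a hybridization whose signals carry little information.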
Provided that a set of appropriate positive and negative controls is included on the microarray, ROC analysis obviates the inherent problem of arbitrarily selecting threshold levels in microarray experiments. The proposed method is applicable to both custom made and commercially available DNA microarrays and will help to improve the reliability of predictions from DNA microarray experiments.
PMCID: PMC117791  PMID: 12123529
19.  Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein 
BMC Genomics  2002;3:18.
The largest open reading frame in the Saccharomyces genome encodes midasin (MDN1p, YLR106p), an AAA ATPase of 560 kDa that is essential for cell viability. Orthologs of midasin have been identified in the genome projects for Drosophila, Arabidopsis, and Schizosaccharomyces pombe.
Midasin is present as a single-copy gene encoding a well-conserved protein of ~600 kDa in all eukaryotes for which data are available. In humans, the gene maps to 6q15 and encodes a predicted protein of 5596 residues (632 kDa). Sequence alignments of midasin from humans, yeast, Giardia and Encephalitozoon indicate that its domain structure comprises an N-terminal domain (35 kDa), followed by an AAA domain containing six tandem AAA protomers (~30 kDa each), a linker domain (260 kDa), an acidic domain (~70 kDa) containing 35–40% aspartate and glutamate, and a carboxy-terminal M-domain (30 kDa) that possesses MIDAS sequence motifs and is homologous to the I-domain of integrins. Expression of hemagglutinin-tagged midasin in yeast demonstrates a polypeptide of the anticipated size that is localized principally in the nucleus.
The highly conserved structure of midasin in eukaryotes, taken in conjunction with its nuclear localization in yeast, suggests that midasin may function as a nuclear chaperone and be involved in the assembly/disassembly of macromolecular complexes in the nucleus. The AAA domain of midasin is evolutionarily related to that of dynein, but it appears to lack a microtubule-binding site.
PMCID: PMC117441  PMID: 12102729
20.  GPR99, a new G protein-coupled receptor with homology to a new subgroup of nucleotide receptors 
BMC Genomics  2002;3:17.
Based on sequence similarity, the superfamily of G protein-coupled receptors (GPRs) can be subdivided into several subfamilies, the members of which often share similar ligands. The sequence data provided by the human genome project allows us to identify new GPRs by in silico homology screening, and to predict their ligands.
By searching the human genomic database with known nucleotide receptors we discovered the gene for GPR99, a new orphan GPR. The mRNA of GPR99 was found in kidney and placenta. Phylogenetic analysis groups GPR99 into the P2Y subfamily of GPRs. Based on the phylogenetic tree we propose a new classification of P2Y nucleotide receptors into two subgroups, predicting a nucleotide ligand for GPR99. By assaying known nucleotide ligands on heterologously expressed GPR99, we could not identify specifically activating substances, indicating that either they are not agonists of GPR99 or that GPR99 was not expressed at the cell surface. Analysis of the chromosomal localization of all genes of the P2Y subfamily revealed that all members of subgroup "a" are encoded within less than 370 kb on chromosome 3q24, and that the genes of subgroup "b" are clustered partly on chromosome 11q13.5 and partly on chromosome 3q24-25.1, close to the subgroup "a" position. Therefore, the P2Y subfamily is a striking example of local gene amplification.
We identified a new orphan receptor, GPR99, with homology to the family of G protein-coupled nucleotide receptors. Phylogenetic analysis separates this family into different subgroups, predicting a nucleotide ligand for GPR99.
PMCID: PMC117779  PMID: 12098360
21.  Obtaining reliable information from minute amounts of RNA using cDNA microarrays 
BMC Genomics  2002;3:16.
High density cDNA microarray technology provides a powerful tool to survey the activity of thousands of genes in normal and diseased cells, which helps us both to understand the molecular basis of the disease and to identify potential targets for therapeutic intervention. The promise of this technology has been hampered by the large amount of biological material required for the experiments (more than 50 μg of total RNA per array). We have modified an amplification procedure that requires only 1 μg of total RNA. Analyses of the results showed that most genes that were detected as expressed or differentially expressed using the regular protocol were also detected using the amplification protocol. In addition, many genes that were undetected or weakly detected using the regular protocol were clearly detected using the amplification protocol. We have carried out a series of confirmation studies by northern blotting, western blotting, and immunohistochemistry assays.
Our results showed that most of the new information revealed by the amplification protocol represents real gene activity in the cells.
We have confirmed a powerful and consistent cDNA microarray procedure that can be used to study minute amounts of biological tissue.
PMCID: PMC117130  PMID: 12086591
22.  Complete genome sequence of a novel extrachromosomal virus-like element identified in planarian Girardia tigrina 
BMC Genomics  2002;3:15.
Freshwater planarians are widely used as models for investigation of pattern formation and studies on genetic variation in populations. Despite extensive information on the biology and genetics of planaria, the occurrence and distribution of viruses in these animals remains an unexplored area of research.
Using a combination of Suppression Subtractive Hybridization (SSH) and Mirror Orientation Selection (MOS), we compared the genomes of two strains of freshwater planarian, Girardia tigrina. A novel extrachromosomal DNA-containing virus-like element, denoted PEVE (Planarian Extrachromosomal Virus-like Element), was identified in one planarian strain. The PEVE genome (about 7.5 kb) consists of two unique regions (Ul and Us) flanked by inverted repeats. Sequence analyses reveal that the PEVE genome contains two helicase-like sequences, of which the first is a homolog of a circoviral replication initiator protein (Rep), and the second is similar to the papillomavirus E1 helicase domain. The PEVE genome exists in at least two variant forms with different arrangements of single-stranded and double-stranded DNA stretches that correspond to the Us and Ul regions. Using PCR analysis and whole-mount in situ hybridization, we characterized PEVE distribution and expression in the planarian body.
PEVE is the first viral element identified in free-living flatworms. This element differs from all known viruses and viral elements, and comprises two potential helicases that are homologous to proteins from distant viral phyla. PEVE is unevenly distributed in the worm body, and is detected in specific parenchyma cells.
PMCID: PMC116598  PMID: 12065025
24.  Complex splicing pattern generates great diversity in human NF1 transcripts 
BMC Genomics  2002;3:13.
Mutation analysis of the neurofibromatosis type 1 (NF1) gene has shown that about 30% of NF1 patients carry a splice mutation resulting in the production of one or several shortened transcripts. Some of these transcripts were also found in fresh lymphocytes of healthy individuals, albeit typically at a very low level. Starting from this initial observation, we sought further insight into the complex nature of NF1 mRNA processing.
We have used an RT-PCR plasmid library based method to identify novel NF1 splice variants. Several transcripts were observed with specific insertions/deletions and a survey was made. This large group of variants detected in one single gene allows a comparative analysis of the factors involved in splice regulation. Exons that are prone to skipping were systematically analysed for 5' and 3' splice site strength, branch point strength and secondary structure.
Our study revealed a complex splicing pattern, generating a great diversity in NF1 transcripts. We found that, on average, exons that are spliced out in part of the mRNA have significantly weaker acceptor sites. Some variants identified in this study could have distinct roles and might expand our knowledge of neurofibromin.
PMCID: PMC115845  PMID: 12057013
25.  Efficacy of SSH PCR in isolating differentially expressed genes 
BMC Genomics  2002;3:12.
Suppression Subtractive Hybridization PCR (SSH PCR) is a sophisticated cDNA subtraction method to enrich and isolate differentially expressed genes. Despite its popularity, the method has not been thoroughly studied for its practical efficacy and potential limitations.
To determine the factors that influence the efficacy of SSH PCR, a theoretical model is proposed under the assumption that cDNA hybridization follows ideal second-order kinetics. The theoretical model suggests that the critical factor influencing the efficacy of SSH PCR is the concentration ratio (R) of a target gene between two cDNA preparations. SSH PCR preferentially enriches "all or nothing" differentially expressed genes, for which R is infinite, and strongly favors genes with large R. The theoretical predictions were validated by our experiments. In addition, the experiments revealed some practical limitations that are not obvious from the theoretical model. For effective enrichment of differentially expressed genes, the method requires the fractional concentration of a target gene to exceed 0.01% and the concentration ratio between the two cDNA preparations to exceed 5-fold.
Our research demonstrated theoretical and practical limitations of SSH PCR, which could be useful for its experimental design and interpretation.
PMCID: PMC115870  PMID: 12033988
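The second-order kinetics assumption named in the preceding abstract can be made concrete with a small sketch. For a single species annealing with its complement under ideal second-order kinetics, dC/dt = -k·C², so the fraction still single-stranded after time t is 1 / (1 + k·C0·t). This is only the textbook rate law, not the authors' full SSH model. The function name, rate constant, and concentrations are illustrative assumptions.

```python
def single_stranded_fraction(c0: float, k: float, t: float) -> float:
    """Fraction of a species still single-stranded after time t.

    Ideal second-order kinetics: dC/dt = -k * C**2, which integrates to
    C(t) / C0 = 1 / (1 + k * C0 * t).
    c0: initial concentration, k: rate constant, t: hybridization time
    (units are arbitrary here but must be mutually consistent).
    """
    return 1.0 / (1.0 + k * c0 * t)


if __name__ == "__main__":
    k, t = 1.0, 10.0  # hypothetical values, chosen only for illustration
    for c0 in (0.01, 0.1, 1.0):
        print(c0, round(single_stranded_fraction(c0, k, t), 3))
```

Under this rate law, a species at higher starting concentration loses its single-stranded fraction faster. That is why relative concentrations matter so much in hybridization-based subtraction.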