
Results 1-5 (5)

1.  Association of prostate cancer risk with SNPs in regions containing androgen receptor binding sites captured by ChIP-on-chip analyses 
The Prostate  2011;72(4):376-385.
Genome-wide association studies (GWAS) have identified approximately three dozen single nucleotide polymorphisms (SNPs) consistently associated with prostate cancer (PCa) risk. Despite the reproducibility of these associations, the molecular mechanism for most of these SNPs has not been well elaborated as most lie within non-coding regions of the genome. Androgens play a key role in prostate carcinogenesis. Recently, using ChIP-on-chip technology, 22,447 androgen receptor (AR) binding sites have been mapped throughout the genome, greatly expanding the genomic regions potentially involved in androgen-mediated activity.
Methodology/Principal findings
To test the hypothesis that sequence variants in AR binding sites are associated with PCa risk, we performed a systematic evaluation in two existing PCa GWAS cohorts: the Johns Hopkins Hospital and the Cancer Genetic Markers of Susceptibility (CGEMS) study populations. We demonstrate that regions containing AR binding sites are significantly enriched for PCa risk-associated SNPs, that is, they contain more such SNPs than expected by chance alone. In addition, compared with the entire genome, the newly observed risk-associated SNPs in these regions are significantly more likely to overlap with established PCa risk-associated SNPs from previous GWAS. These results are consistent with our previous finding from a bioinformatics analysis that one-third of the 33 known PCa risk-associated SNPs discovered by GWAS are located in regions of the genome containing AR binding sites.
The results to date provide novel statistical evidence suggesting an androgen-mediated mechanism by which some PCa associated SNPs act to influence PCa risk. However, these results are hypothesis generating and ultimately warrant testing through in-depth molecular analyses.
PMCID: PMC3366362  PMID: 21671247
AR; prostate cancer; GWAS; pathway association study
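The enrichment claim in entry 1 (more risk-associated SNPs inside AR binding regions than chance would predict) can be illustrated with a one-sided hypergeometric test. This is only a minimal sketch: the abstract does not say which test the authors used, and the function name and the toy counts below are hypothetical, not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total_snps, snps_in_regions, total_hits, hits_in_regions):
    """One-sided hypergeometric tail: P(X >= hits_in_regions).

    total_snps       -- all SNPs tested genome-wide
    snps_in_regions  -- SNPs that fall inside AR binding regions
    total_hits       -- SNPs called risk-associated genome-wide
    hits_in_regions  -- risk-associated SNPs inside AR binding regions
    (All names are illustrative assumptions, not the paper's notation.)
    """
    upper = min(snps_in_regions, total_hits)
    denom = comb(total_snps, total_hits)
    tail = sum(
        comb(snps_in_regions, k) * comb(total_snps - snps_in_regions, total_hits - k)
        for k in range(hits_in_regions, upper + 1)
    )
    return tail / denom


# Toy counts, made up for illustration only.
p_value = hypergeom_enrichment_p(
    total_snps=1000, snps_in_regions=100, total_hits=50, hits_in_regions=12
)
print(p_value)
```

A small p-value here would indicate that the binding regions hold more hits than random placement of the hits would produce. A real analysis would also have to account for linkage disequilibrium between neighbouring SNPs, which this sketch ignores.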
2.  Evidence for two independent prostate cancer risk associated loci in the HNF1B gene at 17q12 
Nature genetics  2008;40(10):1153-1155.
A fine mapping study in the HNF1B gene at 17q12 among two study populations revealed a second prostate cancer locus, ~26 kb centromeric to the first known locus (rs4430796); these are separated by a recombination hotspot. A SNP in the second locus (rs11649743) was confirmed in five additional populations, and P = 1.7 × 10^-9 for an allelic test in the seven combined studies. The association at each SNP remains significant after adjusting for the other SNP.
PMCID: PMC3188432  PMID: 18758462
3.  Statistical Comparison Framework and Visualization Scheme for Ranking-Based Algorithms in High-Throughput Genome-Wide Studies 
Journal of Computational Biology  2009;16(4):565-577.
As a first step in analyzing high-throughput data in genome-wide studies, several algorithms are available to identify and prioritize candidate lists for downstream fine-mapping. The prioritized candidates could be differentially expressed genes, aberrations in comparative genomic hybridization studies, or single nucleotide polymorphisms (SNPs) in association studies. Different analysis algorithms are subject to various experimental artifacts and analytical features that lead to different candidate lists. However, little research has been carried out to theoretically quantify the consensus between different candidate lists and to compare the study-specific accuracy of the analytical methods based on a known reference candidate list. Within the context of genome-wide studies, we propose a generic mathematical framework to statistically compare ranked lists of candidates from different algorithms with each other or, if available, with a reference candidate list. To cope with the growing need for intuitive visualization of high-throughput data in genome-wide studies, we describe a complementary customizable visualization tool. As a case study, we demonstrate application of our framework to the comparison and visualization of candidate lists generated in a DNA-pooling-based genome-wide association study of CEPH data in the HapMap project, where prior knowledge from individual genotyping can be used to generate a true reference candidate list. The results provide a theoretical basis to compare the accuracy of various methods and to identify redundant methods, thus providing guidance for selecting the most suitable analysis method in genome-wide studies.
PMCID: PMC3148127  PMID: 19361328
genome-wide association studies; candidate lists
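The idea in entry 3 of comparing ranked candidate lists with each other, or with a reference list, can be sketched with two simple overlap measures. This is not the framework proposed in the paper, whose details the abstract does not give; the functions `top_k_overlap` and `recall_at_k` and the SNP labels are hypothetical.

```python
def top_k_overlap(ranked_a, ranked_b, k):
    """Fraction of the top-k candidates that two ranked lists share."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


def recall_at_k(ranked, reference, k):
    """Fraction of the reference candidates found in the top k of a ranked list."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference list must not be empty")
    return len(set(ranked[:k]) & reference) / len(reference)


# Illustrative candidate lists from two hypothetical algorithms.
method_a = ["snp1", "snp2", "snp3", "snp4"]
method_b = ["snp2", "snp1", "snp4", "snp3"]
reference = ["snp1", "snp4"]

print(top_k_overlap(method_a, method_b, 3))
print(recall_at_k(method_a, reference, 2))
```

Two methods whose top-k overlap stays near 1 across a range of k carry largely redundant information, while recall against a reference list gives a study-specific accuracy figure for each method.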
4.  Normalization Benefits Microarray-Based Classification 
When using cDNA microarrays, normalization to correct labeling bias is a common preliminary step before further data analysis is applied, its objective being to reduce the variation between arrays. To date, assessment of the effectiveness of normalization has mainly been confined to the ability to detect differentially expressed genes. Since a major use of microarrays is the expression-based phenotype classification, it is important to evaluate microarray normalization procedures relative to classification. Using a model-based approach, we model the systemic-error process to generate synthetic gene-expression values with known ground truth. These synthetic expression values are subjected to typical normalization methods and passed through a set of classification rules, the objective being to carry out a systematic study of the effect of normalization on classification. Three normalization methods are considered: offset, linear regression, and Lowess regression. Seven classification rules are considered: 3-nearest neighbor, linear support vector machine, linear discriminant analysis, regular histogram, Gaussian kernel, perceptron, and multiple perceptron with majority voting. The results of the first three are presented in the paper, with the full results being given on a complementary website. The conclusion from the different experiment models considered in the study is that normalization can have a significant benefit for classification under difficult experimental conditions, with linear and Lowess regression slightly outperforming the offset method.
PMCID: PMC3171318  PMID: 18427588
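Two of the three normalization methods named in entry 4, offset and linear regression, can be sketched as follows. The abstract does not define them, so this assumes the common two-channel formulation in which M is the log ratio of the two dye intensities and A is the mean log intensity; the function names are hypothetical, and Lowess regression is omitted for brevity.

```python
from statistics import mean


def offset_normalize(m):
    """Offset normalization: subtract one constant so the log ratios average zero."""
    shift = mean(m)
    return [v - shift for v in m]


def linear_normalize(a, m):
    """Linear-regression normalization: remove a least-squares line fitted to M versus A."""
    a_bar, m_bar = mean(a), mean(m)
    sxx = sum((x - a_bar) ** 2 for x in a)
    sxy = sum((x - a_bar) * (y - m_bar) for x, y in zip(a, m))
    slope = sxy / sxx
    intercept = m_bar - slope * a_bar
    return [y - (intercept + slope * x) for x, y in zip(a, m)]


# Toy data (made up): a labeling bias that grows linearly with intensity.
a_values = [6.0, 8.0, 10.0, 12.0]
m_values = [0.5 + 0.1 * x for x in a_values]

print(offset_normalize(m_values))
print(linear_normalize(a_values, m_values))
```

On this toy data the offset method centers the log ratios but leaves the intensity-dependent trend in place, whereas the regression method removes it, which is the kind of difference the study evaluates in terms of downstream classification error.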
5.  Noise-injected neural networks show promise for use on small-sample expression data 
BMC Bioinformatics  2006;7:274.
Overfitting the data is a salient issue for classifier design in small-sample settings. This is why selecting a classifier from a constrained family of classifiers, ones that do not possess the potential to partition the feature space too finely, is typically preferable. But overfitting is not merely a consequence of the classifier family; it is highly dependent on the classification rule used to design a classifier from the sample data. Thus, it is possible to consider families that are rather complex but for which there are classification rules that perform well for small samples. Such classification rules can be advantageous because they facilitate satisfactory classification when the class-conditional distributions are not easily separated and the sample is not large. Here we consider neural networks, from the perspectives of classical design based solely on the sample data and from noise-injection-based design.
This paper provides an extensive simulation-based comparative study of noise-injected neural-network design. It considers a number of different feature-label models across various small sample sizes using varying amounts of noise injection. Besides comparing noise-injected neural-network design to classical neural-network design, the paper compares it to a number of other classification rules. Our particular interest is with the use of microarray data for expression-based classification for diagnosis and prognosis. To that end, we consider noise-injected neural-network design as it relates to a study of survivability of breast cancer patients.
The conclusion is that in many instances noise-injected neural network design is superior to the other tested methods, and in almost all cases it does not perform substantially worse than the best of the other methods. Since the amount of noise injected is consequential, the effect of differing amounts of injected noise must be considered.
PMCID: PMC1524820  PMID: 16737545
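The noise-injection design discussed in entry 5 can be illustrated by augmenting a small training sample with perturbed copies of each point before a network is trained on it. This is a minimal sketch under stated assumptions: the abstract does not specify the noise distribution, so isotropic Gaussian noise is assumed, and `inject_noise`, `copies`, and `sigma` are hypothetical names.

```python
import random


def inject_noise(samples, labels, copies, sigma, seed=0):
    """Return an augmented training set of noisy copies of each sample.

    Each original feature vector yields `copies` perturbed vectors with
    independent Gaussian noise (standard deviation `sigma`) added to every
    feature; each copy keeps the label of the point it came from.
    """
    rng = random.Random(seed)
    aug_x, aug_y = [], []
    for x, y in zip(samples, labels):
        for _ in range(copies):
            aug_x.append([v + rng.gauss(0.0, sigma) for v in x])
            aug_y.append(y)
    return aug_x, aug_y


# Toy two-feature sample (made up for illustration).
train_x = [[0.1, 1.2], [0.4, 0.9], [2.1, 0.2]]
train_y = [0, 0, 1]

noisy_x, noisy_y = inject_noise(train_x, train_y, copies=5, sigma=0.3)
print(len(noisy_x), len(noisy_y))
```

The value of `sigma` plays the role of the amount of injected noise that the abstract describes as consequential: too little leaves the overfitting problem untouched, and too much blurs the class boundaries.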