1.  Comparison of Assembled Clostridium botulinum A1 Genomes Revealed Their Evolutionary Relationship 
Genomics  2013;103(1):94-106.
Clostridium botulinum encompasses bacteria that produce at least one of the seven serotypes of botulinum neurotoxin (BoNT/A-G). The availability of genome sequences of four closely related Type A1 or A1(B) strains, as well as the A1-specific microarray, allowed the analysis of their genomic organizations and evolutionary relationship. The four genomes share >90% core genes and >96% functional groups. Phylogenetic analysis based on COG shows closer relations of the A1(B) strain, NCTC 2916, to B1 and F1 than A1 strains. Alignment of the genomes of the three A1 strains revealed a highly similar chromosomal structure with three small gaps in the genome of ATCC 19397 and one additional gap in the genome of Hall A, suggesting ATCC 19397 as an evolutionary intermediate between Hall A and ATCC 3502. Analyses of the four gap regions indicated potential horizontal gene transfer and recombination events important for the evolution of A1 strains.
PMCID: PMC3959226  PMID: 24369123
Clostridium botulinum; genomic comparison; microarray
2.  Comparative gene expression analysis in mouse models for multiple sclerosis, Alzheimer’s disease and stroke for identifying commonly regulated and disease-specific gene changes 
Genomics  2010;96(2):82-91.
The brain responds to injury and infection by activating innate defense and tissue repair mechanisms. Working upon the hypothesis that the brain defense response involves common genes and pathways across diverse pathologies, we analysed global gene expression in brain from mouse models representing three major central nervous system disorders, cerebral stroke, multiple sclerosis and Alzheimer’s disease compared to normal brain using DNA microarray expression profiling. A comparison of dysregulated genes across disease models revealed common genes and pathways including key components of estrogen and TGF-β signaling pathways that have been associated with neuroprotection as well as a neurodegeneration mediator, TRPM7. Further, for each disease model, we discovered collections of differentially expressed genes that provide novel insight into the individual pathology and its associated mechanisms. Our data provide a resource for exploring the complex molecular mechanisms that underlie brain neurodegeneration and a new approach for identifying generic and disease-specific targets for therapy.
PMCID: PMC4205236  PMID: 20435134
cDNA microarrays; Brain inflammation; Cerebral stroke; Alzheimer’s disease; Systems biology; Gene expression profiling
3.  Whole exome sequencing reveals minimal differences between cell line and whole blood derived DNA 
Genomics  2013;102(4):10.1016/j.ygeno.2013.05.005.
Two common sources of DNA for whole exome sequencing (WES) are whole blood (WB) and immortalized lymphoblastoid cell line (LCL). However, it is possible that LCLs have a substantially higher rate of mutation than WB, causing concern for their use in sequencing studies. We compared results from paired WB and LCL DNA samples for 16 subjects, using LCLs of low passage number (<5). Using a standard analysis pipeline we detected a large number of discordant genotype calls (approximately 50 per subject) that we segregated into categories of “confidence” based on read-level quality metrics. From these categories and validation by Sanger sequencing, we estimate that the vast majority of the candidate differences were false positives and that our categories were effective in predicting valid sequence differences, including LCLs with putative mosaicism for the non-reference allele (3–4 per exome). These results validate the use of DNA from LCLs of low passage number for exome sequencing.
PMCID: PMC3812417  PMID: 23743231
graphical diagnostics; lymphoblastoid cell line; mosaicism; sequence variant call; strand bias; somatic mutation
4.  Identification of gene–environment interactions in cancer studies using penalization 
Genomics  2013;102(4):10.1016/j.ygeno.2013.08.006.
High-throughput cancer studies have been extensively conducted, searching for genetic markers associated with outcomes beyond clinical and environmental risk factors. Gene–environment interactions can have important implications beyond main effects. The commonly adopted single-marker analysis cannot accommodate the joint effects of a large number of markers. The existing joint-effects methods also have limitations. Specifically, they may suffer from high computational cost, fail to respect the “main effect, interaction” hierarchical structure, or use ineffective techniques. We develop a penalization method for the identification of important G × E interactions and main effects. It has an intuitive formulation, respects the hierarchical structure, accommodates the joint effects of multiple markers, and is computationally affordable. In a numerical study, we analyze prognosis data under the AFT (accelerated failure time) model. Simulation shows satisfactory performance of the proposed method. Analysis of an NHL (non-Hodgkin lymphoma) study with SNP measurements shows that the proposed method identifies markers with important implications and satisfactory prediction performance.
PMCID: PMC3869641  PMID: 23994599
Gene–environment interaction; Penalized marker identification; Cancer prognosis
5.  Genome-wide transcriptional profiling and enrichment mapping reveal divergent and conserved roles of Sko1 in the Candida albicans osmotic stress response 
Genomics  2013;102(4):363-371.
Candida albicans maintains both commensal and pathogenic states in humans. Here, we have defined the genomic response to osmotic stress mediated by transcription factor Sko1. We performed microarray analysis of a sko1Δ/Δ mutant strain subjected to osmotic stress, and we utilized gene sequence enrichment analysis and enrichment mapping to identify Sko1-dependent osmotic stress-response genes. We found that Sko1 regulates distinct gene classes with functions in ribosomal synthesis, mitochondrial function, and vacuolar transport. Our in silico analysis suggests that Sko1 may recognize two unique DNA binding motifs. Our C. albicans genomic analyses and complementation studies in Saccharomyces cerevisiae showed that Sko1 is conserved as a regulator of carbohydrate metabolism, redox metabolism, and glycerol synthesis. Further, our real time-qPCR results showed that osmotic stress-response genes that are dependent on the kinase Hog1 also require Sko1 for full expression. Our findings reveal divergent and conserved aspects of Sko1-dependent osmotic stress signaling.
PMCID: PMC3907168  PMID: 23773966
Yeast; Transcription factor; SKO1; Osmotic stress; Enrichment mapping
6.  REViewer: A tool for linear visualization of repetitive elements within a sequence query 
Genomics  2013;102(4):10.1016/j.ygeno.2013.07.008.
A species-specific population of arrangements of repetitive elements (REs), called RE arrays, exists in the human and mouse genomes. We developed an RE analytical tool, named REViewer, for visualizing RE occurrences within RE arrays and other genomic regions as an interactive line map. REViewer utilizes an RE reference library which is established with two RE types: 1) REMiner-generated undefined REs and 2) RepeatMasker-derived defined REs. RE occurrences within queries are visualized as a line map using these two RE types. The REViewer’s controller provides analytical options, such as zoom, customization of axis unit, and RE type selection. The functionality of REViewer was evaluated using the human chromosome Y sequence. The REViewer is determined to be an efficient tool that facilitates visualization of up to 6000 REs in RE arrays and other genomic regions. The maximum query size is linked to the RE mining tools (e.g., REMiner, RepeatMasker), not to REViewer.
PMCID: PMC3819206  PMID: 23891933
7.  Comparative genome-wide transcriptional analysis of human left and right internal mammary arteries 
Genomics  2014;104(1):36-44.
In coronary artery bypass grafting (CABG), the combined use of left and right internal mammary arteries (LIMA and RIMA), collectively known as bilateral IMAs (BIMAs), provides a survival advantage over the use of LIMA alone. However, gene expression in RIMA has never been compared to that in LIMA. Here we report a genome-wide transcriptional analysis of BIMA to investigate the expression profiles of these conduits in patients undergoing CABG. As expected, in comparing the BIMAs to the aorta, we found differences in pathways and processes associated with atherosclerosis, inflammation, and cell signaling, pathways which provide biological support for the observation that BIMA grafts deliver long-term benefits to the patients and protect against continued atherosclerosis. These data support the widespread use of BIMAs as the preferred conduits in CABG.
PMCID: PMC4152843  PMID: 24858532
Cardiovascular disease; Transcriptome analysis; Genomics; Coronary artery bypass grafting
8.  Left ventricular global transcriptional profiling in human end-stage dilated cardiomyopathy 
Genomics  2009;94(1):20-31.
We employed ABI high-density oligonucleotide microarrays containing 31,700 sixty-mer probes (representing 27,868 annotated human genes) to determine differential gene expression in idiopathic dilated cardiomyopathy (DCM). We identified 626 up-regulated and 636 down-regulated genes in DCM compared to controls. Most significant changes occurred in the tricarboxylic acid cycle, angiogenesis, and apoptotic signaling pathways, among which 32 apoptosis- and 13 MAPK activity-related genes were altered. Inorganic cation transporter, catalytic activities, energy metabolism and electron transport-related processes were among the most critically influenced pathways. Among the up-regulated genes were HTRA1 (6.9-fold), PDCD8(AIFM1) (5.2) and PRDX2 (4.4) and the down-regulated genes were NR4A2 (4.8), MX1 (4.3), LGALS9 (4), IFNA13 (4), UNC5D (3.6) and HDAC2 (3) (p < 0.05), all of which have no clearly defined cardiac-related function yet. Gene ontology and enrichment analysis also revealed significant alterations in mitochondrial oxidative phosphorylation, metabolism and Alzheimer’s disease pathways. Concordance was also confirmed for a significant number of genes and pathways in an independent validation microarray dataset. Furthermore, verification by real-time RT-PCR showed a high degree of consistency with the microarray results. Our data demonstrate an association of DCM with alterations in various cellular events and multiple yet undeciphered genes that may contribute to heart muscle disease pathways.
PMCID: PMC4152850  PMID: 19332114
Idiopathic dilated cardiomyopathy; Global gene expression; Microarray gene expression; Gene up/down-regulation; Mitochondrial function; Apoptotic signaling
9.  Transcriptome dynamics during human erythroid differentiation and development 
Genomics  2013;102(0):431-441.
To explore the mechanisms controlling erythroid differentiation and development, we analyzed the genome-wide transcription dynamics occurring during the differentiation of human embryonic stem cells (HESCs) into the erythroid lineage and development of embryonic to adult erythropoiesis using high throughput sequencing technology. HESCs and erythroid cells at three developmental stages: ESER (embryonic), FLER (fetal), and PBER (adult) were analyzed. Our findings revealed that the number of expressed genes decreased during differentiation, whereas the total expression intensity increased. At each of the three transitions (HESCs–ESERs, ESERs–FLERs, and FLERs–PBERs), many differentially expressed genes were observed, which were involved in maintaining pluripotency, early erythroid specification, rapid cell growth, and cell–cell adhesion and interaction. We also discovered dynamic networks and their central nodes in each transition. Our study provides a fundamental basis for further investigation of erythroid differentiation and development, and has implications in using ESERs for transfusion product in clinical settings.
PMCID: PMC4151266  PMID: 24121002
High-throughput RNA sequencing; Erythropoiesis; Cell differentiation; Development; Gene regulatory networks
10.  Mutational analysis clopidogrel resistance and platelet function in patients scheduled for coronary artery bypass grafting 
Genomics  2013;101(6):313-317.
Clopidogrel is an oral antiplatelet pro-drug prescribed to 40 million patients worldwide who are at risk for thrombotic events or receiving percutaneous coronary intervention (PCI). However, about a fifth of patients treated with clopidogrel do not respond adequately to the drug. From a cohort of 105 patients on whom we had functional data on clopidogrel response, we used ultra-high throughput sequencing to assay mutations in CYP2C19 and ABCB1, the two genes genetically linked to response. Testing for mutations in CYP2C19, as recommended by the FDA, correctly predicted whether a patient would respond to clopidogrel only 52.4% of the time. Similarly, testing of the ABCB1 gene correctly foretold response in only 51 (48.6%) patients. These results are clinically relevant and suggest that until additional genetic factors are discovered that predict response more completely, functional assays are more appropriate for clinical use.
PMCID: PMC4149181  PMID: 23462555
Cardiovascular surgery; Ultra-high throughput sequencing; Gene polymorphisms; Platelets
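The prediction rates quoted in entry 10 can be checked with simple arithmetic. The sketch below is only a consistency check: the cohort size (105) and the ABCB1 count (51) come from the abstract, while the CYP2C19 patient count is not stated there and is inferred here from the reported 52.4%.

```python
# Cohort size stated in the abstract of entry 10.
COHORT = 105

def percent_correct(n_correct: int, n_total: int = COHORT) -> float:
    """Return the percentage of patients whose response was predicted correctly."""
    return round(100 * n_correct / n_total, 1)

# ABCB1: the abstract reports 51 correctly predicted patients.
abcb1_rate = percent_correct(51)

# CYP2C19: the count is an inference from the reported 52.4%, not a stated figure.
cyp2c19_correct = round(0.524 * COHORT)
cyp2c19_rate = percent_correct(cyp2c19_correct)

print(abcb1_rate, cyp2c19_correct, cyp2c19_rate)
```

Under this assumption, both rates sit close to 50%, which is the point the abstract makes about the limited predictive value of either single-gene test in this cohort.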
11.  Increased CNV-Region deletions in mild cognitive impairment (MCI) and Alzheimer's disease (AD) subjects in the ADNI sample 
Genomics  2013;102(2):112-122.
We investigated the genome-wide distribution of CNVs in the Alzheimer's disease (AD) Neuroimaging Initiative (ADNI) sample (146 with AD, 313 with Mild Cognitive Impairment (MCI), and 181 controls). Comparison of single CNVs between cases (MCI and AD) and controls shows overrepresentation of large heterozygous deletions in cases (p-value < 0.0001). The analysis of CNV-Regions identifies 44 copy number variable loci of heterozygous deletions, with more CNV-Regions among affected than controls (p = 0.005). Seven of the 44 CNV-Regions are nominally significant for association with cognitive impairment. We validated and confirmed our main findings with genome re-sequencing of selected patients and controls. The functional pathway analysis of the genes putatively affected by deletions of CNV-Regions reveals enrichment of genes implicated in axonal guidance, cell–cell adhesion, neuronal morphogenesis and differentiation. Our findings support the role of CNVs in AD, and suggest an association between large deletions and the development of cognitive impairment.
PMCID: PMC4012421  PMID: 23583670
Alzheimer's disease; Copy Number Variable Regions (CNV-Regions); Copy Number Variations (CNVs); Genome-wide scan; Next Generation Sequencing (NGS)
12.  Behavioral and psychosocial responses to genomic testing for colorectal cancer risk 
Genomics  2013;102(2):123-130.
We conducted a translational genomics pilot study to evaluate the impact of genomic information related to colorectal cancer (CRC) risk on psychosocial, behavioral and communication outcomes. In 47 primary care participants, 96% opted for testing of three single nucleotide polymorphisms (SNPs) related to CRC risk. Participants averaged 2.5 of 6 possible SNP risk alleles (10% lifetime risk). At 3-months, participants did not report significant increases in cancer worry/distress; over half reported physical activity and dietary changes. SNP risk scores were unrelated to behavior change at 3-months. Many participants (64%) shared their SNP results, including 28% who shared results with a physician. In this pilot, genomic risk education, including discussion of other risk factors, appeared to impact patients' health behaviors, regardless of the level of SNP risk. Future work can compare risk education with and without SNP results to evaluate if SNP information adds value to existing approaches.
PMCID: PMC3729872  PMID: 23583311
translational genomic research; SNP testing; colorectal cancer risk; genomics education; behavior change
13.  Next-generation sequencing and microarray-based interrogation of microRNAs from formalin-fixed, paraffin-embedded tissue: Preliminary assessment of cross-platform concordance 
Genomics  2013;102(1):8-14.
Next-generation sequencing is increasingly employed in biomedical investigations. Strong concordance between microarray and mRNA-seq levels has been reported in high quality specimens, but information is lacking on formalin-fixed, paraffin-embedded (FFPE) tissues, and particularly for microRNA (miRNA) analysis. We conducted a preliminary examination of the concordance between miRNA-seq and cDNA-mediated annealing, selection, extension, and ligation (DASL) miRNA assays. Quantitative agreement between platforms is moderate (Spearman correlation 0.514–0.596) and there is discordance of detection calls on a subset of miRNAs. Quantitative PCR (q-RT-PCR) performed for several discordant miRNAs confirmed the presence of most sequences detected by miRNA-seq but not by DASL; it also showed that miRNA-seq did not detect some sequences that DASL confidently detected. Our results suggest that miRNA-seq is specific, with few false positive calls, but it may not detect certain abundant miRNAs in FFPE tissue. Further work is necessary to fully address these issues that are pertinent for translational research.
PMCID: PMC4116671  PMID: 23562991
RNA-sequencing; microarrays; paraffin-tissue
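Entry 13 summarizes cross-platform agreement with a Spearman correlation. As a minimal sketch of how that statistic is computed (Pearson correlation applied to ranks, with tied values given their average rank), the example below uses hypothetical expression values; the names `mirna_seq` and `dasl` and all numbers are illustrative assumptions, not data from the study.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for five miRNAs on two platforms (not study data).
mirna_seq = [120.0, 15.0, 300.0, 42.0, 8.0]
dasl = [9.1, 7.5, 11.8, 6.2, 7.0]
print(spearman(mirna_seq, dasl))
```

Because only ranks enter the calculation, the statistic is insensitive to the different measurement scales of the two platforms, which is presumably why a rank-based measure suits this kind of comparison.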
14.  A tiered hidden Markov model characterizes multi-scale chromatin states 
Genomics  2013;102(1):1-7.
Precise characterization of chromatin states is an important but difficult task for understanding the regulatory role of chromatin. A number of computational methods have been developed with varying levels of success. However, a remaining challenge is to model epigenomic patterns over multi-scales, as each histone mark is distributed with its own characteristic length scale. We developed a tiered hidden Markov model and applied it to analyze a ChIP-seq dataset in human embryonic stem cells. We identified a two-tier structure containing 15 distinct bin-level chromatin states grouped into three domain-level states. Whereas the bin-level states capture the local variation of histone marks, the domain-level states detect large-scale variations. Compared to bin-level states, the domain-level states are more robust and coherent. We also found active regions in intergenic regions that upon closer examination were expressed non-coding RNAs and pseudogenes. These results provide insights into an additional layer of complexity in chromatin organization.
PMCID: PMC3676702  PMID: 23570996
hidden Markov model; chromatin; computational biology
Genomics  2011;98(4):233-241.
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it is not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
PMCID: PMC4074010  PMID: 21839162
Sequencing; functional analysis; computer modeling; genomic variation
16.  Single Nucleotide Polymorphisms Affect both Cis- and Trans-eQTLs 
Genomics  2009;93(6):501-508.
Single Nucleotide Polymorphisms (SNPs) between microarray probes and RNA targets can affect the performance of expression arrays by weakening the hybridization. In this paper, we examined the effect of the SNPs on Affymetrix GeneChip probe set summaries and the expression quantitative trait loci (eQTL) mapping results in two eQTL datasets, one from mouse and one from human. We showed that removing SNP-containing probes significantly changed the probe set summaries, and the more SNP-containing probes we removed, the greater the change. Comparison of the eQTL mapping results obtained with and without SNP-containing probes showed that less than 70% of the significant eQTL peaks were concordant regardless of the significance threshold. These results indicate that SNPs do affect both probe set summaries and eQTLs (both cis and trans); thus SNP-containing probes should be filtered out to improve the performance of eQTL mapping.
PMCID: PMC4041081  PMID: 19248827
microarray; SNP; eQTL; cis-eQTL; trans-eQTL; mouse; human
17.  Gene expression analysis uncovers novel Hedgehog interacting protein (HHIP) effects in human bronchial epithelial cells 
Genomics  2013;101(5):263-272.
Hedgehog Interacting Protein (HHIP) was implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS). However, it remains unclear how HHIP contributes to COPD pathogenesis. To identify genes regulated by HHIP, we performed gene expression microarray analysis in a human bronchial epithelial cell line (Beas-2B) stably infected with HHIP shRNAs. HHIP silencing led to differential expression of 296 genes; enrichment for variants nominally associated with COPD was found. Eighteen of the differentially expressed genes were validated by real-time PCR in Beas-2B cells. Seven of 11 validated genes tested in human COPD and control lung tissues demonstrated significant gene expression differences. Functional annotation indicated enrichment for extracellular matrix and cell growth genes. Network modeling demonstrated that the extracellular matrix and cell proliferation genes influenced by HHIP tended to be interconnected. Thus, we identified potential HHIP targets in human bronchial epithelial cells that may contribute to COPD pathogenesis.
PMCID: PMC3659826  PMID: 23459001
Hedgehog interacting protein (HHIP); Gene expression profiling; COPD (Chronic obstructive pulmonary disease); extracellular matrix (ECM); network modeling
18.  Linkage Disequilibrium Mapping in Domestic Dog Breeds Narrows the Progressive Rod-Cone Degeneration (prcd) Interval and Identifies Ancestral Disease Transmitting Chromosome 
Genomics  2006;88(5):541-550.
Canine progressive rod-cone degeneration (prcd) is a retinal disease previously mapped to a broad, gene-rich centromeric region of canine chromosome 9. As allelic disorders are present in multiple breeds, we used linkage disequilibrium (LD) to narrow the ∼6.4 Mb candidate interval. Multiple dog breeds, each representing genetically isolated populations, were typed for SNPs and other polymorphisms identified from BACs. The candidate region was initially localized to a 1.5 Mb zero recombination interval between growth factor receptor-bound protein 2 (GRB2) and SEC14-like 1 (SEC14L). A fine-scale haplotype of the region was developed which reduced the LD interval to 106 Kb, and identified a conserved haplotype of 98 polymorphisms present in all prcd-affected chromosomes from 14 different dog breeds. The findings strongly suggest that a common ancestor transmitted the prcd disease allele to many of the modern dog breeds, and demonstrate the power of the LD approach in the canine model.
PMCID: PMC4006154  PMID: 16859891
Disease Models; Animal; Genetic Diversity; Genetic Linkage; Genetic Markers; Genetic Predisposition to Disease; Genetic Variation; Retinal Degeneration
19.  Identical Mutation in a Novel Retinal Gene Causes Progressive Rod-Cone Degeneration (prcd) in Dogs and Retinitis Pigmentosa in Man 
Genomics  2006;88(5):551-563.
Progressive rod-cone degeneration (prcd) is a late-onset, autosomal recessive photoreceptor degeneration of dogs, and a homolog for some forms of human retinitis pigmentosa (RP). Previously, the disease relevant interval was reduced to a 106 Kb region on CFA9, and a common phenotype-specific haplotype was identified in all affected dogs from several different breeds, and breed varieties. Screening of a canine retinal EST library identified partial cDNAs for novel candidate genes in the disease relevant interval. The complete cDNA of one of these, PRCD, was cloned in dog, human and mouse. The gene codes for a 54 amino acid (aa) protein in dog and human, and 53 aa protein in the mouse; the first 24 aa, coded for by exon 1, are highly conserved in 14 vertebrate species. A homozygous mutation (TGC → TAC) in the second codon shows complete concordance with the disorder in 18 different dog breeds/breed varieties tested. The same homozygous mutation was identified in a human patient from Bangladesh with autosomal recessive (ar) RP. Expression studies support the predominant expression of this gene in the retina, with equal expression in the retinal pigment epithelium (RPE), photoreceptors and ganglion cell layers. This study provides strong evidence that a mutation in the novel gene, PRCD, is the cause of autosomal recessive retinal degeneration in both dogs and man.
PMCID: PMC3989879  PMID: 16938425
Dogs; Disease Models, Animal; Genetic diversity; Genetic linkage; Genetic markers; Genetic predisposition to disease; Genetic variation; Mutation; Retinal Degeneration; Retinitis Pigmentosa
20.  Intrachromosomal homologous recombination between inverted amplicons on opposing Y-chromosome arms 
Genomics  2013;102(4):10.1016/j.ygeno.2013.04.018.
Amplicons – large, nearly identical repeats in direct or inverted orientation – are abundant in the male-specific region of the human Y chromosome (MSY) and provide targets for intrachromosomal non-allelic homologous recombination (NAHR). Thus far, NAHR events resulting in deletions, duplications, inversions, or isodicentric chromosomes have been reported only for amplicon pairs located exclusively on the short arm (Yp) or the long arm (Yq). Here we report our finding of four men with Y chromosomes that evidently formed by intrachromosomal NAHR between inverted repeat pairs comprising one amplicon on Yp and one amplicon on Yq. In two men with spermatogenic failure, sister-chromatid crossing-over resulted in pseudoisoYp chromosome formation and loss of distal Yq. In two men with normal spermatogenesis, intrachromatid crossing-over generated pericentric inversions. These findings highlight the recombinogenic nature of the MSY, as intrachromosomal NAHR occurs for nearly all Y-chromosome amplicon pairs, even those located on opposing chromosome arms.
PMCID: PMC3785290  PMID: 23643616
Human Y chromosome; Non-allelic homologous recombination; Structural variation; Male infertility
21.  Pathway-Based Genetic Analysis of Preterm Birth 
Genomics  2013;101(3):163-170.
The rate of preterm birth in the United States is now 12%. Multiple genes, gene networks, and variants have been associated with this disease. Using a custom database for preterm birth (dbPTB) with a refined set of genes extensively curated from literature and biological databases, we analyzed a GWAS of preterm birth for complete genotype data on nearly 2000 preterm and term mothers. We used both the curated genes and a genome-wide approach to carry out a pathway-based analysis. Nineteen significant pathways that withstood FDR correction for multiple testing were identified using both the curated genes and the genome-wide approach. The analysis based on the curated genes was more significant than the genome-wide analysis in 15 out of 19 pathways. This approach demonstrates the use of a validated set of genes, in the analysis of otherwise unsuccessful GWAS data, to identify gene-gene interactions in a way that enhances statistical power and discovery.
PMCID: PMC3570639  PMID: 23298525
Preterm birth; Pathway analysis; GWAS
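Entry 21 reports pathways that withstood FDR correction but does not say which procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used FDR method; the function name `benjamini_hochberg`, the threshold `alpha=0.05`, and the p-values are all assumptions chosen for the example, not details from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose sorted rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical pathway p-values (not taken from the study).
pathway_p = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(pathway_p))
```

The step-up rule rejects all hypotheses ranked at or below the largest qualifying rank, even if some intermediate p-value misses its own threshold, which makes the procedure less conservative than a Bonferroni correction.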
22.  09/15: Comparative Genomics of a Conserved Chromosomal Region associated with a Complex Human Phenotype 
Genomics  2001;73(2):171-178.
Three genes that encode related Ig-superfamily molecules have recently been mapped to human chromosome 15 in the region q22.3-23, and to the syntenic region on mouse chromosome 9. These genes presumably derived from gene duplications and they are highly similar to Deleted in Colorectal Cancer (DCC), which functions as an axon guidance molecule during development of the nervous system. In order to find out whether additional genes of this class were present in a chromosomal cluster, we produced a comparative physical map within the region of synteny between mouse chromosome 9 and human chromosome 15. This interval overlaps the critical region for the fourth genetic locus for Bardet-Biedl Syndrome (BBS4) in humans. Bardet-Biedl Syndrome (OMIM 600374) is characterized by poly/syn/brachydactyly, retinal degeneration, hypogonadism, mental retardation, obesity, diabetes, and kidney abnormalities. A detailed map of this locus will help to identify candidate genes for this disorder.
PMCID: PMC3938171  PMID: 11318607
23.  Identifying human-rhesus macaque gene orthologs using heterospecific SNP probes 
Genomics  2012;101(1):30-37.
We genotyped a Chinese and an Indian-origin rhesus macaque using the Affymetrix Genome-Wide Human SNP Array 6.0 and catalogued 85,473 uniquely mapping heterospecific SNPs. These SNPs were assigned to rhesus chromosomes according to their probe sequence alignments as displayed in the human and rhesus reference sequences. The conserved gene order (synteny) revealed by heterospecific SNP maps is in concordance with that of the published human and rhesus macaque genomes.
Using these SNPs’ original human rs numbers, we identified 12,328 genes annotated in humans that are associated with these SNPs, 3,674 of which were found in at least one of the two rhesus macaques studied. Due to their density, the heterospecific SNPs allow fine-grained comparisons, including approximate boundaries of intra- and extra-chromosomal rearrangements involving gene orthologs, which can be used to distinguish rhesus macaque chromosomes from human chromosomes.
PMCID: PMC3534948  PMID: 22982528
Macaca mulatta; single nucleotide polymorphisms (SNPs); Homo sapiens; heterospecific sequence maps
24.  Gene set analysis of genome-wide association studies: methodological issues and perspectives 
Genomics  2011;98(1):10.1016/j.ygeno.2011.04.006.
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
PMCID: PMC3852939  PMID: 21565265
Genome-wide association study; Gene set; Pathway; Gene-set enrichment analysis; Statistical significance; Complex disease
25.  A single-sample microarray normalization method to facilitate personalized-medicine workflows 
Genomics  2012;100(6):337-344.
Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.
PMCID: PMC3508193  PMID: 22959562
Method; normalization; microarray; linear model; mixture model; single-sample technique

