1.  SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data 
BMC Genomics  2014;15:162.
Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data.
We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline.
Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
PMCID: PMC3945939  PMID: 24571581
Polymorphisms; Linkage disequilibrium; Maximum likelihood
2.  Ancient Gene Duplicates in Gossypium (Cotton) Exhibit Near-Complete Expression Divergence 
Genome Biology and Evolution  2014;6(3):559-571.
Whole genome duplication (WGD) is widespread in flowering plants and is a driving force in angiosperm diversification. The redundancy introduced by WGD allows the evolution of novel gene interactions and functions, although the patterns and processes of diversification are poorly understood. We identified ∼2,000 pairs of paralogous genes in Gossypium raimondii (cotton) resulting from an approximately 60 My old 5- to 6-fold ploidy increase. Gene expression analyses revealed that, in G. raimondii, 99.4% of the gene pairs exhibit differential expression in at least one of the three tissues (petal, leaf, and seed), with 93% to 94% exhibiting differential expression on a per-tissue basis. For 1,666 (85%) pairs, differential expression was observed in all tissues. These observations were mirrored in a time series of G. raimondii seed, and separately in leaf, petal, and seed of G. arboreum, indicating expression level diversification before species divergence. A generalized linear model revealed 92.4% of the paralog pairs exhibited expression divergence, with most exhibiting significant gene and tissue interactions indicating complementary expression patterns in different tissues. These data indicate massive, near-complete expression level neo- and/or subfunctionalization among ancient gene duplicates, suggesting these processes are essential in their maintenance over ∼60 Ma.
PMCID: PMC3971588  PMID: 24558256
3.  ADCK4 mutations promote steroid-resistant nephrotic syndrome through CoQ10 biosynthesis disruption  
The Journal of Clinical Investigation  2013;123(12):5179-5189.
Identification of single-gene causes of steroid-resistant nephrotic syndrome (SRNS) has furthered the understanding of the pathogenesis of this disease. Here, using a combination of homozygosity mapping and whole human exome resequencing, we identified mutations in the aarF domain containing kinase 4 (ADCK4) gene in 15 individuals with SRNS from 8 unrelated families. ADCK4 was highly similar to ADCK3, which has been shown to participate in coenzyme Q10 (CoQ10) biosynthesis. Mutations in ADCK4 resulted in reduced CoQ10 levels and reduced mitochondrial respiratory enzyme activity in cells isolated from individuals with SRNS and transformed lymphoblasts. Knockdown of adck4 in zebrafish and Drosophila recapitulated nephrotic syndrome-associated phenotypes. Furthermore, ADCK4 was expressed in glomerular podocytes and partially localized to podocyte mitochondria and foot processes in rat kidneys and cultured human podocytes. In human podocytes, ADCK4 interacted with members of the CoQ10 biosynthesis pathway, including COQ6, which has been linked with SRNS and COQ7. Knockdown of ADCK4 in podocytes resulted in decreased migration, which was reversed by CoQ10 addition. Interestingly, a patient with SRNS with a homozygous ADCK4 frameshift mutation had partial remission following CoQ10 treatment. These data indicate that individuals with SRNS with mutations in ADCK4 or other genes that participate in CoQ10 biosynthesis may be treatable with CoQ10.
PMCID: PMC3859425  PMID: 24270420
4.  NADPH oxidase complex and IBD candidate gene studies: identification of a rare variant in NCF2 that results in reduced binding to RAC2 
Gut  2011;61(7):1028-1035.
The NOX2 NADPH oxidase complex produces reactive oxygen species and plays a critical role in the killing of microbes by phagocytes. Genetic mutations in genes encoding components of the complex result in both X-linked and autosomal recessive forms of chronic granulomatous disease (CGD). Patients with CGD often develop intestinal inflammation that is histologically similar to Crohn's colitis, suggesting a common aetiology for both diseases. The aim of this study is to determine if polymorphisms in NOX2 NADPH oxidase complex genes that do not cause CGD are associated with the development of inflammatory bowel disease (IBD).
Direct sequencing and candidate gene approaches were used to identify susceptibility loci in NADPH oxidase complex genes. Functional studies were carried out on identified variants. Novel findings were replicated in independent cohorts.
Sequence analysis identified a novel missense variant in the neutrophil cytosolic factor 2 (NCF2) gene that is associated with very early onset IBD (VEO-IBD) and was subsequently found in 4% of patients with VEO-IBD compared with 0.2% of controls (p=1.3×10^-5, OR 23.8 (95% CI 3.9 to 142.5); Fisher exact test). This variant reduced binding of the NCF2 gene product p67phox to RAC2. This study found a novel genetic association of RAC2 with Crohn's disease (CD) and replicated the previously reported association of NCF4 with ileal CD.
These studies suggest that the rare novel p67phox variant results in partial inhibition of oxidase function and is associated with CD in a subgroup of patients with VEO-IBD, and that components of the NADPH oxidase complex are associated with CD.
PMCID: PMC3806486  PMID: 21900546
5.  A Whole-Genome DNA Marker Map for Cotton Based on the D-Genome Sequence of Gossypium raimondii L. 
G3: Genes|Genomes|Genetics  2013;3(10):1759-1767.
We constructed a very-high-density, whole-genome marker map (WGMM) for cotton by using 18,597 DNA markers corresponding to 48,958 loci that were aligned to both a consensus genetic map and a reference genome sequence. The WGMM has a density of one locus per 15.6 kb, or an average of 1.3 loci per gene. The WGMM was anchored by the use of colinear markers to a detailed genetic map, providing recombinational information. Mapped markers occurred at relatively greater physical densities in distal chromosomal regions and lower physical densities in the central regions, with all 1 Mb bins having at least nine markers. Hotspots for quantitative trait loci and resistance gene analog clusters were aligned to the map and DNA markers identified for targeting of these regions of high practical importance. Based on the cotton D genome reference sequence, the locations of chromosome structural rearrangements plotted on the map facilitate its translation to other Gossypium genome types. The WGMM is a versatile genetic map for marker assisted breeding, fine mapping and cloning of genes and quantitative trait loci, developing new genetic markers and maps, genome-wide association mapping, and genome evolution studies.
PMCID: PMC3789800  PMID: 23979945
quantitative trait loci; resistance gene analog; simple sequence repeat; restriction fragment length polymorphism; inversions
6.  Different patterns of gene structure divergence following gene duplication in Arabidopsis 
BMC Genomics  2013;14:652.
Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and single-gene duplications including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence.
In Arabidopsis thaliana, we denote levels of structural divergence between duplicated genes by differences in coding-region lengths and average exon lengths, and the number of insertions/deletions (indels) and maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding-regions and exons, and smaller numbers of indels and maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs, but not transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates, but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence.
Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanisms of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
PMCID: PMC3848917  PMID: 24063813
Gene structure; Divergence; Transposed duplication; Whole-genome duplication; Selection; Arabidopsis
7.  Pilot Genome Wide Association Search Identifies Potential loci for Risk of Erectile Dysfunction in Type 1 Diabetes Using the DCCT/EDIC Study Cohort 
The Journal of urology  2012;188(2):514-520.
To identify genetic predictors of diabetes-associated erectile dysfunction (ED) using genome-wide and candidate gene approaches in a cohort of men with type 1 diabetes (T1D).
We examined 528 white men with T1D (125 with ED) from the DCCT and its observational follow up EDIC Study. ED was defined from a single item of the IIEF. An Illumina Human1M BeadChip was used for genotyping. 867,125 single nucleotide polymorphisms (SNPs) were subjected to analysis. Whole genome and candidate gene approaches tested the hypothesis that genetic polymorphisms may predispose men with T1D to ED. Univariate and multivariate models were used controlling for age, HbA1c, diabetes duration, and prior randomization to intensive or conventional insulin therapy during DCCT. A stratified false discovery rate was used to perform the candidate gene approach.
Two SNPs located on chromosome 3 in one genomic locus were associated with ED with p < 1×10^-6. rs9810233 had a p-value of 7×10^-7 and rs1920201 had a p-value of 9×10^-7. The nearest gene to these two SNPs is ALCAM. The genetic association results at this locus were similar in univariate and multivariate analysis. No candidate genes met criteria for statistical significance.
Two SNPs, rs9810233 and rs1920101, which are 25 kb apart, are both associated with ED, albeit not meeting the standard GWAS significance criterion of p < 5×10^-8. Other studies with larger sample sizes will be required to determine whether ALCAM represents a novel gene in the pathogenesis of diabetes-associated ED.
PMCID: PMC3764461  PMID: 22704111
Erectile Dysfunction; Diabetes; Genetics
8.  Genome-wide meta-analyses of multi-ethnic cohorts identify multiple new susceptibility loci for refractive error and myopia 
Verhoeven, Virginie J.M. | Hysi, Pirro G. | Wojciechowski, Robert | Fan, Qiao | Guggenheim, Jeremy A. | Höhn, René | MacGregor, Stuart | Hewitt, Alex W. | Nag, Abhishek | Cheng, Ching-Yu | Yonova-Doing, Ekaterina | Zhou, Xin | Ikram, M. Kamran | Buitendijk, Gabriëlle H.S. | McMahon, George | Kemp, John P. | St. Pourcain, Beate | Simpson, Claire L. | Mäkelä, Kari-Matti | Lehtimäki, Terho | Kähönen, Mika | Paterson, Andrew D. | Hosseini, S. Mohsen | Wong, Hoi Suen | Xu, Liang | Jonas, Jost B. | Pärssinen, Olavi | Wedenoja, Juho | Yip, Shea Ping | Ho, Daniel W. H. | Pang, Chi Pui | Chen, Li Jia | Burdon, Kathryn P. | Craig, Jamie E. | Klein, Barbara E. K. | Klein, Ronald | Haller, Toomas | Metspalu, Andres | Khor, Chiea-Chuen | Tai, E-Shyong | Aung, Tin | Vithana, Eranga | Tay, Wan-Ting | Barathi, Veluchamy A. | Chen, Peng | Li, Ruoying | Liao, Jiemin | Zheng, Yingfeng | Ong, Rick T. | Döring, Angela | Evans, David M. | Timpson, Nicholas J. | Verkerk, Annemieke J.M.H. | Meitinger, Thomas | Raitakari, Olli | Hawthorne, Felicia | Spector, Tim D. | Karssen, Lennart C. | Pirastu, Mario | Murgia, Federico | Ang, Wei | Mishra, Aniket | Montgomery, Grant W. | Pennell, Craig E. | Cumberland, Phillippa M. | Cotlarciuc, Ioana | Mitchell, Paul | Wang, Jie Jin | Schache, Maria | Janmahasathian, Sarayut | Igo, Robert P. | Lass, Jonathan H. | Chew, Emily | Iyengar, Sudha K. | Gorgels, Theo G.M.F. | Rudan, Igor | Hayward, Caroline | Wright, Alan F. | Polasek, Ozren | Vatavuk, Zoran | Wilson, James F. | Fleck, Brian | Zeller, Tanja | Mirshahi, Alireza | Müller, Christian | Uitterlinden, André G. | Rivadeneira, Fernando | Vingerling, Johannes R. | Hofman, Albert | Oostra, Ben A. | Amin, Najaf | Bergen, Arthur A.B. | Teo, Yik-Ying | Rahi, Jugnoo S. | Vitart, Veronique | Williams, Cathy | Baird, Paul N. | Wong, Tien-Yin | Oexle, Konrad | Pfeiffer, Norbert | Mackey, David A. | Young, Terri L. | van Duijn, Cornelia M. | Saw, Seang-Mei | Wilson, Joan E. Bailey | Stambolian, Dwight | Klaver, Caroline C. | Hammond, Christopher J.
Nature genetics  2013;45(3):314-318.
Refractive error is the most common eye disorder worldwide, and a prominent cause of blindness. Myopia affects over 30% of Western populations, and up to 80% of Asians. The CREAM consortium conducted genome-wide meta-analyses including 37,382 individuals from 27 studies of European ancestry, and 8,376 from 5 Asian cohorts. We identified 16 new loci for refractive error in subjects of European ancestry, of which 8 were shared with Asians. Combined analysis revealed 8 additional loci. The new loci include genes with functions in neurotransmission (GRIA4), ion channels (KCNQ5), retinoic acid metabolism (RDH5), extracellular matrix remodeling (LAMA2, BMP2), and eye development (SIX6, PRSS56). We also confirmed previously reported associations with GJD2 and RASGRF1. Risk score analysis using associated SNPs showed a tenfold increased risk of myopia for subjects with the highest genetic load. Our results, accumulated across independent multi-ethnic studies, considerably advance understanding of mechanisms involved in refractive error and myopia.
PMCID: PMC3740568  PMID: 23396134
9.  An enzyme linked immunosorbent assay (ELISA) for the determination of the human haptoglobin phenotype 
Haptoglobin (Hp) is an abundant serum protein which binds extracorpuscular hemoglobin (Hb). Two alleles exist in humans for the Hp gene, denoted 1 and 2. Diabetic individuals with the Hp 2-2 genotype are at increased risk of developing vascular complications including heart attack, stroke, and kidney disease. Recent evidence shows that treatment with vitamin E can reduce the risk of diabetic vascular complications by as much as 50% in Hp 2-2 individuals. We sought to develop a rapid and accurate test for Hp phenotype (which is 100% concordant with the three major Hp genotypes) to facilitate widespread diagnostic testing as well as prospective clinical trials.
A monoclonal antibody raised against human Hp was shown to distinguish between the three Hp phenotypes in an enzyme linked immunosorbent assay (ELISA). Hp phenotypes obtained in over 8000 patient samples using this ELISA method were compared with those obtained by polyacrylamide gel electrophoresis or the TaqMan PCR method.
Our analysis showed that the sensitivity and specificity of the ELISA test for the Hp 2-2 phenotype are 99.0% and 98.1%, respectively. The positive predictive value and the negative predictive value for the Hp 2-2 phenotype are 97.5% and 99.3%, respectively. Similar results were obtained for the Hp 2-1 and Hp 1-1 phenotypes. In addition, the ELISA was determined to be more sensitive and specific than the TaqMan method.
The Hp ELISA represents a user-friendly, rapid and highly accurate diagnostic tool for determining Hp phenotypes. This test will greatly facilitate the typing of thousands of samples in ongoing clinical studies.
PMCID: PMC3717392  PMID: 23492570
diabetes; ELISA; haptoglobin phenotype; pharmacogenomics; vitamin E
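The four accuracy measures reported in the abstract above (sensitivity, specificity, positive and negative predictive value) all derive from a two-by-two table comparing the test against a reference method. The following is a minimal sketch of those standard definitions. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the names `ConfusionCounts` and `diagnostic_metrics` are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two table of a test result against a reference method."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    tn: int  # test negative, reference negative
    fn: int  # test negative, reference positive


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard accuracy measures as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positives among reference positives
        "specificity": c.tn / (c.tn + c.fp),  # true negatives among reference negatives
        "ppv": c.tp / (c.tp + c.fp),          # reference positives among test positives
        "npv": c.tn / (c.tn + c.fn),          # reference negatives among test negatives
    }


# Hypothetical counts for illustration only (NOT data from the study).
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the phenotype is in the tested sample, so they do not transfer directly to a population with a different phenotype frequency.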
10.  Mammographic density and breast cancer: a comparison of related and unrelated controls in the Breast Cancer Family Registry 
Percent mammographic density (PMD) is a strong and highly heritable risk factor for breast cancer. Studies of the role of PMD in familial breast cancer may require controls, such as the sisters of cases, selected from the same 'risk set' as the cases. The use of sister controls would allow control for factors that have been shown to influence risk of breast cancer such as race/ethnicity, socioeconomic status and a family history of breast cancer, but may introduce 'overmatching' and attenuate case-control differences in PMD.
To examine the potential effects of using sister controls rather than unrelated controls in a case-control study, we examined PMD in triplets, each comprised of a case with invasive breast cancer, an unaffected full sister control, and an unaffected unrelated control. Both controls were matched to cases on age at mammogram. Total breast area and dense area in the mammogram were measured in the unaffected breast of cases and a randomly selected breast in controls, and the non-dense area and PMD calculated from these measurements.
The mean difference in PMD between cases and controls, and the standard deviation (SD) of the difference, were slightly less for sister controls (4.2% (SD = 20.0)) than for unrelated controls (4.9% (SD = 25.7)). We found statistically significant correlations in PMD between cases (n = 228) and sister controls (n = 228) (r = 0.39 (95% CI: 0.28, 0.50; P <0.0001)), but not between cases and unrelated controls (n = 228) (r = 0.04 (95% CI: -0.09, 0.17; P = 0.51)). After adjusting for other risk factors, square root transformed PMD was associated with an increased risk of breast cancer when comparing cases to sister controls (adjusted inter-quintile odds ratio (IQOR) = 2.19, 95% CI = 1.20, 4.00) or to unrelated controls (adjusted IQOR = 2.62, 95% CI = 1.62, 4.25).
The use of sister controls in case-control studies of PMD resulted in a modest attenuation of case-control differences and risk estimates, but showed a statistically significant association with risk and allowed control for race/ethnicity, socioeconomic status and family history.
PMCID: PMC3706877  PMID: 23705888
Mammographic density; case-control study; overmatching; case control
11.  Identification of genetic loci underlying the phenotypic constructs of autism spectrum disorders 
To investigate the underlying phenotypic constructs in autism spectrum disorders (ASD) and to identify genetic loci that are linked to these empirically derived factors.
Exploratory factor analysis was applied to two datasets with 28 selected Autism Diagnostic Interview-Revised (ADI-R) algorithm items. The first dataset was from the Autism Genome Project (AGP) phase I (1,236 ASD subjects from 618 families); the second was from the AGP phase II (804 unrelated ASD subjects). Variables derived from the factor analysis were then used as quantitative traits in genome-wide variance components linkage analyses.
Six factors, joint attention, social interaction and communication, non-verbal communication, repetitive sensory-motor behaviour, peer interaction, and compulsion/restricted interests, were retained for both datasets. There was good agreement between the factor loading patterns from the two datasets. All factors showed familial aggregation. Suggestive evidence for linkage was obtained for the joint attention factor on 11q23. Genome-wide significant evidence for linkage was obtained for the repetitive sensory-motor behaviour factor on 19q13.3.
This study demonstrates that the underlying phenotypic constructs based on the ADI-R algorithm items are replicable in independent datasets; and the empirically derived factors are suitable and informative in genetic studies of ASD.
PMCID: PMC3593812  PMID: 21703496
autism; ADI-R; factor analysis; linkage analysis; quantitative trait
12.  Outcome of repeat surgery for genital prolapse using prolift-mesh 
Urogenital prolapse can have a significant impact on quality of life. The lifetime risk of requiring surgery for urogenital prolapse is 11%. Prolift mesh has recently been introduced to reduce the repeat operation rate and to provide long-term benefit.
To evaluate the outcome of the treatment of urogenital prolapse with synthetic mesh.
A retrospective review of the case notes of all women who underwent Prolift mesh insertion for prolapse between July 2004 and June 2005 at Royal Alexandra Hospital, Paisley, UK. We looked at the presenting complaints, previous operations, intraoperative complications, and complications at six-week and six-month follow-up.
Twenty-two procedures were carried out in the twelve-month period. The age of the patients ranged from 55 to 82 years (median 64 years). Eleven had anterior Prolift (50%), seven had posterior Prolift (31.8%), and four had total Prolift (18%). There were no intraoperative complications. All the patients had previous surgery for prolapse. Eight patients had anterior repair, six patients had posterior repair, and three patients had abdominal hysterectomy. Vaginal hysterectomy was carried out with mesh insertion as a concomitant procedure in seven cases (31.25%). All patients were seen at six weeks and six months after the surgery. Complications included mesh erosion in one patient and suture material protruding into the vagina in one patient; one patient had a failed Prolift operation. The other twenty-one patients were cured, giving a 95.4% success rate.
The use of prolene mesh in pelvic reconstructive surgery was associated with good outcome and minimal complications in this study.
PMCID: PMC3694470  PMID: 23497532
Prolift; Mesh; Urogenital prolapse
13.  Genetic Analysis of Recombinant Inbred Lines for Sorghum bicolor × Sorghum propinquum 
G3: Genes|Genomes|Genetics  2013;3(1):101-108.
We describe a recombinant inbred line (RIL) population of 161 F5 genotypes for the widest euploid cross that can be made to cultivated sorghum (Sorghum bicolor) using conventional techniques, S. bicolor × Sorghum propinquum, that segregates for many traits related to plant architecture, growth and development, reproduction, and life history. The genetic map of the S. bicolor × S. propinquum RILs contains 141 loci on 10 linkage groups collectively spanning 773.1 cM. Although the genetic map has DNA marker density well suited to quantitative trait loci mapping and samples most of the genome, our previous observation that sorghum pericentromeric heterochromatin is recalcitrant to recombination is highlighted by the finding that the vast majority of recombination in sorghum is concentrated in small regions of euchromatin that are distal to most chromosomes. The advancement of the RIL population in an environment to which the S. bicolor parent was well adapted (indeed bred for), but the S. propinquum parent was not, largely eliminated an allele for short-day flowering that confounded many other traits, permitting us, for example, to map new quantitative trait loci for flowering that previously eluded detection. Additional recombination that has accrued in the development of this RIL population also may have improved resolution of apices of heterozygote excess, accounting for their greater abundance in the F5 than the F2 generation. The S. bicolor × S. propinquum RIL population offers advantages over early-generation populations that will shed new light on genetic, environmental, and physiological/biochemical factors that regulate plant growth and development.
PMCID: PMC3538335  PMID: 23316442
quantitative trait locus; simple-sequence repeat; DNA marker; recombination; segregation distortion
14.  PGDD: a database of gene and genome duplication in plants 
Nucleic Acids Research  2012;41(D1):D1152-D1158.
Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at, a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD.
PMCID: PMC3531184  PMID: 23180799
15.  Mammographic breast density and breast cancer: evidence of a shared genetic basis 
Cancer research  2012;72(6):1478-1484.
Percent mammographic breast density (PMD) is a strong heritable risk factor for breast cancer. However, the pathways through which this risk is mediated are still unclear. To explore whether PMD and breast cancer have a shared genetic basis, we identified genetic variants most strongly associated with PMD in a published meta-analysis of five genome-wide association studies (GWAS) and used these to construct risk scores for 3628 breast cancer cases and 5190 controls from the UK2 GWAS of breast cancer. The signed per-allele effect estimates of SNPs were multiplied with the respective allele counts in the individual and summed over all SNPs to derive the risk score for an individual. These scores were included as the exposure variable in a logistic regression model with breast cancer case-control status as the outcome. This analysis was repeated using ten different cut-offs for the most significant density SNPs (1-10% representing 5,222-50,899 SNPs). Permutation analysis was also performed across all 10 cut-offs. The association between risk score and breast cancer was significant for all cut-offs from 3-10% of top density SNPs, being most significant for the 6% (2-sided P=0.002) to 10% (P=0.001) cut-offs (overall permutation P=0.003). Women in the top 10% of the risk score distribution had a 31% increased risk of breast cancer [OR= 1.31 (95%CI 1.08-1.59)] compared to women in the bottom 10%. Together, our results demonstrate that PMD and breast cancer have a shared genetic basis that is mediated through a large number of common variants.
PMCID: PMC3378688  PMID: 22266113
breast cancer; mammographic density; SNPs; polygenic; Mendelian Randomisation
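The abstract above describes the risk score as signed per-allele effect estimates multiplied by each individual's allele counts and summed over all SNPs. A minimal sketch of that weighted sum follows. The effect sizes and genotypes are made up for illustration and are not values from the study, and the function name `risk_score` is an assumption of this sketch.

```python
from typing import Sequence


def risk_score(effects: Sequence[float], allele_counts: Sequence[int]) -> float:
    """Sum of (signed per-allele effect) x (allele count) over all SNPs.

    effects:       one signed effect estimate per SNP.
    allele_counts: the individual's count (0, 1 or 2) of the effect allele at each SNP.
    """
    if len(effects) != len(allele_counts):
        raise ValueError("effects and allele_counts must have the same length")
    return sum(beta * count for beta, count in zip(effects, allele_counts))


# Hypothetical effect estimates and genotypes (NOT data from the study).
effects = [0.10, -0.05, 0.20]
individual = [2, 1, 0]

score = risk_score(effects, individual)
print(f"risk score: {score:.2f}")
```

In the analysis described above, a score of this kind is computed for every case and control and then entered as the exposure variable in a logistic regression on case-control status.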
16.  Single Nucleotide Polymorphisms that Increase Expression of the GTPase RAC1 are Associated with Ulcerative Colitis 
Gastroenterology  2011;141(2):633-641.
Background & Aims
RAC1 is a GTPase that has an evolutionarily conserved role in coordinating immune defenses, from plants to mammals. Chronic inflammatory bowel diseases (IBD) are associated with dysregulation of immune defenses. We studied the role of RAC1 in IBD using human genetic and functional studies and animal models of colitis.
We used a candidate gene approach to HapMap-Tag single nucleotide polymorphisms (SNPs) in a discovery cohort; findings were confirmed in 2 additional cohorts. RAC1 mRNA expression was examined from peripheral blood cells of patients. Colitis was induced in mice with conditional disruption of Rac1 in phagocytes by administration of dextran sulphate sodium (DSS).
We observed a genetic association of RAC1 with ulcerative colitis (UC) in a discovery cohort, 2 independent replication cohorts, and in combined analysis for the SNPs rs10951982 (Pcombined UC = 3.3 × 10^-8, odds ratio [OR]=1.43 [1.26–1.63]) and rs4720672 (Pcombined UC = 4.7 × 10^-6, OR=1.36 [1.19–1.58]). Patients with IBD who had the rs10951982 risk allele had increased expression of RAC1, compared to those without this allele. Conditional disruption of Rac1 in macrophages and neutrophils of mice protected them against DSS-induced colitis.
Studies of human tissue samples and knockout mice demonstrated a role for the GTPase RAC1 in the development of UC; increased expression of RAC1 was associated with susceptibility to colitis.
PMCID: PMC3152589  PMID: 21684284
innate immunity; Crohn's disease; CD; Rac-1 knockout
17.  Construction of physical maps for the sex-specific regions of papaya sex chromosomes 
BMC Genomics  2012;13:176.
Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions and uncovering the early events of sex chromosome evolution and to identify the sex determination genes for crop improvement.
A reiterate chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapped BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb including a 900 Kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion.
The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2–3 million years ago. The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
PMCID: PMC3430574  PMID: 22568889
Bacterial artificial chromosome (BAC); Carica papaya; Sex chromosomes; Sex determination; Suppression of recombination
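The 89% expansion figure quoted in the abstract above can be checked from the two region sizes it reports, 8.5 Mb for HSY and 4.5 Mb for the corresponding X region. The sketch below assumes the percentage is the size difference taken relative to the X counterpart, which is consistent with the reported 4 Mb and 89%. The function name `relative_expansion` is an assumption of this sketch.

```python
def relative_expansion(expanded_mb: float, reference_mb: float) -> float:
    """Size increase of the expanded region as a fraction of the reference region."""
    if reference_mb <= 0:
        raise ValueError("reference_mb must be positive")
    return (expanded_mb - reference_mb) / reference_mb


# Region sizes quoted in the abstract, in megabases.
hsy_mb = 8.5
x_counterpart_mb = 4.5

difference_mb = hsy_mb - x_counterpart_mb
expansion_pct = 100 * relative_expansion(hsy_mb, x_counterpart_mb)

print(f"difference: {difference_mb:.1f} Mb, expansion: {expansion_pct:.0f}%")
```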
18.  Association analysis of photoperiodic flowering time genes in west and central African sorghum [Sorghum bicolor (L.) Moench] 
BMC Plant Biology  2012;12:32.
Photoperiod-sensitive flowering is a key adaptive trait for sorghum (Sorghum bicolor) in West and Central Africa. In this study we performed an association analysis to investigate the effect of polymorphisms within the genes putatively related to variation in flowering time on photoperiod-sensitive flowering in sorghum. For this purpose a genetically characterized panel of 219 sorghum accessions from West and Central Africa was evaluated for its photoperiod response index (PRI) based on two sowing dates under field conditions.
Sorghum accessions used in our study were genotyped for single nucleotide polymorphisms (SNPs) in six genes putatively involved in the photoperiodic control of flowering time. Applying a mixed model approach and previously determined population structure parameters to these candidate genes, we found significant associations between several SNPs and PRI for the genes CRYPTOCHROME 1 (CRY1-b1) and GIGANTEA (GI).
The negative values of Tajima's D, found for the genes of our study, suggested that purifying selection has acted on genes involved in photoperiodic control of flowering time in sorghum. The SNP markers of our study that showed significant associations with PRI can be used to create functional markers to serve as important tools for marker-assisted selection of photoperiod-sensitive cultivars in sorghum.
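For readers unfamiliar with the statistic, the following is a minimal sketch of how Tajima's D is computed from a set of aligned sequences, using the standard formula (Tajima 1989). It is not the authors' pipeline: the function name `tajimas_d` and the toy sequences are hypothetical, and the sketch assumes equal-length sequences with no gaps or missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (hypothetical helper).

    Assumes no gaps or missing data. Returns None when there are no
    segregating sites, because D is undefined in that case.
    """
    n = len(seqs)
    if n < 4:
        # For n <= 3 the variance term of the standard formula is zero.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (not from the study): an excess of rare variants versus
# variants at intermediate frequency.
rare = ["AAAA", "AAAT", "AATA", "ATAA"]
balanced = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare), tajimas_d(balanced))
```

In the toy data, the sample made up only of singleton variants gives a negative D, while the sample with variants at intermediate frequency gives a positive D. An excess of rare variants is the pattern the abstract interprets as evidence of purifying selection, although demographic effects such as population expansion can also produce negative values.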
PMCID: PMC3364917  PMID: 22394582
19.  MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity 
Nucleic Acids Research  2012;40(7):e49.
MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at
PMCID: PMC3326336  PMID: 22217600
20.  Effect of Linkage Disequilibrium on the Identification of Functional Variants 
Genetic epidemiology  2011;35(Suppl 1):S115-S119.
We summarize the contributions of Group 9 of Genetic Analysis Workshop 17. This group addressed the problems of linkage disequilibrium and other longer range forms of allelic association when evaluating the effects of genotypes on phenotypes. Issues raised by long-range associations, whether a result of selection, stratification, possible technical errors, or chance, were less expected but proved to be important. Most contributors focused on regression methods of various types to illustrate problematic issues or to develop adaptations for dealing with high-density genotype assays. Study design was also considered, as was graphical modeling. Although no method emerged as uniformly successful, most succeeded in reducing false-positive results either by considering clusters of loci within genes or by applying smoothing metrics that required results from adjacent loci to be similar. Two unexpected results that questioned our assumptions of what is required to model linkage disequilibrium were observed. The first was that correlations between loci separated by large genetic distances can greatly inflate single-locus test statistics, and, whether the result of selection, stratification, possible technical errors, or chance, these correlations seem overabundant. The second unexpected result was that applying principal components analysis to genome-wide genotype data can apparently control not only for population structure but also for linkage disequilibrium.
PMCID: PMC3248791  PMID: 22128051
score tests; two-stage study designs; robust regression; higher criticism; principal components analysis; graphical modeling
21.  Azospirillum Genomes Reveal Transition of Bacteria from Aquatic to Terrestrial Environments 
PLoS Genetics  2011;7(12):e1002430.
Fossil records indicate that life appeared in marine environments ∼3.5 billion years (Gyr) ago and transitioned to terrestrial ecosystems nearly 2.5 Gyr ago. Sequence analysis suggests that “hydrobacteria” and “terrabacteria” might have diverged as early as 3 Gyr ago. Bacteria of the genus Azospirillum are associated with roots of terrestrial plants; however, virtually all their close relatives are aquatic. We obtained genome sequences of two Azospirillum species and analyzed their gene origins. While most Azospirillum house-keeping genes have orthologs in its close aquatic relatives, this lineage has obtained nearly half of its genome from terrestrial organisms. The majority of genes encoding functions critical for association with plants are among horizontally transferred genes. Our results show that transition of some aquatic bacteria to terrestrial habitats occurred much later than the suggested initial divergence of hydro- and terrabacterial clades. The birth of the genus Azospirillum approximately coincided with the emergence of vascular plants on land.
Author Summary
Genome sequencing and analysis of plant-associated beneficial soil bacteria Azospirillum spp. reveals that these organisms transitioned from aquatic to terrestrial environments significantly later than the suggested major Precambrian divergence of aquatic and terrestrial bacteria. Separation of Azospirillum from their close aquatic relatives coincided with the emergence of vascular plants on land. Nearly half of the Azospirillum genome has been acquired horizontally, from distantly related terrestrial bacteria. The majority of horizontally acquired genes encode functions that are critical for adaptation to the rhizosphere and interaction with host plants.
PMCID: PMC3245306  PMID: 22216014
22.  A genome-wide linkage study of mammographic density, a risk factor for breast cancer 
Breast Cancer Research : BCR  2011;13(6):R132.
Mammographic breast density is a highly heritable (h² > 0.6) and strong risk factor for breast cancer. We conducted a genome-wide linkage study to identify loci influencing mammographic breast density (MD).
Epidemiological data were assembled on 1,415 families from the Australia, Northern California and Ontario sites of the Breast Cancer Family Registry, and additional families recruited in Australia and Ontario. Families consisted of sister pairs with age-matched mammograms and data on factors known to influence MD. Single nucleotide polymorphism (SNP) genotyping was performed on 3,952 individuals using the Illumina Infinium 6K linkage panel.
Using a variance components method, genome-wide linkage analysis was performed using quantitative traits obtained by adjusting MD measurements for known covariates. Our primary trait was formed by fitting a linear model to the square root of the percentage of the breast area that was dense (PMD), adjusting for age at mammogram, number of live births, menopausal status, weight, height, weight squared, and menopausal hormone therapy. The maximum logarithm of odds (LOD) score from the genome-wide scan was on chromosome 7p14.1-p13 (LOD = 2.69; 63.5 cM) for covariate-adjusted PMD, with a 1-LOD interval spanning 8.6 cM. A similar signal was seen for the covariate-adjusted area of the breast that was dense (DA) phenotype. Simulations showed that the complete sample had adequate power to detect LOD scores of 3 or 3.5 for a locus accounting for 20% of phenotypic variance. A modest peak initially seen on chromosome 7q32.3-q34 increased in strength when only the 513 families with at least two sisters below 50 years of age were included in the analysis (LOD 3.2; 140.7 cM, 1-LOD interval spanning 9.6 cM). In a subgroup analysis, we also found a LOD score of 3.3 for the DA phenotype on chromosome 12p11.22-q13.11 (60.8 cM, 1-LOD interval spanning 9.3 cM), overlapping a region identified in a previous study.
The suggestive peaks and the larger linkage signal seen in the subset of pedigrees with younger participants highlight regions of interest for further study to identify genes that determine MD, with the goal of understanding mammographic density and its involvement in susceptibility to breast cancer.
PMCID: PMC3326574  PMID: 22188651
23.  Modes of Gene Duplication Contribute Differently to Genetic Novelty and Redundancy, but Show Parallels across Divergent Angiosperms 
PLoS ONE  2011;6(12):e28150.
Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored.
In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA-based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon-mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution.
Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
PMCID: PMC3229532  PMID: 22164235
24.  Physical Mapping in a Triplicated Genome: Mapping the Downy Mildew Resistance Locus Pp523 in Brassica oleracea L. 
G3: Genes|Genomes|Genetics  2011;1(7):593-601.
We describe the construction of a BAC contig and identification of a minimal tiling path that encompasses the dominant and monogenically inherited downy mildew resistance locus Pp523 of Brassica oleracea L. The selection of BAC clones for construction of the physical map was carried out by screening gridded BAC libraries with DNA overgo probes derived from both genetically mapped DNA markers flanking the locus of interest and BAC-end sequences that align to Arabidopsis thaliana sequences within the previously identified syntenic region. The selected BAC clones consistently mapped to three different genomic regions of B. oleracea. Although 83 BAC clones were accurately mapped within a ∼4.6 cM region surrounding the downy mildew resistance locus Pp523, a subset of 33 BAC clones mapped to another region on chromosome C8 that was ∼60 cM away from the resistance gene, and a subset of 63 BAC clones mapped to chromosome C5. These results reflect the triplication of the Brassica genomes since their divergence from a common ancestor shared with A. thaliana, and they are consonant with recent analyses of the C genome of Brassica napus. The assembly of a minimal tiling path constituted by 13 (BoT01) BAC clones that span the Pp523 locus sets the stage for map-based cloning of this resistance gene.
PMCID: PMC3276173  PMID: 22384370
genetic resistance; plant disease resistance; map-based cloning; BAC contig; genome triplication
25.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
PMCID: PMC3287821  PMID: 22373325
