Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data.
We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline.
Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
Polymorphisms; Linkage disequilibrium; Maximum likelihood
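The quality-filtering and linkage disequilibrium steps described for SNPhylo can be pictured with a minimal Python sketch. This is not SNPhylo's own code: the function names, the genotype encoding (0/1/2 copies of the alternate allele, `None` for missing), the threshold values, and the simple "compare with the last retained SNP" pruning rule are all assumptions made for illustration.

```python
from statistics import mean

def missing_rate(genos):
    """Fraction of samples with no genotype call at this SNP."""
    return sum(g is None for g in genos) / len(genos)

def minor_allele_freq(genos):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def r_squared(a, b):
    """Squared genotype correlation between two SNPs, a simple LD proxy."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy * sxy / (sxx * syy)

def filter_snps(snps, max_missing=0.1, min_maf=0.1, max_r2=0.8):
    """Drop low-quality SNPs, then drop SNPs in high LD with the last kept SNP.

    `snps` is a list of (name, genotypes) tuples in chromosomal order.
    The default thresholds are hypothetical, not SNPhylo's defaults.
    """
    kept = []
    for name, genos in snps:
        if missing_rate(genos) > max_missing or minor_allele_freq(genos) < min_maf:
            continue  # low-quality or uninformative marker
        if kept and r_squared(kept[-1][1], genos) > max_r2:
            continue  # redundant with the previously retained marker
        kept.append((name, genos))
    return kept

example = [
    ("s1", [0, 1, 2, 1, 0, 2]),
    ("s2", [0, 1, 2, 1, 0, 2]),        # identical to s1, so in complete LD
    ("s3", [0, 0, 0, 0, 0, 0]),        # monomorphic
    ("s4", [None, None, 1, 2, 0, 1]),  # too many missing calls
    ("s5", [2, 0, 1, 0, 2, 1]),
]
print([name for name, _ in filter_snps(example)])
```

The retained markers would then be aligned and passed to a maximum likelihood tree builder, which this sketch does not attempt.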
Whole genome duplication (WGD) is widespread in flowering plants and is a driving force in angiosperm diversification. The redundancy introduced by WGD allows the evolution of novel gene interactions and functions, although the patterns and processes of diversification are poorly understood. We identified ∼2,000 pairs of paralogous genes in Gossypium raimondii (cotton) resulting from an approximately 60 My old 5- to 6-fold ploidy increase. Gene expression analyses revealed that, in G. raimondii, 99.4% of the gene pairs exhibit differential expression in at least one of the three tissues (petal, leaf, and seed), with 93% to 94% exhibiting differential expression on a per-tissue basis. For 1,666 (85%) pairs, differential expression was observed in all tissues. These observations were mirrored in a time series of G. raimondii seed, and separately in leaf, petal, and seed of G. arboreum, indicating expression level diversification before species divergence. A generalized linear model revealed 92.4% of the paralog pairs exhibited expression divergence, with most exhibiting significant gene and tissue interactions indicating complementary expression patterns in different tissues. These data indicate massive, near-complete expression level neo- and/or subfunctionalization among ancient gene duplicates, suggesting these processes are essential in their maintenance over ∼60 Ma.
Identification of single-gene causes of steroid-resistant nephrotic syndrome (SRNS)
has furthered the understanding of the pathogenesis of this disease. Here, using a
combination of homozygosity mapping and whole human exome resequencing, we identified
mutations in the aarF domain containing kinase 4 (ADCK4) gene in 15
individuals with SRNS from 8 unrelated families. ADCK4 was highly similar to ADCK3,
which has been shown to participate in coenzyme Q10 (CoQ10)
biosynthesis. Mutations in ADCK4 resulted in reduced
CoQ10 levels and reduced mitochondrial respiratory enzyme activity in
cells isolated from individuals with SRNS and transformed lymphoblasts. Knockdown of
adck4 in zebrafish and Drosophila recapitulated
nephrotic syndrome-associated phenotypes. Furthermore, ADCK4 was expressed in
glomerular podocytes and partially localized to podocyte mitochondria and foot
processes in rat kidneys and cultured human podocytes. In human podocytes, ADCK4
interacted with members of the CoQ10 biosynthesis pathway, including COQ6,
which has been linked with SRNS, and COQ7. Knockdown of ADCK4 in podocytes resulted in
decreased migration, which was reversed by CoQ10 addition. Interestingly,
a patient with SRNS with a homozygous ADCK4 frameshift mutation had
partial remission following CoQ10 treatment. These data indicate that
individuals with SRNS with mutations in ADCK4 or other genes that
participate in CoQ10 biosynthesis may be treatable with CoQ10.
The NOX2 NADPH oxidase complex produces reactive oxygen species and plays a critical role in the killing of microbes by phagocytes. Genetic mutations in genes encoding components of the complex result in both X-linked and autosomal recessive forms of chronic granulomatous disease (CGD). Patients with CGD often develop intestinal inflammation that is histologically similar to Crohn's colitis, suggesting a common aetiology for both diseases. The aim of this study is to determine if polymorphisms in NOX2 NADPH oxidase complex genes that do not cause CGD are associated with the development of inflammatory bowel disease (IBD).
Direct sequencing and candidate gene approaches were used to identify susceptibility loci in NADPH oxidase complex genes. Functional studies were carried out on identified variants. Novel findings were replicated in independent cohorts.
Sequence analysis identified a novel missense variant in the neutrophil cytosolic factor 2 (NCF2) gene that is associated with very early onset IBD (VEO-IBD) and subsequently found in 4% of patients with VEO-IBD compared with 0.2% of controls (p=1.3×10−5, OR 23.8 (95% CI 3.9 to 142.5); Fisher exact test). This variant reduced binding of the NCF2 gene product p67phox to RAC2. This study found a novel genetic association of RAC2 with Crohn's disease (CD) and replicated the previously reported association of NCF4 with ileal CD.
These studies suggest that the rare novel p67phox variant results in partial inhibition of oxidase function and is associated with CD in a subgroup of patients with VEO-IBD, and that components of the NADPH oxidase complex are associated with CD.
We constructed a very-high-density, whole-genome marker map (WGMM) for cotton by using 18,597 DNA markers corresponding to 48,958 loci that were aligned to both a consensus genetic map and a reference genome sequence. The WGMM has a density of one locus per 15.6 kb, or an average of 1.3 loci per gene. The WGMM was anchored by the use of colinear markers to a detailed genetic map, providing recombinational information. Mapped markers occurred at relatively greater physical densities in distal chromosomal regions and lower physical densities in the central regions, with all 1 Mb bins having at least nine markers. Hotspots for quantitative trait loci and resistance gene analog clusters were aligned to the map and DNA markers identified for targeting of these regions of high practical importance. Based on the cotton D genome reference sequence, the locations of chromosome structural rearrangements plotted on the map facilitate its translation to other Gossypium genome types. The WGMM is a versatile genetic map for marker assisted breeding, fine mapping and cloning of genes and quantitative trait loci, developing new genetic markers and maps, genome-wide association mapping, and genome evolution studies.
quantitative trait loci; resistance gene analog; simple sequence repeat; restriction fragment length polymorphism; inversions
Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and single-gene duplications including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence.
In Arabidopsis thaliana, we denote levels of structural divergence between duplicated genes by differences in coding-region lengths and average exon lengths, and the number of insertions/deletions (indels) and maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding-regions and exons, and smaller numbers of indels and maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs, but not transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates, but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence.
Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanisms of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
Gene structure; Divergence; Transposed duplication; Whole-genome duplication; Selection; Arabidopsis
To identify genetic predictors of diabetes-associated erectile dysfunction (ED) using genome-wide
and candidate gene approaches in a cohort of men with type 1 diabetes (T1D).
We examined 528 white men with T1D (125 with ED) from the DCCT and
its observational follow up EDIC Study. ED was defined from a single item of
the IIEF. An Illumina Human1M BeadChip was used for genotyping. 867,125
single nucleotide polymorphisms (SNPs) were subjected to analysis. Whole
genome and candidate gene approaches tested the hypothesis that genetic
polymorphisms may predispose men with T1D to ED. Univariate and multivariate
models were used controlling for age, HbA1c, diabetes duration, and prior
randomization to intensive or conventional insulin therapy during DCCT. A
stratified false discovery rate was used to perform the candidate gene analysis.
Two SNPs located on chromosome 3 in one genomic locus were associated
with ED with p < 1 × 10−6. rs9810233 had a
p-value of 7 × 10−7 and rs1920201 had a p-value
of 9 × 10−7. The nearest gene to these two SNPs is
ALCAM. The genetic association results at these loci were similar in
univariate and multivariate analysis. No candidate genes met criteria for statistical significance.
Two SNPs, rs9810233 and rs1920101, which are 25 kb apart, are both
associated with ED, albeit not meeting the standard GWAS significance
criteria of p < 5 × 10−8. Other studies with
larger sample sizes will be required to determine whether ALCAM represents a
novel gene in the pathogenesis of diabetes associated ED.
Erectile Dysfunction; Diabetes; Genetics
Refractive error is the most common eye disorder worldwide, and a prominent cause of blindness. Myopia affects over 30% of Western populations, and up to 80% of Asians. The CREAM consortium conducted genome-wide meta-analyses including 37,382 individuals from 27 studies of European ancestry, and 8,376 from 5 Asian cohorts. We identified 16 new loci for refractive error in subjects of European ancestry, of which 8 were shared with Asians. Combined analysis revealed 8 additional loci. The new loci include genes with functions in neurotransmission (GRIA4), ion channels (KCNQ5), retinoic acid metabolism (RDH5), extracellular matrix remodeling (LAMA2, BMP2), and eye development (SIX6, PRSS56). We also confirmed previously reported associations with GJD2 and RASGRF1. Risk score analysis using associated SNPs showed a tenfold increased risk of myopia for subjects with the highest genetic load. Our results, accumulated across independent multi-ethnic studies, considerably advance understanding of mechanisms involved in refractive error and myopia.
Haptoglobin (Hp) is an abundant serum protein which binds extracorpuscular hemoglobin (Hb). Two alleles exist in humans for the Hp gene, denoted 1 and 2. Diabetic individuals with the Hp 2-2 genotype are at increased risk of developing vascular complications including heart attack, stroke, and kidney disease. Recent evidence shows that treatment with vitamin E can reduce the risk of diabetic vascular complications by as much as 50% in Hp 2-2 individuals. We sought to develop a rapid and accurate test for Hp phenotype (which is 100% concordant with the three major Hp genotypes) to facilitate widespread diagnostic testing as well as prospective clinical trials.
A monoclonal antibody raised against human Hp was shown to distinguish between the three Hp phenotypes in an enzyme linked immunosorbent assay (ELISA). Hp phenotypes obtained in over 8000 patient samples using this ELISA method were compared with those obtained by polyacrylamide gel electrophoresis or the TaqMan PCR method.
Our analysis showed that the sensitivity and specificity of the ELISA test for the Hp 2-2 phenotype are 99.0% and 98.1%, respectively. The positive predictive value and the negative predictive value for the Hp 2-2 phenotype are 97.5% and 99.3%, respectively. Similar results were obtained for the Hp 2-1 and Hp 1-1 phenotypes. In addition, the ELISA was determined to be more sensitive and specific than the TaqMan method.
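For readers less familiar with these four measures, the following Python sketch shows how each is derived from a two-by-two table of test results against a reference method. The counts in the example are hypothetical and are not the study's data; the function name is likewise an illustrative choice.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures for a binary test against a reference method.

    tp: test positive, reference positive   fp: test positive, reference negative
    fn: test negative, reference positive   tn: test negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives the test detects
        "specificity": tn / (tn + fp),  # share of true negatives the test clears
        "ppv": tp / (tp + fp),          # chance a positive result is correct
        "npv": tn / (tn + fn),          # chance a negative result is correct
    }

# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the two predictive values depend on how common the phenotype is in the tested population, so they would shift if the test were applied to a cohort with a different Hp 2-2 frequency.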
The Hp ELISA represents a user-friendly, rapid and highly accurate diagnostic tool for determining Hp phenotypes. This test will greatly facilitate the typing of thousands of samples in ongoing clinical studies.
diabetes; ELISA; haptoglobin phenotype; pharmacogenomics; vitamin E
Percent mammographic density (PMD) is a strong and highly heritable risk factor for breast cancer. Studies of the role of PMD in familial breast cancer may require controls, such as the sisters of cases, selected from the same 'risk set' as the cases. The use of sister controls would allow control for factors that have been shown to influence risk of breast cancer such as race/ethnicity, socioeconomic status and a family history of breast cancer, but may introduce 'overmatching' and attenuate case-control differences in PMD.
To examine the potential effects of using sister controls rather than unrelated controls in a case-control study, we examined PMD in triplets, each comprised of a case with invasive breast cancer, an unaffected full sister control, and an unaffected unrelated control. Both controls were matched to cases on age at mammogram. Total breast area and dense area in the mammogram were measured in the unaffected breast of cases and a randomly selected breast in controls, and the non-dense area and PMD calculated from these measurements.
The mean difference in PMD between cases and controls, and the standard deviation (SD) of the difference, were slightly smaller for sister controls (4.2% (SD = 20.0)) than for unrelated controls (4.9% (SD = 25.7)). We found a statistically significant correlation in PMD between cases (n = 228) and sister controls (n = 228) (r = 0.39 (95% CI: 0.28, 0.50; P < 0.0001)), but not between cases and unrelated controls (n = 228) (r = 0.04 (95% CI: -0.09, 0.17; P = 0.51)). After adjusting for other risk factors, square root transformed PMD was associated with an increased risk of breast cancer when comparing cases to sister controls (adjusted inter-quintile odds ratio (IQOR) = 2.19, 95% CI = 1.20, 4.00) or to unrelated controls (adjusted IQOR = 2.62, 95% CI = 1.62, 4.25).
The use of sister controls in case-control studies of PMD resulted in a modest attenuation of case-control differences and risk estimates, but showed a statistically significant association with risk and allowed control for race/ethnicity, socioeconomic status and family history.
Mammographic density; case-control study; overmatching; case control
To investigate the underlying phenotypic constructs in autism spectrum disorders (ASD) and to identify genetic loci that are linked to these empirically derived factors.
Exploratory factor analysis was applied to two datasets with 28 selected Autism Diagnostic Interview-Revised (ADI-R) algorithm items. The first dataset was from the Autism Genome Project (AGP) phase I (1,236 ASD subjects from 618 families); the second was from the AGP phase II (804 unrelated ASD subjects). Variables derived from the factor analysis were then used as quantitative traits in genome-wide variance components linkage analyses.
Six factors, joint attention, social interaction and communication, non-verbal communication, repetitive sensory-motor behaviour, peer interaction, and compulsion/restricted interests, were retained for both datasets. There was good agreement between the factor loading patterns from the two datasets. All factors showed familial aggregation. Suggestive evidence for linkage was obtained for the joint attention factor on 11q23. Genome-wide significant evidence for linkage was obtained for the repetitive sensory-motor behaviour factor on 19q13.3.
This study demonstrates that the underlying phenotypic constructs based on the ADI-R algorithm items are replicable in independent datasets; and the empirically derived factors are suitable and informative in genetic studies of ASD.
autism; ADI-R; factor analysis; linkage analysis; quantitative trait
Urogenital prolapse can have a significant impact on quality of life. The lifetime risk of requiring surgery for urogenital prolapse is 11%. Prolift mesh has recently been introduced to reduce the repeat operation rate and provide long-term benefit.
To evaluate the outcome of the treatment of urogenital prolapse with synthetic mesh.
A retrospective review of the case notes of all women who underwent Prolift mesh insertion for prolapse between July 2004 and June 2005 at Royal Alexandra Hospital, Paisley, UK. We looked at the presenting complaints, previous operations, intraoperative complications, and complications at six-week and six-month follow-up.
Twenty-two procedures were carried out in the twelve-month period. The age of the patients ranged from 55 to 82 years (median 64 years). Eleven had anterior Prolift (50%), seven had posterior Prolift (31.8%), and four had total Prolift (18%). There were no intraoperative complications. All the patients had previous surgery for prolapse. Eight patients had anterior repair, six patients had posterior repair, and three patients had abdominal hysterectomy. Vaginal hysterectomy was carried out with mesh insertion as a concomitant procedure in seven cases (31.25%). All patients were seen at six weeks and six months after the surgery. Complications included mesh erosion in one patient and suture material protruding into the vagina in one patient; one patient had a failed Prolift operation. The remaining twenty-one patients were cured, giving a 95.4% success rate.
The use of prolene mesh in pelvic reconstructive surgery was associated with good outcome and minimal complications in this study.
Prolift; Mesh; Urogenital prolapse
We describe a recombinant inbred line (RIL) population of 161 F5 genotypes for the widest euploid cross that can be made to cultivated sorghum (Sorghum bicolor) using conventional techniques, S. bicolor × Sorghum propinquum, that segregates for many traits related to plant architecture, growth and development, reproduction, and life history. The genetic map of the S. bicolor × S. propinquum RILs contains 141 loci on 10 linkage groups collectively spanning 773.1 cM. Although the genetic map has DNA marker density well-suited to quantitative trait loci mapping and samples most of the genome, our previous observation that sorghum pericentromeric heterochromatin is recalcitrant to recombination is highlighted by the finding that the vast majority of recombination in sorghum is concentrated in small regions of euchromatin that are distal to most chromosomes. The advancement of the RIL population in an environment to which the S. bicolor parent was well adapted (indeed bred for), but the S. propinquum parent was not, largely eliminated an allele for short-day flowering that confounded many other traits, permitting us, for example, to map new quantitative trait loci for flowering that previously eluded detection. Additional recombination that has accrued in the development of this RIL population also may have improved resolution of apices of heterozygote excess, accounting for their greater abundance in the F5 than the F2 generation. The S. bicolor × S. propinquum RIL population offers advantages over early-generation populations that will shed new light on genetic, environmental, and physiological/biochemical factors that regulate plant growth and development.
quantitative trait locus; simple-sequence repeat; DNA marker; recombination; segregation distortion
Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD.
Percent mammographic breast density (PMD) is a strong heritable risk factor for breast cancer. However, the pathways through which this risk is mediated are still unclear. To explore whether PMD and breast cancer have a shared genetic basis, we identified genetic variants most strongly associated with PMD in a published meta-analysis of five genome-wide association studies (GWAS) and used these to construct risk scores for 3628 breast cancer cases and 5190 controls from the UK2 GWAS of breast cancer. The signed per-allele effect estimates of SNPs were multiplied with the respective allele counts in the individual and summed over all SNPs to derive the risk score for an individual. These scores were included as the exposure variable in a logistic regression model with breast cancer case-control status as the outcome. This analysis was repeated using ten different cut-offs for the most significant density SNPs (1-10% representing 5,222-50,899 SNPs). Permutation analysis was also performed across all 10 cut-offs. The association between risk score and breast cancer was significant for all cut-offs from 3-10% of top density SNPs, being most significant for the 6% (2-sided P=0.002) to 10% (P=0.001) cut-offs (overall permutation P=0.003). Women in the top 10% of the risk score distribution had a 31% increased risk of breast cancer [OR= 1.31 (95%CI 1.08-1.59)] compared to women in the bottom 10%. Together, our results demonstrate that PMD and breast cancer have a shared genetic basis that is mediated through a large number of common variants.
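The risk score construction described above (choose the most significant density SNPs at a given cut-off, then sum effect estimate times allele count over those SNPs) can be sketched in Python as follows. The function names and the toy numbers are illustrative assumptions, not the study's code or data, and the subsequent logistic regression and permutation steps are not shown.

```python
def select_top_snps(p_values, fraction):
    """Indices of the most significant SNPs, keeping the given fraction.

    For example, fraction=0.06 mimics a "top 6% of density SNPs" cut-off.
    """
    n_keep = max(1, int(len(p_values) * fraction))
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return sorted(ranked[:n_keep])

def risk_score(effects, allele_counts, snp_indices):
    """Sum of signed per-allele effect times allele count over the chosen SNPs."""
    if len(effects) != len(allele_counts):
        raise ValueError("effects and allele_counts must have the same length")
    return sum(effects[i] * allele_counts[i] for i in snp_indices)

# Toy data: four SNPs with density-association p-values and signed effects,
# and one individual's allele counts (0, 1, or 2) at each SNP.
p_values = [0.5, 0.001, 0.2, 0.01]
effects = [0.5, -0.25, 0.125, 1.0]
allele_counts = [2, 1, 0, 2]

chosen = select_top_snps(p_values, fraction=0.5)
print(chosen, risk_score(effects, allele_counts, chosen))
```

In an analysis of this kind, the per-individual scores would be entered as the exposure in a case-control regression, once for each cut-off.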
breast cancer; mammographic density; SNPs; polygenic; Mendelian Randomisation
Background & Aims
RAC1 is a GTPase that has an evolutionarily conserved role in coordinating immune defenses, from plants to mammals. Chronic inflammatory bowel diseases (IBD) are associated with dysregulation of immune defenses. We studied the role of RAC1 in IBD using human genetic and functional studies and animal models of colitis.
We used a candidate gene approach to HapMap-Tag single nucleotide polymorphisms (SNPs) in a discovery cohort; findings were confirmed in 2 additional cohorts. RAC1 mRNA expression was examined from peripheral blood cells of patients. Colitis was induced in mice with conditional disruption of Rac1 in phagocytes by administration of dextran sulphate sodium (DSS).
We observed a genetic association of RAC1 with ulcerative colitis (UC) in a discovery cohort, 2 independent replication cohorts, and in combined analysis for the SNPs rs10951982 (Pcombined UC = 3.3 × 10–8, odds ratio [OR]=1.43 [1.26–1.63]) and rs4720672 (Pcombined UC=4.7 × 10–6, OR=1.36 [1.19–1.58]). Patients with IBD who had the rs10951982 risk allele had increased expression of RAC1, compared to those without this allele. Conditional disruption of Rac1 in macrophages and neutrophils of mice protected them against DSS-induced colitis.
Studies of human tissue samples and knockout mice demonstrated a role for the GTPase RAC1 in the development of UC; increased expression of RAC1 was associated with susceptibility to colitis.
innate immunity; Crohn's disease; CD; Rac-1 knockout
Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions, uncovering the early events of sex chromosome evolution, and identifying the sex determination genes for crop improvement.
A reiterate chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapped BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb including a 900 Kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion.
The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2–3 million years ago. The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
Bacterial artificial chromosome (BAC); Carica papaya; Sex chromosomes; Sex determination; Suppression of recombination
Photoperiod-sensitive flowering is a key adaptive trait for sorghum (Sorghum bicolor) in West and Central Africa. In this study we performed an association analysis to investigate the effect of polymorphisms within the genes putatively related to variation in flowering time on photoperiod-sensitive flowering in sorghum. For this purpose a genetically characterized panel of 219 sorghum accessions from West and Central Africa was evaluated for their photoperiod response index (PRI) based on two sowing dates under field conditions.
Sorghum accessions used in our study were genotyped for single nucleotide polymorphisms (SNPs) in six genes putatively involved in the photoperiodic control of flowering time. Applying a mixed model approach and previously determined population structure parameters to these candidate genes, we found significant associations of several SNPs with PRI for the genes CRYPTOCHROME 1 (CRY1-b1) and GIGANTEA (GI).
The negative values of Tajima's D, found for the genes of our study, suggested that purifying selection has acted on genes involved in photoperiodic control of flowering time in sorghum. The SNP markers of our study that showed significant associations with PRI can be used to create functional markers to serve as important tools for marker-assisted selection of photoperiod-sensitive cultivars in sorghum.
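As background to the Tajima's D values mentioned above, the following Python sketch computes the statistic from a set of aligned sequences using the standard formula (the difference between mean pairwise diversity and Watterson's estimator, scaled by its expected standard deviation). It is not the software used in the study; the function name and the toy alignments are assumptions, and gaps or missing bases are treated as ordinary characters for simplicity.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences; None if nothing segregates."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four aligned sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants that depend only on the sample size.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)

# Toy alignments: an excess of rare variants versus two balanced haplotypes.
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]
balanced = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare_variants), tajimas_d(balanced))
```

An excess of low-frequency variants, as in the first toy alignment, pulls the statistic below zero, which is the pattern interpreted in the text as consistent with purifying selection; demographic events such as population expansion can produce the same sign.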
MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/.
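The idea of aligning homologous regions "using genes as anchors" can be illustrated with a small dynamic-programming sketch in Python. This is a deliberately simplified stand-in, not the MCScan or MCScanX algorithm: it only finds the longest same-orientation chain of anchor gene pairs with a bounded gap, uses no match or gap scores, and all names and the `max_gap` value are hypothetical.

```python
def longest_collinear_chain(anchors, max_gap=25):
    """Longest chain of anchors that increase in both genomes within max_gap.

    `anchors` is a list of (pos_a, pos_b) pairs, where each value is the
    gene-order index of a homologous gene on one of two chromosomes.
    """
    anchors = sorted(anchors)
    if not anchors:
        return []
    best = [1] * len(anchors)    # best chain length ending at each anchor
    prev = [-1] * len(anchors)   # back-pointer to the previous anchor in that chain
    for j in range(len(anchors)):
        for i in range(j):
            gap_a = anchors[j][0] - anchors[i][0]
            gap_b = anchors[j][1] - anchors[i][1]
            if 0 < gap_a <= max_gap and 0 < gap_b <= max_gap and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    # Backtrack from the anchor that ends the longest chain.
    end = max(range(len(anchors)), key=lambda k: best[k])
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]

# Toy anchors: four collinear gene pairs plus two that break collinearity.
anchors = [(1, 10), (2, 11), (3, 50), (4, 12), (6, 14), (40, 15)]
print(longest_collinear_chain(anchors, max_gap=5))
```

A fuller treatment would also score inverted blocks, weight anchors by similarity, and report every block above a significance threshold rather than a single chain.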
We summarize the contributions of Group 9 of Genetic Analysis Workshop 17. This group addressed the problems of linkage disequilibrium and other longer range forms of allelic association when evaluating the effects of genotypes on phenotypes. Issues raised by long-range associations, whether a result of selection, stratification, possible technical errors, or chance, were less expected but proved to be important. Most contributors focused on regression methods of various types to illustrate problematic issues or to develop adaptations for dealing with high-density genotype assays. Study design was also considered, as was graphical modeling. Although no method emerged as uniformly successful, most succeeded in reducing false-positive results either by considering clusters of loci within genes or by applying smoothing metrics that required results from adjacent loci to be similar. Two unexpected results that questioned our assumptions of what is required to model linkage disequilibrium were observed. The first was that correlations between loci separated by large genetic distances can greatly inflate single-locus test statistics, and, whether the result of selection, stratification, possible technical errors, or chance, these correlations seem overabundant. The second unexpected result was that applying principal components analysis to genome-wide genotype data can apparently control not only for population structure but also for linkage disequilibrium.
score tests; two-stage study designs; robust regression; higher criticism; principal components analysis; graphical modeling
Fossil records indicate that life appeared in marine environments ∼3.5 billion years (Gyr) ago and transitioned to terrestrial ecosystems nearly 2.5 Gyr ago. Sequence analysis suggests that “hydrobacteria” and “terrabacteria” might have diverged as early as 3 Gyr ago. Bacteria of the genus Azospirillum are associated with roots of terrestrial plants; however, virtually all their close relatives are aquatic. We obtained genome sequences of two Azospirillum species and analyzed their gene origins. While most Azospirillum house-keeping genes have orthologs in its close aquatic relatives, this lineage has obtained nearly half of its genome from terrestrial organisms. The majority of genes encoding functions critical for association with plants are among horizontally transferred genes. Our results show that transition of some aquatic bacteria to terrestrial habitats occurred much later than the suggested initial divergence of hydro- and terrabacterial clades. The birth of the genus Azospirillum approximately coincided with the emergence of vascular plants on land.
Genome sequencing and analysis of plant-associated beneficial soil bacteria Azospirillum spp. reveals that these organisms transitioned from aquatic to terrestrial environments significantly later than the suggested major Precambrian divergence of aquatic and terrestrial bacteria. Separation of Azospirillum from their close aquatic relatives coincided with the emergence of vascular plants on land. Nearly half of the Azospirillum genome has been acquired horizontally, from distantly related terrestrial bacteria. The majority of horizontally acquired genes encode functions that are critical for adaptation to the rhizosphere and interaction with host plants.
Mammographic breast density is a highly heritable (h² > 0.6) and strong risk factor for breast cancer. We conducted a genome-wide linkage study to identify loci influencing mammographic breast density (MD).
Epidemiological data were assembled on 1,415 families from the Australia, Northern California and Ontario sites of the Breast Cancer Family Registry, and additional families recruited in Australia and Ontario. Families consisted of sister pairs with age-matched mammograms and data on factors known to influence MD. Single nucleotide polymorphism (SNP) genotyping was performed on 3,952 individuals using the Illumina Infinium 6K linkage panel.
Using a variance components method, genome-wide linkage analysis was performed using quantitative traits obtained by adjusting MD measurements for known covariates. Our primary trait was formed by fitting a linear model to the square root of the percentage of the breast area that was dense (PMD), adjusting for age at mammogram, number of live births, menopausal status, weight, height, weight squared, and menopausal hormone therapy. The maximum logarithm of odds (LOD) score from the genome-wide scan was on chromosome 7p14.1-p13 (LOD = 2.69; 63.5 cM) for covariate-adjusted PMD, with a 1-LOD interval spanning 8.6 cM. A similar signal was seen for the covariate-adjusted area of the breast that was dense (DA) phenotype. Simulations showed that the complete sample had adequate power to detect LOD scores of 3 or 3.5 for a locus accounting for 20% of phenotypic variance. A modest peak initially seen on chromosome 7q32.3-q34 increased in strength when only the 513 families with at least two sisters below 50 years of age were included in the analysis (LOD = 3.2; 140.7 cM, 1-LOD interval spanning 9.6 cM). In a subgroup analysis, we also found a LOD score of 3.3 for the DA phenotype on chromosome 12p11.22-q13.11 (60.8 cM, 1-LOD interval spanning 9.3 cM), overlapping a region identified in a previous study.
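The covariate adjustment described above can be sketched as an ordinary least-squares fit whose residuals serve as the quantitative trait. This is an illustrative outline only, not the study's code: the function names are hypothetical, the covariate matrix is assumed to be already numerically coded, and the LOD conversion uses the standard relation between a base-10 log likelihood ratio and a likelihood-ratio chi-square statistic.

```python
import math
import numpy as np

def adjusted_sqrt_pmd(pmd, covariates):
    """Residuals of sqrt(PMD) after a least-squares fit on covariates plus an intercept.

    pmd: percent dense area per individual (non-negative values).
    covariates: individuals x covariates array (e.g. age, weight, weight**2),
    assumed to be numerically coded already.
    """
    y = np.sqrt(np.asarray(pmd, dtype=float))
    x = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ beta

def lod_to_chi_square(lod):
    """Convert a LOD score (log10 likelihood ratio) to a likelihood-ratio chi-square."""
    return 2.0 * math.log(10.0) * lod
```

Under this conversion, the reported LOD of 2.69 corresponds to a likelihood-ratio statistic of roughly 12.4.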
The suggestive peaks and the larger linkage signal seen in the subset of pedigrees with younger participants highlight regions of interest for further study to identify genes that determine MD, with the goal of understanding mammographic density and its involvement in susceptibility to breast cancer.
Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored.
In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA-based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon-mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution.
Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
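The abstract above does not state how expression divergence between duplicates was quantified. As a hedged illustration, one commonly used proxy is one minus the Pearson correlation of the two genes' expression profiles across the same tissues or conditions. The function name and the choice of metric below are assumptions, not the authors' stated method.

```python
import numpy as np

def expression_divergence(profile_a, profile_b):
    """Return 1 - Pearson r between two expression profiles (assumed metric).

    Both profiles must be measured over the same samples, in the same order.
    0 means identical relative patterns; 2 means perfectly opposite patterns.
    A constant profile has no defined correlation and yields NaN.
    """
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("profiles must be 1-D and of equal length")
    return 1.0 - np.corrcoef(a, b)[0, 1]
```

Computing this value for every duplicate pair and grouping the results by duplication mode would give the kind of per-mode comparison the abstract describes.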
We describe the construction of a BAC contig and identification of a minimal tiling path that encompass the dominant and monogenically inherited downy mildew resistance locus Pp523 of Brassica oleracea L. The selection of BAC clones for construction of the physical map was carried out by screening gridded BAC libraries with DNA overgo probes derived from both genetically mapped DNA markers flanking the locus of interest and BAC-end sequences that align to Arabidopsis thaliana sequences within the previously identified syntenic region. The selected BAC clones consistently mapped to three different genomic regions of B. oleracea. Although 83 BAC clones were accurately mapped within a ∼4.6 cM region surrounding the downy mildew resistance locus Pp523, a subset of 33 BAC clones mapped to another region on chromosome C8 that was ∼60 cM away from the resistance gene, and a subset of 63 BAC clones mapped to chromosome C5. These results reflect the triplication of the Brassica genomes since their divergence from a common ancestor shared with A. thaliana, and they are consonant with recent analyses of the C genome of Brassica napus. The assembly of a minimal tiling path constituted by 13 (BoT01) BAC clones that span the Pp523 locus sets the stage for map-based cloning of this resistance gene.
genetic resistance; plant disease resistance; map-based cloning; BAC contig; genome triplication
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.