Results 1-25 (58)

1.  Inferring models of multiscale copy number evolution for single-tumor phylogenetics 
Bioinformatics  2015;31(12):i258-i267.
Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models.
Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identification of most parsimonious combinations of single gene and single chromosome events. We extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Application of our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival.
Availability and implementation: Our software (FISHtrees) and two datasets are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4481700  PMID: 26072490
2.  Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation 
Nucleic Acids Research  2014;42(20):12367-12379.
While individual non-B DNA structures have been shown to impact gene expression, their broad regulatory role remains elusive. We utilized genomic variants and expression quantitative trait loci (eQTL) data to analyze genome-wide variation propensities of potential non-B DNA regions and their relation to gene expression. Independent of genomic location, these regions were enriched in nucleotide variants. Our results are consistent with previously observed mutagenic properties of these regions and counter a previous study concluding that G-quadruplex regions have a reduced frequency of variants. While such mutagenicity might undermine functionality of these elements, we identified in potential non-B DNA regions a signature of negative selection. Yet, we found a depletion of eQTL-associated variants in potential non-B DNA regions, opposite to what might be expected from their proposed regulatory role. However, we also observed that genes downstream of potential non-B DNA regions showed higher expression variation between individuals. This coupling between mutagenicity and tolerance for expression variability of downstream genes may be a result of evolutionary adaptation, which allows reconciling mutagenicity of non-B DNA structures with their location in functionally important regions and their potential regulatory role.
PMCID: PMC4227770  PMID: 25336616
3.  Accuracy and coverage assessment of Oryctolagus cuniculus (Rabbit) Genes Encoding Immunoglobulins in the Whole Genome Sequence Assembly (OryCun2.0) and Localization of the IGH Locus to Chromosome 20 
Immunogenetics  2013;65(10):749-762.
We report analyses of genes encoding immunoglobulin heavy and light chains in the rabbit 6.51x whole genome assembly. This OryCun2.0 assembly confirms previous mapping of the duplicated IGK1 and IGK2 loci to chromosome 2 and the IGL lambda light chain locus to chromosome 21. The most frequently rearranged and expressed IGHV1 that is closest to IG DH and IGHJ genes encodes rabbit VHa allotypes. The partially inbred Thorbecke strain rabbit used for whole-genome sequencing was homozygous at the IGK but heterozygous with the IGHV1a1 allele in one of 79 IGHV-containing unplaced scaffolds and IGHV1a2, IGHM, IGHG and IGHE sequences in another. Some IGKV, IGLV and IGHA genes are also in other unplaced scaffolds. By fluorescence in situ hybridization, we assigned the previously unmapped IGH locus to the q-telomeric region of rabbit chromosome 20. An approximately 3 Mb segment of human chromosome 14 including IGH genes predicted to map to this telomeric region based on synteny analysis could not be located on assembled chromosome 20. Unplaced scaffold chrUn0053 contains some of the genes that comparative mapping predicts to be missing. We identified discrepancies between previous targeted studies and the OryCun2.0 assembly and some new BAC clones with IGH sequences that can guide other studies to further sequence and improve the OryCun2.0 assembly. Complete knowledge of gene sequences encoding variable regions of rabbit heavy, kappa and lambda chains will lead to better understanding of how and why rabbits produce antibodies of high specificity and affinity through gene conversion and somatic hypermutation.
PMCID: PMC3780782  PMID: 23925440
Rabbit; Immunoglobulin Genes; Heavy Chains; Fluorescence in situ hybridization; Chromosome 20; Light Chains
4.  Endogenous Retrovirus Insertion in the KIT Oncogene Determines White and White spotting in Domestic Cats 
G3: Genes|Genomes|Genetics  2014;4(10):1881-1891.
The Dominant White locus (W) in the domestic cat demonstrates pleiotropic effects exhibiting complete penetrance for absence of coat pigmentation and incomplete penetrance for deafness and iris hypopigmentation. We performed linkage analysis using a pedigree segregating White to identify KIT (Chr. B1) as the feline W locus. Segregation and sequence analysis of the KIT gene in two pedigrees (P1 and P2) revealed the remarkable retrotransposition and evolution of a feline endogenous retrovirus (FERV1) as responsible for two distinct phenotypes of the W locus, Dominant White, and white spotting. A full-length (7125 bp) FERV1 element is associated with white spotting, whereas a FERV1 long terminal repeat (LTR) is associated with all Dominant White individuals. For purposes of statistical analysis, the alternatives of wild-type sequence, FERV1 element, and LTR-only define a triallelic marker. Taking into account pedigree relationships, deafness is genetically linked and associated with this marker; estimated P values for association are in the range of 0.007 to 0.10. The retrotransposition interrupts a DNase I hypersensitive site in KIT intron 1 that is highly conserved across mammals and was previously demonstrated to regulate temporal and tissue-specific expression of KIT in murine hematopoietic and melanocytic cells. A large-population genetic survey of cats (n = 270), representing 30 cat breeds, supports our findings and demonstrates statistical significance of the FERV1 LTR and full-length element with Dominant White/blue iris (P < 0.0001) and white spotting (P < 0.0001), respectively.
PMCID: PMC4199695  PMID: 25085922
White; domestic cat; deaf; white spotting; retrotransposition; FERV1
5.  Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics 
PLoS Computational Biology  2014;10(7):e1003740.
We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
Author Summary
Cancer is an evolutionary system whose growth and development is attributed to aberrations in well-known genes and to cancer-type specific genomic imbalances. Here, we present methods for reconstructing the evolution of individual tumors based on cell-to-cell variations between copy numbers of targeted regions of the genome. The methods are designed to work with fluorescence in situ hybridization (FISH), a technique that allows one to profile copy number changes in potentially thousands of single cells per study. Our work advances the prior art by developing theory and practical algorithms for building evolutionary trees of single tumors that can model gain or loss of genetic regions at the scale of single genes, whole chromosomes, or the entire genome, all common events in tumor evolution. We apply these methods on simulated and real tumor data to demonstrate substantial improvements in tree-building accuracy and in our ability to accurately classify tumors from their inferred evolutionary models. The newly developed algorithms have been released through our publicly available software, FISHtrees.
PMCID: PMC4117424  PMID: 25078894
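As a rough illustration of the edit-distance idea in entry 5, the sketch below covers only the simplest case: single-probe events, each changing one probe's count by one. Under that assumption the minimum number of events between two cells is the sum of absolute differences in their probe counts. The function name and example profiles are hypothetical. Chromosome-scale and whole-genome events, which the paper's methods also cover, are not modeled here, and zero counts get no special handling.

```python
from typing import Sequence


def single_probe_distance(cell_a: Sequence[int], cell_b: Sequence[int]) -> int:
    """Minimum number of +1/-1 single-probe events turning cell_a into cell_b.

    Simplifying assumption (not the paper's full model): only single-probe
    gains and losses are allowed, so the distance is the L1 (Manhattan) norm.
    """
    if len(cell_a) != len(cell_b):
        raise ValueError("both cells must be scored on the same probes")
    if any(c < 0 for c in list(cell_a) + list(cell_b)):
        raise ValueError("copy numbers cannot be negative")
    return sum(abs(a - b) for a, b in zip(cell_a, cell_b))


# Hypothetical four-probe profiles: a diploid cell and a cell with
# two extra copies of probe 2 and one lost copy of probe 4.
diploid = (2, 2, 2, 2)
aberrant = (2, 4, 2, 1)
print(single_probe_distance(diploid, aberrant))
```

Allowing an event to change all probes on a chromosome, or all probes in the genome, in a single step can make the true distance smaller than this value, which is why the joint model described in the abstract may yield more parsimonious trees.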
6.  The BEACH is hot: A LYST of emerging roles for BEACH-domain containing proteins in human disease 
Traffic (Copenhagen, Denmark)  2013;14(7):10.1111/tra.12069.
BEACH (named after ‘Beige and Chediak-Higashi’) is a conserved ~280 residue domain, present in nine human BEACH domain containing proteins (BDCPs). Most BDCPs are large, containing a PH-like domain for membrane association preceding their BEACH domain, and containing WD40 and other domains for ligand binding. Recent studies found that mutations in individual BDCPs cause several human diseases. BDCP alterations affect lysosome size (LYST and NSMAF), apoptosis (NSMAF), autophagy (LYST, WDFY3, LRBA), granule size (LYST, NBEAL2, NBEA), or synapse formation (NBEA). However, the roles of each BDCP in these membrane events remain controversial. After reviewing studies on individual BDCPs, we propose a unifying hypothesis that BDCPs act as scaffolding proteins that facilitate membrane events, including both fission and fusion, determined by their binding partners. BDCPs may also bind each other, enabling fusion or fission of vesicles that are not necessarily of the same type. Such mechanisms explain why different BDCPs may have roles in autophagy; each BDCP is specific for the cell type or the cargo, but not necessarily specific for attaching to the autophagosome. Further elucidation of these mechanisms, preferably carrying out the same experiment on multiple BDCPs, and possibly using patients’ cells, may identify potential targets for therapy.
PMCID: PMC3761935  PMID: 23521701
7.  Linkage analysis of a large African family segregating stuttering suggests polygenic inheritance 
Human genetics  2012;132(4):385-396.
We describe a pedigree of 71 individuals from the Republic of Cameroon in which at least 33 individuals have a clinical diagnosis of stuttering. The high concentration of stuttering individuals suggests that the pedigree either contains a single highly penetrant gene variant or that assortative mating led to multiple stuttering-associated variants being transmitted in different parts of the pedigree. No single locus displayed significant linkage to stuttering in initial genome-wide scans with microsatellite and SNP markers. By dividing the pedigree into five sub-pedigrees, we found evidence for linkage to previously reported loci on 3q and 15q, and to novel loci on 2p, 3p, 14q, and a different region of 15q. Using the two-locus mode of Superlink, we showed that combining the recessive locus on 2p and a single-locus additive representation of the 15q loci is sufficient to achieve a two-locus score over 6 on the entire pedigree. For this 2p+15q analysis, we show LOD scores ranging from 4.69 to 6.57, and the scores are sensitive to which marker is chosen for 15q. Our findings provide strong evidence for linkage at several loci.
PMCID: PMC3600087  PMID: 23239121
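For readers unfamiliar with the LOD scores quoted in entry 7, the sketch below shows the textbook two-point calculation for phase-known meioses. It is not the multi-locus pedigree analysis performed with Superlink in the study. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**r * (1 - theta)**(n - r).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")  # a recombinant was seen, so theta = 0 is impossible
    null_likelihood = 0.5 ** meioses  # likelihood under free recombination
    return math.log10(likelihood / null_likelihood)


# Hypothetical data: 1 recombinant among 12 informative meioses.
# Scan a grid of theta values and keep the one with the highest LOD.
grid = [i / 100 for i in range(0, 51)]
best_theta = max(grid, key=lambda t: lod_score(1, 12, t))
print(best_theta, round(lod_score(1, 12, best_theta), 2))
```

A LOD of 3 corresponds to odds of 1000:1 in favor of linkage over free recombination, which is the conventional threshold against which scores such as those in the abstract are read.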
8.  PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD 
BMC Bioinformatics  2014;15:47.
PSEUDOMARKER is a software package that performs joint linkage and linkage disequilibrium analysis between a marker and a putative disease locus. A key feature of PSEUDOMARKER is that it can combine case-controls and pedigrees of varying structure into a single unified analysis. Thus it maximizes the full likelihood of the data over marker allele frequencies or conditional allele frequencies on disease and recombination fraction.
The new version 2.0 uses the software package NOMAD to maximize likelihoods, resulting in generally comparable or better optima with many fewer evaluations of the likelihood functions.
After being modified substantially to use modern optimization methods, PSEUDOMARKER version 2.0 is more robust and substantially faster than version 1.0. NOMAD may be useful in other bioinformatics problems where complex likelihood functions are optimized.
PMCID: PMC3932042  PMID: 24533837
9.  Animal Models of Human Granulocyte Diseases 
In vivo animal models have proven very useful to understand basic biological pathways of the immune system, a prerequisite for the development of innovative therapies. This manuscript addresses currently available models for defined human monogenetic defects of neutrophil granulocytes, including murine, zebrafish and larger mammalian species. Strengths and weaknesses of each system are summarized, and clinical investigators may thus be inspired to develop further lines of research to improve diagnosis and therapy by use of the appropriate animal model system.
PMCID: PMC3558998  PMID: 23351993
Chronic granulomatous disease; leukocyte adhesion deficiency; severe congenital neutropenia; neutrophils; mouse models; zebrafish models
10.  Analysis of the bereavement effect after the death of a spouse in the Amish: a population-based retrospective cohort study 
BMJ Open  2014;4(1):e003670.
Objectives: This study investigates the association between bereavement and the mortality of a surviving spouse among Amish couples. We hypothesised that the bereavement effect would be relatively small in the Amish due to the unusually cohesive social structure of the Amish that might attenuate the loss of spousal support.
Design: Population-based cohort study.
Setting: The USA.
Participants: 10 892 Amish couples born during 1725–1900 located in Pennsylvania, Ohio and Indiana. All the participants are deceased.
Outcome measures: The survival time is ‘age’; the event is ‘death’. Hazard ratios (HRs) of widowed individuals with respect to gender, age at widowhood, remarriage, the number of surviving children and time since bereavement.
Results: We observed HRs for widowhood ranging from 1.06 to 1.26 over the study period (nearly all differences significant at p<0.05). Mortality risks tended to be higher in men than in women and in younger compared with older bereaved spouses. There were significantly increased mortality risks in widows and widowers who did not remarry. We observed a higher number of surviving children to be associated with increased mortality in men and women. Mortality risk following bereavement was higher in the first 6 months among men and women.
Conclusions: We conclude that bereavement effects remain apparent even in this socially cohesive Amish community. Remarriage is associated with a significant decrease in the mortality risk among Amish individuals. Contrary to results from previous studies, an increase in the number of surviving children was associated with decreased survival rate.
PMCID: PMC3902313  PMID: 24435888
Epidemiology; Geriatric Medicine; Mental Health; Public Health
11.  Digenic Inheritance in Medical Genetics 
Journal of medical genetics  2013;50(10):641-652.
Digenic inheritance (DI) is the simplest form of inheritance for genetically complex diseases. In contrast to the thousands of reports that mutations in single genes cause human diseases, there are only dozens of human disease phenotypes with evidence for DI in some pedigrees. The advent of high-throughput sequencing (HTS) has made it simpler to identify monogenic disease causes and could similarly simplify proving DI because one can simultaneously find mutations in two genes in the same sample. However, through 2012, I could find only one example of human DI in which HTS was used; in that example, HTS found only the second of the two genes. To explore the gap between expectation and reality, I tried to collect all examples of human DI with a narrow definition and characterize them according to the types of evidence collected and whether there has been replication. Two strong trends are that knowledge of candidate genes and knowledge of protein-protein interactions have been helpful in most published examples of human DI. In contrast, the positional method of genetic linkage analysis has been mostly unsuccessful in identifying genes underlying human DI. Based on the empirical data, I suggest that combining HTS with growing networks of established protein-protein interactions may expedite future discoveries of human DI and strengthen the evidence for them.
PMCID: PMC3778050  PMID: 23785127
Digenic inheritance; protein-protein interactions; high-throughput sequencing; epistasis; facioscapulohumeral muscular dystrophy; deafness; Bardet-Biedl syndrome; nephrotic syndrome; hypogonadotropic hypogonadism; ciliopathies; genetic linkage analysis
12.  A radiation hybrid map of river buffalo (Bubalus bubalis) chromosome one (BBU1) 
Cytogenetic and genome research  2007;119(0):100-104.
The largest chromosome in the river buffalo karyotype, BBU1, is a submetacentric chromosome with reported homology between BBU1q and bovine chromosome 1 and between BBU1p and BTA27. We present the first radiation hybrid map of this chromosome containing 69 cattle derived markers including 48 coding genes, 17 microsatellites and four ESTs distributed in two linkage groups spanning a total length of 1330.1 cR5000. The RH map was constructed based on analysis of a recently developed river buffalo-hamster whole genome radiation hybrid (BBURH5000) panel. The retention frequency of individual markers across the panel ranged from 17.8% to 52.2%. With few exceptions, the order of markers within linkage groups is identical to the order established for corresponding cattle RH maps. The BBU1 map provides a starting point for comparison of gene order rearrangements between river buffalo chromosome 1 and its bovine homologs.
PMCID: PMC3780412  PMID: 18160788
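The retention frequencies reported in entry 12 are, in general terms, the fraction of hybrid clones in the panel that carry a given marker. A minimal sketch follows. The function name, the encoding of calls, and the example panel are hypothetical, and leaving ambiguous calls out of the denominator is an assumption made here rather than something stated in the abstract.

```python
from typing import Iterable, Optional


def retention_frequency(calls: Iterable[Optional[bool]]) -> float:
    """Fraction of scored hybrid clones that retain a marker.

    calls: True = marker present, False = absent, None = ambiguous/unscored.
    Ambiguous calls are dropped before computing the fraction (an assumption).
    """
    scored = [c for c in calls if c is not None]
    if not scored:
        raise ValueError("no scored clones for this marker")
    return sum(scored) / len(scored)


# Hypothetical 10-clone panel with one ambiguous call.
marker_calls = [True, False, False, True, None, False, True, False, False, False]
print(f"{retention_frequency(marker_calls):.1%}")
```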
13.  Loss-of-function mutations in the IL-21 receptor gene cause a primary immunodeficiency syndrome 
A primary immunodeficiency syndrome caused by loss-of-function mutations in the IL-21 receptor exhibits impaired B, T, and NK cell function.
Primary immunodeficiencies (PIDs) represent exquisite models for studying mechanisms of human host defense. In this study, we report on two unrelated kindreds, with two patients each, who had cryptosporidial infections associated with chronic cholangitis and liver disease. Using exome and candidate gene sequencing, we identified two distinct homozygous loss-of-function mutations in the interleukin-21 receptor gene (IL21R; c.G602T, p.Arg201Leu and c.240_245delCTGCCA, p.C81_H82del). The IL-21RArg201Leu mutation causes aberrant trafficking of the IL-21R to the plasma membrane, abrogates IL-21 ligand binding, and leads to defective phosphorylation of signal transducer and activator of transcription 1 (STAT1), STAT3, and STAT5. We observed impaired IL-21–induced proliferation and immunoglobulin class-switching in B cells, cytokine production in T cells, and NK cell cytotoxicity. Our study indicates that human IL-21R deficiency causes an immunodeficiency and highlights the need for early diagnosis and allogeneic hematopoietic stem cell transplantation in affected children.
PMCID: PMC3600901  PMID: 23440042
14.  A 1.5 Megabase Resolution Radiation Hybrid Map of the Cat Genome and Comparative Analysis with the Canine and Human Genomes 
Genomics  2006;89(2):189-196.
We report the construction of a 1.5 Mb resolution radiation hybrid map of the domestic cat genome. This new map includes novel microsatellite loci and markers derived from the 2X genome sequence that target previous gaps in the feline-human comparative map. Ninety-six percent of the 1793 cat markers we mapped have identifiable orthologues in the canine and human genome sequences. The updated autosomal and X chromosome comparative maps identify 152 cat-human and 134 cat-dog homologous synteny blocks. Comparative analysis shows the marked change in chromosomal evolution in the canid lineage relative to the felid lineage since divergence from their carnivoran ancestor. The canid lineage has a thirty-fold difference in the number of interchromosomal rearrangements relative to felids, while the felid lineage has primarily undergone intrachromosomal rearrangements. We have also refined the pseudoautosomal region and boundary in the cat and show that it is markedly longer than those of human or mouse. This improved RH comparative map provides a useful tool to facilitate positional cloning studies in the feline model.
PMCID: PMC3760348  PMID: 16997530
domestic cat; radiation hybrid map; canine genome; genome evolution; synteny; chromosome rearrangement
15.  Four Independent Mutations in the Feline Fibroblast Growth Factor 5 Gene Determine the Long-Haired Phenotype in Domestic Cats 
The Journal of heredity  2007;98(6):555-566.
To determine the genetic regulation of hair length in the domestic cat, a whole genome scan was performed in a multi-generational pedigree in which the long-haired phenotype was segregating. The two markers that demonstrated the greatest linkage to the long-haired trait (LOD≥6) flanked an estimated 10 Mb region on cat chromosome B1 containing the Fibroblast Growth Factor 5 gene (FGF5), a candidate gene implicated in regulating hair follicle growth cycle in other species. Sequence analyses of FGF5 in 26 cat breeds and two pedigrees of non-breed cats revealed four separate mutations predicted to disrupt the biological activity of the FGF5 protein. Pedigree analyses demonstrated that different combinations of paired mutant FGF5 alleles segregated with the long-haired phenotype in an autosomal recessive manner. Association analyses of over 380 genotyped breed and non-breed cats were consistent with mutations in the FGF5 gene causing the long-haired phenotype in an autosomal recessive manner. In combination, these genomic approaches demonstrated that FGF5 is the major genetic determinant of hair length in the domestic cat.
PMCID: PMC3756544  PMID: 17767004
16.  Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations 
Bioinformatics  2013;29(13):i189-i198.
Motivation: Development and progression of solid tumors can be attributed to a process of mutations, which typically includes changes in the number of copies of genes or genomic regions. Although comparisons of cells within single tumors show extensive heterogeneity, recurring features of their evolutionary process may be discerned by comparing multiple regions or cells of a tumor. A useful source of data for studying likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells. Novel algorithms for interpreting such data phylogenetically are needed, however, to reconstruct likely evolutionary trajectories from states of single cells and facilitate analysis of tumor evolution.
Results: In this article, we develop phylogenetic methods to infer likely models of tumor progression using FISH copy number data and apply them to a study of FISH data from two cancer types. Statistical analyses of topological characteristics of the tree-based model provide insights into likely tumor progression pathways consistent with the prior literature. Furthermore, tree statistics from the resulting phylogenies can be used as features for prediction methods. This results in improved accuracy, relative to unstructured gene copy number data, at predicting tumor state and future metastasis.
Availability: Source code for software that does FISH tree building (FISHtrees) and the data on cervical and breast cancer examined here are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3694640  PMID: 23812984
17.  Genome-wide Expression and Copy Number Analysis Identifies Driver Genes in Gingivobuccal Cancers 
Genes, chromosomes & cancer  2011;51(2):161-173.
The molecular mechanisms contributing to the development and progression of gingivobuccal complex (GBC) cancers (a sub-site of oral cancer, comprising the buccal mucosa, the gingivobuccal sulcus, the lower gingival region and the retromolar trigone) remain poorly understood. Identifying the GBC cancer-related gene expression signature and the driver genes residing on the altered chromosomal regions is critical for understanding the molecular basis of its pathogenesis. Genome-wide expression profiling of 27 GBC cancers with known chromosomal alterations was performed to reveal differentially expressed genes. Putative driver genes were identified by integrating copy number and gene expression data. A total of 315 genes were found differentially expressed (P≤0.05, logFC>2.0) of which eleven genes were validated by real-time quantitative reverse transcriptase-PCR (qRT-PCR) in tumors (n=57) and normal GBC tissues (n=18). Overexpression of LY6K, in chromosome band 8q24.3, was validated by immunohistochemical (IHC) analysis. We found that 78.5% (2,417/3,079) of the genes located in regions of recurrent chromosomal alterations show copy number dependent expression indicating that copy number alteration has a direct effect on global gene expression. The integrative analysis revealed BIRC3 in 11q22.2 as a candidate driver gene associated with poor clinical outcome. Our study identified previously unreported differentially expressed genes in a homogeneous subtype of oral cancer and the candidate driver genes that may contribute to the development and progression of the disease.
PMCID: PMC3233671  PMID: 22072328
18.  On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case–control samples 
A common approach to genetic mapping of loci for complex diseases is to perform a genome-wide association study (GWAS) by analyzing a vast number of SNP markers in cohorts of unrelated cases and controls. A direct motivation for the case–control design is that unrelated, affected individuals can be easier to collect than large families with multiple affected persons in the Western world. Despite its higher potential power, investigators have not actively pursued family ascertainment in part because of a dearth of methods for analyzing such correlated data on a large scale. We examine the statistical properties of several commonly used family-based association tests, as to their performance using real-life mixtures of families and singletons taken from our own migraine and schizophrenia studies, as well as population-based data for a complex trait simulated with the evolutionary phenogenetic simulator, ForSim. In virtually every situation, the full likelihood-based methods in the PSEUDOMARKER program outperformed those implemented in FBAT, GENEHUNTER TDT, PLINK (family-based options), HRR/HHRR, QTDT, TRANSMIT, UNPHASED, MENDEL, and LAMP. We further show that GWAS is much more powerful when family samples are used rather than unrelateds, on a genotype-by-genotype basis.
PMCID: PMC3260916  PMID: 21934707
power; type-I error; genetic linkage analysis; linkage disequilibrium; family-based association; genome-wide association studies
19.  Living the Good Life? Mortality and Hospital Utilization Patterns in the Old Order Amish 
PLoS ONE  2012;7(12):e51560.
Lifespan increases observed in the United States and elsewhere throughout the developed world have been attributed in part to improvements in medical care access and technology and to healthier lifestyles. To differentiate the relative contributions of these two factors, we have compared lifespan in the Old Order Amish (OOA), a population with historically low use of medical care, with that of Caucasian participants from the Framingham Heart Study (FHS), focusing on individuals who have reached at least age 30 years.
Analyses were based on 2,108 OOA individuals from the Lancaster County, PA community born between 1890 and 1921 and 5,079 FHS participants born at approximately the same time. Vital status was ascertained on 96.9% of the OOA cohort through 2011 and through systematic follow-up of the FHS cohort. The lifespan part of the study included an enlargement of the Anabaptist Genealogy Database to 539,822 individuals, which will be of use in other studies of the Amish. Mortality comparisons revealed that OOA men experienced better longevity than their FHS counterparts (p<0.001) and OOA women comparable longevity.
We further documented all OOA hospital discharges in Lancaster County, PA during 2002–2004 and compared OOA discharge rates to Caucasian national rates obtained from the National Hospital Discharge Survey for the same time period. Both OOA men and women experienced markedly lower rates of hospital discharges than their non-Amish counterparts, despite the increased lifespan.
We speculate that lifestyle factors may predispose the OOA to greater longevity and perhaps to lesser hospital use. Identifying these factors, which might include behaviors such as lesser tobacco use, greater physical activity, and/or enhanced community assimilation, and assessing their transferability to non-Amish communities may produce significant gains to the public health.
PMCID: PMC3526600  PMID: 23284714
20.  Comparative analysis of genome sequences of the Th2 cytokine region of rabbit (Oryctolagus cuniculus) with those of nine different species 
The regions encoding the coordinately regulated Th2 cytokines IL5, IL4 and IL13 are located on chromosomes 5 of man and 11 of mouse. They have been intensively studied because these interleukins have protective roles in helminth infections, but may lead to detrimental effects such as allergy, asthma, and fibrosis in lung and liver. We added to previous studies by comparing sequences of syntenic regions on chromosome 3 of the rabbit (Oryctolagus cuniculus) genome OryCun 2.0 assembly from a tuberculosis-susceptible strain, with the corresponding region of ENCODE ENm002 from a normal rabbit as well as with 9 other mammalian species. We searched for rabbit transcription factor binding sites in putative promoter and other non-coding regions of IL5, RAD50, IL13 and IL4. Although we identified several differences between the two donor rabbits in coding and non-coding regions of potential functional significance, confirmation awaits additional sequencing of other rabbits.
PMCID: PMC3519392  PMID: 23239928
21.  PSEUDOMARKER: A Powerful Program for Joint Linkage and/or Linkage Disequilibrium Analysis on Mixtures of Singletons and Related Individuals 
Human Heredity  2011;71(4):256-266.
A decade ago, there was widespread enthusiasm for the prospects of genome-wide association studies to identify common variants related to common chronic diseases using samples of unrelated individuals from populations. Although technological advancements allow us to query more than a million SNPs across the genome at low cost, a disappointingly small fraction of the genetic portion of common disease etiology has been uncovered. This has led to the hypothesis that less frequent variants might be involved, stimulating a renaissance of the traditional approach of seeking genes using multiplex families from less diverse populations. However, by using the modern genotyping and sequencing technology, we can now look not just at linkage, but jointly at linkage and linkage disequilibrium (LD) in such samples. Software methods that can look simultaneously at linkage and LD in a powerful and robust manner have been lacking. Most algorithms cannot jointly analyze datasets involving families of varying structures in a statistically or computationally efficient manner. We have implemented previously proposed statistical algorithms in a user-friendly software package, PSEUDOMARKER. This paper is an announcement of this software package. We describe the motivation behind the approach, the statistical methods, and software, and we briefly demonstrate PSEUDOMARKER's advantages over other packages by example.
PMCID: PMC3190175  PMID: 21811076
Computer software; Family-based association; Genome-wide association; Likelihood methods; Linkage analysis; Linkage disequilibrium; Study design
22.  Genomic Aberrations in an African American Colorectal Cancer Cohort Reveals a MSI-Specific Profile and Chromosome X Amplification in Male Patients 
PLoS ONE  2012;7(8):e40392.
DNA aberrations that cause colorectal cancer (CRC) occur in multiple steps that involve microsatellite instability (MSI) and chromosomal instability (CIN). Herein, we studied CRCs from AA patients for their CIN and MSI status.
Experimental Design
Array CGH was performed on 30 AA colon tumors. The MSI status was established. The CGH data from AA patients were compared to published lists of 41 TSG and oncogenes in Caucasians and 68 cancer genes, proposed via systematic sequencing for somatic mutations in colon and breast tumors. The patient-by-patient CGH profiles were organized into a maximum parsimony cladogram to give insights into the lineage of the tumors' aberrations.
The CGH analysis revealed that CIN was independent of age, gender, stage or location. However, both the number and nature of aberrations seem to depend on the MSI status. MSI-H tumors clustered together in the cladogram. The chromosomes with the highest rates of CGH aberrations were 3, 5, 7, 8, 20 and X. Chromosome X was primarily amplified in male patients. A comparison with Caucasians revealed an overall similar aberration profile, with a few exceptions for the following genes: THRB, RAF1, LPL, DCC, XIST, PCNT, STS and genes on the 20q12-q13 cytoband. Among the 68 CAN genes, all showed some level of alteration in our cohort.
Chromosome X amplification in male patients with CRC merits follow-up. The observed CIN may play a distinctive role in CRC in AAs. The clustering of MSI-H tumors in global CGH data analysis suggests that chromosomal aberrations are not random.
PMCID: PMC3412863  PMID: 22879877
23.  Coordinated Conditional Simulation with SLINK and SUP of Many Markers Linked or Associated to a Trait in Large Pedigrees 
Human Heredity  2011;71(2):126-134.
Simulation of genotypes in pedigrees is an important tool to evaluate the power of a linkage or an association study and to assess the empirical significance of results. SLINK is a widely-used package for pedigree simulations, but its implementation has not previously been described in a published paper. SLINK was initially derived from the LINKAGE programs. Over the 20 years since its release, SLINK has been modified to incorporate faster algorithms, notably from the linkage analysis package FASTLINK, also derived from LINKAGE. While SLINK can simulate genotypes on pedigrees of high complexity, one limitation of SLINK, as with most methods based on peeling algorithms to evaluate pedigree likelihoods, is the small number of linked markers that can be generated. The software package SUP includes an elegant wrapper for SLINK that circumvents the limitation on number of markers by using descent markers generated by SLINK to simulate a much larger number of markers on the same chromosome, linked and possibly associated with a trait locus. We have released new coordinated versions of SLINK (3.0; available from and SUP (v090804; available from or that integrate the two software packages. Thereby, we have removed some of the previous limitations on the joint functionality of the programs, such as the number of founders in a pedigree. We review the history of SLINK and describe how SLINK and SUP are now coordinated to permit the simulation of large numbers of markers linked and possibly associated with a trait in large pedigrees.
PMCID: PMC3136384  PMID: 21734403
Coordinated conditional simulation; SLINK; SUP; Linkage study; Association study; Pedigree, large; Pedigree, complex
24.  Domain enhanced lookup time accelerated BLAST 
Biology Direct  2012;7:12.
BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.
We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.
DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at
This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.
PMCID: PMC3438057  PMID: 22510480
25.  Case-Control Study of Vitamin D, dickkopf homolog 1 (DKK1) Gene Methylation, VDR Gene Polymorphism and the Risk of Colon Adenoma in African Americans 
PLoS ONE  2011;6(10):e25314.
Data are sparse on genetic and epigenetic factors and vitamin D exposure in African Americans (AA) with colon polyp. Consequently, we evaluated serum 25(OH)D levels, vitamin D receptor (VDR) polymorphisms and the methylation status of the tumor suppressor gene dickkopf homolog 1 (DKK1) as risk factors for colon polyp in this population.
The case-control study consisted of 93 patients with colon polyp (cases) and 187 healthy individuals (controls) at Howard University Hospital. Serum levels of 25(OH)D (including D3, D2, and total) were measured by liquid chromatography-mass spectrometry. DNA analysis focused on 49 single nucleotide polymorphisms (SNPs) in the VDR gene. Promoter methylation analysis of DKK1 was also performed. The resulting data were processed in unadjusted and multivariable logistic regression analyses.
Cases and controls differed in vitamin D status (D3<50 nmol/L: median of 35.5 nmol/L in cases vs. 36.8 nmol/L in controls; P = 0.05). Low levels of 25(OH)D3 (<50 nmol/L) were observed in 86% of cases and 68% of controls and were associated with a higher risk of colon polyp (odds ratio of 2.7, 95% confidence interval 1.3–3.4). The SNP analysis showed no association between 46 VDR polymorphisms and colon polyp. The promoter of the DKK1 gene was unmethylated in 96% of the samples.
We found an inverse association between serum 25(OH)D3 and colon polyp in AAs. VDR SNPs and DKK1 methylation were not associated with colon polyp. Vitamin D levels may in part explain the higher incidence of polyp in AAs.
PMCID: PMC3192764  PMID: 22022386
