1.  Intra-abdominal Infections: The Role of Anaerobes, Enterococci, Fungi, and Multidrug-Resistant Organisms 
Open Forum Infectious Diseases  2016;3(4):ofw232.
Intra-abdominal infections (IAI) constitute a common reason for hospitalization. However, there is lack of standardization in empiric management of (1) anaerobes, (2) enterococci, (3) fungi, and (4) multidrug-resistant organisms (MDRO). The recommendation is to institute empiric coverage for some of these organisms in “high-risk community-acquired” or in “healthcare-associated” infections (HCAI), but exact definitions are not provided.
Epidemiological study of IAI was conducted at Assaf Harofeh Medical Center (May–November 2013). Logistic and Cox regressions were used to analyze predictors and outcomes of IAI, respectively. The performances of established HCAI definitions to predict MDRO-IAI upon admission were calculated by receiver operating characteristic (ROC) curve analyses.
After reviewing 8219 discharge notes, 253 consecutive patients were enrolled (43 [17%] children). There were 116 patients with appendicitis, 93 with biliary infections, and 17 with diverticulitis. Cultures were obtained from 88 patients (35%), and 44 of them (50%) yielded a microbiologically confirmed IAI: 9% fungal, 11% enterococcal, 25% anaerobic, and 34% MDRO. Eighty percent of MDRO-IAIs were present upon admission, but the areas under the ROC curve for predicting MDRO-IAI upon admission with the commonly used HCAI definitions were low (0.73 and 0.69). Independent predictors for MDRO-IAI were advanced age and active malignancy.
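As a rough illustration of the ROC analysis mentioned above, the area under the curve can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below is not the study's analysis: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only to show the calculation for a binary definition such as "meets an HCAI definition".

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the rank-based, Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study):
# label 1 = MDRO-IAI, score 1 = patient meets an HCAI definition.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [1, 1, 0, 0, 0, 1, 0, 1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

For a binary predictor like this one, the AUC reduces to the mean of sensitivity and specificity, which is one way to read the reported values of 0.73 and 0.69.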
Multidrug-resistant organism-IAIs are common, and empiric broad-spectrum coverage is important among elderly patients with active malignancy, even if the infection onset was outside the hospital setting, regardless of current HCAI definitions. Outcomes analyses suggest that empiric regimens should routinely contain antianaerobes (except for biliary IAI); however, empiric antienterococcal or antifungal regimens are seldom needed.
PMCID: PMC5170494  PMID: 28018930
biliary infection; MDRO; epidemiology; nosocomial infections; surgical infection.
2.  Genetic variants in Cell Adhesion Molecule 1 (CADM1): a validation study of a novel endothelial cell venous thrombosis risk factor 
Thrombosis research  2014;134(6):1186-1192.
In a protein C deficient family, we recently identified a candidate gene, CADM1, which interacted with protein C deficiency in increasing the risk of venous thrombosis (VT). This study aimed to determine whether CADM1 variants also interact with protein C pathway abnormalities in increasing VT risk outside this family.
Materials and methods
We genotyped over 300 CADM1 variants in the population-based MEGA case-control study. We compared VT risks between cases with low protein C activity (n=194), low protein S levels (n=23), high factor VIII activity (n=165) or factor V Leiden carriers (n=580), and all 4004 controls. Positive associations were repeated in all 3496 cases and 4004 controls.
We found 22 variants which were associated with VT in one of the protein C pathway risk groups. After mutual adjustment, six variants remained associated with VT. The strongest evidence was found for rs220842 and rs11608105. For rs220842, the odds ratio (OR) for VT was 3.2 (95% CI 1.2–9.0) for cases with high factor VIII activity compared with controls. In addition, this variant was associated with an increased risk of VT in the overall study population (OR: 1.5, 95% CI 1.0–2.2). The other variant, rs11608105, was not associated with VT in the overall study population (OR: 1.0, 95% CI 0.8–1.1), but showed a strong effect on VT risk (OR: 21, 95% CI 5.1–88) when combined with low protein C or S levels.
In a population-based association study, we confirm a role for CADM1 variants in increasing the risk of VT by interaction with protein C pathway abnormalities.
PMCID: PMC4252856  PMID: 25306186
venous thrombosis; genetic variation; CADM1; protein C pathway
3.  Gain-of-Function ADCY5 Mutations in Familial Dyskinesia with Facial Myokymia 
Annals of neurology  2014;75(4):542-549.
To identify the cause of childhood onset involuntary paroxysmal choreiform and dystonic movements in 2 unrelated sporadic cases and to investigate the functional effect of missense mutations in adenylyl cyclase 5 (ADCY5) in sporadic and inherited cases of autosomal dominant familial dyskinesia with facial myokymia (FDFM).
Whole exome sequencing was performed on 2 parent–child trios. The effect of mutations in ADCY5 was studied by measurement of cyclic adenosine monophosphate (cAMP) accumulation under stimulatory and inhibitory conditions.
The same de novo mutation (c.1252C>T, p.R418W) in ADCY5 was found in both studied cases. An inherited missense mutation (c.2176G>A, p.A726T) in ADCY5 was previously reported in a family with FDFM. The significant phenotypic overlap with FDFM was recognized in both cases only after discovery of the molecular link. The inherited mutation in the FDFM family and the recurrent de novo mutation affect residues in different protein domains, the first cytoplasmic domain and the first membrane-spanning domain, respectively. Functional studies revealed a statistically significant increase in β-receptor agonist-stimulated intracellular cAMP consistent with an increase in adenylyl cyclase activity for both mutants relative to wild-type protein, indicative of a gain-of-function effect.
FDFM is likely caused by gain-of-function mutations in different domains of ADCY5—the first definitive link between adenylyl cyclase mutation and human disease. We have illustrated the power of hypothesis-free exome sequencing in establishing diagnoses in rare disorders with complex and variable phenotype. Mutations in ADCY5 should be considered in patients with undiagnosed complex movement disorders even in the absence of a family history.
PMCID: PMC4457323  PMID: 24700542
4.  Fusion Transcript Discovery in Formalin-Fixed Paraffin-Embedded Human Breast Cancer Tissues Reveals a Link to Tumor Progression 
PLoS ONE  2014;9(4):e94202.
The identification of gene fusions promises to play an important role in personalized cancer treatment decisions. Many rare gene fusion events have been identified in fresh frozen solid tumors from common cancers employing next-generation sequencing technology. However, the ability to detect transcripts from gene fusions in RNA isolated from formalin-fixed paraffin-embedded (FFPE) tumor tissues, which exist in very large sample repositories for which disease outcome is known, is still limited due to the low complexity of FFPE libraries and the lack of appropriate bioinformatics methods. We sought to develop a bioinformatics method, named gFuse, to detect fusion transcripts in FFPE tumor tissues. An integrated, cohort-based strategy has been used in gFuse to examine single-end 50 base pair (bp) reads generated from FFPE RNA-Sequencing (RNA-Seq) datasets employing two breast cancer cohorts of 136 and 76 patients. In total, 118 fusion events were detected transcriptome-wide at base-pair resolution across the 212 samples. We selected 77 candidate fusions based on their biological relevance to cancer and supported 61% of these using TaqMan assays. Direct sequencing of 19 of the fusion sequences identified by TaqMan confirmed them. Three unique fused gene pairs were recurrent across the 212 patients, with 6, 3, and 2 individuals harboring these fusions, respectively. We show here that a high frequency of fusion transcripts detected at the whole transcriptome level correlates with poor outcome (P<0.0005) in human breast cancer patients. This study demonstrates the ability to detect fusion transcripts as biomarkers from archival FFPE tissues, and the potential prognostic value of the fusion transcripts detected.
PMCID: PMC3984112  PMID: 24727804
5.  Rapid deep sequencing of patient-derived HIV with ion semiconductor technology 
Journal of virological methods  2013;189(1):232-234.
The development of next-generation sequencing technologies has facilitated the study of HIV drug resistance evolution. However, the high capacity and per-run cost of many sequencers is not ideal for viral sequencing unless many samples are analyzed simultaneously. Ion semiconductor sequencing has recently emerged as a flexible, lower-cost alternative with short runtime. This paper describes the use of Ion Torrent devices for deep sequencing of drug resistant HIV samples. High levels of sequencing coverage were obtained in HIV Gag and protease, allowing the detection of mutations at low frequencies.
PMCID: PMC3608812  PMID: 23384677
HIV-1 drug resistance; deep sequencing; HIV protease; HIV Gag; Ion Torrent
6.  Distinct patterns of somatic alterations in a lymphoblastoid and a tumor genome derived from the same individual 
Nucleic Acids Research  2011;39(14):6056-6068.
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. A comparison of the two would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid (HCC1954BL) and a breast tumor (HCC1954) cell line derived from the same patient and compared the somatic alterations present in both. The lymphoblastoid genome presents a comparable number and similar spectrum of nucleotide substitutions to that found in the tumor genome. However, a significant difference in the ratio of non-synonymous to synonymous substitutions was observed between the two genomes (P = 0.031). Protein–protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub-genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that in the tumor genome most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
PMCID: PMC3152357  PMID: 21493686
7.  Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing 
Genome Biology  2010;11(11):R114.
To identify potential tumor suppressor genes, genome-wide data from exome and transcriptome sequencing were combined to search for genes with loss of heterozygosity and allele-specific expression. The analysis was conducted on the breast cancer cell line HCC1954, and a lymphoblast cell line from the same individual, HCC1954BL.
By comparing exome sequences from the two cell lines, we identified loss of heterozygosity events at 403 genes in HCC1954 and at one gene in HCC1954BL. The combination of exome and transcriptome sequence data also revealed 86 and 50 genes with allele specific expression events in HCC1954 and HCC1954BL, which comprise 5.4% and 2.6% of genes surveyed, respectively. Many of these genes identified by loss of heterozygosity and allele-specific expression are known or putative tumor suppressor genes, such as BRCA1, MSH3 and SETX, which participate in DNA repair pathways.
Our results demonstrate that the combined application of high throughput sequencing to exome and allele-specific transcriptome analysis can reveal genes with known tumor suppressor characteristics, and a shortlist of novel candidates for the study of tumor suppressor activities.
PMCID: PMC3156953  PMID: 21108794
8.  Protective effect of an acute oral loading dose of trimetazidine on myocardial injury following percutaneous coronary intervention 
Heart  2007;93(6):703-707.
To evaluate the effect of pre‐procedural acute oral administration of trimetazidine (TMZ) on percutaneous coronary intervention (PCI)‐induced myocardial injury.
Single‐centre, prospective, randomised evaluation study.
Patients with stable angina pectoris and single‐vessel disease undergoing PCI.
582 patients were prospectively randomised. Patients who underwent more than one inflation during PCI were excluded, resulting in 266 patients randomly assigned to 2 groups.
Patients were randomly assigned to receive or not an acute loading dose of 60 mg of TMZ prior to intervention.
Main outcome
The frequency and the increase in the level of cardiac troponin Ic (cTnI) after successful PCI. cTnI levels were measured before and 6, 12, 18 and 24 h after PCI.
136 patients were assigned to the TMZ group and 130 to the control group. Although no statistically significant difference was observed in the frequency of cTnI increase between the two groups, post‐procedural cTnI levels were significantly reduced in the TMZ group at all time points (6 h: mean (SD) 4.2 (0.8) vs 1.7 (0.2), p<0.001; 12 h: 5.5 (1.5) vs 2.3 (0.4), p<0.001; 18 h: 9 (2.3) vs 3 (0.5), p<0.001; and 24 h: 3.2 (1.2) vs 1 (0.5), p<0.001). Moreover, the total amount of cTnI released after PCI, as assessed by the area under the curve of serial measurement, was significantly reduced in the TMZ group (p<0.05).
Pre‐procedural acute oral TMZ administration significantly reduces PCI‐induced myocardial injury.
PMCID: PMC1955183  PMID: 17488771
9.  VARiD: A variation detection framework for color-space and letter-space platforms 
Bioinformatics  2010;26(12):i343-i349.
Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together.
Results: We present VARiD—a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined.
Availability: The toolset is freely available at
PMCID: PMC2881369  PMID: 20529926
10.  Towards a comprehensive structural variation map of an individual human genome 
Genome Biology  2010;11(5):R52.
A comprehensive map of structural variation in the human genome provides a reference dataset for analyses of future personal genomes.
Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.
We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.
Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.
PMCID: PMC2898065  PMID: 20482838
11.  Expression Profiling of the Ovarian Surface Kinome Reveals Candidate Genes for Early Neoplastic Changes 
Translational Oncology  2009;2(4):341-349.
OBJECTIVES: We tested the hypothesis that coordinated up-regulation or down-regulation of several ovarian cell surface kinases may provide clues for better understanding of the disease and help in rational design of therapeutic targets. STUDY DESIGN: We compared the expression signature of 69 surface kinases in normal ovarian surface epithelial cells (OSE) with OSE from patients at high risk and with ovarian cancer. RESULTS: Seven surface kinases, ALK, EPHA5, EPHB1, ERBB4, INSRR, PTK, and TGFβR1, displayed a distinctive linear trend in expression from normal to high-risk to malignant epithelium. We confirmed these results using semiquantitative reverse transcription-polymerase chain reaction and tissue array of 202 ovarian cancer samples. A strong correlation was shown between disease-free survival and the expression of ERBB4. DNA sequencing revealed two novel mutations in ERBB4 in two cancer samples. CONCLUSIONS: A distinct subset of the ovarian surface kinome is altered in the transition from high risk to invasive cancer, and genetic mutation is not a dominant mechanism for these modifications. These results have significant implications for early detection and targeted therapeutic approaches for women at high risk of developing ovarian cancer.
PMCID: PMC2781076  PMID: 19956396
12.  Superoxide Dismutase 3 Polymorphism Associated with Reduced Lung Function in Two Large Populations 
Rationale: Superoxide dismutase (SOD) 3 inhibits oxidative fragmentation of lung matrix components collagen I, hyaluronan, and heparan sulfate. Inherited change in SOD3 expression or function could affect lung matrix homeostasis and influence pulmonary function.
Objectives: To identify novel SOD3 polymorphisms that are associated with lung function or chronic obstructive pulmonary disease (COPD).
Methods: Resequencing of 182 individuals identified two novel polymorphisms, E1 (rs8192287) and I1 (rs8192288), in a conserved region of the SOD3 gene of potential relationship to lung function. We next genotyped 9,093 individuals from the Copenhagen City Heart Study for the polymorphisms and recorded spirometry, and admissions and deaths due to COPD during 26-year follow-up. Finally, we validated our findings in a cross-sectional analysis of 35,635 individuals from the Copenhagen General Population Study.
Measurements and Main Results: Genotyping the Copenhagen City Heart Study identified 35 E1/I1 homozygotes, 1,050 heterozygotes, and 8,008 noncarriers (Hardy-Weinberg equilibrium: P = 0.93). Using quadruple lung function measurements, we found that E1/I1 homozygotes had 7% lower FVC % predicted (P = 0.006) and 4% lower FEV1 % predicted (P = 0.12) compared with noncarriers. In the Copenhagen General Population Study, E1/I1 homozygotes also had lower FVC % predicted than noncarriers (P = 0.03), confirming an association between E1/I1 genotype and reduced lung function. E1/I1 homozygotes had adjusted hazard ratios for COPD hospitalization and COPD mortality of 2.5 (95% confidence interval, 1.0–5.9) and 3.7 (95% confidence interval, 0.9–15), respectively; the results were independent of influence from the R213G allele of the SOD3 gene.
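The Hardy-Weinberg check reported above can be approximated from the genotype counts alone. The sketch below uses a one-degree-of-freedom chi-square goodness-of-fit test; the abstract does not say which test the authors used, so this choice and the function name `hwe_chi_square` are assumptions made for illustration.

```python
import math


def hwe_chi_square(n_hom_minor, n_het, n_hom_major):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic site."""
    n = n_hom_minor + n_het + n_hom_major
    # Minor-allele frequency estimated from the observed genotype counts.
    q = (2 * n_hom_minor + n_het) / (2 * n)
    p = 1.0 - q
    observed = (n_hom_minor, n_het, n_hom_major)
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Genotype counts from the Copenhagen City Heart Study as given in the abstract:
# 35 E1/I1 homozygotes, 1,050 heterozygotes, 8,008 noncarriers.
chi2, p_value = hwe_chi_square(35, 1050, 8008)
print(f"chi2 = {chi2:.4f}, P = {p_value:.2f}")
```

With these counts the observed and expected genotype numbers differ by about one individual per class, which is why the test gives a P value close to the reported 0.93.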
Conclusions: We identified two novel polymorphisms in a conserved region of the SOD3 gene and show that individuals that are homozygous for these polymorphisms have reduced FVC % predicted in two large, population-based studies.
PMCID: PMC2577726  PMID: 18703790
superoxide dismutase 3; genetics; chronic obstructive pulmonary disease; oxidative stress
13.  Evaluation of next generation sequencing platforms for population targeted sequencing studies 
Genome Biology  2009;10(3):R32.
Human sequence generated from three next-generation sequencing platforms reveals systematic variability in sequence coverage due to local sequence characteristics.
Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.
Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.
Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.
PMCID: PMC2691003  PMID: 19327155
14.  The HuRef Browser: a web resource for individual human genomics 
Nucleic Acids Research  2008;37(Database issue):D1018-D1024.
The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at
PMCID: PMC2686481  PMID: 19036787
15.  Genetic Variation in an Individual Human Exome 
PLoS Genetics  2008;4(8):e1000160.
There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict ∼1,500 nsSNPs affect protein function and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half tend to have lengths that are a multiple of three, which causes insertions/deletions of amino acids in the corresponding protein, rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.
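The distinction drawn above between indels whose length is a multiple of three and those that shift the reading frame can be expressed in a few lines. This is a minimal sketch, not the authors' pipeline; the function names and the example lengths are hypothetical.

```python
def is_in_frame(indel_length_bp):
    """Return True when a coding indel adds or removes whole codons.

    A length divisible by 3 inserts or deletes amino acids without
    shifting the downstream reading frame.
    """
    if indel_length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return indel_length_bp % 3 == 0


def summarize(indel_lengths):
    """Count in-frame versus frameshifting indels in a list of lengths."""
    in_frame = sum(1 for n in indel_lengths if is_in_frame(n))
    return {"in_frame": in_frame, "frameshift": len(indel_lengths) - in_frame}


# Hypothetical indel lengths in base pairs (NOT data from the study).
example_lengths = [1, 3, 6, 2, 9, 4, 12, 5]
summary = summarize(example_lengths)
print(summary)
```

If indel lengths were unconstrained, only about one in three would be expected to be a multiple of three, so the roughly one half reported above suggests that frameshifting indels are underrepresented in coding sequence.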
Author Summary
Characterizing the functional variation in an individual is an important step towards the era of personalized medicine. Protein-coding exons are thought to be especially enriched in functional variation. In 2007, we published the genome sequence of J. Craig Venter. Here we analyze the genetic variation of J. Craig Venter's exome, focusing on variation in the coding portion of genes, which is thought to contribute significantly to a person's physical make-up. We survey ∼12,500 nonsilent coding variants and, by applying multiple bioinformatic approaches, we reduce the number of potential phenotypic variants by ∼8-fold. Our analysis provides a snapshot of the current state of personalized genomics. We find that <1% of variants are linked to any known phenotypes; this demonstrates the dearth of scientific knowledge for phenotype-genotype associations. However, ∼80% of an individual's nonsynonymous variants are commonly found in the human population and, because phenotypic associations to common variants will be elucidated via genome-wide association studies over the next few years, the capability to interpret personalized genomes will expand and evolve. As sequencing of individual genomes becomes more prevalent, the bioinformatic approaches we present in this study can be used as a paradigm to pursue the study of protein-coding variants for the genomes of many individuals.
PMCID: PMC2493042  PMID: 18704161
16.  Novel computational methods for increasing PCR primer design effectiveness in directed sequencing 
BMC Bioinformatics  2008;9:191.
Polymerase chain reaction (PCR) is used in directed sequencing for the discovery of novel polymorphisms. As the first step in PCR directed sequencing, effective PCR primer design is crucial for obtaining high-quality sequence data for target regions. Since current computational primer design tools are not fully tuned with stable underlying laboratory protocols, researchers may still be forced to iteratively optimize protocols for failed amplifications after the primers have been ordered. Furthermore, potentially identifiable factors which contribute to PCR failures have yet to be elucidated. This inefficient approach to primer design is further intensified in a high-throughput laboratory, where hundreds of genes may be targeted in one experiment.
We have developed a fully integrated computational PCR primer design pipeline that plays a key role in our high-throughput directed sequencing pipeline. Investigators may specify target regions defined through a rich set of descriptors, such as Ensembl accessions and arbitrary genomic coordinates. Primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the specified target regions. As part of the tiling process, primer pairs are computationally screened to meet the criteria for success with one of two PCR amplification protocols. In the process of improving our sequencing success rate, which currently exceeds 95% for exons, we have discovered novel and accurate computational methods capable of identifying primers that may lead to PCR failures. We reveal the laboratory protocols and their associated, empirically determined computational parameters, as well as describe the novel computational methods which may benefit others in future primer design research.
The high-throughput PCR primer design pipeline has been very successful in providing the basis for high-quality directed sequencing results and for minimizing costs associated with labor and reprocessing. The modular architecture of the primer design software has made it possible to readily integrate additional primer critique tests based on iterative feedback from the laboratory. As a result, the primer design software, coupled with the laboratory protocols, serves as a powerful tool for low and high-throughput primer design to enable successful directed sequencing.
PMCID: PMC2396641  PMID: 18405373
17.  The Diploid Genome Sequence of an Individual Human 
PLoS Biology  2007;5(10):e254.
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
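The 22% figure can be checked against the variant counts listed above. The short tally below uses only the classes that the abstract enumerates with exact counts; segmental duplications and copy number variation regions are described only as "numerous" and are therefore left out, which is an assumption of this sketch. The dictionary keys are illustrative names.

```python
# Variant counts exactly as enumerated in the abstract.
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

total_events = sum(variant_counts.values())
non_snp_events = total_events - variant_counts["snp"]
non_snp_share = non_snp_events / total_events

print(f"total events:   {total_events:,}")
print(f"non-SNP events: {non_snp_events:,}")
print(f"non-SNP share:  {non_snp_share:.1%}")
```

The enumerated classes sum to 4,118,889 events, consistent with "more than 4.1 million DNA variants", and non-SNP events make up about 22% of them.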
Author Summary
We have generated an independently assembled diploid human genomic DNA sequence from both chromosomes of a single individual (J. Craig Venter). Our approach, based on whole-genome shotgun sequencing and using enhanced genome assembly strategies and software, generated an assembled genome over half of which is represented in large diploid segments (>200 kilobases), enabling study of the diploid genome. Comparison with previous reference human genome sequences, which were composites comprising multiple humans, revealed that the majority of genomic alterations are the well-studied class of variants based on single nucleotides (SNPs). However, the results also reveal that lesser-studied genomic variants, insertions and deletions, while comprising a minority (22%) of genomic variation events, actually account for almost 74% of variant nucleotides. Inclusion of insertion and deletion genetic variation into our estimates of interchromosomal difference reveals that only 99.5% similarity exists between the two chromosomal copies of an individual and that genetic variation between two individuals is as much as five times higher than previously estimated. The existence of a well-characterized diploid human genome sequence provides a starting point for future individual genome comparisons and enables the emerging era of individualized genomic information.
Comparison of the DNA sequence of an individual human with the reference sequence reveals a surprising amount of difference.
PMCID: PMC1964779  PMID: 17803354
18.  Predicting transcription factor synergism 
Nucleic Acids Research  2002;30(19):4278-4284.
Transcriptional regulation is mediated by a battery of transcription factor (TF) proteins that form complexes involving protein–protein and protein–DNA interactions. Individual TFs bind to their cognate cis-elements or transcription factor-binding sites (TFBS). TFBS are organized on the DNA proximal to the gene in groups confined to a few hundred base pair regions. These groups are referred to as modules. Various modules work together to provide the combinatorial regulation of gene transcription in response to various developmental and environmental conditions. The sets of modules constitute a promoter model. Determining the TFs that preferentially work in concert as part of a module is an essential component of understanding transcriptional regulation. The TFs that act synergistically in such a fashion are likely to have their cis-elements co-localized on the genome at specific distances apart. We exploit this notion to predict TF pairs that are likely to be part of a transcriptional module on the human genome sequence. The computational method is validated statistically, using known interacting pairs extracted from the literature. There are 251 TFBS pairs up to 50 bp apart and 70 TFBS pairs up to 200 bp apart that score higher than any of the known synergistic pairs. Further investigation of 50 pairs randomly selected from each of these two sets using PubMed queries provided additional supporting evidence from the existing biological literature suggesting TF synergism for these novel pairs.
PMCID: PMC140535  PMID: 12364607
19.  A study of the medical causes of absence from duty aboard South African merchant ships 
Levy, S. (1972). Brit. J. industr. Med., 29, 196-200. A study of the medical causes of absence from duty aboard South African merchant ships. Over a period of four and a half years 556 instances occurred in which crew members were put off duty on medical grounds for a period of four or more days. Illness accounted for 297 cases whereas accidents were responsible for 259 cases. Illness and accident cases were off duty for an average period of 28 and 34 days respectively. Slightly more working days were thus lost on account of accidents. Admission to hospital was required in 90% of illnesses compared with only 36% of accidents.
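The comparison of working days lost follows from multiplying the case counts by the average days off duty. The figures below come from the abstract; because the averages are reported as whole days, the totals are approximations rather than the study's exact tallies.

```python
# Case counts and average days off duty as reported in the abstract.
illness_cases, illness_mean_days = 297, 28
accident_cases, accident_mean_days = 259, 34

# Approximate totals, since the published averages are rounded.
illness_days = illness_cases * illness_mean_days
accident_days = accident_cases * accident_mean_days

print(f"cases in total:         {illness_cases + accident_cases}")
print(f"days lost to illness:   {illness_days}")
print(f"days lost to accidents: {accident_days}")
```

Although accidents were the less frequent cause, their longer average absence gives roughly 8,800 days lost against about 8,300 for illness, which matches the statement that accidents cost slightly more working days.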
Appendicitis (of questionable veracity), peptic ulceration, and psychiatric disturbances were among the more common causes of incapacity.
Forty percent of accidents occurred on deck and in the cargo holds. Fractures occurred most commonly in the upper limbs, especially the hand. Eleven percent of the accidents occurred ashore, mostly due to assault.
Further study is required to elucidate whether the emotional problems encountered are brought to sea by the personnel or are a result of life on board ship. The high incidence of accidents stresses the fact that a sea career is one of the more dangerous occupations.
PMCID: PMC1009399  PMID: 5067298

