Search tips
Search criteria

Results 1-3 (3)

Clipboard (0)
Year of Publication
Document Types
1.  The multiple personalities of Watson and Crick strands 
Biology Direct  2011;6:7.
In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review the history of the terminology and suggest retaining a single sense that is currently the most useful and consistent.
The Saccharomyces Genome Database defines the Watson strand as the strand which has its 5'-end at the short-arm telomere and the Crick strand as its complement. The Watson strand is always used as the reference strand in their database. Using this as the basis of our standard, we recommend that Watson and Crick strand terminology only be used in the context of genomics. When possible, the centromere or other genomic feature should be used as a reference point, dividing the chromosome into two arms of unequal lengths. Under our proposal, the Watson strand is standardized as the strand whose 5'-end is on the short arm of the chromosome, and the Crick strand as the one whose 5'-end is on the long arm. Furthermore, the Watson strand should be retained as the reference (plus) strand in a genomic database. This usage not only makes the determination of Watson and Crick unambiguous, but also allows unambiguous selection of reference stands for genomics.
This article was reviewed by John M. Logsdon, Igor B. Rogozin (nominated by Andrey Rzhetsky), and William Martin.
PMCID: PMC3055211  PMID: 21303550
2.  'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve 
Biology Direct  2010;5:10.
The Z-curve is a three dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index," was proposed, which is defined as S = a2+ c2+t2+g2, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. They claimed that an inscribed sphere of radius r = 1/ contains almost all points corresponding to various genomes, implying that S
In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over dimensionality, and the dimension stands for GC content alone suffices to represent any given genome.
The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and be directly derived from entropy and is redundant. Overall, the Z-curve and S are over-complicated measures to GC content and Shannon H index, respectively.
This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
PMCID: PMC2841071  PMID: 20158921
Biology Direct  2008;3:36.
Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number for overlapping genes.
We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content.
Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a null model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes.
This article was reviewed by Bill Martin, Itai Yanai, and Mikhail Gelfand.
PMCID: PMC2542354  PMID: 18717987

Results 1-3 (3)