Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with presently over 46 000 members. In phylogenetic comparisons, members of this superfamily show early divergence where the majority have only low pair-wise sequence identity, although sharing common structural properties. The SDR enzymes are present in virtually all genomes investigated, and in humans over 70 SDR genes have been identified. In humans, these enzymes are involved in the metabolism of a large variety of compounds, including steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. It is now clear that SDRs represent one of the oldest protein families and contribute to essential functions and interactions of all forms of life. As this field continues to grow rapidly, a systematic nomenclature is essential for future annotation and reference purposes. A functional subdivision of the SDR superfamily into at least 200 SDR families based upon hidden Markov models forms a suitable foundation for such a nomenclature system, which we present in this paper using human SDRs as examples.
One of the largest protein superfamilies is that of short-chain dehydrogenases/reductases (SDR) and other enzymes, with over 46,000 members in sequence databases and over 300 crystal structures deposited in PDB today. The SDR superfamily encompasses a “classical” type (corresponding to Pfam entry PF00106) and an “extended” type (including epimerases and dehydratases; Pfam PF01073 and PF01370) [3, 4]. In addition, transcriptional regulators such as fungal NmrA (Pfam PF05368) were shown to be structurally related to the SDR family and constitute a separate branch which we refer to as “atypical” SDRs [5, 6]. These enzymes were established as a separate and new group of oxidoreductases in the 1970s and 1980s [7, 8], and the term SDR was coined in 1991. The enzyme family is present in all domains of life, from simple organisms to higher eukaryotes, emphasising its versatility and fundamental importance for metabolic processes. A recent survey shows that about 25% of all dehydrogenases belong to the SDR family. SDR enzymes are NAD(P)(H)-dependent oxidoreductases which are distinct from the medium-chain dehydrogenase (MDR) and aldo-keto reductase (AKR) superfamilies [3, 4].
Members of the SDR superfamily show early divergence and have only low pairwise sequence identity, but share common sequence motifs that define the cofactor binding site (TGxxxGxG) and the catalytic tetrad (N-S-Y-K), even though variations on this general theme also exist [11, 12]. The three-dimensional SDR structures are clearly homologous with a common α/β-folding pattern characterised by a central β-sheet typical of a Rossmann-fold with helices on either side.
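As an illustration of how the cofactor-binding motif mentioned above might be located in a sequence, the following minimal Python sketch scans for the literal pattern TGxxxGxG with a regular expression. The function name and the example fragment are hypothetical. A literal pattern match is a simplification, since variations of the motif exist, and the catalytic tetrad residues are not contiguous in sequence and are not covered here.

```python
import re

# TGxxxGxG, where x is any residue (one-letter amino acid codes).
# The lookahead lets overlapping occurrences be reported as well.
COFACTOR_MOTIF = re.compile(r"(?=(TG...G.G))")

def find_cofactor_motifs(sequence):
    """Return (0-based position, matched segment) for each TGxxxGxG hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in COFACTOR_MOTIF.finditer(seq)]

# Hypothetical sequence fragment, for illustration only.
example = "MKLVALVTGASSGIGKAIA"
print(find_cofactor_motifs(example))
```

Profile-based methods such as the hidden Markov models used in this work tolerate the motif variations that a fixed pattern of this kind would miss.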
In humans over 70 SDR genes exist [13, 14]. Human SDRs have physiological roles in steroid hormone, prostaglandin and retinoid metabolism, and hence signalling, or metabolise lipids and xenobiotics. A growing number of single-nucleotide polymorphisms have been identified in SDR genes, and a variety of inherited metabolic diseases are caused by genetic defects in SDR genes.
As the number of SDR sequences grows at an unprecedented pace, a systematic nomenclature is essential for annotation and reference purposes. For example, a recent metagenome analysis showed that classical and extended SDRs combined constitute at present by far the largest protein family. Given this large amount of sequence data, a nomenclature system would prevent either the same protein or gene being given multiple names or the same name being given to multiple proteins or genes. Recently, a functional subdivision of the SDR superfamily into at least 200 SDR families has been reported based on Hidden Markov Models (HMMs), using an iterative approach delineating a set of stable families, described in detail elsewhere. These SDR families form a suitable foundation for the nomenclature system that is presented in this work.
SDR proteins were extracted from the UniProt database and from RefSeq, using a previously developed HMM and the Pfam profiles PF00106, PF01073, PF01370 and PF05368. SDR families were identified using a hidden Markov model approach. Initial HMMs were created based upon SDR clusters aligned using ClustalW. These HMMs were iteratively refined to achieve stable and specific models that could be used for classification and functional assignments of SDR members. In order to avoid bias of the models towards closely related proteins, the alignments were made non-redundant, so that no pair of sequences had more than 80% sequence identity. The iterative clustering process was automated using a series of shell scripts and programs developed in C. Elements of the large-scale computer analysis were carried out on the 805-node Hewlett-Packard DL140 cluster Neolith at the National Supercomputer Centre (Linköping, Sweden). Further details of this methodology are described elsewhere. The HMMs will be made available for inclusion in the Pfam and/or InterPro databases.
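The redundancy reduction step can be pictured with a short sketch. The authors' shell scripts and C programs are not reproduced in this paper, so the Python below is only one plausible greedy approach under stated assumptions: sequences are already aligned to equal length, identity is counted over columns where neither sequence has a gap, and the function names are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)

def make_non_redundant(aligned, max_identity=0.80):
    """Greedily keep sequences so that no kept pair exceeds max_identity.

    `aligned` is a list of (name, aligned_sequence) tuples. The result
    depends on input order, which is a property of the greedy strategy
    assumed here, not something specified in the paper.
    """
    kept = []
    for name, seq in aligned:
        if all(pairwise_identity(seq, other) <= max_identity for _, other in kept):
            kept.append((name, seq))
    return kept

# Toy alignment, for illustration only.
toy = [
    ("a", "ACDEFGHIKL"),
    ("b", "ACDEFGHIKV"),  # 90% identical to "a", so it is dropped
    ("c", "MCDEWGHYKL"),  # 70% identical to "a", so it is kept
]
print([name for name, _ in make_non_redundant(toy)])
```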
In the nomenclature scheme, each SDR family has been given a unique number from 1 upwards. The 48 known human SDR families have been allocated numbers from 1 to 48…… of hitherto identified members. Thus, the SDR families found in human and the most common families get the lowest numbers. At present, 48 human SDR families have been detected; they are listed in Table 1.
After numbering of all human families, priority was given to SDR families having mammalian or other eukaryotic members. Here, families present in all kingdoms were given lower numbers than those present in only two or one kingdom. Next, SDR families that were present in both bacteria and archaea were numbered, according to decreasing size of the family. Finally, SDR families present only in bacteria were numbered, also according to family size, beginning with the largest. There is no single family with only archaeal members. All non-human SDR families are listed on the SDR web page http://www.sdr-enzymes.org.
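The priority rules above amount to a multi-level sort. The following Python sketch expresses them as a sort key; it is an illustration, not the procedure actually used. The `Family` record, its field names and the toy data are hypothetical, and breaking ties by decreasing family size within the human and eukaryotic tiers is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Family:
    label: str           # hypothetical working label before numbering
    size: int            # number of identified members
    in_human: bool
    kingdoms: frozenset  # subset of {"eukaryota", "bacteria", "archaea"}

def numbering_key(fam):
    """Sort key: lower tuples receive lower SDR family numbers."""
    if fam.in_human:
        tier = 0  # human families first
    elif "eukaryota" in fam.kingdoms:
        tier = 1  # other mammalian or eukaryotic families
    elif fam.kingdoms == {"bacteria", "archaea"}:
        tier = 2  # present in both bacteria and archaea
    else:
        tier = 3  # bacteria only
    # Within the eukaryotic tier, presence in more kingdoms ranks first.
    kingdom_rank = -len(fam.kingdoms) if tier == 1 else 0
    return (tier, kingdom_rank, -fam.size)

def assign_numbers(families):
    """Map each working label to its family number, starting at 1."""
    ordered = sorted(families, key=numbering_key)
    return {fam.label: n for n, fam in enumerate(ordered, start=1)}

# Toy data, for illustration only.
toy_families = [
    Family("bact_small", 10, False, frozenset({"bacteria"})),
    Family("bact_big", 500, False, frozenset({"bacteria"})),
    Family("bact_arch", 50, False, frozenset({"bacteria", "archaea"})),
    Family("euk_only", 900, False, frozenset({"eukaryota"})),
    Family("euk_all", 20, False, frozenset({"eukaryota", "bacteria", "archaea"})),
    Family("human", 5, True, frozenset({"eukaryota"})),
]
print(assign_numbers(toy_families))
```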
Since sequences of newly characterised genomes are reported every month, and the number of completed genomes is expected to grow considerably over the coming years, thanks to the advances in sequencing technologies, it is likely that the current SDR families will grow and that more SDR families will be identified over time. Thus, new SDR family numbers will be added in the future, and the nomenclature will need to be continuously updated. As a continuous source and service to the scientific community we will update and make the data available through the website indicated above.
We are well aware that even if the majority of the protein-coding regions of the human genome are now known, there might be new hitherto unknown SDR forms identified, which might lead to a higher number. However, this is an inevitable consequence of any nomenclature system and should not preclude the launch of a system that covers the majority of known proteins based on current knowledge.
There are two types of SDR enzymes with many members, and at least four types with fewer members. The two major types are denoted “Classical” and “Extended”, and these are clearly distinguished by subunit size and by sequence patterns at the coenzyme-binding site and at a segment N-terminal to the active site region. The four currently recognised minor SDR types are denoted “Intermediate”, “Divergent”, “Complex” and “Atypical”. The last of these has SDR topology but no known enzymatic activity. Each of these types is characterised by type-specific sequence patterns at the coenzyme-binding site and/or the active site. In the nomenclature scheme, the family number is followed by one letter designating the SDR type, so that the type to which an SDR family belongs is clear at a glance from the family designation; e.g. SDR1E represents an SDR of the extended type. The letters used in this scheme are shown in Table 2.
The nomenclature scheme is extended by adding a number after the type letter, so that each member of every SDR family is given an individual designation, e.g. SDR1E1. This is essential for tracking individual SDR members, since considerable confusion exists in the literature with multiple designations, aliases, names and abbreviations. Such individual numbering of enzyme members has long been successfully implemented for other enzyme families, e.g. aldo-keto reductases (AKRs) and cytochrome P450s, and has been recognised as a key to unambiguous referencing. The numbering for the human SDR enzymes is given in Table 1. SDR forms with neighbouring gene locations were given adjacent numbers, e.g. SDR16C2 and SDR16C3. An additional P after the number denotes a pseudogene, e.g. SDR14E1P.
The new SDR nomenclature is gene based. Thus, all splice variants derived from the same gene hold the same main number, but each splice variant is distinguished by a sub-number, separated from the main number by a dash, e.g. SDR15C1-1, SDR15C1-2. Similarly, polymorphic variants and SNPs are designated using an asterisk, resulting in a nomenclature such as SDR11E1*1, SDR11E1*2, SDR11E1*3. These are numbered according to the order of the corresponding refSNP (rs) numbers. A continuously updated list of these variants will be available at the SDR web page.
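To make the syntax of the designations concrete, the following Python sketch parses a designation into its parts. It is inferred only from the examples given in the text and is not an official grammar. The function and field names are hypothetical, only the two type letters illustrated in the text (C and E) are mapped, and treating the splice and polymorphic suffixes as mutually exclusive is an assumption, since the text does not show them combined.

```python
import re

# Only the type letters illustrated in the text are mapped here;
# the complete list of letters is given in Table 2.
TYPE_NAMES = {"C": "classical", "E": "extended"}

DESIGNATION = re.compile(
    r"^SDR(?P<family>\d+)(?P<type>[A-Z])"         # family number and type letter
    r"(?:(?P<member>\d+)(?P<pseudo>P)?"           # member number, optional pseudogene P
    r"(?:-(?P<splice>\d+)|\*(?P<variant>\d+))?)?$"  # splice (-n) or polymorphic (*n) suffix
)

def _to_int(value):
    return int(value) if value is not None else None

def parse_designation(text):
    """Split an SDR designation such as 'SDR15C1-2' into its components."""
    m = DESIGNATION.match(text)
    if m is None:
        raise ValueError(f"not a recognised SDR designation: {text!r}")
    return {
        "family": int(m["family"]),
        "type_letter": m["type"],
        "type_name": TYPE_NAMES.get(m["type"], "see Table 2"),
        "member": _to_int(m["member"]),
        "pseudogene": m["pseudo"] is not None,
        "splice_variant": _to_int(m["splice"]),
        "polymorphic_variant": _to_int(m["variant"]),
    }

for name in ("SDR1E1", "SDR14E1P", "SDR15C1-2", "SDR11E1*3"):
    print(name, parse_designation(name))
```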
The new nomenclature system is strictly hierarchical, so that the designations can be shortened at various stages and still remain clearly informative.
|Designation|Meaning|
|SDR15C1-1|one splice variant of a particular SDR member from one species|
|SDR15C1|a particular SDR member from one species|
|SDR15C|one specific SDR family of the classical type|
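The hierarchical shortening shown above can be sketched in Python as follows. The helper name is hypothetical, and the pattern is inferred from the examples in the text, so it should be read as an illustration and not as a formal definition of the scheme.

```python
import re

# Family part (e.g. SDR15C), optional member part (e.g. 1 or 1P),
# optional splice (-n) or polymorphic (*n) suffix.
_LEVELS = re.compile(r"^(SDR\d+[A-Z])(\d+P?)?([-*]\d+)?$")

def shorten(designation):
    """Return the designation followed by its successively shorter forms."""
    m = _LEVELS.match(designation)
    if m is None:
        raise ValueError(f"not a recognised SDR designation: {designation!r}")
    family, member, suffix = m.groups()
    if suffix and not member:
        raise ValueError("a variant suffix requires a member number")
    levels = [family]
    if member:
        levels.append(family + member)
    if suffix:
        levels.append(family + member + suffix)
    return levels[::-1]  # most specific first

print(shorten("SDR15C1-1"))
```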
The nomenclature scheme outlined in this paper is part of an international effort to systematise and to facilitate all aspects of SDR-related research. This will be described in detail on the web page http://www.sdr-enzymes.org, where continuous updates will be available. In addition, various search functions will also be available there, e.g. to find the SDR name using an amino acid sequence as input or vice versa. This nomenclature system has been presented, discussed and endorsed on the occasions of the VII European Symposium of The Protein Society 2007, the Endocrine Society meeting (ENDO 2007) and the 14th Carbonyl Metabolism meeting (2008).
Karolinska Institutet, Linköping University and the Carl Trygger Foundation are acknowledged for financial support. This project was supported by the Deutsche Forschungsgemeinschaft (MA 1704/5-1). The Structural Genomics Consortium is a registered charity (number 1097737) that receives funds from the Canadian Institutes for Health Research, the Canadian Foundation for Innovation, Genome Canada through the Ontario Genomics Institute, GlaxoSmithKline, Karolinska Institutet, the Knut and Alice Wallenberg Foundation, the Ontario Innovation Trust, the Ontario Ministry for Research and Innovation, Merck & Co., Inc., the Novartis Research Foundation, the Swedish Agency for Innovation Systems, the Swedish Foundation for Strategic Research and the Wellcome Trust.