Chem Biol Interact. Author manuscript; available in PMC Jul 5, 2010.
Published in final edited form as:
PMCID: PMC2896744
NIHMSID: NIHMS196511
The SDR (Short-Chain Dehydrogenase/Reductase and Related Enzymes) Nomenclature Initiative
Bengt Persson,1,2,3 James E. Bray,4 Elspeth Bruford,5 Stephen L. Dellaporta,6 Angelo D. Favia,7 Roser Gonzalez Duarte,8 Hans Jörnvall,9 Yvonne Kallberg,1,2 Kathryn L. Kavanagh,4 Natalia Kedishvili,10 Michael Kisiela,11 Edmund Maser,11 Rebekka Mindnich,12 Sandra Orchard,7 Trevor M. Penning,12 Janet M. Thornton,7 Jerzy Adamski,13 and Udo Oppermann4,14
1IFM Bioinformatics, Linköping University, S-58183 Linköping, Sweden
2Dept of Cell and Molecular Biology (CMB), Karolinska Institutet, S-17177 Stockholm, Sweden
3National Supercomputer Centre (NSC), Linköping University, S-58183 Linköping, Sweden
4The Structural Genomics Consortium, University of Oxford, Oxford OX3 7LD, United Kingdom
5HUGO Gene Nomenclature Committee, University College London, London NW1 2HE, United Kingdom
6Yale University, Department of Molecular, Cellular and Developmental Biology, 165 Prospect Street, New Haven, CT 06520-8104, USA
7European Molecular Biology Laboratory–European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
8Departament de Genetica, Facultat de Biologia, Universitat de Barcelona, Spain
9Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-17177 Stockholm, Sweden
10Department of Biochemistry and Molecular Genetics, Schools of Medicine and Dentistry, University of Alabama at Birmingham, Birmingham, AL 35294, USA
11Institute of Toxicology and Pharmacology for Natural Scientists, University Medical School Schleswig-Holstein, Campus Kiel, D-24105 Kiel, Germany
12Center of Excellence in Environmental Toxicology, Department of Pharmacology, University of Pennsylvania, Philadelphia, PA 19104-6084, USA
13Helmholtz Zentrum München, German Research Center for Environmental Health, Institute for Experimental Genetics, Genome Analysis Centre, D-85764 Neuherberg, Germany
14Botnar Research Center, Oxford Biomedical Research Unit, OX3 7LD, UK
Correspondence to: Bengt Persson and Udo Oppermann, bpn/at/ifm.liu.se, udo.oppermann/at/sgc.ox.ac.uk
Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with presently over 46 000 members. In phylogenetic comparisons, members of this superfamily show early divergence where the majority have only low pair-wise sequence identity, although sharing common structural properties. The SDR enzymes are present in virtually all genomes investigated, and in humans over 70 SDR genes have been identified. In humans, these enzymes are involved in the metabolism of a large variety of compounds, including steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. It is now clear that SDRs represent one of the oldest protein families and contribute to essential functions and interactions of all forms of life. As this field continues to grow rapidly, a systematic nomenclature is essential for future annotation and reference purposes. A functional subdivision of the SDR superfamily into at least 200 SDR families based upon hidden Markov models forms a suitable foundation for such a nomenclature system, which we present in this paper using human SDRs as examples.
Keywords: SDR, enzymes, nomenclature, bioinformatics, hidden Markov models
One of the largest protein superfamilies is that of short-chain dehydrogenases/reductases (SDR) and related enzymes [1], with over 46,000 members in sequence databases and over 300 crystal structures deposited in PDB today. The SDR superfamily encompasses a “classical” type (corresponding to Pfam [2] entry PF00106) and an “extended” type (including epimerases and dehydratases; Pfam PF01073 and PF01370) [3, 4]. In addition, transcriptional regulators such as fungal NmrA (Pfam PF05368) were shown to be structurally related to the SDR family and constitute a separate branch which we refer to as “atypical” SDRs [5, 6]. These enzymes were established as a separate and new group of oxidoreductases in the 1970s and 1980s [7, 8], and the term SDR was coined in 1991 [9]. The enzyme family is present in all domains of life, from simple organisms to higher eukaryotes [10], emphasising its versatility and fundamental importance for metabolic processes. A recent survey shows that about 25% of all dehydrogenases belong to the SDR family [1]. SDR enzymes are NAD(P)(H)-dependent oxidoreductases which are distinct from the medium-chain dehydrogenase (MDR) and aldo-keto reductase (AKR) superfamilies [3, 4].
Members of the SDR superfamily show early divergence and have only low pairwise sequence identity, but share common sequence motifs that define the cofactor binding site (TGxxxGxG) and the catalytic tetrad (N-S-Y-K), even though variations on this general theme also exist [11, 12]. The three-dimensional SDR structures are clearly homologous with a common α/β-folding pattern characterised by a central β-sheet typical of a Rossmann-fold with helices on either side [4].
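As an illustration of the cofactor-binding motif mentioned above, the following minimal sketch scans a protein sequence for the TGxxxGxG pattern with a regular expression. The function name and the toy sequence in the usage note are hypothetical, and "x" is read here as any single residue. Since variations on the motif exist, a literal match of this kind will miss genuine SDRs and is not a substitute for profile-based detection.

```python
import re

# TGxxxGxG, reading "x" as any single residue (an assumption of this sketch).
# The lookahead lets overlapping occurrences be reported as well.
COFACTOR_MOTIF = re.compile(r"(?=TG[A-Z]{3}G[A-Z]G)")


def find_cofactor_motif(sequence):
    """Return the 0-based start positions of every TGxxxGxG occurrence."""
    return [m.start() for m in COFACTOR_MOTIF.finditer(sequence.upper())]
```

For the made-up fragment "MKLAVVTGASSGIGLA", the function reports a single occurrence starting at position 6.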
In humans, over 70 SDR genes exist [13, 14]. Human SDRs have physiological roles in steroid hormone, prostaglandin and retinoid metabolism, and hence signalling [14], or metabolise lipids and xenobiotics [15]. A growing number of single-nucleotide polymorphisms have been identified in SDR genes, and a variety of inherited metabolic diseases are caused by genetic defects in SDR genes [16].
As the number of SDR sequences grows at an unprecedented pace, a systematic nomenclature is essential for annotation and reference purposes. For example, a recent metagenome analysis showed that classical and extended SDRs combined constitute at present by far the largest protein family [17]. Given this large amount of sequence data, a nomenclature system would prevent either the same protein or gene being given multiple names or the same name being given to multiple proteins or genes. Recently, a functional subdivision of the SDR superfamily into at least 200 SDR families has been reported based on Hidden Markov Models (HMMs), using an iterative approach delineating a set of stable families, described in detail elsewhere [18]. These SDR families form a suitable foundation for the nomenclature system that is presented in this work.
SDR family identification using Hidden Markov Models (HMMs)
SDR proteins were extracted from the UniProt database [19] and from RefSeq [20], using a previously developed HMM [21] and the Pfam [2] profiles PF00106, PF01073, PF01370 and PF05368. SDR families were identified using a hidden Markov model approach. Initial HMMs were created based upon SDR clusters aligned using ClustalW [22]. These HMMs were iteratively refined to achieve stable and specific models that could be used for classification and functional assignments of SDR members [18]. In order to avoid bias of the models towards closely related proteins, the alignments were made non-redundant, so that no pair of sequences had more than 80% sequence identity. The iterative clustering process was automated using a series of shell scripts and programs developed in C. Elements of the large-scale computer analysis were carried out on the 805-node Hewlett-Packard DL140 cluster Neolith at the National Supercomputer Centre (Linköping, Sweden). Further details regarding this methodology are described elsewhere [18]. The HMMs will be made available for inclusion in the Pfam [2] and/or InterPro [23] databases.
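The redundancy reduction step can be sketched as follows. This is not the shell/C pipeline used in the study, whose details are given in [18]; it is a minimal greedy filter under two assumptions of our own: identity is computed over alignment columns in which neither sequence has a gap, and sequences are examined in input order. The function names are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return identical / compared if compared else 0.0


def make_non_redundant(aligned, max_identity=0.80):
    """Greedily keep (name, sequence) rows so that no kept pair exceeds max_identity."""
    kept = []
    for name, seq in aligned:
        if all(pairwise_identity(seq, other) <= max_identity for _, other in kept):
            kept.append((name, seq))
    return kept
```

A greedy pass depends on input order, so a different ordering may retain a different, equally valid, subset.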
A sustainable and expandable nomenclature scheme
In the nomenclature scheme, each SDR family has been given a unique number from 1 upwards. The 48 known human SDR families have been allocated numbers from 1 to 48…… of hitherto identified members. Thus, the SDR families found in humans and the most common families get the lowest numbers. At present, 48 human SDR families have been detected; they are listed in Table 1.
Table 1
SDR families with human members. Uniprot identifiers are given for all human SDRs (one representative per corresponding gene).
After numbering of all human families, priority was given to SDR families having mammalian or other eukaryotic members. Here, families present in all kingdoms were given lower numbers than those present in only two kingdoms or one. Next, SDR families that were present in both bacteria and archaea were numbered, according to decreasing size of the family. Finally, SDR families present only in bacteria were numbered, also according to family size, beginning with the largest. There is no single family with only archaeal members. All non-human SDR families are listed on the SDR web page http://www.sdr-enzymes.org.
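The priority rules for non-human families can be expressed as a sort key, as in the sketch below. Several points are assumptions rather than statements from the text: "kingdoms" is modelled as the three groups eukaryota, bacteria and archaea; ties within the eukaryotic tier are broken by decreasing family size; and numbering is taken to continue at 49, directly after the 48 human families. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str            # hypothetical working label, not an SDR designation
    kingdoms: frozenset  # subset of {"eukaryota", "bacteria", "archaea"} (assumed grouping)
    size: int            # number of member sequences


def priority_key(fam):
    """Smaller keys are numbered first."""
    has_eukaryotes = "eukaryota" in fam.kingdoms
    if has_eukaryotes:
        tier = 0                      # families with eukaryotic members
    elif fam.kingdoms == {"bacteria", "archaea"}:
        tier = 1                      # shared by bacteria and archaea
    else:
        tier = 2                      # remaining (bacterial) families
    # Within the eukaryotic tier, wider distribution comes first.
    breadth = -len(fam.kingdoms) if has_eukaryotes else 0
    # Larger families come first; using size as the eukaryotic tie-break is an assumption.
    return (tier, breadth, -fam.size)


def assign_numbers(families, first_number=49):
    """Map each family label to a number; first_number=49 is an assumption."""
    ordered = sorted(families, key=priority_key)
    return {fam.name: first_number + i for i, fam in enumerate(ordered)}
```

Encoding the rules as a tuple key keeps each criterion separate, so a criterion can be changed without touching the others.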
Since sequences of newly characterised genomes are reported every month, and the number of completed genomes is expected to grow considerably over the coming years, thanks to the advances in sequencing technologies, it is likely that the current SDR families will grow and that more SDR families will be identified over time. Thus, new SDR family numbers will be added in the future, and the nomenclature will need to be continuously updated. As a continuous source and service to the scientific community we will update and make the data available through the website indicated above.
We are well aware that even if the majority of the protein-coding regions of the human genome are now known, there might be new hitherto unknown SDR forms identified, which might lead to a higher number. However, this is an inevitable consequence of any nomenclature system and should not preclude the launch of a system that covers the majority of known proteins based on current knowledge.
SDR types
There are two types of SDR enzymes with many members, and at least four types with fewer members. The two major types are denoted “Classical” and “Extended” [21] and these are clearly distinguished by subunit size and sequence patterns at the coenzyme binding site and at a segment N-terminal to the active site region. The four currently known minor SDR types are denoted “Intermediate”, “Divergent”, “Complex” [21], and “Atypical”. The latter has SDR topology but no known enzymatic activity. Each of these types is characterised by type-specific sequence patterns at the coenzyme-binding site and/or the active site. In the nomenclature scheme, the family number is followed by one letter designating the SDR type, thus making it clear at a glance to which type the SDR family belongs, e.g. SDR1E represents an SDR of the extended type. The letters used in this scheme are shown in Table 2.
Table 2
SDR types and their designations.
Optional numbering of individual family members
The nomenclature scheme is extended by adding a number after the type letter, so that each member of every SDR family is given an individual designation, e.g. SDR1E1. This is essential for tracking individual SDR members, since considerable confusion exists in the literature with multiple designations, aliases, names and abbreviations. Such individual numbering of enzyme members has long been successfully implemented for other enzyme families, e.g. aldo-keto reductases (AKRs) [24] and cytochrome P450s [25], and has been recognised as a key to unambiguous referencing. The numbering for the human SDR enzymes is given in Table 1. SDR forms with neighbouring gene locations were given adjacent numbers, e.g. SDR16C2 and SDR16C3. An additional P after the number denotes a pseudogene, e.g. SDR14E1P.
Gene-oriented nomenclature
The new SDR nomenclature is gene based. Thus, all splice variants derived from the same gene hold the same main number, but each splice variant is distinguished by a sub-number, separated from the main number by a dash, e.g. SDR15C1-1, SDR15C1-2. Similarly, polymorphic variants and SNPs are designated using an asterisk, resulting in a nomenclature such as SDR11E1*1, SDR11E1*2, SDR11E1*3. These are numbered according to the order of the corresponding refSNP (rs) numbers. A continuously updated list of these variants will be available at the SDR web page.
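The designation syntax described so far (family number, type letter, optional member number, optional pseudogene P, and an optional splice or polymorphism suffix) lends itself to a small parser. The sketch below is an illustration only: the class and function names are hypothetical, any capital letter is accepted as a type letter, and the grammar assumes that a designation carries either a splice suffix or a polymorphism suffix but not both, which the text does not specify.

```python
import re
from dataclasses import dataclass
from typing import Optional

DESIGNATION = re.compile(
    r"^SDR(?P<family>[1-9]\d*)"       # family number
    r"(?P<type>[A-Z])"                # one-letter SDR type
    r"(?:(?P<member>[1-9]\d*)"        # optional member number
    r"(?P<pseudo>P)?"                 # optional pseudogene marker
    r"(?:-(?P<splice>[1-9]\d*)"       # optional splice variant, after a dash
    r"|\*(?P<allele>[1-9]\d*))?"      # or polymorphic variant, after an asterisk
    r")?$"
)


@dataclass(frozen=True)
class SDRName:
    family: int
    sdr_type: str
    member: Optional[int] = None
    pseudogene: bool = False
    splice_variant: Optional[int] = None
    allele: Optional[int] = None


def _to_int(text):
    return int(text) if text is not None else None


def parse_sdr_name(text):
    """Split an SDR designation into its components, or raise ValueError."""
    m = DESIGNATION.match(text.strip())
    if m is None:
        raise ValueError(f"not an SDR designation: {text!r}")
    return SDRName(
        family=int(m["family"]),
        sdr_type=m["type"],
        member=_to_int(m["member"]),
        pseudogene=m["pseudo"] is not None,
        splice_variant=_to_int(m["splice"]),
        allele=_to_int(m["allele"]),
    )
```

With this grammar, parse_sdr_name("SDR15C1-2") yields family 15, type C, member 1 and splice variant 2, while a suffix without a member number, such as "SDR15C-1", is rejected.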
Hierarchical system
The new nomenclature system is strictly hierarchical, so that the designations can be shortened at various stages yet remain clearly informative.
Example:
SDR15C1-1: one splice variant of a particular SDR member from one species
SDR15C1: a particular SDR member from one species
SDR15C: one specific SDR family of the classical type
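Because the scheme is hierarchical, the shorter forms can be derived mechanically from the longest one. The following minimal sketch lists a designation together with each of its shortened levels; the function name is hypothetical, and the pattern assumes the suffix conventions given in this paper (member number, optional P, then a dash or asterisk followed by a number).

```python
import re

# family part, then optionally a member part, then optionally a variant suffix
_LEVELS = re.compile(r"(SDR\d+[A-Z])(?:(\d+P?)([-*]\d+)?)?")


def shortened_forms(designation):
    """Return the designation followed by each successively shorter level."""
    m = _LEVELS.fullmatch(designation)
    if m is None:
        raise ValueError(f"not an SDR designation: {designation!r}")
    family, member, variant = m.groups()
    forms = [family]                              # e.g. SDR15C
    if member is not None:
        forms.insert(0, family + member)          # e.g. SDR15C1
    if variant is not None:
        forms.insert(0, family + member + variant)  # e.g. SDR15C1-1
    return forms
```

Two designations refer to the same gene when their member-level forms agree, and to the same family when their last entries agree.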
Web page and continuous updates
The nomenclature scheme outlined in this paper is part of an international effort to systematise and to facilitate all aspects of SDR-related research. This will be described in detail on the web page http://www.sdr-enzymes.org, where continuous updates will be available. In addition, various search functions will be available there, e.g. to find the SDR name using an amino acid sequence as input or vice versa. This nomenclature system has been presented, discussed and endorsed at the VII European Symposium of The Protein Society 2007, the Endocrine Society meeting (ENDO 2007) and the 14th Carbonyl Metabolism meeting (2008).
Acknowledgments
Karolinska Institutet, Linköping University and the Carl Trygger Foundation are acknowledged for financial support. This project was supported by the Deutsche Forschungsgemeinschaft (MA 1704/5-1). The Structural Genomics Consortium is a registered charity (number 1097737) that receives funds from the Canadian Institutes for Health Research, the Canadian Foundation for Innovation, Genome Canada through the Ontario Genomics Institute, GlaxoSmithKline, Karolinska Institutet, the Knut and Alice Wallenberg Foundation, the Ontario Innovation Trust, the Ontario Ministry for Research and Innovation, Merck & Co., Inc., the Novartis Research Foundation, the Swedish Agency for Innovation Systems, the Swedish Foundation for Strategic Research and the Wellcome Trust.
1. Kallberg Y, Persson B. Prediction of coenzyme specificity in dehydrogenases/reductases: A hidden Markov model-based method and its application on complete genomes. FEBS J. 2006;273:1177–1184.
2. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR. The Pfam protein families database. Nucleic Acids Res. 2004;32(Database issue):D138–141.
3. Jörnvall H, Persson B, Krook M, Atrian S, Gonzalez-Duarte R, Jeffery J, Ghosh D. Short-chain dehydrogenases/reductases (SDR). Biochemistry. 1995;34(18):6003–6013.
4. Kavanagh KL, Jörnvall H, Persson B, Oppermann U. Functional and structural diversity within the short-chain dehydrogenase/reductase (SDR) superfamily. Cell Mol Life Sci. 2008 in press.
5. Stammers DK, Ren J, Leslie K, Nichols CE, Lamb HK, Cocklin S, Dodds A, Hawkins AR. The structure of the negative transcriptional regulator NmrA reveals a structural superfamily which includes the short-chain dehydrogenase/reductases. EMBO J. 2001;20(23):6619–6626.
6. Zheng X, Dai X, Zhao Y, Chen Q, Lu F, Yao D, Yu Q, Liu X, Zhang C, Gu X, Luo M. Restructuring of the dinucleotide-binding fold in an NADP(H) sensor protein. Proc Natl Acad Sci U S A. 2007;104(21):8809–8814.
7. Schwartz MF, Jörnvall H. Structural analyses of mutant and wild-type alcohol dehydrogenases from Drosophila melanogaster. Eur J Biochem. 1976;68(1):159–168.
8. Jörnvall H, Persson M, Jeffery J. Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc Natl Acad Sci U S A. 1981;78(7):4226–4230.
9. Persson B, Krook M, Jörnvall H. Characteristics of short-chain alcohol dehydrogenases and related enzymes. Eur J Biochem. 1991;200(2):537–543.
10. Jörnvall H, Höög JO, Persson B. SDR and MDR: completed genome sequences show these protein families to be large, of old origin, and of complex nature. FEBS Lett. 1999;445:261–264.
11. Jörnvall H, Persson B, Krook M, Atrian S, Gonzalez-Duarte R, Jeffery J, Ghosh D. Short-chain dehydrogenases/reductases (SDR). Biochemistry. 1995;34(18):6003–6013.
12. Filling C, Berndt KD, Benach J, Knapp S, Prozorovski T, Nordling E, Ladenstein R, Jörnvall H, Oppermann U. Critical residues for structure and catalysis in short-chain dehydrogenases/reductases. J Biol Chem. 2002;277(28):25677–25684.
13. Persson B, Adamski J, Bray J, Bruford E, Dellaporta SL, Duarte RG, Jörnvall H, Kallberg Y, Kavanagh KL, Kedishvili N, Maser E, Oppermann U, Orchard S, Penning TM, Thornton J. The Short-Chain Dehydrogenase/Reductase (SDR) Nomenclature Initiative. VII European Symposium of The Protein Society; Stockholm, Sweden. 2007.
14. Bray JE, Marsden BD, Oppermann U. The Human Short-Chain Dehydrogenase/Reductase (SDR) Superfamily: A Summary of Structural and Functional Information. Chem Biol Interact. 2008 in press.
15. Moeller G, Adamski J. Multifunctionality of 17beta-hydroxysteroid dehydrogenases: an update. Mol Cell Endocrin. 2008 in press.
16. Oppermann UC, Filling C, Jörnvall H. Forms and functions of human SDR enzymes. Chem Biol Interact. 2001;130-132(1-3):699–705.
17. Williamson SJ, Rusch DB, Yooseph S, Halpern AL, Heidelberg KB, Glass JI, Andrews-Pfannkoch C, Fadrosh D, Miller CS, Sutton G, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples. PLoS ONE. 2008;3(1):e1456.
18. Kallberg Y, Oppermann U, Persson B. Classification of the SDR super-family of short-chain dehydrogenases/reductases and other proteins using hidden Markov models. FEBS J. Manuscript. 2008.
19. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33(Database issue):D154–159.
20. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35(Database issue):D61–65.
21. Kallberg Y, Oppermann U, Jörnvall H, Persson B. Short-chain dehydrogenases/reductases (SDRs): Coenzyme-based functional assignments in completed genomes. Eur J Biochem. 2002;269:4409–4417.
22. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003;31(13):3497–3500.
23. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. New developments in the InterPro database. Nucleic Acids Res. 2007;35(Database issue):D224–228.
24. Jez JM, Flynn TG, Penning TM. A new nomenclature for the aldo-keto reductase superfamily. Biochem Pharmacol. 1997;54(6):639–647.
25. Nelson DR, Koymans L, Kamataki T, Stegeman JJ, Feyereisen R, Waxman DJ, Waterman MR, Gotoh O, Coon MJ, Estabrook RW, Gunsalus IC, Nebert DW. P450 superfamily: update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics. 1996;6(1):1–42.