Most protein in hair and wool is of two broad types: keratin intermediate filament-forming proteins (commonly known as keratins) and keratin-associated proteins (KAPs). Keratin nomenclature was reviewed in 2006, but the KAP nomenclature has not been revised since 1993. Recently there has been an increase in the number of KAP genes (KRTAPs) identified in humans and other species, and increasingly reports of variation in these genes. We therefore propose that an updated naming system is needed to accommodate the complexity of the KAPs. It is proposed that the system is founded in the previous nomenclature, but with the abbreviation sp-KAPm-nL*x for KAP proteins and sp-KRTAPm-n(p/L)*x for KAP genes. In this system “sp” is a unique letter-based code for different species as described by the protein knowledge-based UniProt. “m” is a number identifying the gene or protein family, “n” is a constituent member of that family, “p” signifies a pseudogene if present, “L” if present signifies “like” and refers to a temporary “place-holder” until the family is confirmed and “x” signifies a genetic variant or allele. We support the use of non-italicised text for the proteins and italicised text for the genes.
This nomenclature is not that different to the existing system, but it includes species information and also describes genetic variation if identified, and hence is more informative. For example, GenBank sequence JN091630 would historically have been named KRTAP7-1 for the gene and KAP7-1 for the protein, but with the proposed nomenclature would be SHEEP-KRTAP7-1*A and SHEEP-KAP7-1*A for the gene and protein respectively. This nomenclature will facilitate more efficient storage and retrieval of data and define a common language for the KAP proteins and genes from all mammalian species.
Hair and wool fibres typically consist of three major structural components: the cuticle, the cortex and the central medulla. Approximately 90% of the cortical cells contain longitudinally arrayed keratin intermediate filaments (IFs), consisting of keratin (K) proteins. These filaments have a matrix surrounding them, that contains the keratin-associated proteins (KAPs), which cross-link with the IFs through extensive disulfide bonding 1. The role of the KAPs in IF assembly into arrays is considered to be crucial and therefore they most likely affect wool and hair attributes such as strength, inertness and rigidity 2.
The KAPs were originally best understood in sheep, reflecting the then economic importance of wool and the preponderance of wool protein biochemistry undertaken from the mid-twentieth century. Recently our understanding of the KAPs has advanced significantly with the advent of large-scale whole-genome sequencing and the resulting identification of human KAP genes. Moreover, extensive genetic variation is now being described in the KAP genes (KRTAPs) from sheep 3-6 and humans 7,8. However, sequence homology comparison between the KRTAPs from different species adds a new complexity to KAP naming, as obvious homologues are at times difficult to find. Whilst there may be genetic identity, there can also be differences in the nucleotide sequence and the chromosomal arrangement of the genes.
The nomenclature system for KAPs has not been reviewed since the proposition and adoption of the system of Rogers and Powell. This was first proposed in 1993 9 and explained in detail in 1997 10. While this system has served us well until now, the differences now being realised between species, the extensive genetic variation now being documented in some KRTAPs, and some misuse or misinterpretation of the nomenclature suggest it is time for a revision.
The earliest attempts to classify keratins had their origins in the methods used to separate wool proteins. In 1934 these proteins were divided into two extractable classes: those with a lower sulphur content than whole wool and those with a higher sulphur content, otherwise known as SCMK-A and SCMK-B respectively 11. This division was based on the fractional “salting-out” of S-carboxymethylated proteins. Subsequently the former group became the intermediate filament proteins, while the latter group became the KAPs. The advent of amino acid analysis enabled a further sub-division of this high sulphur class (SCMK-B) into high sulphur (HS) and ultra-high sulphur (UHS) proteins, based on whether their cysteine content was below or above 30 mol% 12,13. Amino acid analysis also had a further impact on our knowledge of the KAPs with the discovery of a third class of proteins in wool, one that proved to be rich in glycine and tyrosine, the so-called high glycine-tyrosine proteins (HGT) 14,15.
Subsequent attempts to fractionate the HS group of proteins and identify sub-components led to further improvement in our understanding of this class of proteins and also a proliferation of new protein names. The use of fractional precipitation with ammonium sulphate solutions resulted in the definition of two fractions, SCMK-B1 and SCMK-B2 16, with subsequent sub-fractionation of the B2 group into a further three components (their names now shortened to B2A, B2B and B2C) by chromatography 17. In parallel with these studies, column electrophoresis was used to fractionate the HS components, with one component, SCMK-BIII, being split by gel filtration into two new HS protein families: BIIIA and BIIIB 18,19.
The HGT were sub-divided into the Type I and II sub-classes by ion-exchange chromatography 20, the former being of moderate percentage of glycine and tyrosine, and comprising two components C2 and F 21. In contrast, the Type II family proteins, which contain a higher percentage of these two amino acids, were thought to contain up to 10 individual components 10,22, although only one has been fully sequenced to date 23. Finally, although their existence had been known about for some time, members of proteins from the UHS group were identified, the first cuticle UHS proteins being sequenced in 1990 and 1994 24,25 and cortical UHSs in 1994 and 1995 26,27.
The increasing diversity of the KAPs, coupled with their non-uniform naming, led Rogers and Powell 10 to suggest a nomenclature for the KAP proteins and genes using the abbreviation KAPm.nxpL for the protein and KRTAPm.nxpL for the gene. In subsequent iterations of this system the gene name became italicised, although use of this convention is only sporadic in the literature. In the Rogers and Powell 10 system, “m” denotes a family or unique protein, “n” denotes a component number, “x” denotes a variant, “p” denotes a pseudogene and “L” stands for “like”. This nomenclature divides the KAPs of all species into families and further into family members based on similarities in their amino acid sequences. Historically then, what was originally called SCMK-B became SCMK-B2, then HS-B2A and then KAP1.1 for the protein and KRTAP1.1 for the gene. Somewhat strangely through this time, the HS, UHS and HGT classification system persisted, perhaps reflecting that the abbreviations gave some indication of the type of protein being described, although in the last few years the discovery of KAPs that contain moderate amounts of cysteine and glycine has made this older classification system even more inadequate.
Several developments since 1997 lead us to propose that the KAP/KRTAP nomenclature be revised and adjusted. Firstly, the 2006 release of a consensus nomenclature for the mammalian keratins 28 identified the need to adhere to guidelines proposed by the HUGO Gene Nomenclature Committee (HGNC), whose prominence in the area of nomenclature grew following the sequencing of the human genome. Consistency with the recommendations of this organisation seems sensible, especially in the context of what is known about KAPs/KRTAPs. While the Rogers and Powell 10 system was useful, the protein nomenclature included a term “p” for a pseudogene, a term that could only really describe a non-expressed or faulty form of a gene and not a protein. The gene nomenclature was also somewhat confusing as “p” and “L” probably should not be present together. There is also some confusion over the use of punctuation in gene names, with both full stops and hyphens being used between the m (family) and n (constituent) in the nomenclature, with seemingly little consistency or pattern.
HGNC suggests punctuation should be avoided, the exception being its use in defining groups of related genes. In this respect, our proposed nomenclature attempts to define a system that should lead to greater consistency in naming.
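The inconsistency between full stops and hyphens can be removed mechanically. The following is a minimal sketch, assuming that the hyphenated form used in the proposed abbreviation (for example KRTAP1-1) is the target; the function name `normalise_separator` is hypothetical, and only the separator between the family and constituent numbers is touched.

```python
import re

# Matches the family/constituent core of a KAP or KRTAP name written with
# either a full stop or a hyphen, e.g. "KAP1.1" or "KRTAP1-1".
_CORE = re.compile(r"\b(KRTAP|KAP)(\d+)[.-](\d+)")


def normalise_separator(text: str) -> str:
    """Rewrite every 'm.n' or 'm-n' core in text with a hyphen separator."""
    return _CORE.sub(r"\1\2-\3", text)
```

Names that already use a hyphen pass through unchanged, so the function can be applied repeatedly to mixed lists of names without side effects.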
Two other, more substantive issues have also emerged: the increasing diversity of the genes from different species, and the genetic variation therein. We feel that these matters need to be better accommodated in the nomenclature.
To date, more than 100 KAP genes have been isolated from a range of mammalian species including sheep, humans, mice and rabbits. These genes have been placed in 27 families, each comprising 1-12 members 29-32. In the human genome, the 89 functional KRTAPs identified have been placed into 25 families (Table 1), although Wu et al 33 suggest up to 122 functional (n=101) and pseudo (n=21) genes based on analysis of sequences lodged in databases. These genes are clustered into five domains on three different human chromosomes (Table 2). In an analysis of eight species, Wu et al 33 have suggested that humans have a similar number of genes to other primates, but that rodents have an expanded repertoire.
While there appears to be conservation at the sequence level across species 33, it is still not clear as to which of these genes are expressed and where and when this expression occurs. For example, except for families 16, 22, 25 and 27, all of the KRTAPs are expressed in the human hair follicle 29, 30.
In sheep, only 13 functional KRTAPs from 7 families have been reported to date and homologues for the other human genes have not been identified yet (Table 1). This is probably, in part, a result of the limited amount of research undertaken on sheep KRTAPs in the last decade.
The known sheep genes are clustered into three domains on three chromosomes (Table 2). Given the similarities in the chemical make-up and structure of wool and human hair, and the similarity of the individual genes or clustered families (Table 2), it is expected that more individual KRTAPs and families will be found in the sheep genome. Equally it would not be unreasonable to expect that the overall number of putative KRTAPs might be higher than previously thought. While the discovery of more and more KAP genes is not in itself any reason to change the nomenclature system, it does highlight the need to revise its suitability, especially in light of the issues described below.
We believe that the species of origin of any KAP protein or gene sequence needs to be stated in any classification system. Homology comparisons between species reveal that KAP proteins of the same name (and therefore, one would assume, the same family) may actually have low inter-species homology, but nevertheless tend to cluster with other family members from the same species. For example, sheep KAP1-3 does not share a high homology with KAP1-3 from either humans or mice, but is more similar to other sheep KAP1 family members (Fig. 1). All the sheep KAP1 family members tend to cluster, and all the human KAP1 family members tend to cluster, on sequence-based comparison (Fig. 1), and a similar phenomenon can also be seen for the KAP3, KAP4 and KAP5 families (data not shown). Consequently it can be difficult to assign a new sequence to a family, or constituent group within that family, especially if that sequence is one of the first obtained from a species.
The assignment of KAP/KRTAP family membership is already in some instances not supported by homology (Fig. 1), and some assignments seem strange. For example, Liu et al 34 recently reported the presence of a Capra hircus KAP16-6 gene, although it has very low sequence homology with any human KAP genes, including the published human KAP16-1 gene sequence (GenBank accession number AC003958; 35), and there is no evidence of a KAP16-2, KAP16-3, KAP16-4 or KAP16-5 gene in humans. Hence we believe that, at the very least, the species of origin of any given sequence needs to be clearly identified, although the need to identify species might be less important in literature if a publication is focussed on a single species and no conclusions that may have implications across species are drawn.
Researchers finding new KAP or KRTAP sequences probably also need to be more cautious in assigning the sequence to a family. Accordingly we recommend that the “L” term for “like”, is used if any doubt exists at all as to the origin of the sequence, and that this term, which is also used in the Rogers and Powell 10 system, continues in the nomenclature recommended below.
Studies of variation in KRTAPs are limited in most species, and knowledge of the various types of genetic variation is therefore also limited. Currently humans and sheep are the two best studied species, probably because of the recent emphasis on human genome discovery and the historic importance of wool in making textiles. It is however expected that, as more genomes are sequenced and in multiple individuals, the need for a comprehensive nomenclature that accommodates genetic variation more effectively will increase.
In humans, studies of KRTAP variation have only been carried out in Caucasian and Japanese individuals and have been restricted to the KAP1 7 and KAP4 8 families. Four previously identified and apparently different KAP1-n genes (KRTAP1-1A, KRTAP1-1B, KRTAP1-6 and KRTAP1-7) have been shown to be allelic variants of a single gene 7, while four other KAP1-n genes (KRTAP1-8A, KRTAP1-8B, KRTAP1-3 and KRTAP1-9) were revealed to be allelic variants of another KAP1-n gene 7. Two or three allelic variants have also been reported for 10 of the 11 human KAP4-n genes 8.
In sheep, genetic variation has been reported for the KRTAP1 3,4,36, KRTAP3 37, KRTAP5 5,37, KRTAP6 6,37, KRTAP7 37,38 and KRTAP8 38,39 families. Up to nine alleles have been reported for KRTAP1-3 and KRTAP1-4. It should be noted that the apparently higher degree of variation found in sheep, compared to humans, is possibly due to a greater number of genomes being screened.
The variation detected in the KAP genes includes single nucleotide substitutions and length variation. Variation in length is noted in both sheep and human KAP1-n genes 3,7, and human KAP4-n genes 8. It appears to be the result of having a variable number of cysteine-rich repeated coding sequences and these have probably arisen by intragenic deletion and/or duplication of the repeated segments of the genes during evolution 8.
There is little understanding of how variation in human KAP genes affects hair structure or other keratinous tissue. However, genetic variation in each individual KAP family could be much higher than previously thought based on recent research in sheep 4-6,36,38, and this may underpin some of the variation in hair and wool characteristics. The ability of the nomenclature system to accommodate extensive genetic variation (or what might more correctly be called polymorphism, given the over-use of this word in describing less variable gene systems) is one key driver in our revised system. What is more, while our revised nomenclature is not dissimilar to that of Powell and Rogers 10 in accommodating genetic variation, we feel that the term in the name that denotes genetic or allelic variation should be at the end of the gene name, especially as we believe an increased emphasis on how this variation affects wool and hair traits will emerge with time.
Given the numerous KAP genes identified to date and the high levels of diversity among the KAPs from species of more distant phylogenetic relationship, a revised nomenclature needs to be both informative and flexible enough to accommodate variation. We propose that the current KAP nomenclature 10 can be easily modified to accommodate the HGNC guidelines to become a consensus system for all mammalian species, as follows:
sp-KAPm-nL*x for KAP proteins
sp-KRTAPm-n(p/L)*x for KAP genes
In this nomenclature, “sp” is a unique letter-based code for different species described by the protein knowledge-based UniProt (www.uniprot.org/docs/speclist), for example, “HUMAN” for Homo sapiens, “SHEEP” for Ovis aries, “BOVIN” for Bos taurus and “MOUSE” for Mus musculus. This would typically only be used in publications and when necessary; “m” is a number identifying the family; “n” is a constituent of that family; “p” signifies a pseudogene if there is an obvious fault in the gene (e.g. presence of an unexpected stop codon), while “L” if present, signifies “like” and refers to a temporary “place-holder” until the family (or constituency within that family) is confirmed; “x” is an alphabetical letter signifying the variant or allele, but preferably at the level of an extended haplotype encompassing the promoter, 5' UTR, exons and introns and 3' UTR. “p” and “L” should probably not be used together as a pseudogene will not produce a protein by definition, and hence genetic homology with any other sequence is of limited value.
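The structure of the proposed names lends itself to machine validation. The following is a minimal sketch of a parser, not a reference implementation: the names `KapName` and `parse_kap_name` are hypothetical, the species code is assumed to be one to five upper-case alphanumeric characters, the variant is assumed to be a single capital letter, and both the species prefix and the variant suffix are treated as optional. Italicisation of gene names cannot be represented in a plain string and is ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: [sp-]KAPm-n[L][*x] for proteins, [sp-]KRTAPm-n[p|L][*x] for genes.
_NAME = re.compile(
    r"^(?:(?P<species>[A-Z0-9]{1,5})-)?"   # optional species code, e.g. SHEEP
    r"(?P<symbol>KRTAP|KAP)"               # gene or protein symbol
    r"(?P<family>\d+)-(?P<member>\d+)"     # family m and constituent n
    r"(?P<flag>[pL])?"                     # pseudogene or "like" place-holder
    r"(?:\*(?P<variant>[A-Z]))?$"          # optional variant or allele x
)


@dataclass(frozen=True)
class KapName:
    symbol: str
    family: int
    member: int
    species: Optional[str] = None
    flag: Optional[str] = None
    variant: Optional[str] = None

    def __str__(self) -> str:
        prefix = f"{self.species}-" if self.species else ""
        suffix = f"*{self.variant}" if self.variant else ""
        return f"{prefix}{self.symbol}{self.family}-{self.member}{self.flag or ''}{suffix}"


def parse_kap_name(text: str) -> KapName:
    """Split a name into its parts, rejecting 'p' on a protein name."""
    match = _NAME.match(text)
    if match is None:
        raise ValueError(f"not a recognised KAP/KRTAP name: {text!r}")
    parts = match.groupdict()
    if parts["symbol"] == "KAP" and parts["flag"] == "p":
        raise ValueError("'p' (pseudogene) can describe a gene, not a protein")
    return KapName(
        symbol=parts["symbol"],
        family=int(parts["family"]),
        member=int(parts["member"]),
        species=parts["species"],
        flag=parts["flag"],
        variant=parts["variant"],
    )
```

Because the pattern accepts at most one of “p” or “L”, the two terms cannot appear together, and the explicit check keeps the pseudogene term out of protein names. Converting a parsed name back with `str()` reproduces the input, so the same class can be used to build names from database fields.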
The use of KAP for the protein and KRTAP for the gene is consistent with the keratin nomenclature where K is used for a protein and KRT is used for the associated gene.
To allow adequate time for transition and capture historically useful information, we suggest that in future, historic names might be bracketed after the first mention of the gene or protein, such as KRTAP1-1(B2A), for the next few years (see old terminology in Powell and Rogers 10).
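The bracketing convention can be applied automatically when preparing text. The following is a minimal sketch, assuming plain-text input; the function name `bracket_first_mention` is hypothetical, and the caller supplies the pairing of current and historic names. Names that carry a variant suffix are not handled.

```python
import re


def bracket_first_mention(text: str, name: str, historic: str) -> str:
    """Append '(historic)' to the first occurrence of name in text only."""
    # The lookahead stops KRTAP1-1 matching inside KRTAP1-10 and skips a
    # mention that is already followed by a bracket.
    pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?![\d(])")
    return pattern.sub(lambda m: f"{m.group(0)}({historic})", text, count=1)
```

For example, calling the function with the name KRTAP1-1 and the historic name B2A turns the first KRTAP1-1 in a passage into KRTAP1-1(B2A) and leaves later mentions untouched.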
This nomenclature preserves the widely used and broadly referred to system proposed by Rogers and Powell 10, but with the important addition of a species identifier to constitute an informative KAP naming system and a minor change in the order of terms in the nomenclature. Accordingly, the new nomenclature should have minimal impact on current publications and databases, but if used correctly should facilitate more informed discussion about the KAP genes and proteins. We would strongly urge authors to use the L-term until such time as the location of the gene can be confirmed on the chromosome and its similarity to a sequence of known classification confirmed, or if they harbour any concerns about the family of origin of the protein or gene they are describing. It may be appropriate for scientists working on KAP genes and proteins to communicate more freely with each other and also with organisations like HGNC to ensure that the nomenclature is used appropriately, while also being regularly revised to accommodate any future findings about the KAP genes and proteins.
This work was financially supported by FRST (C10X0710: Keeping New Zealand Wool Products at the Cutting Edge through Enhanced Wool Quality) and the Lincoln University Gene-Marker Laboratory. The Wool Research Organisation of New Zealand Inc and New Zealand Wool Industry Charitable Trust Postgraduate Scholarship to HG is acknowledged.