Leucine-rich repeats (LRRs) are protein-ligand interaction motifs found in a large number of proteins of diverse structure, localization and function in bacteria, fungi, plants and animals [1]. Many of these have well-known functions in the innate immune system [2]. Many others, especially those with extracellular LRRs (eLRRs), are involved in various aspects of nervous system development [3]. In both cases, the nature of the LRR motifs is important for generating a diversity of interactions, with exogenous factors in the immune system and with the huge number of different cell types in the developing nervous system. The structure of LRR motifs and their arrangement in repetitive stretches of variable length generate a versatile and highly evolvable framework for the binding of diverse proteins and non-protein ligands.
Seven classes of LRR have been defined [1] (these have been referred to as LRR "subfamilies" [4]; we use the term subfamily here in the phylogenetic sense to refer to sets of closely related genes). Within animals, four separate types are recognised, three typically intracellular and one extracellular. Whether all these different classes are evolutionarily related by descent or represent convergent evolution is open to debate [1], but they all share a characteristic structure. Each repeat is typically 19–29 amino acids long. It has a well-conserved N-terminal stretch of 9–12 amino acids, characterized by precisely positioned hydrophobic residues (usually leucines), which forms a β-strand, and a C-terminal stretch of 10–19 amino acids that is more variable in length, sequence and structure. The arrangement of multiple repeats in tandem generates a horseshoe-shaped solenoidal structure, with the β-strands stacking to form the concave surface and the variable stretches forming the convex surface [1,5-7]. Most LRR regions also have both N-terminal and C-terminal cap regions, which shield the hydrophobic core of the LRR structure. In extracellular proteins these regions (LRR-NT and LRR-CT domains, of which several subtypes exist) are defined by precisely positioned cysteine residues [4].
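As an illustration of what "precisely positioned hydrophobic residues" can mean in practice, the following minimal Python sketch scans a protein sequence for candidate conserved segments. It assumes the commonly cited 11-residue consensus LxxLxLxxNxL, which is not stated in the text above, and the residue tolerance sets and the function name `find_lrr_cores` are hypothetical choices made for this example. It is a crude pattern match, not a substitute for profile-based LRR detection.

```python
import re

# ASSUMPTION: the conserved segment follows the consensus LxxLxLxxNxL.
# This consensus is not given in the text; it is used here for illustration.
# ASSUMPTION: the tolerated substitutions below are illustrative only.
HYDROPHOBIC = "[LIVMF]"  # residues accepted at the "L" positions
ASN_LIKE = "[NCT]"       # residues accepted at the "N" position

# A lookahead with a capture group lets overlapping candidates be reported.
LRR_CORE = re.compile(
    "(?=("
    + HYDROPHOBIC + ".." + HYDROPHOBIC + "." + HYDROPHOBIC
    + ".." + ASN_LIKE + "." + HYDROPHOBIC
    + "))"
)


def find_lrr_cores(seq):
    """Return (start, segment) pairs for every 11-residue window of `seq`
    that matches the assumed consensus. Starts are 0-based."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(seq)]


if __name__ == "__main__":
    # Toy 24-residue repeat: an 11-residue conserved segment followed by a
    # 13-residue variable stretch (both invented for this example).
    repeat = "LKELDLSNNKL" + "GSPGSPGSPGSPG"
    for start, core in find_lrr_cores(repeat * 2):
        print(start, core)
```

On the toy input, two tandem copies of the invented repeat yield one candidate segment per repeat, spaced 24 residues apart, which falls within the 19–29 residue repeat length described above.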
LRR proteins, both intracellular and extracellular, have well-characterized functions in the innate immune system that are similar from plants to mammals [2]. The extracellular LRR (eLRR) proteins in animals include the Toll-like receptors (TLRs), a family of transmembrane proteins characterized by an LRR region, a transmembrane (TM) domain and a cytoplasmic Toll/IL-1 receptor (TIR) domain. This family has expanded in vertebrates to allow detection of a diverse set of antigens [8]. The TLR family has also expanded in flies, where, in addition to the roles of some of these proteins in immunity [9], many are required for various aspects of embryonic and nervous system development [10-13]. Tol-1 in worms is also important in development, possibly contributing to a code of molecules defining neuronal connectivity [14,15]. Recent reports indicate that some mammalian TLR genes may also be expressed and function in neurons [16,17].
A large number of other eLRR proteins have been implicated in various aspects of neural development, genetically in flies [18-20] and in mammals in assays of neurite outgrowth [21-24], fasciculation [25] and/or synapse formation [26,27]. Some of these contain, in addition to the extracellular LRR domain, immunoglobulin (Ig) or fibronectin type-3 (FN3) domains (for review see [3]). In some cases, the functions of eLRR proteins are mediated by homophilic interactions [25,28-30]. In other cases they are mediated by the binding of other proteins in cis [31-33] and in trans [27,34-36]. Several eLRR proteins have been found to modulate the signaling of various growth factor pathways (e.g., [37-41]).
Surprisingly, apart from the TLR genes [42] and small secreted proteoglycans [43], relatively few eLRR genes have been studied genetically in mice. Among the ones that have, examples of phenotypic effects in the nervous system include increased plasticity, sprouting and nerve regeneration [44], and defects in axon guidance and cell migration [45], learning and memory [46], myelination [47,48] and neuronal survival [35].
The importance of this class of proteins for nervous system development in humans is apparent from the large number of examples implicated in neurological or psychiatric disorders (reviewed in [49]). These include epilepsy [50], Tourette's syndrome [51], night blindness [52], congenital insensitivity to pain (with mental retardation) [53], and possible links to Alzheimer's disease [54].
Despite the growing number of eLRR proteins implicated in nervous system development or disease, this family of proteins has received far less attention as a class than other better-characterized families like the immunoglobulin [55,56] and cadherin [57] superfamilies. In particular, there have been no systematic surveys of the genomic complement of these proteins or investigations of their evolutionary relationships. We therefore set out to catalogue the entire extracellular leucine-rich repeat proteome of four organisms: Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens. We used a hierarchical clustering system to analyse within- and between-species relationships, revealing independent diversification and expansion of subfamilies in each species and rapid sequence divergence. These analyses highlight the large number of novel, uncharacterized eLRR proteins in each of these genomes, including several novel subfamilies. A number of these show highly restricted expression in the nervous system in mouse or fly.