The Lupus Family Registry and Repository (LFRR) was established to assemble and distribute materials and data from families with one or more living members diagnosed with SLE, in order to address SLE genetics. In the present article, we describe the problems encountered in designing the registry and gathering biometric data, and the solutions adopted; the protocols implemented to guarantee data quality and to protect participant privacy and consent; and the establishment of a local and international network of collaborators. We also illustrate how the LFRR has enabled progress in lupus genetics research, answering old scientific questions while raising new challenges in elucidating the biologic mechanisms that underlie disease pathogenesis. Trained staff ascertain SLE cases, unaffected family members and population-based controls in compliance with the relevant laws and standards; participant consent and privacy are central to the LFRR’s effort. Data, DNA, serum, plasma, peripheral blood and transformed B-cell lines are collected and stored under strict quality control and safety measures. Coded data and materials derived from the registry are available to approved scientific users. Through 104 publications, the LFRR has contributed to the discovery of most of the 37 genetic associations now known to contribute to lupus. The LFRR contains 2618 lupus cases from 1954 pedigrees, which are being studied by 76 approved users and their collaborators. The registry includes difficult-to-obtain populations, such as multiplex pedigrees, minority patients and affected males, and constitutes the largest collection of lupus pedigrees in the world. The LFRR is a useful resource for the discovery and characterization of genetic associations in SLE.
SLE is a debilitating and potentially fatal autoimmune disease with a wide variety of clinical features, characterized by the production of autoantibodies to cellular components. Because the immune system of SLE patients fails to distinguish certain antigens of healthy tissue from those of foreign invaders, the resulting inflammatory response affects the joints, skin, blood and many organs, including the kidneys, brain, heart and lungs.
Disease genetics are important, but the biologic mechanisms that link associated genes to disease phenotypes are at present largely unknown. Genetic associations often point to biologic possibilities that had never previously been contemplated, and the associated variants are expected to be critical components of one or more of the mechanisms that produce the disease phenotype. The Lupus Family Registry and Repository (LFRR), formerly known as the Lupus Multiplex Registry and Repository (LMRR), was established in 1995 to assemble and distribute materials and data from families with one or more living members diagnosed with lupus, in order to address SLE genetics.
This goal is now a reality: the registry and repository have made possible an interwoven web of efficient, complementary research that has advanced the important work of elucidating the genetics of SLE. Seventy-six approved researchers and collaborators, 62 in the USA and 14 abroad, are directly using LFRR resources for their work. Today, the LFRR operates with 31 full- and part-time staff members who recruit participants; review medical records; staff the laboratory; design, implement and maintain information technology; obtain informed consent from and protect human subjects; analyse the data; and administer the overall effort. In addition, five affiliated centres across the USA contribute biological samples and clinical information to the registry (details available as supplementary data at Rheumatology Online; the supplementary LFRR Approved Scientific User Application Packet is published with permission from the Oklahoma Medical Research Foundation).
SLE occurs in families more frequently than in the general population: 5–10% of SLE patients have a second family member with SLE, and the sibling recurrence risk ratio (λs, the prevalence in siblings of affected individuals divided by the population prevalence) is estimated to be between 8 and 30. The monozygotic twin concordance rate for SLE is 24%, with a lifetime concordance rate of ~30% [3, 4, 5]. Collectively, these observations strongly suggest that discovering the genes that increase familial susceptibility to the disease could provide important insight into the pathogenesis of SLE.
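The sibling recurrence risk ratio defined above is a simple quotient. A minimal Python sketch follows; the prevalence figures in the usage lines are hypothetical round numbers chosen only to illustrate the arithmetic, not LFRR or published estimates.

```python
def sibling_recurrence_ratio(sibling_prevalence: float, population_prevalence: float) -> float:
    """lambda_s: prevalence of SLE among siblings of affected individuals
    divided by the prevalence of SLE in the general population."""
    if not 0 < population_prevalence <= 1:
        raise ValueError("population prevalence must lie in (0, 1]")
    if not 0 <= sibling_prevalence <= 1:
        raise ValueError("sibling prevalence must lie in [0, 1]")
    return sibling_prevalence / population_prevalence


# Hypothetical figures: 2% of siblings affected vs 0.1% of the population.
lambda_s = sibling_recurrence_ratio(0.02, 0.001)
print(f"lambda_s = {lambda_s:.1f}")  # a ratio of about 20
```

Because λs is a ratio of two prevalences, it is dimensionless: a value of 20 means that siblings of affected individuals are about 20 times as likely as the general population to have SLE.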
In the early 1990s, Dr John Harley, at the Oklahoma Medical Research Foundation (OMRF), began assembling multiplex families with SLE, and this work caught the attention of the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) at the American College of Rheumatology (ACR) annual meeting in the fall of 1993. This was the conceptual birthplace of what became the Lupus Multiplex Registry and Repository, and it led to contractual funding from NIAMS in the fall of 1995.
Establishing a formal registry led to increased recruitment of multiplex families with SLE and to more rigorous and detailed data collection and organization, and the registry eventually became a centralized repository of data, sera, plasma, DNA, transformed B-cell lines and peripheral blood mononuclear cells. The LMRR offered approved scientists access to more than 5 million data points for each sample, including clinical information, demographics, lupus serology and genotyping data, first from microsatellites and later from single-nucleotide polymorphisms (SNPs).
Lupus is a disease of disparities: it is estimated that nine out of ten people diagnosed with lupus are women and four out of ten are African-American, and African-Americans are at least three times more likely to have lupus than European-Americans of the same sex. Recent data from admixture studies show that American-Indian ancestry is almost 8-fold more important in generating lupus than European ancestry. Asian ancestry also leads to a higher prevalence of lupus than European ancestry [8, 9]. Aware of these disparities, the staff and leadership of the LFRR have consistently targeted recruitment towards the populations most affected by lupus, maintaining at least 40% minority enrolment since its inception.
Since 1999, a central focus of the repository has been identifying the genes responsible for lupus in African-Americans. In 2003 this effort was boosted by a collaboration with Drs Gary Gilkeson and Diane Kamen at the Medical University of South Carolina (MUSC), who spearheaded the enrolment of SLE patients and controls from the Gullah population. The Gullah are a semi-isolated population residing on the Sea Islands along the South Carolina coast and in adjacent inland communities. They have lived in this region since the early 1700s, when their ancestors were transported from West Africa, and they constitute a unique population with greater genetic homogeneity than most other African-American communities in the USA. The Gullah maintain stable family units, which makes it possible to assemble a collection of pedigrees for the study of multigenic diseases such as lupus.
The LFRR genetics approach has evolved over time. Originally, it focused on linkage analysis in multiplex pedigrees using microsatellite genotyping. The technical revolution in SNP genotyping made genome-wide association studies (GWASs) feasible and shifted the focus to case–control association studies. The LFRR responded to this challenge by expanding recruitment to include simplex families, which are ideal for population-based case–control and family-based association studies.
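For a single SNP, case–control association is often first assessed with an allelic 2×2 test. The sketch below computes Pearson's chi-square (1 df) and the allelic odds ratio from allele counts. It is a generic textbook test, not the specific analysis pipeline of any LFRR study, and the counts in the usage lines are hypothetical.

```python
import math


def allelic_test(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts, plus the allelic odds ratio."""
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each).
chi2, p, odds_ratio = allelic_test(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}, OR = {odds_ratio:.2f}")  # chi2 = 12.54, OR = 1.29
```

The allelic test assumes Hardy–Weinberg equilibrium; genome-wide studies frequently use genotype-based alternatives, such as the Cochran–Armitage trend test or logistic regression with ancestry covariates.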
By 2006, the expanded goals of the LMRR had led to active recruitment of both sporadic and familial SLE cases, and the LMRR was renamed the Lupus Family Registry and Repository (LFRR). This change also allowed greater diversity in the race, ethnicity, gender and age of participants diagnosed with SLE, producing a collection of lupus patients that more closely reflects the natural distribution of the disease across those variables (Figs 1–3).
Fifteen years after its modest beginnings, the LFRR has collected 2618 affected individuals with SLE from 1954 pedigrees and 8157 controls, and holds information on 82000 individuals and 35 million data points. The bio-bank contains more than 750000 aliquots of biological samples, and the genotyping core has generated more than 650 million genotypes. Access to LFRR resources has resulted in 104 peer-reviewed publications (see supplementary data available at Rheumatology Online) and has contributed substantially to the ongoing transformation of our understanding of lupus genetics.
The work of the LFRR is now divided into the following specialized areas.
With the permission of each prospective participant, LFRR recruiters receive information on whether he or she is willing to participate and might satisfy the entry criteria. Ascertainment sources include collaborating investigators, the Centers for Medicare and Medicaid Services, physician referrals, recruitment events and the LFRR Web site (www.lupus.omrf.org). Each SLE patient who agrees to participate completes a screening interview. Recruiters score the interview using the scoring system developed to gauge the ACR classification criteria, enter the information into the database and alert the record reviewers that an interview is awaiting evaluation. Recruiters then proceed according to the record reviewer’s decision: to enrol the participant, to place him or her in pending status, or to remove him or her from further consideration.
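The triage described above can be represented as a scoring step followed by a three-way decision. The sketch below is hypothetical: the LFRR’s actual interview scoring system is described in the cited scoring-system publications, and the threshold rule in `suggested_decision` is an illustrative assumption, since in practice the record reviewer decides. The criterion labels are shorthand names for the 11 ACR classification criteria.

```python
from enum import Enum

# Shorthand labels for the 11 ACR classification criteria for SLE (1982, revised 1997).
ACR_CRITERIA = frozenset({
    "malar_rash", "discoid_rash", "photosensitivity", "oral_ulcers",
    "arthritis", "serositis", "renal", "neurologic",
    "hematologic", "immunologic", "ana",
})


class Decision(Enum):
    ENROL = "enrol"
    PENDING = "pending"  # eligibility to be settled by medical record review first
    REMOVE = "remove"


def interview_score(reported):
    """Count distinct ACR criteria reported at interview (hypothetical scoring)."""
    reported = set(reported)
    unknown = reported - ACR_CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    return len(reported)


def suggested_decision(score, threshold=4):
    """Hypothetical triage rule offered to the reviewer, who makes the final call."""
    if score >= threshold:
        return Decision.ENROL
    if score > 0:
        return Decision.PENDING
    return Decision.REMOVE


score = interview_score({"malar_rash", "arthritis", "renal", "ana"})
print(score, suggested_decision(score).value)  # 4 enrol
```

Rejecting unrecognised labels at scoring time keeps free-text variants out of the database, so that later counts of criteria stay comparable across recruiters.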
Enrolment involves obtaining informed consent from the participant and from participating family members and controls, providing paperwork and arranging phlebotomies. Weekly follow-up by phone call, e-mail or letter is critical to ensure the return of samples and data. Participants who become hesitant about completing participation after enrolment are removed, but only after substantial recruiter effort to support their initial decision to participate.
Potential participants placed in pending status (those whose eligibility must be determined by reviewing medical records before a blood sample is obtained) receive an explanation of why they are not proceeding immediately to enrolment, together with release of information (ROI) forms and the OMRF Notice of Privacy Practices. Recruiters contact pending participants monthly about the status of records acquisition and review. After record review, enrolment or removal follows according to the reviewer’s decision.
All enrolled participants who complete participation receive a small reimbursement, a copy of their serology report (if desired) and a free subscription to the annual LFRR newsletter, the Lupus Linkage Newsletter.
The clinical data collected in the database are based on verifiable medical records. Records pertinent to the diagnosis and treatment of each affected participant are obtained by records acquisition specialists after the participant has returned a signed Health Insurance Portability and Accountability Act (HIPAA)-compliant release of information form, along with a list of the facilities that will provide the records. Each health care provider is contacted in writing with a request that specifies the patient, the time frame of interest and the purpose of the request, accompanied by a copy of the signed ROI form. Collecting all the records often requires repeated contacts with providers; the database tracks the requests and monitors ROI expiration dates. Minimizing the expense of medical record retrieval is a continuing goal.
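Monitoring ROI expiration dates amounts to a simple query over outstanding requests. In this hypothetical sketch, the `RecordRequest` fields, the provider names and the 30-day warning window are assumptions for illustration, not details of the LFRR system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecordRequest:
    provider: str
    roi_expires: date       # expiration date stated on the signed ROI form
    received: bool = False  # set to True once the provider's records arrive


def requests_needing_attention(outstanding, today, warn_days=30):
    """Return unfulfilled requests whose ROI has expired or expires within warn_days."""
    horizon = today + timedelta(days=warn_days)
    return [r for r in outstanding if not r.received and r.roi_expires <= horizon]


queue = [
    RecordRequest("Clinic A", date(2010, 3, 10)),
    RecordRequest("Hospital B", date(2010, 9, 1)),
    RecordRequest("Clinic C", date(2010, 3, 1), received=True),
]
for r in requests_needing_attention(queue, today=date(2010, 2, 20)):
    print(r.provider)  # Clinic A: expires within 30 days and records not yet received
```

Flagging requests before the ROI lapses gives staff time to obtain a renewed form rather than restarting the request from scratch.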
The medical record reviewer (an M.D., P.A.-C. or R.N. specifically trained in this area) analyses the recruiter’s interview to determine whether the prospective participant is likely to meet the ACR classification criteria for SLE [11, 12]. The reviewer then screens the medical records and documents which ACR classification criteria have been met and the earliest date on which each was observed, as well as more than 200 additional laboratory and clinical data points that characterize the SLE manifestations and treatment of each affected participant.
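Because the reviewer records the earliest date on which each ACR criterion was observed, derived variables can be computed directly, for example the date on which a case first satisfied four criteria. The sketch below shows one such hypothetical derivation; the source does not state that the LFRR computes this variable, and the dates are invented.

```python
from datetime import date
from typing import Optional


def classification_date(first_observed: dict, required: int = 4) -> Optional[date]:
    """Date on which the `required`-th distinct ACR criterion was first documented.

    `first_observed` maps criterion name -> earliest documented date.
    Returns None if fewer than `required` criteria are documented.
    """
    dates = sorted(first_observed.values())
    return dates[required - 1] if len(dates) >= required else None


example = {
    "arthritis": date(2001, 5, 2),
    "ana": date(2001, 6, 15),
    "malar_rash": date(2003, 1, 9),
    "renal": date(2002, 11, 30),
    "hematologic": date(2004, 7, 1),
}
print(classification_date(example))  # 2003-01-09
```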
The biological samples of enrolled participants usually arrive by overnight courier and are immediately assigned a unique lupus genetics study barcode (LGScode) for sample and data tracking. The most commonly received bio-specimen is whole peripheral blood; occasionally, however, mouthwash samples or preserved tissues are the source of DNA. Up to 66.5 ml of blood is collected from female participants; one additional 9.5 ml blood tube is collected from affected male participants (for karyotyping). An aliquot of blood is sent to the Clinical Laboratory Improvement Amendments (CLIA)-approved Clinical Immunology Laboratory at OMRF, where a standardized set of serological tests is performed. These include an ANA test on HEp-2 cells (an indirect fluorescent antibody test); anti-dsDNA (titred by immunofluorescence on Crithidia luciliae); antibodies to extractable nuclear antigens (ENA) (by precipitin), detecting antibodies against Ro/SSA, La/SSB, Smith (Sm), nRNP, ribosomal P, PM-Scl, Jo-1, Mi-2, Scl-70 and unidentified precipitins; aPLs (by ELISA), comprising IgG, IgM and IgA anticardiolipin antibodies; and, for affected participants, total haemolytic complement (CH-50), determined on an additional 100 µl aliquot.
The main components isolated from blood for the bio-specimen bank are serum, plasma, peripheral blood mononuclear cells (PBMCs), used for EBV-transformed cell lines (EBVTLCs) and frozen specimens, and granulocytes for DNA isolation. Each component is stored in colour-coded aliquots of different volumes, which allows efficient sample retrieval and minimizes freeze/thaw cycles. DNA is stored at several dilutions in 96-well deep-well storage plates to facilitate distribution to approved users (detailed handling protocols are available as supplementary data at Rheumatology Online).
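Distributing DNA on 96-well plates requires a consistent mapping between samples and well positions. The sketch below assigns consecutive samples to wells A1 to H12 in column-major order (A1, B1 through H1, then A2), a common but assumed convention; the LFRR’s actual plate layouts, including any reserved control wells, are described in its supplementary protocols and are not reproduced here.

```python
ROWS = "ABCDEFGH"  # 8 rows x 12 columns = 96 wells
N_COLS = 12


def plate_position(index, column_major=True):
    """Map a 0-based sample index to (plate number from 1, well label such as 'A1')."""
    if index < 0:
        raise ValueError("index must be non-negative")
    plate, offset = divmod(index, len(ROWS) * N_COLS)
    if column_major:
        col, row = divmod(offset, len(ROWS))
    else:
        row, col = divmod(offset, N_COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


for i in (0, 1, 8, 95, 96):
    print(i, plate_position(i))
# 0 (1, 'A1') / 1 (1, 'B1') / 8 (1, 'A2') / 95 (1, 'H12') / 96 (2, 'A1')
```

Column-major filling matches 8-channel pipettes, which address one column of eight wells at a time; pass `column_major=False` for row-wise layouts.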
The physical specimens collected from each participant are divided into two sets, which are housed in two independent buildings that are physically separate, have separate electrical control and are monitored by onsite and offsite systems. This arrangement provides backup samples for each participant in case of a catastrophic failure at one location.
The data collected from each participant include the interview, questionnaire, serology results from the OMRF Clinical Immunology Laboratory and medical record review. These data are entered into the database; the questionnaires are scanned and the serology is directly imported from the computer system of the OMRF Clinical Immunology Laboratory. The remaining information is manually entered by a specialized member of the LFRR staff and later checked for accuracy by a second member.
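Software can support the second-person accuracy check by listing disagreements field by field. The sketch below compares two independently keyed versions of the same form (a double-data-entry comparison). This is a swapped-in technique shown for illustration: the source describes a second staff member checking the first entry, not re-keying it. The field names are hypothetical.

```python
def entry_discrepancies(first: dict, second: dict) -> dict:
    """Map each field whose values differ to (first_value, second_value).

    A field missing from one entry is compared as None.
    """
    fields = first.keys() | second.keys()
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(fields)
        if first.get(name) != second.get(name)
    }


entry_1 = {"lgscode": "100001", "malar_rash": "yes", "onset_year": 1998}
entry_2 = {"lgscode": "100001", "malar_rash": "yes", "onset_year": 1989}
print(entry_discrepancies(entry_1, entry_2))  # {'onset_year': (1998, 1989)}
```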
The informatics infrastructure is critical for operating a successful registry and for maintaining and storing its data. Day-to-day issues, including database schemas, storage requirements and the integration of informatics with enrolment workflows, must be handled continuously. The informatics approach to the challenges posed by the LFRR rests on two central canons: (i) accurate identification of subjects and samples is paramount and (ii) every possible procedure and process should be automated.
Every participant is linked to the central database through a single barcode identification scheme, the LGScode, a six-digit identifier. The database is relational, meaning that its tables of data are interconnected, or related, to one another. The MySQL relational database management system holds more than 200 tables and 1700 fields, which are available across the network to approved users. The main LFRR interface for staff consists of custom forms and code currently housed in a Microsoft Access front end.
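The LGScode-centred relational design can be illustrated with Python’s built-in `sqlite3` module, used here in place of MySQL so that the example runs without a server. The two tables, their columns and the codes are hypothetical; they show only how a shared LGScode key lets serology results be joined to participant records and how a foreign key blocks orphaned results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE participant (
    lgscode     TEXT PRIMARY KEY,   -- six-digit LGScode
    pedigree_id INTEGER NOT NULL,
    affected    INTEGER NOT NULL    -- 1 = SLE case, 0 = unaffected
);
CREATE TABLE serology (
    lgscode   TEXT NOT NULL REFERENCES participant(lgscode),
    test_name TEXT NOT NULL,
    result    TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO participant VALUES (?, ?, ?)",
                 [("100001", 1, 1), ("100002", 1, 0)])
conn.executemany("INSERT INTO serology VALUES (?, ?, ?)",
                 [("100001", "ANA", "positive"), ("100002", "ANA", "negative")])

# Relational join: ANA results for affected members of pedigree 1.
rows = conn.execute("""
    SELECT p.lgscode, s.result
    FROM participant AS p JOIN serology AS s ON s.lgscode = p.lgscode
    WHERE p.pedigree_id = 1 AND p.affected = 1 AND s.test_name = 'ANA'
""").fetchall()
print(rows)  # [('100001', 'positive')]
```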
The LFRR inventory locates each biological sample precisely within the freezers by modelling the physical storage in the database. Each aliquot is linked by its LGScode to its study subject and sample-generation event, and its record holds its exact storage coordinates and any addition, removal or manipulation. The complete inventory of the registry can be reviewed and tracked in real time, down to the volume of each aliquot.
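A minimal sketch of such an inventory model is shown below. The storage hierarchy (freezer, rack, box, position), the field names and the example values are assumptions for illustration; the point is that every aliquot carries its LGScode, its location, its current volume and a history of manipulations.

```python
from dataclasses import dataclass, field


@dataclass
class Aliquot:
    lgscode: str        # links the aliquot to its study subject
    material: str       # e.g. "serum", "plasma", "DNA"
    freezer: str
    rack: int
    box: int
    position: int       # slot within the box
    volume_ul: float
    history: list = field(default_factory=list)

    def withdraw(self, amount_ul: float, purpose: str) -> None:
        """Remove volume and log the event; refuse to overdraw the tube."""
        if amount_ul <= 0 or amount_ul > self.volume_ul:
            raise ValueError("invalid withdrawal amount")
        self.volume_ul -= amount_ul
        self.history.append(("withdraw", amount_ul, purpose))


def available_volume(inventory, lgscode: str, material: str) -> float:
    """Total remaining volume of one material for one participant."""
    return sum(a.volume_ul for a in inventory
               if a.lgscode == lgscode and a.material == material)


inventory = [
    Aliquot("100001", "serum", "F-01", 2, 5, 17, 500.0),
    Aliquot("100001", "serum", "F-07", 1, 3, 42, 250.0),  # second aliquot, another freezer
]
inventory[0].withdraw(100.0, "shipment to approved user")
print(available_volume(inventory, "100001", "serum"))  # 650.0
```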
The genotypic data collected by the LFRR are also processed through the informatics system, which has supported the analysis of more than 650000 SNP markers. More than 1 billion genotypes are stored and distributed through a Web-based interface that allows easy export.
Finally, the informatics team is responsible for the quality control and security of the data, applying current technology to encryption, anti-virus screening, firewalls, a secured data centre, software patches, multiple checks on data entry, software-level data validation and database-level data constraints.
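Pairing software-level validation with database-level constraints means that a malformed value is rejected even if it bypasses the application. In the sketch below, `sqlite3` stands in for MySQL, the `sample` table is hypothetical, and the rule that an LGScode is exactly six ASCII digits is an assumption drawn from the description of the LGScode as a six-digit identifier.

```python
import re
import sqlite3

LGS_PATTERN = re.compile(r"[0-9]{6}")  # assumption: exactly six ASCII digits


def validate_lgscode(code: str) -> str:
    """Software-level check, run before anything reaches the database."""
    if not LGS_PATTERN.fullmatch(code):
        raise ValueError(f"malformed LGScode: {code!r}")
    return code


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        lgscode   TEXT NOT NULL
                  CHECK (lgscode GLOB '[0-9][0-9][0-9][0-9][0-9][0-9]'),
        volume_ul REAL NOT NULL CHECK (volume_ul >= 0)
    )
""")


def insert_sample(code: str, volume_ul: float) -> None:
    # The Python check gives a clear error message; the CHECK constraints
    # still reject bad rows written by any other client.
    conn.execute("INSERT INTO sample VALUES (?, ?)", (validate_lgscode(code), volume_ul))


insert_sample("123456", 250.0)
```

A negative volume passes the Python check but is still stopped by the database constraint, which raises `sqlite3.IntegrityError`.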
The information technology (IT) team provides custom-made solutions for the needs of the LFRR staff. By using open standards and mostly free or open-source software, and by avoiding proprietary systems that risk locking the LFRR into a single vendor or system, we are able to grow and adapt as needed without unnecessary purchasing expenses (detailed information available as supplementary data at Rheumatology Online).
Participant consent and privacy are the top priorities of the LFRR. The LFRR operates under the guidance of two institutional review boards (IRBs), those of OMRF and of the University of Oklahoma Health Sciences Center (OUHSC). This activity requires a full-time IRB coordinator within the LFRR, who responds to human subject and privacy-related questions and concerns from LFRR recruiters, referral sites, users and collaborators. The coordinator oversees all communications between the LFRR and the IRBs, enforces compliance with human subject rules and regulations and ensures that all new employees complete the Collaborative Institutional Training Initiative (CITI) Course for the Protection of Human Subjects in Research.
All forms sent to participants, including the informed consent forms, as well as the LFRR Web site text, advertising documents and any revisions or modifications of these, require IRB approval. In recent years, an average of 45 new and revised proposals per year have been submitted for IRB approval. In addition, both IRBs require annual continuing review progress reports.
The addition of investigators at other institutions as LFRR affiliate sites also requires approval by the IRBs of the affiliate site and of OMRF and OUHSC; each site uses an informed consent document designed to satisfy the regulations of the IRBs of participating institutions.
One of the main goals of the LFRR is to make the wealth of clinical and biological resources available to the relevant scientific community. It is important to note that the data and material of the LFRR are suitable for addressing some scientific problems but may not be a good resource for others; one of the major reasons for writing this article was to help interested scientists assess whether their scientific questions of interest can be addressed with the data or materials in the LFRR (Fig. 4).
The application process to become an approved scientific user of the LFRR includes completing the application packet, signing the letter of understanding by the investigator and the responsible institutional official, and approval by the IRB of the requesting institution. The application is reviewed by the LFRR Scientific Advisory Committee (SAC), the OMRF Director of Research Administration and the NIAMS LFRR Program Officer. Approval is either granted or denied; often, additional information is requested. (The application packet is available in supplementary materials at Rheumatology Online and at www.lupus.omrf.org/LFRRApp.html.)
The available data include a database release file, genotyping results for about 300 loci from pedigrees informative for linkage, and pedigree structure diagrams for all pedigrees except sporadic cases. The biological samples that can be requested include DNA, serum, plasma and a renewable source of DNA, namely primary cultures of transformed B lymphocytes.
To date, more than 100 peer-reviewed scientific publications have resulted from the samples and data collected in the LFRR. We are optimistic that the body of work resulting from the LFRR will continue growing at an ever-increasing pace, paralleling the dramatic advance in genetic technologies.
Some of the early discoveries from the LFRR came from the first genome scan of 94 multiplex families, completed in 1998. This study identified genetic linkage to chromosome 1q21–22 near the FcγRIIIA locus, which was much stronger in African-American pedigrees than in European-American pedigrees. Linkage was also identified to 13q32 in African-American families containing male SLE cases, to 4p16–15.2 in European-American but not African-American pedigrees, and to 12q25 in Hispanic and European pedigrees.
A meta-analysis of linkage studies revealed the most powerful effect on chromosome 6, in the region containing HLA, and the second strongest on chromosome 16. Recognition of the relative magnitude of the chromosome 16 linkage prompted a candidate gene study that not only identified ITGAM (CD11b, CR3) as an associated gene but also, using trans-ancestral mapping, pinpointed R77H as the ITGAM variant responsible for this association [18, 19].
In 2003, evidence was found that the risk of SLE in men with Klinefelter’s syndrome (47,XXY) is similar to that in 46,XX women and approximately 14 times higher than in 46,XY men, consistent with the notion that SLE predisposition is partly due to an X chromosome gene–dose effect, a finding replicated in a mouse model of SLE. The groundbreaking GWAS of the International Consortium for Systemic Lupus Erythematosus Genetics (SLEGEN), which identified 13 new genetic effects in lupus, had LFRR samples as its foundation; nearly all of these effects have since been confirmed. Indeed, 37 genes are currently known to be associated with SLE (Table 1). The LFRR has contributed to the first discovery, or to the all-important replication and confirmation, of nearly 30 of these associations (in published and unpublished data).
The LFRR is a unique resource that has adapted to changing opportunities (e.g. the shift from multiplex to simplex pedigrees discussed above). Many problems have also been encountered, and sometimes difficult choices have been made in the hope of applying resources optimally. For example, the LFRR uses its resources to expand the collection rather than to reassess previously enrolled participants periodically. As a result, the collection is larger, at the cost of having very little data on the longitudinal disease course of SLE cases or on family members who may subsequently have developed SLE (or another autoimmune disease) (Tables 2 and 3).
A major logistic challenge in establishing registries is the accurate collection of biometric registry entry information. A unique feature of LFRR data collection is that the clinical features of affected individuals are recorded not only qualitatively but also with a scoring system that allows any user of the data to evaluate the level of certainty with which each data point was obtained. For example, individuals who have apparently convincing SLE based on their interviews and questionnaires, but whose available medical records are insufficient to document four or more ACR criteria, are labelled as affected with limited evidence. This allows individual scientists to decide how stringent to be in including or excluding a putative case in their study. We also record the level of evidence confirmed by medical records for each individual ACR criterion, again enabling researchers to make their study group more or less homogeneous according to their particular needs. (The scoring system is available in supplementary materials at Rheumatology Online and has been published previously [10, 11, 12].)
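The evidence levels described above can be sketched as a labelling function plus a per-study filter. Only the "limited evidence" category comes from the text; the other label names, the `Case` fields and the example codes are hypothetical, and the real LFRR scoring system is more detailed than a simple count of criteria.

```python
from dataclasses import dataclass

REQUIRED = 4  # ACR classification requires 4 or more of the 11 criteria


@dataclass(frozen=True)
class Case:
    lgscode: str
    documented: frozenset  # criteria confirmed by medical records
    reported: frozenset    # criteria reported in interview or questionnaire


def evidence_label(case: Case) -> str:
    """Hypothetical labels; only 'limited evidence' is taken from the text."""
    if len(case.documented) >= REQUIRED:
        return "affected"
    if len(case.documented | case.reported) >= REQUIRED:
        return "affected, limited evidence"
    return "not classifiable"


def select_cases(cases, min_documented: int = REQUIRED):
    """Each study chooses its own stringency through min_documented."""
    return [c for c in cases if len(c.documented) >= min_documented]


cases = [
    Case("100001", frozenset({"arthritis", "renal", "ana", "malar_rash"}), frozenset()),
    Case("100002", frozenset({"ana", "arthritis"}), frozenset({"malar_rash", "oral_ulcers"})),
]
print([evidence_label(c) for c in cases])  # ['affected', 'affected, limited evidence']
```

A strict study would keep only the first case, while a more inclusive one could lower `min_documented` and accept the second.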
Information about race, ethnicity or ancestry is collected using the revised Office of Management and Budget (OMB) standards, as adopted by the NIH in 2001 for studies recruiting minorities. These include two ethnic categories, Hispanic or Latino and Not Hispanic or Latino, and five racial categories: American Indian or Alaska Native, Asian, Black or African-American, Native Hawaiian or Other Pacific Islander, and White [NOT-OD-01-053]. Our strategy has been to enrol participants based on self-reported race and ethnicity, complemented by questions about their ancestors’ countries of origin for the previous two generations. Reported relationships within a family can be imprecise, so a pedigree diagram is drawn for each proband, cross-referenced with the information obtained from other participating family members and then confirmed through genotyping.
Our genotyping studies have included a set of ancestry informative markers (AIMs) that are subjected to principal component analysis (PCA) to identify misidentified ancestry, admixture, and individuals whose unusual combination of ancestral backgrounds makes them outliers for population-specific studies. Genotyping also allows relatedness within families and sex assignment to be verified. This level of quality control has led to the exclusion of ~6% of participants after genotyping, but it ensures that the best possible subjects and subject materials are provided for users’ experiments.
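The sex-assignment check mentioned above is commonly based on X-chromosome heterozygosity, because males carry a single X and should appear homozygous at non-pseudoautosomal X SNPs. The sketch below illustrates that general technique; it is not the PCA step, and the genotype encoding and the 0.05/0.15 thresholds are assumptions, not LFRR settings.

```python
def x_heterozygosity(genotypes):
    """Fraction of called X-chromosome genotypes that are heterozygous.

    Genotypes are two-letter strings such as "AG"; "00" marks a missing call.
    Pseudoautosomal SNPs are assumed to have been removed beforehand.
    """
    called = [g for g in genotypes if g != "00"]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g[0] != g[1] for g in called) / len(called)


def sex_check(reported_sex, genotypes, male_max=0.05, female_min=0.15):
    """Return 'ok', 'mismatch' or 'review' (heterozygosity between the thresholds)."""
    het = x_heterozygosity(genotypes)
    if het <= male_max:
        inferred = "male"
    elif het >= female_min:
        inferred = "female"
    else:
        return "review"
    return "ok" if inferred == reported_sex else "mismatch"


print(sex_check("male", ["AA", "GG", "CC", "TT", "00"]))  # ok
print(sex_check("male", ["AG", "AA", "CT", "GG", "TT"]))  # mismatch
```

A reported male flagged as a mismatch is not necessarily a sample error: men with a 47,XXY karyotype carry two X chromosomes and can show female-like heterozygosity, which a karyotype can resolve.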
Recruitment of ethnic minorities with SLE is a priority of the LFRR, with a continuing goal of 40% of participants being of non-European ancestry. In early 2010, 56% of the affected individuals in the collection were from ethnic minorities (Fig. 2). The other two focus subgroups are multiplex pedigrees (27%) and affected males (11%) (Fig. 1 and supplementary figure available at Rheumatology Online). The availability of these difficult-to-obtain subjects makes the LFRR a unique resource.
After almost three decades, lupus genetics is nearing mid-life, at least with respect to the discovery of genetic effects. In 2010, 37 genes are known to contribute to SLE, mainly in populations of European and Asian ancestry. We expect another 50 genes to become established through the study of other samples or further fine-mapping. Because these considerations do not include the genes that will be found by careful exploration of other ancestries (especially African and American-Indian), we predict that well over 100 genes will be convincingly associated with SLE within another decade. Each of these genes will contribute to the mechanisms causing lupus in its own particular way.
Additional genes, especially those with low-frequency risk alleles (allele frequency ≤0.05) or recessive inheritance, represent an untapped source of genetic information that remains to be explored with new genotyping and next-generation DNA-sequencing technologies. Advances in this field may show that even the estimate of 100 genes is conservative.
Recurrent themes among the genes discovered will direct biologic work on mechanisms, holding the promise of critical progress in basic understanding. Two examples are the role of complement in age-related macular degeneration (366 publications since the GWAS-based discovery of the association with complement Factor H in 2005) and autophagy in inflammatory bowel disease (83 papers published since GWASs revealed the role of autophagy in the pathogenesis of Crohn’s disease).
The scientific goals that we envision for the LFRR are clear and diverse. We are embarking on fine-mapping of 35 of the 37 established genes associated with SLE. At the same time, we are performing GWASs in cases and controls of American-Indian, Asian and African ancestries with the goal of dissecting ancestry-specific effects and of using ancestry to help identify the polymorphisms responsible for disease risk.
Next-generation sequencing is expected to identify rare variants associated with lupus that are missed under the common-disease common-variant model underlying GWASs. We expect a return to family studies for the identification of recessive risk variants. Experiments are also under way to understand the role of epigenetic mechanisms, such as DNA methylation and microRNA dysregulation, in SLE pathogenesis, and we will make specific excursions into the world of genomic rearrangements, such as copy number variants (CNVs), inversions and indels, in an effort to be comprehensive in the search for causal variants.
What started as a small lupus study with a staff of three has burgeoned into a multi-institutional, worldwide collaborative team of scientists dedicated to the shared goal of understanding the genetic causes of SLE in order to discover better diagnostics and develop safer and more effective treatments. In 2010, the LFRR remains the largest collection of lupus multiplex and simplex pedigrees in the world and continues to offer its biological and clinical data to scientists whose research will further contribute to unravelling the lupus puzzle.
Supplementary data are available at Rheumatology Online.
The authors thank additional contributors to the LFRR effort: Sharon Johnson, Summer Frank, Amy Butler, Barbara Leatherwood, Patti Grounds, Jacey Bush, Dominique Williams, Adriana Rojas-Villarraga, Lina Amezquita, Ryan Parker, Brandon Scharrer, Tamiko Cabatic, Lauren Evans, Michelle Calvo, Nicole Weber, Kay Davis, Sarah Dawson, Kurt Downing and Neeraj Asundi. We thank the patients and all the other referring health care providers.
Funding: This work was supported by the National Institutes of Health, mainly from the National Institute of Arthritis, Musculoskeletal and Skin Diseases (N01AR62277) with important contributions from additional grants from the National Institutes of Health (R37AI24717, R01AR42460, P20RR020743, R01AR053734, P01AR049084, P20AR046669, RC1AR058554 and R01AR043274). G.S.G. and D.L.K. were supported by the South Carolina Clinical & Translational Research Institute, Medical University of South Carolina’s Clinical and Translational Science Award (CTSA) and National Institutes of Health/National Center for Research Resources (UL1RR029882). L.R.E. was funded by the Board of Regents of the State of Louisiana.
Disclosure statement: The authors have declared no conflicts of interest.