Diabetes is the leading cause of treated ESRD, accounting for almost half of the new cases each year (1–3). Among European Americans with type 1 diabetes, approximately one in three develops severe nephropathy that leads to ESRD (4–6). Evidence that genetic susceptibility plays an important role in diabetic nephropathy in type 1 diabetes first was presented more than a decade ago by Seaquist et al. (7) and Borch-Johnsen (8), and subsequent studies by researchers at the Joslin Diabetes Center (9) and The Diabetes Control and Complications Trial Research Group (10) further characterized the nature of the genetic effect.
Despite the strong evidence for genetic susceptibility factors, success in identifying the responsible genetic variants has been limited by the modest data collections that individual research groups have been able to assemble. The Genetics of Kidneys in Diabetes (GoKinD) study, an initiative supported by the Juvenile Diabetes Research Foundation (JDRF) and by the National Institute of Diabetes and Digestive and Kidney Diseases and the Centers for Disease Control and Prevention, was conceived to address this bottleneck by assembling a large DNA collection that is suitable for genetic association studies of nephropathy in type 1 diabetes.
The resulting collection includes nearly 1900 individuals with long-term (10+ yr) type 1 diabetes, half with nephropathy (943 case patients) and half without (946 control subjects). The set of case patients includes two subgroups: 328 patients with persistent proteinuria and 615 with ESRD. The set of control subjects consists only of individuals with normoalbuminuria despite 15 yr of type 1 diabetes. Both sets can be partitioned into two subsets: Those with neither parent enrolled (singletons) and those with both parents enrolled (trios). The totals as of November 2005 included 671 case singletons, 272 case trios, 623 control singletons, and 323 control trios.
The concept of using family trios to detect genetic association was developed more than a decade ago by various researchers who were wary of implicating a genetic variant simply because it happens to occur with greater frequency in a subset of the study participants who also have a relatively high occurrence of disease. To illustrate, consider a study of osteoporosis in individuals of European descent. If, in general, osteoporosis is more common in those of northern European descent than in those of southern European descent, then any genetic variant that is more common in the former will tend to exhibit association in a case-control analysis, whether or not it influences the disease. The gold standard that has emerged for addressing such population stratification is the transmission/disequilibrium test (TDT) (11). The TDT procedure evaluates case trios by comparing the alleles that heterozygous parents transmit to their affected child with the alleles that they do not transmit, so that a variant that is merely more common in a high-risk subpopulation is not implicated. An excellent review of the TDT has been written by two of the pioneers of the field, Ewens and Spielman (12). Recently, Scott and Rogus (13) examined the utility of control trios and found that they are useful in special situations, such as when a disease is highly prevalent or when certain types of gene-environment interaction exist.
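As a rough illustration of the TDT, the sketch below computes the McNemar-type statistic from counts of transmitted and untransmitted alleles among heterozygous parents of affected children. The function name `tdt_statistic` and the counts in the usage lines are hypothetical and are not taken from the GoKinD data; this is a minimal sketch of the standard biallelic form of the test, not the analysis used in this study.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Return (chi-square, p-value) for a biallelic TDT with 1 df.

    transmitted:   number of heterozygous parents who passed the candidate
                   allele to their affected child
    untransmitted: number of heterozygous parents who passed the other allele
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous (informative) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only
chi2, p_value = tdt_statistic(60, 40)
print(chi2, round(p_value, 3))  # chi2 = 4.0, p is about 0.046
```

Because only heterozygous parents contribute, and each parent serves as his or her own matched control, differences in allele frequency between subpopulations do not by themselves inflate the statistic.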
GoKinD uses both case trios and control trios as well as a set of unrelated case and control singletons. The advantage of including singletons is that, in addition to being much easier to identify and ascertain, they offer exceptionally high power to detect genetic association. The tradeoff, of course, is that they are prone to false-positive results if population stratification exists.
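To give a sense of why unrelated singletons are attractive, the following sketch approximates the power of a simple allelic case-control comparison using a normal approximation to a two-sided two-proportion test. The singleton counts (671 case patients, 623 control subjects) come from the text above; the control allele frequency, the allelic odds ratio, the significance level, and the function name `allelic_power` are assumptions chosen purely for illustration, and this is not the power calculation reported for the collection.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    Each individual contributes two alleles. The case allele frequency is
    derived from the assumed control frequency and allelic odds ratio.
    The small contribution of the opposite rejection tail is ignored.
    """
    case_alleles, control_alleles = 2 * n_cases, 2 * n_controls
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    se = (case_freq * (1 - case_freq) / case_alleles
          + control_freq * (1 - control_freq) / control_alleles) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(case_freq - control_freq) / se - z_crit)


# Assumed values for illustration: control allele frequency 0.20, odds ratio 1.5
print(allelic_power(671, 623, control_freq=0.20, odds_ratio=1.5))
```

The same function can be rerun with other assumed allele frequencies and effect sizes to see how quickly power falls off for weaker effects; it does not account for population stratification, which is the tradeoff noted above.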
The GoKinD collection of DNA and clinical documentation of case patients and control subjects is available to the research community through an application process that is accessible on the GoKinD web site (https://www.gokind.org/access). Nonrenewable samples also will become available at a later date. Broad distribution of the collection is intended to spark creativity with regard to both the genetic variants studied and the analytic approaches used. These approaches are not limited to those that require the whole collection. The large collection also may be used as a sampling frame for selecting narrowly defined groups for testing very specific hypotheses. Here, we summarize the clinical characteristics of the study groups and provide detailed power calculations for each of the collection’s design components. Finally, we discuss some analytic challenges that await potential users of the collection.