There has been great promise and expectation for the use of DNA technology in all areas of diagnostic pathology, but this has not yet been widely realised. There are, however, some notable successes, such as forensic DNA profiling, the identification of bacteria and viruses, human leucocyte antigen typing for organ donation, the diagnosis of thalassaemias and haematological cancers, risk prediction for familial breast and bowel cancer, and the diagnosis of many monogenic diseases. DNA analysis has not yet reached the stage where it is widely used for the clinical diagnosis of common complex diseases. Considerable progress has been made in tumour tissue genetics for the characterisation and management of tumour subtypes.1 However, the technology used for this type of analysis requires further refinement, such as standardisation of methodology and validation of techniques, before it can be widely used clinically. Pharmacogenomics has held great promise for a number of years but has yet to deliver, apart from assays such as thiopurine methyltransferase genotyping and cholinesterase genotyping for scoline apnoea. It is, however, widely expected that significant developments in this area will be achieved. In the United States, the Food and Drug Administration has identified as a priority area the synchronous co-development and approval of therapeutics (mainly oncology drugs) in conjunction with pharmacogenomics, and a number of these applications are in the regulatory pipeline.
The purpose of this review is twofold. First, it demonstrates how molecular biology has improved our understanding of three genetic diseases: Huntington’s disease, cystic fibrosis and hereditary haemochromatosis. These diseases were chosen because they differ in what carrying the disease-causing mutations means: in Huntington’s disease the mutation will cause clinical disease, in cystic fibrosis the mutations affect the severity of the disease, and in hereditary haemochromatosis they affect whether the disease develops at all. Second, it presents information on how genetics may be used in the future to determine susceptibility to common complex diseases.
The human genome consists of 3 billion base pairs and contains approximately 20,000 to 25,000 genes. There are single nucleotide polymorphisms (SNPs) on average every 1,250 base pairs and at least 3 million SNPs in the genome.2
Monogenic diseases have traditionally been thought of as being caused by a single mutation in a critical part of one gene. In recent years, however, it has become clear that this view is simplistic: there can be different mutations in the same gene, and environmental factors and other genes can influence both the expression of the disease and the penetrance of the mutated gene. Three diseases are discussed below to highlight these differences among monogenic diseases.
Huntington’s disease (HD) is an autosomal dominant neurological disease in which the patient suffers from progressive chorea, muscle rigidity, dementia and seizures. Death usually occurs about 17 years after the first symptoms appear. The age of onset is highly variable, having been reported from the first decade of life to over 60 years, but onset usually occurs in the late 40s. The prevalence of the disease is about 5 in 100,000. The HD gene, located on chromosome 4p16.3, was identified in 1993.3 The 210 kb gene encodes a protein of 3,140 amino acids called huntingtin. HD is caused primarily by misfolding of huntingtin, which results from an expanded polyglutamine tract encoded by an expanded CAG triplet repeat in exon 1.4 N-terminal fragments of the mutant huntingtin protein accumulate in neurons of the striatum and cause their selective destruction. HD is diagnosed by determining the CAG repeat size using either gel or capillary electrophoresis. The interpretation of the repeat size is shown in the Table.
Subjects with ≤26 repeats will not develop the disease. Those with CAG repeats in the mutable normal allele range of 27–35 will not develop the disease, but their offspring are at risk.5 During the development of sperm and oocytes the CAG repeat can expand, more so in sperm than in oocytes. This expansion can take the repeat size in the offspring to >35 and cause clinical disease. Those with repeats in the range 36–39 may or may not develop the disease, and those with ≥40 repeats will develop HD. However, it is not possible to predict when the disease will begin. In individuals with a CAG repeat size of 40, the age of onset can vary from 20 to 70 years,4–6 although larger repeat sizes correlate with earlier onset. The CAG repeat length has been reported to account for 65–71% of the variation in the age of onset and environmental factors for 11–19%, with the remaining variation due to as yet unknown modifier genes.6–8 HD gene testing is used for predictive testing in at-risk individuals, diagnostic testing in symptomatic patients, and prenatal testing using DNA extracted from chorionic villus samples.
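The repeat-size categories above amount to a simple lookup. The sketch below encodes only the cut-offs stated in the text; the function name and category labels are hypothetical, and it ignores laboratory sizing error near category boundaries, so it illustrates the interpretation scheme rather than serving as a reporting tool.

```python
def classify_hd_cag(repeats: int) -> str:
    """Map an HD CAG repeat count to the categories described in the text.

    Hypothetical helper for illustration only; cut-offs follow the text
    (<=26, 27-35, 36-39, >=40) and ignore laboratory sizing uncertainty.
    """
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats <= 26:
        return "normal"
    if repeats <= 35:
        # Will not develop HD, but the allele may expand in the offspring.
        return "mutable normal"
    if repeats <= 39:
        # May or may not develop HD (reduced penetrance).
        return "reduced penetrance"
    # Will develop HD; the age at onset cannot be predicted from this alone.
    return "full penetrance"


if __name__ == "__main__":
    for n in (20, 30, 37, 45):
        print(n, classify_hd_cag(n))
```

The boundaries are inclusive exactly as written in the text (for example, 35 repeats is still "mutable normal" and 40 is the first fully penetrant size).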
It is essential that HD predictive testing is part of a comprehensive counselling program that includes a geneticist, psychologist, psychiatrist and social worker. Great care is taken to minimise the potential harm to an individual who learns that they will develop HD in the future, a condition for which there is no cure.9 International best practice recommends five counselling sessions, with the test result delivered at the fourth session, and follow-up contact with both high- and low-risk subjects at 2, 6, 12, 18 and 24 months after the result is delivered.10 A number of studies have examined the adverse effects of predictive testing.11,12 One of the larger studies reported adverse clinical events in a cohort of 134 low-risk and 68 high-risk subjects.11 Clinical depression was significantly more common in the high-risk group (19.0%) than in the low-risk group (9.1%). However, psychiatric hospitalisation was more common in the low-risk group (4.5%) than in the high-risk group (0%), and, most strikingly, there were three suicide attempts in the low-risk group and none in the high-risk group. It is therefore important that both high- and low-risk subjects receive the same level of support and follow-up.
There are a number of ethical considerations specific to genetic testing compared with other laboratory tests. These include testing a person whose at-risk parent does not wish to know their own status (a positive result in the child may effectively reveal the parent's status), and how knowing one's risk status may affect relationships with other family members. Genetic discrimination can affect an individual's ability to obtain life insurance and employment, and can result in social stigma. Because of these issues, each person undergoing predictive testing must be fully informed and must decide to have genetic testing of their own free will.
Cystic fibrosis (CF) was first recognised as a separate disease entity in 1938 from autopsy studies of malnourished infants.13 In 1989 the gene was identified by linkage studies, without any prior knowledge of the structure of the CF protein.14,15 The cystic fibrosis transmembrane conductance regulator (CFTR) gene is 180 kb in size and encodes a protein of 1,480 amino acids. Over 1,300 disease-associated mutations have been described; the most common is F508del, which accounts for approximately 70% of CF chromosomes. From the cDNA sequence, the CFTR protein has been postulated to contain 12 α-helices that span the apical plasma membrane and form a chloride channel.16 Gating of the channel is controlled by two nucleotide-binding domains and a regulatory domain that must be phosphorylated by a kinase. Sweat ducts in patients with CF, unlike those in people without the disease, are unable to reabsorb chloride effectively before the sweat emerges on the surface of the skin. A major pathway for chloride absorption is through CFTR, situated in the luminal plasma membrane of the cells lining the duct. Diminished chloride reabsorption in the setting of continued sodium uptake makes the lumen more negatively charged, raising the transepithelial potential difference across the wall of the sweat duct. This potential difference in turn opposes further sodium absorption, so total sodium chloride reabsorption is markedly decreased and the sweat has an increased salt content. The carrier frequency is 1 in 25 and the prevalence 1 in 2,500 in the Australian population.17 Routine testing for the 25 most common CFTR mutations identifies about 90% of mutations in Caucasians.
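The carrier frequency and prevalence quoted above are linked by the Hardy–Weinberg relationship. Assuming Hardy–Weinberg equilibrium and pooling all CFTR mutations into a single disease allele of frequency q, affected individuals occur at frequency q² and carriers at 2pq, where p = 1 − q. The sketch below (the function name is hypothetical) recovers the carrier frequency from the prevalence.

```python
import math


def hardy_weinberg_from_prevalence(prevalence: float) -> tuple[float, float]:
    """Return (disease allele frequency q, carrier frequency 2pq).

    Assumes Hardy-Weinberg equilibrium for a recessive disease and treats all
    disease-causing alleles as one pooled allele.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be between 0 and 1")
    q = math.sqrt(prevalence)  # affected homozygotes occur at frequency q^2
    p = 1 - q
    return q, 2 * p * q


q, carrier = hardy_weinberg_from_prevalence(1 / 2500)
print(f"q = {q:.3f}, carrier frequency = {carrier:.4f} (about 1 in {1 / carrier:.0f})")
```

With a prevalence of 1 in 2,500 this gives q = 0.02 and a carrier frequency of 0.0392, about 1 in 26, consistent with the quoted figure of 1 in 25.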
The genotype/phenotype correlation in CF is of particular interest. In classical CF, individuals have a complete loss of CFTR function.16,18 Severe mutations (including F508del) account for about 92% of mutated chromosomes in this category. These patients have severe pancreatic exocrine insufficiency early in life, chronic obstructive pulmonary disease and abnormal sweat electrolyte concentrations, and the males are infertile because of obstructive azoospermia due to congenital bilateral absence of the vas deferens (CBAVD). In this condition, the transport of spermatozoa from the testis and epididymis to the outer genital tract is blocked, resulting in azoospermia.
In non-classical CF, individuals have at least one organ system with a CF phenotype and normal or borderline sweat chloride values. They are usually pancreatic sufficient, have at least one mild mutation and are diagnosed after childhood. Males are occasionally fertile. Another CF phenotype is isolated CBAVD, in which individuals have no lung or pancreatic disease; it is associated with mild mutations in CFTR. Approximately 10% of obstructive azoospermia is congenital and may be due to mutations in the CFTR gene. It is caused either by compound heterozygosity with at least one ‘mild’ mutation (10%)19 or, very occasionally, by homozygosity for two ‘mild’ mutations. CFTR exon 9 can be skipped in humans because of unusual, suboptimal 5’ splice sites. A polymorphic tract of thymidines in CFTR intron 8 (IVS8-Tn) influences the splicing of exon 9 and the level of CFTR protein production, with the 5T allele associated with the most exon 9 skipping. Subjects with these changes produce two types of CFTR mRNA, with and without exon 9. If the 5T allele is counted as a mutation, up to 60% of CBAVD males are found on routine screening to carry one CFTR mutation, presumably with a rare second mutation that is not included in the routine screen. Most individuals in this group have normal or borderline sweat chloride values, and a significant proportion have mild sinopulmonary disease and nasal ion transport abnormalities. Patients with genotypes resulting in less than about 1% CFTR activity have classic CF, those with less than about 5% CFTR function are pancreatic sufficient, while those with approximately 5–10% of CFTR function have CBAVD alone and may also be at risk of monosymptomatic diseases such as pancreatitis.18
The clinical expression of CF is strongly influenced by environmental factors, by the particular CFTR mutations present and by genetic factors other than CFTR mutations.20–22 Depending on where the mutations lie in the CFTR gene, the phenotype can range from classic CF with early mortality to male infertility alone.
Hereditary haemochromatosis (HH) is an autosomal recessive iron-overload disorder with gradual and highly variable accumulation of iron in many organs.23 The most common form of HH begins in midlife and, if untreated, presents with liver disease (ranging from elevated alanine aminotransferase to cirrhosis and hepatocellular carcinoma), endocrine disorders (diabetes mellitus, hypogonadotropic hypogonadism, impotence and hypothyroidism), cardiac arrhythmias, heart failure and destructive arthritis. Fortunately, if the condition is detected before significant damage to these organs has occurred, it can be treated very effectively by therapeutic phlebotomy and the subject can expect a normal life span.
It had been suspected since 1976 that a gene responsible for HH lay on chromosome 6, but the gene was not isolated until 1996, by positional cloning without any knowledge of the protein's function.24 In most cases of HH, a single-base mutation replaces cysteine with tyrosine at position 282 of the HFE protein (C282Y). Another mutation replaces histidine with aspartic acid at position 63 (H63D). People who develop HH can be homozygous for C282Y or compound heterozygous for C282Y and H63D. Data from a study of the Busselton population in Western Australia (n = 3,010) showed that the prevalence of C282Y homozygosity was 0.5%, compound heterozygosity (C282Y/H63D) 2.2% and heterozygosity for C282Y/wild-type 12.2%, while the majority of subjects (85.2%) were wild-type/wild-type.25 In contrast, in a cohort of patients with haemochromatosis, the prevalence of C282Y homozygosity was 81%, compound heterozygosity 8% and C282Y/wild-type heterozygosity 4%, and 7% had no detectable mutations.26 HH in the last two groups is due to other, as yet unidentified, mutations.
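Genotype frequencies such as the Busselton figures can be converted to an allele frequency by allele counting. The sketch below assumes that "wild-type" denotes the absence of C282Y and does not count H63D alleles; it estimates the C282Y allele frequency and compares the homozygote frequency expected under Hardy–Weinberg equilibrium with the observed 0.5%. Variable and function names are illustrative only.

```python
# Busselton genotype frequencies (%) as quoted in the text; they sum to 100.1%
# because of rounding, so the calculation below normalises by the total.
genotype_pct = {
    "C282Y/C282Y": 0.5,
    "C282Y/H63D": 2.2,
    "C282Y/wild-type": 12.2,
    "wild-type/wild-type": 85.2,
}

# Copies of the C282Y allele in each genotype class (H63D is not counted).
c282y_copies = {
    "C282Y/C282Y": 2,
    "C282Y/H63D": 1,
    "C282Y/wild-type": 1,
    "wild-type/wild-type": 0,
}


def allele_frequency(pct: dict, copies: dict) -> float:
    """Allele frequency by allele counting: each person carries two alleles."""
    total = sum(pct.values())
    allele_count = sum(pct[g] * copies[g] for g in pct)
    return allele_count / (2 * total)


q = allele_frequency(genotype_pct, c282y_copies)
expected_homozygotes = q ** 2  # Hardy-Weinberg expectation for C282Y/C282Y
print(f"C282Y allele frequency: {q:.3f}")
print(f"expected homozygotes: {expected_homozygotes:.2%} (observed 0.5%)")
```

This gives an allele frequency of about 0.077 and an expected homozygote frequency of about 0.6%, close to the observed 0.5%.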
The frequency of C282Y carriers is 1 in 8 among people of northern European descent, and 1 in 200 Caucasians are homozygous for this mutation.25 The mechanism of the disease is not well understood. However, current evidence suggests that hepcidin is a key molecule whose activity controls the rate of iron influx into the plasma from macrophages and duodenal enterocytes.23,27 When plasma iron levels are high, hepcidin synthesis increases, diminishing the release of iron from enterocytes and macrophages, possibly through interaction with iron-export proteins such as ferroportin. When plasma iron levels drop, hepcidin synthesis is downregulated, allowing these cells to release more iron. The stimulus and mechanisms underlying the modulation of hepcidin synthesis are currently unknown, but the HFE, transferrin receptor 2 (TfR2) and hemojuvelin (HJV) proteins are all likely to be required for hepcidin activation in response to circulating iron levels. Loss of any one of these hepcidin regulators leads to unrestricted release of iron from macrophages and enterocytes, followed by progressive expansion of the plasma iron pool, tissue iron overload and organ damage. The resulting iron overload is relatively mild in HFE- and TfR2-haemochromatosis and massive in HJV-haemochromatosis.
One of the most interesting findings to emerge since the discovery of the HFE gene is the variable clinical penetrance of the disease in C282Y homozygous subjects. Studies that used hepatic fibrosis and cirrhosis as end points for the development of HH have shown that penetrance varies: a Norwegian study found that 10% of C282Y homozygous subjects developed HH, an Australian study found 17%, and a study from Utah found that 29% of males over 40 years and 11% of post-menopausal females developed HH.26,28
C282Y homozygosity is therefore not diagnostic of clinical haemochromatosis; the mutation only predisposes to iron loading. It is nevertheless important, because C282Y homozygosity identifies subjects who are at risk of developing HH. From the few available reports, clinical penetrance is between 10% and 30% in C282Y homozygotes.26,28 In my experience, genetic testing for HH has made liver biopsy much less commonly required for the management of the disease.
In addition to mutations in the HFE gene, which are the most common cause of HH, mutations in the genes encoding HJV, hepcidin and TfR2 have all been identified as causes of iron overload.23,29–31 Mutations in these different genes are associated with differences in the age of onset of iron overload, the severity of the disease and the role that environmental factors play in its development.
The discovery of genes that cause disease has changed our understanding of the basic biochemistry of disease processes. This is a rapidly evolving area and may lead to new treatments for a number of genetic diseases. The expression of disease varies with clinical penetrance, environmental factors and modifier genes. For example, CAG repeat expansions of 40 or more, which cause HD, are fully penetrant, and subjects will develop the disease if they live long enough.
CF is caused by many different mutations in a single large gene, and the severity of the disease depends on where the mutation is located in the gene, on environmental factors and on modifier genes. In contrast, HH can be caused by several mutations in the HFE gene or by mutations in other genes involved in iron metabolism that produce a similar phenotype. The most striking feature of HH is the variable penetrance of the disease in C282Y homozygotes. Our understanding of these differences will continue to develop as more becomes known about the genetics of these diseases through well-designed clinical research. The requirements for genetic counselling also differ between diseases. The intensive counselling provided for HD cannot be offered for every genetic condition; indeed, the more common genetic diseases will be managed using a totally different paradigm. Testing for HH is now commonplace, and it is not possible to provide comprehensive genetic counselling to all subjects beyond that offered by the attending physician.
Common complex diseases such as diabetes, cardiovascular disease (CVD), cancer, asthma, Alzheimer’s disease, arthritis and osteoporosis are caused by a complex interaction between environmental exposures and genetic predisposition over many years. Functional genetic variants, such as SNPs, may increase or reduce the expression of a protein or enzyme, or change its amino acid sequence and therefore its properties. An excellent example of a gene-environment interaction is the association of the common apolipoprotein E4 (APOE4) allele with the risk of cognitive decline and Alzheimer’s disease. Studies suggest that the risk of cognitive decline is particularly high in APOE4 carriers who have untreated hypertension [odds ratio (OR) = 11.0 (95% confidence interval 1.4–84)] compared with subjects who have neither hypertension nor the APOE4 allele.32 These results may provide greater motivation for blood pressure control among people at elevated risk of dementia.
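An odds ratio and its confidence interval can be computed from a 2×2 table, with "exposed" meaning, for example, APOE4 carriers with untreated hypertension and "unexposed" those with neither. The sketch below uses Woolf's log-scale method on purely hypothetical counts (not the data of the cited study, whose estimate may also have been adjusted for other variables) to show why an interval as wide as 1.4–84 usually signals small numbers in at least one cell: the same OR of 11 is estimated far more precisely when every cell is ten times larger.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval (95% by default).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; zero cells need another method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: a small study and one with every cell ten times larger.
small = odds_ratio_ci(11, 2, 10, 20)
large = odds_ratio_ci(110, 20, 100, 200)
for label, (or_, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

Both tables give an OR of 11.0, but the interval spans roughly 2 to 59 for the small table and roughly 6 to 19 for the large one.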
Each myocardial infarction (MI) is unique, owing to the different blend of risk factors and genes in each individual, and MI is an example of a complex disease that develops on a background of genes interacting with environmental factors. In Australia, CVD is a major killer, causing 38% of all deaths. Our current ability to predict who will develop CVD is poor, and the first sign of coronary heart disease (CHD) is usually the clinical presentation with angina, shortness of breath or MI. Fatty streaks and plaques develop in the coronary arteries over many decades before the clinical presentation, and if a plaque ruptures in an important part of the vascular tree the result is often fatal. The Holy Grail of CVD genetic research is to identify a number of key genetic variants that increase an individual’s risk of premature heart disease. Metabolic pathways associated with the common risk factors, e.g. lipids, blood pressure, obesity, smoking and diabetes, would most likely be involved. Once a specific pathway is identified it could be targeted with greater vigour, either through diet and lifestyle changes or through specific drug therapy, to delay the presentation of the disease.
A possible way forward is to look at SNPs in combination with conventional cardiovascular risk factors (CRFs). It has been assumed that the effects of genetic variants would be additive, so that each would account for a small percentage increase in the risk of CVD. However, this may not be the case: sequence variations in multiple genes are expected to act in parallel on diverse biochemical pathways, with the total effect expressed in a highly non-linear fashion. A very large number of rare genotypes would be required to explain 50% of a common disease in the population, even if the individual risk ratios were large [relative risk (RR) = 10–20].33 However, it has been estimated that only ~20 genes are needed to explain 50% of the burden of a disease if the predisposing genotypes are common (≥25%), even if the individual risk ratios are small (RR = 1.2–1.5). Therefore, identifying a limited number of disease susceptibility genes with common variants could explain a major proportion of common complex diseases in the population, and understanding the role of common genetic variants should provide the most important information on the risk factors that predispose to complex human diseases.
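The "~20 genes" argument can be explored with the population attributable fraction (PAF). The sketch below uses Levin's formula, PAF = p(RR − 1)/(1 + p(RR − 1)), for a risk genotype with population prevalence p, and combines several genotypes as 1 − Π(1 − PAF_i), which assumes independent, multiplicative effects. This is a simplifying assumption for illustration and not necessarily the model used in the cited work; the function names are hypothetical.

```python
import math


def levin_paf(prevalence: float, rr: float) -> float:
    """Population attributable fraction of one risk genotype (Levin's formula).

    prevalence: proportion of the population carrying the risk genotype.
    rr: relative risk of disease in carriers versus non-carriers.
    """
    excess = prevalence * (rr - 1)
    return excess / (1 + excess)


def genotypes_for_burden(prevalence: float, rr: float, target: float = 0.5) -> int:
    """Smallest number of independent risk genotypes, each with the same
    prevalence and RR, whose combined PAF 1 - prod(1 - PAF_i) reaches target."""
    paf = levin_paf(prevalence, rr)
    if paf <= 0:
        raise ValueError("rr must exceed 1 and prevalence must be positive")
    return math.ceil(math.log(1 - target) / math.log(1 - paf))


for rr in (1.2, 1.5):
    print(f"prevalence 25%, RR {rr}: PAF per genotype = {levin_paf(0.25, rr):.3f}, "
          f"genotypes for 50% of burden = {genotypes_for_burden(0.25, rr)}")
```

Under these assumptions, 15 common genotypes with RR 1.2, or 6 with RR 1.5, would account for half the disease burden, the same order of magnitude as the figure of about 20 quoted in the text; the exact number depends strongly on the model chosen.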
The major underlying cause of CVD is atherosclerosis. The multifactorial nature of atherosclerotic vascular disease is widely recognised, but defining its genetic basis has proved elusive. Genetic effects are relatively weak compared with strong environmental effects such as smoking and diet, which has resulted in low statistical power for identifying relevant genes.33 Assessment of global CVD risk using a composite score of blood pressure, cholesterol level and smoking habits enhances the statistical power of prediction, and is now recognised as a more logical basis than individual risk factors for stratifying risk and prescribing treatment. The metabolic syndrome, or diabetic cluster of risk factors (central obesity, glucose intolerance, insulin resistance, and disordered high-density lipoprotein (HDL) and triglyceride levels), is an important contributor to vascular risk. Defining the genomic basis of cardiovascular risk using candidate genes for these individual factors has proven difficult, and a more global approach to cardiovascular genotyping is needed.
Many studies have made it evident that no single highly penetrant gene causes CHD in the general population and that, as for all common diseases, the effect of risk alleles is expected to be modest.34 This reflects genetic heterogeneity and the important influence of environmental factors. For DNA-based tests to have a place in the assessment of genetic predisposition to CHD and to be of clinical value, they must provide information over and above that provided by conventional risk factors.
Although many candidate genes for CHD have been tested, the optimal set of risk genotypes has yet to be identified.35 Only a relatively modest risk is to be expected in association with any single genotype: published estimates36,37 are in the range 1.2–1.4, as reviewed by Talmud.38 Furthermore, given the multifactorial nature of CHD, lifestyle will set the operational context for genetic variants, so a genotype may be associated with a high CHD risk only on exposure to a certain environment. The best example of such context dependence is that of smoking and APOE genotype with respect to CHD.39 It has recently been suggested that only around 20 genes are usually needed to explain 50% of the burden of a disease in the population if the predisposing genotypes are common (>25%), even if the individual risk ratios are relatively small (RR = 1.2–1.5);40 this paper based its modelling on five genes involved in cancer risk. Using data on conventional risk factors and 14 SNPs in 12 candidate genes from the Northwick Park Heart Study II, it was demonstrated that CRFs and genotypes could be combined to improve risk prediction.41 In receiver operating characteristic (ROC) analysis, the area under the curve for the CRFs of age, triglyceride, cholesterol, systolic blood pressure and smoking was 0.66 (0.61–0.70). Combining CRFs and genotypes significantly improved discrimination (P <0.001). Including previously demonstrated interactions of smoking with lipoprotein lipase (LPL), interleukin-6 (IL-6) and platelet/endothelial cell adhesion molecule-1 (PECAM-1) genotypes increased the area under the ROC curve to 0.72 (0.68–0.76) (P <0.01 vs CRFs combined with genotypes). Using a small group of selected genotypes, this study demonstrated that CHD risk estimates incorporating CRFs and genotype interactions were more effective than risk estimates that used CRFs alone.
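The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation); 0.5 means no discrimination and 1.0 perfect discrimination. The sketch below computes it for two sets of entirely hypothetical scores, labelled as if from a CRF-only model and a CRF-plus-genotype model. It does not reproduce the cited analysis or its significance tests.

```python
def roc_auc(case_scores, control_scores):
    """AUC as P(case score > control score), ties counting one half.

    Pairwise (Mann-Whitney) formulation; O(n*m), adequate for small examples.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for 4 cases and 5 controls under two models.
crf_cases, crf_controls = [0.30, 0.25, 0.40, 0.20], [0.22, 0.15, 0.28, 0.10, 0.18]
combined_cases, combined_controls = [0.35, 0.33, 0.45, 0.24], [0.22, 0.14, 0.27, 0.10, 0.18]

print("CRFs only:", roc_auc(crf_cases, crf_controls))
print("CRFs + genotypes:", roc_auc(combined_cases, combined_controls))
```

In this toy example, 17 of the 20 case-control pairs are correctly ordered by the first model (AUC 0.85) and 19 of 20 by the second (AUC 0.95). Comparing AUCs from real models fitted to the same subjects needs a test for correlated ROC curves, which is not shown here.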
Because the gene mapping techniques that were successful for monogenic diseases have not worked well for complex diseases, other approaches are required. One approach is the study of candidate genes, selected because they are known to be part of metabolic pathways involved in the development of CVD, for example, genes involved in low-density lipoprotein (LDL) cholesterol or HDL cholesterol metabolism. Variations in the gene sequence can have a large effect on the function of the protein, e.g. LDL receptor defects causing familial hypercholesterolaemia, or variants of cholesteryl ester transfer protein causing reduced HDL levels. An alternative method is to identify new genes by high-throughput genetic association studies. To date, two genome-wide association studies for MI have been completed. One study assessed 92,788 SNPs in 13,738 genes and identified a locus on chromosome 6p21 that mapped to a 5-SNP haplotype of the lymphotoxin-α gene (LTA).42 In an extensive clinical assessment involving 1,133 cases and 1,878 controls, the investigators reported an OR of 1.8 for MI (P <0.0001). This gain-of-function LTA variant enhanced vascular adhesion molecule expression and is therefore considered to be pro-atherogenic and pro-inflammatory.
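At each SNP, an association study essentially compares allele counts in cases and controls. A minimal sketch of the allelic test, a Pearson chi-square on a 2×2 table of allele counts with one degree of freedom, is shown below with hypothetical counts. It assumes that alleles can be counted independently (which relies on Hardy–Weinberg equilibrium) and is not necessarily the analysis used in the cited studies; the function name is illustrative.

```python
import math


def allelic_chi_square(case_risk: int, case_other: int,
                       control_risk: int, control_other: int):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele table.

    Returns (allelic odds ratio, chi-square statistic, two-sided P value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return (a * d) / (b * c), chi2, p_value


# Hypothetical allele counts: 500 cases and 500 controls (1,000 alleles each).
or_, chi2, p = allelic_chi_square(300, 700, 200, 800)
print(f"allelic OR = {or_:.2f}, chi-square = {chi2:.1f}, P = {p:.1e}")
```

With these made-up counts the risk allele is carried on 30% of case chromosomes and 20% of control chromosomes, giving an allelic OR of about 1.7 and a chi-square of about 26.7.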
A second genome-wide association study, using 11,053 SNPs in 6,891 genes, identified variants in four genes: those for the cytoskeletal protein palladin (OR = 1.4), the tyrosine kinase ROS1 (OR = 1.75) and two G-protein-coupled receptors, TAS2R50 (OR = 1.6) and OR13G1 (OR = 1.40).43 In a separate study by the same group, a variant of VAMP8, a gene modulating platelet degranulation, had an OR of 1.75, and a variant of HNRPUL-1, which encodes a ribonucleoprotein, had an OR of 1.92.44
Further genome-wide SNP association studies for MI using state-of-the-art technology are expected to identify new pathways not previously suspected to be involved in the development of MI.
It has been known for some time that DNA is inherited in blocks, with recombination break points distributed across the genome.45 The International HapMap Consortium has taken advantage of this fact and used SNPs to identify these blocks.46 Each block of DNA, or haplotype, can be identified by a set of tagging SNPs, which can be used to indicate which genes, and which parts of specific genes, are important in influencing the risk of various complex diseases. It has been reported that, using the HapMap, only 250,000–350,000 tag SNPs are needed to characterise each individual's genome.47 The use of tag SNPs is an important development in identifying genes associated with complex diseases.
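Tag SNPs work because SNPs within a block are in linkage disequilibrium (LD), so genotyping one SNP predicts the alleles at others. A common measure of LD is r² = D²/(p_A(1 − p_A)p_B(1 − p_B)), where D = p_AB − p_A·p_B is computed from haplotype and allele frequencies. The sketch below computes D and r² from hypothetical phased haplotype counts for two biallelic SNPs; choosing tags typically means selecting SNPs whose r² with the untyped SNPs exceeds a threshold (0.8 is often used), a step not shown here.

```python
def linkage_disequilibrium(hap_counts: dict) -> tuple:
    """Return (D, r^2) for two biallelic SNPs from phased haplotype counts.

    Keys are 'AB', 'Ab', 'aB', 'ab': A/a are the alleles at SNP 1 and
    B/b the alleles at SNP 2. Missing keys are treated as zero counts.
    """
    n_AB = hap_counts.get("AB", 0)
    n_Ab = hap_counts.get("Ab", 0)
    n_aB = hap_counts.get("aB", 0)
    n_ab = hap_counts.get("ab", 0)
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n               # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n       # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n       # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("one SNP is monomorphic in this sample")
    return d, d * d / denom


# Hypothetical haplotype counts from 100 chromosomes.
print(linkage_disequilibrium({"AB": 60, "ab": 40}))  # SNPs perfectly correlated: r^2 close to 1
print(linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))  # independent SNPs: D = 0
```

When r² is close to 1, as in the first example, genotyping either SNP captures essentially all the information at the other, which is why one tag can stand in for many SNPs in a block.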
Another very important development in recent years is the establishment of large biobanks.48 These can be either disease-oriented or population-based. Disease-oriented biobanks, such as tumour banks, obtain material and information from people after they have developed a specific disease. Population-based biobanks store material and information from large numbers of people recruited from the general population to study the effects of lifestyle, environment and genes on the development of disease. Of the biobanks currently being assembled, the smallest plan to recruit a minimum of 100,000 people, with conventional epidemiological data and blood for genetic analysis.48 The largest proposed study aims to recruit approximately 2,000,000 subjects. The United Kingdom Biobank, for example, will recruit 500,000 people aged 40–69 years from across Britain.49 A minimum of 5,000 cases, and ideally 10,000, is required to provide 80% power to detect a moderately sized interaction effect (e.g. an interaction OR of around 1.5–2.0 between two binary exposures, each with a population prevalence of between 10% and 25%).
The past decade has seen great advances in human genomics, with considerable improvement in technical capacity and genomic knowledge. However, the identification of genes associated with complex diseases, by either linkage or association studies, has been disappointing. In addition, our statistical capacity and our ability to interpret and process complex data still lag behind our capacity to generate very large amounts of genomic data.
Most complex diseases involve multiple disease-predisposing genes of modest individual effect, gene-gene interactions, gene-environment interactions and inter-population heterogeneity of both genetic and environmental determinants of disease. All positive results need to be validated in a number of different population cohorts. We now have much greater insight into the difficulty of the task, and there have been a number of successes, so we can be cautiously optimistic about the future.
The hope is that genetic research will reveal the detailed biochemical mechanisms of disease processes, so that in the future an individual could be screened for a complex disease well before clinical signs appear, and treatment could be instituted in the form of lifestyle advice or the most appropriate therapy. Genotyping may become part of the routine management of complex human diseases. The genetic approach to complex diseases holds enormous potential, and the race for genes is well and truly under way, fast moving and highly productive.
Competing Interests: None declared.