
★ Frontiers in cardiovascular medicine ★

Personalized medicine: hope or hype?

Keyan Salari, Hugh Watkins, Euan A. Ashley
DOI: http://dx.doi.org/10.1093/eurheartj/ehs112 1564-1570 First published online: 2 June 2012

Abstract

Medicine has always been personalized. For years, physicians have incorporated environmental, behavioural, and genetic factors that affect disease and drug response into patient management decisions. However, until recently, the ‘genetic’ data took the form of family history and self-reported race/ethnicity. As genome sequencing declines in cost, the availability of specific genomic information will no longer be limiting. Rather, our ability to parse these data and our decision whether to use it will become primary. As our understanding of genetic association with drug responses and diseases continues to improve, clinically useful genetic tests may emerge to improve upon our previous methods of assessing genetic risks. Indeed, genetic tests for monogenic disorders have already proven useful. Such changes may usher in a new era of personalized medicine. In this review, we will discuss the utility and limitations of personal genomic data in three domains: pharmacogenomics, assessment of genetic predispositions for common diseases, and identification of rare disease-causing genetic variants.

  • Pharmacogenomics
  • Common disease risk assessment
  • Rare genetic variant discovery
  • Personalized medicine
  • Genomic medicine

See page 1553 for the editorial comment on this article (doi:10.1093/eurheartj/ehs089)

Introduction

Medicine has always been personalized. For years, physicians have incorporated environmental, behavioural, and genetic factors that affect disease and drug response into patient management decisions. However, until recently, the ‘genetic’ data took the form of family history and self-reported race/ethnicity. As genome sequencing declines in cost, the availability of specific genomic information will no longer be limiting. Rather, our ability to parse these data and our decision whether to use it will become primary. As our understanding of genetic association with drug responses and diseases continues to improve, clinically useful genetic tests may emerge to improve upon our previous methods of assessing genetic risks. Indeed, genetic tests for monogenic disorders have already proven useful. Such changes may usher in a new era of personalized medicine.

In considering where personal genomic information has begun to, or has the potential to, impact clinical medicine, three domains emerge (Figure 1): pharmacogenomics, assessment of genetic predispositions for common diseases, and identification of rare disease-causing genetic variants. In this review, we will discuss the utility and limitations of personal genomic data in each of these domains.1

Figure 1

Domains of personalized medicine.

Pharmacogenomics

Pharmacogenomics, the study of how genes modulate drug responses among individuals, is likely to be one of the first direct applications of personal genomics to clinical medicine, with several notable examples already emerging. Genes that underlie differences in drug response can harbour genetic variants involved in the pharmacokinetics of a drug (i.e. how the drug is absorbed, distributed, metabolized, and excreted) or the pharmacodynamics of a drug (i.e. how the drug interacts with its target and its mechanism of action). Such variation can influence, and hence potentially predict, both efficacy and toxicity. The number of pharmacogenetic associations has steadily increased over the years. Today, evidence for over 2000 genes involved in drug response has been annotated by curators at the Pharmacogenomics Knowledge Base (PharmGKB; http://www.pharmgkb.org).

Early successes in pharmacogenetics have arisen in oncology, where somatic genetic changes in a patient's tumour (which often have more substantial effects than variation in an individual's germline DNA) can markedly elevate or repress gene expression (compared with normal tissues) and provide a wide therapeutic window (Figure 2A). In this context, the genetic analysis of a patient's tumour can help predict therapeutic benefit, or lack thereof, of targeted biologics, such as trastuzumab for ERBB2(HER2)-amplified breast cancers, erlotinib for epidermal growth factor receptor (EGFR)-overexpressing lung cancers, or imatinib for Philadelphia chromosome-positive chronic myelogenous leukaemias. Somatic mutations in tumours can also help predict resistance, as in the case of colorectal cancers where activating mutations in KRAS have been established as a predictive marker for resistance to the EGFR-specific antibodies cetuximab and panitumumab.2–4 Finally, somatic tumour mutations can give rise to ‘synthetic lethal’ interactions with drugs, best exemplified by the profound sensitivity of tumours with BRCA1 or BRCA2 dysfunction to the inhibition of poly(ADP-ribose) polymerase (PARP).5,6 PARP functions in the repair of DNA single-strand breaks, and in tumours with defective BRCA1/BRCA2-mediated repair, PARP inhibition seemingly leads to the persistence of DNA lesions, resulting in chromosomal instability, cell cycle arrest, and subsequent apoptosis.6 A recent clinical trial supports the efficacy of PARP inhibition among patients with BRCA-mutated tumours,7 and trials evaluating PARP inhibitors in the context of sporadic basal-like tumours (which often resemble BRCA-mutant tumours) are ongoing.

Figure 2

Clinical use of pharmacogenomic information. (A) Patients who test positive for genetic alterations that are disease-specific (e.g. BCR-ABL gene fusion in chronic myelogenous leukaemia) or disease-enriched (e.g. Her2/neu receptor amplification in breast cancer) can select biologic drugs targeted specifically to their disease. (B) Patients with genetic variants affecting the pharmacokinetics or pharmacodynamics of a drug can benefit from improved prediction of drug efficacy and required dosing. (C) Patients with genetic variants associated with drug side effects can benefit by predicting and avoiding such side effects.

In cardiovascular medicine, the anti-coagulant drug warfarin represents an informative case study of how germline genetic information might help personalize a patient's treatment regimen. The appropriate dose of warfarin varies by over 10-fold among patients, and establishing the correct dose for a given patient is critically important because the therapeutic window is narrow. In addition to diet, clinical factors, and demographic variables, genetic variants in three genes contribute significantly to patient variability in response to warfarin: cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9); vitamin K epoxide reductase complex, subunit 1 (VKORC1); and cytochrome P450, family 4, subfamily F, polypeptide 2 (CYP4F2)8–11 (Figure 2B). Recently, the International Warfarin Pharmacogenetics Consortium (IWPC) developed a pharmacogenetic algorithm to help estimate warfarin dosing and showed that it produced recommendations significantly closer to the required stable therapeutic dose than those derived from an algorithm based on only clinical variables or a fixed-dose strategy.12 Specifically, the pharmacogenetic algorithm correctly predicted low doses for 54% of all patients (when compared with the clinical algorithm which predicted low doses for 33%), while the pharmacogenetic algorithm accurately predicted high doses for 26% of patients who required high doses (vs. 9% for the clinical algorithm). Thus, the pharmacogenetic algorithm significantly improved the dose prediction for those at the tails of the dosage distribution, a group accounting for 46% of the entire cohort. A recent study reported that using patient genotype data for warfarin dosing reduced the risk of hospitalization in outpatients initiating warfarin by nearly one-third.13 While further outcome studies of pharmacogenetic-based dosing of warfarin are warranted, and while newer agents metabolized by alternative pathways may supersede these strategies, the IWPC study exemplifies an approach that can likely be generalized to other commonly prescribed drugs where individual variability and/or a narrow therapeutic index are factors.

In addition to estimating efficacy and appropriate drug doses, pharmacogenetic information has the potential to be of clinical value when deciding between multiple treatment options to maximize treatment benefit and limit the risk of side effects (Figure 2C). For example, the cholesterol-lowering drug simvastatin can, in rare cases, cause a myopathy when administered at higher doses or in combination with certain other medications. A recent genome-wide association study (GWAS) identified a greater than 16-fold increased risk of statin-induced myopathy in homozygotes of a common variant within SLCO1B1, a gene known to regulate the hepatic uptake of statins.14 Another recent study identified that in statin-treated men with coronary artery disease (CAD), those with intrinsically low levels of cholesteryl ester transfer protein (carriers of the TaqIB-B2 allele) have increased 10-year mortality, suggesting an adverse pharmacogenetic interaction.15 Finally, several pharmacogenetic variants have been associated with treatment benefit of the angiotensin-converting enzyme-inhibitor perindopril in patients with stable CAD.16 Not all pharmacogenetic associations, however, are appropriate for population-based screening. In the case of severe statin-induced myopathy, the rarity of the reaction lowers the positive predictive value of this particular variant; nonetheless, the large effect size suggests that genetic testing to avoid serious drug toxicity may have clinical utility for some drugs, thereby helping achieve their benefits in patients more safely and effectively.

Clopidogrel, a thienopyridine that inhibits platelet aggregation, represents another illustrative example for cardiovascular pharmacogenomics. Delivered as an inactive prodrug, it is activated in vivo by several cytochrome P450 enzymes in the liver including CYP2C19 (Figure 2B). Variants in CYP2C19 can cause the loss of enzymatic function and thus lower the conversion rate to the active drug, thereby reducing efficacy.17,18 The clinical relevance of this was initially described in two contemporaneous studies: the Trial to Assess Improvement in Therapeutic Outcomes by Optimizing Platelet Inhibition with Prasugrel-Thrombolysis in Myocardial Infarction (TRITON-TIMI) 38 and a French registry study of acute MI patients.19,20 In the TRITON-TIMI 38 trial, among persons treated with clopidogrel, carriers of a reduced-function CYP2C19 allele had significantly lower levels of the active metabolite of clopidogrel, diminished platelet inhibition, and a higher rate of major adverse cardiovascular events, including stent thrombosis, than did non-carriers. In the French study, patients undergoing percutaneous coronary intervention for acute myocardial infarction who had two loss of function alleles had >3 times higher risk of death, myocardial infarction or stroke. The genetics substudy of the Platelet Inhibition and Patient Outcomes (PLATO) trial subsequently confirmed a higher event rate in the patient group with any CYP2C19 loss of function variant.21 In response to these findings, the U.S. Food and Drug Administration (FDA) added a black box warning to clopidogrel describing this patient group as one at higher risk. However, debate has continued. Effects were not seen in the analysis of patients from two more recent studies: the Clopidogrel in Unstable Angina to Prevent Recurrent Events (CURE) trial and the Atrial Fibrillation Clopidogrel Trial with Irbesartan for Prevention of Vascular Events (ACTIVE) trial.22 Both studies compared clopidogrel with placebo in combination with aspirin for reducing cardiovascular events and found no difference in the loss of function allele carriers. Interestingly, the gain of function carriers in the CURE group did appear to obtain more benefit. Adding further to the debate, in the TRITON-TIMI 38 trial, in almost 3000 patients, ABCB1 and CYP2C19 variants were significant independent predictors of CV death, MI and stroke.23 In sum, most investigators conclude that there is an effect of CYP2C19 genotype on the platelet response to clopidogrel in the setting of percutaneous coronary intervention, but sufficient consensus has not emerged to recommend routine genetic testing in clinical practice. Indeed, most have used the debate surrounding genotyping as a reason to endorse newer alternatives such as ticagrelor that are not metabolized via this route.

Recognizing the value of pharmacogenetic information, the FDA has already approved, or relabelled, several drugs reflecting the variation in response due to genetic factors. In 2005, the FDA approved the combination drug isosorbide dinitrate/hydralazine (BiDil) for congestive heart failure specifically in African-American patients, emphasizing the principle of targeted therapy, albeit ethnicity-based rather than genetically derived. More recently, the FDA has added pharmacogenetic information to the product labels of warfarin, clopidogrel, and carbamazepine, though they have not included specific guidance for incorporating this information into drug choice or dosing. The FDA has also approved the design of a clinical trial for bucindolol, a β-blocker and mild vasodilator, in a genotype-defined heart failure population.24 This prospective trial highlights the value of pharmacogenetic information from the perspective of pharmaceutical companies, whose drugs may not achieve clinical endpoints needed for FDA approval in the general population, but may be successful in genotype-defined subpopulations.

Despite the large number of well-established and putative pharmacogenetic associations, significant challenges remain in incorporating this information into clinical practice. First, it will be important that only robustly replicated associations are exploited. For example, genotype at the KIF6 locus has been proposed, and fully commercialized, as a test of responsiveness to statin therapy on the basis of a number of smaller studies,25–27 yet the underlying association did not replicate at all in a far larger analysis.28 Even where the genetic association is robust, the clinical utility of each pharmacogenetic test needs to be evaluated, taking into consideration the sensitivity, specificity, and positive and negative predictive values. For example, if a specific side effect from a drug is exceedingly rare, even a genetic variant that predicts the side effect with high sensitivity and specificity may have a sufficiently low positive predictive value to make testing for the variant cost-ineffective on a population level. After appropriate cost-effectiveness analysis, there may remain only a few examples of agents where germline genetic effects are large enough, therapeutic index low enough, and costs high enough that genetic testing pre-prescription will be warranted. On the other hand, if, as may increasingly be the case in the future, a patient had their genotype or whole-genome data available, several notable pharmacogenetic findings may be of interest to their physicians. In a recent study evaluating one patient's whole-genome sequence, 63 clinically relevant pharmacogenomic variants were noted as well as six novel, non-conservative, amino acid-changing single-nucleotide polymorphisms (SNPs) in genes that are important for drug response.1 As more pharmacogenetic associations are discovered, validated, and evaluated for clinical utility, we expect pharmacogenomics to continue to be a fertile ground for clinical translation and the practice of personalized medicine (for a detailed review of pharmacogenetic associations that impact cardiovascular medicine, see Verschuren et al.29).
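The dependence of positive predictive value on the frequency of the side effect, noted above, can be illustrated with a short calculation based on Bayes' theorem. This is a minimal sketch: the sensitivity, specificity, and side-effect frequencies are hypothetical values chosen for illustration and are not taken from any of the studies cited.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a test-positive patient actually develops the side effect."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# The same test applied to two hypothetical side-effect frequencies.
ppv_rare = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.0001)
ppv_common = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.10)

print(f"PPV when 1 in 10 000 patients is affected: {ppv_rare:.4f}")
print(f"PPV when 1 in 10 patients is affected:     {ppv_common:.4f}")
```

Under these assumed values, the same test yields a positive predictive value of roughly 0.002 for the rare reaction and roughly 0.68 for the common one, which is why rarity alone can undermine population-level screening.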

Common disease risk assessment

Over the past two decades, candidate gene association studies and, more recently, GWASs have provided researchers with a large catalogue of genetic variant associations with hundreds of common human diseases and traits.30 While these studies were carried out in large part to identify new biological genes, pathways, and potential drug targets relevant to the studied phenotypes, there has been some interest in extending these findings to the estimation of individual risk and, in some cases, such estimations have been offered as direct to the consumer/patient genetic tests. Here, we discuss four aspects of genetic variants for disease risk estimation: the risk metric, selection of risk variants, integration of multiple risk variants as well as other risk factors of disease, and the clinical utility of the risk predictors.

Risk metric

Most GWASs employ a case–control study design and relate the risk of developing a disease due to a specific genetic variant via an odds ratio (OR). However, the assumption that the OR approximates the relative risk only holds for rare diseases (i.e. typically diseases with an incidence <10%). For GWASs evaluating common diseases like cardiovascular disease or type 2 diabetes, the OR of a risk-increasing variant will always overestimate the relative risk, and the magnitude of the inflation increases as the disease prevalence increases.
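The gap between the OR and the relative risk can be made concrete with the standard algebraic conversion between the two. The sketch below assumes a hypothetical variant with an OR of 1.5 and three hypothetical baseline risks in non-carriers; none of these numbers comes from the studies discussed here.

```python
def relative_risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Convert an OR into a relative risk, given the disease risk in non-carriers."""
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


ODDS_RATIO = 1.5  # hypothetical effect size

# Hypothetical baseline risks, from a rare to a common disease.
for baseline_risk in (0.01, 0.10, 0.30):
    rr = relative_risk_from_odds_ratio(ODDS_RATIO, baseline_risk)
    print(f"baseline risk {baseline_risk:.2f}: OR = {ODDS_RATIO}, relative risk = {rr:.3f}")
```

With these assumed inputs, the relative risk stays close to the OR when the disease is rare and drifts further below it as the baseline risk rises.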

A more recently proposed alternative to OR-based risk assessment is the use of likelihood ratios (LRs). In the context of genetic data, an LR is the ratio of the probability of observing a specific genotype in diseased individuals (cases) to that in healthy individuals (controls).31 Likelihood ratios from multiple unlinked genetic variants can be multiplied together and used to transform a pre-test to a post-test probability of disease, metrics that are familiar in the framework of evidence-based medicine.
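The transformation from pre-test to post-test probability described above can be sketched as follows. The genotype frequencies, the pre-test probability, and the function names are hypothetical and serve only to illustrate the arithmetic; the multiplication step assumes the variants are unlinked and independent, as stated in the text.

```python
from math import prod


def genotype_likelihood_ratio(freq_in_cases, freq_in_controls):
    """LR for a genotype: its frequency in cases divided by its frequency in controls."""
    return freq_in_cases / freq_in_controls


def post_test_probability(pre_test_probability, likelihood_ratios):
    """Update a pre-test probability with the product of independent LRs."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * prod(likelihood_ratios)
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical genotype frequencies (cases, controls) at three unlinked variants.
likelihood_ratios = [
    genotype_likelihood_ratio(0.30, 0.25),
    genotype_likelihood_ratio(0.45, 0.50),
    genotype_likelihood_ratio(0.13, 0.10),
]

# Hypothetical pre-test probability of disease of 10%.
posterior = post_test_probability(0.10, likelihood_ratios)
print(f"combined LR = {prod(likelihood_ratios):.3f}, post-test probability = {posterior:.3f}")
```

Working on the odds scale keeps the update multiplicative, so an LR derived from a non-genetic risk factor could in principle be appended to the same list.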

Variant selection

Selecting the genetic variants that might be informative in estimating individual disease risk is a challenge but critically important. Attributes of GWASs that should be considered when selecting disease-associated variants are the population or ethnicity of the study subjects in which the genetic association was discovered, the sample size of the study, the set of variants scanned in the study, and whether the association has been replicated. Most GWASs have evaluated one form of genetic variation—SNPs—and have focused specifically on those SNPs with a minor allele frequency of at least 5% in populations of European descent. This focus is grounded in the ‘common disease–common variant hypothesis’, which posits that common genetic variants in the population underlie the risk for common genetic diseases. The hypothesis further suggests that such diseases are likely caused by a large number of common variants, each contributing only a small risk and thereby evading negative evolutionary selection. The vast majority of SNP associations confidently identified to date do indeed carry modest effect sizes (i.e. ORs of 1.0–1.5), but collectively they only explain a small proportion of the heritability of most common diseases.32 Thus, even if all the genetic variants discovered to date are selected appropriately for their inclusion in an individual's estimate of disease risk, the predictive power may remain low since much of the genetic variation underlying risk for many diseases remains undiscovered.

Furthermore, most genetic associations have been identified in populations of European descent. Thus, due to differences in the pattern of linkage disequilibrium between different populations, genetic associations identified in these studies may not be applicable across populations. For example, a genetic variant identified to confer the risk of cardiovascular disease among Europeans may be inappropriate to include in the risk estimate for an individual of African descent. Additional studies in diverse populations are thus warranted to identify shared and novel genetic associations in non-European populations.

Caution should also be exercised in selecting genetic variants whose associations with disease have not been replicated. Because of the well-known publication bias of reporting positive results and larger effect sizes (also known as ‘winner's curse’),33 many reported associations fail to replicate, and initial reports typically overestimate the effect size of the risk variant.

To find additional determinants of heritability, efforts are being focused towards larger studies of common variation as well as studies of rare variants, other forms of genetic variation aside from SNPs, and interactions between genes or between genes and the environment.34 For example, height is a trait where 80% of phenotypic variation is heritable or attributable to genetic factors; however, the SNPs identified by individual GWASs (each analysing typically 1000–5000 individuals) collectively have only accounted for less than 5% of the variation in height. The largest genetic study to date, a meta-analysis of 46 GWASs (>130 000 individuals), recently identified 180 loci significantly associated with height that extends the proportion of variance explained to 10%.35 Similarly, a recent meta-analysis of 14 GWASs of coronary artery disease (CAD) comprising 22 233 individuals with CAD and 64 762 controls followed by genotyping of top association signals in 56 682 additional individuals identified 13 new susceptibility loci and confirmed 10 of 12 previously reported CAD loci, together explaining ∼10% of the genetic variance of CAD.36 As genotyping technologies continue to become cheaper, such larger studies and meta-analyses (i.e. >100 000 subjects) will surely help identify more common variants associated with common diseases.35,37

However, these studies will still fall grossly short of accounting for the total heritability of height or common human diseases, because many of the common SNPs with modest effect sizes are difficult to identify under the stringent statistical thresholds for significance imposed by correction for multiple hypothesis testing. There are some indications that the accepted standards, historically adopted to minimize false positives, may be contributing to false-negative findings. For instance, when Yang et al.38 used all common SNPs (not just those reaching the standard 5 × 10⁻⁸ level of significance) simultaneously to explain variation in height, 45% of the phenotypic variation could be accounted for, indicating that much of the heritability of common traits is not ‘missing’ but rather has not previously been detected because the individual effects of most SNPs are too small to pass stringent significance tests. The model employed by the study authors estimated the total amount of phenotypic variance accounted for by all common SNPs, but the accuracy of prediction from this model is low because the effects of individual SNPs are estimated with much error. Nonetheless, this study demonstrates that with better estimates of individual SNP effects from ever larger studies, risk prediction using common SNPs could become more robust in the future, with potentially important implications for personalized medicine.
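The conventional genome-wide significance level mentioned above is commonly explained as a Bonferroni-style correction. The sketch below assumes a family-wise error rate of 0.05 and roughly one million independent common-variant tests; both are assumed round figures for illustration, not values reported by the studies cited.

```python
def bonferroni_threshold(family_wise_alpha, number_of_tests):
    """Per-test significance level that bounds the family-wise error rate."""
    return family_wise_alpha / number_of_tests


FAMILY_WISE_ALPHA = 0.05       # assumed overall false-positive rate
INDEPENDENT_TESTS = 1_000_000  # assumed number of independent common-variant tests

threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, INDEPENDENT_TESTS)
print(f"per-SNP significance threshold: {threshold:.1e}")
```

A threshold this strict controls false positives across the whole scan, at the price of missing variants whose true effects are too small to reach it in a study of a given size.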

To identify rare variants associated with disease, one promising approach is to sequence across GWAS-associated loci. For example, Johansen et al.39 recently sequenced previously reported GWAS loci in individuals with extreme blood lipid profiles and identified rare variants with large effect sizes. While this finding is so far limited to an intermediate trait (i.e. lipid levels), it may prove generalizable to disease phenotypes. In addition to sequencing, genotyping platforms continue to include more SNPs with lower minor allele frequencies, which should help identify additional loci as well as clarify whether some association signals in GWASs attributed to common SNPs are actually the result of synthetic association or incomplete linkage disequilibrium between common SNPs and rarer causal variants;38,40 preliminary evidence, however, suggests that synthetic associations are unlikely to account for many GWA signals.41,42

Also warranting further investigation are additional forms of genetic variation. Copy number variants (CNVs) and other forms of structural variation affect a substantial portion of the genome and likely contribute to disease phenotypes. Early studies have noted that common CNVs account for only a limited proportion of heritability,43,44 yet some notable examples exist.45 Future efforts should examine a broader range of structural variation including those that are individually rare yet in aggregate implicate specific genes.46

Risk integration

To compute an overall risk of disease conferred by multiple genetic variants, results from several GWASs are often integrated. A common standard is to multiply the ORs corresponding to each risk variant to arrive at a composite OR. This method of integration assumes statistical independence between each of the risk variants, a scenario unlikely to remain true as the number of variants being considered increases, potentially leading to further overestimation of disease risk. One accepted approach has been to simply sum up the number of well-replicated risk alleles for a specific disease that any given individual carries. With this ‘allelic dose’ scoring approach, several investigators have evaluated the clinical utility of multilocus genotype scores in predicting disease outcomes (discussed below).47–52 Summing the number of risk alleles weighted by their estimated effect sizes has also been proposed as a method for computing multilocus genotype risk scores.51,53
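The three integration schemes described above (composite OR, unweighted allele count, and effect-size-weighted score) can be compared in a few lines. This is a sketch under stated assumptions: the variant names, genotype, and per-allele ORs are hypothetical, the weights are taken to be log ORs, and the composite OR assumes independent variants with a multiplicative per-allele effect.

```python
from math import exp, log

# Hypothetical individual: number of risk alleles (0, 1, or 2) carried at each variant.
genotype = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

# Hypothetical per-allele odds ratios for the same variants.
per_allele_or = {"variant_A": 1.2, "variant_B": 1.1, "variant_C": 1.3}


def allele_count_score(genotype):
    """'Allelic dose' score: total number of risk alleles carried."""
    return sum(genotype.values())


def weighted_score(genotype, per_allele_or):
    """Risk alleles weighted by effect size, here taken as the log of the OR."""
    return sum(count * log(per_allele_or[name]) for name, count in genotype.items())


def composite_odds_ratio(genotype, per_allele_or):
    """Product of per-allele ORs; assumes the variants act independently."""
    return exp(weighted_score(genotype, per_allele_or))


print("allele count score:", allele_count_score(genotype))
print(f"weighted score:     {weighted_score(genotype, per_allele_or):.3f}")
print(f"composite OR:       {composite_odds_ratio(genotype, per_allele_or):.3f}")
```

Because the weighted score is a sum of log ORs, exponentiating it reproduces the multiplied composite OR, so the two differ only in scale; the unweighted count discards effect-size information altogether.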

Integrating risk conferred by multiple variants and non-genetic factors remains a challenge for personalized medicine. Due to the complex genetic architecture of many common diseases, it is difficult to identify the interactions and dependencies between distinct genetic and non-genetic risk factors. Similarly, given the large role of environmental factors in the development of these diseases, integrating the contribution of such factors to disease risk is important. Formulating the risk contributions of environmental factors as LRs might serve as one method to compatibly integrate genetic and non-genetic risk factors.

Clinical utility

Finally, the clinical utility of the genetic variants used to estimate disease risk should be evaluated. In order for individual disease risk estimates based on personal genetic data to be clinically useful, they must demonstrate robust performance similar to established biomarkers. Performance of prediction models is commonly evaluated using the sensitivity, specificity, positive predictive value, and negative predictive value, often summarized by a receiver operating characteristic (ROC) curve. This approach has been utilized in most of the initial studies examining the clinical utility of common genetic variation in predicting risk for common diseases. Generally, these studies have shown that the inclusion of common SNPs identified in GWASs only modestly improves disease prediction beyond the standard clinical factors.47–52 We believe that these results in part reflect the fact that most of the genetic variation underlying the diseases studied remains undiscovered; for diseases with a substantial component of heritability, the inclusion of a more complete set of predictive genetic variants can be expected to improve the overall risk prediction, albeit with the caveat that the definition of individual small effect variants will require progressively larger studies. However, these studies also draw attention to the need for alternative methods for assessing the contribution of new biomarkers (SNPs or otherwise) to risk prediction. Disease risk prediction is not by nature a binary classification problem, and thus the ROC curve is not as appropriate for evaluating clinical utility of risk predictors as it is for diagnostic tests.54,55 For example, a risk factor with an OR of 3 might have limited impact on the ROC curve, but may result in a substantial shift in absolute risk. Indeed, most established clinical risk factors have ORs <3 and fall in this category. The large change in absolute risk may lead to different patient management decisions. This is exemplified by a recent study of the performance of breast cancer risk predictors using clinical and genetic data, where clinical parameters generated a predictor with an area under the ROC curve (AUC) of only 0.58 (only modestly better than an AUC of 0.5 expected by chance).52 The addition of 10 common genetic variants increased the AUC to 0.618, but more importantly shifted over 50% of patients to a different quintile of disease risk. Therefore, additional studies of clinical utility of SNPs identified in GWA studies are warranted using more relevant statistical metrics for model performance, such as calibration and clinical risk reclassification.56 Along with the continuing development of a more complete catalogue of genetic associations, we anticipate that these efforts are likely to show the clinical utility of genetic variants can be at least comparable with established risk factors, as was shown for breast cancer risk models, and may substantially improve some disease risk predictors.

Rare genetic variant discovery

The identification and characterization of rare genetic variants is a promising approach to elucidating the molecular underpinnings of rare and common genetic diseases. The highest level of ‘causality’ has been ascribed to single variants in single genes detected by family-based linkage studies. This reflects the substantial portion of disease risk explained by those variants, albeit with expressivity modulated by modifying genomes and environment. Indeed, as many publications have already demonstrated,57–62 the unparalleled power of sequencing to identify segregating variants even in very small families is likely to bring genetic solutions to many families' rare syndromes. As the cost to perform full exome (the collection of all exons or protein-coding sequences in the genome) or whole-genome sequencing precipitously falls63 (Figure 3), this may be the most immediate impact of the wide availability of high-throughput sequencing. While individually rare by definition, rare genetic diseases are collectively quite common, affecting more than 25 million people worldwide.64 Further, if individuals who are carriers for such diseases could be identified early in life, the risk of passing on variants to future generations could be mitigated by pre-conception planning or pre-implantation genetic diagnosis. Indeed, direct-to-consumer genetic testing companies have already begun developing and marketing personal genetic tests for couples planning to conceive.

Figure 3

Cost of genome sequencing. The cost of whole-genome sequencing is plotted as a function of time, from the sequencing of the first human genome to present. Notably, since 2007 genome sequencing costs have declined faster than predicted by Moore's law.

As discussed above, efforts to identify rare variants are increasingly applied to help uncover determinants of heritability of common diseases. In this approach, the focus is on identifying many rare variants that each contribute to risk of the same disease (i.e. genetic heterogeneity). A primary advantage of using the sequencing-based, rather than chip-based, technology is that the former has the capability to discover novel genetic variants that have never been observed before or have been observed so infrequently as to not warrant inclusion on a genotyping chip. Once thought to be relatively unimportant, these rare or novel genetic variants are now increasingly recognized as a source of genetic variation that underlies several common diseases. Most recently, the first confident estimates of germline variation have emerged from the 1000 Genomes Project and an earlier study focusing on a family quartet.65 At an estimated 10⁻⁸ new mutations naturally arising per base pair per generation, this leads to ∼70–100 novel variants per new human genome. In addition to these newly arising variants, each individual genome sequenced has been found to carry a far larger number (up to 10 000) of very rare, often previously unrecorded, DNA variants, perhaps as many as 100 of which are predicted to result in loss of function of a gene.66–68 However, estimating pathogenicity from these data presents many challenges. Specifically, distinguishing causative variants from the large number of non-causative variants discovered by exome or whole-genome sequencing requires prioritizing candidate variants by evidence of their functional impact and validating candidates to establish causality.

There are several approaches investigators have used to prioritize candidate genetic variants. One common and intuitive approach is to search for variants in coding sequences that are predicted to be deleterious. This would include variants that introduce new start or stop codons, frameshift variants, or variants disrupting splicing. However, for many genes, one functional copy is sufficient such that loss of the other copy can be silent, as witnessed by the surprisingly large number of ‘loss of function’ variants found in each individual genome.66–68 To understand more subtle changes in protein structure and function (which may in some cases be more deleterious), biophysical factors may be relevant. A variety of computational tools have been developed to help investigators prioritize candidate genetic variants by predicting their functional impact, such as the Sorting Intolerant from Tolerant (SIFT) algorithm,69 Polymorphism Phenotyping (PolyPhen),70 the Universal Protein Resource (UniProt) database,71 and PolyDoms.72 Manual curation of SNPs associated with a known or suspected disease gene can also be done using annotations in the Online Mendelian Inheritance in Man (OMIM) database and Human Gene Mutation Database.73 Despite the availability of tools and databases for the analysis of rare or novel genetic variants, characterizing these variants remains quite challenging due to the fragmented nature of the curated databases, thus making it difficult to annotate novel/rare variants from whole-genome sequence data.
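The prioritization logic described above can be illustrated with a minimal sketch. It is not the method of any of the tools named in the text: the `Variant` record, the consequence labels, the thresholds, and the `damaging_score` field (a hypothetical score normalized so that higher means more likely damaging) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the consequence classes the text lists as likely deleterious.
SEVERE_CONSEQUENCES = {"start_gained", "stop_gained", "frameshift", "splice_disrupting"}


@dataclass
class Variant:
    gene: str
    consequence: str         # hypothetical label, e.g. "missense" or "stop_gained"
    allele_frequency: float  # population frequency; 0.0 for a previously unseen variant
    damaging_score: float    # hypothetical 0..1 score, higher = more likely damaging


def prioritize(variants, max_af=0.01, min_score=0.8):
    """Keep rare variants with a severe consequence or a high predicted impact,
    then rank severe consequences first, higher scores next, rarer variants last."""
    candidates = [
        v for v in variants
        if v.allele_frequency <= max_af
        and (v.consequence in SEVERE_CONSEQUENCES or v.damaging_score >= min_score)
    ]
    return sorted(
        candidates,
        key=lambda v: (
            v.consequence not in SEVERE_CONSEQUENCES,  # False sorts before True
            -v.damaging_score,
            v.allele_frequency,
        ),
    )


if __name__ == "__main__":
    example = [
        Variant("GENE_A", "missense", 0.20, 0.95),     # common, filtered out
        Variant("GENE_B", "missense", 0.001, 0.90),    # rare, high score
        Variant("GENE_C", "stop_gained", 0.0, 0.50),   # novel, severe consequence
        Variant("GENE_D", "synonymous", 0.002, 0.10),  # rare but likely benign
    ]
    for v in prioritize(example):
        print(v.gene, v.consequence)
```

A ranking of this kind only orders candidates for follow-up. As the text notes, a predicted loss-of-function variant can be silent when one functional copy of the gene is sufficient, so candidates still need validation to establish causality.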

The reference human genome creates another challenge in the confident interpretation of high-throughput sequencing data. As a composite of a small number of individuals' DNA (all likely of European descent), the reference genome sequence by definition contains risk alleles.74 In time, with greater computational power or longer reads, de novo assembly of genomes will be more easily achievable. This will facilitate the return of a sequence rather than the more computationally convenient method of mapping to a known reference. Further, a less biased view can be obtained by comparing risk alleles in an individual genome to a composite major allele reference genome sequence.75
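The effect of the choice of baseline can be shown with a toy sketch. It assumes a simplified haploid representation with one allele per site, and the site names, allele frequencies, and function names are hypothetical; it is not the procedure used in the cited work.

```python
def major_allele_reference(allele_freqs):
    """Build a baseline carrying the most frequent allele at every site."""
    return {site: max(freqs, key=freqs.get) for site, freqs in allele_freqs.items()}


def differences(sample, baseline):
    """Sites where the sample allele differs from the baseline allele."""
    return sorted(site for site, allele in sample.items() if baseline.get(site) != allele)


if __name__ == "__main__":
    # Hypothetical data: at site1 the reference genome happens to carry the minor allele.
    reference = {"site1": "A", "site2": "G"}
    allele_freqs = {
        "site1": {"A": 0.05, "C": 0.95},
        "site2": {"G": 0.90, "T": 0.10},
    }
    sample = {"site1": "A", "site2": "G"}

    major = major_allele_reference(allele_freqs)
    print("differs from reference:", differences(sample, reference))
    print("differs from major-allele baseline:", differences(sample, major))
```

In this toy case the sample matches the reference at every site, so comparison against the reference reports nothing, while comparison against the major-allele baseline flags the site where both the sample and the reference carry the minor allele.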

Conclusions

Over the next decade, we will gain an unprecedented appreciation of the genetic variation present within human genomes. This collective undertaking will yield tremendous benefits of new target and pathway discovery related to a multitude of human diseases. Directed to the individual patient, these efforts will provide a basis for refining personal predictors of disease and drug response. Aside from the technological and informatics challenges, the prospect that physicians will, in the next few years, be expected to include individual exome, or whole-genome, data in patient management raises obvious educational,76 ethical, and social implications.77 In addition, much genetic testing will be marketed direct to consumer. At present, this extends largely to common variants assessed by genotyping SNP arrays; variants identified this way will typically have a modest impact on a common disease and in turn may not be too problematic. However, it will soon become more cost effective to obtain a patient's exome or whole-genome sequence; the expectation that uncommon, highly predictive variants will also be uncovered in a significant minority illustrates the need for a well thought-through and regulated framework.

How the systematic exploration of human genetic variation will transform diagnostic and clinical practices remains to be seen. Although tempered by the challenges discussed here, the analysis of human diseases powered by the genomic revolution bolsters hope for sustained and meaningful improvements in the diagnosis, prognosis, and treatment of individual patients.

Funding

K.S. is supported by the Medical Scientist Training Program at Stanford University School of Medicine. H.W. is supported by the NIHR Comprehensive Biomedical Research Centre and the BHF Centre of Excellence, at Oxford. E.A.A. is supported by the NIH NNN (grant no R01: NIH 5R01HL105993 and DP2: NIH DP2 OD004613).

Conflict of interest: none declared.

References
