PK Ontology is composed of several components: experiments, metabolism, transporter, drug, and subject (Table). Our primary contribution is the development of the ontology for the PK experiment and the integration of this PK experiment ontology with other PK-related ontologies.
The experiment ontology specifies in vitro and in vivo PK studies and their associated PK parameters. Table presents the definitions and units of the in vitro PK parameters. The PK parameters of the single-drug metabolism experiment include the Michaelis-Menten constant (Km), the maximum velocity of the enzyme activity (Vmax), intrinsic clearance (CLint), metabolic ratio, and the fraction of metabolism by an enzyme (fm,enzyme). In the transporter experiment, the PK parameters include apparent permeability (Papp), the ratio of basolateral-to-apical permeability to apical-to-basolateral permeability (Re), radioactivity, and uptake volume.
There are multiple drug interaction mechanisms: competitive inhibition, noncompetitive inhibition, uncompetitive inhibition, mechanism-based inhibition, and induction. IC50 is the inhibitor concentration that reduces enzyme activity by 50%. It depends on substrate concentration and does not imply an inhibition mechanism. Ki is the inhibition constant for competitive, noncompetitive, and uncompetitive inhibition. Like IC50, it is expressed as an inhibitor concentration (for competitive inhibition, it equals the IC50 measured at substrate concentrations far below Km), but it is independent of substrate concentration. kdeg is the degradation rate constant of the enzyme. KI is the inhibitor concentration associated with half-maximal inactivation in mechanism-based inhibition. kinact is the maximum inactivation rate constant in mechanism-based inhibition, reached at high inhibitor concentrations. Emax is the maximum induction effect, and EC50 is the inducer concentration associated with half-maximal induction.
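Several standard textbook relationships link the in vitro parameters above. The Python sketch below computes the following quantities:

- Michaelis-Menten velocity.
- CLint as Vmax/Km.
- Papp and the efflux ratio Re.
- The Cheng-Prusoff relations, which show why IC50 shifts with substrate concentration while Ki does not.
- The steady-state fraction of active enzyme under mechanism-based inhibition.
- A simple Emax model of induction.

The function names, units, and example numbers are illustrative assumptions. They are not definitions taken from the PK ontology.

```python
"""Illustrative textbook formulas for in vitro PK parameters (not ontology definitions)."""


def mm_velocity(s, vmax, km):
    """Michaelis-Menten velocity: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def intrinsic_clearance(vmax, km):
    """CLint = Vmax / Km, e.g. (pmol/min/mg protein) / uM -> uL/min/mg protein."""
    return vmax / km


def apparent_permeability(dq_dt, area, c0):
    """Papp = (dQ/dt) / (A * C0): transport rate over membrane area times donor concentration."""
    return dq_dt / (area * c0)


def efflux_ratio(papp_b_to_a, papp_a_to_b):
    """Re = Papp(B->A) / Papp(A->B); values well above 1 suggest active efflux."""
    return papp_b_to_a / papp_a_to_b


def ic50_from_ki(ki, s, km, mechanism):
    """Cheng-Prusoff relations: IC50 shifts with substrate concentration s, Ki does not."""
    if mechanism == "competitive":
        return ki * (1.0 + s / km)
    if mechanism == "noncompetitive":  # pure noncompetitive inhibition
        return ki
    if mechanism == "uncompetitive":
        return ki * (1.0 + km / s)
    raise ValueError(f"unknown mechanism: {mechanism}")


def mbi_active_enzyme_fraction(i, kinact, k_i, kdeg):
    """Steady-state fraction of active enzyme under mechanism-based inhibition:
    kdeg / (kdeg + kinact * [I] / (KI + [I]))."""
    return kdeg / (kdeg + kinact * i / (k_i + i))


def induction_effect(i, emax, ec50):
    """Emax model of induction: effect = Emax * [I] / (EC50 + [I])."""
    return emax * i / (ec50 + i)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to make the arithmetic easy to follow.
    print(mm_velocity(s=5.0, vmax=100.0, km=5.0))   # 50.0: half of Vmax when S = Km
    print(intrinsic_clearance(vmax=100.0, km=5.0))  # 20.0
    for s in (1.0, 10.0, 100.0):  # competitive IC50 rises with substrate concentration
        print(s, ic50_from_ki(ki=1.0, s=s, km=10.0, mechanism="competitive"))
```

The loop at the end shows the behaviour described in the text. The competitive IC50 grows with substrate concentration, while Ki stays fixed at 1.0.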
The in vitro experiment conditions are presented in Table. Metabolism experiment conditions include buffer, NADPH source, and protein source. Protein sources include recombinant enzymes, microsomes, hepatocytes, and others, and genotype information is sometimes available for microsome or hepatocyte samples. Transporter experiment conditions include bidirectional transport, uptake/efflux, and ATPase assays. Other factors of in vitro experiments include preincubation time, incubation time, quantification methods, sample size, and data analysis methods. All of this information can be found in the 2006 FDA draft guidance on drug interaction studies (http://www.abclabs.com/Portals/0/FDAGuidance_DraftDrugInteractionStudies2006.pdf).
In vitro experiment conditions
The in vivo PK parameters are presented in Table. All of this information is summarized from two textbooks. The PK parameters fall into several main classes:

- Area under the concentration curve parameters (including AUCinf and AUMC).
- Drug clearance parameters (including CL and CLb).
- Drug concentration parameters (including Cmax).
- Extraction ratio and bioavailability parameters (including E, EH, F, and FG).
- Rate constants: the elimination rate constant k, the absorption rate constant ka, the urinary excretion rate constant ke, the Michaelis-Menten constant Km, the distribution rate constants (k12, k21), and the two rate constants of the two-compartment model (λ1, λ2).
- Blood flow rates (Q, QH).
- Time parameters (including tmax).
- Volume of distribution parameters (including V and Vb).
- The maximum rate of metabolism, Vmax.
- Ratios of PK parameters that represent the extent of the drug interaction (including AUCR, CL ratio, Cmax ratio, and Css ratio).
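For reference, several of the rate constants and volumes listed above appear in the standard textbook compartmental and noncompartmental equations below. The symbols D (dose), k10 (elimination from the central compartment), and the coefficients A and B are added here for illustration. They are not taken from the parameter table.

```latex
% One-compartment model with first-order absorption (oral dose D):
C(t) = \frac{F\,D\,k_a}{V\,(k_a - k)}\left(e^{-k t} - e^{-k_a t}\right), \qquad t_{1/2} = \frac{\ln 2}{k}

% Two-compartment model after an intravenous bolus:
C(t) = A\,e^{-\lambda_1 t} + B\,e^{-\lambda_2 t}, \qquad \lambda_1 > \lambda_2
\lambda_1 + \lambda_2 = k_{10} + k_{12} + k_{21}, \qquad \lambda_1 \lambda_2 = k_{10}\,k_{21}

% Noncompartmental summaries and the interaction ratio:
AUC_{inf} = \int_0^{\infty} C(t)\,dt, \qquad CL = \frac{F\,D}{AUC_{inf}}, \qquad
AUCR = \frac{AUC_{inf}\ \text{with interacting drug}}{AUC_{inf}\ \text{without interacting drug}}
```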
Table also shows that two types of pharmacokinetic models are commonly reported in the literature: noncompartmental models and one- or two-compartment models. Multiple items need to be considered in an in vivo PK study:

- Hypotheses concern bioequivalence and the effects of drug interactions, pharmacogenetics, and disease conditions on a drug's PK.
- Design strategies are very diverse: single or multiple arms, crossover or fixed-order designs, with or without randomization, with or without stratification, with or without genetic prescreening, prospective or retrospective studies, and case reports or cohort studies.
- Sample size includes the number of subjects and the number of plasma or urine samples per subject.
- Time points include sampling and dosing time points.
- Sample types include blood, plasma, and urine.
- Drug quantification methods include HPLC/UV, LC/MS/MS, LC/MS, and radiographic methods.
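As an illustration of the noncompartmental approach, the sketch below derives several of the in vivo parameters from a single concentration-time profile. It computes Cmax, tmax, AUCinf, t1/2, and CL/F, and then forms AUCR as the ratio of AUCinf with and without an interacting drug.

The sketch makes the following assumptions:

- AUC uses the linear trapezoidal rule (a linear-up/log-down rule is a common alternative).
- The terminal elimination rate constant comes from a log-linear least-squares fit of the last three samples.
- The tail is extrapolated as Clast/k.

The textbooks the authors drew on may use other conventions, and all names and data here are hypothetical.

```python
import math


def trapezoid_auc(times, conc):
    """AUC from the first to the last sample by the linear trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))


def terminal_rate_constant(times, conc, n_points=3):
    """Elimination rate constant k from a log-linear least-squares fit of the
    last n_points concentrations (all of which must be > 0)."""
    t = times[-n_points:]
    y = [math.log(c) for c in conc[-n_points:]]
    t_mean, y_mean = sum(t) / n_points, sum(y) / n_points
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    return -slope


def nca(times, conc, dose):
    """Basic noncompartmental summary of one profile."""
    k = terminal_rate_constant(times, conc)
    auc_inf = trapezoid_auc(times, conc) + conc[-1] / k  # extrapolated tail: Clast / k
    cmax = max(conc)
    return {
        "Cmax": cmax,
        "tmax": times[conc.index(cmax)],
        "AUCinf": auc_inf,
        "t1/2": math.log(2) / k,
        "CL/F": dose / auc_inf,  # equals CL when F = 1 (e.g. IV dosing)
    }


if __name__ == "__main__":
    # Hypothetical profiles: the "inhibited" profile simply doubles every concentration.
    times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
    control = [10.0 * math.exp(-0.3 * t) for t in times]
    inhibited = [2.0 * c for c in control]
    base, ddi = nca(times, control, dose=100.0), nca(times, inhibited, dose=100.0)
    print("AUCR =", ddi["AUCinf"] / base["AUCinf"])
```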
CYP450 enzymes are predominantly expressed in the gut wall and liver. Transporters are tissue specific; Table presents the tissue-specific transporters and their functions.

Probe drugs are another important concept in pharmacology research. A probe substrate of an enzyme is a drug that is primarily metabolized or transported by that enzyme. To test experimentally whether a new drug inhibits or induces an enzyme, the enzyme's probe substrate is used to measure its activity before and after inhibition or induction. A probe inhibitor or inducer of an enzyme is a drug that primarily inhibits or induces that enzyme. Conversely, to investigate whether a drug is metabolized by an enzyme, that enzyme's probe inhibitor is used.

Table presents all of the probe inhibitors, inducers, and substrates of the CYP enzymes, and Table presents those of the transporters. All of this information was collected from the FDA guidance for industry (http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm064982.htm), as reviewed in a leading pharmacology journal.
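The probe-drug reasoning above reduces to a simple pairing rule. If the question is whether a new drug inhibits or induces an enzyme, the new drug is the perpetrator and is paired with the enzyme's probe substrate. If the question is whether the new drug is metabolized by the enzyme, it is the victim and is paired with the enzyme's probe inhibitor.

The sketch below encodes that rule. `PROBES` holds only a few widely cited examples from FDA drug interaction guidance and does not reproduce the tables. `design_ddi_study` and "drugX" are hypothetical names.

```python
# Illustrative subset only (widely cited FDA-guidance examples), not the paper's tables.
PROBES = {
    "CYP3A": {"substrate": "midazolam", "inhibitor": "ketoconazole", "inducer": "rifampin"},
    "CYP2D6": {"substrate": "dextromethorphan", "inhibitor": "quinidine"},
}


def design_ddi_study(new_drug, enzyme, question):
    """Return (perpetrator, victim) for a clinical DDI study of `new_drug`.

    question:
      "inhibits" / "induces" -> new drug is the perpetrator; victim is the probe substrate.
      "metabolized_by"       -> probe inhibitor is the perpetrator; new drug is the victim.
    """
    probes = PROBES[enzyme]
    if question in ("inhibits", "induces"):
        return new_drug, probes["substrate"]
    if question == "metabolized_by":
        return probes["inhibitor"], new_drug
    raise ValueError(f"unsupported question: {question}")


if __name__ == "__main__":
    print(design_ddi_study("drugX", "CYP3A", "inhibits"))         # ('drugX', 'midazolam')
    print(design_ddi_study("drugX", "CYP2D6", "metabolized_by"))  # ('quinidine', 'drugX')
```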
Tissue specific transporters
In vivo probe inhibitors/inducers/substrates of CYP enzymes
In vivo probe inhibitors/inducers/substrates of selected transporters
The cytochrome P450 superfamily (officially abbreviated as CYP) is a large and diverse group of enzymes that catalyze the oxidation of organic substances. CYP substrates include metabolic intermediates, such as lipids and steroid hormones, as well as xenobiotics, such as drugs and other toxic chemicals. CYPs are the major enzymes involved in drug metabolism and bioactivation, accounting for about 75% of the total number of different metabolic reactions. CYP enzyme names and genetic variants were mapped from the Human Cytochrome P450 (CYP) Allele Nomenclature Database (http://www.cypalleles.ki.se/). This site documents the effects of CYP450 genetic variants on protein sequence and enzyme activity, with associated references.
Transporters are proteins that move other materials within an organism. Transport proteins are vital to the growth and life of all living things. They are involved in the movement of ions, small molecules, or macromolecules (such as other proteins) across a biological membrane. They are integral membrane proteins; that is, they exist within and span the membrane across which they transport substances. Transporter names and genetic variants were mapped from the Transporter Classification Database (http://www.tcdb.org). In addition, we added the probe substrates and probe inhibitors to each of the metabolism enzymes and transporters (see the description above).
The drug name ontology was created using the drug names from DrugBank 3.0. DrugBank contains 6,829 drugs, grouped into FDA-approved drugs, FDA-approved biotech drugs, nutraceuticals, and experimental drugs. The drug names are mapped to generic names, brand names, and synonyms.
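Mapping generic names, brand names, and synonyms amounts to a normalization index from every known name to one generic name. The sketch below is a minimal version of such an index. `DRUG_SYNONYMS` is a hypothetical, hand-made excerpt (real entries would come from DrugBank 3.0), and the function names are illustrative.

```python
# Hypothetical hand-made excerpt; real entries would be loaded from DrugBank 3.0.
DRUG_SYNONYMS = {
    "midazolam": ["Versed"],
    "ketoconazole": ["Nizoral"],
}


def build_name_index(synonyms):
    """Map every lower-cased generic name, brand name, or synonym to its generic name."""
    index = {}
    for generic, others in synonyms.items():
        for name in [generic, *others]:
            index[name.lower()] = generic
    return index


NAME_INDEX = build_name_index(DRUG_SYNONYMS)


def normalize_drug(mention):
    """Return the generic name for a drug mention, or None if it is unknown."""
    return NAME_INDEX.get(mention.lower())


if __name__ == "__main__":
    print(normalize_drug("VERSED"))  # midazolam
```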
The subject ontology includes existing ontologies: the Human Disease Ontology (DOID), the Suggested Ontology for Pharmacogenomics (SOPHARM), and the Mammalian Phenotype ontology (MP). The PK ontology was implemented with Protégé and uploaded to the BioPortal ontology platform.
A PK abstract corpus was constructed to cover four primary classes of PK studies: clinical PK studies (n = 56), clinical pharmacogenetic studies (n = 57), in vivo DDI studies (n = 218), and in vitro drug interaction studies (n = 210). The PK corpus was constructed manually.

The abstracts of clinical PK studies were selected from our previous work, which investigated midazolam, the most widely studied CYP3A substrate. The clinical pharmacogenetic abstracts were selected based on CYP2D6, the most polymorphic CYP enzyme. We believe these two selection strategies are representative of in vivo PK and pharmacogenetic (PG) studies. For the drug interaction studies, abstracts were randomly selected from the results of a PubMed query that used the probe substrates, inhibitors, and inducers of metabolism enzymes reported in Table.
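The query-and-sample step can be sketched as follows. The probe names are combined into a PubMed query, an NCBI E-utilities ESearch URL is built, and a reproducible random subset of the returned PMIDs is drawn.

Several details are assumptions, since the actual query is not given here: the query wording, the `[tiab]` field restriction, the extra "interaction" term, and the fixed seed. The sketch only builds the URL and does not perform the request.

```python
import random
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(probe_drugs, extra_terms=("interaction",)):
    """OR together the probe drug names (title/abstract), then AND with extra terms."""
    drugs = " OR ".join(f'"{d}"[tiab]' for d in probe_drugs)
    extra = " AND ".join(extra_terms)
    return f"({drugs}) AND {extra}" if extra else f"({drugs})"


def esearch_url(term, retmax=10000):
    """ESearch URL returning up to `retmax` PMIDs for `term`."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


def sample_pmids(pmids, n, seed=0):
    """Randomly select n PMIDs; a fixed seed keeps the selection reproducible."""
    return random.Random(seed).sample(pmids, n)


if __name__ == "__main__":
    query = build_query(["midazolam", "ketoconazole", "rifampin"])
    print(esearch_url(query))
```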
Once the abstracts in the four classes had been identified, they were annotated manually (Figure).

The annotation was first carried out by three master's-level annotators (Shreyas Karnik, Abhinita Subhadarshini, and Xu Han) and one Ph.D. annotator (Lang Li). Their training backgrounds differ: computational science, biological science, and pharmacology. Any terms annotated differently were further checked by Sara K. Quinney and David A. Flockhart, a PharmD and an MD scientist with extensive pharmacology training. Where these two reviewers disagreed, a group review (Drs Quinney, Flockhart, and Li) was conducted to reach the final agreed annotations. In addition, a random 20% subset of the abstracts with consistent annotations among the four annotators (three master's-level and one Ph.D.) was double-checked by two Ph.D.-level scientists.
PK corpus annotation flow chart.
A structured annotation scheme was implemented to annotate three layers of pharmacokinetic information: key terms, DDI sentences, and DDI pairs (Figure). DDI sentence annotation depends on the key terms, and DDI pair annotation depends on both the key terms and the DDI sentences. The annotation schemes are described as follows.
A three level hierarchical PK and DDI annotation scheme.
Key terms include drug names, enzyme names, PK parameters, numbers, mechanisms, and change. To keep term boundaries consistent across annotators, they were determined by the following standards.
• Drug names were defined mainly based on DrugBank 3.0. Drug metabolites were also tagged, because they are important in in vitro studies. Metabolites were identified by prefixes or suffixes: oxi, hydroxyl, methyl, acetyl, N-dealkyl, N-demethyl, nor, dihydroxy, O-dealkyl, and sulfo. These affixes reflect phase I metabolic reactions (oxidation, reduction, hydrolysis) and phase II reactions (methylation, sulphation, acetylation, glucuronidation).
• Enzyme names covered all CYP450 enzymes, whose names are defined in the Human Cytochrome P450 Allele Nomenclature Database. Variations of the enzyme and gene names were considered using the regular expression (?:cyp|CYP|P450|CYP450)?[0-9][a-zA-Z][0-9](?:\*[0-9])?$.
• PK parameters were annotated based on the in vitro and in vivo PK parameter ontology defined in Table. Some PK parameters have alternative names: CL = clearance, t1/2 = half-life, AUC = area under the concentration curve, and AUCR = area under the concentration curve ratio.
• Numbers such as dose, sample size, PK parameter values, and p-values were all annotated. Units, when present, were included in the annotations.
• Mechanisms denote the drug metabolism and interaction mechanisms. They were annotated by the following regular expression patterns: inhibit(e(s|d)?|ing|ion(s)?|or)$, catalyz(e(s|d)?|ing)$, correlat(e(s|d)?|ing|ion(s)?)$, metaboli(z(e(s|d)?|ing)|sm)$, induc(e(s|d)?|ing|tion(s)?|or)$, form((s|ed)?|ing|tion(s)?|or)$, stimulat(e(s|d)?|ing|ion(s)?)$, activ(e(s)?|(at)(e(s|d)?|ing|ion(s)?))$, and suppress(e(s|d)?|ing|ion(s)?)$.
• Change describes changes in PK parameters. The following words and patterns were annotated in the corpus to denote change: strong(ly)?, moderate(ly)?, high(est)?(er)?, slight(ly)?, significant(ly)?, obvious(ly)?, marked(ly)?, great(ly)?, pronounced(ly)?, modest(ly)?, probably, may, might, minor, little, negligible, doesn’t interact, affect((s|ed)?|ing|ion(s)?)?$, reduc(e(s|d)?|ing|tion(s)?)$, and increas(e(s|d)?|ing)$.
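The key-term rules above can be approximated by a small token tagger. The sketch below covers only part of the scheme:

- The published enzyme pattern, with ASCII hyphens in the character ranges.
- A subset of the published mechanism and change patterns, applied to lower-cased tokens with `re.search`. The patterns end in `$` but have no `^`, so they effectively match word endings.
- A subset of the metabolite prefixes, applied only when the remainder of the word is a known drug name. Suffix handling, PK parameters, and numbers are omitted.

The function names and the lower-case drug set are assumptions. As written, the enzyme pattern allows only one digit after the subfamily letter. It therefore matches CYP3A4 and CYP2D6*4 but not CYP2C19, which would need `[0-9]+`.

```python
import re

# Published enzyme-name pattern (hyphens restored in the character ranges).
ENZYME = re.compile(r"(?:cyp|CYP|P450|CYP450)?[0-9][a-zA-Z][0-9](?:\*[0-9])?$")

# Subset of the published mechanism and change patterns (applied to lower-cased tokens).
MECHANISM = [re.compile(p) for p in (
    r"inhibit(e(s|d)?|ing|ion(s)?|or)$",
    r"induc(e(s|d)?|ing|tion(s)?|or)$",
    r"metaboli(z(e(s|d)?|ing)|sm)$",
)]
CHANGE = [re.compile(p) for p in (
    r"significant(ly)?$",
    r"marked(ly)?$",
    r"increas(e(s|d)?|ing)$",
    r"reduc(e(s|d)?|ing|tion(s)?)$",
)]

# Subset of the published metabolite prefixes (suffixes are not handled here).
METABOLITE_PREFIXES = ("n-demethyl", "n-dealkyl", "o-dealkyl", "nor")


def is_metabolite(word, drug_names):
    """True for e.g. 'norverapamil' when the text after a known prefix is a known drug."""
    w = word.lower()
    return any(w.startswith(p) and w[len(p):].lstrip("-") in drug_names
               for p in METABOLITE_PREFIXES)


def tag_token(token, drug_names):
    """Return a key-term type for one token, or None. drug_names must be lower-case."""
    word = token.strip(".,;:()")
    lower = word.lower()
    if lower in drug_names or is_metabolite(word, drug_names):
        return "DRUG"  # metabolites are tagged as drug names
    if ENZYME.fullmatch(word):
        return "ENZYME"
    if any(p.search(lower) for p in MECHANISM):
        return "MECHANISM"
    if any(p.search(lower) for p in CHANGE):
        return "CHANGE"
    return None


if __name__ == "__main__":
    drugs = {"midazolam", "ketoconazole", "verapamil"}
    sentence = "Ketoconazole markedly inhibited CYP3A4 and increased midazolam AUC."
    print([(t, tag_token(t, drugs)) for t in sentence.split()])
```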
The middle-level annotation focused on drug interaction sentences. Because both interacting drugs are not necessarily mentioned in the same sentence, DDI sentences were categorized into two classes:
• Clear DDI Sentence (CDDIS): both drug names (or a drug-enzyme pair, in in vitro studies) appear in the sentence together with a clear interaction statement. The statement may assert an interaction, assert a non-interaction, or be ambiguous (e.g., using "possible" or "might").
• Vague DDI Sentence (VDDIS): one drug or enzyme name is missing from the DDI sentence but can be inferred from the context. A clear interaction statement is also required.
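The two sentence classes can be expressed as a small decision rule over the key-term layer. The sketch below assumes several inputs: the drug or enzyme names tagged in the sentence, the names tagged elsewhere in the same abstract (the "context"), and a flag for whether the sentence contains an interaction, non-interaction, or ambiguous statement. The function name and inputs are hypothetical.

```python
def classify_ddi_sentence(names_in_sentence, names_in_context, has_interaction_statement):
    """Return "CDDIS", "VDDIS", or None for one sentence.

    names_in_sentence / names_in_context: sets of drug (or enzyme) names tagged in the
    sentence and in the rest of the abstract.
    has_interaction_statement: True if the sentence states an interaction,
    a non-interaction, or an ambiguous interaction.
    """
    if not has_interaction_statement:
        return None
    if len(names_in_sentence) >= 2:
        return "CDDIS"
    # One name is missing from the sentence but another name is available in context.
    if len(names_in_sentence) == 1 and names_in_context - names_in_sentence:
        return "VDDIS"
    return None


if __name__ == "__main__":
    print(classify_ddi_sentence({"midazolam"}, {"ketoconazole"}, True))  # VDDIS
```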
Once DDI sentences were labeled, the DDI pairs in them were further annotated. Because in vivo and in vitro DDI studies differ fundamentally, their DDI relationships were defined differently.

In in vivo studies, three types of DDI relationships were defined (Table): DDI, ambiguous DDI (ADDI), and non-DDI (NDDI). Four conditions determine these relationships:

- Condition 1 (C1) requires that at least one drug or enzyme name be contained in the sentence.
- Condition 2 (C2) requires that, if the other interacting drug or enzyme name is not in the same sentence, it can be found in the context.
- Condition 3 (C3) specifies numeric rules that define the DDI relationships based on PK parameter changes.
- Condition 4 (C4) specifies the language expression patterns for DDI relationships.

Using the rules summarized in Table, DDI, ADDI, and NDDI are defined by C1 ∧ C2 ∧ (C3 ∧ C4). The priority rank of in vivo PK parameters is AUC > CL > t1/2 > Cmax.

In in vitro studies, six types of DDI relationships were defined (Table). DDI, ADDI, and NDDI are analogous to their in vivo counterparts. Three drug-enzyme interaction relationships were additionally defined: DEI, ambiguous DEI (ADEI), and non-DEI (NDEI). C1, C2, and C4 remain the same for in vitro DDIs. The main difference is in C3, where Ki or IC50 (inhibition) or EC50 (induction) is used to define the DDI relationship quantitatively. The priority rank of in vitro PK parameters is Ki > IC50. Table presents eight examples of how DDIs or DEIs were determined in sentences.
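The sketch below shows one possible reading of these rules, under stated assumptions. The priority rank selects which reported parameter C3 examines. A relationship type is then assigned only when C1 and C2 hold and the type-specific numeric (C3) and linguistic (C4) evidence agree.

The actual numeric thresholds are in the paper's table and are not reproduced here. The 0.8-1.25 no-effect range is a hypothetical stand-in borrowed from a common default no-effect boundary. This sketch derives only DDI or NDDI from C3; ambiguous labels would come from rules not shown. All function names are hypothetical.

```python
IN_VIVO_PRIORITY = ("AUC", "CL", "t1/2", "Cmax")
IN_VITRO_PRIORITY = ("Ki", "IC50")


def pick_parameter(reported, priority):
    """Return the highest-priority parameter present in `reported`, or None."""
    return next((name for name in priority if name in reported), None)


def c3_in_vivo(reported, no_effect=(0.8, 1.25)):
    """HYPOTHETICAL numeric rule: compare the fold change (with / without the
    interacting drug) of the highest-priority parameter against a no-effect range."""
    name = pick_parameter(reported, IN_VIVO_PRIORITY)
    if name is None:
        return None
    low, high = no_effect
    return "NDDI" if low <= reported[name] <= high else "DDI"


def label_pair(c1, c2, c3_label, c4_label):
    """C1 and C2 must hold, and the numeric (C3) and linguistic (C4) evidence
    must indicate the same relationship type."""
    if c1 and c2 and c3_label is not None and c3_label == c4_label:
        return c3_label
    return None


if __name__ == "__main__":
    # AUC outranks Cmax, so the AUC fold change (hypothetical value 3.1) decides.
    c3 = c3_in_vivo({"Cmax": 1.1, "AUC": 3.1})
    print(label_pair(True, True, c3, "DDI"))  # DDI
```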
Krippendorff's alpha was calculated to evaluate the reliability of the annotations from the four annotators. The frequencies of key terms, DDI sentences, and DDI pairs are presented in Table. Their Krippendorff's alphas are 0.953, 0.921, and 0.905, respectively. Note that the total number of DDI pairs counts the drug pairs within each DDI sentence, summed over all DDI sentences.
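For readers unfamiliar with the statistic, the sketch below computes Krippendorff's alpha for nominal labels using the coincidence-matrix formulation: alpha = 1 - (n - 1) · Σ_{c≠k} o_ck / Σ_{c≠k} n_c n_k.

How annotation units were defined for key terms, sentences, and pairs is not described here. The sketch assumes each unit is one candidate item that carries one label per annotator. The function name and the example labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per annotated item, holding the label each annotator gave
    (missing annotations simply omitted). Items with fewer than two labels are
    not pairable and are skipped.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):  # ordered pairs from different annotators
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()  # n_c: how often each label value occurs among pairable values
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical labels from four annotators for three candidate DDI pairs.
    print(krippendorff_alpha_nominal([
        ["DDI", "DDI", "DDI", "DDI"],
        ["NDDI", "NDDI", "NDDI", "ADDI"],
        ["ADDI", "ADDI", "ADDI", "ADDI"],
    ]))
```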
Annotation performance evaluation
The PK corpus was constructed as follows:

1. Raw abstracts were downloaded from PubMed in XML format.
2. The XML files were converted into the GENIA corpus format, following the gpml.dtd from the GENIA corpus. Sentence detection in this step was performed with the Perl module Lingua::EN::Sentence, downloaded from the Comprehensive Perl Archive Network (CPAN).
3. The GENIA corpus files were tagged with the three levels of PK and DDI annotation described above.
4. A cascading style sheet (CSS) was implemented to display different entity types in different colours, allowing users to visualize annotated entities.

We acknowledge that a DDI corpus was recently published as part of the DDIExtraction 2011 text mining competition (http://labda.inf.uc3m.es/DDIExtraction2011/dataset.html). Its DDIs were clinical-outcome oriented rather than PK oriented, and they were extracted from DrugBank rather than from PubMed abstracts. Our PK corpus therefore complements that corpus well.
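The conversion step can be sketched in Python with the standard library. The sketch reads PubMed efetch XML, pulls each citation's PMID, title, and abstract, and writes a GENIA-style `<set>`/`<article>`/`<sentence>` structure.

Three details are assumptions: the exact GENIA element layout mirrored here, the "MEDLINE:" identifier prefix, and the naive regular-expression sentence splitter. The splitter is a simplified stand-in for Lingua::EN::Sentence, which handles abbreviations far better. Annotation tagging and the CSS step are not shown.

```python
import re
import xml.etree.ElementTree as ET

# Naive splitter (an assumption): break after ., ! or ? when followed by whitespace
# and an upper-case letter or digit.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


def _text(element):
    """All text inside an element, including inline children such as <i>."""
    return "".join(element.itertext()) if element is not None else ""


def pubmed_to_genia(pubmed_xml):
    """Convert PubMed efetch XML into a GENIA-style <set> with <sentence> elements."""
    root = ET.fromstring(pubmed_xml)
    out = ET.Element("set")
    for citation in root.iter("MedlineCitation"):
        article = ET.SubElement(out, "article")
        info = ET.SubElement(article, "articleinfo")
        ET.SubElement(info, "bibliomisc").text = "MEDLINE:" + citation.findtext("PMID", "")
        title = _text(citation.find("Article/ArticleTitle"))
        abstract = " ".join(_text(t) for t in citation.iter("AbstractText"))
        for tag, text in (("title", title), ("abstract", abstract)):
            section = ET.SubElement(article, tag)
            for sentence in split_sentences(text):
                ET.SubElement(section, "sentence").text = sentence
    return ET.tostring(out, encoding="unicode")
```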