Introduction

In recent years, virtual screening (VS) has emerged as a complementary method to high-throughput screening of large chemical databases in terms of time-efficiency and cost-effectiveness [1, 2]. VS methods are designed for searching large compound databases in silico and selecting a limited number of candidate molecules for testing to identify novel chemical entities that have the desired biological activity [3]. There are two fundamental approaches to virtually screen databases for molecules fitting either a known pharmacophore [4, 5] or a 3D structure of the macromolecular target [612]. The power of pharmacophore-based methods for lead generation lies in their ability to suggest a diverse set of compounds potentially possessing a desired biological activity but which have totally different chemical scaffolds [13]. Structure-based VS searches a database for compounds that fit into the binding site of the target, based on their chemical feature matching and shape complementarity. Due to the exploitation of structural information, like shape and volume, taken directly from the active site, structure-based methods are more efficient than the pharmacophore-based methods [14]. However, structure-based studies require 3D knowledge of the target. Moreover, structure-based studies are affected by the inaccuracy of the molecular-docking methods [15, 16] that are, in turn, dependent on the computational algorithms for conformational sampling and scoring of different ligand-binding poses [8, 12, 17].

G-Protein coupled receptors (GPCRs) are major targets for drug discovery in the post-genomic era. They constitute a superfamily of membrane receptors of utmost importance in pharmaceutical research [18]. According to a recent analysis, GPCRs account for 40–50% of all current drug targets, with 26 out of the 100 top-selling drugs targeting this protein family [1921]. GPCRs correspond to the largest gene family in the human genome, comprising about 5% of human genes [22, 23]. Thus, there is high interest in developing new lead compounds for most human GPCRs. Given the difficulty in elucidating the structure of these membrane proteins at the atomic level, traditional computer-aided drug design for GPCRs heavily rely on ligand-based modeling techniques [24, 25]. With the determination of X-ray crystal structure from two structurally related proteins, bacteriorhodopsin [26, 27] and bovine rhodopsin [2831], homology modeling and structure-based approaches to GPCR drug discovery were made possible. Most of the modeled GPCR structures were only used to explain the binding of known agonists/antagonists retrospectively and their use in structure-based drug design was not common until recently. The probable reason for this may be the fact that the available rhodopsin structures are significantly different from each other, they have rather low sequence homology with other GPCRs, and interpreting data from mutagenesis experiments with respect to a GPCR has been problematic [32]. Therefore, it may be assumed that comparative modeling alone will not afford accurate structural information on individual receptors required for ligand design and in silico screening. Despite this assumption, reports of such models being successfully applied in VS protocols provides some support towards the effectiveness of homology-based molecular modeling in identifying novel lead molecules [3337].

Opioid receptors belong to the rhodopsin subclass within the GPCR superfamily [38]. The superfamily is characterized by a common structural motif of seven transmembrane-spanning (7TM) helices connected by intracellular (ILs) and extracellular loops (ELs) [39, 40]. The opioid receptors are ∼60% identical to each other with greatest homology in the TM regions and the greatest diversity in their N- and C-termini and the ELs [41]. A major clinical application of opioid ligands has been as potent analgesics. However, undesired effects associated with opioids limit their wider use in the clinical management of pain.

Pharmacological studies suggest that multiple opioid receptor subtypes, namely, the mu opioid (MOP), the kappa opioid (KOP), the delta opioid (DOP) [4250], and the more recently discovered NOP [51, 52] (also called LC132, ORL-1, or nociceptin/orphanin FQ receptor), might be responsible for the pharmacological effects of the opioid drugs, including the regulation of pain. Each receptor mediates unique pharmacological responses and is differentially distributed in the central and peripheral nervous systems. Particularly, activation of the KOP receptor produces many effects such as spinal antinociception, sedation, dysphoria, and diuresis [5355]. Ligands targeting the KOP receptor have been of clinical interest due to their ability to provide effective pain control without the dependence and respiratory depression that is associated with MOP receptor activation by morphine (Fig. 1; 2) [53]. On one hand, selective KOP receptor antagonists are being explored for their effects in the treatment of a wide variety of areas including cocaine addiction, depression, and feeding behavior, and have been proposed as a treatment for psychosis and schizophrenia [54]. On the other hand, selective KOP agonists such as U50,488H (Fig. 1; 4) find potential applications as analgesics, diuretics, dysphorics, and as antipruritic agents [55].

Fig. 1
figure 1

Structures of salvinorin A (1), morphine (2), fentanyl (3), and U50,488H (4)

Salvinorin A, a neoclerodane diterpene (Fig. 1; 1), has been identified as the most potent naturally occurring opioid agonist yet discovered with high selectivity and affinity for KOP receptor [56, 57]. It has no significant activity at other known opioid receptors, serotonergic, dopaminergic, adrenergic, muscarinic, cannabinoid and sigma receptors as well as various ionotropic and metabotropic receptors [5860]. It is isolated from the leaves of widely available psychoactive plant, Salvia divinorum [61, 62]. This plant has been used in traditional, spiritual and ethnopharmacological practices by the Mazatec Indians of Oaxaca, Mexico to produce a hallucinogenic effect [56, 61, 63, 64]. Additionally, salvinorin A, being significantly more efficacious than U50,488H or U69,593 and similar in efficacy to dynorphin A (the endogenous peptide ligand for kappa opioid receptors), may be used to treat heroin, cocaine, alcohol and amphetamine dependency, depression, and even excessive marijuana use [56].

Historically the protonated amine, common to all opioid ligands, was considered essential to interact with the side chain of a conserved aspartic acid residue located in TM3 of the opioid receptors to elicit activity [65]. However, salvinorin A, lacking this nitrogen atom, common to all other classes of opioid receptor ligands, is unable to interact in the generally accepted manner and therefore, is the only known non-nitrogenous agonist identified at the KOP receptor [56]. Thus, it is not only structurally distinct from naturally occurring hallucinogens but other classes of KOP receptor agonists such as morphinans, benzomorphans, arylacetamides and 4-phenylpiperidines (Fig. 1). It can cross the blood brain barrier [66] and reportedly, has low toxicity in rodents [67, 68]. Given this, salvinorin A represents a promising lead compound for the development of more potent and selective KOP receptor ligands. However, due to the reported hallucinogenic activity and short duration of action in humans [69], the potential of salvinorin A, itself, as an analgesic is limited. Efforts towards the synthesis and evaluation of structurally related analogs that may have longer lasting analgesic properties and be devoid of psychotropic activity have been reported by several groups and may prove useful as potential therapeutic agents in the treatment of pain and other medical conditions.

The aim of this study was to develop a combined ligand-based and target-based drug design strategy in order to identify novel molecular scaffolds as potential lead molecules for the KOP receptor. Our computational approach is summarized in Fig. 2. The idea was to develop a robust VS model that would combine the speed of pharmacophore modeling and the accuracy of structure-based docking. Additionally, the information obtained from the putative binding site of the generated receptor model may facilitate the identification of key residues and physicochemical properties that may be the molecular determinants for ligand recognition. Simultaneously, structure–activity relationships (SAR) among available ligands for a given receptor may afford equivalent pharmacophores. It is understood that combining ligand-based and target-based models, developed independently from each other, may offer a self-consistency check for spatial complementarities between them.

Fig. 2
figure 2

Flowchart describing our combined target-based and ligand-based drug design approach

To achieve this, a series of molecular modeling studies including automated pharmacophore hypothesis generation and comparative protein modeling was carried out. A systematic study was initiated to develop a pharmacophore hypothesis that could serve the following two purposes: (a) to identify new molecular entities (NMEs) by using a 3D query to search chemical databases and (b) as a predictive tool for estimating biological activity of virtual compounds or compounds designed de novo on the basis of structure–activity analyses. Due to the absence of direct structural information of the KOP receptor–ligand complex, a comparative model of the hKOP receptor was also generated and validated to allow the computational exploration of potential ligand–receptor interactions at the atomic level. Mutational and ligand binding data reported in the literature was also considered to obtain models in order to generate more relevant geometry of the protein binding site [70]. This hKOP receptor model is expected to provide a structural basis for rational drug design methodologies and a tool for large-scale VS of chemical databases.

Materials and methods

Computational resources

Molecular modeling was performed on a Silicon Graphics Octane2 R12000 dual processor workstation running on Irix 64 (SGI, 1600 Amphitheater Parkway, Mountain View, CA 94043). Pharmacophore Modeling was performed using Catalyst 4.8 [71]. Sequence alignment was performed using CLUSTALW 1.8 [72] and Align123 in InsightII 2005/HOMOLOGY module [73]. Comparative Modeling was performed using InsightII/MODELER. Energy minimization and molecular dynamics simulations were performed using InsightII/DISCOVER module. Ramachandran plots and statistics were generated by PROCHECK 3.4 [74]. For the purpose of docking, GOLD 2.1 (Genetic Optimization for Ligand Docking) was used [75]. Visualization of docking results was performed using graphics facility of Sybyl 7.2 [76].

Biological activity data and training set selection

The activity data for salvinorin A analogs, represented as K i in nM, were obtained from the literature [57, 66, 7780]. The chemical structures of the agonists are listed in Charts 1 and 2. The datasets were divided into a training set and a test set. The selection of a suitable training set is critical for the quality of automatically generated pharmacophore models [81]. To ensure the statistical relevance of the calculated model, the training set should contain a set of diverse compounds together with their activity values. These should originate from comparable binding assays and spread equally over at least 4–5 orders of magnitude. Each selected compound should add some new information to the model while avoiding redundancy and bias both in terms of structural features and activity range. The most active compounds should be included so that they provide information on most critical features required as a pharmacophore. Compounds that are inactive due to steric hindrance must be avoided since they cannot be handled by the currently available features in the Catalyst. On the basis of the above criteria, 15 compounds for the training set and 12 compounds for the test set were selected.

Chart 1
figure a

Structures of compounds in training set. All structures were drawn using ChemDraw Ultra 8.0

Chart 2
figure b

Chemical structures of test set compounds. All structures were drawn using ChemDraw Ultra 8.0

For estimation (prediction) purposes, the activity values were classified as follows: K i (nM) ≤ 1 nM means the compounds are highly active (represented as +++); 1 nM < K i (nM) ≤ 100 nM means the compounds are moderately active (represented as ++); K i (nM) > 100 nM means the compounds are inactive (represented as +). This classification scheme was created on the basis of the fact that salvinorin A, the lead compound had an activity (K i) of ∼1 nM, and subsequent optimization of this molecule has generated potent derivatives indicating that activity can reach subnanomolar levels. Therefore, a stricter cut-off value of less than 100 nM was chosen to consider any compound to be active. Because one of the primary objectives of pharmacophore generation is to utilize it for screening 3D drug-like chemical databases to identify lead compounds, this classification scheme is more consequential than actual prediction values and may help in the identification of novel, structurally diverse lead compounds.

Generation of pharmacophores

A pharmacophore model in Catalyst is generally referred to as a “hypothesis”, which consists of a collection of features necessary for the biological activity of the ligands oriented in 3D space [82]. In order to generate a pharmacophore, all molecules (both training and test sets) were built and minimized within the Catalyst. Conformation models for all molecules were generated using the quasi-exhaustive Catalyst/ConFirm module within the software, using the “best quality” conformational search option. A maximum of 250 conformations were generated using CHARMm force field parameters [83] and a constraint of 20 kcal/mol energy thresholds above the global energy minimum. Catalyst selects conformers using the Poling algorithm [84], that penalizes any newly generated conformer if it is too close to an already formed conformer in the set. This method ensures maximum coverage in conformational space. All other parameters were set to the default settings. An initial analysis revealed that hydrogen bond acceptor (HBA), hydrophobic (HYD), and ring aromatic (RA) features could effectively map all critical chemical/structural features of all the training set molecules. During the initial phase of the hypothesis generation, it was observed that only two features, i.e. HBA and HYD, out of those three mentioned above dominated most of the useful hypotheses generated by the Catalyst. Also, the maximum number of HBA features in those hypotheses was never more than two. Therefore, those two features were used to generate 10 pharmacophore hypotheses from the training set, using a default uncertainty value of 3. The uncertainty value represents a ratio range of uncertainty in the activity value based on the expected statistical straggling of biological data collection. The Catalyst/HypoGen module can only generate a maximum of five features for a hypothesis. The best model was selected on the basis of a high correlation coefficient (r 2), lowest total cost, and rms values.

In general, HypoGen tries to construct a pharmacophore that correlates best the 3D arrangement of features in a given set of training compounds with the corresponding pharmacological activities. The resulting hypotheses are three-dimensional arrangements of several default feature types (e.g., HBA, hydrogen bond donor (HBD), HYD, RA, positive ionizable (PI)) located at defined positions (location constraints). These are surrounded by certain spatial tolerance spheres, assessing the area in space that should be assigned by the corresponding chemical functions of the matched molecule. Each of the features occupies a certain weight that is proportional to its relative contribution to biological activity. HBA, HBD, and RAs additionally include a vector, defining the direction of the interaction.

The HypoGen algorithm tries to find hypotheses that are common among the active compounds of the training set but do not reflect the inactive ones. The overall assumption is based on Occam’s razor, which states that between otherwise equivalent alternatives, the simplest model is best [85]. Therefore, constructed models correlate best with measured activities and consists of as few features as possible. During a HypoGen run, three phases are passed through: (a) pharmacophores that are common among the most active compounds are elaborated in the constructive phase, (b) those pharmacophores that fit the inactive training set members are abolished in the subtractive phase, and (c) the remaining hypotheses are refined in the optimization phase. Thereby random translations of features, rotations of vectored features, and the removal or addition of features from the models are performed. Each perturbation is evaluated by consideration of three cost components, the Error, the Configuration, and the Weight Cost. By default, the 10 lowest cost hypotheses are written to the output file.

Each feature of a hypothesis represents a certain order of magnitude of the compounds’ activity. With the default setting of 0.302, the represented orders of magnitude are kept as close to 2 as possible. The weight component is a value that increases in a Gaussian form as these feature weight in a model deviate from the idealized value of 2. The configuration cost (also known as entropy cost) penalizes the complexity of the pharmacophore hypothesis space while the error cost penalizes the deviation between the estimated activities of the training set and experimentally determined values. The two additional theoretical cost calculations are the fixed cost and the null cost. The fixed cost represents the simplest model that fits all the data perfectly and is calculated by the minimum achievable error and weight cost in addition to the constant configuration cost. The null cost represents the maximum cost of a pharmacophore with no features and estimates activity data of every molecule in the training set as the average of all activities in the set. Therefore, the contribution from the weight or configuration component does not apply. Its absolute value is equal to the maximum occurring error cost [86].

Assessment of the quality of pharmacophore hypotheses

Cost function analysis

The quality of the generated hypotheses was evaluated by considering the cost functions (represented in bits unit) relative to the null and fixed hypothesis calculated by the Catalyst/HypoGen module during hypothesis generation. The total cost of any hypothesis should be toward the value of the fixed cost to represent any useful model. It has been suggested in the Catalyst that the differences between the cost of the generated hypothesis and the null hypothesis cost should be as large as possible; a value of 40–60 bits difference may indicate that most probably it has a 75–90% chance of representing a true correlation in the data set used. Furthermore, the configuration cost for any generated hypothesis should be less than or equal to 17 (corresponds to 217 pharmacophore models). Any value higher than 17 may indicate that the correlation from any generated pharmacophore is most likely due to chance since Catalyst cannot consider more than these models in the optimization phase and so the rest are left out of the process. The configuration cost value can be reduced by limiting the minimum and maximum features. The error cost should also be low; this value increases as the value of the rms increases. The rmsd represent the quality of the correlation between the estimated and the actual activity data.

Cross-validation test

To further assess the statistical significance of the pharmacophore hypotheses generated from the training set molecules a validation procedure, based on Fischer’s randomization test [87] was applied. The point of this test is to establish the strong correlation between chemical structures and biological activity. The activity values of the training set molecules are scrambled randomly using the CatScramble technique, available in the Catalyst/HypoGen module, and new spreadsheets are created. The number of spreadsheets generated depends on the level of statistical significance one wants to achieve. For a 95% confidence level, 19 spreadsheets are created. For 98% and 99% confidence levels, 49 and 99 spreadsheets, respectively, are created. In our validation test, we selected the 95% confidence level, and 19 spreadsheets were created by the CatScramble command.

Comparative modeling

Multiple sequence alignment

The peptide sequence of the hKOP receptor was extracted from the SWISSPROT protein sequence data bank (http://us.expasy.org/sprot/; accession no. P41145) [88]. The recently determined 2.2 Å X-ray structure of the bovine rhodopsin (PDB ID: 1U19) [31] available from the Protein Data Bank (PDB) [89] was used to build the hKOP receptor model. Three additional class A GPCR sequences, the human DOP (accession no.: P41143), MOP (accession no.: P35372), and NOP (accession no.: P41146) were used to obtain multiple sequence alignment. Sequence alignment was then derived with the CLUSTALW 1.8 package [72] and align123 in InsightII/HOMOLOGY module using the BLOSUM matrices [90] for scoring the alignments. The obtained alignment was then manually refined and inspected for structural integrity in the structurally conserved regions present in TM helices across all the sequences.

Generation of comparative model

The resulting alignment was used as an input for the automated homology-modeling program MODELER v.7.0 [91, 92] within the InsightII software suite. This program assigns atomic coordinates to regions structurally aligned with the template, builds intervening loops, optimizes the rotamers of amino acid side chains, and performs an initial energy optimization of the structure. MODELER generates protein 3D structures by satisfying spatial restraints imposed by the sequence alignment with the template structure. These spatial restraints consist of homology-derived restraints on the distances and dihedral angles in the target sequence based on its alignment with template structures, stereochemical restraints such as bond length and bond angle terms obtained from CHARMm22 forcefield [83], and statistical preferences for dihedral angles and non-bonded inter-atomic distances obtained from a representative set of protein structures. These restraints are expressed in terms of Probability Density Functions (PDF). A 3D protein model is obtained by optimizing the probability density function with the variable target function procedure in cartesian space that employs methods of conjugate gradients and molecular dynamics with simulated annealing. To guarantee sufficient conformational sampling of each active site residue, several homology models are generated in this step.

Initial model refinement

Hydrogen atoms were added to the selected model. By explicitly specifying the pH as 7.0, the protonation states for all the amino acid side chains sets in accordance with their typical pK a values. A consistent valence forcefield (CVFF) parameters, a distance dependent dielectric constant and a non-bond cut-off distance of 10.0 Å was employed. The initial model was energy-minimized, employing the InsightII/DISCOVER module, in a stepwise fashion following a standard procedure consisting of 500 steps of steepest descent and 1,000 steps of conjugate gradient minimization with an rms gradient of the potential energy of 0.001 kcal/mol·Å in each step. In the first step, all atoms other than hydrogens and terminal oxygen atoms were constrained. This was followed by restraining the backbone atoms allowing energy minimization of side chains of the protein. In the third step, the α-carbons of the entire model were held fixed. Finally, α-carbons of only the TM regions were held fixed while allowing free movement of loop atoms to optimize their geometry.

Generation and refinement of “agonist-bound” state model

After prepositioning the ligand in the receptor cavity according to the known SAR information [57, 66, 7780] and mutation data [93, 114], the protein–ligand complex was minimized to relieve steric clashes between the non-bonded atoms. The minimization protocol for all calculations consisted of an initial minimization using 100 steps of steepest descent and 1000 steps of conjugate gradient algorithm. Omega (ω) torsions for all protein residues were forced to trans configuration using a force constant of 100 kcal/mol Å2. Notably, the Phi–Psi (ϕ-ψ) torsions of residues of the TM domains were restrained using a force constant of 100 kcal/mol Å2. An NVT (constant-volume/constant-temperature) molecular dynamics simulation using Andersen thermostat [94] was carried out where the resulting complex was gradually “heated” from a temperature of 100–600 K in steps of 100 K with 10 picoseconds (ps) simulation at each stage. At 600 K, the complex was simulated for another 50 ps followed by gradual “cooling” to 300 K in steps of 100 K with 10 ps at each stage. Finally, a 50 ps simulation at 300 K with a time-step of 1 femtosecond was carried out. The averaged structure obtained in the last 25 ps trajectory of MD was reminimized until an rms of 0.05 kcal/mol Å was reached. During this minimization procedure, the restraints put on the TMs were reduced gradually. When the restraints were finally removed, the structure was submitted to the final minimization until an rms of 0.1 kcal/mol Å was reached.

Automated docking simulations

Docking runs were performed using the GOLD 2.1 (Genetic Optimization for Ligand Docking) [76, 95] program that uses a genetic algorithm (GA) to explore the full range of ligand conformational flexibility and the rotational flexibility of selected receptor hydrogens. The mechanism for ligand placement is based on fitting points. The program adds fitting points to hydrogen-bonding groups on the protein and the ligand and maps acceptor points in the ligand on donor points in the protein and vice versa. Additionally, GOLD generates hydrophobic fitting points in the protein cavity onto which ligand CH groups are mapped. The GA optimizes flexible ligand dihedrals, ligand ring geometries, dihedrals of protein OH and NH +3 groups, and the mappings of the fitting points. The docking poses are ranked based on a molecular mechanics like scoring function, which includes a hydrogen bond term, a 4–8 intermolecular van der Waals terms, and a 6–12 intramolecular van der Waals term for the internal energy of the ligand.

The protein input was prepared by removing salvinorin A from the final protein complex. Overall 15 ligands with known KOP receptor binding affinities [57, 66, 7780] that spanned an activity range of 4 log orders were built and minimized for 1,000 steps each of steepest descent followed by conjugate gradient and finally by the BFGS method to a gradient of 0.001 kcal/mol Å or less using the InsightII/BUILDER module. All atom types and charges were assigned in GOLD. Standard default settings were used for all calculations. GOLD was allowed to terminate when the top three dockings were within 1.5 Å of each other. The binding pocket was defined as a spherical region of 12 Å radius centered from the side-chain hydrogen atom of amide of Gln115 (Q2.60). The selection of the most appropriate docking solutions was based on a suitable placement of the ligand with respect to key interactions with the binding site residues.

Numbering of residues and residue index

Residues are numbered according to their positions in the sequences retrieved from SWISS-PROT database [88]. A general index numbering system first described by Ballesteros and Weinstein [96] that applies to all rhodopsin-like GPCRs has been used. Each residue is identified by the TM number (1–7), the most conserved residue within each TM is assigned the number ‘50’ and the positions of all other residues are numbered relative to the most conserved amino acid in each TM segment. This indexing simplifies the identification of corresponding residues in different GPCRs.

Results and discussion

Pharmacophore generation

A set of 10 pharmacophore hypotheses were generated using 15 training set compounds listed in Chart 1. The results of the hypotheses, which include different cost values calculated during hypotheses generation along with rmsd, correlation (r 2), and pharmacophore features, are listed in Table 1. The value of total cost of each hypothesis was close to the fixed cost values, which is expected for good hypotheses. The entropy (configuration cost) values of the hypotheses were also within the allowed range. The difference between null hypothesis and the fixed cost and the total cost of the best hypothesis were 35.79 and 30.90 bits, respectively. These values are lower than the recommended values, which may be due to the high structural homology as well as rigidity of the molecules in the training set molecules. Hypotheses 1 and 2 had the best values in terms of total cost, error cost, rms differences, and the highest correlation and utilized identical features, namely, two HBAs and three HYDs. Hypothesis 2 estimated the activities almost similarly (data not shown) to hypothesis 1. Therefore, hypothesis 1 (Fig. 3) was selected as the best pharmacophore. Table 2 shows the actual and estimated K i values of training set compounds calculated on the basis of hypothesis 1. It is notable that only one moderately active (++) compound was estimated to be inactive (+).

Table 1 Results obtained from pharmacophore hypothesis generation using the training set moleculesa
Fig. 3
figure 3

(A) A representation of the top-ranked hypothesis. (B) Mapping of the most active compound (5) onto the selected pharmacophore hypothesis. The green and blue contours represent HBA and hydrophobic features (HYD), respectively

Table 2 Actual and estimated activities of training set molecules calculated on the basis of the top-ranked hypothesis

Pharmacophore assessment

The quality of the pharmacophore was assessed using the CatScramble technique in the Catalyst. The purpose of using this technique is to randomize the activity data among the training set compounds and to generate pharmacophore hypotheses using the same features and parameters used to develop the original pharmacophore hypothesis. If the randomized sets generate pharmacophores with similar or better cost values, rms, and correlation, then the original pharmacophore can be considered as generated by chance. The results of the randomization tests are listed in Table 3. The results obtained from the CatScramble runs indicated that hypotheses generated after randomization had no predictive value. This cross-validation technique provided confidence on the pharmacophore generated from the training set molecules.

Table 3 Results from cross-validation run using CatScramble

Validation of pharmacophores

The selected pharmacophore was further validated by two techniques: (a) by assessing the predictive ability of the pharmacophore on test set molecules; (b) by incorporating an external set of negative controls consisting of five marketed drugs acting on the central nervous system, which target different G-protein coupled receptors. This validation should confirm that the pharmacophore does not predict those drugs as highly active (represented as +++).

Validation of pharmacophore using test set compounds

The validity of any pharmacophore model needs to be determined by applying that model to the test set to find out how correctly the model predicts the activity of the test set molecules and, most importantly, whether it can identify active and inactive molecules correctly. The selected pharmacophore was validated with a test set containing 12 compounds obtained from the same laboratories [57, 66, 7780] as that of the training set compounds. This validation gives additional confidence in the usability of the selected pharmacophore. The estimated activities were scored using hypothesis 1 and reported in Table 4 (see supporting information, Fig. S1). Out of 12 compounds, 10 were accurately predicted. Only two moderately active compounds were classified as inactive. The selected pharmacophore clearly showed minimal failure in classifying compounds correctly. One of the most potent compounds reported (5 in Chart 1 and Table 2) was chosen to show mapping onto the selected hypothesis, as shown in Fig. 3. The “best fit” option was selected in all cases.

Table 4 Actual and estimated activities of test set molecules calculated on the basis of the top-ranked hypothesis

Validation with negative controls

None of the five-marketed drugs mapped onto the selected pharmacophore (see supporting information, Table S1) and were estimated to be inactive (+) thus, confirming the validity of the model in a negative test.

This validation corroborated the effectiveness of the derived pharmacophore model, which most likely will be useful in identifying NMEs that target hKOP receptor. Additionally, this pharmacophore may also be used in de novo design to develop totally novel drugs that satisfy the pharmacophoric requirements. Furthermore, it can guide chemists to synthesize new compounds during lead optimization.

Generation of KOP receptor model

Homology modeling has proved to be a very powerful technique, even when the template and the target are distantly related [97]. However, although a common general structure is likely to be shared by all GPCRs, including the KOP receptor, specific differences do exist between rhodopsin and other receptors. To date, bovine rhodopsin is the only GPCR for which an experimental structure has been elucidated and therefore, is the only suitable template for the construction of homology models of other members of the superfamily. This structure however, corresponds to an antagonist-bound conformation where retinal is covalently bound [28]. Homology models constructed from this structure are not directly suitable for docking purpose for a variety of reasons. For example, GPCRs are known to adopt different conformations based on the type of bound ligand (agonist/antagonist) [40, 98, 99]. In addition, the modeled binding site is too narrow which is mostly due to the rather flat nature of the retinal, and also because the E2 loop in bovine rhodopsin closes tightly upon the center of the receptor that restricts the access of the larger GPCR ligands into the active site. So far, a 3D structure of a GPCR in its activated state is not available, however, several empirical models of the agonist-bound state have been generated using constraints from site-specific spin labeling and mutation studies [100103], NMR [104] and cross-linking studies [105]. Although such structures need detailed verification, there are reports of successful VS of large compound databases for drug design projects in the literature [106, 107]. Several homology-modeling studies of the KOP receptor have been reported in the literature [108]. While the previously reported KOP receptor models offer insights into the molecular pharmacology of the receptor, they lack detail for purposes of drug design. This warranted the generation of a KOP receptor model that not only addresses the possible binding modes of agonists in the receptor binding site to afford a better understanding of the molecular recognition process at the KOP receptor but can serve as a tool for structure-based drug design. Furthermore, to generate a model closer to the agonist-bound state (activated state) the model was refined by minimizing the protein with a full agonist to achieve expansion of the active site.

Multiple sequence alignment

Sequence alignment is a central technique in homology modeling used to establish a one-to-one connection between the amino acids of the reference and the target protein [109]. It is particularly critical in the case of GPCRs where the only structural template available is that of bovine rhodopsin and the sequence identity with the model sequence falls below 19.5% in most cases [110], which is inappropriate for homology modeling. However, class A GPCRs are an exception owing to the fact that each TM contains one or two highly conserved residues that allows an explicit alignment [111], that may be combined with manual modifications to obtain a fairly accurate alignment. Also, the problem of low sequence identity may be addressed by performing a multiple sequence alignment of the retrieved protein sequences and that obtained from template bovine rhodopsin. This also helps to minimize the errors originating from the availability of a single template structure. To this end, sequences from four opioid receptors were used during the alignment process to improve the low confidence regions in the alignment and also point out potential areas of low reliability in the homology model. All four opioid receptors contain highly conserved fingerprint residues of family A receptors, such as Asn46 (N1.50) in TM1, Asp74 (D2.50) in TM2, Arg126 (R3.50) in TM3, Trp153 (W4.50) in TM4, Pro207 (P5.50) in TM5, Pro255 (P6.50) in TM6, and Pro299 (P7.50) in TM7; Fig. 4 [112]. In addition, all four opioid receptors possess two conserved cysteine residues in the first and second ELs, that probably form a disulfide bridge which is implicated in constraining the conformation of the extracellular domains of the proteins and stabilizing the TM domains [113]. All of these highly conserved residues were carefully aligned with corresponding residues of the bovine rhodopsin, as well as the other sequences.

Fig. 4
figure 4

Multiple sequence alignment of the hKOP receptor with the bovine rhodopsin (P_1U19) and the other class A GPCR sequences. Each TM is highlighted in a different color. The most conserved residue for each TM helix across all the species is enclosed in a vertical box colored cyan. The conserved GPCR motifs have been indicated by cyan dots

The global sequence identity between the KOP receptor and bovine rhodopsin is found to be approximately 19%. The sequence identity and similarity values for each TM domain are given in Table 5. Although the sequence identity between the TMs of the modeled receptor and the template is only between 12 and 29%, the model is believed to share the same fold as the template. This is supported by the fact that the modeled GPCRs include several conserved motifs with the structurally and functionally important amino acid residues either highly conserved or substituted by amino acids of high similarity (the LAxxD on TM2, the D(E)RY on TM3, and the NPxxY on TM7) [14]. The final multiple sequence alignment used for model building is shown in Fig. 4. Therefore, it may be assumed that the generated model is of higher accuracy in the TMs as compared to the loop regions, which are divergent and flexible to a greater extent. However, since the binding site is located in the TM domains, the uncertainty in the loop regions as well as the N- and C-terminal domains was not considered of much consequence to ligand screening.

Table 5 Sequence identity and similarity between conserved regions of the hKOP receptor and the bovine rhodopsin

Model building procedure and initial model refinement

The 3D structure of hKOP receptor was generated using the MODELER program interfaced with the InsightII software based on the recently published bovine rhodopsin crystal structure (PDB ID: 1U19; 2.2 Å resolution) as the template structure. The input parameters of MODELER were set to generate six models including three loop refinements, with the level of optimization set to high and the manual-definition disulfide bridges assumed to exist in the extracellular region by homology to hKOP receptor [113]. Amongst the six different models generated, the one showing the lowest energy, restraint violations, and the number of main chain and side chain bad conformations was selected for further refinement, see Table 6. Since it is known that N-terminal does not contribute significantly toward the binding of KOP receptor ligands, some of the overhanging residues in this region were truncated [93, 114].

Table 6 Analysis of the hKOP receptor models using MODELER and PROSTAT

The quality of the initial model was improved by subjecting it to a crude energy minimization protocol as detailed in the Materials and methods section. These minimizations helped relieve any steric clashes or improper geometries in the protein structure and produced a model with correct bond lengths and bond angles and where individual atoms are not too close together.

Generation of “agonist-bound” hKOP receptor model

Binding of a ligand to the active site of a protein is typically associated with local and, perhaps, also global structural rearrangement of the receptor (Induced-fit effect). As a result, structure-based drug design preferentially relies on the structures of protein–ligand complexes containing bound ligands. Keeping this in mind, the next step was to develop a protein–ligand complex that would offer a more detailed and accurate picture of the interactions and structural complementarities between the ligands and the active site. Such a protein–ligand complex may be used more appropriately to perform docking simulation studies and may provide more meaningful structural models.

Therefore, to obtain the 3D coordinates of the agonist-bound state of the KOP receptor, salvinorin A was manually placed within the putative binding pocket guided by SAR, supported by experimental data [93, 114]. It should be noted that due to the lack of a protonable nitrogen atom common to all GPCR ligands, from the ligand structure, the generally accepted key anchoring interaction for agonist association (basic nitrogen forming a salt bridge with the highly conserved Asp138 (D3.11)) in TM3 for GPCRs cannot be established which makes the docking of salvinorin A challenging [115]. However, site directed mutagenesis studies did provide sufficient information of the key residues involved in binding of the ligand to the receptor that directed suitable placement of the ligand into the putative binding pocket [93, 114]. This was followed by a series of energy minimization and restrained MD simulations performed iteratively on the protein–ligand complex, the general procedure of which has been detailed in the Materials and methods section. The preferred binding mode was chosen on the basis of compatibility with mutation and binding data, as well as energetic considerations. Care was taken to maintain the conserved α-helical geometry of the TM domains during the refinement. The refined 3D structure of hKOP receptor generated by the homology modeling method is displayed in Fig. 5. The backbone rmsd between the final minimized homology model and the template crystal structure is 3.55 Å, while that between the starting homology-modeled structure and the template was only 1.58 Å.

Fig. 5
figure 5

General topology of the hKOP receptor model. The backbone of the KOP receptor is represented as oval ribbon in green. Salvinorin A, represented as a CPK model colored by atom type, is docked inside the trans-membrane recognition site (see Materials and methods section for details)

Protein structure validation

One essential requisite for a model is to have a good stereochemistry. The refined structure was evaluated for overall quality using available analysis procedures. These analyses compare specific properties of the model with those for known high quality protein structures. For this purpose two protein analysis programs were used: PROCHECK and PROSTAT.

PROCHECK is a well-known protein structure checking program that provides an assessment of the overall quality of the structure and highlight regions that may need further investigation. The phi–psi plots are shown in Fig. 6, while the more detailed results are listed in Table 7. The plot clearly shows the vast majority of the membrane-spanning amino acids are in a phi–psi distribution consistent with right-handed alpha helices. The remaining residues that fall into the random or beta configuration geometries are very short segments and are primarily in the loop regions of the protein. A quick check of amino acid side chain integrity did not reveal any amino acid side chain clashes with one another or with the backbone. Only five non-glycyl residues are outside the allowed regions. Analysis of the model reveals that these outlying residues are located far from the putative active site of the protein. Figure 6 shows that the structure is reasonable overall because the spot distribution for the homology modeled structure was similar to the standard X-ray structure of bovine rhodopsin. The results showed that our modeled structure was comparable with the standard structures.

Fig. 6
figure 6

Ramachandran plot of (A) the X-ray structure of bovine rhodopsin (B) and the homology-modeled structure of final hKOP receptor. The different colored areas indicate “disallowed” (white), “generously allowed” (light yellow), “additional allowed” (yellow), and “most favored” (red) regions (refer to Table 6)

Table 7 Results of protein structure check by PROCHECK and PROSTAT

Another protein structure analysis program, PROSTAT, was used for structure validation. The program measures all bond distances, bond angles, and dihedral angles for the given protein structure and compare them with its own database. It then lists the number of instances where structural features differ significantly from average values in known proteins. The cut-off used for significantly different was 10 standard deviations from the reference value. No spurious angle or bond length was detected in our model. The results are listed at the bottom of Table 7. Using these geometric criteria, the hKOP receptor model compares well with the X-ray structure of bovine rhodopsin.

Salient features of the model

There is a plethora of structural information available for GPCRs from a variety of experimental studies on different receptors. Since modeling of GPCRs is based on the assumption that they share considerable structural commonalities, the existence of such features may serve the purpose of validating the model. Accordingly, the final modeled structure of hKOP receptor was scrutinized for its similarity to the low-resolution structure of the bovine rhodopsin and for presence of common structural features inferred from several studies on GPCRs.

Conserved residues and motifs

A highly conserved residue in the GPCR family, Asn77 (N1.50) being a hydrophilic polar residue, is sited in the intracellular side in helix 1 and faces towards the core of the modeled structure. The LAxxD motif of GPCRs is also observed in the TM2 of the with the side chain of highly conserved Leu101 (L2.46) and Ala102 (A2.47) pointing towards the central core of the helix bundles. It is known that conserved cysteine residue in GPCRs may form an intramolecular disulfide bridge that links ELs of the receptors. In view of that a disulphide bond was observed between Cys131 (EL2) and Cys210 (C3.25) in the modeled structure. Another important motif, D(E)RY that has been connected to the regulation of interaction between the G-protein with the receptor [29], exists in most GPCRs. This microdomain is also conserved in the hKOP receptor. The side chains of both Asp155 (D3.49) and Arg156 (R3.50) are placed in hydrophobic surroundings (Thr94 (T2.39), Met152 (M3.46), Lys174 (L4.40), Ile272 (I6.33), Leu275 (L6.36) and Val276 (V6.37)). Interestingly, the charged interaction between Asp155 (D3.49) and Arg156 (D3.50) resulting in the formation of a salt bridge that serves to constrain the rhodopsin-like receptors in its inactive state [28] was not observed. This interaction was however, present in the initial model of hKOP receptor. It has been hypothesized that one of the key steps in receptor activation is breaking of this salt bridge [116, 117]. It is likely that refinement of the model with a full agonist may have induced changes allowing the receptor to assume a conformation closer to the activated state. Yet, another conserved microdomain in most GPCRs is the NPxxY motif [118]. The side chain nitrogen in Asn326 (N7.49) of this microdomain has been found to H-bond with the carboxylate of Asp105 (D2.50). Such an interaction has been demonstrated for Gonadotropin releasing hormone and Serotonin-2A receptors as well as in other GPCRs [119, 120]. Additionally, the helix–turn–helix fold of the NPxxY(x)5,6F region appears to be stabilized by a hydrophobic interaction between aromatic side chains of Tyr330 (Y7.53) and Phe337 (F7.60).

The aromatic cluster

Within the opioid receptor family, the conserved residues present in the aromatic cluster are Tyr139 (Y3.33), Phe231 (F5.43), and His291 (H6.52). The aromatic residues highly conserved in the larger family of GPCRs include Trp183 (W4.50), Phe235 (F5.47), Phe283 (F6.44), and Trp287 (W6.48), Fig. 7.

Fig. 7
figure 7

Interhelical network of aromatic-aromatic interactions in the hKOP receptor model

Interhelix hydrogen bonding network

In addition to hydrophobic interactions, an extensive interhelix hydrogen bond interaction exists between the helices that help stabilize the TM domains of the hKOP receptor. This cluster consists of several highly conserved residues across opioid receptors; Asn141 (N3.35), Ser145 (S3.39) and Thr148 (T3.42). Several residues highly conserved across all known GPCRs; Asn77 (N1.50), Asp105 (D2.50), Trp287 (W6.48), Asn322 (N7.45), Asn326 (N7.49), and Tyr330 (Y7.53) were also found within the cluster.

Binding site analysis

The binding cavity for salvinorin A type agonists appears to be formed essentially by TM2 and TM7. Mutational analysis has also suggested the binding site for the KOP receptor primarily between the amino acid residues of TM2 and TM7, implicates involvement of EL2 as well with no significant contribution from the TM1, TM3 TM5, TM6 and extracellular domains [114]. Consistent with the mutational and biochemical data [93, 114], the active site appears to be essentially hydrophobic with side chains of several hydrophobic residues (Tyr66 (Y1.39), Tyr119 (Y2.64), Ile137 (I3.33), Tyr312 (Y7.35), Tyr313 (Y7.36), Ile316 (I7.39) and Tyr320 (Y7.43)) extending into the deduced binding pocket for salvinorin A, shown in Fig. 8. As is obvious from the Fig. 5, salvinorin A appears to be positioned between the TM helices, closer to the extracellular domain of the receptor.

Fig. 8
figure 8

Salvinorin A in the putative binding pocket of agonist-bound conformation of hKOP receptor, represented as yellow ribbon. Side chains of selected residues within 5.0 Å of docked ligand are displayed. Salvinorin A is displayed as green carbon atoms. Receptor side chains are displayed as white carbon atoms

Detailed analysis of the final hKOP receptor model reveals several critical ligand–receptor contact points, which are in agreement with the known SAR and mutational data for the corresponding amino acid residues [93,114]. The ester oxygen at position 2 (between C-2 and C-21) is observed to be involved in H-bonding with the amide side-chain of Gln115 (Q2.60), shown in Fig. 9a. This residue has been shown to be one of the key amino acid residues involved in binding of salvinorin A to the hKOP receptor [114]. This is also in compliance with our ligand-based pharmacophore model where the ester oxygen at the position 2 of the ligand is implicated as HBA, refer Fig. 9b. Another hydrogen bonding interaction between the carbonyl oxygen (C-18) at position 4 of salvinorin A and Tyr313 (Y7.36) was also observed. Additionally, the methyl group (C-23) at position 4 ester shows a hydrophobic interaction with the phenol ring of Tyr313 (Y7.36) residue. Indeed, a similar interaction has been proposed by the developed pharmacophore model as well. Site directed mutation studies [93, 114] indicate that this residue (Tyr313) may be involved in hydrophobic interaction. However, the Tyr313Ala mutation results in an approximate 22-fold decreased affinity, which may be explained on the basis of the loss of both interactions established by this residue [93]. The furan ring of salvinorin A shows π-stacking interactions with Tyr320 (Y7.43) that lies at the base of the putative binding pocket. Similar inferences may be derived from the pharmacophore model. It is interesting to note that mutation studies by Kane et al. [114] suggest involvement of this residue in π–π interactions or other hydrophobic interactions, however, Yan et al. [93] in their study have implicated this residue in H-bonding with the furan oxygen of salvinorin A. A closer inspection of the putative binding pocket of our model reveals that this furan ring oxygen is 3.55 Å away from the hydroxyl functionality in Tyr320 (Y7.43) thus, it is possible that this residue may possibly be involved in H-bond formation as well. Hydrophobic interactions are also observed between Tyr119 (Y2.64) and methyl group (C19) at position 5 in salvinorin A. This interaction is consistent with the predicted pharmacophore model. It was also observed that EL2 moved away (towards extracellular region) during the refinement resulting in expanded binding pocket in order to accommodate ligands of varying sizes. Consequently, the strong correlation between the observed ligand–receptor interactions and the reported experimental data provide additional validation to the developed hKOP receptor model.

Fig. 9
figure 9

(A) Proposed binding mode and important interactions of salvinorin A in the final hKOP model. Salvinorin A is displayed as green carbon atoms. Receptor side chains are displayed as white carbon atoms. Hydrogen bonds are shown as dotted lines. (B) Mapping of salvinorin A onto the selected top-ranked pharmacophore hypothesis. The green and blue contours represent HBA and HYD features, respectively

Docking simulations

The task of docking 15 known derivatives of salvinorin A (see supporting information; Fig. S2), having agonistic activity at hKOP receptor, was undertaken to authenticate the model by calculating the correlation between the pK i values of the ligands and their respective docking scores. The GA program GOLD was used to perform automated molecular docking. The reliability of the applied docking protocol was assessed by docking salvinorin A into the active site of modeled hKOP receptor. The key characteristic of a good docking program is its ability to reproduce the experimental binding modes of ligands [121]. To test this, a ligand is taken out of the 3D structure of its protein–ligand complex and docked back into its binding site. The docked binding mode is then compared with the experimental binding mode, and an rmsd between the two is calculated; a prediction of a binding mode is considered successful if the rmsd is below a certain value (usually 2.0 Å). The rmsd between the docked binding mode and the binding mode for salvinorin A in the generated protein–ligand complex was within this cut-off limit (1.61 Å).

This protocol was then similarly applied to all ligands. For each ligand, 10 solutions were generated. Generated ligand docking solutions mutually differ by· 1.5 Å rmsd. An analysis of the best docking modes of various docked ligands exhibited common binding characteristics, which may be due to the close structural similarity existing among them. For the majority of the ligands, top-scoring poses showed relevant binding interactions including formation of the key H-bond with Gln115 (Q2.60). The plot of GOLD score versus the binding affinities (pK i) is shown in Fig. 10. As logically expected, a robust correlation (r 2 = 0.80) between the experimental binding affinity and the GOLD score was achieved. Thus, the model can explain variations in the pK i values, which, in turn, is related to binding affinities of the ligands, implying that it had been constructed with reasonable accuracy. The results from the docking simulations are self-consistent in that the best docking modes of various ligands exhibited common binding characteristics among them, supporting the validity of the model and its suitability for structure based drug design.

Fig. 10
figure 10

Correlation plot of binding affinity versus GOLD docking score for salvinorin A derivatives

Furthermore, the structure-based and ligand-based methods were quantitatively compared by plotting the predicted activities of the test set compounds utilized to validate the pharmacophore model against their respective GOLD scores obtained from docking them into the receptor model. The plot of GOLD scores versus predicted activities (pK i) for these compounds is shown in Fig. 11 (see supporting information, Table S2). Initially, a correlation of 0.67 was obtained however, after removing one outlier (compound 27), the correlation improved to 0.85, which is an excellent correlation taking into consideration that a computationally driven GPCR model was used for the docking analysis and the only scoring function used was GOLD score. This result further reinforces the idea that our ligand-based approach and the receptor-based study are indeed converging.

Fig. 11
figure 11

Correlation plot of the predicted activities of the test set compounds obtained from pharmacophore model versus their GOLD docking scores obtained from hKOP model. Thus, we may conclude that our validated model indeed presents an appropriate platform for structure-based drug design and may be used for GPCR lead finding and compound optimization

Conclusions

A composite VS model is proposed where ligand-based and target-based information has been synergistically integrated to improve the effectiveness and performance of database searching of the drug discovery process via VS. Such a combination should lower the imprecision embedded into either systems and allow the weakness of one approach to be compensated by the strength of the other, leading to a more reliable solution than obtained by each method separately.

In this regard, a ligand-based quantitatively predictive pharmacophore was generated and validated for salvinorin A-based KOP receptor agonists. The resulting best hypothesis consisted of two HBA and three hydrophobic features. The reliability of the pharmacophore in quantitative terms was verified in several validation procedures. The pharmacophore model elaborated within this study may therefore, prove useful as a virtual high-throughput screen to identify novel molecular scaffolds as potential lead molecules.

In addition, a computational approach was utilized to determine the complete 3D structure of hKOP receptor utilizing the recently published 2.2 Å resolution X-ray crystal structure of bovine rhodopsin. The model was refined through molecular dynamics and energy minimization. The quality of the model was assessed using several validation tools like PROCHECK and PROSTAT. A model structure of the protein–ligand complex was also obtained which may be used as a tool to enhance our structure-based understanding of salvinorin A recognition at the hKOP. Our proposed receptor–ligand binding mode rationalized the observed single-point mutation data and SAR data. The model also helped to identify several critical ligand–receptor contact points in the binding site that might serve as the key molecular determinants of agonist recognition by the hKOP receptor. Binding site residues such as Gln115 (Q2.60) and Tyr313 (Y7.43) were observed to interact via hydrogen bond while some other residues like Tyr119 (Y2.64), Tyr320 (Y7.43) and Tyr313 (Y7.43) were found to interact via hydrophobic and/or aromatic interactions.

The model was further validated by docking a series of salvinorin A derivatives into the binding pocket of the receptor model. A significant correlation was obtained between the binding affinities and the docking scores. This underscores the predictive ability of the model in identifying salvinorin A derivatives with high binding affinity.

Remarkably, the independently generated pharmacophore and comparative model of hKOP receptor complemented each other and provided similar conclusions. Whereas the ligand-based pharmacophore model revealed binding features, which are already captured by ligands, the steric factors were less well accounted for or completely neglected. In an attempt to overcome this, inclusion of complementary information from the 3D receptor model was made that allowed for a comprehensive molecular recognition process. In the future, we propose to employ a hybrid computational screening approach for the discovery of several classes of novel and potent KOP receptor ligands. It is anticipated that the employed hybrid approach may be more effective for lead discovery from a large chemical database than either pharmacophore-based or structure-based screening methods alone. Thus, in consideration of the enormous efforts of in vitro screening of large compound libraries, the in silico database mining method presented in our study aims at enabling a fast and reliable identification of new lead compounds that could decisively reduce time and cost of hit finding within the drug discovery and development process, particularly in the case of GPCRs.

Supporting information available

A representative graph for the activity prediction of the test set compounds (Fig. S1) and chemical structures, binding affinities and the corresponding GOLD scores of the compounds used in automated docking simulation studies (Fig. S2). Pharmacophore validation results with negative controls are listed in Table S1. The estimated activities of the test set molecules (derived from the pharmacophore model) and their respective GOLD scores (obtained from homology derived hKOP receptor model) were used to generate Fig. 11 (Table S2).