Academia.eduAcademia.edu
Simulation of alternative binding modes in a structure-based QSAR study of HIV-1 protease inhibitors Manuel Pastor, Carlos Pérez, and Federico Gago Department of Pharmacology, University of Alcalá, Alcalá de Henares, Spain We have used a published set of inhibitors of HIV-1 protease1 to build a COMBINE-type structure-based QSAR model with good predictive ability (r2 5 0.90, q2 5 0.69).2 Since the compounds in the training series exhibit most of their structural variability on one-half of the pseudosymmetrical binding cavity and only one binding orientation was explored for each molecule, the model describes mainly the effect of the structural changes on interactions involving only one-half of the binding cavity (pockets S19 and S29). Thus, the model cannot be expected to give accurate predictions for new compounds exhibiting structural variation in both halves. The model does in fact show a tendency to underpredict slightly the biological activity of the molecules in the external test set. In an attempt to improve the quality of the model, both possible orientations of the ligands are now considered so that structural variation takes place in all binding pockets. One possibility would have been to build an additional set of complexes with the inhibitors docked in a reversed orientation. The alternative we have explored, however, consists of manipulating the data matrix describing the interaction energies so that each row is duplicated and the order of the variables in the duplicated rows is swapped between subunits. This simple approach has produced a new model that is similar in quality to the original model (r2 5 0.89, q2 5 0.64) but lacks the tendency to underpredict the activity of the compounds in the external set. Moreover, since equivalent residues are assigned equivalent weights, the model is insensitive to ligand orientation and is easier to interpret. © 1998 by Elsevier Science Inc. this interaction energy would have important uses for the prediction of the activity of novel compounds in advance of their synthesis. Unfortunately, even when large amounts of structural information are available, such interaction energies are not easy to calculate accurately. When dealing with series of closely related compounds, it is possible to build simple models that relate biological activity to energy of interaction computed with molecular mechanics. An INTRODUCTION The biological activity of a compound largely depends on its interaction energy with the receptor. Accurate estimations of Color Plate for this article appears on page 389. Address reprint requests to: F. Gago, Department of Pharmacology, University of Alcalá, E-28871 Alcalá de Henares, Spain. Journal of Molecular Graphics and Modelling 15, 364 –371, 1997 © 1998 by Elsevier Science Inc. 655 Avenue of the Americas, New York, NY 10010 Figure 1. Correlation between experimental and calculated activities (pIC50 values) for the training set (filled squares) and the prediction set (open triangles), using model Csingle. The increase in slope of the line representing the leastsquares regression fit for the compounds in the prediction set (dashed) with respect to a perfect fit (solid) is indicative of a tendency to underpredict. 1093-3263/97/$17.00 PIIS1093-3263(98)00007-2 Figure 2. Schematic representation of models Csingle and Cduplo. Binding sites on the enzyme are denoted S1, S2, S19, and S29 whereas substituents on the inhibitors are labeled P1, P2, P19, and P29. Note that the two alternative orientations of the molecules in Cduplo are related by a 180° rotation. interesting example is the series of HIV-1 protease inhibitors published by Holloway et al.1 for which a linear regression model was reported between inhibitory potencies (pIC50) and ligand–receptor interaction energies calculated with a simple Figure 3. Scheme representing the “copy and paste” procedure used to derive Cduplo from the original Csingle energy decomposition matrix. See Methods for details. force field. More recently, we have used the same data set to build a model, also relating biological activities and interaction energies, which partitions these energies into a number of residue-based contributions.2 This methodology, which we term COMBINE (COMparative BINding Energy)2– 4 analysis, is essentially a 3D QSAR method that uses structures of the ligand–receptor complexes. Ligand–receptor interaction energies, computed within the framework of the AMBER suite of programs5 and broken down into residue contributions, were supplemented with estimations of the electrostatic contribution to the desolvation of ligands and receptors using a continuum method.6 All these energy values were then correlated with the biological activity using partial least squares (PLS) regression analysis. HIV-1 protease is a homodimeric enzyme that, in its complex with many inhibitors, has the interesting peculiarity of being pseudosymmetrical.7 In such a situation, the ligands can be inserted into the binding site in two alternative and nearly equivalent orientations, but usually just one of the two possibilities is considered. The COMBINE model obtained using only one orientation (Csingle) is reasonably good but is limited in two respects. First, since the model computes the contribution of each ligand–residue interaction to the activity, unequal importance is assigned to equivalent residues in subunits A and B, which are only arbitrarily distinguishable. Second, when Csingle is used to predict the activity of novel compounds, their biological activity tends to be slightly underpredicted (Figure 1). These limitations arise from the fact that the compounds in the training series exhibit structural variation mainly on one side of the structure (P19 and P29 substituents) (Color Plate 1a). Consequently, the model is trained to recognize the influence on the activity of structural changes taking place on only one-half of the molecules but receives no information about the other half. However, as mentioned above, this class of ligands can be docked into the binding site in two alternative and equivalent orientations. Thus, if the ligands were considered in both orientations, the model would “learn” the effect of intro- J. Mol. Graphics Mod., 1997, Vol. 15, December 365 Table 1. HIV-1 protease inhibitors included in the training seta No. Chemical structure Exp. pIC50 No. Chemical structure Exp. pIC50 (continued) 366 J. Mol. Graphics Mod., 1997, Vol. 15, December Table 1. (Continued) No. a Chemical structure Exp. pIC50 No. Chemical structure Exp. pIC50 See Refs. 1 and 2. ducing structural variability on both sides of the binding cleft. In our example, this means that the model would adequately represent the interactions of the ligand substituents with equivalent residues from each of the two HIV-1 protease subunits (Color Plate 1 and Fig. 2). The aim of this work is to obtain a model that adequately considers the duality of the potential interaction in order to check its sensitivity to ligand orientation within the receptorbinding site and its predictive ability. METHODS To consider both alternative orientations of n ligands within the binding site, the most straightforward approach would be to actually build 2n complexes. Every compound would then be included in the analysis twice. However, since the target is symmetrical, it can be argued that the interactions of the variable part of the ligands with subunit A in one orientation can be considered equivalent to the interactions of the same Figure 4. Plots of experimental versus calculated inhibitory activities (pIC50 values) for the compounds belonging to the training set (open squares) and those in the prediction set (filled triangles). Two possible orientations are considered for the compounds in the prediction set. Model Csingle (a) provides two different sets of predicted activities for each orientation, linked together by a horizontal line, whereas model Cduplo (b) yields identical values for both orientations. J. Mol. Graphics Mod., 1997, Vol. 15, December 367 Table 2. HIV-1 protease inhibitors included in the prediction seta No. a Chemical structure Exp. pIC50 No. Chemical structure Exp. pIC50 See Refs. 1 and 2. substituents with subunit B when the ligand binds in the alternative orientation. Therefore, both alternative binding modes can be represented with a simple “copy and paste” of the matrix describing the interaction (Figure 3). The alternative orienta- 368 J. Mol. Graphics Mod., 1997, Vol. 15, December tion is thus simulated by duplicating and swapping the blocks of variables that describe the electrostatic and steric contributions of every residue in each subunit to the ligand–receptor interaction energy. The descriptors involving interactions with Table 3. Comparison of models Csingle and Cduploa Model Objects Variables LV r2 q2 SDEPcv SDEPext SDEPdup Csingle Cduplo 32 64 47 56 2 2 0.90 0.89 0.73 0.72 0.69 0.69 0.59 0.79 1.54 0.79 r2, Squared correlation coefficient; q2, squared cross-validated correlation coefficient (using five randomly assigned groups); SDEPcv, standard deviation error of the predictions, as obtained from the cross-validation analysis; SDEPext, standard deviation error of the predictions, obtained from the prediction of the external prediction set; SDEPdup, standard deviation error of the predictions, obtained from the prediction of the external prediction set, but considering both orientations for each ligand. a residues Asp A25 and Asp B25 were not duplicated as these aspartic acid residues cannot be considered equivalent owing to their different protonation states.8 As in Csingle, two extra terms were included to describe the electrostatic contribution to the desolvation of both the ligand and the receptor.2 The new model (Cduplo) was built using the same methodology employed to derive Csingle, as described in Ref. 2. Both models were produced using compounds 1–34 (Table 1) as the training set and compounds 35–50 (Table 2) as the prediction set. The steric contributions to the ligand–receptor interaction energies were computed using the AMBER force field,9 whereas the electrostatic contributions to both the ligand– receptor interaction energies and the desolvation of ligands and receptor were computed using DelPhi.6 Data pretreatment and PLS model building and validation were performed using GOLPE 3.0.10 The quality of models Csingle and Cduplo is summarized in Table 3. RESULTS Advantages of the new model: Predictive ability Table 3 shows that the quality of models Csingle and Cduplo is comparable. The ability to predict the activities of the external set, in the binding orientation studied, is better for Csingle (SDEPext 5 0.59 compared with SDEPext 5 0.79 in Cduplo). However, if the compounds in the external prediction set are considered in both possible binding orientations, so that the number of objects and activity values is duplicated, Csingle yields a larger error in the predictions (SDEPdup 5 1.54) whereas Cduplo is insensitive to ligand orientation (SDEPdup 5 0.79). It must be borne in mind that the training set consists of molecules in which substituent variation is limited to just half of the ligand, so that when only one binding orientation is Figure 5. Weighted PLS pseudocoefficients for each of the van der Waals and electrostatic energy variables studied for models Csingle (a) and Cduplo (b). On the horizontal axis, the variables are ordered sequentially within subunit A (1–99, darker background) and subunit B (100 –198, lighter background), followed by the interactions with the water molecule (199). Note the equivalence between residues from both subunits in model Cduplo except for residues Asp A25 and Asp B25. J. Mol. Graphics Mod., 1997, Vol. 15, December 369 considered the resulting QSAR model (Csingle) has learned about the effect of differences from interactions with just one side of the receptor. In the Cduplo model, these differences in the ligand are considered as making contributions to interactions with both halves of the receptor. In this way, the QSAR model has been trained to recognize changes in both sides of the ligand, and this is why Cduplo performs better on the external prediction set in which there are changes on both sides of the ligand. As can be seen from the plots represented in Figure 4, model Csingle gives two different activity values to each compound, one for each orientation (Figure 4a). One of them is consistently much lower than the experimental value whereas the other is much closer, even though a tendency to underpredict is also observable. The reason for this behavior is related to the characteristics of the training series: the model is able to recognize the influence on the activity of just one-half of the molecules. Therefore, when challenged with molecules displaying variation on both halves, the predictions are more accurate when the part of the ligand showing more variation is oriented toward the best represented part of the receptor, but even so the model will not recognize the increase in affinity provided by the other part. On the other hand, Cduplo produces unique, consistent data of biological activity (Figure 4b). Moreover, the predicted residuals do not show any significant bias of the model to overpredicting or underpredicting the activity of external compounds. Advantages of the new model: Interpretability COMBINE models are useful to highlight those receptor residues whose interaction with the ligand contribute most to increasing the biological activity.2– 4 This information can be represented in a simple fashion in the form of weighted PLS pseudocoefficients, as shown in Figure 5. It is apparent that model Csingle (Figure 5a) assigns different importance to equivalent residues of subunits A and B (except for Asp A25 and Asp B25, which are not equivalent),8 whereas in the new Cduplo model (Figure 5b) a unique value is given. DISCUSSION The rationale behind a COMBINE analysis is to accurately translate distinct ligand–receptor interaction energies from a set of complexes into a large number of informative variables in order to find a model that correlates these descriptors and the biological activity. The hope is that the data matrix can capture the essence of all the structural changes in the compounds studied (“training set”), and that the resulting quantitative model will highlight those variables that are more important for improving the activity.2– 4 Once this information is gained, it can be used to advantage in the design of new structural changes. However, the quality of a QSAR model is limited by the quality of the training series, and in the original work,1,2 the arbitrary assumption of studying only one of the two alternative binding modes may have limited the ability of the model to give accurate predictions for new compounds. In the present study, since each compound is included in the training set twice (in both orientations and with the same activity value), the method is forced to produce an answer that respects the pseudosymmetry of the target. Thus, the solution is constrained to reproduce an already known characteristic of the ligand–receptor complex. Such restrained solutions have the 370 J. Mol. Graphics Mod., 1997, Vol. 15, December advantages of being less sensitive to the noise in the descriptor variables and of producing more robust models. In addition, since the number of objects is duplicated, the variables-toobjects ratio is decreased. It should also be emphasized that the method described here requires neither additional model building nor extra experimental work, as it simply manipulates already existing data. The numerical manipulation of the data matrix is simple and can be carried out on a standard spreadsheet or using small purposewritten programs. The method reported here has a limited range of application because there are few biological receptors showing similar symmetric characteristics. However, it can be applied to different series of HIV-1 protease inhibitors. Also, it should be noted that the method described is not limited to COMBINE models and can be applied in the context of other 3D QSAR methodologies,11 such as CoMFA, GRID/GOLPE, etc. CONCLUSIONS The procedure described represents the incorporation of symmetry constraints into a COMBINE 3D QSAR model. The method is performed by a simple manipulation of the data matrix already obtained and results in a model with comparably good predictive ability that is easier to interpret. REFERENCES 1 Holloway, M.K., Wai, J.M., Halgren, T.A., Fitzgerald, P.M.D., Vacca, J.P., Dorsey, B.D., Levin, R.B., Thompson, W.J., Chen, L.J., deSolms, S.J., Gaffin, N., Ghosh, A.K., Giuliani, E.A., Graham, S.L., Guare, J.P., Hungate, R.W., Lyle, T.A., Sanders, W.M., Tucker, T.J., Wiggins, M., Wiscount, C.M., Woltersdorf, O.W., Young, S.D., Darke, P.L., and Zugay, J.A. A priori prediction of activity for HIV-1 protease inhibitors employing energy minimization in the active site. J. Med. Chem. 1995, 38, 305–317 2 Pérez, C., Pastor, M., Ortiz, A.R., and Gago, F. Comparative binding energy (COMBINE) analysis of HIV-1 protease inhibitors: Incorporation of solvent effects and validation as a powerful tool in receptor-based drug design. J. Med. Chem. 1998, 41, 836–852 3 Ortiz, A.R., Pisabarro, M.T., Gago, F., and Wade, R.C. Prediction of drug binding affinities by comparative binding energy analysis. J. Med. Chem. 1995, 38, 2681– 2691 4 Wade, R.C., Ortiz, A.R., and Gago, F. Comparative binding energy analysis. In: 3D QSAR in Drug Design, Vol. 2: Ligand–Protein Interactions and Molecular Similarity (H. Kubinyi, G. Folkers, and Y.C. Martin, eds.). Kluwer Academic Publishers, Dordrecht, 1998, pp. 19–34 5 Pearlman, D.A., Case, D.A., Caldwell, J.W., Ross, W.S., Cheatham, T.E., Ferguson, D.M., Seibel, G.L., Singh, U.C., Weiner, P., and Kollman, P.A. AMBER (UCSF): Assisted Model Building with Energy Refinement, version 4.1. Department of Pharmaceutical Chemistry, University of California, San Francisco, 1995 6 Nicholls, A., and Honig, B. A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson–Boltzmann equation. J. Comput. Chem. 1991, 12, 435–445 7 Wlodawer, A., and Erickson, J.W. Structure-based inhibitors of HIV-1 protease. Annu. Rev. Biochem. 1993, 62, 543–585 8 Wang, Y.-X., Freedberg, D.I., Yamazaki, T., Wingfield, P.T., Stahl, S.J., Kaufman, J.D., Kiso, Y., and Torchia, D.A. Solution NMR evidence that the HIV-1 protease catalytic aspartyl groups have different ionization states in the complex formed with the asymmetric drug KNI272. Biochemistry 1996, 35, 9945–9950 9 Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W., and Kollman, P.A. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 1995, 117, 5179–5197 10 Pastor, M. GOLPE, version 3.0. Multivariate Infometric Analysis (MIA). Perugia, Italy, 1996 11 Kubinyi, H. (ed.). 3D QSAR in Drug Design: Theory, Methods and Applications. ESCOM Science Publishers, Leiden, 1993 J. Mol. Graphics Mod., 1997, Vol. 15, December 371