Novel QSPR Study on the Melting Points of a Broad Set of Drug-Like Compounds Using the Genetic Algorithm Feature Selection Approach Combined With Multiple Linear Regression and Support Vector Machine


Department of Chemistry, College of Basic Sciences, Shahrood Branch, Islamic Azad University, Shahrood, Iran


A robust and reliable quantitative structure-property relationship (QSPR) study was carried out to predict the melting points (MPs) of a large and diverse set of 250 drug-like compounds. Molecular descriptors were calculated with the Dragon software package, and principal component analysis (PCA) was then used to check the homogeneity of the dataset and to split it into training and test sets. The PCA revealed no outliers in the resulting cluster. Next, a genetic algorithm (GA) feature selection strategy was used to select the most relevant descriptors, i.e. those yielding the best-fitted models. Multiple linear regression (MLR) and support vector machine (SVM) were then used to develop linear and non-linear models, respectively, correlating the molecular descriptors with the melting points. The models were validated by cross-validation, a chance-correlation test, and the statistical parameters of the external test set. The SVM model gave a determination coefficient (R2) of 0.853 and a root mean square error (RMSE) of 11.082, both better than those of the MLR model (R2 = 0.712, RMSE = 15.042%), indicating the higher capability of the SVM-based model for predicting melting points. The GA approach selected informative descriptors that capture the variables affecting MPs, which can be used in the further design of drug-like compounds with desired melting points.
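The GA-MLR descriptor selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names `rmse_loo` and `ga_select` are hypothetical, and the GA settings (population size, number of generations, mutation rate, maximum of five descriptors) are assumptions chosen only for illustration. The fitness used here is the leave-one-out cross-validated RMSE of an ordinary least-squares model, and the SVM stage is omitted.

```python
import numpy as np

def rmse_loo(X, y):
    """Leave-one-out RMSE of an ordinary least-squares (MLR) fit.

    Uses the closed-form identity e_loo_i = e_i / (1 - h_ii),
    where h_ii is the i-th diagonal element of the hat matrix.
    """
    A = np.column_stack([np.ones(len(y)), X])   # add intercept column
    H = A @ np.linalg.pinv(A)                   # hat (projection) matrix
    resid = y - H @ y
    loo = resid / (1.0 - np.diag(H))
    return float(np.sqrt(np.mean(loo ** 2)))

def ga_select(X, y, n_pop=30, n_gen=40, p_mut=0.05, max_feats=5, seed=0):
    """Toy genetic algorithm over boolean descriptor masks (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(mask):
        k = mask.sum()
        if k == 0 or k > max_feats:             # reject empty or oversized models
            return np.inf
        return rmse_loo(X[:, mask], y)

    # Initial population: sparse random masks (about two descriptors each).
    pop = rng.random((n_pop, n_feat)) < (2.0 / n_feat)
    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(scores)[: n_pop // 2]]   # keep the better half
        children = []
        while len(children) < n_pop - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, n_feat)               # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_feat) < p_mut         # bit-flip mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])

    scores = np.array([fitness(m) for m in pop])
    best = pop[np.argmin(scores)]
    return np.flatnonzero(best), float(scores.min())

# Synthetic stand-in for a descriptor matrix: 80 "compounds", 20 "descriptors",
# of which only columns 3 and 11 actually influence the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 20))
y = 150 + 12 * X[:, 3] - 8 * X[:, 11] + rng.normal(scale=1.0, size=80)

selected, cv_rmse = ga_select(X, y)
print("selected descriptor columns:", selected)
print("leave-one-out RMSE:", round(cv_rmse, 3))
```

In a real study the selected columns would then be passed to both the MLR and the SVM model, and the final comparison would be made on an external test set rather than on the cross-validated training error alone.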

