In-silico prediction of aqueous solubility plays an important role in drug discovery and development. For many years, the limited performance of in-silico solubility models has been attributed to the lack of high-quality solubility data for pharmaceutical molecules. However, some studies suggest that the poor accuracy of solubility prediction is not related to the quality of the experimental data and that more precise methodologies (algorithms and/or sets of descriptors) are required to predict the aqueous solubility of pharmaceutical molecules. In this study, a large and diverse database was built from aqueous solubility values collected from two public sources; two new recursive machine-learning approaches were developed for data cleaning and variable selection; and a consensus model based on regression and classification algorithms was created. The modelling protocol, which includes the curation of chemical and experimental data, was implemented in KNIME to obtain an automated workflow for predicting the solubility of compounds in new databases. Finally, we compared our consensus model with several methods and models available in the literature; its results were comparable to, or even better than, those of previously published models.


Quantitative Structure-Property Relationship (QSPR); KNIME; aqueous solubility; ADME; machine learning; Random Forest; supervised recursive selection.


Gabriela Falcón-Cano†, Christophe Molina∥, and Miguel Ángel Cabrera-Pérez†,§,‡

†Unit of Modeling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central “Marta Abreu” de las Villas. Santa Clara 54830, Villa Clara, Cuba

∥PIKAÏROS S.A, 31650 Saint Orens de Gameville, France
§Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain

‡Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d’Alacant, Alicante, Spain


DMPK & ADMET Journal (DOI: https://doi.org/10.5599/admet.852)
Special issue “Strategies of solubility enhancement and perspectives in solubility measurements of pharmaceutical compounds, Part I”, Vol. 8 No. 3 (2020). Published: 27-09-2020