Multi-Label Classifier Based on Histogram of Gradients for Predicting the Anatomical Therapeutic Chemical Class/Classes of a Given Compound
[abstract] Motivation: Given an unknown compound, is it possible to predict its ATC (Anatomical Therapeutic Chemical) class/classes? This is a challenging yet important problem since such a prediction could be used to deduce not only a compound's possible active ingredients but also its therapeutic, pharmacological, and chemical properties, thereby substantially expediting the pace of drug development. The problem is challenging because some drugs and compounds belong to two or more ATC classes, making machine learning extremely difficult.
Results: In this paper, a multi-label classifier system is proposed that incorporates information about a compound's chemical-chemical interaction and its structural and fingerprint similarities to other compounds belonging to the different ATC classes. The proposed system reshapes a 1D feature vector to obtain a 2D matrix representation of the compound. This matrix is then described by a histogram of gradients that is fed into a LIFT (Multi-Label Learning with Label-Specific Features) classifier. Rigorous cross-validations demonstrate the superior prediction quality of this method compared to other state-of-the-art approaches developed for this problem, a superiority that is reflected particularly in the absolute true rate, the most important and harshest metric for assessing multi-label systems.
Supplementary information: The MATLAB code for replicating the experiments presented in this paper is available at https://www.dropbox.com/s/7v1mey48tl9bfgz/ToolPaperATC.rar?dl=0
Keywords: Histogram of Gradients (HOG), Anatomical Therapeutic Chemical Class/Classes (ATC)
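To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of two of its pieces: reshaping a 1D feature vector into a 2D matrix and describing that matrix with a gradient-orientation histogram, plus the absolute true rate taken here as the fraction of samples whose predicted label set matches the true set exactly. It is not the authors' MATLAB implementation. The function names, the toy matrix sizes, the 9-bin setting, and the single-cell simplification (a full HOG descriptor would split the matrix into cells and normalize over blocks) are all assumptions made for illustration, and the LIFT classifier step is omitted.

```python
import math


def reshape_to_matrix(vec, rows, cols):
    """Reshape a 1D feature vector into a rows x cols matrix (row-major)."""
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [list(vec[r * cols:(r + 1) * cols]) for r in range(rows)]


def gradient_histogram(mat, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplification (assumption): the whole matrix is treated as one cell.
    """
    rows, cols = len(mat), len(mat[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(rows):
        for c in range(cols):
            # Finite differences, with indices clamped at the borders.
            gx = mat[r][min(c + 1, cols - 1)] - mat[r][max(c - 1, 0)]
            gy = mat[min(r + 1, rows - 1)][c] - mat[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), n_bins - 1)
            hist[b] += mag
    # L2-normalize so the descriptor does not depend on feature scale.
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def absolute_true_rate(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return hits / len(true_sets)


# Toy usage: a hypothetical 12-dimensional feature vector as a 3 x 4 matrix.
features = [float(v) for v in range(12)]
descriptor = gradient_histogram(reshape_to_matrix(features, 3, 4))
```

In this sketch the resulting `descriptor` would play the role of the input to a multi-label classifier. The exact-match metric shows why the absolute true rate is harsh: a prediction that recovers only one of a compound's two classes counts as a complete miss.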