Neural networks for Anatomical Therapeutic Chemical (ATC) classification


Purpose: Automatic Anatomical Therapeutic Chemical (ATC) classification is progressing rapidly because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of its effects on multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state of the art.

Methods: The proposed method is an ensemble generated by fusing neural networks (i.e., a Tabular model and Long Short-Term Memory networks (LSTMs)) with a multilabel classifier based on Multiple Linear Regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. The ensembles are evaluated and compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.
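Score-level fusion of several multilabel classifiers, as described in the Methods, can be illustrated with a minimal Python sketch. The fusion rule is an assumption made for illustration: each classifier's score matrix (rows = drugs, columns = ATC classes) is min-max normalized and the matrices are averaged (a sum rule), and a drug is assigned every class whose fused score passes a fixed threshold. The function names (`minmax_normalize`, `sum_rule_fusion`, `to_labels`), the 0.5 threshold, and the toy scores are hypothetical and are not taken from the authors' MATLAB code.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = drugs, columns = ATC classes (hypothetical layout)


def minmax_normalize(scores: Matrix) -> Matrix:
    """Rescale one classifier's scores to [0, 1] so classifiers with
    different output ranges contribute comparably to the fusion."""
    flat = [s for row in scores for s in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant output carries no ranking information
        return [[0.0 for _ in row] for row in scores]
    return [[(s - lo) / (hi - lo) for s in row] for row in scores]


def sum_rule_fusion(score_sets: Sequence[Matrix]) -> Matrix:
    """Average the normalized score matrices of all classifiers (assumed sum rule)."""
    normalized = [minmax_normalize(m) for m in score_sets]
    n_rows, n_cols = len(normalized[0]), len(normalized[0][0])
    k = len(normalized)
    return [
        [sum(m[i][j] for m in normalized) / k for j in range(n_cols)]
        for i in range(n_rows)
    ]


def to_labels(fused: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Multilabel decision: keep every class index whose fused score reaches
    the threshold; if none does, fall back to the top-scoring class so each
    drug receives at least one label."""
    labels = []
    for row in fused:
        chosen = [j for j, s in enumerate(row) if s >= threshold]
        if not chosen:
            chosen = [max(range(len(row)), key=row.__getitem__)]
        labels.append(chosen)
    return labels


# Toy example: two hypothetical classifiers, two drugs, three classes.
tabular_scores = [[0.9, 0.1, 0.6], [0.2, 0.8, 0.1]]
lstm_scores = [[2.0, -1.0, 1.5], [-0.5, 3.0, -1.0]]
fused = sum_rule_fusion([tabular_scores, lstm_scores])
print(to_labels(fused))  # [[0, 2], [1]]
```

Normalizing before averaging keeps a classifier with a wider output range (here the toy LSTM scores) from dominating the fused decision; a weighted sum or a learned combiner would be equally plausible alternatives.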

Results: Experiments demonstrate the power of our best ensemble, EnsATC, which outperforms the best methods reported in the literature, including the previous state-of-the-art method developed by our research group. The MATLAB source code of our system is freely available to the public at

Originality: This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.

Keywords: Machine learning, Multilabel classifier, Bidirectional long short-term memory, ATC classification, Learned features.
