Prediction of Adverse Drug Reactions

Supplemental material for the paper "Facilitating Prediction of Adverse Drug Reactions by Using Knowledge Graphs and Multi-Label Learning Models".


Article supplemental material

This page describes the data sets used in the manuscript, all of which are made publicly available. These data sets were used to evaluate all approaches reviewed in the manuscript.

Download

All data set files are publicly available for download at https://doi.org/10.6084/m9.figshare.4823203.

Liu’s data set

This data set was originally proposed by Liu et al. (2012) [1] and later processed for machine learning by Zhang et al. (2015) [2] and Zhang et al. (2016) [3]. Liu's data set contains 832 drugs, each described by 2892 features, and 1385 ADRs used as labels.

The results obtained using this data set are in Table 4 and Table 5 of the article.

Folder: liu/ Files:

The feature types, sources, and IDs are described as follows:

Feature type | Specific feature | Source | ID | Dimension | Dictionary key
Chemical | Substructures | PubChem | Substructure Fingerprints* | 881 | chemical
Biological | Targets | DrugBank | GenBank Gene IDs | 786 | Targets
Biological | Transporters | DrugBank | HGNC IDs | 72 | Transporters
Biological | Enzymes | DrugBank | GenBank Gene IDs | 111 | Enzymes
Biological | Pathways | KEGG | KEGG IDs | 173 | Pathways
Phenotypic | Treatment indications | SIDER | CUI disease code | 869 | Treatment
Label | Side effects | SIDER | CUI disease code | 1385 | side_effect

(*) A full description of PubChem Substructure Fingerprints can be found at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
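The dictionary keys above suggest that each feature type is stored as a separate block. As a minimal sketch, assuming the blocks are concatenated column-wise in the order of the table (an assumption, not confirmed by the files), the Python code below computes the column range of each block and stacks per-key blocks into one feature matrix. The names LIU_FEATURE_BLOCKS, column_ranges, and hstack_blocks are hypothetical, and the side_effect block is left out because it holds the labels rather than features.

```python
# Hypothetical sketch: feature blocks of Liu's data set, keyed by the
# dictionary keys from the table, with the dimensions listed there.
# ASSUMPTION: blocks are concatenated column-wise in this order.
LIU_FEATURE_BLOCKS = [
    ("chemical", 881),
    ("Targets", 786),
    ("Transporters", 72),
    ("Enzymes", 111),
    ("Pathways", 173),
    ("Treatment", 869),
]


def column_ranges(blocks):
    """Map each dictionary key to its half-open (start, end) column range."""
    ranges = {}
    start = 0
    for key, dim in blocks:
        ranges[key] = (start, start + dim)
        start += dim
    return ranges


def hstack_blocks(data, blocks):
    """Concatenate per-key feature matrices (lists of rows) column-wise.

    `data` is assumed to map each dictionary key to a list of per-drug rows,
    with the same drugs in the same order for every key.
    """
    n_drugs = len(data[blocks[0][0]])
    X = [[] for _ in range(n_drugs)]
    for key, dim in blocks:
        matrix = data[key]
        if len(matrix) != n_drugs:
            raise ValueError(f"{key}: expected {n_drugs} rows, got {len(matrix)}")
        for row, block_row in zip(X, matrix):
            if len(block_row) != dim:
                raise ValueError(f"{key}: expected {dim} columns, got {len(block_row)}")
            row.extend(block_row)
    return X


ranges = column_ranges(LIU_FEATURE_BLOCKS)
print(ranges["Treatment"])  # (2023, 2892)
```

The last range ends at column 2892, matching the total number of features stated for Liu's data set, which gives a quick sanity check when loading the files.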

Bio2RDF v1

We use the list of drugs from Liu's data set but not their features. Instead, we extract features from the Knowledge Graph generated from the Bio2RDF v1 DrugBank and SIDER data sets (Muñoz et al., 2016) [4]. This yields 30161 features for the 832 drugs, and we use the same set of 1385 ADRs as in Liu's data set.
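The exact extraction procedure is documented in the supplemental material of the article. Purely to illustrate the general idea of turning a knowledge graph into a drug-feature matrix, the sketch below assumes that each distinct (predicate, object) pair attached to a drug becomes one binary feature. The function name and the toy triples are hypothetical, and the procedure used in the article may differ.

```python
from collections import defaultdict


def triples_to_binary_features(triples, drugs):
    """Turn (subject, predicate, object) triples into a binary drug-feature matrix.

    ASSUMPTION (illustrative only): every distinct (predicate, object) pair
    linked to a drug becomes one feature column.
    """
    drug_set = set(drugs)
    drug_features = defaultdict(set)
    for subject, predicate, obj in triples:
        if subject in drug_set:  # ignore triples about non-drug entities
            drug_features[subject].add((predicate, obj))
    # Sort the feature vocabulary so the column order is deterministic.
    vocabulary = sorted(set().union(*drug_features.values()))
    index = {feature: j for j, feature in enumerate(vocabulary)}
    X = []
    for drug in drugs:  # rows follow the order of the given drug list
        row = [0] * len(vocabulary)
        for feature in drug_features.get(drug, ()):
            row[index[feature]] = 1
        X.append(row)
    return X, vocabulary


# Toy triples (hypothetical identifiers).
triples = [
    ("drug:A", "target", "gene:1"),
    ("drug:A", "category", "anticoagulant"),
    ("drug:B", "target", "gene:1"),
    ("other:x", "target", "gene:9"),
]
X, vocabulary = triples_to_binary_features(triples, ["drug:A", "drug:B"])
print(X)  # [[1, 1], [0, 1]]
```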

Download original files

The original Bio2RDF RDF files can be downloaded at http://purl.com/bib-adr-prediction/data. For details on how features were extracted from those files, see the supplemental material of the article.

The results obtained using this data set are in Table 6 of the article.

Folder: bio2rdf_v1/ Files:

Bio2RDF v2

We use the list of drugs from Liu's data set but not their features. Instead, we extract features from the Knowledge Graph generated from the Bio2RDF v2 DrugBank, SIDER, and KEGG data sets. This yields 37368 features for the 832 drugs, and we use the same set of 1385 ADRs as in Liu's data set.

Download original files

The original Bio2RDF RDF files can be downloaded at http://purl.com/bib-adr-prediction/data. For details on how features were extracted from those files, see the supplemental material of the article.

The results obtained using this data set are in Table 7 of the article.

Folder: bio2rdf_v2/ Files:

Liu + Bio2RDF v2

We also combine the features from Liu's data set and from Bio2RDF v2 for the 832 drugs. This yields 40260 features in total (2892 + 37368), which are used to train the machine learning models.
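A minimal sketch of this integration, assuming each feature set comes as a row-per-drug matrix together with a list of drug identifiers (hypothetical inputs; the released files may already share one row order), is a column-wise concatenation after aligning rows by drug. The function name merge_feature_sets is illustrative.

```python
def merge_feature_sets(ids_a, X_a, ids_b, X_b):
    """Concatenate two feature matrices column-wise after aligning rows by drug ID.

    Rows of the result follow the order of `ids_a`; every drug in `ids_a`
    must also appear in `ids_b`.
    """
    if len(ids_a) != len(X_a) or len(ids_b) != len(X_b):
        raise ValueError("each ID list must match its matrix's number of rows")
    # Look up each drug's row in the second matrix instead of trusting position.
    row_of_b = {drug: i for i, drug in enumerate(ids_b)}
    missing = [drug for drug in ids_a if drug not in row_of_b]
    if missing:
        raise KeyError(f"drugs missing from the second feature set: {missing}")
    return [list(X_a[i]) + list(X_b[row_of_b[drug]]) for i, drug in enumerate(ids_a)]


# Toy example: the second source lists the same drugs in a different order.
merged = merge_feature_sets(["d1", "d2"], [[1, 0], [0, 1]], ["d2", "d1"], [[5], [7]])
print(merged)  # [[1, 0, 7], [0, 1, 5]]
```

Aligning by identifier rather than by position guards against the two sources listing the drugs in different orders; the number of columns in the result is the sum of the two inputs.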

The results obtained using this data set are in Table 8 of the article.

Folder: liubio2rdf_v2/ Files:

SIDER 4 data set

We also performed an independent evaluation using the SIDER 4 data set provided by Zhang et al. (2015) [2], which comprises a subset of the drugs in Liu's data set plus some newly added drugs.

Download original files

Zhang, Wen; Liu, Feng; Luo, Longqiang; Zhang, Jingxia (2015): Predicting drug side effects by multi-label learning and ensemble learning. figshare. http://doi.org/10.6084/m9.figshare.c.3608738 Retrieved: 12:34, May 09, 2017 (GMT)

The results obtained using this data set are in Table 9 of the article.

Folder: sider4/ Files:

The feature types, sources, and IDs are described as follows:

Feature type | Specific feature | Source | ID | Dimension | Dictionary key
Chemical | Substructures | PubChem | Substructure Fingerprints* | 881 | chemical
Biological | Targets | DrugBank | GenBank Gene IDs | 1046 | Targets
Biological | Transporters | DrugBank | HGNC IDs | 96 | Transporters
Biological | Enzymes | DrugBank | GenBank Gene IDs | 160 | Enzymes
Biological | Pathways | KEGG | KEGG IDs | 268 | Pathways
Phenotypic | Treatment indications | SIDER | CUI disease code | 2537 | Treatment
Label | Side effects | SIDER | CUI disease code | 5579 | side_effect

(*) A full description of PubChem Substructure Fingerprints can be found at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt

SIDER 4 + Bio2RDF v2 data sets

As with Liu's data set, we use the list of drugs in the SIDER 4 data set but not their features. Instead, we extract features from the Knowledge Graph generated from the Bio2RDF v2 DrugBank, SIDER, and KEGG data sets. This yields 43843 features for the 1080 drugs (771 for training and 309 for testing), and we use the same set of 5579 ADRs as in the SIDER 4 data set.

The results obtained using this data set are in Table 10 of the article.

Folder: sider4bio2rdf_v2_sider/ Files:

SIDER 4 + Bio2RDF v2 + Aeolus data sets

Additionally, we evaluate the predictions on newly added ADRs that were discovered (reported) after the SIDER 4 data set was generated. These relationships are published in the Aeolus data set, which is generated from FAERS reports. The matrices have the same shape as in SIDER 4, and we update the matrix y_test with the drug-ADR relations from Aeolus.
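The update of y_test can be sketched as follows, under the assumption that it adds the Aeolus drug-ADR pairs as extra positive labels while keeping the existing SIDER 4 positives. The function name, the index dictionaries, and the handling of pairs outside the matrix (skipped, so the shape stays as in SIDER 4) are illustrative choices, not taken from the released files.

```python
def add_new_relations(y_test, drug_index, adr_index, new_pairs):
    """Return a copy of a binary label matrix with extra drug-ADR positives.

    ASSUMPTION: `new_pairs` holds (drug_id, adr_id) relations (e.g. from
    Aeolus), and updating means setting those cells to 1 while keeping all
    existing positives. Pairs whose drug or ADR is not a row/column of the
    matrix are skipped, so the matrix shape is unchanged.
    """
    updated = [list(row) for row in y_test]  # copy; do not mutate the input
    skipped = 0
    for drug, adr in new_pairs:
        i = drug_index.get(drug)
        j = adr_index.get(adr)
        if i is None or j is None:
            skipped += 1
            continue
        updated[i][j] = 1
    return updated, skipped


# Toy example with hypothetical identifiers.
y = [[0, 1], [0, 0]]
new_y, n_skipped = add_new_relations(
    y, {"d1": 0, "d2": 1}, {"a1": 0, "a2": 1},
    [("d2", "a1"), ("d1", "a2"), ("d3", "a1")],
)
print(new_y, n_skipped)  # [[0, 1], [1, 0]] 1
```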

The results obtained using this data set are in Table 11 of the article.

Folder: sider4bio2rdf_v2_aeolus/ Files:

Changelog

References

  1. Liu M, Wu Y, Chen Y, et al. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. Journal of the American Medical Informatics Association. 2012. 

  2. Zhang W, Liu F, Luo L, et al. Predicting drug side effects by multi-label learning and ensemble learning. BMC Bioinformatics. 2015;16:1.

  3. Zhang W, Chen Y, Tu S, et al. Drug side effect prediction through linear neighborhoods and multiple data source integration. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2016. p. 427–434. 

  4. Muñoz E, Novacek V, Vandenbussche PY. Using drug similarities for discovery of possible adverse reactions. In: AMIA 2016, American Medical Informatics Association Annual Symposium. American Medical Informatics Association; 2016. p. 924–933.