Knowledge Management Center for Illuminating the Druggable Genome

Avi Ma’ayan, PhD
Principal Investigator
Professor, Department of Pharmacological Sciences
Director, Mount Sinai Center for Bioinformatics
Icahn School of Medicine at Mount Sinai
NIH grant number: U24CA224260-01


To better understand the function of the understudied protein targets that are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project, we impute knowledge using machine learning strategies. To support these predictions, we organize data from many omics- and literature-based resources into attribute tables in which genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we process and abstract data from over 100 resources. We then predict target function, target association with pathways, small molecules and drugs that modulate the activity and expression of the target, and target relevance to human disease. Overall, the KMC-ISMMS develops a resource intended to accelerate target and drug discovery.

Diverse datasets from different resources are organized into attribute tables, to which machine learning strategies are applied to impute knowledge about the gene function of the understudied IDG targets.
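The attribute-table idea can be illustrated with a minimal sketch. This is not the KMC-ISMMS pipeline: the gene names, attribute names, values, and function labels below are invented, and the nearest-neighbor scoring stands in for whatever machine learning strategy is actually used. It only shows how a gene-by-attribute table lets a label be imputed for an unlabeled gene from the labeled genes most similar to it.

```python
from math import sqrt

# Hypothetical toy attribute table: rows are genes, columns are attributes.
# All names and values here are invented for illustration only.
ATTRIBUTES = ["tissue_brain", "tissue_liver", "tf_target_A", "ko_phenotype_B"]
TABLE = {
    "GENE1": [1, 0, 1, 0],
    "GENE2": [1, 0, 1, 1],
    "GENE3": [0, 1, 0, 0],
    "GENE4": [0, 1, 0, 1],
    "UNDERSTUDIED": [1, 0, 1, 0],
}

# Known functional labels exist only for the well-studied genes.
KNOWN_FUNCTION = {
    "GENE1": "kinase signaling",
    "GENE2": "kinase signaling",
    "GENE3": "lipid metabolism",
    "GENE4": "lipid metabolism",
}


def cosine(u, v):
    """Cosine similarity between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def impute_function(gene, table, labels, k=2):
    """Return the best-scoring label and all label scores for `gene`.

    Each label's score is the summed similarity of the k labeled genes
    that are most similar to the query gene.
    """
    query = table[gene]
    neighbors = sorted(
        ((cosine(query, table[g]), g) for g in labels if g != gene),
        reverse=True,
    )[:k]
    scores = {}
    for similarity, neighbor in neighbors:
        label = labels[neighbor]
        scores[label] = scores.get(label, 0.0) + similarity
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = impute_function("UNDERSTUDIED", TABLE, KNOWN_FUNCTION)
    print(label, scores)
```

In this toy table the unlabeled gene shares its attribute profile with the two genes labeled "kinase signaling", so that label receives the highest score. A real table would combine many more attributes drawn from many resources.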

Screenshot from the ARCHS4 user interface: To develop the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total, 137,792 samples are accessible through ARCHS4: 72,363 mouse and 65,429 human. Through efficient use of cloud resources and dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. The ARCHS4 web interface supports exploration of the processed data through querying tools, interactive visualization, and gene landing pages. For each gene, including all the IDG targets of interest, these pages provide average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions based on prior knowledge combined with co-expression data.
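As a rough illustration of what "top co-expressed genes" means, the sketch below ranks genes by Pearson correlation with a query gene across shared samples. It is not the ARCHS4 implementation; the gene names and expression values are invented, and the choice of Pearson correlation is an assumption made for this example.

```python
from math import sqrt

# Hypothetical expression matrix: gene -> values across the same four samples.
# Names and numbers are invented for illustration only.
EXPRESSION = {
    "TARGET": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # rises with TARGET
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls as TARGET rises
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # partially tracks TARGET
}


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def top_coexpressed(gene, expression, n=2):
    """Return the n genes most positively correlated with `gene`, best first."""
    ranked = sorted(
        ((pearson(expression[gene], values), other)
         for other, values in expression.items() if other != gene),
        reverse=True,
    )
    return [(other, r) for r, other in ranked[:n]]


if __name__ == "__main__":
    for name, r in top_coexpressed("TARGET", EXPRESSION):
        print(f"{name}\t{r:.3f}")
```

Genes that rank highly for an understudied target and already have annotated functions are the kind of evidence that co-expression-based function prediction builds on.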


  • Ma’ayan Laboratory
  • Mount Sinai Center for Bioinformatics
  • Harmonizome
  • GEN3VA
  • Enrichr
  • Clustergrammer
  • ARCHS4


  • @AviMaayan
  • @MaayanLab



KMC-ISMMS publications:

  1. Nakahara F, Borger DK, Wei Q, Pinho S, Maryanovich M, Zahalka AH, Suzuki M, Cruz CD, Wang Z, Xu C, Boulais PE, Ma'ayan A, Greally JM, Frenette PS. Engineering a haematopoietic stem cell niche by revitalizing mesenchymal stromal cells. Nature Cell Biology 2019 Apr 15. doi: 10.1038/s41556-019-0308-3. PMID: 30988422
  2. Wang Z, Lachmann A, Ma'ayan A. Mining data and metadata from the gene expression omnibus. Biophysical Reviews 2019 Feb;11(1):103-110. PMID: 30594974
  3. Ellis RJ, Wang Z, Genes N, Ma'ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Mining 2019 Jan 29;12:3. PMID: 30728857
  4. Mok KW, Saxena N, Heitman N, Grisanti L, Srivastava D, Muraro MJ, Jacob T, Sennett R, Wang Z, Su Y, Yang LM, Ma'ayan A, Ornitz DM, Kasper M, Rendl M. Dermal condensate niche fate specification occurs prior to formation and is placode progenitor dependent. Developmental Cell 2019 Jan 7;48(1):32-48.e5. PMID: 30595537
  5. Oprea TI, Jan L, Johnson GL, Roth BL, Ma'ayan A, Schürer S, Shoichet BK, Sklar LA, McManus MT. Far away from the lamppost. PLoS Biology 2018 Dec 11;16(12):e3000067. PMID: 30532236
  6. Torre D, Lachmann A, Ma'ayan A. BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Systems 2018 Nov 28;7(5):556-561.e3. PMID: 30447998
  7. Wang Z, He E, Sani K, Jagodnik KM, Silverstein M, Ma'ayan A. Drug Gene Budger (DGB): An application for ranking drugs to modulate a specific gene based on transcriptomic signatures. Bioinformatics 2018 Aug 31. doi: 10.1093/bioinformatics/bty763. PMID: 30169739
  8. Clarke DJB, Kuleshov MV, Schilder BM, Torre D, Duffy ME, Keenan AB, Lachmann A, Feldmann AS, Gundersen GW, Silverstein MC, Wang Z, Ma'ayan A. eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Research 2018 Jul 2;46(W1):W171-W179. PMID: 29800326
  9. Grimes M, Hall B, Foltz L, Levy T, Rikova K, Gaiser J, Cook W, Smirnova E, Wheeler T, Clark NR, Lachmann A, Zhang B, Hornbeck P, Ma'ayan A, Comb M. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Science Signaling 2018 May 22;11(531). pii: eaaq1087. PMID: 29789295
  10. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma'ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 2018 Apr 10;9(1):1366. PMID: 29636450
  11. Torre D, Krawczuk P, Jagodnik KM, Lachmann A, Wang Z, Wang L, Kuleshov MV, Ma'ayan A. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Scientific Data 2018 Feb 27;5:180023. PMID: 29485625
  12. Wang Z, Lachmann A, Keenan AB, Ma'ayan A. L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics 2018 Jun 15;34(12):2150-2152. PMID: 29420694
  13. Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. Journal of Biomedical Informatics 2017 Nov 4. pii: S1532-0464(17)30240-X. PMID: 29113935
  14. Niepel M, Hafner M, Duan Q, Wang Z, Paull EO, Chung M, Lu X, Stuart JM, Golub TR, Subramanian A, Ma'ayan A, Sorger PK. Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nature Communications 2017 Oct 30;8(1):1186. PMID: 29084964
  15. Fernandez NF, Gundersen GW, Rahman A, Grimes ML, Rikova K, Hornbeck P, Ma'ayan A. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 2017 Oct 10;4:170151. PMID: 28994825
  16. Asada N, Kunisaki Y, Pierce H, Wang Z, Fernandez NF, Birbrair A, Ma'ayan A, Frenette PS. Differential cytokine contributions of perivascular haematopoietic stem cell niches. Nature Cell Biology 2017 Mar;19(3):214-223. PMID: 28218906
  17. Shameer K, Glicksberg BS, Hodos R, Johnson KW, Badgeley MA, Readhead B, Tomlinson MS, O'Connor T, Miotto R, Kidd BA, Chen R, Ma'ayan A, Dudley JT. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Briefings in Bioinformatics 2017 Feb 15. PMID: 28200013
  18. Gundersen GW, Jagodnik KM, Woodland H, Fernandez NF, Sani K, Dohlman AB, Ung PM, Monteiro CD, Schlessinger A, Ma'ayan A. GEN3VA: aggregation and analysis of gene expression signatures from related studies. BMC Bioinformatics 2016 Nov 15;17(1):461. PMID: 27846806
  19. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, Duan Q, Clark NR, Jones MR, Kou Y, Goff T, Woodland H, Amaral FM, Szeto GL, Fuchs O, Schüssler-Fiorenza Rose SM, Sharma S, Schwartz U, Bausela XB, Szymkiewicz M, Maroulis V, Salykin A, Barra CM, Kruth CD, Bongio NJ, Mathur V, Todoric RD, Rubin UE, Malatras A, Fulp CT, Galindo JA, Motiejunaite R, Jüschke C, Dishuck PC, Lahl K, Jafari M, Aibar S, Zaravinos A, Steenhuizen LH, Allison LR, Gamallo P, de Andres Segura F, Dae Devlin T, Pérez-García V, Ma'ayan A. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications 2016 Sep 26;7:12846. PMID: 27667448
  20. Wang Z, Clark NR, Ma'ayan A. Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics 2016 Aug 1;32(15):2338-45. PMID: 27153606
  21. Wang Z, Ma'ayan A. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study. F1000Research 2016 Jul 5;5:1574. PMID: 27583132
  22. Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016 Jul 3;2016. PMID: 27374120
  23. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 2016 Jul 8;44(W1):W90-7. PMID: 27141961
  24. Duan Q, Reid SP, Clark NR, Wang Z, Fernandez NF, Rouillard AD, Readhead B, Tritsch SR, Hodos R, Hafner M, Niepel M, Sorger PK, Dudley JT, Bavari S, Panchal RG, Ma'ayan A. L1000CDS2: LINCS L1000 characteristic direction signatures search engine. NPJ Systems Biology and Applications 2016;2. pii: 16015. PMID: 28413689
  25. Khan JA, Mendelson A, Kunisaki Y, Birbrair A, Kou Y, Arnal-Estapé A, Pinho S, Ciero P, Nakahara F, Ma'ayan A, Bergman A, Merad M, Frenette PS. Fetal liver hematopoietic stem cell niches associate with portal vessels. Science 2016 Jan 8;351(6269):176-80. PMID: 26634440
  26. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, Bottinger EP, Dudley JT. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Science Translational Medicine 2015 Oct 28;7(311):311ra174. PMID: 26511511
  27. Rouillard AD, Wang Z, Ma'ayan A. Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction. Computational Biology and Chemistry 2015 Oct;58:104-19. PMID: 26101093
  28. Gundersen GW, Jones MR, Rouillard AD, Kou Y, Monteiro CD, Feldmann AS, Hu KS, Ma'ayan A. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions. Bioinformatics 2015 Sep 15;31(18):3060-2. PMID: 25971742
  29. Wang Z, Clark NR, Ma'ayan A. Dynamics of the discovery process of protein-protein interactions from low content studies. BMC Systems Biology 2015 Jun 6;9:26. PMID: 26048415
  30. Duan Q, Wang Z, Fernandez NF, Rouillard AD, Tan CM, Benes CH, Ma'ayan A. Drug/Cell-line Browser: interactive canvas visualization of cancer drug/cell-line viability assay datasets. Bioinformatics 2014 Nov 15;30(22):3289-90. PMID: 25100688
  31. Ma'ayan A, Duan Q. A blueprint of cell identity. Nature Biotechnology 2014 Oct;32(10):1007-8. PMID: 25299921
  32. Ma'ayan A, Rouillard AD, Clark NR, Wang Z, Duan Q, Kou Y. Lean Big Data integration in systems biology and systems pharmacology. Trends in Pharmacological Sciences 2014 Sep;35(9):450-60. PMID: 25109570
  33. Duan Q, Flynn C, Niepel M, Hafner M, Muhlich JL, Fernandez NF, Rouillard AD, Tan CM, Chen EY, Golub TR, Sorger PK, Subramanian A, Ma'ayan A. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Research 2014 Jul;42(Web Server issue):W449-60. PMID: 24906883