Data Mining & Computational Biology


Software for detecting cancer-related genes and mining biomedical literature
1. Wavelet-based Identification of DNA focal genomic aberrations (Hur, Y. et al., (2011) BMC Bioinformatics)
2. Integration of MicroRNA, mRNA, and Protein Expression Data for the Identification of Cancer-Related MicroRNAs (Seo, J. et al., (2017) PLoS One)
3. Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets (Jin, D. et al., (2016) Scientific Reports)
4. An Integrative Model for Identification of Key Players of the Cancer Network (Amgalan, B. et al., (2018) Science Direct)
5. CGMMC: A Computational Approach to Identifying Gene-microRNA Modules in Cancer (Jin, D. et al., (2015) PLoS Comput Biol)
6. WIFA-Seq: Identification of cancer-driver genes in focal genomic aberrations from whole genome sequencing data (Jang, H. et al., (2016) Scientific Reports)
7. WIFA-X: Identification of cancer-driver genes in focal genomic aberrations from whole exome sequencing data (Jang, H. et al., (2018) Bioinformatics)
8. Wabico: Multi-resolution GC bias correction and its application to copy number alteration identification (Jang, H. et al., (2019) Bioinformatics)
9. Disease gene chemical search engine with evidence sentence pairs (Kim, J. et al., (2019) PLoS Comput Biol)
10. Prediction of survival and recurrence of pancreatic cancer by integrating multi-omics data (Baek, B. et al., (2019))
11. A joint deep semi-NMF method for learning integrative representation of molecular signals in blood samples from patients with Alzheimer's disease (Moon, S. et al., (2020))
12. Supervised Feature Extraction Learning using Triplet loss for drug response prediction with multi-omics (Park, S. et al., (2020))
13. Integration of heterogeneous information to predict drug-target interactions for unseen drugs (Soh, J. et al., (2021))

Bioinformatics corpora & software
1. CAM (Combining Array CGH and Microarray gene expression data from multiple samples)
2. OCPID (Organ Centric Protein Interaction Database)
3.1. Paper outline & additional files - Assessment of contribution of genomic data sources to predicting protein functions (Ko, S. et al., (2009) BMC Bioinformatics)
3.2. Informative genomic data sets for the computational function predictions of Mus musculus
4. Disease Gene Identification by Integrating Domain Interactions and Mutations in the Proteins
5. Numerical Doses in LLLT and EA literature
6. Voting based cancer module identification by combining topological and data-driven properties (Azad, A.K.M. et al., (2013) PLoS One)
7. Herb-Disease relation corpus
8. Plant corpus
9. HerDing: herb recommendation system to treat diseases using genes and chemicals (Choi, W. et al., (2016) Database-The Journal of Biological Databases and Curation (Oxford))
10. A corpus for plant-chemical relationships in the biomedical domain (Choi, W. et al., (2016) BMC Bioinformatics)
11. An entity name normalization method for biomedical articles: application to diseases and plants
12. In silico re-identification of properties of drug target proteins (Kim, B. et al., (2017) BMC Bioinformatics)
13. Phenotype corpus
14. Drug, Herb, Health Functional Food and Prescription - phenotype relationship corpus
15. Biomedical Named Entity Recognition Using Deep Neural Networks with Contextual Information