Appendix of DigSee


Please cite the use of DigSee in your work as:
  - DigSee: Disease Gene Search Engine with Evidence sentences (version cancer). Jeongkyun Kim, Seongeun So, Heejin Lee, Jong C. Park, Jung-Jae Kim, and Hyunju Lee. Nucleic Acids Research (2013).

 - Feature Keyword List -
  Cancer keyword (Total 12 terms)
    Adenocarcinoma, cancer, carcinoma, malignance, malignancy, malignant, metastatic, neoplasm, neoplastic, tissue, tumor, tumour

  Hallmark keyword (Total 6 terms)
    Angiogenesis, apoptosis, growth, invasion, metastasis, proliferation

  Troponyms of "Study" (Total 53 terms)
    Analogise, analogize, anatomize, appraise, assay, audit, bioassay, canvass, check, circularise, circularize, consider, consult, contemplate, diagnose, enquire, examine, examine, experiment, explore, explore, inquire, inspect, investigate, investigate, name, observe, pioneer, plumb, poll, probe, prospect, reexamine, re-examine, re-explore, refer, research, retried, retry, review, screen, scrutinise, scrutinize, search, sieve, sift, study, survey, trace, tried, try, view, watch

  Negation term (Total 5 terms)
    Hardly, never, no, not, yet

  to-infinitive phrase (Total 3 phrases)
    To assess, to determine, to find

  Cellular Component (Total 244 terms)
    actin cytoskeleton, actin filament, adherens junction, anchored to membrane, anchored to plasma membrane, apical junction complex, apical part of cell, apical plasma membrane, apicolateral plasma membrane, axon, basal lamina, basement membrane, basolateral plasma membrane, brush border, cell cortex, cell cortex part, cell fraction, cell junction, cell matrix junction, cell projection, cell projection part, cell soma, cell substrate adherens junction, cell surface, centrosome, chromatin, chromatin remodeling complex, chromosomal, chromosomal part, chromosome, chromosomepericentric region, clathrin coated vesicle, coated membrane, coated vesicle, coated vesicle membrane, collagen, condensed chromosome, condensed nuclear chromosome, contractile fiber, contractile fiber part, cornified envelope, cortical actin cytoskeleton, cortical cytoskeleton, cytoplasm, cytoplasmic, cytoplasmic membrane bound vesicle, cytoplasmic part, cytoplasmic vesicle, cytoplasmic vesicle membrane, cytoplasmic vesicle part, cytoskeletal, cytoskeletal part, cytoskeleton, cytosol, cytosolic, cytosolic part, dendrite, dna directed rna polymerase complex, dna directed rna polymeraseii core complex, dna directed rna polymeraseii holoenzyme, dystrophin associated glycoprotein complex, early endosome, endocytic vesicle, endomembrane system, endoplasmic reticulum, endoplasmic reticulum lumen, endoplasmic reticulum membrane, endoplasmic reticulum part, endosome, envelope, er golgi intermediate compartment, eukaryotic translation initiation factor 3 complex, external side of plasma membrane, extracellular matrix, extracellular matrix part, extracellular region, extracellular region part, extracellular space, extrinsic to membrane, extrinsic to plasma membrane, focal adhesion, golgi apparatus, golgi apparatus part, golgi associated vesicle, golgi membrane, golgi stack, growth cone, heterogeneous nuclear ribonucleoprotein complex, histone deacetylase complex, immunological synapse, insoluble fraction, 
integral to endoplasmic reticulum membrane, integral to golgi membrane, integral to membrane, integral to organelle membrane, integral to plasma membrane, integrator complex, integrin complex, intercalated disc, intercellular junction, intermediate filament, intermediate filament cytoskeleton, intracellular non membrane bound organelle, intracellular organelle, intracellular organelle part, intrinsic to endoplasmic reticulum membrane, intrinsic to golgi membrane, intrinsic to membrane, intrinsic to organelle membrane, intrinsic to plasma membrane, kinesin complex, kinetochore, lamellipodium, late endosome, leading edge, lipid raft, lysosomal membrane, lysosome, lytic vacuole, macromolecular complex, mediator complex, membrane, membrane bound vesicle, membrane coat, membrane enclosed lumen, membrane fraction, membrane part, microbody, microbody membrane, microbody part, microsome, microtubule, microtubule associated complex, microtubule cytoskeleton, microtubule organizing center, microtubule organizing center part, microvillus, mitochondrial, mitochondrial envelope, mitochondrial inner membrane, mitochondrial lumen, mitochondrial matrix, mitochondrial membrane, mitochondrial membrane part, mitochondrial outer membrane, mitochondrial part, mitochondrial respiratory chain, mitochondrial respiratory chain complex i, mitochondrial ribosome, mitochondrial small ribosomal subunit, mitochondrion, myofibril, myosin complex, nadh dehydrogenase complex, neuron projection, nicotinic acetylcholine gated receptor channel complex, non membrane bound organelle, nuclear, nuclear body, nuclear chromatin, nuclear chromosome, nuclear chromosome part, nuclear dna directed rna polymerase complex, nuclear envelope, nuclear envelope endoplasmic reticulum network, nuclear lumen, nuclear matrix, nuclear membrane, nuclear membrane part, nuclear part, nuclear pore, nuclear replication fork, nuclear speck, nuclear ubiquitin ligase complex, nucleolar, nucleolar part, nucleolus, nucleoplasm, 
nucleoplasm part, nucleus, oligosaccharyl transferase complex, organellar ribosome, organellar small ribosomal subunit, organelle, organelle envelope, organelle inner membrane, organelle lumen, organelle membrane, organelle outer membrane, organelle part, outer membrane, perinuclear region of cytoplasm, peroxisomal, peroxisomal membrane, peroxisomal part, peroxisome, plasma membrane, plasma membrane part, pml body, pore complex, proteasome complex, protein complex, protein serine threonine phosphatase complex, proteinaceous extracellular matrix, proton transporting two sector atpase complex, receptor complex, replication fork, respiratory chain complex i, ribonucleoprotein complex, ribosomal subunit, ribosome, rna polymerase complex, ruffle, sarcomere, secretory granule, site of polarized growth, small nuclear ribonucleoprotein complex, small ribosomal subunit, soluble fraction, spindle, spindle microtubule, spindle pole, spliceosome, synapse, synapse part, synaptic vesicle, tight junction, trans golgi network, trans golgi network transport vesicle, transcription factor complex, transcription factor tfiid complex, transport vesicle, u12 dependent spliceosome, ubiquitin ligase complex, vacuolar, vacuolar membrane, vacuolar part, vacuole, vesicle, vesicle coat, vesicle membrane, vesicular fraction, voltage gated calcium channel complex, voltage gated potassium channel complex

 - Event removed keyword list -
  All events
    Any single-character token, i.e. one letter or one digit (the symbols -, +, %, <, >, = are excepted)

  Binding event
    "imaging", "alterations", "index", "-LRB-", "leukemia", "after", "OS", "virus", "unique", "sarcoma", "II", "5E", "Yp", "Zr", "TRUE", "i.e.", "CD68", "Compare", "spindle", "previously", "unrecognised", "LPP", "whereas", "22q", "After", "C\", "46\", "We", "department", "young", "especially", "boy", "how", "Does", "3-15", "get", "girl", "Age", "of", "CT", "LR", "cAMP", "Ad5\", "6q", "20q", "254", "2MT", "VEGF", "IV", "also", "--", "EC", "day", "PET"

  Gene expression event
    ",13", ",2", "1\", "11q13\", "17\", "175H", "18p", "1D-NMR", "1-expressing", "1q", "2,2-diphenylpropane", "2\", "20-expression", "2138insG", "220C", "245D", "245S", "248Q", "248W", "266E", "273H", "277-283", "3\", "36\", "3956G", "4\", "4C", "5\", "56-residue", "5-fluorouracil", "8\", "894G\", "A\", "aa", "AAs", "c.", "Cr\", "de", "DOSXYZ", "DU145", "-LRB-"

  Localization event
    Evidence sentences that contain a chromosome location are removed.
    "disease", "resection", "telomerase", "LOC284999", "CD56", "tcrbeta", "tcralpha", "canx", "tlr", "adj", "18q", "5-FU", "ml", "rs12450550", "Her2\", "else", "DSLET", "CAF", "PTEN", "and", "album", "eating", "Seven", "-", "%", "+", "<", "=", ">"

  Phosphorylation event
    "formation", "Organization", "form", "presence", "focus", "mg\\", "lack", "use", "rates", "evaluation", "Group", "Target", "ablation", "portion", "Positivity", "Hospital", "loss", "copies", "modulation", "direct", "effectors", "actions", "importance", "cycles", "cycle", "subset", "risk", "most", "safe", "uptake", "Which", "for", "by", "RELA", "type", "majority"

  Protein catabolism event
    "grade", "graded", "grades", "grading", "age", "Grade", "knowledge"

  Regulation event
    "\", "1:4,000,000", "1\", "1127_1128dupAT", "A", "a", "Why", "wishes", "years"

  Transcription event
    "described", "derived", "using", "describe", "description", "prescribed", "transition", "contribution", "describes", "directed", "composed", "present", "use", "prescription", "describing", "-LRB-", "-RRB-", "analyzed", ":", "analysis", "that", "used", "assay", "assayed", "derive", "ErbB2", "prescribing", "Using", "with", "A", "cm", "Described", "edition", "function", "Nutrition", "part", "SW480", "4", "activated", "analyze", "BACKGROUND", "RyR1", "usefulness", "-2", "12", "495", ",", "ability", "activator", "addition", "Analysis", "and", "approved", "are", "descriptions", "m", "make", "n", "NB", "prescribe", "presented", "SB203580", "to", "were", "x", "acetylation", "activation", "answering", "arm", "ascribed", "background", "Contribution", "Describing", "Description", "direction", "H.", "had", "IELs", "K-ras", "LOH", "M0", "presenting", "these", "those", "total", "XL"

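The removal rules above can be read as a two-stage filter: one rule applied to every event type, followed by an event-specific list. The following is a minimal sketch of that reading, not DigSee's actual code. It assumes each keyword is compared against a single token of the extracted event (for example its trigger word), which this appendix does not state. The names `is_removed`, `ALLOWED_SINGLE_CHARS` and `REMOVED_BY_EVENT` are hypothetical, and the lists are short excerpts only.

```python
# Hypothetical sketch; names and data layout are assumptions, not DigSee's code.

# Symbols exempt from the single-character rule that applies to all events.
ALLOWED_SINGLE_CHARS = {"-", "+", "%", "<", ">", "="}

# Short excerpts of the per-event lists in this appendix (not the full lists).
REMOVED_BY_EVENT = {
    "Binding": {"imaging", "alterations", "index", "-LRB-"},
    "Localization": {"disease", "resection", "telomerase", "-", "%"},
    "Protein catabolism": {
        "grade", "graded", "grades", "grading", "age", "Grade", "knowledge",
    },
}


def is_removed(token: str, event_type: str) -> bool:
    """Return True if an event carrying this token should be discarded."""
    # Rule for all event types: drop single-character tokens,
    # except the exempt symbols.
    if len(token) == 1 and token not in ALLOWED_SINGLE_CHARS:
        return True
    # Event-specific list. Matching is case-sensitive here because the
    # lists contain both "grade" and "Grade" as separate entries.
    return token in REMOVED_BY_EVENT.get(event_type, set())


print(is_removed("a", "Binding"))       # single letter, removed for all events
print(is_removed("-", "Binding"))       # exempt symbol, kept for Binding
print(is_removed("-", "Localization"))  # listed explicitly for Localization
```

Under this reading the exemption for symbols applies only to the rule for all events, so a symbol such as "-" can still be removed for one event type when it appears in that type's own list.
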
 - Gold standard sentences -

Download gold standard sentences file


 - Top 10 sentences -
    Breast cancer, Glioblastoma, Pancreatic cancer, Prostate cancer
    link of top 10 sentences

 - More detailed table -


Download detail table file


 - Testing the Bayesian classifier with different amounts of training data -


Figure S1. Performance of individual features at varying ratios of training data, in terms of F-measure (a) and AUC score (b).

    To check whether the training data are large enough to build robust models for the identified features, we tested the Bayesian classifier with different amounts of training data. We varied the proportion of sentences used for training from 10% to 90%, as shown in Figure S1, and used the remaining sentences as the test set. For each proportion, we constructed ten different sets of training sentences and averaged the resulting performance scores. The results in Figure S1 show that the relative order of feature performance does not depend on the size of the training data, and that the performance of most features changes little, increasing only slightly as more data are added. For some features, including 'Event depth', performance decreases as the training data increase. The released version of the DigSee system uses all of these features.
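
The evaluation protocol described above (training ratios from 10% to 90%, ten random training sets per ratio, averaged scores) can be sketched as follows. This is an illustration under stated assumptions rather than the code behind Figure S1: the classifier is a one-feature Bernoulli naive Bayes with Laplace smoothing, only F-measure is computed (AUC is omitted), and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def train_nb(rows):
    """Fit a one-feature Bernoulli naive Bayes model with Laplace smoothing.

    rows: list of (feature, label) pairs, each 0 or 1.
    Returns (prior, likelihood), where prior[y] estimates P(y) and
    likelihood[y] estimates P(feature = 1 | y).
    """
    prior, likelihood = {}, {}
    for y in (0, 1):
        in_class = [x for x, label in rows if label == y]
        prior[y] = (len(in_class) + 1) / (len(rows) + 2)
        likelihood[y] = (sum(in_class) + 1) / (len(in_class) + 2)
    return prior, likelihood


def predict(model, x):
    """Return the label with the larger (unnormalised) posterior."""
    prior, likelihood = model
    score = {}
    for y in (0, 1):
        p = likelihood[y] if x == 1 else 1 - likelihood[y]
        score[y] = prior[y] * p
    return 1 if score[1] > score[0] else 0


def f_measure(model, rows):
    """F-measure of the positive class on rows."""
    tp = sum(1 for x, y in rows if y == 1 and predict(model, x) == 1)
    fp = sum(1 for x, y in rows if y == 0 and predict(model, x) == 1)
    fn = sum(1 for x, y in rows if y == 1 and predict(model, x) == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learning_curve(rows, ratios, repeats=10, seed=0):
    """Mean F-measure per training ratio over `repeats` random splits."""
    rng = random.Random(seed)
    curve = {}
    for ratio in ratios:
        scores = []
        for _ in range(repeats):
            shuffled = rows[:]
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * ratio)
            train, test = shuffled[:cut], shuffled[cut:]  # rest is the test set
            scores.append(f_measure(train_nb(train), test))
        curve[ratio] = mean(scores)
    return curve


def make_synthetic(n=500, seed=1):
    """Synthetic stand-in data: the feature agrees with the label 80% of the time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = y if rng.random() < 0.8 else 1 - y
        rows.append((x, y))
    return rows


RATIOS = [r / 10 for r in range(1, 10)]  # 10% to 90%
curve = learning_curve(make_synthetic(), RATIOS)
for ratio, score in curve.items():
    print(f"{ratio:.0%} training data: mean F-measure {score:.3f}")
```

Running the same loop once per feature and plotting the averaged scores against the training ratio would give a curve of the kind shown in Figure S1; the numbers printed by this sketch come from synthetic data and say nothing about DigSee's features.
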
Gwangju Institute of Science and Technology
Data Mining & Computational Biology Lab