********************************************************************************
*                               ReadMe File
*                               -----------
* KPOCN: Key players on cancer-activated multi-type interaction network
*
* Bayarbaatar Amgalan*, Department of Applied Mathematics, School of Engineering
*     and Applied Sciences, National University of Mongolia, Ulaanbaatar, Mongolia
* Ider Tseveendorj, LMV Laboratory, UVSQ, Universite Paris-Saclay, Paris, France
* Hyunju Lee*, Data Mining & Computational Biology Lab, Gwangju Institute of
*     Science & Technology, Gwangju, South Korea
* http://combio.gist.ac.kr/
********************************************************************************

I. List of input files:
================================================================================
1.    "GeneOnPPI.xlsx" [list of genes selected for analysis]
      Genes in the PPI network are included in the analysis.

2(A). "DataCancerEXP.xlsx" [gene expression data]
      A matrix in which each column, labeled with a gene symbol, represents a
      gene and each row represents a sample.

2(B). "DataNormalEXP.xlsx" [gene expression data]
      A matrix in which each column, labeled with a gene symbol, represents a
      gene and each row represents a sample.

3(A). "DataCancerCN.xlsx" [copy number data]
      A matrix in which each column, labeled with a gene symbol, represents a
      gene and each row represents a sample.

3(B). "DataNormalCN.xlsx" [copy number data]
      A matrix in which each column, labeled with a gene symbol, represents a
      gene and each row represents a sample.

4.    "PPI.xlsx" [protein-protein interaction data]
      A list of gene pairs in the protein-protein interaction network.
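
Illustration (not part of the KPOCN MATLAB code): the minimal Python sketch
below shows one way a gene list (input 1) and a list of interacting pairs
(input 4) can be combined into an adjacency matrix. The function name
build_adjacency, the gene symbols, the choice to treat interactions as
undirected, and the choice to skip self-pairs are assumptions made for this
example only; they are not taken from the package.

```python
def build_adjacency(genes, pairs):
    """Return a symmetric 0/1 adjacency matrix (list of lists) over `genes`.

    Assumptions for this sketch: interactions are undirected, pairs that
    mention a gene outside `genes` are dropped, and self-pairs are ignored.
    """
    index = {gene: i for i, gene in enumerate(genes)}
    n = len(genes)
    adj = [[0] * n for _ in range(n)]
    for a, b in pairs:
        if a in index and b in index and a != b:
            i, j = index[a], index[b]
            adj[i][j] = 1
            adj[j][i] = 1
    return adj


# Hypothetical toy data standing in for "GeneOnPPI.xlsx" and "PPI.xlsx".
genes = ["TP53", "BRCA1", "MDM2"]
pairs = [("TP53", "MDM2"), ("BRCA1", "TP53"), ("TP53", "EGFR")]

adj = build_adjacency(genes, pairs)
for gene, row in zip(genes, adj):
    print(gene, row)
```

In this toy example the pair ("TP53", "EGFR") is dropped because EGFR is not
in the selected gene list.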
================================================================================
Note: All analyses were run on a machine with the following specifications:
--------------------------------------------------------------------------------
Windows 7 64-bit operating system, Intel(R) Core(TM) i7-3770 CPU (3.4 GHz),
32 GB RAM
================================================================================

II. List of source codes:
================================================================================
1. "SparceMatrix.m":
================================================================================
| Description:
| ============
| - For each gene, estimates the incoming effects from the other genes by
|   solving a convex optimization problem.
| - Calls "Adjacency.m" and "FindNeighbors.m".
|
| Input: (input files are included in "Breast cancer data .rar")
| ======
| All of the input files listed above: "GeneOnPPI.xlsx", "DataCancerEXP.xlsx",
| "DataNormalEXP.xlsx", "DataCancerCN.xlsx", "DataNormalCN.xlsx" and "PPI.xlsx".
|
| Output:
| =======
| a) A partial correlation matrix, "PartialCorrMATRIX.txt"
| b) A partial covariance matrix, "PartialCovMATRIX.txt"
--------------------------------------------------------------------------------

2. "FindNeighbors.m":
================================================================================
| Description:
| ============
| - For a given gene, obtains the incoming effects from the other genes
|   (its neighbors) in the entire network.
| - "FindNeighbors.m" is called by "SparceMatrix.m".
--------------------------------------------------------------------------------

3. "ConstructForest.m":
================================================================================
| Description:
| ============
| - Computes the cancer driver score for each gene throughout the entire
|   network.
|
| Input: (input files are included in "Breast cancer data .rar")
| ======
| a) "PartialCovMatrixCN.txt"
| b) "PartialCovMatrixEXP.txt"
| c) "PartialCovMatrixCNandEXP.txt"
| d) "GeneOnPPI.xlsx"
|
| Output:
| =======
| a) A list of dominator genes (the most likely cancer drivers), saved as
|    "DriverGeneScore.xlsx".
--------------------------------------------------------------------------------

4. "Adjacency.m":
================================================================================
| Description:
| ============
| - Constructs the PPI network from a list of gene pairs.
| - "Adjacency.m" is called by "SparceMatrix.m".
--------------------------------------------------------------------------------

5. "MDImatrix.m":
================================================================================
| Description:
| ============
| - Constructs the multi-type interaction gene network by finding the optimal
|   projection of three weight matrices onto a single weight matrix.
| - "MDImatrix.m" is called by "ConstructForest.m". The inputs are three weight
|   matrices and the output is an MDI matrix.
--------------------------------------------------------------------------------

6. "TreeScore.m":
================================================================================
| Description:
| ============
| - Computes, for a gene, the score of its effect on its downstream targets.
| - "TreeScore.m" is called by "ConstructForest.m".
--------------------------------------------------------------------------------

7. "L1_projection.m":
================================================================================
| Description:
| ============
| - Finds the optimal projection of a point onto the L1-norm ball.
| - "L1_projection.m" is called by "FindNeighbors.m".
--------------------------------------------------------------------------------

8. "f.m":
================================================================================
| Description:
| ============
| - Computes the value of the least-squares objective function and its
|   gradient vector.
| - "f.m" is called by "FindNeighbors.m".
--------------------------------------------------------------------------------

9. "dfs.m":
================================================================================
| Description:
| ============
| - Finds the distance between two genes in the network using depth-first
|   search; this distance is used to describe the target at maximal distance
|   from the regulator.
| - "dfs.m" is called by "ConstructForest.m".
--------------------------------------------------------------------------------

10. "sparse_to_csr.m":
================================================================================
| Description:
| ============
| - Converts a sparse matrix into compressed row storage arrays. It is used in
|   computing depth-first search distances.
| - "sparse_to_csr.m" is called by "dfs.m".
--------------------------------------------------------------------------------

11. "CenterD.m":
================================================================================
| Description:
| ============
| - For each gene, the distributions of the cancer samples and of the normal
|   samples are each centered, so that the genetic change between the normal
|   and cancer conditions is measured in standard deviations over all cancer
|   and normal samples.
| - "CenterD.m" is called by "SparseMatrix.m".
--------------------------------------------------------------------------------
================================================================================