Identifying Gene Signatures from Cancer Progression Data Using Ordinal Analysis
A comprehensive understanding of cancer progression may shed light on the genetic and molecular mechanisms of oncogenesis, and it may provide much-needed information for effective diagnosis, prognosis, and optimal therapy. However, despite considerable effort in studying cancer progression, its molecular and genetic basis remains largely unknown. Microarray experiments can systematically assay gene expression across the genome, and they have therefore been widely used to gain insight into cancer progression. In some studies, expression data are obtained from the same individuals at different stages; more often, data are obtained from different individuals at different stages. Existing methods, such as Student's t-test and clustering approaches, focus on identifying genes that are differentially expressed between stages, but they are not suited to capturing genuine progression signatures across all stages. We propose an alternative approach, a multicategory logit model that treats stage as an ordinal response, to identify novel genes whose expression is significantly correlated with progression across multiple stages. We applied the approach to a real data set on prostate cancer progression and obtained a set of genes that show consistent trends across multiple stages. Further analysis based on Gene Ontology (GO) annotations, protein-protein interaction networks, and the KEGG pathway database, together with a literature search, demonstrates that our candidate list not only includes well-known prostate cancer-related genes such as MYC and AMACR, but also contains novel genes (e.g., CKS2) that have been confirmed by very recent independent studies. Our results illustrate that ordinal analysis of cancer progression data can yield a set of promising candidate genes. Such a list can be further prioritized by integrating other biomedical knowledge to identify therapeutic targets and/or biomarkers of cancer progression.
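The precise form of the multicategory logit model is not given here. The following is a minimal sketch that assumes a cumulative-logit (proportional-odds) model, which is one common multicategory logit for ordered categories. Under that model, each gene can be tested for a monotone trend across stages. The ordinal stage label is regressed on the gene's expression, and the fit is compared against an intercept-only model with a likelihood-ratio test. The function names (`ordinal_trend_test`, `log_likelihood`) are illustrative assumptions, as are the plain gradient-ascent fitting and the toy data. None of them is the authors' implementation.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _cutpoints(params, n_cut):
    # params[0] is the first cutpoint; each later cutpoint adds exp(increment),
    # which keeps alpha_0 < alpha_1 < ... without constrained optimization.
    cuts = [params[0]]
    for d in params[1:n_cut]:
        cuts.append(cuts[-1] + math.exp(d))
    return cuts


def log_likelihood(params, x, y, n_stages):
    # Assumed cumulative-logit model: P(Y <= k | x) = sigmoid(alpha_k - beta * x).
    n_cut = n_stages - 1
    cuts = _cutpoints(params, n_cut)
    beta = params[n_cut]
    ll = 0.0
    for xi, yi in zip(x, y):
        # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        upper = 1.0 if yi == n_stages - 1 else _sigmoid(cuts[yi] - beta * xi)
        lower = 0.0 if yi == 0 else _sigmoid(cuts[yi - 1] - beta * xi)
        ll += math.log(max(upper - lower, 1e-300))
    return ll


def ordinal_trend_test(x, y, n_stages, steps=2000, lr=0.5, h=1e-5):
    """Hypothetical helper: likelihood-ratio test of H0: beta = 0 for one gene.

    x: expression values; y: list of stage labels 0..n_stages-1, where every
    stage must be observed at least once. Returns (beta, statistic, p_value).
    """
    n = len(y)
    mean_x = sum(x) / n
    xc = [xi - mean_x for xi in x]  # centering improves conditioning
    counts = [y.count(k) for k in range(n_stages)]
    # Null (intercept-only) model: fitted stage probabilities equal the
    # observed proportions, so its log-likelihood has a closed form.
    ll_null = sum(c * math.log(c / n) for c in counts)
    # Start from the null fit: cutpoints are logits of cumulative proportions.
    alphas, cum = [], 0
    for k in range(n_stages - 1):
        cum += counts[k]
        alphas.append(math.log(cum / (n - cum)))
    params = [alphas[0]] + [math.log(b - a) for a, b in zip(alphas, alphas[1:])] + [0.0]
    # Plain gradient ascent with central-difference gradients; a sketch,
    # not a production optimizer such as Newton-Raphson.
    for _ in range(steps):
        grad = []
        for j in range(len(params)):
            up, down = params[:], params[:]
            up[j] += h
            down[j] -= h
            grad.append((log_likelihood(up, xc, y, n_stages)
                         - log_likelihood(down, xc, y, n_stages)) / (2 * h))
        params = [p + lr * g / n for p, g in zip(params, grad)]
    ll_full = log_likelihood(params, xc, y, n_stages)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Chi-square with 1 degree of freedom: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return params[-1], stat, p_value


if __name__ == "__main__":
    # Toy data: three ordered stages with four samples each (made up).
    stages = [0] * 4 + [1] * 4 + [2] * 4
    x_trend = [1.0, 1.2, 0.8, 1.1, 2.0, 1.8, 2.2, 1.9, 3.1, 2.9, 3.0, 3.2]
    x_flat = [2.0, 1.0, 3.0, 2.0, 3.0, 2.0, 1.0, 2.0, 1.0, 3.0, 2.0, 2.0]
    print(ordinal_trend_test(x_trend, stages, 3))
    print(ordinal_trend_test(x_flat, stages, 3))
```

A positive `beta` means that higher expression shifts samples toward later stages. In a genome-wide scan, this test would be run once per gene. The resulting p-values would typically be adjusted for multiple testing before the genes are ranked. Using a single slope shared across all stage boundaries is what makes the test sensitive to consistent trends rather than to a difference between just two stages.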