Prognostic gene signatures for patient stratification in breast cancer - accuracy, stability and interpretability of gene selection approaches using prior knowledge on protein-protein interactions
BACKGROUND: Stratification of patients according to their clinical prognosis is a desirable goal in cancer treatment, as it would enable better personalized medicine. Reliable predictions based on gene signatures could support medical doctors in selecting the right therapeutic strategy. In recent years, however, the low reproducibility of many published gene signatures has been criticized. It has been suggested that incorporating network or pathway information into prognostic biomarker discovery could improve prediction performance. In the meantime, a large number of different approaches have been proposed for this purpose.

RESULTS: In this work we compared 14 published classification approaches (8 using network information) on six public breast cancer datasets with respect to prediction accuracy and gene selection stability. For the predictive biomarker signatures produced by each method, we performed a gene set enrichment analysis to assess their association with disease-related genes, pathways and known drug targets. We found that, on average, incorporating pathway information or protein interaction data did not significantly enhance prediction performance, but did greatly improve the interpretability of gene signatures. Some methods, specifically network-based SVMs, greatly enhanced gene selection stability but achieved only comparatively low prediction accuracy, whereas Reweighted Recursive Feature Elimination (RRFE) led to very clearly interpretable signatures.

CONCLUSION: The results indicate that no single algorithm performed best with respect to all three categories in our study (accuracy, stability and interpretability). In general, incorporating network-based prior knowledge into gene selection methods did not significantly improve classification accuracy, but it greatly improved the interpretability of gene signatures compared to classical algorithms.
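The abstract does not state how gene selection stability was quantified. As an illustration only, the sketch below uses one common choice: the mean pairwise Jaccard overlap between the gene sets selected on different resamplings of the data. The function names (`jaccard`, `selection_stability`) and the example signatures are hypothetical and are not taken from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty signatures are treated as identical (assumption).
        return 1.0
    return len(a & b) / len(union)


def selection_stability(signatures):
    """Mean pairwise Jaccard overlap across a list of gene signatures.

    Each signature is the set of genes selected on one resampling
    (e.g. one cross-validation fold). Values range from 0 (no shared
    genes) to 1 (identical signatures in every run).
    """
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical signatures from three resampling runs (illustrative only).
signatures = [
    {"ESR1", "ERBB2", "MKI67", "TP53"},
    {"ESR1", "ERBB2", "MKI67", "BRCA1"},
    {"ESR1", "ERBB2", "AURKA", "TP53"},
]
print(round(selection_stability(signatures), 3))
```

A method whose signatures change substantially between resamplings scores close to 0 under this measure, while a method that keeps selecting the same genes scores close to 1. Other stability indices exist and may rank methods differently.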