A simpler method of preprocessing MALDI-TOF MS data for differential biomarker analysis: stem cell and melanoma cancer studies.
Raw spectral data from matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) with MS profiling techniques usually contains complex information not readily providing biological insight into disease. The association of identified features within raw data to a known peptide is extremely difficult. Data preprocessing to remove uncertainty characteristics in the data is normally required before performing any further analysis. This study proposes an alternative yet simple solution to preprocess raw MALDI-TOF-MS data for identification of candidate marker ions. Two in-house MALDI-TOF-MS data sets from two different sample sources (melanoma serum and cord blood plasma) are used in our study. Raw MS spectral profiles were preprocessed using the proposed approach to identify peak regions in the spectra. The preprocessed data was then analysed using bespoke machine learning algorithms for data reduction and ion selection. Using the selected ions, an ANN-based predictive model was constructed to examine the predictive power of these ions for classification. Our model identified 10 candidate marker ions for both data sets. These ion panels achieved over 90% classification accuracy on blind validation data. Receiver operating characteristics analysis was performed and the area under the curve for melanoma and cord blood classifiers was 0.991 and 0.986, respectively. The results suggest that our data preprocessing technique removes unwanted characteristics of the raw data, while preserving the predictive components of the data. Ion identification analysis can be carried out using MALDI-TOF-MS data with the proposed data preprocessing technique coupled with bespoke algorithms for data reduction and ion selection.