Informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography and high-resolution mass spectrometry (GCxGC–HRMS)
This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC–HRMS analysis produces large data sets that are rich with information, but highly complex. The size of the data and volume of information requires automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC–HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from different individuals, six each for Grades 1–3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.