Diagnostic Peptide Discovery: Prioritization of Pathogen Diagnostic Markers Using Multiple Features
The availability of complete pathogen genomes has renewed interest in the development of diagnostics for infectious diseases. Synthetic peptide microarrays provide a rapid, high-throughput platform for immunological testing of potential B-cell epitopes. However, their current capacity prevent the experimental screening of complete “peptidomes”. Therefore, computational approaches for prediction and/or prioritization of diagnostically relevant peptides are required. In this work we describe a computational method to assess a defined set of molecular properties for each potential diagnostic target in a reference genome. Properties such as sub-cellular localization or expression level were evaluated for the whole protein. At a higher resolution (short peptides), we assessed a set of local properties, such as repetitive motifs, disorder (structured vs natively unstructured regions), trans-membrane spans, genetic polymorphisms (conserved vs. divergent regions), predicted B-cell epitopes, and sequence similarity against human proteins and other potential cross-reacting species (e.g. other pathogens endemic in overlapping geographical locations). A scoring function based on these different features was developed, and used to rank all peptides from a large eukaryotic pathogen proteome. We applied this method to the identification of candidate diagnostic peptides in the protozoan Trypanosoma cruzi, the causative agent of Chagas disease. We measured the performance of the method by analyzing the enrichment of validated antigens in the high-scoring top of the ranking. Based on this measure, our integrative method outperformed alternative prioritizations based on individual properties (such as B-cell epitope predictors alone). Using this method we ranked 10 million 12-mer overlapping peptides derived from the complete T. cruzi proteome. Experimental screening of 190 high-scoring peptides allowed the identification of 37 novel epitopes with diagnostic potential, while none of the low scoring peptides showed significant reactivity. Many of the metrics employed are dependent on standard bioinformatic tools and data, so the method can be easily extended to other pathogen genomes.