Untangling the transcriptome from fungus-infected plant tissues
Advances in sequencing technology allow sequence data to be generated at low cost. The resulting volume of raw sequence data poses many challenges for large-scale analysis. For example, in fungus-infected plant tissue it is essential to distinguish material of plant origin from material of fungal origin. The origin of transcripts sequenced from Library 895-M6 (poplar tissue infected by Marssonina brunnea) on the Illumina/Solexa GA IIx was determined by combining three methods, based on: (1) the taxonomic information of homologous sequences; (2) the reference genome sequence; and (3) the transcriptome sequences of the host and its pathogen obtained from Library 895 (poplar) and Library M6 (M. brunnea), as well as from Library 895-M6 (the mixture of poplar and M. brunnea). By integrating the results of the three methods, we accurately identified the origin of 80,978 (99.5%) contigs in the mixed poplar and M. brunnea sample (Library 895-M6). These results demonstrate that combining the three approaches is an effective strategy for determining the origin of sequences in a mixed pool and provides a basis for further transcriptome analysis of the mixed sample.
► We combined three methods to identify the origin of mixed plant-pathogen sequences.
► More than 99% of the ESTs in the mixed pool were assigned to either plant or fungus.
► For sequences longer than 1 kb, GC content differed markedly between plant and pathogen.
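As an illustration only, the idea of integrating several independent origin calls per contig, together with the GC-content comparison, can be sketched in Python. The abstract does not state how the three results were integrated, so the simple majority rule below is an assumption, and the names `gc_content` and `assign_origin` are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def assign_origin(calls):
    """Combine per-method calls for one contig.

    `calls` holds one entry per method: 'plant', 'fungus', or None when the
    method gave no answer. ASSUMPTION: a simple majority decides; ties and
    contigs with no calls are reported as 'undetermined'.
    """
    votes = Counter(call for call in calls if call is not None)
    if not votes:
        return "undetermined"
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]


# Hypothetical contig: calls from the homology-based, genome-based and
# transcriptome-based methods, in that order.
origin = assign_origin(["plant", "plant", None])
```

A contig on which the methods disagree evenly is left undetermined under this rule rather than being forced into either category; a real pipeline might instead weight the methods differently.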