Systematic comparison of RNA-Seq normalization methods using measurement error models.
MOTIVATION: Further advancement of RNA-Seq technology and its applications calls for the development of effective normalization methods for RNA-Seq data. Currently, normalization methods are compared and validated by their correlations with a chosen gold standard, usually gene expression measurements generated by a different technology or platform, such as quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR) or microarray. Although this approach is intuitive and easy to implement, it becomes statistically inadequate when the gold standard is itself subject to measurement error (ME). It is also uninformative, because the correlation of a normalization method with a gold standard says little about the actual quality of the normalized RNA-Seq measurements. RESULTS: We propose to use a system of ME models based on qRT-PCR, microarray and RNA-Seq gene expression data to compare and validate RNA-Seq normalization methods. This approach does not assume the existence of a gold standard. The performance of a normalization method is characterized by a group of parameters of the system, referred to as the performance parameters, which can be consistently estimated. Different normalization methods can thus be compared through their estimated performance parameters. We applied the proposed approach to compare five existing RNA-Seq normalization methods using the gene expression data of two RNA samples from the MicroArray Quality Control (MAQC) and Sequencing Quality Control (SEQC) projects, and gained insight into the strengths and weaknesses of these methods.
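The idea of estimating performance parameters without a gold standard can be illustrated with a minimal sketch. The abstract does not specify the form of the ME system, so the code below assumes a simple linear model in which each platform k measures the same unobserved true expression t as x_k = a_k + b_k * t + e_k, with independent errors and the first platform's slope fixed to 1 for identifiability. Under these assumptions, the slopes and error variances follow from the pairwise covariances by the method of moments. The function names and the simulated data are hypothetical and are not taken from the paper.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def estimate_me_system(x1, x2, x3):
    """Method-of-moments estimates for x_k = a_k + b_k * t + e_k.

    Assumptions (not from the source): errors are mutually independent
    and independent of t, and b_1 is fixed to 1 so the scale is identified.
    Then cov(x1, x2) = b2 * var(t), cov(x1, x3) = b3 * var(t),
    and cov(x2, x3) = b2 * b3 * var(t).
    """
    c12, c13, c23 = cov(x1, x2), cov(x1, x3), cov(x2, x3)
    var_true = c12 * c13 / c23
    slopes = [1.0, c23 / c13, c23 / c12]
    # Error variance = total variance minus the part explained by t.
    error_variances = [
        cov(x, x) - b * b * var_true
        for x, b in zip((x1, x2, x3), slopes)
    ]
    return {
        "slopes": slopes,
        "error_variances": error_variances,
        "var_true": var_true,
    }


# Simulated example: three platforms measuring the same latent expression.
rng = random.Random(0)
n_genes = 20000
t = [rng.gauss(5.0, 2.0) for _ in range(n_genes)]
x1 = [v + rng.gauss(0.0, 0.5) for v in t]               # reference platform
x2 = [1.0 + 0.8 * v + rng.gauss(0.0, 0.7) for v in t]   # second platform
x3 = [-0.5 + 1.2 * v + rng.gauss(0.0, 0.4) for v in t]  # normalized RNA-Seq

est = estimate_me_system(x1, x2, x3)
print(est)
```

In this sketch, a smaller estimated error variance for the third platform would indicate a better normalization, and no platform is treated as error-free. The estimator relies on having three platforms: with only two, the error variances cannot be separated from the variance of the true expression.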