Piccolo, S.R. et al. Proc. Natl. Acad. Sci. USA 110, 17778–17783 (2013).

To get the most out of the vast collection of public gene expression data, researchers need to be able to compare values across data sets. But each data generation method introduces its own biases, so in practice only relative expression within a single platform can be assessed. Gene expression barcodes have been used to estimate absolute expression from microarray data by normalizing values against a large reference expression data set. Piccolo et al. extend this idea with their Universal exPression Code (UPC) algorithm, which models background noise as a function of genomic base composition and target-region length and then estimates transcriptional activity using a mixture model. UPC values range from 0 to 1 and can be compared directly across microarray and RNA sequencing platforms. The method does not need a large reference panel, and new data sets can be added incrementally.
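The mixture-model idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain two-component Gaussian mixture fitted by expectation-maximization to log-scale values from a single sample, and it omits the correction for base composition and target-region length. The function names and the simulated data are hypothetical. The posterior probability of the "active" component plays the role of a score between 0 and 1.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def activity_scores(values, n_iter=200):
    """Fit a two-Gaussian mixture by EM; return P(active) for each value.

    Illustrative assumption: component 0 is background, component 1 is
    active transcription. No GC-content or length correction is applied.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Crude start: lower half as background, upper half as active.
    mu = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(values) / n
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n)
    sd = [spread, spread]
    weight_active = 0.5
    post = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability that each value is "active".
        post = []
        for v in values:
            p_bg = (1.0 - weight_active) * normal_pdf(v, mu[0], sd[0])
            p_act = weight_active * normal_pdf(v, mu[1], sd[1])
            total = p_bg + p_act
            post.append(p_act / total if total > 0.0 else 0.5)

        # M-step: re-estimate weights, means and standard deviations.
        n_act = sum(post)
        n_bg = n - n_act
        if n_act < 1e-9 or n_bg < 1e-9:
            break  # one component has collapsed; keep the last estimates
        mu[1] = sum(p * v for p, v in zip(post, values)) / n_act
        mu[0] = sum((1.0 - p) * v for p, v in zip(post, values)) / n_bg
        var_act = sum(p * (v - mu[1]) ** 2 for p, v in zip(post, values)) / n_act
        var_bg = sum((1.0 - p) * (v - mu[0]) ** 2 for p, v in zip(post, values)) / n_bg
        sd[1] = max(math.sqrt(var_act), 1e-3)
        sd[0] = max(math.sqrt(var_bg), 1e-3)
        weight_active = n_act / n

    return post


# Simulated single sample: 300 background genes and 200 active genes.
random.seed(0)
background = [random.gauss(2.0, 0.5) for _ in range(300)]
active = [random.gauss(8.0, 1.0) for _ in range(200)]
scores = activity_scores(background + active)
```

Because each score is a probability rather than a platform-specific intensity or read count, scores computed this way from different samples share a common scale, which is the property the summary above attributes to UPC values.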