Cautions About The Reliability Of Pairwise Gene Correlations Based On Expression Data
Document Type
Article
Publication Date
6-26-2015
Publication Source
Frontiers In Microbiology
Volume Number
6
First Page
650
ISSN
1664-302X
Abstract
Background: Rapid growth in the availability of genome-wide transcript abundance levels through gene expression microarrays and RNAseq promises to provide deep biological insights into the complex, genome-wide transcriptional behavior of single-celled organisms. However, this promise has not yet been fully realized.
Results: We find that computation of pairwise gene associations (correlation, mutual information) across a set of 2782 total genome-wide expression samples from six diverse bacteria produces unexpectedly large variation in estimates of pairwise gene association, regardless of the metric used, the organism under study, or the number and source of the samples. We trace the cause to sampling bias. In particular, in repositories of expression data (e.g., Gene Expression Omnibus, GEO), many individual genes show small differences in absolute gene expression levels across the set of samples. We demonstrate that these small differences are due mainly to "noise" rather than "signal" attributable to environmental or genetic perturbations. We show that downstream analysis using expression levels of genes with small differences yields biased estimates of pairwise association.
Conclusions: We propose flagging genes with small differences in absolute, RMA-normalized expression levels (e.g., standard deviation less than 0.5) as potentially yielding biased pairwise association metrics. This strategy has the potential to substantially improve confidence in genome-wide conclusions about transcriptional behavior in bacterial organisms. Further work is needed to refine strategies for identifying genes with small differences in expression levels prior to computing gene-gene association metrics.
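The flagging strategy proposed in the Conclusions can be illustrated with a minimal sketch. It is not the authors' code. It assumes a genes-by-samples matrix of RMA-normalized (log2-scale) expression values, uses the sample standard deviation (ddof=1), and uses Pearson correlation as the association metric. The function names and the toy data are hypothetical; only the 0.5 threshold comes from the abstract.

```python
import numpy as np

def flag_low_variance_genes(expr, threshold=0.5):
    """Return a boolean mask marking genes whose standard deviation
    across samples is below `threshold`.

    expr: array of shape (n_genes, n_samples), assumed to hold
          RMA-normalized expression values (an assumption of this sketch).
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=1, ddof=1)  # per-gene sample standard deviation
    return sd < threshold

def correlation_among_unflagged(expr, threshold=0.5):
    """Compute Pearson correlations only among genes that are not flagged.

    Returns the row indices of the retained genes and their correlation
    matrix (empty if fewer than two genes are retained).
    """
    expr = np.asarray(expr, dtype=float)
    keep = np.flatnonzero(~flag_low_variance_genes(expr, threshold))
    if keep.size < 2:
        return keep, np.empty((0, 0))
    return keep, np.corrcoef(expr[keep])

# Toy data (hypothetical): gene 0 barely varies, genes 1 and 2 vary together.
toy_expr = np.array([
    [8.0, 8.1, 7.9, 8.0],
    [4.0, 6.0, 8.0, 10.0],
    [5.0, 7.0, 9.0, 11.0],
])
flagged = flag_low_variance_genes(toy_expr)
kept, corr = correlation_among_unflagged(toy_expr)
```

In this toy example only the first gene falls below the threshold, so the correlation is computed between the remaining two genes. Whether to exclude flagged genes or merely annotate them is a design choice that the abstract leaves open.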
Recommended Citation
Repository citation: Powers, Scott; DeJongh, Matt; Best, Aaron A.; and Tintle, Nathan L., "Cautions About The Reliability Of Pairwise Gene Correlations Based On Expression Data" (2015). Faculty Publications. Paper 1346.
https://digitalcommons.hope.edu/faculty_publications/1346
Published in: Frontiers In Microbiology, Volume 6, June 26, 2015, first page 650.