Faculty Publications

Evaluating Methods for the Analysis of Rare Variants in Sequence Data

Alexander Luedtke, Brown University
Scott Powers, University of North Carolina at Chapel Hill
Ashley Petersen, St. Olaf College
Alexandra Sitarik, Wittenberg University
Airat Bekmetjev, Hope College
Nathan Tintle, Dordt College

Document Type

Conference Proceeding

Publication Date

11-29-2011

Publication Source

BMC Proceedings

Volume Number

5

Issue Number

9

First Page

S119

Publisher

BioMed Central Ltd.

ISSN

1753-6561

Comments

This work is funded by National Human Genome Research Institute grant R15HG004543. We wish to thank Scott DeClaire and Ben Boerema for their participation in early stages of this project. The Genetic Analysis Workshops are supported by National Institutes of Health grant R01 GM031575.

This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Abstract

A number of rare variant statistical methods have been proposed for analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on the proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations on identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
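
The arithmetic behind the low true discovery rate can be sketched with Bayes' rule. The snippet below is an illustration only, not the authors' analysis: the function name and the false-positive rates in the loop are hypothetical values chosen for demonstration, while the roughly 1% share of causal genes and the 19% upper bound on the true-positive rate are taken from the abstract.

```python
def true_discovery_rate(prior_causal, tpr, fpr):
    """Expected fraction of genes flagged as significant that are truly causal.

    prior_causal: fraction of all genes that are causal
    tpr: chance that a causal gene is flagged (true-positive rate)
    fpr: chance that a noncausal gene is flagged (false-positive rate)
    """
    true_hits = prior_causal * tpr
    false_hits = (1.0 - prior_causal) * fpr
    total_hits = true_hits + false_hits
    if total_hits == 0:
        return 0.0  # nothing flagged, so the rate is defined here as 0
    return true_hits / total_hits


PRIOR_CAUSAL = 0.01  # about 1% of genes are causal (from the abstract)
TPR_UPPER = 0.19     # upper bound on the true-positive rate (from the abstract)

# Hypothetical false-positive rates; the abstract does not report exact values.
for fpr in (0.01, 0.05, 0.10):
    tdr = true_discovery_rate(PRIOR_CAUSAL, TPR_UPPER, fpr)
    print(f"FPR = {fpr:.2f} -> TDR = {tdr:.3f}")
```

Under these assumed inputs, a false-positive rate of 0.05 is already enough to push the expected true discovery rate below 5%, because noncausal genes outnumber causal genes by about 99 to 1.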

Keywords

rare variant statistical methods, sequencing data, causal genes

Recommended Citation

Published in: BMC Proceedings, Volume 5, Issue 9, November 29, 2011, page S119. Copyright © 2011 BioMed Central Ltd. The final published version is available at: http://www.biomedcentral.com/1753-6561/5/S9/S119

DOI

https://doi.org/10.1186/1753-6561-5-S9-S119
