
Statistical significance for hierarchical clustering in genetic association and microarray expression studies.

BMC Bioinformatics, Vol. 4, No. 1. (11 December 2003)





Abstract

BACKGROUND: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on that set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must include the clustering process, which modifies the grouping structure of the data and often removes variation.

RESULTS: For hierarchically clustered data, we propose considering the strongest result or, equivalently, the smallest p-value as the experiment-wise statistic of interest and evaluating its significance level for a global assessment of statistical significance. We apply our approach to datasets from haplotype association and microarray expression studies where hierarchical clustering has been used.

CONCLUSION: In all of the cases we examine, we find that relying on one set of classes in the course of clustering leads to significance levels that are too small when compared with the significance level associated with an overall statistic that incorporates the process of clustering. In other words, relying on one step of clustering may furnish a formally significant result while the overall experiment is not significant.
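The idea of treating the smallest p-value across clustering steps as the experiment-wise statistic can be illustrated with a small sketch. The abstract does not say how the significance level of that statistic is evaluated, so the code below assumes a label-permutation scheme; it also assumes that the clustering itself does not use the case/control labels, so the same sequence of groupings applies to every permuted data set. The function names (`chi_square`, `min_p_test`), the toy haplotype data, and the three groupings are hypothetical and are not taken from the article.

```python
import random


def chi_square(labels, classes):
    """Pearson chi-square statistic for a 2 x k table of label (0/1) by class."""
    n = len(labels)
    n1 = sum(labels)
    n0 = n - n1
    counts = {}
    for lab, cls in zip(labels, classes):
        counts.setdefault(cls, [0, 0])[lab] += 1
    stat = 0.0
    for c0, c1 in counts.values():
        size = c0 + c1
        for observed, total in ((c0, n0), (c1, n1)):
            expected = size * total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def min_p_test(labels, observations, partitions, n_perm=499, seed=0):
    """Assumed permutation scheme for the smallest p-value over clustering steps.

    labels       -- 0/1 outcome per individual (e.g. control/case)
    observations -- item carried by each individual (e.g. a haplotype)
    partitions   -- one dict per clustering step, mapping item -> class
    Returns (per-step p-values, smallest p-value, experiment-wise p-value).
    """
    rng = random.Random(seed)
    labelings = [list(labels)]          # index 0 holds the observed labels
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        labelings.append(shuffled)
    n_total = len(labelings)

    min_p = [1.0] * n_total
    step_p_observed = []
    for partition in partitions:
        classes = [partition[item] for item in observations]
        stats = [chi_square(lab, classes) for lab in labelings]
        # Permutation p-value at this step for every labelling, observed included.
        p = [sum(t >= s for t in stats) / n_total for s in stats]
        step_p_observed.append(p[0])
        min_p = [min(a, b) for a, b in zip(min_p, p)]

    # How often a permuted data set yields a smallest p-value at least as extreme.
    overall = sum(m <= min_p[0] for m in min_p) / n_total
    return step_p_observed, min_p[0], overall


# Toy data: four hypothetical haplotypes, ten individuals each, 1 = case.
observations = ["A"] * 10 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10
labels = [1] * 8 + [0] * 2 + [1] * 7 + [0] * 3 + [0] * 8 + [1] * 2 + [0] * 7 + [1] * 3

# Three successive groupings, as a hierarchical clustering might produce them.
partitions = [
    {"A": "A", "B": "B", "C": "C", "D": "D"},
    {"A": "AB", "B": "AB", "C": "C", "D": "D"},
    {"A": "AB", "B": "AB", "C": "CD", "D": "CD"},
]

step_p, best_p, overall_p = min_p_test(labels, observations, partitions)
print("per-step p-values:", step_p)
print("smallest p-value:", best_p)
print("experiment-wise p-value:", overall_p)
```

By construction, the experiment-wise p-value in this sketch can never be smaller than the smallest per-step p-value, which matches the abstract's point that a single clustering step can look more significant than the experiment as a whole.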

