This Letter addresses the statistical significance of structures in random data: given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
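To make the notion of a cluster p value concrete, the following is a minimal sketch that estimates it by brute-force Monte Carlo sampling rather than by the analytical solution described in the abstract. Everything in it is an illustrative assumption, not taken from the Letter: the similarity measure (dot product), the null model (independent standard Gaussian vectors), the cluster score (sum of pairwise similarities), and the function names `cluster_score`, `best_cluster_score`, and `cluster_p_value`.

```python
import itertools
import numpy as np


def cluster_score(sim, members):
    """Sum of pairwise similarities among the given member indices."""
    idx = np.asarray(members)
    sub = sim[np.ix_(idx, idx)]
    # Count each unordered pair once and exclude the diagonal (self-similarity).
    return (sub.sum() - np.trace(sub)) / 2.0


def best_cluster_score(vectors, k):
    """Highest score over all subsets of size k (brute force, small inputs only)."""
    sim = vectors @ vectors.T  # assumed similarity measure: dot product
    n = len(vectors)
    return max(cluster_score(sim, c) for c in itertools.combinations(range(n), k))


def cluster_p_value(observed_score, n, dim, k, trials=200, seed=0):
    """Estimate the probability that random data contains a size-k cluster
    scoring at least `observed_score` (assumed null: i.i.d. Gaussian vectors)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        null_vectors = rng.standard_normal((n, dim))
        if best_cluster_score(null_vectors, k) >= observed_score:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

Because the maximum is taken over all subsets of a given size, the estimate accounts for the multiple-testing aspect mentioned in the abstract: a high score is significant only if no subset of random vectors is likely to reach it. The enumeration grows combinatorially with the number of vectors, so this sketch is practical only for very small inputs.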
SEEK ID: https://fairdomhub.org/publications/111
PubMed ID: 21231375
Projects: Noisy-Strep
Publication type: Not specified
Journal: Phys. Rev. Lett.
Date Published: 27th Nov 2009
Created: 23rd Feb 2011 at 10:44
Last updated: 8th Dec 2022 at 17:26