Biclustering

Biclustering, co-clustering, or two-mode clustering[1] is a data mining technique that allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Mirkin[2] (and more recently by Cheng and Church[3] in gene expression analysis), although the technique itself was introduced much earlier[2] (by J.A. Hartigan[4]). Given a set of m rows in n columns (i.e., an m×n matrix), a biclustering algorithm generates biclusters: subsets of rows that exhibit similar behavior across a subset of columns, or vice versa.
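To make the definition concrete, here is a minimal Python sketch. The toy matrix, the index lists, and the helper name `submatrix` are all made up for illustration, and the bicluster is picked by hand rather than found by any algorithm.

```python
# Toy 4x5 data matrix (m = 4 rows, n = 5 columns); the values are invented.
matrix = [
    [1.0, 9.0, 1.0, 4.0, 1.0],
    [1.0, 2.0, 1.0, 7.0, 1.0],
    [5.0, 3.0, 8.0, 6.0, 2.0],
    [1.0, 6.0, 1.0, 0.0, 1.0],
]


def submatrix(data, rows, cols):
    """Return the entries of `data` restricted to the given row and column indices."""
    return [[data[i][j] for j in cols] for i in rows]


# Hypothetical bicluster: rows 0, 1 and 3 agree on columns 0, 2 and 4,
# even though the same rows differ when all five columns are considered.
bicluster_rows = [0, 1, 3]
bicluster_cols = [0, 2, 4]

block = submatrix(matrix, bicluster_rows, bicluster_cols)
for row in block:
    print(row)
```

The rows of `block` are identical, while the full rows of `matrix` are not. That is the sense in which a bicluster captures similarity on a subset of columns only.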
Complexity

The complexity of the biclustering problem depends on the exact problem formulation, and particularly on the merit function used to evaluate the quality of a given bicluster. However, most interesting variants of this problem are NP-complete, requiring either large computational effort or the use of lossy heuristics to short-circuit the calculation.

Type of Bicluster

Different biclustering algorithms use different definitions of a bicluster.
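As an illustration of a merit function, the sketch below computes the mean squared residue, the score that Cheng and Church's δ-bicluster approach compares against a threshold δ. The article does not spell out this formula, so treat the function name and the toy data as assumptions. Other algorithms use different merit functions.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix of `data` given by `rows` and `cols`.

    Each entry's residue is: entry - row mean - column mean + overall mean,
    with all means taken inside the submatrix. Lower scores mean the
    submatrix is closer to a purely additive (shifting) pattern.
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Each row is the first row plus a constant, so the score is (near) zero.
additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))

# A submatrix with no additive structure scores higher.
noisy = [[1.0, 2.0], [2.0, 9.0]]
print(mean_squared_residue(noisy, [0, 1], [0, 1]))
```

Evaluating one candidate bicluster this way is cheap. The difficulty noted above comes from searching over the possible row and column subsets.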
Algorithms

Many biclustering algorithms have been developed for bioinformatics, including: Block clustering, CTWC, ITWC, δ-bicluster, δ-pCluster, δ-pattern, FLOC, OPC, Plaid Model, OPSMs, Gibbs, SAMBA, Robust Biclustering Algorithm (RoBA), Crossing Minimization, cMonkey[5], PRMs and DCC. Biclustering algorithms have also been proposed and used in other application fields under the names co-clustering, bidimensional clustering, and subspace clustering[6].

Some recent algorithms have attempted to include additional support for biclustering rectangular matrices in the form of other datatypes. One such algorithm, cMonkey, has recently been developed and applied to several systems-biology datasets.

There is an ongoing debate about how to judge the results of these methods, as biclustering allows overlap between clusters and some algorithms allow the exclusion of hard-to-reconcile columns/conditions. Not all of the available algorithms are deterministic, so the analyst must pay attention to the degree to which results represent stable minima. Because this is an unsupervised classification problem, the lack of a gold standard makes it difficult to spot errors in the results. One approach is to use multiple biclustering algorithms, with majority or super-majority voting among them deciding the best result. Another is to analyse the quality of shifting and scaling patterns in biclusters[7].
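The voting idea can be sketched in Python as follows. This is only one possible reading: it assumes each algorithm's result is reduced to the set of (row, column) cells it places in a bicluster, and it votes cell by cell. The helper names `cells` and `vote_on_cells` and the three example results are hypothetical and do not come from any of the algorithms listed above.

```python
from collections import Counter


def cells(rows, cols):
    """Set of (row, col) cells covered by a bicluster with these rows and columns."""
    return {(i, j) for i in rows for j in cols}


def vote_on_cells(results, threshold=0.5):
    """Keep the cells included by more than `threshold` of the results.

    threshold=0.5 gives simple majority voting; a higher value such as 0.9
    gives a stricter super-majority.
    """
    counts = Counter(cell for result in results for cell in result)
    needed = threshold * len(results)
    return {cell for cell, count in counts.items() if count > needed}


# Hypothetical outputs of three different algorithms on the same matrix.
result_a = cells([0, 1, 3], [0, 2, 4])
result_b = cells([0, 1], [0, 2, 4])
result_c = cells([0, 1, 3], [0, 2])

majority = vote_on_cells([result_a, result_b, result_c])
strict = vote_on_cells([result_a, result_b, result_c], threshold=0.9)
print(sorted(majority))
print(sorted(strict))
```

Note that the cells surviving a vote need not form a rectangular bicluster, so a real pipeline would need a further step to turn them back into row and column subsets.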
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Biclustering". A list of authors is available in Wikipedia.