Aggregating advantages of a set of clusterings into a final clustering using object-wise similarity graph
(Bahçeşehir Üniversitesi Fen Bilimleri Enstitüsü, 2011-06) Erdil, Ertunç; Mimaroğlu, Selim Necdet
Clustering is the process of grouping similar objects, where similarity between objects is usually measured by a distance metric. Clustering is a hard problem because the natural grouping of a data set is unknown. It aims to divide a data set into meaningful groups, where each group formed by a clustering method is referred to as a cluster. Clustering is a useful starting point for purposes such as data understanding and summarization. In the literature, there are numerous applications of clustering, ranging from biology to economics. Clustering has a long and rich history in a variety of scientific fields. The main research areas contributing to clustering methodology are Machine Learning, Data Mining, and Pattern Recognition. Each clustering technique possesses some advantages and disadvantages. Some clustering algorithms require input parameters that strongly affect the outcome. Some clustering techniques make assumptions about the properties of the data set and produce good quality clusterings when those assumptions hold. The distance metric also plays an important role in producing a clustering. Especially in high dimensional data sets, it is hard to identify similarity or distance between objects. In most cases, it is not possible to choose the best distance metric, the best clustering method, and the best input parameter values for an input data set. Therefore, multiple clusterings can be obtained on a data set, and these can be combined into a new final clustering of better quality. In this thesis, we propose a graph-based algorithm for combining multiple clusterings that is scalable, robust, and intuitive. Combining multiple clusterings requires reusing preexisting knowledge and producing a novel final clustering with better overall quality.
Our new algorithm, COMUSA, works on an object-wise weighted similarity graph that is constructed from the evidence accumulated from multiple input clusterings. By working at the object level, COMUSA produces good quality final clusterings in a short amount of time. Extensive experimental evaluations on very challenging real, synthetically generated, and gene expression data sets from a diverse set of domains establish the usefulness of our methods in terms of both quality and execution time.
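
The abstract does not specify how the object-wise similarity graph is built, so the following is only a minimal sketch of one common way to accumulate evidence from multiple clusterings: the weight of the edge between two objects is the number of input clusterings that place them in the same cluster. The function name `co_association_graph`, the label-list input format, and the toy data are assumptions made for illustration; this is not the COMUSA algorithm itself.

```python
from collections import defaultdict
from itertools import combinations


def co_association_graph(clusterings):
    """Build an object-wise weighted similarity graph (illustrative only).

    `clusterings` is a list of label lists; clusterings[k][i] is the cluster
    label that the k-th input clustering assigns to object i.
    Returns {(i, j): weight} with i < j, where weight is the number of input
    clusterings that put objects i and j in the same cluster.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("every clustering must label the same objects")

    weights = defaultdict(int)
    for labels in clusterings:
        # Group object indices by the label this clustering gives them.
        members = defaultdict(list)
        for obj, label in enumerate(labels):
            members[label].append(obj)
        # Every pair inside a cluster contributes one unit of evidence.
        for group in members.values():
            for i, j in combinations(group, 2):
                weights[(i, j)] += 1
    return dict(weights)


# Hypothetical toy input: three clusterings of five objects.
clusterings = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
graph = co_association_graph(clusterings)
```

In this toy input, objects 0 and 1 share a cluster in all three clusterings, so their edge carries the maximum weight, while objects that never co-occur (such as 0 and 3) have no edge at all. A combining algorithm can then work on this graph rather than on the original feature vectors.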