Research Outputs | WoS | Scopus | TR-Dizin | PubMed
Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741
Search Results (5)
A generalization of Shannon's Mutual information for improved feature selection in databases involving possibly rare but well-predictable classes (2008)
Kursun, Olcay (Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey)
Publication, metadata only
Feature selection is a critical step in many artificial intelligence and pattern recognition problems. Shannon's Mutual Information is a classical and widely used dependence measure that can serve as a good feature selection criterion. However, this measure can overlook rare yet important regularities, which can cause critical misses (false negatives), especially in biomedical applications, where many factors are assumed to contribute, each in small amounts, to the target function to be learned. We propose a new relevance measure that accounts for the predictability of each signal from the other, which improves feature detection capability. Moreover, the formulation of this measure turns out to be a generalization of Shannon's Mutual Information.
© 2013 Elsevier B.V., All rights reserved.

A hybrid method for feature selection based on mutual information and canonical correlation analysis (2010)
Sakar, C. Okan (Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey); Kursun, Olcay (Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey)
Publication, metadata only
Mutual Information (MI) is a classical and widely used dependence measure that can generally serve as a good feature selection criterion. However, under-sampled classes and rare but certain relations are overlooked by this measure, which can result in missing relevant features that could be highly predictive of variables of interest, such as certain phenotypes or disorders in biomedical research, rare but dangerous factors in ecology, or intrusions in network systems.
Kernel Canonical Correlation Analysis (KCCA), on the other hand, is a nonlinear correlation measure that is effectively used to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) in the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid relevance measure that is based on MI but also, as in KCCA, accounts for the predictability of signals from one another. On two toy datasets and a real intrusion detection system dataset, we show that PMI has better feature detection capability than MI and KCCA, especially in catching suspicious coincidences that are rare but potentially important, both for subsequent experimental studies and for building computational predictive models.
© 2010 IEEE. © 2010 Elsevier B.V., All rights reserved.

Prediction of protein sub-nuclear location by clustering mRMR ensemble feature selection (2010)
Sakar, C. Okan (Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey); Kursun, Olcay (Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey); Şeker, Hüseyin (Department of Informatics, De Montfort University, Leicester, United Kingdom); Gürgen, Fikret S. (Department of Computer Engineering, Boğaziçi Üniversitesi, Bebek, Turkey)
Publication, metadata only
In many applications of pattern recognition in the bioinformatics and biomedical fields, input variables are organized into natural partitions that are called views in the literature. Mutual information can be used to select a minimal yet capable subset of views. Ignoring the views, dismantling them, and intermixing their variables with those of other views results, at best, in a complex predictive system that researchers in these fields cannot interpret.
Moreover, such a system would require measuring or computing the majority of the views. We use the cluster indices obtained from the views and rank the views according to the unique information they carry about the target, using the minimum Redundancy-Maximum Relevance (mRMR) approach. We also propose an ensemble approach to reduce random variation across clusterings.
© 2010 IEEE. © 2010 Elsevier B.V., All rights reserved.

A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy-Maximum Relevance filter method (2012)
Sakar, C. Okan (Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey); Kursun, Olcay (Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey); Gürgen, Fikret S. (Department of Computer Engineering, Boğaziçi Üniversitesi, Bebek, Turkey)
Publication, metadata only
In this paper, we propose a feature selection method based on the recently popular minimum Redundancy-Maximum Relevance (mRMR) criterion, which we call Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR). It is based on the idea of finding the unique information that a candidate variable possesses about the target variable, i.e. information that is distinct from that of the already selected variables. In simplest terms, we propose using the correlated functions found by KCCA, instead of the features themselves, as inputs to mRMR. We demonstrate the usefulness of our method on both toy and benchmark datasets.
© 2011 Elsevier Ltd. All rights reserved. © 2011 Elsevier B.V., All rights reserved.

A method for combining mutual information and canonical correlation analysis: Predictive Mutual Information and its use in feature selection (2012)
Sakar, C. Okan (Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey); Kursun, Olcay (Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey)
Publication, metadata only
Feature selection is a critical step in many artificial intelligence and pattern recognition problems. Shannon's Mutual Information (MI) is a classical and widely used dependence measure that serves as a good feature selection criterion. However, because MI measures dependence on average, it can overlook under-sampled classes (rare events), which can cause critical false negatives (missing a relevant feature that is highly predictive of some rare but important classes). Shannon's mutual information requires a well-sampled database, which is not typical of many fields of modern science (such as biomedicine), where only a limited number of samples is available to learn from or, at least, not all classes of the target function (such as certain phenotypes in biomedicine) are well sampled. Kernel Canonical Correlation Analysis (KCCA), on the other hand, is a nonlinear correlation measure that is effectively used to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) in the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid relevance measure based on MI that, as in KCCA, also accounts for the predictability of signals from each other in its calculation. We show that PMI has better feature detection capability than MI, especially in catching suspicious coincidences that are rare but potentially important, both for experimental studies and for building computational models. We demonstrate the usefulness of PMI, and its superiority over MI, on both toy and real datasets.
© 2011 Elsevier Ltd. All rights reserved. © 2011 Elsevier B.V., All rights reserved.
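Several of the abstracts in this listing argue that Shannon MI, being an average over all outcomes, can rank a feature that perfectly predicts a rare class below less specific features. The sketch below illustrates that averaging effect with a plug-in (count-based) MI estimate on hypothetical toy data. It computes plain MI only; it is not the authors' Predictive Mutual Information, whose formula is not given in these abstracts. The helper names `entropy` and `mutual_information` and the data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Hypothetical toy data: 100 samples, two common classes and one rare class (5%).
y = ["c1"] * 50 + ["c2"] * 45 + ["rare"] * 5

# Feature A is 1 exactly on the rare class, so it identifies "rare" perfectly.
a = [0] * 95 + [1] * 5

# Feature B separates c1 from c2 with 90% accuracy and is 0 on every "rare" sample.
b = [1] * 45 + [0] * 5 + [1] * 5 + [0] * 40 + [0] * 5

scores = {
    "A (rare-class detector)": mutual_information(a, y),
    "B (noisy c1/c2 split)": mutual_information(b, y),
}
# Rank the features by MI, highest first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f} bits")
```

Because A is a deterministic function of the class, I(A;Y) equals H(A), the entropy of a 5% event (about 0.29 bits), no matter how reliable A is. B, which is only 90% accurate on the common classes and uninformative about "rare", scores about 0.54 bits and is ranked first. This demonstrates only the averaging behavior of MI that the abstracts describe, not the behavior of PMI.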

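Two of the listed papers (the protein sub-nuclear location study and KCCAmRMR) build on minimum Redundancy-Maximum Relevance (mRMR) selection, which favors features carrying unique information about the target. As a hedged sketch of the generic criterion only, the code below implements the common greedy difference form (relevance minus mean redundancy with the already selected features) on discrete toy features, using a plug-in MI estimate. It does not reproduce the view clustering, the ensemble, or the KCCA-derived inputs described in those abstracts; the function `mrmr`, the feature names, and the data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mrmr(features, target, k):
    """Hypothetical helper: greedy mRMR, difference form. Repeatedly pick the
    feature f maximizing I(f; target) minus the mean I(f; s) over selected s."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f2 is an exact copy of f1; f3 is a weaker but different view of y.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "f3": [1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
}

# Plain relevance ranking (stable sort keeps f1 before its tied copy f2).
by_relevance = sorted(features, key=lambda n: mutual_information(features[n], y), reverse=True)
print("top-2 by MI alone:", by_relevance[:2])
print("top-2 by mRMR:    ", mrmr(features, y, 2))
```

With plain MI ranking, the exact duplicate f2 ties with f1 and takes the second slot. Under mRMR, its redundancy with f1 (about 0.97 bits) outweighs its relevance (about 0.61 bits), so the weaker but non-redundant f3 is picked instead.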