Research Outputs | WoS | Scopus | TR-Dizin | PubMed

Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741

Search Results

Now showing 1 - 3 of 3
  • Publication
    A generalization of Shannon's Mutual information for improved feature selection in databases involving possibly rare but well-predictable classes
    (2008) Kursun, Olcay; Kursun, Olcay, Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey
    Feature selection is a critical step in many Artificial Intelligence and Pattern Recognition problems. Shannon's Mutual Information is a classical and widely used measure of dependence that can serve as a good feature selection algorithm. However, rare yet important regularities can be overlooked by this measure, which can create critical misses (false negatives), especially in biomedical applications, in which it is assumed that many factors each contribute a small amount to the target function to be learned. We propose a new measure of relevance that accounts for the predictability of each signal from the other in its calculations, which improves the feature detection capability. This measure, in its formulation, also turns out to be a generalization of Shannon's Mutual Information. © 2013 Elsevier B.V., All rights reserved.
  • Publication
    A hybrid method for feature selection based on mutual information and canonical correlation analysis
    (2010) Sakar, C. Okan; Kursun, Olcay; Sakar, C. Okan, Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey; Kursun, Olcay, Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey
    Mutual Information (MI) is a classical and widely used dependence measure that can generally serve as a good feature selection algorithm. However, under-sampled classes or rare but certain relations are overlooked by this measure, which can result in missing relevant features that could be very predictive of variables of interest, such as certain phenotypes or disorders in biomedical research, rare but dangerous factors in ecology, intrusions in network systems, etc. On the other hand, Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure effectively used to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid measure of relevance that is not only based on MI but also accounts for the predictability of signals from one another, as in KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching suspicious coincidences that are rare but potentially important, not only for subsequent experimental studies but also for building computational predictive models. This is demonstrated on two toy datasets and a real intrusion detection system dataset. © 2010 IEEE. © 2010 Elsevier B.V., All rights reserved.
  • Publication
    Prediction of protein sub-nuclear location by clustering mRMR ensemble feature selection
    (2010) Sakar, C. Okan; Kursun, Olcay; Şeker, Hüseyin; Gürgen, Fikret S.; Sakar, C. Okan, Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey; Kursun, Olcay, Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey; Şeker, Hüseyin, Department of Informatics, De Montfort University, Leicester, United Kingdom; Gürgen, Fikret S., Department of Computer Engineering, Boğaziçi Üniversitesi, Bebek, Turkey
    In many applications of pattern recognition in the bioinformatics and biomedical fields, input variables are organized into natural partitions that are called views in the literature. Mutual information can be used to select a minimal yet capable subset of views. Ignoring the presence of views, dismantling them, and treating their variables as intermixed with those of other views at best results in a complex, uninterpretable predictive system for researchers in these fields. Moreover, it would require measuring or computing the majority of the views. We use the clustering indices of the views and rank the views according to the unique information they share with the target, using the minimum redundancy-maximum relevance (mRMR) approach. We also propose an ensemble approach to reduce the random variations in clusterings. © 2010 IEEE. © 2010 Elsevier B.V., All rights reserved.
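
The abstracts above rely on two standard building blocks: Shannon mutual information between discrete variables, and greedy mRMR ranking. The following is a minimal illustrative sketch of both, not the authors' implementation. It assumes discrete-valued columns and the common "difference" form of mRMR (relevance minus mean redundancy); the function names `mutual_information` and `mrmr_rank` and the toy data are hypothetical. It does not implement the proposed generalized measure or PMI, whose formulas are not given in the abstracts.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Shannon mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, target):
    """Greedy mRMR ranking (difference form, an assumption of this sketch).

    features: dict mapping a feature name to its discrete column.
    Returns feature names in selection order.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected, remaining = [], set(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean MI between the candidate and the already selected features.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the target encodes two independent bits, a and c; b duplicates a.
a = [0, 0, 0, 0, 1, 1, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
b = list(a)
target = [2 * ai + ci for ai, ci in zip(a, c)]
ranking = mrmr_rank({"a": a, "b": b, "c": c}, target)

# A feature that perfectly predicts a rare class (10 of 1000 samples).
rare_target = [1] * 10 + [0] * 990
rare_feature = list(rare_target)
rare_mi = mutual_information(rare_feature, rare_target)
```

In this toy setup, `b` is as relevant as `a` but fully redundant with it, so the ranking places `c` ahead of `b`. The last lines illustrate the concern raised in the first two abstracts: `rare_feature` predicts `rare_target` perfectly, yet its mutual information is only about 0.08 bits, because MI is bounded by the entropy of the rare class.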