Publication:
A method for combining mutual information and canonical correlation analysis: Predictive Mutual Information and its use in feature selection

dc.contributor.author: Sakar, C. Okan
dc.contributor.author: Kursun, Olcay
dc.contributor.institution: Sakar, C. Okan, Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey
dc.contributor.institution: Kursun, Olcay, Department of Computer Engineering, Istanbul Üniversitesi, Istanbul, Turkey
dc.date.accessioned: 2025-10-05T16:43:02Z
dc.date.issued: 2012
dc.description.abstract: Feature selection is a critical step in many artificial intelligence and pattern recognition problems. Shannon's Mutual Information (MI) is a classical and widely used measure of dependence that serves as a good basis for feature selection. However, because it measures dependence on average, under-sampled classes (rare events) can be overlooked by this measure, which can cause critical false negatives (missing a relevant feature that is highly predictive of some rare but important classes). Shannon's mutual information requires a well-sampled database, which is not typical of many fields of modern science (such as biomedicine), in which there are a limited number of samples to learn from, or at least not all classes of the target function (such as certain phenotypes in biomedicine) are well sampled. On the other hand, Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure used effectively to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) of the dependence. In this paper, we propose a hybrid measure of relevance, Predictive Mutual Information (PMI), based on MI, which also accounts for the predictability of signals from each other, as in KCCA. We show that PMI has better feature-detection capability than MI, especially in catching suspicious coincidences that are rare but potentially important not only for experimental studies but also for building computational models. We demonstrate the usefulness of PMI, and its superiority over MI, on both toy and real datasets. © 2011 Elsevier Ltd. All rights reserved.
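The abstract's central point, that MI averages over all events and is therefore capped by the (small) entropy contribution of a rare class, can be sketched numerically. The toy data and plug-in MI estimator below are illustrative assumptions, not the paper's PMI method: a feature that perfectly detects a rare class (2% of samples) scores lower under plain MI than a feature that merely separates the two common classes.

```python
# Sketch with assumed toy data: plug-in (histogram) estimate of Shannon MI.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

labels = ["c1"] * 49 + ["c2"] * 49 + ["rare"] * 2     # "rare" is under-sampled
feat_a = [0] * 98 + [1] * 2             # perfect detector of the rare class
feat_b = [0] * 49 + [1] * 49 + [0] * 2  # separates only the two common classes

print(mutual_information(feat_a, labels))  # ~0.14 bits: capped by H of the rare event
print(mutual_information(feat_b, labels))  # ~1.00 bits: plain MI ranks feat_b first
```

PMI, as the abstract describes it, augments MI with a KCCA-style predictability term precisely so that features like `feat_a` (rare but perfectly predictive "suspicious coincidences") are not discarded.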
dc.identifier.doi: 10.1016/j.eswa.2011.09.020
dc.identifier.endpage: 3344
dc.identifier.issn: 0957-4174
dc.identifier.issue: 3
dc.identifier.scopus: 2-s2.0-80255131236
dc.identifier.startpage: 3333
dc.identifier.uri: https://doi.org/10.1016/j.eswa.2011.09.020
dc.identifier.uri: https://hdl.handle.net/20.500.14719/13413
dc.identifier.volume: 39
dc.language.iso: en
dc.relation.source: Expert Systems with Applications
dc.subject.authorkeywords: Canonical Correlation
dc.subject.authorkeywords: Gebelein's Maximal Correlation
dc.subject.authorkeywords: Imbalanced Datasets
dc.subject.authorkeywords: Mutual Information
dc.subject.authorkeywords: Statistical Dependence
dc.subject.authorkeywords: Suspicious Coincidences
dc.subject.indexkeywords: Canonical correlations
dc.subject.indexkeywords: Imbalanced data-sets
dc.subject.indexkeywords: Maximal correlation
dc.subject.indexkeywords: Mutual informations
dc.subject.indexkeywords: Statistical dependence
dc.subject.indexkeywords: Suspicious coincidences
dc.subject.indexkeywords: Artificial intelligence
dc.subject.indexkeywords: Correlation methods
dc.subject.indexkeywords: Feature extraction
dc.title: A method for combining mutual information and canonical correlation analysis: Predictive Mutual Information and its use in feature selection
dc.type: Article
dcterms.references: Proceedings of the International Meeting of the Psychometric Society (2001); UCI Machine Learning Repository (2007); Bach, Francis R., Kernel independent component analysis, Journal of Machine Learning Research, 3(1), pp. 1-48 (2003); Network: Computation in Neural Systems (1996); Annals of Mathematical Statistics (1962); Neural Networks for Pattern Recognition (1995); Breiman, Leo, Estimating optimal transformations for multiple regression and correlation, Journal of the American Statistical Association, 80(391), pp. 580-598 (1985); Bryc, Włodzimierz, On the maximum correlation coefficient, Theory of Probability and Its Applications, 49(1), pp. 132-138 (2005); Burges, Christopher J.C., A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2), pp. 121-167 (1998); Endres, Dominik Maria, Bayesian bin distribution inference and mutual information, IEEE Transactions on Information Theory, 51(11), pp. 3766-3779 (2005)
dspace.entity.type: Publication
local.indexed.at: Scopus
person.identifier.scopus-author-id: 25634712900
person.identifier.scopus-author-id: 25422067900
