Publication:
An ensemble approach for multi-label classification of item click sequences

dc.contributor.author Yağcı, A. Murat
dc.contributor.author Aytekin, Tevfik
dc.contributor.author Gürgen, Fikret S.
dc.contributor.institution Yağcı, A. Murat, Department of Computer Engineering, Boğaziçi Üniversitesi, Bebek, Turkey
dc.contributor.institution Aytekin, Tevfik, Department of Computer Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey
dc.contributor.institution Gürgen, Fikret S., Department of Computer Engineering, Boğaziçi Üniversitesi, Bebek, Turkey
dc.date.accessioned 2025-10-05T16:30:25Z
dc.date.issued 2015
dc.description.abstract In this paper, we describe our approach to the RecSys 2015 challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and, if so, which items are purchased. We define a simpler analogous problem: given an item and its session, we try to predict the probability that the item is purchased. For each session, the predictions yield a set of purchased items, which is often empty. We apply monthly time windows over the dataset. For each item in a session, we engineer features regarding the session, the item properties, and the time window. Then, a balanced random forest classifier is trained to perform predictions on the test set. The dataset is particularly challenging due to the privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results are discussed together with the benefits and shortcomings of the proposed approach. The solution is efficient and practical on commodity computers. © 2017 Elsevier B.V., All rights reserved.
dc.description.sponsorship YOOCHOOSE
dc.identifier.conferenceName International ACM Recommender Systems Challenge, RecSys 2015
dc.identifier.conferencePlace Vienna
dc.identifier.doi 10.1145/2813448.2813516
dc.identifier.isbn 9781450336659
dc.identifier.scopus 2-s2.0-84960924553
dc.identifier.uri https://doi.org/10.1145/2813448.2813516
dc.identifier.uri https://hdl.handle.net/20.500.14719/12699
dc.language.iso en
dc.publisher Association for Computing Machinery, Inc acmhelp@acm.org
dc.subject.authorkeywords Recommender Systems
dc.subject.authorkeywords Sequence Classification
dc.subject.authorkeywords Web Mining
dc.subject.authorkeywords Data Privacy
dc.subject.authorkeywords Decision Trees
dc.subject.authorkeywords Sales
dc.subject.authorkeywords Vehicle Routing
dc.subject.authorkeywords Classifier Ensembles
dc.subject.authorkeywords Ensemble Approaches
dc.subject.authorkeywords Feature Engineerings
dc.subject.authorkeywords Multi Label Classification
dc.subject.authorkeywords Privacy Preserving
dc.subject.authorkeywords Random Forest Classifier
dc.subject.authorkeywords Classification (of Information)
dc.subject.indexkeywords Data privacy
dc.subject.indexkeywords Decision trees
dc.subject.indexkeywords Recommender systems
dc.subject.indexkeywords Sales
dc.subject.indexkeywords Vehicle routing
dc.subject.indexkeywords Classifier ensembles
dc.subject.indexkeywords Ensemble approaches
dc.subject.indexkeywords Feature engineerings
dc.subject.indexkeywords Multi label classification
dc.subject.indexkeywords Privacy preserving
dc.subject.indexkeywords Random forest classifier
dc.subject.indexkeywords Sequence classification
dc.subject.indexkeywords Web Mining
dc.subject.indexkeywords Classification (of information)
dc.title An ensemble approach for multi-label classification of item click sequences
dc.type Conference Paper
dcterms.references Ben-Shimon, David, RecSys challenge 2015 and the YOOCHOOSE dataset, pp. 357-358, (2015); Breiman, Leo, Random forests, Machine Learning, 45, 1, pp. 5-32, (2001); Chawla, Nitesh Vinay, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, pp. 321-357, (2002); Using Random Forest to Learn Imbalanced Data, (2004); Galar, Mikel, A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 42, 4, pp. 463-484, (2012); Louppe, Gilles C., Understanding variable importances in Forests of randomized trees, Advances in Neural Information Processing Systems, (2013); Pedregosa, Fabián, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, 12, pp. 2825-2830, (2011); Data Mining and Knowledge Discovery Handbook, (2010)
dspace.entity.type Publication
local.indexed.at Scopus
person.identifier.scopus-author-id 35325932900
person.identifier.scopus-author-id 35793449500
person.identifier.scopus-author-id 6603953162
