Publication: An ensemble approach for multi-label classification of item click sequences
Date
2015
Publisher
Association for Computing Machinery, Inc acmhelp@acm.org
Abstract
In this paper, we describe our approach to the RecSys 2015 Challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and, if so, which items are purchased. We define a simpler, analogous problem: given an item and its session, predict the probability that the item is purchased. Aggregated per session, these predictions yield a set of purchased items, which is often empty. We apply monthly time windows over the dataset. For each item in a session, we engineer features describing the session, the item properties, and the time window. A balanced random forest classifier is then trained to make predictions on the test set. The dataset is particularly challenging because of the privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results, together with the benefits and shortcomings of the proposed approach, are discussed. The solution is efficient and practical on commodity computers. © 2017 Elsevier B.V., All rights reserved.
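The item-level reformulation in the abstract (one binary example per item in a session, then aggregation of per-item predictions back into a possibly empty set per session) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names (`item_clicks`, `session_clicks`, `distinct_items`), the `window` key taken from the month of the session's first click, the helper names `build_item_rows` and `predicted_sets`, and the 0.5 decision threshold are all hypothetical. Timestamps are assumed to be ISO 8601 strings, so the first seven characters give the `YYYY-MM` monthly window.

```python
from collections import Counter, defaultdict


def build_item_rows(clicks, buys):
    """Turn click sessions into one labelled row per (session, item) pair.

    clicks: iterable of (session_id, item_id, iso_timestamp) tuples.
    buys:   set of (session_id, item_id) pairs that were purchased.
    All features below are hypothetical examples, not the paper's feature set.
    """
    items_per_session = defaultdict(Counter)
    first_ts = {}
    for sid, item, ts in clicks:
        items_per_session[sid][item] += 1
        # ISO 8601 strings in one format sort chronologically as plain strings.
        first_ts[sid] = min(first_ts.get(sid, ts), ts)

    rows = []
    for sid, counter in items_per_session.items():
        session_len = sum(counter.values())
        window = first_ts[sid][:7]  # assumed monthly window key, "YYYY-MM"
        for item, n in counter.items():
            rows.append({
                "session": sid,
                "item": item,
                "window": window,
                "item_clicks": n,               # clicks on this item in the session
                "session_clicks": session_len,  # total clicks in the session
                "distinct_items": len(counter), # distinct items clicked
                "label": int((sid, item) in buys),
            })
    return rows


def predicted_sets(rows, probs, threshold=0.5):
    """Collect items whose predicted purchase probability reaches the threshold.

    Every session appears in the result, so sessions with no predicted
    purchase map to an empty set.
    """
    out = {}
    for row, p in zip(rows, probs):
        chosen = out.setdefault(row["session"], set())
        if p >= threshold:
            chosen.add(row["item"])
    return out
```

With toy data (session 1 clicks item 10 twice and item 11 once and buys item 10; session 2 clicks item 20 once), `build_item_rows` produces three rows, and probabilities `[0.8, 0.3, 0.1]` map to `{1: {10}, 2: set()}`. The empty set for session 2 is how the per-item formulation answers the "does this session buy anything?" half of the original problem.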
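The abstract names a balanced random forest as the answer to class imbalance but does not spell out the sampling scheme. The sketch below assumes the common definition of a balanced random forest: each tree is trained on a bootstrap sample that draws, with replacement, the same number of purchase (1) and non-purchase (0) rows, equal to the minority-class size. The forest output is then the mean of the per-tree purchase probabilities. The helper names are hypothetical, tree induction itself is omitted, and the exact variant used in the paper may differ.

```python
import random


def balanced_bootstrap(labels, rng):
    """Row indices for one tree: both classes drawn with replacement to minority size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    if n == 0:
        raise ValueError("both classes must be present")
    sample = rng.choices(pos, k=n) + rng.choices(neg, k=n)
    rng.shuffle(sample)  # mix classes; order does not matter to a tree learner
    return sample


def balanced_samples(labels, n_trees, seed=0):
    """One independent balanced bootstrap sample per tree in the forest."""
    rng = random.Random(seed)
    return [balanced_bootstrap(labels, rng) for _ in range(n_trees)]


def average_votes(tree_probs):
    """Forest prediction: mean per-row purchase probability across trees.

    tree_probs: one list of per-row probabilities for each tree.
    """
    n_trees = len(tree_probs)
    return [sum(col) / n_trees for col in zip(*tree_probs)]
```

For example, with 5 positive and 95 negative rows, every sample returned by `balanced_samples(labels, n_trees=10)` has 10 indices, 5 of each class, so each tree sees a 50/50 class mix even though purchases are rare overall. Because every tree discards most negatives, many trees are needed to cover the majority class, which is one reason the ensemble size matters under heavy imbalance.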
