Hand, Chris and Fitkov-Norris, Elena (2023) Not seeing the wood for the trees? The effect of class imbalance and noise on random forests classification accuracy. In: FBSS Research Conference 2023; 30 June 2023, Kingston upon Thames, U.K. (Unpublished)
Abstract
Machine learning algorithms are increasingly attracting attention from management and marketing researchers due to their predictive accuracy. There is, however, a growing awareness of the limitations of these methods, particularly when they are faced with unbalanced samples and noisy data. Random Forests (RF), a machine learning classification algorithm, has grown in popularity due to its learning capacity and has even been described as the best “off-the-shelf” algorithm. It is therefore becoming more important for researchers to know how class imbalance (i.e. one of the categories in the target variable being much less prevalent than the others) and the amount of noise in the data affect RF classification accuracy. The aim of this study is to determine whether these influences operate independently or whether the incidence of one affects the severity of the other. Our results show that, as expected, both noise and sample imbalance affect classification accuracy, and in particular the classification accuracy of the minority class. However, the results also show that these two effects are not independent of each other: classification accuracy is worse when the algorithm is faced with data that is both noisy and unbalanced than when it is faced with data that is either noisy or unbalanced. The findings have implications for evaluating random forest performance and for strategies for reducing the effects of sample imbalance.
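The point about evaluation can be illustrated with a minimal sketch, which is not taken from the study itself. It assumes hypothetical data with a 95/5 class split and a trivial classifier that always predicts the majority class; the helper names `overall_accuracy` and `per_class_recall` are illustrative only. It shows how overall accuracy can look strong while accuracy on the minority class is zero.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all cases that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Map each true label to the fraction of its cases classified correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical, assumed data: 95 majority-class cases (0), 5 minority-class cases (1).
y_true = [0] * 95 + [1] * 5
# Assumed trivial classifier: always predicts the majority class.
y_pred = [0] * 100

acc = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
print(acc)     # overall accuracy
print(recall)  # accuracy within each class
```

Under these assumptions the overall figure is 0.95 even though no minority-class case is identified, which is why per-class measures are worth reporting alongside overall accuracy when samples are unbalanced.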