Evaluation of Agreement-Related E-mail Classification Models with Unbalanced Classes
Purpose: This research evaluates the effectiveness of models for classifying agreement-related emails when the classes are severely imbalanced. The goal is a more comprehensive assessment of model performance on such data and a better understanding of model behaviour in practical applications. Design/Methodology/Approach: Four machine learning classification methods were used: Complement Naive Bayes, Logistic Regression, Random Forest, and Support Vector Machine. Findings: Random Forest and Support Vector Machine achieve high values for both Accuracy and Balanced Accuracy, demonstrating strong classification performance. Practical Implications: Random Forest and Support Vector Machine can be implemented in intelligent information systems that act as a mail dispatcher, automatically routing correspondence to the person responsible for handling the inquiry. This speeds up the process and minimises the risk of an inquiry being overlooked or left unanswered. Originality/Value: Despite a large body of research on email classification, studies focused on specific applications, such as agreement document classification, are still lacking. In particular, different models are rarely examined together and compared using multiple metrics on a single real-world problem.
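
To illustrate why Accuracy and Balanced Accuracy are reported together under class imbalance, the following minimal sketch computes both metrics in plain Python. It assumes the standard definition of Balanced Accuracy as the mean of per-class recall. The label names, the 95/5 class split, and the always-majority predictor are hypothetical and are not taken from the study's data or results.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced set: 95 "other" emails, 5 "agreement" emails.
y_true = ["other"] * 95 + ["agreement"] * 5

# Hypothetical degenerate classifier that always predicts the majority class.
y_majority = ["other"] * 100

acc = accuracy(y_true, y_majority)
bal_acc = balanced_accuracy(y_true, y_majority)
print(f"Accuracy: {acc:.2f}, Balanced Accuracy: {bal_acc:.2f}")
```

In this constructed case, plain Accuracy rewards a predictor that never finds an agreement-related email, while Balanced Accuracy drops to chance level. A model that scores high on both metrics is therefore unlikely to be ignoring the minority class.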