Bridging the Gap: Performance Evaluation of Classical and Modern Discretization Techniques in Real Estate Data
Purpose: This study evaluates the effectiveness of selected discretization methods for continuous variables, using real estate area, one of the key attributes in property analysis, as an example. It addresses the challenge of transforming continuous variables into discrete form with minimal information loss, a step that is crucial for data mining, statistical modeling, and classification tasks.
Design/Methodology/Approach: Thirteen discretization methods were applied to a dataset of 3,732 residential real estate listings from the Szczecin housing market, covering 2017 to 2021. The methods include classical approaches with predefined class parameters (equal width and equal frequency), expert-driven methods, quantile-based techniques, clustering (k-means), and supervised learning approaches such as entropy minimization and 1R. The evaluation criteria were the deviation of grouped results from ungrouped data (difference in arithmetic means and a loss function) and the number of classes, treated as a nominant (a criterion for which an intermediate value, rather than the largest or smallest, is preferred). A linear ordering technique (Hellwig’s method) was used to rank the methods.
Findings: The method based on an expert-defined class width (Method 4) showed the highest consistency with the original data, followed by Scott’s rule (Method 2) and the entropy-based supervised method (Method 11). Contrary to expectations, quantile-based methods and commonly used rules such as the Freedman–Diaconis and square-root rules yielded unsatisfactory results, either because they oversimplified the data (too few intervals) or because they were excessively granular (too many classes).
Practical Implications: The results underline the importance of selecting a discretization method tailored to the characteristics of the variable and the research context. In particular, they demonstrate the value of domain expertise in guiding discretization decisions in real estate analytics, which improves data quality for downstream analyses such as classification, segmentation, or regression.
Originality/Value: This study is one of the first to systematically compare a broad spectrum of discretization methods in the context of real estate data. It introduces a comprehensive evaluation framework combining statistical accuracy and interpretability. The findings contribute to both methodological development in data preprocessing and practical decision-making in real estate market research.
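Illustrative sketch: The kind of comparison outlined under Design/Methodology/Approach can be illustrated with a minimal Python example. It is not the authors' implementation. It assumes synthetic, log-normally distributed areas in place of the Szczecin listings, an arbitrary class count of k = 8 for the equal-width and equal-frequency variants, and NumPy's built-in "scott", "fd" and "sqrt" estimators as stand-ins for the corresponding rules. Only the mean-difference criterion is computed; the study's loss function and the Hellwig ranking are not specified in this abstract and are therefore omitted. All function names are hypothetical.

```python
import numpy as np


def grouped_mean(values, edges):
    """Mean reconstructed from class midpoints weighted by class counts."""
    counts, edges = np.histogram(values, bins=edges)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return float(np.sum(counts * midpoints) / counts.sum())


def equal_width_edges(values, k):
    """k classes of identical width spanning the observed range."""
    return np.linspace(values.min(), values.max(), k + 1)


def equal_frequency_edges(values, k):
    """k classes holding roughly the same number of observations."""
    # np.unique drops duplicate edges that can appear with heavily tied data.
    return np.unique(np.quantile(values, np.linspace(0, 1, k + 1)))


def compare(values, k=8):
    """Number of classes and |grouped mean - ungrouped mean| per method."""
    values = np.asarray(values, dtype=float)
    candidates = {
        "equal width": equal_width_edges(values, k),
        "equal frequency": equal_frequency_edges(values, k),
        "Scott": np.histogram_bin_edges(values, bins="scott"),
        "Freedman-Diaconis": np.histogram_bin_edges(values, bins="fd"),
        "square root": np.histogram_bin_edges(values, bins="sqrt"),
    }
    true_mean = values.mean()
    return {
        name: {
            "classes": len(edges) - 1,
            "abs_mean_diff": abs(grouped_mean(values, edges) - true_mean),
        }
        for name, edges in candidates.items()
    }


# Synthetic stand-in data (assumption): right-skewed areas in square metres.
rng = np.random.default_rng(0)
areas = rng.lognormal(mean=4.0, sigma=0.4, size=3732)

for name, row in compare(areas, k=8).items():
    print(f"{name:18s} classes={row['classes']:3d} "
          f"abs_mean_diff={row['abs_mean_diff']:.3f}")
```

A small mean difference alone does not single out a good method, since very fine groupings reproduce the ungrouped mean almost exactly; this is why the study also treats the number of classes as a criterion.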