Use of Autoencoder and One-Hot Encoding for Customer Segmentation

Tomasz Smutek, Jan Sikora, Sylwester Bogacki, Marek Rutkowski, Dariusz Wozniak
European Research Studies Journal, Volume XXVII, Special Issue 2, 72-82, 2024
DOI: 10.35808/ersj/3388


Purpose: The research article aims to apply Autoencoder and One-Hot Encoding techniques to the segmentation of retail customers, exploring how these methodologies can contribute to a more refined and actionable segmentation process.
Design/Methodology/Approach: The study uses a dataset comprising detailed profiles of 2240 retail customers, applying Autoencoders and One-Hot Encoding to categorize customers into distinct segments. It evaluates Autoencoders' embeddings and compares them with the traditional One-Hot Encoding method. The effectiveness of the segmentation is further analyzed using various clustering algorithms, including K-means, DBSCAN, Louvain Community Detection, Greedy Modularity, and Label Propagation. The research assesses clustering quality using indices such as the Caliński-Harabasz, Davies-Bouldin, and modularity metrics.
Findings: Application of the Louvain method with a cut-off parameter of 0.75 using AutoEmbedder revealed three evenly distributed customer groups, albeit with slightly lower Caliński-Harabasz and Davies-Bouldin index values than those obtained by the Greedy method using AutoEmbedder with a cut-off parameter of 0.5. However, the Louvain method exhibited higher modularity, indicating more cohesive segmentation. Comparisons between AutoEmbedder and One-Hot Encoding suggested the superiority of AutoEmbedder in forming customer clusters.
Practical Implications: The findings present actionable insights for marketing strategists to develop targeted campaigns based on customer expenditure patterns. By identifying customer segments with similar attributes, businesses can allocate marketing resources more effectively and tailor strategies to meet the specific needs of each segment.
Originality/Value: The article introduces a novel comparison between Autoencoder embeddings and traditional One-Hot Encoding in the context of customer segmentation, providing evidence of the former's enhanced capability in creating more meaningful and modular customer groups. It also extends the discussion on clustering quality assessment in the segmentation process, adding value to marketing analytics.
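
Illustrative sketch (not the authors' code): the abstract names One-Hot Encoding, a cut-off parameter, and modularity without defining them operationally, so the short Python example below shows one plausible reading on a toy scale. It assumes the cut-off is a cosine-similarity threshold above which two customers are linked in a graph; the abstract does not confirm this. The column names, the six sample customers, and the fixed two-group labelling are hypothetical. The autoencoder embedding and the Louvain and Greedy algorithms themselves are not reproduced; only the baseline encoding and the modularity score of a given partition are shown.

```python
from itertools import combinations
from math import sqrt


def one_hot(records, columns):
    """Encode the given categorical columns as 0/1 indicator vectors."""
    # Stable vocabulary of (column, value) pairs; one vector slot per pair.
    vocab = sorted({(c, r[c]) for r in records for c in columns})
    index = {key: i for i, key in enumerate(vocab)}
    vectors = []
    for r in records:
        v = [0] * len(vocab)
        for c in columns:
            v[index[(c, r[c])]] = 1
        vectors.append(v)
    return vectors, vocab


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    return dot / sqrt(norm_sq) if norm_sq else 0.0


def similarity_graph(vectors, cutoff):
    """Assumed meaning of the cut-off: keep an undirected edge (i, j)
    only when the cosine similarity of customers i and j is >= cutoff."""
    return [
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if cosine(vectors[i], vectors[j]) >= cutoff
    ]


def modularity(edges, labels):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where m is
    the edge count, L_c the edges inside group c, d_c its total degree."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra, degree_sum = {}, {}
    for i, j in edges:
        for node in (i, j):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[i] == labels[j]:
            intra[labels[i]] = intra.get(labels[i], 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Hypothetical toy customers; the study's real dataset has 2240 profiles.
customers = [
    {"education": "Graduation", "marital": "Single"},
    {"education": "Graduation", "marital": "Single"},
    {"education": "Graduation", "marital": "Married"},
    {"education": "PhD", "marital": "Married"},
    {"education": "PhD", "marital": "Married"},
    {"education": "PhD", "marital": "Single"},
]
vectors, vocab = one_hot(customers, ["education", "marital"])
labels = [0, 0, 0, 1, 1, 1]  # hypothetical partition, not a Louvain result

for cutoff in (0.5, 0.75):
    edges = similarity_graph(vectors, cutoff)
    print(cutoff, len(edges), round(modularity(edges, labels), 3))
```

In this toy setting a higher cut-off leaves fewer but stronger links, which changes the modularity of the same partition. This is one reason results reported at different cut-off values (0.5 versus 0.75 above) are not directly interchangeable.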
