Segmentology™ Report: September Issue

 

Beyond the Clouds: From Clustering to Latent Class Modeling

Segmentology

The primary objective of any segmentation solution is to identify well-differentiated customer groups that are informative for business strategy and marketing applications. For a customer segmentation to be accepted throughout an organization, all components of the customer relationship must be addressed: Value, Behavior, Lifestage/Lifestyle, Attitudes, Price Sensitivity, and Competitive Environment. Converting the information for all of these customer dimensions into useful segments is a non-trivial challenge that standard segmentation methodologies cannot meet.

We will first explore how standard clustering techniques are generally used for segmentation, the issues that arise as a result, and how latent class modeling provides a more appropriate and statistically valid approach for a multi-faceted segmentation, where underlying dimensions provide the building blocks necessary to create the final groups.

A typical approach to customer segmentation is a clustering algorithm (k-means is common) where all available pieces of information are transformed into continuous and discrete numeric variables. A "distance algorithm" is then used to create a pre-defined number of groups according to how "close" individuals are to one another, that is, how similar their data elements are. The process is generally as follows:

  • First, all variables are transformed, recoded, and/or standardized depending on their type and distribution. If the data consists of a large number of variables, it is generally collapsed into factors using principal component analysis.
  • The resulting factor scores are both continuous and orthogonal. They are most easily clustered into segments using the traditional k-means approach.
  • Cluster analysis works very well in cases where:
    • Descriptive variables are continuous
    • Variables are all of the same type: continuous (factor scores for example), binary indicators (0/1 flags), ranks (1=low, 2=middle, 3=high)
    • Variables are normally distributed
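
The standardize-then-cluster steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a production implementation: the customer data, the function names, and the choice of two clusters are all hypothetical, and the principal component step is omitted for brevity.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=50, seed=0):
    """Plain k-means: assign each row to its nearest center, then re-average."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # start from k randomly chosen rows
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # keep the old center if a cluster ends up empty
            new_centers.append([mean(c) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return labels, centers

# Hypothetical customers: [annual spend, number of visits]
customers = [[120.0, 2], [135.0, 3], [110.0, 2],
             [900.0, 14], [950.0, 12], [870.0, 15]]
labels, centers = kmeans(standardize(customers), k=2)
print(labels)
```

Note that both inputs here are continuous and on comparable scales after standardization, which is exactly the situation in which this approach behaves well.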

There can be substantial issues when the data does not exactly fit the conditions stated above. One common occurrence is the "big" cluster phenomenon. This occurs when there are many very small segments with very specific behaviors, and one extremely large group classifying 60% or more of the base into an "Average" cluster. The large segment provides no insights, and the other groups are too small to provide critical mass for specific marketing and measurement activities.
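
A quick way to check a finished solution for this pattern is to look at the share of the base that falls into each segment. The sketch below is illustrative only: the segment labels and counts are made up, the function names are hypothetical, and the 60% threshold is simply the figure mentioned above.

```python
from collections import Counter

def segment_shares(labels):
    """Return each segment's share of the customer base."""
    counts = Counter(labels)
    total = len(labels)
    return {seg: n / total for seg, n in counts.items()}

def has_big_cluster(labels, threshold=0.60):
    """True if any single segment holds at least `threshold` of the base."""
    return max(segment_shares(labels).values()) >= threshold

# Hypothetical cluster assignments for 100 customers
labels = ["Average"] * 70 + ["Niche A"] * 12 + ["Niche B"] * 10 + ["Niche C"] * 8
shares = segment_shares(labels)
flag = has_big_cluster(labels)
print(shares, flag)
```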

How do we recommend addressing these issues?
Segmentology™ is based on many different aspects of the customer relationship (behavior, price sensitivity, product type, life stage, and attitudes). However, even in approaches that focus primarily on product or behavior, it is extremely unlikely that the data will be fit for a standard clustering approach. We utilize Latent Class Modeling because it is the most appropriate technique in these situations.

What is Latent Class Modeling?
A Latent Class Model (LCM) predicts an unobserved behavior from a set of observed variables (typically discrete). It is called a latent class model because the behavior being predicted is unobserved (latent) and discrete (a class, or segment). The solution is based on conditional probabilities that determine the likelihood of an individual being classified into each segment.

Latent Class Modeling is a statistical approach used to predict an unobserved variable. That is, the destination segments into which we are attempting to classify customers are not yet defined. This differs from, say, a discriminant analysis, where customers are mapped into pre-defined groups (responder, non-responder, for example).
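
For readers who want the underlying formula, the standard textbook form of a latent class model can be written as below. The notation is an assumption introduced here for illustration: K segments, J input dimensions, y_ij for customer i's value on dimension j, c_i for the customer's unobserved segment, and pi_k for the size of segment k. It also assumes the usual "local independence" condition, under which the dimensions are independent of one another within a segment.

```latex
% Probability of customer i's observed profile under a K-class model
P(y_i) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} P(y_{ij} \mid c_i = k),
\qquad \sum_{k=1}^{K} \pi_k = 1

% Probability that customer i belongs to segment k, given the profile
P(c_i = k \mid y_i) =
  \frac{\pi_k \prod_{j=1}^{J} P(y_{ij} \mid c_i = k)}
       {\sum_{l=1}^{K} \pi_l \prod_{j=1}^{J} P(y_{ij} \mid c_i = l)}
```

The second expression is the conditional probability that assigns each customer a likelihood of membership in each segment.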

Though LCM is similar to cluster analysis in that it attempts to classify customers into undefined segments, the process differs in several ways. Rather than starting with the "kitchen sink" and entering hundreds of variables, it leverages prior analysis by using the dimensions created in the Segmentology process.

  • Each dimension represents a different facet of the relationship, similar to the factors in a clustering process.
  • Whereas factor scores are continuous in nature, making cluster analysis a suitable technique, dimensions cannot always be transformed in this manner. For example:
    • Value Groups can be meaningfully coded as 1, 2, 3, 4, … where 4 > 3 > 2 > 1. This allows for a continuous representation of the dimension.
    • Product Groups of A, B, C and D are not hierarchical in nature and cannot be recoded into a continuous variable.
  • Each dimension is treated as a discrete variable in the process allowing for the algorithms to identify the meaningful relationships between them.
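
The difference between the two examples above can be made concrete with a small sketch. The numeric codes assigned to the Product Groups here are arbitrary and hypothetical, which is the point: the distance a clustering algorithm would see depends entirely on which codes happen to be chosen.

```python
def coded_distance(a, b, coding):
    """Distance implied by treating category codes as numbers."""
    return abs(coding[a] - coding[b])

def matching_distance(a, b):
    """Treat categories as discrete: 0 if identical, 1 otherwise."""
    return 0 if a == b else 1

# Two equally arbitrary numeric codings of the same Product Groups
product_codes = {"A": 1, "B": 2, "C": 3, "D": 4}
product_codes_alt = {"A": 4, "B": 1, "C": 3, "D": 2}

d_ab = coded_distance("A", "B", product_codes)          # A vs B under coding 1
d_ad = coded_distance("A", "D", product_codes)          # A vs D under coding 1
d_ab_alt = coded_distance("A", "B", product_codes_alt)  # A vs B under coding 2
d_ad_alt = coded_distance("A", "D", product_codes_alt)  # A vs D under coding 2

print(d_ab, d_ad, d_ab_alt, d_ad_alt)
print(matching_distance("A", "B"), matching_distance("A", "D"))
```

Under the first coding, Group A appears three times farther from D than from B; under the second, the ordering reverses. Treating the dimension as discrete removes this artifact, since every pair of distinct groups is simply "different".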

Why is LCM more appropriate than standard cluster analysis?
Input variables can be categorical; they are not restricted to continuous or ordinal types. Variables can also be of mixed types - they need not be all discrete or all continuous. This is important when building a segmentation across multiple dimensions. Furthermore, LCM does not assume that the predictors are normally distributed.

With LCM, observations are given a probability of membership in each segment, providing analytic and implementation flexibility. This probabilistic approach to assigning segments results in solutions that:

  • eliminate the "fuzziness" of cluster analysis, where customers are grouped based on being "near" one another across a series of variables with a somewhat arbitrary distance calculation;
  • provide a more statistically significant solution defined by:
    • more within-segment homogeneity and
    • better cross-segment heterogeneity;
  • can be easily described. Membership is based on exact values of the dimensions, not "random" points of division in continuous variables. That is, customers are placed into different groups because one is in Product Group A and the other is in Product Group B, not because one had an average spend of $33.47 and the other $33.51;
  • are more appropriate for tactical marketing efforts. Because the groups are created based on specific dimension values, marketers know exactly whom they are targeting when differentiating offers or creative messaging.
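
The probability-of-membership idea can be sketched with a small expectation-maximization (EM) routine for categorical dimensions. This is a simplified illustration under stated assumptions, not the Segmentology implementation: the function name `fit_lcm`, the three dimensions, the customer records, the two-segment choice, and the small smoothing constant are all hypothetical, and a real project would typically use dedicated latent class software and compare solutions with different numbers of segments.

```python
import math
import random

def fit_lcm(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to categorical records with EM.

    data: list of tuples, one category label per dimension.
    Returns (class_weights, conditional_probs, posteriors).
    """
    rng = random.Random(seed)
    n_dims = len(data[0])
    levels = [sorted({row[j] for row in data}) for j in range(n_dims)]

    # Start from a random soft assignment of each record to the classes.
    post = []
    for _ in data:
        w = [rng.random() + 0.1 for _ in range(n_classes)]
        total = sum(w)
        post.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: re-estimate segment sizes and per-segment category shares.
        weights = [sum(p[k] for p in post) / len(data) for k in range(n_classes)]
        cond = []
        for k in range(n_classes):
            per_dim = []
            for j in range(n_dims):
                # Tiny constant (an assumption) keeps probabilities above zero.
                counts = {lv: 1e-6 for lv in levels[j]}
                for row, p in zip(data, post):
                    counts[row[j]] += p[k]
                norm = sum(counts.values())
                per_dim.append({lv: c / norm for lv, c in counts.items()})
            cond.append(per_dim)
        # E-step: recompute each record's probability of membership.
        post = []
        for row in data:
            joint = [weights[k] * math.prod(cond[k][j][row[j]] for j in range(n_dims))
                     for k in range(n_classes)]
            total = sum(joint)
            post.append([x / total for x in joint])
    return weights, cond, post

# Hypothetical records: (value group, product group, life stage)
data = ([("high", "A", "family")] * 30 + [("low", "B", "single")] * 30 +
        [("high", "A", "single")] * 5 + [("low", "B", "family")] * 5)
weights, cond, posteriors = fit_lcm(data, n_classes=2)
print([round(w, 3) for w in weights])
print([round(p, 3) for p in posteriors[0]])
```

Each entry of `posteriors` is a set of membership probabilities that sums to one, so a customer can be assigned to the most likely segment or handled probabilistically, whichever suits the application.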

The goal of segmentation is to discover distinct subsets of the customer base that look, think, and behave similarly. Latent Class Modeling is the technique that best identifies clearly defined segments when building a multi-dimensional customer segmentation. This allows the business, primarily marketing and strategy, to focus on desirable relationships, improve customized communications, and manage and track the overall health and growth of the customer portfolio.