Clustering
- Unsupervised learning in which you try to identify patterns in data without being given a predetermined set of labels.
- Clustering considerations / Underlying philosophy
- Stability: cluster membership should not change if the clustering algorithm is run multiple times
- Inter-group heterogeneity and intra-group homogeneity are important.
- Members of different segments/clusters should have distinctly different behaviour/traits
- Members of the same segment/cluster should have similar behaviour/traits.
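
One rough way to check the homogeneity/heterogeneity criterion is to compare the average distance between points in the same cluster with the average distance between points in different clusters. The sketch below is only an illustration: the function name `cohesion_and_separation`, the sample points and the labels are all made up for this example, and Euclidean distance is an assumption.

```python
from itertools import combinations
from math import dist

def cohesion_and_separation(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    intra, inter = [], []
    # Visit every unordered pair of points once and file its distance
    # under "same cluster" or "different clusters".
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Hypothetical 2-D points forming two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

intra, inter = cohesion_and_separation(points, labels)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

A good clustering should give a small intra-cluster value relative to the inter-cluster value. Re-running the clustering algorithm and comparing the resulting labels is a similar sanity check for stability.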
- K-Means
- Customer Segmentation (typical criteria)
- Behavioral (based on customer actions)
- Attitudinal (based on customer intention e.g. brand-consciousness)
- Demographic (could be a good substitute/shortcut for Behavioral)
- A common practice in customer clustering is to use the RFM triad as the features on which to base the clustering
- R - Recency (how recently the customer last purchased/interacted)
- F - Frequency (how often the customer buys)
- M - Monetary value of purchases.
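
As a minimal sketch of how the three RFM features might be derived, the snippet below aggregates a list of transactions per customer. The tuple layout `(customer_id, purchase_date, amount)`, the function name `rfm_features` and the sample data are assumptions made for illustration, not a standard schema.

```python
from collections import defaultdict
from datetime import date

def rfm_features(transactions, as_of):
    """Map customer_id -> (recency_days, frequency, monetary)."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Recency is based on the most recent purchase date seen so far.
        prev = last_purchase.get(customer_id)
        if prev is None or purchase_date > prev:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # number of purchases
        monetary[customer_id] += amount  # total spend
    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 250.0),
]
rfm = rfm_features(transactions, as_of=date(2024, 3, 31))
for customer_id, features in rfm.items():
    print(customer_id, features)
```

Because the three features are on very different scales (days, counts, currency), they would normally be scaled before being fed to a distance-based clustering algorithm.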
- Other segmentation practices
- RPI
- Relationship
- Persona (e.g. a gift-giver, inferred from the fact that a person places orders and mostly ships them to other addresses)
- Intention (can be discerned from browsing patterns)
- CDJ (Consumer Decision Journey)
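
The gift-giver persona mentioned above could be flagged with a simple rule. This is a sketch under stated assumptions: "mostly" is read as "more than half of the orders", each order is represented only by its shipping address, and the function name `is_gift_giver` and the addresses are invented for the example.

```python
def is_gift_giver(orders, home_address):
    """True when more than half of the orders ship somewhere other than home."""
    if not orders:
        return False
    shipped_elsewhere = sum(1 for ship_to in orders if ship_to != home_address)
    return shipped_elsewhere / len(orders) > 0.5

# Hypothetical shipping addresses for one customer's orders.
orders = ["12 Elm St", "9 Oak Ave", "3 Pine Rd", "9 Oak Ave"]
print(is_gift_giver(orders, home_address="12 Elm St"))
```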
- Hierarchical
- No need to pre-determine number of clusters
- Start with each point as its own cluster, then repeatedly merge clusters until one big cluster remains
- (Computationally expensive, so it takes longer to run)
- In the process, build a dendrogram that records which clusters were merged at each step
- Clusters that join at a greater height in the dendrogram are more dissimilar
- Determine the number of clusters by drawing a horizontal cut-off line across the dendrogram. The number of vertical branches the line intersects gives the # of clusters
- The cut-off line is somewhat arbitrary. Hierarchical clustering can be done bottom-up (agglomerative) or top-down (divisive).
- Linkage is the rule for measuring the distance between two clusters when deciding which clusters to fuse next
- Single-linkage (take the min distance between points in the two clusters)
- Complete-linkage (take the max distance between points in the two clusters)
- Average-linkage (take the avg distance between points in the two clusters)
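
The bottom-up (agglomerative) procedure and the three linkage rules can be sketched in a few lines of plain Python. This is a deliberately naive illustration, not a library implementation: the names `cluster_distance`, `agglomerative` and `cutoff` are made up for this example, Euclidean distance is assumed, and the `cutoff` argument plays the role of the horizontal cut-off line on the dendrogram.

```python
from itertools import product
from math import dist

LINKAGES = {
    "single": min,                            # closest pair of points
    "complete": max,                          # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}

def cluster_distance(a, b, linkage):
    """Linkage distance between clusters a and b (lists of points)."""
    pairwise = [dist(p, q) for p, q in product(a, b)]
    return LINKAGES[linkage](pairwise)

def agglomerative(points, linkage="single", cutoff=float("inf")):
    """Merge the two closest clusters until the closest pair exceeds cutoff.

    Returns (clusters, heights), where heights are the linkage distances
    at which merges happened (the merge heights in the dendrogram).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j], linkage), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:  # the cut-off line sits below the next merge
            break
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, heights

# Hypothetical 2-D points: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, heights = agglomerative(points, linkage="single", cutoff=3.0)
print(len(clusters), "clusters:", clusters)
print("merge heights:", heights)
```

Recomputing every pairwise cluster distance at each step is what makes this naive version slow on large data sets. Leaving `cutoff` at its default merges everything into one cluster and returns the full list of merge heights.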
- Clustering Choice Considerations