Tuesday, October 29, 2019

Clustering



  • Unsupervised learning wherein you try to identify patterns in data without being given a pre-determined set of labels.
  • Common clustering algos
    • K-Means
    • Hierarchical
    • Clustering considerations / Underlying philosophy
      • Stability - cluster membership should not change when the clustering algo is run multiple times
      • Inter-group heterogeneity & intra-group homogeneity are important.
        • Members of different segments/clusters should have distinctly different behaviour/traits
        • Members of the same cluster/segment should have similar behaviour/traits
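One rough way to check the homogeneity/heterogeneity idea above is to compare the average distance between points that share a cluster with the average distance between points in different clusters. The sketch below assumes Euclidean distance and uses made-up points and labels; the function name `mean_intra_inter_distance` is hypothetical, not from any library.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_inter_distance(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    intra, inter = [], []
    # Visit every unordered pair of points once and file its distance
    # under "same cluster" or "different clusters".
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


# Illustrative data only: two tight, well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter_distance(points, labels)
print(f"within-cluster: {intra:.2f}, between-cluster: {inter:.2f}")
```

A good clustering should show a within-cluster figure that is clearly smaller than the between-cluster figure. Stability could be checked in a similar spirit by re-running the algo and comparing the resulting labels.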
      • K-Means
          • Customer Segmentation (typical criteria)
            • Behavioral (based on customer actions)
            • Attitudinal (based on customer intention e.g. brand-consciousness)
            • Demographic (could be a good substitute/shortcut for Behavioral)
            • A common practice in customer clustering is to use RFM as the triad of features on which to base clustering
              • R - Recency (how recent a customer's purchases/interactions have been)
              • F - Frequency (how often the customer buys)
              • M - Monetary value of purchases
            • Other segmentation practices
              • RPI
                • Relationship
                • Persona (e.g., gift-giver, based on the fact that a person orders and mostly ships to other addresses)
                • Intention (can be discerned based on browsing pattern)
              • CDJ (Consumer Decision Journey)
                • Use a "Funnel" structure
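The RFM triad from the segmentation notes above can be derived from a plain transaction log. This is a minimal sketch, assuming each transaction is a `(customer_id, purchase_date, amount)` tuple and that recency is measured in days before a chosen `as_of` date; the function name, field names and sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Build per-customer Recency/Frequency/Monetary features.

    transactions: iterable of (customer_id, purchase_date, amount).
    as_of: reference date against which recency is measured.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        features[customer_id] = {
            "recency_days": (as_of - last_purchase).days,  # R
            "frequency": len(rows),                        # F
            "monetary": sum(a for _, a in rows),           # M
        }
    return features


# Illustrative data only.
transactions = [
    ("alice", date(2019, 10, 1), 40.0),
    ("alice", date(2019, 10, 20), 60.0),
    ("bob", date(2019, 6, 15), 25.0),
]
rfm = rfm_features(transactions, as_of=date(2019, 10, 29))
print(rfm["alice"])  # {'recency_days': 9, 'frequency': 2, 'monetary': 100.0}
```

Since the three features sit on very different scales (days, counts, currency), they would normally be standardised before being fed to a distance-based algo such as K-Means.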
        • Hierarchical
          • No need to pre-determine number of clusters
          • Start with each point as its own cluster, then repeatedly merge the closest clusters until one big cluster remains
          • More time-consuming, so it needs more processing power
          • The process produces a dendrogram that records which clusters were merged at each step
          • Clusters that join at a higher point on the dendrogram are more dissimilar
          • Determine the number of clusters by drawing a horizontal cut-off line across the dendrogram. The number of branches it intersects gives the # of clusters
          • The cut-off line is somewhat arbitrary. The hierarchy can be built bottom-up (called agglomerative) or top-down (called divisive).
          • Linkage is the rule for measuring the distance between two clusters when deciding which ones to fuse, based on the distances between their points
            • Single-linkage (take the min distance over all pairs of points, one from each cluster)
            • Complete-linkage (take the max distance over all such pairs)
            • Average-linkage (take the avg distance over all such pairs)
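To make the merge loop, the three linkage rules and the cut-off concrete, here is a naive bottom-up (agglomerative) sketch. It assumes Euclidean distance and stops merging once the closest pair of clusters is further apart than `cut_off`, which plays the role of the horizontal line on the dendrogram. The names `agglomerate`, `cluster_distance` and `LINKAGES` are hypothetical, and the brute-force search is for illustration only, not efficiency.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+

# Each linkage reduces the list of point-to-point distances
# between two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def cluster_distance(a, b, linkage):
    """Distance between clusters a and b under the chosen linkage."""
    return LINKAGES[linkage]([dist(p, q) for p, q in product(a, b)])


def agglomerate(points, cut_off, linkage="single"):
    """Merge closest clusters until the next merge would exceed cut_off."""
    clusters = [[p] for p in points]  # start: every point is a cluster
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut_off:  # the next merge sits above the cut-off line
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Illustrative 1-D data, written as 1-tuples.
points = [(0.0,), (1.0,), (2.5,), (9.0,), (10.0,)]
single = agglomerate(points, cut_off=2.0, linkage="single")
complete = agglomerate(points, cut_off=2.0, linkage="complete")
print(len(single), len(complete))  # 2 3
```

With the same cut-off, single-linkage absorbs the point at 2.5 into the left-hand group (its nearest neighbour is only 1.5 away), while complete-linkage keeps it separate (its farthest member is 2.5 away), which shows how the linkage choice changes the clusters you end up with.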
        • Clustering Choice Considerations
