Tuesday, October 29, 2019

Logistic Regression



  • Used for Classification problems
    • In industry, binomial logistic regression is used more often than multinomial. Even when more than two classes must be predicted, practitioners tend to break the problem down into multiple binary (binomial) models.
    • The advantages of logistic regression over other techniques such as Support Vector Machine (SVM), Neural Network, Random Forest, Gradient Boosting, Deep Learning (DL), etc. are that
      • it is easier to interpret and articulate the logistic model.
      • the log (ln) of the odds of the outcome is a linear function of the predictors. Linearity is much easier to understand & explain.
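
To make the log-odds point concrete, here is a minimal sketch for a one-variable model. The intercept and slope are made-up values chosen for illustration, not estimates from any data. It shows that each unit increase in x adds the same amount (the slope) to ln(odds), even though the probability itself moves along an S-shaped curve.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """ln(p / (1 - p)), the log of the odds for probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for illustration only.
intercept, slope = -1.5, 0.8

xs = [0.0, 1.0, 2.0, 3.0]
probs = [sigmoid(intercept + slope * x) for x in xs]
logits = [log_odds(p) for p in probs]

# Difference in ln(odds) between consecutive x values.
steps = [logits[i + 1] - logits[i] for i in range(len(logits) - 1)]

for x, p, l in zip(xs, probs, logits):
    print(f"x={x:.0f}  p={p:.3f}  ln(odds)={l:.3f}")
print("change in ln(odds) per unit of x:", [round(s, 3) for s in steps])
```

Put in business terms, a one-unit increase in x multiplies the odds by exp(slope), regardless of the starting value of x.
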
    • Logistic Regression Model considerations
      • Sample Selection 
      1. Seasonal fluctuations: get data that covers the full range of seasonal variation.
      2. Representative: get data that pertains to the type of population on which you will be predicting.
      3. Rare-incidence population: for rare or low-incidence events, stratify the sample so that the classes are not severely imbalanced.
      • Segmentation
      1. The combined predictive power of separate models built for multiple segments of the population is often greater than that of a single model for the whole population.
      2. For each segment, predictive variables are likely to be different. 
      • Variable transformations (not generally part of the overall statistical approach to build logistic regression models) 
        1. Dummy variables (for categorical variables; or even continuous variables with "binning")
        2. Bin the values and use Weight of Evidence (WoE) 
          • WOE (computed for each bin) = ln (% of good / % of bad)
            i.e., ln (# of good in the bin / Total # of good)
                       minus
                  ln (# of bad in the bin / Total # of bad)
          • Ensure the binning is such that there is a logical trend (for example, steadily increasing or decreasing) discernible across the WOE values for the bins.
          • Ensure IV (Information Value, indicative of predictive power) is high; IV = sum over all bins of WOE * (proportion of good - proportion of bad)
        3. Interaction variables (need a very good knowledge of the business domain for this)
        4. Mathematical transformations (x^2, x^3, log, etc.) - but these are hard to explain to the business.
        5. PCA (Principal Component Analysis) - very elegant, good predictive power, hard to explain
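
The WOE and IV formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `woe_iv`, the age bins, and the good/bad counts are all made up for the example, and the code assumes every bin has at least one good and one bad (a zero count would make the logarithm undefined and needs separate handling, such as merging bins).

```python
import math

def woe_iv(bins):
    """Compute WOE per bin and total IV.

    bins: list of (label, n_good, n_bad) tuples.
    Returns (rows, total_iv) where rows is a list of (label, woe, iv_part).
    Assumes every bin has non-zero good and bad counts.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows = []
    total_iv = 0.0
    for label, good, bad in bins:
        pct_good = good / total_good   # this bin's share of all goods
        pct_bad = bad / total_bad      # this bin's share of all bads
        woe = math.log(pct_good / pct_bad)
        iv_part = woe * (pct_good - pct_bad)
        rows.append((label, woe, iv_part))
        total_iv += iv_part            # IV is the sum over all bins
    return rows, total_iv

# Hypothetical counts for a binned "age" variable (illustrative only).
bins = [("18-29", 100, 40), ("30-44", 300, 40), ("45+", 600, 20)]
rows, total_iv = woe_iv(bins)

for label, woe, iv_part in rows:
    print(f"{label:>6}  WOE={woe:+.3f}  IV contribution={iv_part:.3f}")
print(f"Total IV = {total_iv:.3f}")
```

In this made-up example the WOE rises steadily from the youngest bin to the oldest, which is the kind of logical trend to look for when choosing bin boundaries. Each bin's IV contribution is non-negative because WOE and (proportion of good - proportion of bad) always share the same sign.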
