Tuesday, October 29, 2019

Inferential Stats


Notes
  • Normal Distribution
    • Many naturally occurring phenomena approximately follow a normal distribution, i.e. a bell curve
    • The bell curve has many interesting features
    • The mean, median & mode coincide
    • The 1-2-3 rule
      • The probability of a value lying within 1 standard deviation (sigma σ) of the mean (mu µ), i.e. between µ-1σ and µ+1σ, is about 68%.
      • The probability of a value lying within 2 standard deviations of the mean, i.e. between µ-2σ and µ+2σ, is about 95%.
      • The probability of a value lying within 3 standard deviations of the mean, i.e. between µ-3σ and µ+3σ, is about 99.7%.
    • As a convention, a random variable's (X) value is specified in terms of its distance from the mean µ in units of std deviation σ i.e., (X-µ)/σ units - this is called Z, the standardised normal variable.
      e.g., if µ is 35 and σ is 5, an X value of 43.5 is (43.5 - 35)/5 = 1.7 standard deviations above the mean. Hence its Z score is 1.7
    • (Ugly) Alternative to Z Score table
       \Phi(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-t^2/2} \, dt
    • or in Excel, NORM.S.DIST(z, TRUE/FALSE) where
      TRUE returns the cumulative probability P(Z ≤ z), FALSE returns the probability density at z.
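
As a rough sketch, the Z score and the cumulative probability can also be computed in Python with only the standard library. The function names `z_score`, `phi` and `density` are my own, picked for illustration; `phi` uses the standard identity relating the normal CDF to the error function, and is meant to mirror NORM.S.DIST(z, TRUE), while `density` mirrors NORM.S.DIST(z, FALSE).

```python
import math

def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def density(z):
    """Probability density of the standard normal variable at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Worked example from the notes: mu = 35, sigma = 5, X = 43.5
z = z_score(43.5, 35, 5)
print("Z score:", z)
print("P(Z <= z):", round(phi(z), 4))

# The 1-2-3 rule: probability of lying within k standard deviations of the mean
within = {k: phi(k) - phi(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sigma: {p:.4f}")
```

The three printed probabilities should come out close to the 68%, 95% and 99.7% figures of the 1-2-3 rule.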
  • Sampling
    • When dealing with large populations, it is more feasible to quantify the characteristics of the population by using "samples" of the population.
    • Working with the samples, we can approximate/extrapolate the characteristics of the original, full population
    • Sampling distributions play an important part in understanding the "margin of error" and how confident we can be that an estimate is close to the population value when inferring characteristics.
    • As a rule of thumb, the sampling distribution of the sample mean starts to approach/resemble a normal distribution at a sample size (n) of about 30.
  • The Central Limit Theorem (CLT)
    • For almost any kind of data (regardless of how it is distributed viz. normal, skewed, uniform etc., as long as the variance is finite), the following properties hold for the sampling distribution of the sample mean, where n is the size of each sample:
      1. Sampling distribution's mean (µ_X̄) = Population mean (µ)
      2. Sampling distribution's standard deviation (Standard error) = σ / √n
      3. For n of roughly 30 or more (a rule of thumb), the sampling distribution is approximately a normal distribution.
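
The three properties above can be checked with a small simulation. This is only a sketch under assumptions of my own: the population is an exponential distribution with µ = 1 and σ = 1 (chosen because it is clearly skewed), each sample has n = 30 values, and 5000 samples are drawn with a fixed seed. The helper name `sample_means` is made up for illustration.

```python
import math
import random
import statistics

def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(42)      # fixed seed so the run is repeatable
mu, sigma, n = 1.0, 1.0, 30  # assumed population parameters and sample size

means = sample_means(n, 5000, rng)

mean_of_means = statistics.fmean(means)    # property 1: should be close to mu
observed_se = statistics.stdev(means)      # property 2: should be close to sigma / sqrt(n)
expected_se = sigma / math.sqrt(n)

# Property 3: if the sample means are roughly normal, about 68% of them
# should lie within one standard error of mu.
frac_within_1se = sum(abs(m - mu) <= expected_se for m in means) / len(means)

print("mean of sample means:", round(mean_of_means, 3))
print("observed vs expected standard error:", round(observed_se, 3), round(expected_se, 3))
print("fraction within 1 standard error:", round(frac_within_1se, 3))
```

Even though the individual values are strongly skewed, the sample means should cluster around µ with a spread near σ / √n, which is what the theorem predicts.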
