The journey towards learning Data Science is interesting and mind-bending. Where do you start, which topics need to be covered, and many more questions come up along the way. These notes try to demystify those details as much as possible. They are not exhaustive; there are many more topics and many different approaches.


Topics to be read

1. Basics of Statistics
2. Distributions
·       Binomial distribution
·       Poisson distribution
·       t-distribution
3. Regression Models
·       Linear regression
·       Logistic regression



1. Statistics:
  a. A way to get information from data.
  b. Statistics is needed only when there is variance (variability) in the data.

2. Descriptive Statistics: summarises the data at hand; there is no inference or prediction.

3. Inferential Statistics: uses a sample to draw conclusions or make predictions about the population.

4. Population parameters are denoted by Greek letters, e.g. σ (standard deviation) and μ (mean).

5. Sample statistics are denoted by English (Latin) letters, e.g. s (standard deviation).

6. Measure of Central Tendency:
  a. mean
  b. median
  c. mode

7. Median: the middle value of X when the data is arranged in order. It is the mid-point that divides the data into two equal halves.
  a. Weighted Mean
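
The measures of central tendency above can be tried out with Python's built-in statistics module. This is only a small sketch; the data, scores and weights are made-up values for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
marks = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(marks)      # sum of values / number of values
median_value = statistics.median(marks)  # average of the two middle values (3 and 5)
mode_value = statistics.mode(marks)      # most frequent value

# Weighted mean: each value counts in proportion to its weight.
scores = [80, 90, 70]        # hypothetical scores
weights = [0.5, 0.3, 0.2]    # hypothetical weights
weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(mean_value, median_value, mode_value, weighted_mean)
```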

8. Skewed Distribution
  a. Right skewed
  b. Left skewed

9. Kurtosis: the degree to which a distribution is peaked (and, correspondingly, how heavy its tails are).

10. Central Limit Theorem: whatever the shape of the population distribution (Normal or not), the distribution of sample means approaches a Normal distribution as the sample size grows.

11. Quartile
There are 3 quartiles (Q1, Q2, Q3); they divide the ordered data into four equal parts.
Inter Quartile Range (IQR) = Q3 - Q1
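
A quick sketch of quartiles and the IQR, using statistics.quantiles from the Python standard library. The data is made up, and the "inclusive" method is just one of several conventions for locating quartiles; other conventions can give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```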

12. Coefficient of Variation = Sigma / Mu

13. Sum of Squared Deviations = Σ (X - Xbar)²
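
The two formulas above can be worked through on a small made-up data set. The sketch assumes the data is the whole population, so the standard deviation divides by the number of values.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)               # X-bar (Mu) = 5.0
ssd = sum((x - mean) ** 2 for x in data)   # sum of squared deviations = 32.0
sigma = math.sqrt(ssd / len(data))         # population standard deviation = 2.0
cv = sigma / mean                          # coefficient of variation = 0.4

print(mean, ssd, sigma, cv)
```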

14. Statistical Methods
     a. Linear regression
     b. Logistic regression

15. Linear regression: Y = mX + c

16. Types of Variables
·       Qualitative
·       Quantitative
o   Discrete
o   Continuous

17. Binomial means two outcomes: 1 or 0; True or False.

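18. Permutations and combinations
·       Permutation means arrangement, where order matters: the number of ways to arrange “r” objects out of “n” is nPr = n! / (n - r)!
·       Combination means selection, where order does not matter: nCr = n! / (r! (n - r)!)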
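
As a small check on permutations and combinations, Python's math module (version 3.8 or later) has perm and comb. The values n = 5 and r = 2 are picked only for illustration.

```python
import math

n, r = 5, 2  # hypothetical: choose 2 objects out of 5

arrangements = math.perm(n, r)  # order matters: n! / (n - r)!
selections = math.comb(n, r)    # order ignored: n! / (r! * (n - r)!)

# The same numbers straight from the factorial formulas.
perm_by_formula = math.factorial(n) // math.factorial(n - r)
comb_by_formula = perm_by_formula // math.factorial(r)

print(arrangements, selections)  # 20 10
```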

19. Probability
·       When an outcome is certain (a plain YES or NO), no probability is involved.
·       Probability arises only when the outcome is uncertain, i.e. there is a likelihood attached to it.

20. (A and B) = Joint occurrence: event A and event B must both occur.
·       (A or B) = Either event occurs, or both occur.

21. Addition Theorem: P(A or B) = P(A) + P(B) - P(A and B)

22. Multiplication Theorem: P(A and B) = P(A) * P(B|A)
·       P(B|A) = Probability of B given A,
·       i.e. given that A has already happened.
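
Both theorems can be checked on one roll of a fair die. The events A and B below are made up for the illustration, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # event A: the roll is even
B = {4, 5, 6}                      # event B: the roll is greater than 3

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

p_a_or_b = prob(A | B)                      # set union = "A or B"
p_a_and_b = prob(A & B)                     # set intersection = "A and B"
p_b_given_a = Fraction(len(A & B), len(A))  # B counted only within A

addition_holds = p_a_or_b == prob(A) + prob(B) - p_a_and_b
multiplication_holds = p_a_and_b == prob(A) * p_b_given_a

print(p_a_or_b, p_a_and_b, p_b_given_a)  # 2/3 1/3 2/3
```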

23. Probability
·       Classical approach: when the outcomes are known and equally likely.
o   Example: throwing a die.
·       Frequency approach: using surveys and experiments.
o   Example: finding the mileage of cars.
·       Subjective approach: based on personal judgement.
o   Example: a man landing on the moon; before it happened, many would have put the probability at zero.

24. Probability Tree

25. Bayes’ Theorem:
Bayes’ theorem revises (updates) the probability of an event when new evidence arrives: P(A|B) = P(B|A) * P(A) / P(B).
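
A rough worked example of Bayes' theorem. All the numbers (1% prior, 95% detection rate, 5% false alarm rate) are assumptions invented for the illustration, in the style of a screening test.

```python
from fractions import Fraction

# Hypothetical inputs, not taken from any real study.
p_a = Fraction(1, 100)               # prior P(A): the condition is present
p_b_given_a = Fraction(95, 100)      # P(B|A): test positive when A is present
p_b_given_not_a = Fraction(5, 100)   # P(B|not A): test positive when A is absent

# Total probability of the evidence B (a positive test).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: the prior P(A) is revised into the posterior P(A|B).
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b, float(p_a_given_b))
```

With these assumed numbers the probability moves from 1% before the evidence to roughly 16% after it.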

26. Binomial Distribution
x-axis is the variable (number of successes)
y-axis is the probability

The parameters of the Binomial are “n” (number of trials) & “p” (probability of success).
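
A minimal sketch of the Binomial probabilities for given “n” and “p”, using the standard formula P(X = k) = nCk * p^k * (1 - p)^(n - k). The values n = 10 and p = 0.5 (ten tosses of a fair coin) are assumed for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 tosses of a fair coin

# x-axis: k = 0..n successes; y-axis: the probability of each k.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```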

27. Normal distribution
·       Bell Curve
·       Mean = Median = Mode at the peak
·       Asymptotic - the curve never touches the x-axis at either end
·       Symmetric
·       Area under the curve is 1.
·       Std. Normal Curve: z = (x - Mu) / sigma
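
The z formula can be tried with NormalDist from Python's statistics module (version 3.8 or later). Mu = 100, sigma = 15 and x = 130 are assumed values for the illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical population mean and standard deviation
x = 130              # hypothetical observation

z = (x - mu) / sigma  # distance from the mean in standard deviations

# Area under the standard normal curve to the left of z ...
p_below_z = NormalDist().cdf(z)
# ... equals the area to the left of x on the original curve.
p_below_x = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below_z, 4))
```
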
28. Central Limit Theorem:
·       When you plot the sample means (X-bar) of repeated samples, the plot approaches a normal distribution.
·       A sample size of at least 30 is the usual rule of thumb for the sample means to look normal.
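
A small simulation gives a feel for the Central Limit Theorem. It assumes an exponential population (clearly not normal, with mean 1 and standard deviation 1); the seed, the sample size of 30 and the 2000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
sample_size = 30    # the rule-of-thumb sample size
num_samples = 2000  # how many samples (and so how many X-bars) to draw

# Each entry is the mean (X-bar) of one sample from the skewed population.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to 1
sd_of_means = statistics.stdev(sample_means)    # should be close to 1 / sqrt(30)
expected_sd = 1 / math.sqrt(sample_size)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Plotting a histogram of sample_means would show the bell shape, even though the individual values come from a skewed distribution.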

29. Hypothesis
·       Hypothesis means assumption
·       H0 (null hypothesis): Mu equal to Mu0
·       H1 (alternative hypothesis): Mu not equal to Mu0
·       When the evidence is strong, you reject H0
·       When the evidence is weak, you are unable to reject H0

Two-tailed test of hypothesis: H0 is rejected when the test statistic falls in either tail.
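
A sketch of a two-tailed test, assuming a z-test with a known population standard deviation. All the numbers (Mu0 = 50, sigma = 6, n = 36, sample mean 52, alpha = 0.05) are invented for the illustration.

```python
import math
from statistics import NormalDist

mu0 = 50          # hypothetical value of Mu under H0
sigma = 6         # hypothetical known population standard deviation
n = 36            # hypothetical sample size
sample_mean = 52  # hypothetical observed X-bar
alpha = 0.05      # allowed Type 1 error

# Test statistic: how many standard errors X-bar lies from Mu0.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Two-tailed p-value: the area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha
print(z, round(p_value, 4), "reject H0" if reject_h0 else "unable to reject H0")
```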

30. Type 1 Error and Type 2 Error
·       Type 1 error = Alpha error: rejecting H0 when it is actually true
·       Type 2 error = Beta error: failing to reject H0 when it is actually false

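Courtroom example, with H0 = Not Guilty:
·       Declaring an innocent person guilty = Type 1 error
·       Declaring a guilty person not guilty = Type 2 error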
Type 1 error has to be minimised.
Type 1 error alpha = 0.05 (5%) is the level commonly allowed in a model.

31.  Table Distributions
Z-table distribution = large samples (n of 30 or more) or known population standard deviation

T-table distribution = small samples with unknown population standard deviation

32. ANOVA = ANalysis Of  VAriance

Two-Way ANOVA:
Car brands: brand 1, brand 2, brand 3, brand 4.
These are called LEVELS = 4.
The parameters against which the brands are compared = Battery, Mileage.
These two are called FACTORS = 2.

Now there are 2 factors and 4 levels. Since there are 2 factors, it is called Two-Way ANOVA.
Strictly speaking, a factor is a categorical independent variable and its categories are the levels; a two-way ANOVA studies two such factors against one measured response.

33. F-Distribution
·       SSB = Sum of squares between columns
·       SSW = Sum of squares within columns
·       MSB = Mean square between columns
·       MSW = Mean square within columns
·       df  = degrees of freedom

(F-distribution) F = MSB / MSW
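
The F ratio can be worked out by hand for a one-way layout, where each column is one group. The three columns below are made-up numbers. The sketch uses the usual definitions MSB = SSB / (k - 1) and MSW = SSW / (N - k), with k columns and N observations in total.

```python
# Hypothetical data: three columns (groups) with three observations each.
columns = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

all_values = [x for col in columns for x in col]
grand_mean = sum(all_values) / len(all_values)
column_means = [sum(col) / len(col) for col in columns]

# SSB: spread of the column means around the grand mean.
ssb = sum(len(col) * (m - grand_mean) ** 2 for col, m in zip(columns, column_means))
# SSW: spread of the observations around their own column mean.
ssw = sum((x - m) ** 2 for col, m in zip(columns, column_means) for x in col)

df_between = len(columns) - 1               # k - 1
df_within = len(all_values) - len(columns)  # N - k

msb = ssb / df_between
msw = ssw / df_within
f_ratio = msb / msw

print(round(ssb, 4), round(ssw, 4), round(f_ratio, 4))
```

A large F means the columns differ from each other by more than the scatter inside the columns would explain.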



34. Simple linear regression
·       Linear means a straight line
·       Y = mX + c
·       Y = dependent variable
·       X = independent variable
·       m = slope
·       c = intercept
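
A minimal sketch of fitting Y = mX + c by least squares, which is the usual way to estimate the slope and intercept. The data points are invented and lie exactly on a line, so the fit is easy to check by eye.

```python
# Hypothetical data that lies exactly on Y = 2X + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope: co-variation of X and Y divided by variation of X.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

m = s_xy / s_xx          # slope
c = mean_y - m * mean_x  # intercept: the line passes through (X-bar, Y-bar)

predicted = m * 6 + c    # prediction for a new X = 6
print(m, c, predicted)   # 2.0 1.0 13.0
```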



