DATA ANALYTICS - SCIENCE
The journey towards learning data science is interesting and challenging. Where should you start, and which topics should you cover? These notes try to demystify those questions as much as possible. They are not exhaustive; there are many more topics and different approaches.
Disclaimer:
- This document contains unedited notes and has not been formally proofread.
- The information provided in this document is intended to provide a basic understanding of certain technologies.
- Please exercise caution when visiting or downloading from websites mentioned in this document and verify the safety of the website and software.
- Some websites and software may be flagged as malware by antivirus programs.
- The document is not intended to be a comprehensive guide and should not be relied upon as the sole source of information.
- The document is not a substitute for professional advice or expert analysis and should not be used as such.
- The document does not constitute an endorsement or recommendation of any particular technology, product, or service.
- The reader assumes all responsibility for their use of the information contained in this document and any consequences that may arise.
- The author disclaims any liability for any damages or losses that may result from the use of this document or the information contained therein.
- The author and publisher reserve the right to update or change the information contained in this document at any time without prior notice.
****************************************************
Topics to be read
1. Basics of Statistics
2. Distributions
· Binomial distribution
· Poisson distribution
· t-distribution
3. Regression Models
· Linear regression
· Logistic regression
******************************************************
Statistics
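1. Statistics:
   a. Is a way to get information from data.
   b. Is needed only when the data shows variance (variability).
2. Descriptive statistics: summarises the data at hand; there is no inference or prediction.
3. Inferential statistics: draws conclusions about a population from a sample, with numerical values (such as a confidence level) attached.
4. Population parameters are denoted by Greek letters, e.g. σ for the population standard deviation.
5. Sample statistics are denoted by English (Latin) letters, e.g. s for the sample standard deviation.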
6. Measures of Central Tendency:
   a. Mean
   b. Median
   c. Mode
7. Median: the middle value of X when the data is arranged in order. It is the mid-point that divides the data into two equal halves.
   a. Weighted mean: a mean in which each value is multiplied by a weight before averaging.
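The measures of central tendency above can be tried out with Python's standard `statistics` module. This is only a small sketch; the data, scores and weights are hypothetical values picked for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # mid-point of the sorted data
mode_value = statistics.mode(values)      # most frequent value

# Weighted mean: multiply each value by its weight, then divide by the
# total weight. Scores and weights are assumed for this example.
scores = [80, 90, 70]
weights = [0.5, 0.3, 0.2]
weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(mean_value, median_value, mode_value, weighted_mean)
```

With an even number of values, as here, the median is the average of the two middle values.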
8. Skewed Distribution
   a. Right skewed
   b. Left skewed
9. Kurtosis: degree to which a distribution is peaked.
10. Central Limit Theorem: the distribution of sample means approaches a normal distribution as the sample size grows, even when the population itself is not normal.
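11. Quartiles
   There are 3 quartiles, which split the ordered data into four equal parts:
   Q1 (25th percentile)
   Q2 (50th percentile, the median)
   Q3 (75th percentile)
   Interquartile Range (IQR) = Q3 - Q1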
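A quick sketch of the quartiles and the interquartile range, using `statistics.quantiles` from the Python standard library. The data set is hypothetical. Note that textbooks and tools use slightly different quartile conventions, so other software may give slightly different cut points for the same data.

```python
import statistics

# Hypothetical data: the integers 1 to 11.
data = list(range(1, 12))

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # Interquartile Range = Q3 - Q1

print(q1, q2, q3, iqr)
```

Q2 is the same value as the median of the data.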
12. Coefficient of Variation = Sigma / Mu (standard deviation divided by the mean)
13. Sum of Squared Deviations = Σ (X - Xbar)², summed over all values of X
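Both formulas are easy to check by hand. The sketch below uses a small hypothetical data set and treats it as a whole population, so the standard deviation is the population version (`pstdev`).

```python
import statistics

# Hypothetical data, treated here as the whole population.
data = [4, 8, 6, 5, 7]

mu = statistics.mean(data)       # Mu: the mean
sigma = statistics.pstdev(data)  # Sigma: population standard deviation

# Coefficient of Variation = Sigma / Mu
coefficient_of_variation = sigma / mu

# Sum of Squared Deviations = sum of (X - Xbar)^2 over all X
sum_squared_deviations = sum((x - mu) ** 2 for x in data)

print(coefficient_of_variation, sum_squared_deviations)
```

Dividing the sum of squared deviations by the number of values gives the population variance, and its square root is Sigma.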
14. Statistical Methods
   a. Linear regression
   b. Logistic regression
15. Linear regression: Y = mX + C
16. Types of Variables
   · Qualitative
   · Quantitative
      o Discrete
      o Continuous
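17. Binomial means two outcomes: 1 or 0; True or False.
18. Permutations and combinations
   · Permutation means arrangement: how many ways you can arrange "r" objects chosen from "n" objects, where order matters. nPr = n! / (n - r)!
   · Combination means selection: how many ways you can choose "r" objects from "n" objects, where order does not matter. nCr = n! / (r! (n - r)!)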
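For a quick check of these counts, Python (3.8 or later) has `math.perm` and `math.comb`. The values n = 5 and r = 2 are just an example.

```python
import math

n, r = 5, 2  # example values: choose 2 objects out of 5

# Permutations: order matters, n! / (n - r)!
arrangements = math.perm(n, r)

# Combinations: order does not matter, n! / (r! * (n - r)!)
selections = math.comb(n, r)

print(arrangements, selections)
```

Each selection of 2 objects can be arranged in 2! = 2 ways, which is why there are twice as many permutations as combinations here.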
19. Probability
   · When an outcome is certain there is no probability involved; the answer is simply YES or NO.
   · Probability comes in only when an outcome is uncertain, that is, when there is a likelihood of it occurring.
20. (A and B) = joint occurrence: event A must occur and event B must occur.
   · (A or B) = either event can occur, or both events can occur.
21. Addition Theorem: P(A or B) = P(A) + P(B) - P(A and B)
22. Multiplication Theorem: P(A and B) = P(A) * P(B|A)
   · P(B|A) = probability of B given A, that is, given that A has already happened.
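The addition and multiplication theorems can be verified by counting outcomes. The sketch below assumes a standard 52-card deck, with A = "the card is a heart" and B = "the card is a king"; the helper names are made up for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely outcomes


def is_heart(card):
    return card[1] == "hearts"


def is_king(card):
    return card[0] == "K"


def prob(event, space):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


p_a = prob(is_heart, deck)
p_b = prob(is_king, deck)
p_a_and_b = prob(lambda c: is_heart(c) and is_king(c), deck)
p_a_or_b = prob(lambda c: is_heart(c) or is_king(c), deck)

# P(B|A): restrict the sample space to the outcomes where A happened.
hearts_only = [c for c in deck if is_heart(c)]
p_b_given_a = prob(is_king, hearts_only)

print(p_a_or_b == p_a + p_b - p_a_and_b)  # addition theorem
print(p_a_and_b == p_a * p_b_given_a)     # multiplication theorem
```

Using `Fraction` keeps the arithmetic exact, so the two equalities can be compared directly without rounding issues.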
23. Approaches to Probability
   · Classical approach: when the outcomes are known and equally likely.
      o Example: a throw of dice.
   · Frequency approach: using surveys and experiments.
      o Example: finding the mileage of cars.
   · Subjective approach: based on personal judgement.
      o Example: a man landing on the moon; before it happened, the probability was taken to be zero.
25. Bayes' Theorem:
   Revising the probability of an event in the light of new evidence leads to Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
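As an illustration of revising a probability with Bayes' theorem, here is a sketch of the classic medical-test example. All three input numbers are hypothetical and chosen only to make the arithmetic visible.

```python
from fractions import Fraction

# Hypothetical inputs, assumed for illustration only.
p_disease = Fraction(1, 100)              # prior: P(A)
p_pos_given_disease = Fraction(95, 100)   # P(B|A)
p_pos_given_healthy = Fraction(5, 100)    # false-positive rate

# Total probability of a positive test: P(B)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(p_disease_given_pos, float(p_disease_given_pos))
```

Under these assumed numbers the probability of disease moves from 1% before the test to about 16% after a positive result.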
26. Binomial Distribution
   x-axis is the variable (number of successes)
   y-axis is the probability
   The parameters of the binomial are "n" (number of trials) and "p" (probability of success in each trial).
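The binomial probabilities on the y-axis can be computed directly from "n" and "p". The function name `binomial_pmf` and the values n = 10, p = 0.5 are assumptions for this sketch.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5  # example: 10 fair coin tosses

# One probability for each possible value on the x-axis (0 to n successes).
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob_k in enumerate(distribution):
    print(k, round(prob_k, 4))
```

The probabilities over all values of k add up to 1.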
27. Normal distribution
   · Bell curve
   · Mean = Median = Mode at the peak
   · Asymptotic: the curve never touches the x-axis at either end
   · Symmetric
   · The area under the curve is 1.
   · Standard normal curve: z = (x - Mu) / Sigma
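A small sketch of the z conversion, using `statistics.NormalDist` from the standard library for the area under the standard normal curve. The values of Mu, Sigma and x are hypothetical.

```python
from statistics import NormalDist

# Hypothetical values, assumed for illustration.
mu = 100     # Mu: mean
sigma = 15   # Sigma: standard deviation
x = 130      # observed value

# Standardise: z = (x - Mu) / Sigma
z = (x - mu) / sigma

# Area under the standard normal curve to the left of z.
area_below = NormalDist().cdf(z)

print(z, round(area_below, 4))
```

A z of 2 means x lies two standard deviations above the mean, with roughly 97.7% of the curve's area to its left.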
28. Central Limit Theorem:
   · When you plot the means (X-bar) of repeated samples, their distribution is approximately normal.
   · A sample size of at least 30 is a common rule of thumb for this approximation to hold.
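The Central Limit Theorem can be seen in a simple simulation. This sketch draws repeated samples of size 30 from an exponential distribution (which is clearly not normal) and looks at the sample means. The number of repeats and the seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30    # rule-of-thumb sample size
NUM_SAMPLES = 2000  # arbitrary number of repeated samples

# Exponential data with rate 1.0 has mean 1 and standard deviation 1,
# and is strongly right skewed.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: the means centre on 1 with spread 1 / sqrt(SAMPLE_SIZE).
expected_sd = 1 / math.sqrt(SAMPLE_SIZE)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Plotting a histogram of `sample_means` would show the familiar bell shape, even though the individual values are skewed.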
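29. Hypothesis
   · Hypothesis means assumption
   · H0 (null hypothesis): Mu is equal to Mu0
   · H1 (alternative hypothesis): Mu is not equal to Mu0
   · When the evidence is strong, you reject H0
   · When the evidence is weak, you are unable to reject H0
   · Because H1 is "not equal to", this is a 2-tail test of hypothesis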
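A minimal sketch of a 2-tail test, assuming a z-test with a known population standard deviation. All the numbers (Mu0, the sample mean, Sigma, the sample size and alpha) are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, assumed for illustration.
mu0 = 50          # H0: Mu = 50, H1: Mu != 50
sample_mean = 52
sigma = 6         # population standard deviation, assumed known
n = 36            # sample size
alpha = 0.05      # allowed Type 1 error

# Test statistic: how many standard errors the sample mean is from Mu0.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# 2-tail p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```

When the p-value is below alpha the evidence is treated as strong and H0 is rejected; otherwise you are unable to reject H0.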
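30. Type 1 Error and Type 2 Error.
   Type 1 error = Alpha error (rejecting H0 when it is true)
   Type 2 error = Beta error (failing to reject H0 when it is false)
   Courtroom example, where H0 is "not guilty":

   Actual \ Verdict | Guilty           | Not Guilty
   Guilty           | yes              | x (Type 2 error)
   Not Guilty       | x (Type 1 error) | yes

   Type 1 error has to be minimised.
   Type 1 error alpha = 0.05 (5%) is the level commonly allowed in a model.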
31. Table Distributions
   Z-table distribution = large samples (population standard deviation known)
   T-table distribution = small samples (population standard deviation unknown)
32. ANOVA = ANalysis Of VAriance
   Two-Way ANOVA:
   Example:
   Types of car brand: brand 1, brand 2, brand 3, brand 4.
   Here these are called LEVELS = 4.
   The parameters against which these are compared = Battery, Mileage.
   Here these two are called FACTORS = 2.
   Now there are 2 factors and 4 levels. Since there are 2 factors, it is called TWO-Way ANOVA.
   Note: strictly, a factor is a categorical input variable and its levels are the values it can take, so "brand" is itself one factor with 4 levels; a two-way ANOVA needs a second such factor.
33. F-Distribution
   SSB = Sum of squares between columns
   SSW = Sum of squares within columns
   MSB = Mean square between columns
   MSW = Mean square within columns
   dF = degrees of freedom
   MSB = SSB / dF (between columns: number of columns - 1)
   MSW = SSW / dF (within columns: number of observations - number of columns)
   (F-distribution) F = MSB / MSW
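The F ratio above can be worked through step by step. This sketch uses the simpler one-way layout (a single factor, one column per group) with three hypothetical columns of three observations each.

```python
import statistics

# Hypothetical data: each inner list is one column (group).
columns = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

all_values = [x for col in columns for x in col]
grand_mean = statistics.mean(all_values)

# SSB: spread of the column means around the grand mean.
ssb = sum(len(col) * (statistics.mean(col) - grand_mean) ** 2 for col in columns)

# SSW: spread of the observations around their own column mean.
ssw = sum((x - statistics.mean(col)) ** 2 for col in columns for x in col)

df_between = len(columns) - 1               # number of columns - 1
df_within = len(all_values) - len(columns)  # observations - number of columns

msb = ssb / df_between
msw = ssw / df_within
f_stat = msb / msw

print(round(ssb, 3), round(ssw, 3), round(f_stat, 3))
```

A large F means the column means differ by much more than the variation inside each column would explain. The value is then compared against an F-table for the two degrees of freedom.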
******************************************************
Regression
34. Simple linear regression
   Linear means a straight line
   Y = m X + c
   m = slope
   Y = dependent variable
   X = independent variable
   c = intercept
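One common way to find m and c is the least squares method, which picks the line with the smallest total squared vertical distance to the points. The sketch below assumes that method and uses hypothetical data that lies exactly on a line.

```python
import statistics

# Hypothetical data, chosen so the points lie exactly on Y = 2X + 1.
x_values = [1, 2, 3, 4, 5]
y_values = [3, 5, 7, 9, 11]

x_bar = statistics.mean(x_values)
y_bar = statistics.mean(y_values)

# Least squares slope: sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
denominator = sum((x - x_bar) ** 2 for x in x_values)
m = numerator / denominator

# The fitted line passes through (Xbar, Ybar), which gives the intercept.
c = y_bar - m * x_bar

print(m, c)

# Predict Y for a new X using Y = m X + c.
predicted = m * 6 + c
print(predicted)
```

With real data the points will not sit exactly on the line, and m and c describe the best-fitting straight line through them.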