*********************************************************************************
Statistics deals with the most likely event.
35. Two types of problems
Regression: the target (y) is a continuous variable, which is influenced by independent variables (x).
36. Conditional Mean:
Example: separating the heights of boys and girls.
· Conditional models have lower prediction error than a single overall mean.
37. Conditional predictive analysis
38. Simple linear regression
Linear means a straight line.
Actual y = structural part (b0 + b1 * x) + error
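A minimal sketch of this in R, using made-up numbers (the intercept 2 and slope 0.6 below are arbitrary): simulate a straight-line structure plus random error, then let lm() recover it.

set.seed(1)
x <- 1:20
y <- 2 + 0.6 * x + rnorm(20, sd = 1)   # actual y = structural part + error
fit <- lm(y ~ x)                       # estimate intercept (b0) and slope (b1)
coef(fit)                              # estimates close to 2 and 0.6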
39. Correlation:
always lies between -1 and +1
· r = 0.3 – low correlation
· r = 0.6 – medium correlation
· r = 0.8 – high correlation
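These ranges can be checked quickly in R with cor(); the vectors below are hypothetical.

x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 4.1, 5.9, 8.2, 9.9)
cor(x, y)   # Pearson correlation coefficient r, always between -1 and +1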
40. Intercept:
where the fitted line cuts the Y axis
41. Std. Error:
measures the spread (standard deviation) of an estimated coefficient, based on how far the points lie from the fitted line.
· R reports an estimate of this standard error for each coefficient.
42. How far is far?
· Here the p-value plays the important role.
· A small p-value indicates the estimate is far from zero relative to its standard error.
· Small p-value = high confidence that the coefficient differs from zero = small standard error (relative to the estimate).
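A short sketch of where these quantities appear in R, on hypothetical data (x, y and fit below are made up for illustration):

set.seed(2)
x <- 1:15
y <- 5 + 0.4 * x + rnorm(15)
fit <- lm(y ~ x)
summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
confint(fit)                # 95% confidence intervals for the coefficients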
43. F-statistic:
combines the information from all coefficients to give an overall model p-value.
If the overall (collective) model is good, it is significantly different from zero.
If it is effectively equal to zero, the model carries no information.
> # Predict children's height
> # 7% of the variance is explained by father's height

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.4000     1.1155   2.151   0.1205
x             0.6000     0.2108   2.846   0.0653 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.309 on 3 degrees of freedom
Multiple R-squared:  0.7297,  Adjusted R-squared:  0.6396
F-statistic: 8.1 on 1 and 3 DF,  p-value: 0.06532
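For reference, output with this layout comes from a call like the one below; the height vectors are hypothetical stand-ins, not the data behind the numbers above.

father <- c(65, 67, 70, 72, 75)   # hypothetical fathers' heights
child  <- c(66, 67, 69, 71, 73)   # hypothetical children's heights
summary(lm(child ~ father))       # prints the coefficient table, R-squared and F-statistic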
44. Adjusted R-Square Value
When more variables (x) are added to the model, the “Multiple R-squared” keeps increasing and no longer provides the right information.
For that reason, the “Adjusted R-squared” value is used instead.
Predict the height of the children, given the father's and mother's height.
E.g. Y = 34.65 + 0.42 * father_height + 5.17 * (male)
Here mother's height, father's height and the factor gender are used as “X”, but only the factor level (gender = M) and father's height appear with their own coefficients.
The left-out “X” (mother's height) and the baseline factor level (female) are reflected in the intercept.
So, the left-out “X” values are always represented in the intercept.
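A small sketch of how the baseline factor level lands in the intercept, assuming a hypothetical data frame d with columns child_height, father_height and gender:

d <- data.frame(
  child_height  = c(64, 66, 69, 71, 62, 65, 68, 70),
  father_height = c(66, 68, 71, 73, 65, 67, 70, 72),
  gender        = factor(c("F", "F", "M", "M", "F", "F", "M", "M"))
)
fit <- lm(child_height ~ father_height + gender, data = d)
coef(fit)   # shows (Intercept), father_height and genderM;
            # the left-out level "F" is the baseline absorbed into the intercept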
45. Residual Sum of Squares (SSres)
SSres = sum of (actual y – predicted y)^2, i.e. the total squared distance of the points from the fitted line.
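In R, SSres of any fitted lm model can be read off directly; the built-in cars data set below is just a stand-in.

fit <- lm(dist ~ speed, data = cars)   # built-in example data
sum(residuals(fit)^2)                  # SSres: sum of squared residuals
deviance(fit)                          # the same number for an lm fit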
46. Matrix multiplication
Hat matrix (refer to more internet content for the derivation)
Fitted or predicted values come from y_hat = Xb, where b = (X'X)^-1 X'y.
H (n x n) = hat matrix = X (X'X)^-1 X'
Xb is the predicted vector.
Predicted vector = Hy = X (X'X)^-1 X'y
Standardized residuals have sqrt(1 - h_ii) in the denominator, where h_ii are the diagonal elements of H.
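A sketch of these formulas in R, using the built-in cars data purely as a stand-in; hatvalues() and rstandard() are the built-in shortcuts being reproduced by hand.

fit <- lm(dist ~ speed, data = cars)
X <- model.matrix(fit)                        # design matrix
H <- X %*% solve(t(X) %*% X) %*% t(X)         # hat matrix H = X (X'X)^-1 X'
y_hat <- H %*% cars$dist                      # predicted vector = Hy = Xb
all.equal(as.vector(y_hat), unname(fitted(fit)))   # TRUE: matches lm's fitted values
h <- diag(H)                                  # leverages h_ii (same as hatvalues(fit))
s <- summary(fit)$sigma                       # residual standard error
r_std <- residuals(fit) / (s * sqrt(1 - h))   # standardized residuals: sqrt(1 - h_ii) in the denominator
head(cbind(r_std, rstandard(fit)))            # matches R's rstandard()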
47. Logistic regression
Logistic regression is called a classification method.
The dependent variable is a class, like Yes/No, zero/one, etc.
(Diagram: building the confusion matrix.)
The confusion matrix is used to determine the goodness of the model.
How to determine whether the model built is good?
McFadden's goodness-of-fit measure (a pseudo R-squared) is used for that.
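A minimal sketch of a logistic regression, its confusion matrix and McFadden's measure; the built-in mtcars data (am is a 0/1 variable) is used only as a stand-in.

fit  <- glm(am ~ mpg, data = mtcars, family = binomial)   # dependent variable is a 0/1 class
prob <- predict(fit, type = "response")                   # predicted probabilities
pred <- ifelse(prob > 0.5, 1, 0)                          # classify at a 0.5 cut-off
table(Predicted = pred, Actual = mtcars$am)               # confusion matrix
null <- glm(am ~ 1, data = mtcars, family = binomial)     # intercept-only (null) model
1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))    # McFadden's pseudo R-squared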
47.1 ROC – Receiver Operating Characteristic curve
ROC = True “+ve” rate (%) vs. False “+ve” rate (%)
It asks how much error you are willing to allow, e.g. accepting a 10% false positive rate to attain an 80% true positive rate.
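A base-R sketch of how the ROC points are built, reusing the hypothetical classifier above; packages such as pROC automate this.

fit    <- glm(am ~ mpg, data = mtcars, family = binomial)   # stand-in classifier
prob   <- predict(fit, type = "response")
actual <- mtcars$am
# sweep the probability threshold and record true/false positive rates
roc <- t(sapply(seq(0, 1, by = 0.05), function(th) {
  pred <- as.numeric(prob > th)
  c(threshold = th,
    tpr = sum(pred == 1 & actual == 1) / sum(actual == 1),  # true "+ve" rate
    fpr = sum(pred == 1 & actual == 0) / sum(actual == 0))  # false "+ve" rate
}))
plot(roc[, "fpr"], roc[, "tpr"], type = "b",
     xlab = "False positive rate", ylab = "True positive rate")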
47.2 F1 Measure
= harmonic mean of Precision and Recall: F1 = 2 * (Precision * Recall) / (Precision + Recall)
Precision = (True “+ve”) / (True “+ve” + False “+ve”)
Recall = (True “+ve”) / (True “+ve” + False “-ve”)
· Example: health problem detection, where catching every true case (high recall) matters most.
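The same confusion-matrix counts give Precision, Recall and F1; the counts below are hypothetical.

TP <- 80; FP <- 10; FN <- 20; TN <- 90                 # hypothetical counts (TN is not used by these metrics)
precision <- TP / (TP + FP)                            # 80 / 90  = 0.889
recall    <- TP / (TP + FN)                            # 80 / 100 = 0.800
f1 <- 2 * precision * recall / (precision + recall)    # harmonic mean of the two
c(precision = precision, recall = recall, F1 = f1)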