added old data

parent 584470596d
commit 15006a3637

842  ISLR/.ipynb_checkpoints/ch2-8-checkpoint.ipynb   (new file; diff suppressed because one or more lines are too long)
840  ISLR/.ipynb_checkpoints/ch2-9-checkpoint.ipynb   (new file; diff suppressed because one or more lines are too long)
593  ISLR/.ipynb_checkpoints/ch2-9R-checkpoint.ipynb  (new file; diff suppressed because one or more lines are too long)
BIN  ISLR/Concepts.emmx                               (new file; binary file not shown)
@@ -38,14 +38,14 @@ To go further with our mean analogy, how do we compute how far the sample mean is

$$SE(\hat{\mu})^2 = {\sigma^2 \over n}$$

similarly,

![](./pics/ch3-1.png)

$\sigma^2$ is generally not known, but we can use an estimate called the **residual standard error**, which is calculated as follows:

$$RSE = \sqrt{RSS/(n-2)}$$

Standard errors can be used to compute **confidence intervals**.

For linear regression, the 95% confidence interval for $\beta_1$ approximately takes the form

![](./pics/ch3-2.png)

The factor of 2 in front of the $SE(\hat{\beta}_1)$ term varies slightly depending on the number of observations n in the linear regression. To be precise, rather than the number 2, it should be the 97.5% quantile of a t-distribution with n−2 degrees of freedom.
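As a quick check of the 2·SE rule of thumb, here is a minimal R sketch (using the built-in mtcars data as a stand-in, since the book's Advertising data isn't bundled with base R):

```r
# Fit a simple linear regression on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
n   <- nrow(mtcars)

# Coefficient estimate and its standard error for the slope
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

# "Approximately 2 * SE" vs. the exact 97.5% t-quantile with n - 2 df
approx_ci <- est + c(-2, 2) * se
exact_ci  <- est + c(-1, 1) * qt(0.975, df = n - 2) * se

approx_ci
exact_ci
confint(fit, "wt", level = 0.95)  # matches the exact interval
```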
We can also use this for hypothesis testing. To test the null hypothesis, we need to determine whether $\hat{\beta}_1$, our estimate for $\beta_1$, is sufficiently far from zero that we can be confident that $\beta_1$ is non-zero. How far depends on $SE(\hat{\beta}_1)$: a small SE lets even a small estimate count as evidence, whereas a large SE requires a large $\hat{\beta}_1$ before we can conclude that $\beta_1 \neq 0$.

We're actually computing the t-statistic,
@@ -59,7 +59,7 @@ Once we've rejected the null hypotheses, we would likely want to know to what extent

### RSE

Recall that because of the irreducible error, we won't be able to perfectly predict Y anyway. RSE is an estimate of the standard deviation of $\epsilon$. Roughly speaking, it is the average amount that the response will deviate from the true regression line. It is computed using the formula

![](pics/ch3-3.png)

In Table 3.2, we have an RSE of 3.26. Another way to think about this is that even if the model were correct and the true values of the unknown coefficients β0 and β1 were known exactly, any prediction of sales on the basis of TV advertising would still be off by about 3,260 units on average.
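A minimal R sketch of the RSE computation (again on mtcars as a stand-in); `sigma()` reports the same quantity directly from a fitted model:

```r
# RSE by hand vs. the value reported by the fitted model
fit <- lm(mpg ~ wt, data = mtcars)
n   <- nrow(mtcars)

rss <- sum(residuals(fit)^2)
rse <- sqrt(rss / (n - 2))

rse
sigma(fit)               # same number: the residual standard error

# "Percentage error": RSE relative to the mean response
rse / mean(mtcars$mpg)
```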
Whether this is acceptable depends on the context. In the Advertising data set, the mean value of sales over all markets is approximately 14,000 units, and so the percentage error is
@@ -69,7 +69,7 @@ The RSE is considered a measure of the lack of fit of the model to the data. As

### R^2 statistic

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE. The $R^2$ statistic provides an alternative measure of fit. It takes the form of a proportion—the proportion of variance explained—and so it always takes on a value between 0 and 1, and is independent of the scale of Y.

![](pics/ch3-4.png)

TSS is the total variance in Y. TSS − RSS is the amount of variability that can be explained with the regression. $R^2$ is the proportion of the variability of Y that can be explained with X. Higher is better. If it's low, the regression did not explain much of the variability, which may be because the real-world problem isn't linear at all or because the inherent error $\sigma^2$ is high.

The pro is that it's much more interpretable than RSE. How close to 1 is acceptable depends on the context. In physics, a number that's not extremely close to 1 might indicate a serious problem with the experiment, but in biology, sociology, etc., a value of 0.1 might be realistic. Also, the correlation r is another good measure. In simple linear regression, $R^2 = r^2$.
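A small R sketch tying these pieces together (mtcars again, nothing from the book's data):

```r
# R^2 from its definition, and its relation to correlation in simple regression
fit <- lm(mpg ~ wt, data = mtcars)

rss <- sum(residuals(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

1 - rss / tss                    # R^2 = 1 - RSS/TSS
summary(fit)$r.squared           # same value reported by summary()
cor(mtcars$mpg, mtcars$wt)^2     # in simple linear regression, R^2 = r^2
```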
@@ -101,15 +101,15 @@ To answer this, we test the null hypothesis:

H0 : every $\beta_i$ is zero.

We do this with the **F-statistic**

![](pics/ch3-5.png)

![](pics/ch3-6.png)

Hence, when there is no relationship between the response and predictors, one would expect the F-statistic to take on a value close to 1; when there is a relationship, we expect it to be a lot greater than 1.

The larger the number of data points n, the smaller F needs to be in order to reject the null hypothesis. Every good software package provides a way to calculate the **p-value** associated with the F-statistic using this distribution. Based on this p-value, we can determine whether or not to reject H0.
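A hedged R illustration of the overall F-test on mtcars (the choice of predictors here is arbitrary):

```r
# Overall F-test: H0 says every slope coefficient is zero
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

fstat <- summary(fit)$fstatistic     # value, numerator df, denominator df
fstat
pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)   # p-value for the F-statistic
```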
Sometimes we only want to test whether a particular subset of q coefficients is zero. We fit a second model that uses all of the variables except those q, and do the same analysis as above, but this time,

![](pics/ch3-7.png)

**If p > n, we can't fit the linear regression model with least squares, so we don't use the F-statistic, or most concepts discussed in this chapter. When p is large, some of the approaches discussed in the next section, such as *forward selection*, can be used. This *high-dimensional* setting will be discussed later.**
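That partial F-test is what `anova()` reports when handed two nested fits; a small sketch on mtcars:

```r
# Partial F-test: does dropping a subset of q predictors hurt the fit?
full    <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)   # H0: coefficients of disp and drat are zero

anova(reduced, full)   # F-statistic and p-value for the q = 2 dropped terms
```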
@@ -117,7 +117,7 @@ Sometimes we only want to test whether a particular subset of q coefficients is zero. We

*Variable selection*, the practice of determining which predictors are associated with the response in order to fit a single model involving only those predictors, is extensively discussed in Ch6, but we'll go into it a bit here.

![](pics/ch3-8.png)

Unfortunately, we would need to fit and test $2^p$ models, which can be very impractical, so we need an automated and efficient approach.

There are 3 classical approaches available:

* Forward selection. We start with a model with no predictors. We then fit simple regressions for each of the p predictors and select the one with the lowest RSS. We then fit all two-variable models containing the previously chosen predictor, again selecting the one with the lowest RSS. We keep this up until some stopping rule says we stop (e.g. we only want 5 variables). A rough sketch is shown below.
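A toy version of forward selection by training RSS (illustration only; in practice one would stop using a criterion such as adjusted R², AIC, or cross-validation rather than a fixed variable count):

```r
# Greedy forward selection by training RSS (illustration only)
response   <- "mpg"
candidates <- c("wt", "hp", "disp", "drat", "qsec")
chosen     <- character(0)

for (step in seq_len(3)) {                 # stopping rule: keep at most 3 variables
  rss <- sapply(setdiff(candidates, chosen), function(v) {
    f <- reformulate(c(chosen, v), response = response)
    sum(residuals(lm(f, data = mtcars))^2)
  })
  chosen <- c(chosen, names(which.min(rss)))
}
chosen
```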
@@ -183,7 +183,7 @@ However, it is sometimes the case that an interaction term has a very small p-value

Here we present a very simple way to directly extend the linear model to accommodate non-linear relationships, using polynomial regression. We'll present more complex methods later.

Take the example of miles per gallon with horsepower as the predictor.

![](pics/ch3-9.png)

We clearly see a relationship, but we also see it's nonlinear. (A quadratic fit is sketched below.)
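A hedged sketch of the quadratic fit in R, using mtcars (mpg vs. hp) in place of the book's Auto data:

```r
# Quadratic (degree-2 polynomial) regression: mpg on horsepower
fit_lin  <- lm(mpg ~ hp, data = mtcars)
fit_quad <- lm(mpg ~ hp + I(hp^2), data = mtcars)   # or: lm(mpg ~ poly(hp, 2), data = mtcars)

summary(fit_quad)          # the hp^2 term gets its own coefficient, SE and p-value
anova(fit_lin, fit_quad)   # does the quadratic term significantly improve the fit?
```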
Now, the model looks like

$$
@@ -215,8 +215,8 @@ If we spot nonlinear problems, we can include nonlinear transformations such as

### 2. Correlation of Error Terms

We made the assumption that $\epsilon_1, \ldots, \epsilon_n$ are uncorrelated. The standard errors calculated are certainly based on that assumption. But if there is correlation, the estimated standard errors will underestimate the true standard errors. Confidence and prediction intervals will be narrower than they should be, and p-values will be lower, so we may falsely conclude that a predictor is statistically significant.

![](pics/ch3-10.png)

![](pics/ch3-11.png)

There are many different methods to take the correlation of error terms into account in time series data, but correlated errors can certainly occur outside of time series data as well.

**In general, the assumption of uncorrelated errors is extremely important for linear regression as well as for other statistical methods, and good experimental design is crucial in order to mitigate the risk of such correlations.**
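One informal way to look for this in R is to plot the residuals in observation order and check their autocorrelation (only meaningful when the rows have a natural order, such as time):

```r
# Informal check for correlated errors: residuals in observation order
fit <- lm(mpg ~ wt + hp, data = mtcars)

plot(residuals(fit), type = "b",
     xlab = "Observation index", ylab = "Residual",
     main = "Look for runs/tracking in adjacent residuals")
abline(h = 0, lty = 2)

acf(residuals(fit))   # autocorrelation of the residuals
```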
@@ -239,7 +239,7 @@ Another important assumption is that the errors have a constant variance. The st

One can identify non-constant variance in the errors, or **heteroscedasticity**, from the presence of a funnel shape in the residual plot. When faced with this problem, one possible solution is to transform the response Y using a concave function such as log Y or sqrt(Y). If instead the error decreases with the response, we could try Y^2. Such a transformation results in a greater amount of shrinkage of the larger responses, leading to a reduction in heteroscedasticity.

![](pics/ch3-12.png)
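A minimal before/after residual plot for a log transform of the response (mtcars again; whether the transform actually helps depends on the data):

```r
# Funnel-shaped residuals? Try a concave transformation of the response.
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)

# Residuals vs. fitted values, before and after the log transform
par(mfrow = c(1, 2))
plot(fitted(fit_raw), residuals(fit_raw), main = "Y",      xlab = "Fitted", ylab = "Residual")
plot(fitted(fit_log), residuals(fit_log), main = "log(Y)", xlab = "Fitted", ylab = "Residual")
par(mfrow = c(1, 1))
```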
@@ -254,7 +254,7 @@ each of these raw observations is uncorrelated with variance $\sigma^2$, then their average

if $y_i$ is far from its predicted value $\hat{y}_i$. Outliers arise from various causes, such as incorrect recording of data.

It is typical for an outlier that does not have an unusual predictor value to have little effect on the least squares fit. However, even if an outlier does not have much effect on the least squares fit, it can cause other problems. For instance, in this example, the RSE is 1.09 when the outlier is included in the regression, but it is only 0.77 when the outlier is removed. Since the RSE is used to compute all confidence intervals and p-values, such a dramatic increase caused by a single data point can have implications for the interpretation of the fit. Similarly, inclusion of the outlier causes the $R^2$ to decline from 0.892 to 0.805.

![](pics/ch3-13.png)

Residual plots can be used to identify outliers. But in practice, it can be difficult to decide how large a residual needs to be before we consider the point an outlier. To address this problem, instead of plotting the raw residuals, we can plot the **studentized residuals**, computed by dividing each residual $e_i$ by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers.

If we believe that an outlier has occurred due to an error in data collection or recording, then one solution is to simply remove the observation. However, care should be taken, since an outlier may instead indicate a deficiency with the model, such as a missing predictor.
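In R, studentized residuals come from `rstudent()`; a short sketch:

```r
# Studentized residuals: flag observations with |value| > 3 as possible outliers
fit <- lm(mpg ~ wt + hp, data = mtcars)

stud <- rstudent(fit)            # studentized residuals
plot(fitted(fit), stud, xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = c(-3, 3), lty = 2)

which(abs(stud) > 3)             # candidate outliers
```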
@@ -263,7 +263,7 @@ If we believe that an outlier has occurred due to an error in data collection or recording

### 5. High Leverage Points

This is roughly the reverse of an outlier. Instead of having an unusual Y for a given X, observations with **high leverage** have an unusual value of X.

![](pics/ch3-14.png)

High leverage observations tend to have a sizable impact on the estimated regression line. It is cause for concern if the least squares line is heavily affected by just a couple of observations, because any problems with these points may invalidate the entire fit.
@@ -275,7 +275,7 @@ statistic.

![](pics/ch3-15.png)

There is a simple extension of $h_i$ to the case of multiple predictors, though we do not provide the formula here. The leverage statistic $h_i$ is always between 1/n and 1, and the average leverage over all the observations is always equal to (p + 1)/n. So if a given observation has a leverage statistic that greatly exceeds (p + 1)/n, then we may suspect that the corresponding point has high leverage.
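A short R sketch of the leverage rule of thumb using `hatvalues()`:

```r
# Leverage statistics h_i and the (p + 1)/n rule of thumb
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)              # leverage of each observation
p <- length(coef(fit)) - 1       # number of predictors
n <- nrow(mtcars)

mean(h)                          # always equals (p + 1)/n
which(h > 2 * (p + 1) / n)       # observations with unusually high leverage
```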
@@ -288,7 +288,7 @@ This refers to when 2 or more variables are closely related/dependent. The presence

The left-hand panel of Figure 3.15 is a contour plot of the RSS (3.22) associated with different possible coefficient estimates for the regression of balance on limit and age.

![](pics/ch3-16.png)

**Blah blah, GO READ THE TEXT AGAIN** (the point: collinearity makes the RSS contours long and narrow, so many different coefficient combinations fit the data almost equally well).

This results in a great deal of uncertainty in the
@@ -304,7 +304,7 @@ has a particularly high correlation. We call this situation **multicollinearity**

A better way to assess multicollinearity is to compute the **variance inflation factor** (VIF). The VIF is the ratio of the variance of $\hat{\beta}_j$ when fitting the full model divided by the variance of $\hat{\beta}_j$ if fit on its own. The smallest possible value for VIF is 1, which indicates the complete absence of collinearity. Typically in practice there is a small amount of collinearity among the predictors. Good rule of thumb: **VIF > 5 or 10 = BAD!!**

![](pics/ch3-17.png)

When faced with the problem of collinearity, there are two simple solutions. The first is to drop one of the problematic variables from the regression. This isn't a huge loss, because collinearity indicates that one of the variables is redundant in the presence of the other(s).
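VIF can be computed by hand from its equivalent form $VIF_j = 1/(1 - R^2_{X_j|X_{-j}})$ (the `vif()` function in the car package reports essentially the same quantity); a sketch on mtcars:

```r
# VIF by hand: regress each predictor on the others, then VIF_j = 1 / (1 - R^2_j)
predictors <- c("wt", "hp", "disp")

vif <- sapply(predictors, function(v) {
  f  <- reformulate(setdiff(predictors, v), response = v)
  r2 <- summary(lm(f, data = mtcars))$r.squared
  1 / (1 - r2)
})
vif                              # values above ~5-10 suggest problematic collinearity
```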
@@ -332,6 +332,84 @@ there is effectively a reduction in sample size.

As a general rule, parametric methods will tend to outperform non-parametric approaches when there is a small number of observations per predictor.
# Exercises

1. Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

**The p-values for the intercept, radio and TV are extremely close to 0, so we can reject the null hypotheses for these: TV and radio spending are each associated with sales. Their large t-values say the same thing. Newspaper, in contrast, has a small t-value and a large p-value, meaning there is no evidence of an association between newspaper spending and sales once TV and radio are in the model.**
2. Carefully explain the differences between the KNN classifier and KNN regression methods.

The KNN classifier returns a category: it predicts the most common class among the K nearest neighbours, and the fraction of neighbours belonging to that class can be read as an estimated probability. KNN regression returns a number: the mean of the responses of the K nearest neighbours (see the toy sketch below).
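A hand-rolled one-dimensional sketch, just to make the mechanics concrete (the data here are made up):

```r
# Toy 1-D example: KNN regression vs. KNN classification with K = 3
x_train <- c(1, 2, 3, 4, 5, 6)
y_num   <- c(1.0, 1.5, 2.0, 6.0, 6.5, 7.0)                 # numeric response (regression)
y_cat   <- c("low", "low", "low", "high", "high", "high")  # categorical response (classification)

knn_predict <- function(x0, k = 3) {
  idx <- order(abs(x_train - x0))[1:k]                      # indices of the k nearest neighbours
  list(
    regression     = mean(y_num[idx]),                      # mean response of the neighbours
    classification = names(which.max(table(y_cat[idx]))),   # majority class
    class_prob     = max(table(y_cat[idx])) / k             # fraction of neighbours in that class
  )
}

knn_predict(3.4)
```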
3. Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = −10.

i. For a fixed value of IQ and GPA, males earn more on average than females.

**False. The female-minus-male difference is 35 − 10·GPA, so as long as GPA < 3.5, females earn more.**

ii. For a fixed value of IQ and GPA, females earn more on average than males.

**False: for GPA > 3.5, males earn more.**

iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.

**True.**

Predict the salary of a female with IQ of 110 and a GPA of 4.0.

**50 + 80 + 7.7 + 35 + 4.4 − 40 = 137.1, i.e. a predicted salary of about $137,100 (checked below).**
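A one-line check of the arithmetic in R:

```r
# Exercise 3(b): predicted salary (in $1000s) for a female with IQ = 110 and GPA = 4.0
gpa <- 4.0; iq <- 110; female <- 1
50 + 20 * gpa + 0.07 * iq + 35 * female + 0.01 * gpa * iq + (-10) * gpa * female
# [1] 137.1
```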
True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

**False. The size of a coefficient says nothing by itself (it depends on the scale of GPA·IQ); the evidence for an interaction effect is judged from the term's standard error, t-statistic and p-value.**
4. I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$.

(a) Suppose that the true relationship between X and Y is linear, i.e. $Y = \beta_0 + \beta_1 X + \epsilon$. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

**The training RSS for the cubic regression will be the same or (usually) smaller: least squares can always set β2 = β3 = 0 to reproduce the linear fit, and any extra flexibility is used to chase noise, which can only lower the training RSS.**

(b) Answer (a) using test rather than training RSS.

**For test RSS we would expect the linear model to do as well or better: the cubic terms fit noise in the training data, adding variance without reducing bias, so the cubic model's test RSS is likely higher.**

(c) Suppose that the true relationship between X and Y is not linear, but we don't know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

**I expect the training RSS to be lower for the cubic regression, because the more flexible model can follow the nonlinearity better, and it can never do worse than the linear fit on the training data. A small simulation of (a) and (b) is sketched below.**
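A small simulation to back up (a) and (b), under the stated assumption of a truly linear relationship:

```r
# Simulation for exercise 4: linear truth, compare training and test RSS
set.seed(1)
n     <- 100
x     <- rnorm(n);  y     <- 2 + 3 * x + rnorm(n)        # training data, truly linear
x_new <- rnorm(n);  y_new <- 2 + 3 * x_new + rnorm(n)    # independent test data

fit_lin <- lm(y ~ x)
fit_cub <- lm(y ~ poly(x, 3))

# Training RSS: the cubic fit is never worse on the data it was fit to
c(linear = sum(residuals(fit_lin)^2), cubic = sum(residuals(fit_cub)^2))

# Test RSS: the extra cubic terms fit noise, so the linear model usually wins
rss_test <- function(fit) sum((y_new - predict(fit, newdata = data.frame(x = x_new)))^2)
c(linear = rss_test(fit_lin), cubic = rss_test(fit_cub))
```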
338  ISLR/notebooks/.ipynb_checkpoints/3.6.2.R-checkpoint.ipynb  (new file)
@@ -0,0 +1,338 @@
The added checkpoint notebook contains three evaluated R cells (kernel "ir", R 3.6.3), plus two empty cells; its code and the stored plain-text output:

```r
library(MASS)

?Boston
## Boston {MASS}    Housing Values in Suburbs of Boston
## The 'Boston' data frame has 506 rows and 14 columns:
##   crim     per capita crime rate by town
##   zn       proportion of residential land zoned for lots over 25,000 sq.ft.
##   indus    proportion of non-retail business acres per town
##   chas     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
##   nox      nitrogen oxides concentration (parts per 10 million)
##   rm       average number of rooms per dwelling
##   age      proportion of owner-occupied units built prior to 1940
##   dis      weighted mean of distances to five Boston employment centres
##   rad      index of accessibility to radial highways
##   tax      full-value property-tax rate per $10,000
##   ptratio  pupil-teacher ratio by town
##   black    1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
##   lstat    lower status of the population (percent)
##   medv     median value of owner-occupied homes in $1000s
## Source: Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for
##   clean air. J. Environ. Economics and Management 5, 81-102.
##   Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying
##   Influential Data and Sources of Collinearity. New York: Wiley.

lm.fit = lm(medv~lstat, data=Boston)
summary(lm.fit)
## Call:
## lm(formula = medv ~ lstat, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -15.168  -3.990  -1.318   2.034  24.500
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.55384    0.56263   61.41   <2e-16 ***
## lstat       -0.95005    0.03873  -24.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared:  0.5441,  Adjusted R-squared:  0.5432
## F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
```
443  ISLR/notebooks/.ipynb_checkpoints/3.6.2.python-checkpoint.ipynb  (new file; diff suppressed because one or more lines are too long)
518  ISLR/notebooks/.ipynb_checkpoints/ch2-9-checkpoint.ipynb         (new file; diff suppressed because one or more lines are too long)

Two further file diffs suppressed because one or more lines are too long.