added old data

parent 584470596d
commit 15006a3637

842  ISLR/.ipynb_checkpoints/ch2-8-checkpoint.ipynb   (new file; diff suppressed because one or more lines are too long)
840  ISLR/.ipynb_checkpoints/ch2-9-checkpoint.ipynb   (new file; diff suppressed because one or more lines are too long)
593  ISLR/.ipynb_checkpoints/ch2-9R-checkpoint.ipynb  (new file; diff suppressed because one or more lines are too long)
BIN  ISLR/Concepts.emmx                               (new file; binary file not shown)
@@ -38,14 +38,14 @@ To go further with our mean analogy, how do we compute how far the sample mean is

$$SE(\hat{\mu})^2 = {\sigma^2 \over n}$$

similarly,

![](./pics/ch3-1.png)

$\sigma^2$ is generally not known, but we can use an estimate called the **residual standard error**, which is calculated as follows:

$$RSE = \sqrt{RSS/(n-2)}$$

Standard errors can be used to compute **confidence intervals**.

For linear regression, the 95% confidence interval for $\beta_1$ approximately takes the form

![](./pics/ch3-2.png)

The factor of 2 in front of the $SE(\hat{\beta}_1)$ term varies slightly depending on the number of observations n in the linear regression. To be precise, rather than the number 2, it should be the 97.5% quantile of a t-distribution with n−2 degrees of freedom.
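As a quick check of the 2·SE rule of thumb, here is a minimal R sketch (using the built-in mtcars data as a stand-in, since the book's Advertising data isn't bundled with base R):

```r
# Fit a simple linear regression on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
n   <- nrow(mtcars)

# Coefficient estimate and its standard error for the slope
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

# "Approximately 2 * SE" vs. the exact 97.5% t-quantile with n - 2 df
approx_ci <- est + c(-2, 2) * se
exact_ci  <- est + c(-1, 1) * qt(0.975, df = n - 2) * se

approx_ci
exact_ci
confint(fit, "wt", level = 0.95)  # matches the exact interval
```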
We can also use this for hypothesis testing. To test the null hypothesis, we need to determine whether $\hat{\beta}_1$, our estimate for $\beta_1$, is sufficiently far from zero that we can be confident that $\beta_1$ is non-zero. How far depends on $SE(\hat{\beta}_1)$: a small SE lets even a small estimate count as evidence, whereas a large SE requires a large $\hat{\beta}_1$ before we can conclude that $\beta_1 \neq 0$.

We're actually computing the t-statistic,
@@ -59,7 +59,7 @@ Once we've rejected the null hypotheses, we would likely want to know to what extent

### RSE

Recall that because of the irreducible error, we won't be able to perfectly predict Y anyway. RSE is an estimate of the standard deviation of $\epsilon$. Roughly speaking, it is the average amount that the response will deviate from the true regression line. It is computed using the formula

![](pics/ch3-3.png)

In Table 3.2, we have an RSE of 3.26. Another way to think about this is that even if the model were correct and the true values of the unknown coefficients β0 and β1 were known exactly, any prediction of sales on the basis of TV advertising would still be off by about 3,260 units on average.
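A minimal R sketch of the RSE computation (again on mtcars as a stand-in); `sigma()` reports the same quantity directly from a fitted model:

```r
# RSE by hand vs. the value reported by the fitted model
fit <- lm(mpg ~ wt, data = mtcars)
n   <- nrow(mtcars)

rss <- sum(residuals(fit)^2)
rse <- sqrt(rss / (n - 2))

rse
sigma(fit)               # same number: the residual standard error

# "Percentage error": RSE relative to the mean response
rse / mean(mtcars$mpg)
```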
Whether this is acceptable depends on the context. In the Advertising data set, the mean value of sales over all markets is approximately 14,000 units, and so the percentage error is
@@ -69,7 +69,7 @@ The RSE is considered a measure of the lack of fit of the model to the data. As

### R^2 statistic

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE. The $R^2$ statistic provides an alternative measure of fit. It takes the form of a proportion—the proportion of variance explained—and so it always takes on a value between 0 and 1, and is independent of the scale of Y.

![](pics/ch3-4.png)

TSS is the total variance in Y. TSS − RSS is the amount of variability that can be explained with the regression. $R^2$ is the proportion of the variability of Y that can be explained with X. Higher is better. If it's low, the regression did not explain much of the variability, which may be because the real-world problem isn't linear at all or because the inherent error $\sigma^2$ is high.

The pro is that it's much more interpretable than RSE. How close to 1 is acceptable depends on the context. In physics, a number that's not extremely close to 1 might indicate a serious problem with the experiment, but in biology, sociology, etc., a value of 0.1 might be realistic. Also, the correlation r is another good measure. In simple linear regression, $R^2 = r^2$.
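A small R sketch tying these pieces together (mtcars again, nothing from the book's data):

```r
# R^2 from its definition, and its relation to correlation in simple regression
fit <- lm(mpg ~ wt, data = mtcars)

rss <- sum(residuals(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

1 - rss / tss                    # R^2 = 1 - RSS/TSS
summary(fit)$r.squared           # same value reported by summary()
cor(mtcars$mpg, mtcars$wt)^2     # in simple linear regression, R^2 = r^2
```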
@@ -101,15 +101,15 @@ To answer this, we test the null hypothesis:

H0 : every $\beta_i$ is zero.

We do this with the **F-statistic**

![](pics/ch3-5.png)

![](pics/ch3-6.png)

Hence, when there is no relationship between the response and predictors, one would expect the F-statistic to take on a value close to 1; when there is a relationship, we expect it to be a lot greater than 1.

The larger the number of data points n, the smaller F needs to be in order to reject the null hypothesis. Every good software package provides a way to calculate the **p-value** associated with the F-statistic using this distribution. Based on this p-value, we can determine whether or not to reject H0.
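A hedged R illustration of the overall F-test on mtcars (the choice of predictors here is arbitrary):

```r
# Overall F-test: H0 says every slope coefficient is zero
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

fstat <- summary(fit)$fstatistic     # value, numerator df, denominator df
fstat
pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)   # p-value for the F-statistic
```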
Sometimes we only want to test whether a particular subset of q coefficients is zero. We fit a second model that uses all of the variables except those q, and do the same analysis as above, but this time,

![](pics/ch3-7.png)

**If p > n, we can't fit the linear regression model with least squares, so we don't use the F-statistic, or most concepts discussed in this chapter. When p is large, some of the approaches discussed in the next section, such as *forward selection*, can be used. This *high-dimensional* setting will be discussed later.**
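That partial F-test is what `anova()` reports when handed two nested fits; a small sketch on mtcars:

```r
# Partial F-test: does dropping a subset of q predictors hurt the fit?
full    <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)   # H0: coefficients of disp and drat are zero

anova(reduced, full)   # F-statistic and p-value for the q = 2 dropped terms
```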
@@ -117,7 +117,7 @@ Sometimes we only want to test whether a particular subset of q coefficients is zero. We

*Variable selection*, the practice of determining which predictors are associated with the response in order to fit a single model involving only those predictors, is extensively discussed in Ch6, but we'll go into it a bit here.

![](pics/ch3-8.png)

Unfortunately, we would need to fit and test $2^p$ models, which can be very impractical, so we need an automated and efficient approach.

There are 3 classical approaches available:

* Forward selection. We start with a model with no predictors. We then fit simple regressions for each of the p predictors and select the one with the lowest RSS. We then fit all two-variable models containing the previously chosen predictor, again selecting the one with the lowest RSS. We keep this up until some stopping rule says we stop (e.g. we only want 5 variables). A rough sketch is shown below.
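A toy version of forward selection by training RSS (illustration only; in practice one would stop using a criterion such as adjusted R², AIC, or cross-validation rather than a fixed variable count):

```r
# Greedy forward selection by training RSS (illustration only)
response   <- "mpg"
candidates <- c("wt", "hp", "disp", "drat", "qsec")
chosen     <- character(0)

for (step in seq_len(3)) {                 # stopping rule: keep at most 3 variables
  rss <- sapply(setdiff(candidates, chosen), function(v) {
    f <- reformulate(c(chosen, v), response = response)
    sum(residuals(lm(f, data = mtcars))^2)
  })
  chosen <- c(chosen, names(which.min(rss)))
}
chosen
```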
@@ -183,7 +183,7 @@ However, it is sometimes the case that an interaction term has a very small p-value

Here we present a very simple way to directly extend the linear model to accommodate non-linear relationships, using polynomial regression. We'll present more complex methods later.

Take the example of miles per gallon with horsepower as the predictor.

![](pics/ch3-9.png)

We clearly see a relationship, but we also see it's nonlinear. (A quadratic fit is sketched below.)
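A hedged sketch of the quadratic fit in R, using mtcars (mpg vs. hp) in place of the book's Auto data:

```r
# Quadratic (degree-2 polynomial) regression: mpg on horsepower
fit_lin  <- lm(mpg ~ hp, data = mtcars)
fit_quad <- lm(mpg ~ hp + I(hp^2), data = mtcars)   # or: lm(mpg ~ poly(hp, 2), data = mtcars)

summary(fit_quad)          # the hp^2 term gets its own coefficient, SE and p-value
anova(fit_lin, fit_quad)   # does the quadratic term significantly improve the fit?
```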
Now, the model looks like

$$
@@ -215,8 +215,8 @@ If we spot nonlinear problems, we can include nonlinear transformations such as

### 2. Correlation of Error Terms

We made the assumption that $\epsilon_1, \ldots, \epsilon_n$ are uncorrelated. The standard errors calculated are certainly based on that assumption. But if there is correlation, the estimated standard errors will underestimate the true standard errors. Confidence and prediction intervals will be narrower than they should be, and p-values will be lower, so we may falsely conclude that a predictor is statistically significant.

![](pics/ch3-10.png)

![](pics/ch3-11.png)

There are many different methods to take the correlation of error terms into account in time series data, but correlated errors can certainly occur outside of time series data as well.

**In general, the assumption of uncorrelated errors is extremely important for linear regression as well as for other statistical methods, and good experimental design is crucial in order to mitigate the risk of such correlations.**
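One informal way to look for this in R is to plot the residuals in observation order and check their autocorrelation (only meaningful when the rows have a natural order, such as time):

```r
# Informal check for correlated errors: residuals in observation order
fit <- lm(mpg ~ wt + hp, data = mtcars)

plot(residuals(fit), type = "b",
     xlab = "Observation index", ylab = "Residual",
     main = "Look for runs/tracking in adjacent residuals")
abline(h = 0, lty = 2)

acf(residuals(fit))   # autocorrelation of the residuals
```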
@@ -239,7 +239,7 @@ Another important assumption is that the errors have a constant variance. The st

One can identify non-constant variance in the errors, or **heteroscedasticity**, from the presence of a funnel shape in the residual plot. When faced with this problem, one possible solution is to transform the response Y using a concave function such as log Y or sqrt(Y). If instead the error decreases with the response, we could try Y^2. Such a transformation results in a greater amount of shrinkage of the larger responses, leading to a reduction in heteroscedasticity.

![](pics/ch3-12.png)
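A minimal before/after residual plot for a log transform of the response (mtcars again; whether the transform actually helps depends on the data):

```r
# Funnel-shaped residuals? Try a concave transformation of the response.
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)

# Residuals vs. fitted values, before and after the log transform
par(mfrow = c(1, 2))
plot(fitted(fit_raw), residuals(fit_raw), main = "Y",      xlab = "Fitted", ylab = "Residual")
plot(fitted(fit_log), residuals(fit_log), main = "log(Y)", xlab = "Fitted", ylab = "Residual")
par(mfrow = c(1, 1))
```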
@@ -254,7 +254,7 @@ each of these raw observations is uncorrelated with variance $\sigma^2$, then their average

if $y_i$ is far from its predicted value $\hat{y}_i$. Outliers arise from various causes, such as incorrect recording of data.

It is typical for an outlier that does not have an unusual predictor value to have little effect on the least squares fit. However, even if an outlier does not have much effect on the least squares fit, it can cause other problems. For instance, in this example, the RSE is 1.09 when the outlier is included in the regression, but it is only 0.77 when the outlier is removed. Since the RSE is used to compute all confidence intervals and p-values, such a dramatic increase caused by a single data point can have implications for the interpretation of the fit. Similarly, inclusion of the outlier causes the $R^2$ to decline from 0.892 to 0.805.

![](pics/ch3-13.png)

Residual plots can be used to identify outliers. But in practice, it can be difficult to decide how large a residual needs to be before we consider the point an outlier. To address this problem, instead of plotting the raw residuals, we can plot the **studentized residuals**, computed by dividing each residual $e_i$ by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers.

If we believe that an outlier has occurred due to an error in data collection or recording, then one solution is to simply remove the observation. However, care should be taken, since an outlier may instead indicate a deficiency with the model, such as a missing predictor.
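In R, studentized residuals come from `rstudent()`; a short sketch:

```r
# Studentized residuals: flag observations with |value| > 3 as possible outliers
fit <- lm(mpg ~ wt + hp, data = mtcars)

stud <- rstudent(fit)            # studentized residuals
plot(fitted(fit), stud, xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = c(-3, 3), lty = 2)

which(abs(stud) > 3)             # candidate outliers
```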
@@ -263,7 +263,7 @@ If we believe that an outlier has occurred due to an error in data collection or recording

### 5. High Leverage Points

This is roughly the reverse of an outlier. Instead of having an unusual Y for a given X, observations with **high leverage** have an unusual value of X.

![](pics/ch3-14.png)

High leverage observations tend to have a sizable impact on the estimated regression line. It is cause for concern if the least squares line is heavily affected by just a couple of observations, because any problems with these points may invalidate the entire fit.
@@ -275,7 +275,7 @@ statistic.

![](pics/ch3-15.png)

There is a simple extension of $h_i$ to the case of multiple predictors, though we do not provide the formula here. The leverage statistic $h_i$ is always between 1/n and 1, and the average leverage over all the observations is always equal to (p + 1)/n. So if a given observation has a leverage statistic that greatly exceeds (p + 1)/n, then we may suspect that the corresponding point has high leverage.
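A short R sketch of the leverage rule of thumb using `hatvalues()`:

```r
# Leverage statistics h_i and the (p + 1)/n rule of thumb
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)              # leverage of each observation
p <- length(coef(fit)) - 1       # number of predictors
n <- nrow(mtcars)

mean(h)                          # always equals (p + 1)/n
which(h > 2 * (p + 1) / n)       # observations with unusually high leverage
```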
@@ -288,7 +288,7 @@ This refers to when 2 or more variables are closely related/dependent. The presence

The left-hand panel of Figure 3.15 is a contour plot of the RSS (3.22) associated with different possible coefficient estimates for the regression of balance on limit and age.

![](pics/ch3-16.png)

**Blah blah, GO READ THE TEXT AGAIN** (the point: collinearity makes the RSS contours long and narrow, so many different coefficient combinations fit the data almost equally well).

This results in a great deal of uncertainty in the
@@ -304,7 +304,7 @@ has a particularly high correlation. We call this situation **multicollinearity**

A better way to assess multicollinearity is to compute the **variance inflation factor** (VIF). The VIF is the ratio of the variance of $\hat{\beta}_j$ when fitting the full model divided by the variance of $\hat{\beta}_j$ if fit on its own. The smallest possible value for VIF is 1, which indicates the complete absence of collinearity. Typically in practice there is a small amount of collinearity among the predictors. Good rule of thumb: **VIF > 5 or 10 = BAD!!**

![](pics/ch3-17.png)

When faced with the problem of collinearity, there are two simple solutions. The first is to drop one of the problematic variables from the regression. This isn't a huge loss, because collinearity indicates that one of the variables is redundant in the presence of the other(s).
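VIF can be computed by hand from its equivalent form $VIF_j = 1/(1 - R^2_{X_j|X_{-j}})$ (the `vif()` function in the car package reports essentially the same quantity); a sketch on mtcars:

```r
# VIF by hand: regress each predictor on the others, then VIF_j = 1 / (1 - R^2_j)
predictors <- c("wt", "hp", "disp")

vif <- sapply(predictors, function(v) {
  f  <- reformulate(setdiff(predictors, v), response = v)
  r2 <- summary(lm(f, data = mtcars))$r.squared
  1 / (1 - r2)
})
vif                              # values above ~5-10 suggest problematic collinearity
```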
@@ -332,6 +332,84 @@ there is effectively a reduction in sample size.

As a general rule, parametric methods will tend to outperform non-parametric approaches when there is a small number of observations per predictor.
# Exercises

1. Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

**The p-values for the intercept, radio and TV are extremely close to 0, so we can reject the null hypotheses for these: TV and radio spending are each associated with sales. Their large t-values say the same thing. Newspaper, in contrast, has a small t-value and a large p-value, meaning there is no evidence of an association between newspaper spending and sales once TV and radio are in the model.**
2. Carefully explain the differences between the KNN classifier and KNN regression methods.

The KNN classifier returns a category: it predicts the most common class among the K nearest neighbours, and the fraction of neighbours belonging to that class can be read as an estimated probability. KNN regression returns a number: the mean of the responses of the K nearest neighbours (see the toy sketch below).
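A hand-rolled one-dimensional sketch, just to make the mechanics concrete (the data here are made up):

```r
# Toy 1-D example: KNN regression vs. KNN classification with K = 3
x_train <- c(1, 2, 3, 4, 5, 6)
y_num   <- c(1.0, 1.5, 2.0, 6.0, 6.5, 7.0)                 # numeric response (regression)
y_cat   <- c("low", "low", "low", "high", "high", "high")  # categorical response (classification)

knn_predict <- function(x0, k = 3) {
  idx <- order(abs(x_train - x0))[1:k]                      # indices of the k nearest neighbours
  list(
    regression     = mean(y_num[idx]),                      # mean response of the neighbours
    classification = names(which.max(table(y_cat[idx]))),   # majority class
    class_prob     = max(table(y_cat[idx])) / k             # fraction of neighbours in that class
  )
}

knn_predict(3.4)
```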
3. Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = −10.

i. For a fixed value of IQ and GPA, males earn more on average than females.

**False. The female-minus-male difference is 35 − 10·GPA, so as long as GPA < 3.5, females earn more.**

ii. For a fixed value of IQ and GPA, females earn more on average than males.

**False: for GPA > 3.5, males earn more.**

iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.

**True.**

Predict the salary of a female with IQ of 110 and a GPA of 4.0.

**50 + 80 + 7.7 + 35 + 4.4 − 40 = 137.1, i.e. a predicted salary of about $137,100 (checked below).**
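A one-line check of the arithmetic in R:

```r
# Exercise 3(b): predicted salary (in $1000s) for a female with IQ = 110 and GPA = 4.0
gpa <- 4.0; iq <- 110; female <- 1
50 + 20 * gpa + 0.07 * iq + 35 * female + 0.01 * gpa * iq + (-10) * gpa * female
# [1] 137.1
```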
True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

**False. The size of a coefficient says nothing by itself (it depends on the scale of GPA·IQ); the evidence for an interaction effect is judged from the term's standard error, t-statistic and p-value.**
4. I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$.

(a) Suppose that the true relationship between X and Y is linear, i.e. $Y = \beta_0 + \beta_1 X + \epsilon$. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

**The training RSS for the cubic regression will be the same or (usually) smaller: least squares can always set β2 = β3 = 0 to reproduce the linear fit, and any extra flexibility is used to chase noise, which can only lower the training RSS.**

(b) Answer (a) using test rather than training RSS.

**For test RSS we would expect the linear model to do as well or better: the cubic terms fit noise in the training data, adding variance without reducing bias, so the cubic model's test RSS is likely higher.**

(c) Suppose that the true relationship between X and Y is not linear, but we don't know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

**I expect the training RSS to be lower for the cubic regression, because the more flexible model can follow the nonlinearity better, and it can never do worse than the linear fit on the training data. A small simulation of (a) and (b) is sketched below.**
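A small simulation to back up (a) and (b), under the stated assumption of a truly linear relationship:

```r
# Simulation for exercise 4: linear truth, compare training and test RSS
set.seed(1)
n     <- 100
x     <- rnorm(n);  y     <- 2 + 3 * x + rnorm(n)        # training data, truly linear
x_new <- rnorm(n);  y_new <- 2 + 3 * x_new + rnorm(n)    # independent test data

fit_lin <- lm(y ~ x)
fit_cub <- lm(y ~ poly(x, 3))

# Training RSS: the cubic fit is never worse on the data it was fit to
c(linear = sum(residuals(fit_lin)^2), cubic = sum(residuals(fit_cub)^2))

# Test RSS: the extra cubic terms fit noise, so the linear model usually wins
rss_test <- function(fit) sum((y_new - predict(fit, newdata = data.frame(x = x_new)))^2)
c(linear = rss_test(fit_lin), cubic = rss_test(fit_cub))
```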
338  ISLR/notebooks/.ipynb_checkpoints/3.6.2.R-checkpoint.ipynb  (new file)
@@ -0,0 +1,338 @@
The added checkpoint notebook contains three evaluated R cells (kernel "ir", R 3.6.3), plus two empty cells; its code and the stored plain-text output:

```r
library(MASS)

?Boston
## Boston {MASS}    Housing Values in Suburbs of Boston
## The 'Boston' data frame has 506 rows and 14 columns:
##   crim     per capita crime rate by town
##   zn       proportion of residential land zoned for lots over 25,000 sq.ft.
##   indus    proportion of non-retail business acres per town
##   chas     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
##   nox      nitrogen oxides concentration (parts per 10 million)
##   rm       average number of rooms per dwelling
##   age      proportion of owner-occupied units built prior to 1940
##   dis      weighted mean of distances to five Boston employment centres
##   rad      index of accessibility to radial highways
##   tax      full-value property-tax rate per $10,000
##   ptratio  pupil-teacher ratio by town
##   black    1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
##   lstat    lower status of the population (percent)
##   medv     median value of owner-occupied homes in $1000s
## Source: Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for
##   clean air. J. Environ. Economics and Management 5, 81-102.
##   Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying
##   Influential Data and Sources of Collinearity. New York: Wiley.

lm.fit = lm(medv~lstat, data=Boston)
summary(lm.fit)
## Call:
## lm(formula = medv ~ lstat, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -15.168  -3.990  -1.318   2.034  24.500
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.55384    0.56263   61.41   <2e-16 ***
## lstat       -0.95005    0.03873  -24.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared:  0.5441,  Adjusted R-squared:  0.5432
## F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
```
443  ISLR/notebooks/.ipynb_checkpoints/3.6.2.python-checkpoint.ipynb  (new file; diff suppressed because one or more lines are too long)
518  ISLR/notebooks/.ipynb_checkpoints/ch2-9-checkpoint.ipynb         (new file; diff suppressed because one or more lines are too long)

Two further file diffs suppressed because one or more lines are too long.