Abstract
In previous chapters, we have had data for which there has been a dependent variable (Y) and an independent variable (X – even though, to be consistent with the notation that is close to universal in the field of experimental design, we have been using factor names, A, B, etc., or “column factor” and “row factor,” instead of, literally, the letter X). The latter has been treated mostly as a categorical variable, whether actually numerical/metric or not. Often, we have had more than one independent variable. Assuming only one independent variable, if we want to say it this way (and we do!), we can say that we have had n (X, Y) pairs of data, where n is the total number of data points. With more than one independent variable, we can say that we have n \( \left({X}_1,{X}_2,\dots, Y\right) \) data points.
Notes
1. Unless the scatter diagram indicates dramatically otherwise, we usually consider first a straight line, since it is the simplest functional form.
2. The correlation coefficient is calculated as
$$ r=\frac{\sum_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\left(n-1\right){s}_X{s}_Y} $$
where \( s_X \) and \( s_Y \) are the sample standard deviations of the independent and the dependent variables, respectively, and, of course, \( \overline{X} \) and \( \overline{Y} \) are their sample means. Note that r is the same, regardless of which variable is labeled as independent or dependent. (The sketch following these notes evaluates this formula, and the least-squares coefficients of Note 3, in R.)
3. There are an infinite number of possible lines – so that REALLY would take up our free time! Until the availability of personal computers with software packages such as Excel, we would find the parameter values by evaluating the least-squares equations by hand.
4. It can be shown that the LS line is unique.
5. Confidence intervals for predictions are covered in the next chapter.
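To see Notes 2 and 3 in action, here is a minimal R sketch (our own illustration, with variable names X and Y of our choosing) that evaluates the correlation formula and the least-squares slope and intercept directly from their definitions, using the advertising and sales data of Example 14.9 in the appendix. The values it prints match the cor() and lm() output shown there.
> X <- c(10, 20, 30, 40, 50, 60, 70, 80)      # advertisement, from Example 14.9
> Y <- c(35, 80, 56, 82, 126, 104, 177, 153)  # sales, from Example 14.9
> n <- length(X)
> # r, evaluated from the formula in Note 2
> sum((X - mean(X)) * (Y - mean(Y))) / ((n - 1) * sd(X) * sd(Y))
[1] 0.9060138
> # least-squares slope (b1) and intercept (b0), evaluated directly
> b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
> b0 <- mean(Y) - b1 * mean(X)
> c(b0, b1)
[1] 21.321429  1.784524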
Appendix
Example 14.9 Trends in Selling Toys using R
To analyze the same example, we can import the data as we have done previously or create our own in R. We will demonstrate the second option here – after all, it is more fun! This is how it is done:
> advertisement <- c(10, 20, 30, 40, 50, 60, 70, 80)
> sales <- c(35, 80, 56, 82, 126, 104, 177, 153)
> toy <- data.frame(advertisement, sales)
A quick inspection will show us the data frame was successfully created.
> toy
  advertisement sales
1            10    35
2            20    80
3            30    56
4            40    82
5            50   126
6            60   104
7            70   177
8            80   153
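If the data already existed in a file, the import route mentioned at the start of this example could be used instead. The following is a minimal sketch that assumes a hypothetical file toy.csv (not supplied with the chapter) with header columns named advertisement and sales:
> toy <- read.csv("toy.csv")   # hypothetical file with columns advertisement, sales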
Using the plot() function, we can generate a scatter plot of our data, shown in Fig. 14.10. The correlation analysis is shown next.
> plot(toy, pch=16, cex=1.0, main="Sales vs. Advertisement")
> cor(toy,method="pearson")
              advertisement     sales
advertisement     1.0000000 0.9060138
sales             0.9060138 1.0000000
> cor.test(toy$advertisement, toy$sales, method="pearson")

        Pearson's product-moment correlation

data:  toy$advertisement and toy$sales
t = 5.2434, df = 6, p-value = 0.001932
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5568723 0.9830591
sample estimates:
      cor
0.9060138
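As a side note, the t statistic reported by cor.test() is simply the correlation transformed by \( t=r\sqrt{n-2}/\sqrt{1-{r}^2} \), with n − 2 = 6 degrees of freedom here. A quick sketch of ours confirming this from r itself:
> r <- cor(toy$advertisement, toy$sales)
> r * sqrt(nrow(toy) - 2) / sqrt(1 - r^2)   # reproduces the t value above
[1] 5.243428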
Now, we will perform a regression analysis, using the lm() function.
> toy_regression <- lm(sales~advertisement, data=toy)
> summary(toy_regression)

Call:
lm(formula = sales ~ advertisement, data = toy)
Residuals:
    Min      1Q  Median      3Q     Max
-24.393 -13.027  -7.434  17.336  30.762
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    21.3214    17.1861   1.241  0.26105
advertisement   1.7845     0.3403   5.243  0.00193 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.06 on 6 degrees of freedom
Multiple R-squared:  0.8209,	Adjusted R-squared:  0.791
F-statistic: 27.49 on 1 and 6 DF,  p-value: 0.001932

> anova(toy_regression)
Analysis of Variance Table

Response: sales
              Df  Sum Sq Mean Sq F value   Pr(>F)
advertisement  1 13375.0 13375.0  27.494 0.001932 **
Residuals      6  2918.9   486.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
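The fitted model can also be queried directly, rather than read off the printed summary. The sketch below extracts the coefficients and then, purely as our own illustration (an advertising level of 55 is not part of the original example), computes the point prediction at that level; interval estimates for such predictions are deferred to the next chapter, per Note 5.
> coef(toy_regression)
  (Intercept) advertisement 
    21.321429      1.784524 
> predict(toy_regression, newdata=data.frame(advertisement=55))
       1 
119.4702 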
To include the regression line in the scatter plot (shown in Fig. 14.11), we can use either of the following commands:
> abline(21.3214, 1.7845)
> abline(lm(sales~advertisement))
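Both commands draw the same line; each must be issued after the plot() command, while that graphics device is still active. Since the fitted model is already stored, a third equivalent form is to pass the model object itself, from which abline() extracts the intercept and slope:
> abline(toy_regression)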