4.2 The Regression Equation

Because we could draw many different lines through the cluster of data points, we need a method to choose the “best” line. The method, called the least-squares criterion, is based on an analysis of the errors made in using a line to fit the data points: the best-fitting line is the one that minimizes the sum of the squared errors.
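
The least-squares criterion can be sketched in a few lines of plain Python. This is a minimal illustration, not the textbook's procedure: the helper names `least_squares_line` and `sse` are hypothetical, and the data are made up. It computes the slope b1 = Sxy / Sxx and intercept b0 = y_bar - b1 * x_bar, then compares the sum of squared errors (SSE) of that line against an arbitrary competing line.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx and Sxy: sums of squared and cross-product deviations from the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
best = sse(xs, ys, b0, b1)
other = sse(xs, ys, 2.0, 0.7)  # an arbitrary competing line
```

For these made-up points the least-squares line is y_hat = 2.2 + 0.6x with SSE 2.4, while the competing line y_hat = 2.0 + 0.7x has SSE 2.55; by construction, no line through these points can achieve an SSE below 2.4.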

Suppose that a scatterplot indicates a linear relationship between two variables. Then, within the range of the observed values of the predictor variable, we can reasonably use the regression equation to make predictions for the response variable. However, to do so outside that range, which is called extrapolation, may not be reasonable, because the linear relationship might not hold there.
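
One way to make the extrapolation warning concrete is to have the prediction step report whether the requested predictor value lies outside the observed range. The sketch below assumes a hypothetical helper `predict` and a made-up fitted line; it does not refuse to extrapolate, it only flags it.

```python
def predict(b0, b1, x, x_observed):
    """Return (y_hat, is_extrapolation) for predictor value x.

    is_extrapolation is True when x lies outside
    [min(x_observed), max(x_observed)], where the fitted linear
    relationship may not hold.
    """
    lo, hi = min(x_observed), max(x_observed)
    y_hat = b0 + b1 * x
    return y_hat, not (lo <= x <= hi)


# Hypothetical fitted line y_hat = 10 - x and observed predictor values.
x_observed = [2, 4, 5, 6, 7]
b0, b1 = 10.0, -1.0
inside = predict(b0, b1, 5.5, x_observed)  # within the observed range
outside = predict(b0, b1, 12, x_observed)  # beyond the largest observed x
```

Here `inside` is (4.5, False), while `outside` is (-2.0, True). If the response were something like a price, the negative value would be a clear sign that the straight line should not be trusted beyond the data.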


Outliers and Influential Observations

In the context of regression, an outlier is a data point that lies far from the regression line.

An influential observation is a data point whose removal causes the regression equation (and line) to change considerably.
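
The definition of an influential observation suggests a direct check: refit the line with each point left out and see how much the coefficients move. The sketch below is one such deletion check under stated assumptions; the helpers `fit_line` and `deletion_influence` are hypothetical, and the data (four points between x = 4 and 7 plus one isolated point at x = 1) are invented for illustration, not taken from the Orion example.

```python
def fit_line(xs, ys):
    """Least-squares (b0, b1) for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def deletion_influence(xs, ys):
    """Refit with each point left out; return (index, change in b0, change in b1)."""
    b0, b1 = fit_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        # Drop point i and refit on the remaining points.
        b0_i, b1_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append((i, b0_i - b0, b1_i - b1))
    return changes


# Hypothetical data: a cluster from x = 4 to 7 plus an isolated point at x = 1.
xs = [4, 5, 6, 7, 1]
ys = [8, 6, 7, 5, 6]
changes = deletion_influence(xs, ys)
# Index of the point whose removal changes the slope the most.
most_influential = max(changes, key=lambda c: abs(c[2]))[0]
```

With all five points the slope is about -0.10; removing the isolated point at x = 1 (index 4) makes it -0.80, a far larger change than removing any other single point, so that point is the influential observation here.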

E.g., the regression line changes dramatically depending on whether the point (2, 169) is included, so (2, 169) is an influential observation.



1. Remove the influential observation.
2. Add data points around the influential observation.

Nonetheless, we may need either to remove it (thus limiting the analysis to Orions between 4 and 7 years old) or to obtain additional data on 2- and 3-year-old Orions so that the regression analysis is not so dependent on one data point.

In practice, outliers and influential observations are hard to tell apart: an outlier may or may not be an influential observation, and an influential observation may or may not be an outlier. Many statistical software packages identify potential outliers and influential observations.
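
Software typically flags potential outliers with standardized residuals and potential influential observations with a measure such as Cook's distance. The sketch below computes both for simple linear regression from the standard formulas: leverage h = 1/n + (x - x_bar)^2 / Sxx, standardized residual r = e / sqrt(MSE * (1 - h)), and Cook's distance D = (r^2 / 2) * h / (1 - h) for two parameters. The helper name `regression_diagnostics`, the cutoffs |r| > 2 and D > 1, and the data are all assumptions for illustration; packages and textbooks use differing rules of thumb.

```python
import math


def regression_diagnostics(xs, ys):
    """Per-point (leverage h, standardized residual r, Cook's distance D)
    for the least-squares line y_hat = b0 + b1 * x.

    Assumes n > 2 and at least two distinct x values.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)  # two estimated parameters
    diagnostics = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - x_bar) ** 2 / sxx   # leverage: how far x is from x_bar
        r = e / math.sqrt(mse * (1 - h))     # internally studentized residual
        d = (r ** 2 / 2) * h / (1 - h)       # Cook's distance with p = 2
        diagnostics.append((h, r, d))
    return diagnostics


# Hypothetical data: a cluster from x = 4 to 7 plus an isolated point at x = 1.
xs = [4, 5, 6, 7, 1]
ys = [8, 6, 7, 5, 6]
diag = regression_diagnostics(xs, ys)
# Assumed rule-of-thumb cutoffs: |r| > 2 for outliers, D > 1 for influence.
flagged_outliers = [i for i, (_, r, _) in enumerate(diag) if abs(r) > 2]
flagged_influential = [i for i, (_, _, d) in enumerate(diag) if d > 1]
```

In this small example no point exceeds the residual cutoff, while the isolated point at x = 1 has the largest leverage and a Cook's distance of roughly 4, illustrating an influential observation that is not flagged as an outlier.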


This distribution actually calls for curvilinear regression.
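
When the scatterplot curves, a straight line leaves systematic errors that a curve can remove. The sketch below assumes a quadratic y_hat = c0 + c1*x + c2*x^2 as the curvilinear model (one common choice among several) and fits it by solving the least-squares normal equations with a small Gaussian-elimination routine. The helper names and the data (points lying exactly on y = x^2) are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares [b0, b1] for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return [y_bar - b1 * x_bar, b1]


def solve(a, b):
    """Solve the square system a * c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def fit_quadratic(xs, ys):
    """Least-squares [c0, c1, c2] for y_hat = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x^0 .. x^4
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve(a, t)


def sse(xs, ys, coeffs):
    """Sum of squared errors for a polynomial with coefficients lowest power first."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


# Hypothetical curved data lying exactly on y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]
lin = fit_line(xs, ys)
quad = fit_quadratic(xs, ys)
sse_lin, sse_quad = sse(xs, ys, lin), sse(xs, ys, quad)
```

The straight line y_hat = -2 + 4x leaves SSE 14 with a telltale residual pattern (+, -, -, -, +), while the quadratic recovers y = x^2 with essentially zero error.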



