Statistics Review
- Sample mean:
\[\overline{x} \triangleq \frac{1}{n}\sum_{i=1}^n x_i\]
- Sample variance:
\[s_x^2 \triangleq \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2\]
(For an unbiased estimate, replace the $\frac{1}{n}$ above with $\frac{1}{n-1}$.)
Equivalently:
\[s_x^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - 2\cdot\overline{x}\cdot\frac{1}{n}\sum_{i=1}^n x_i + \overline{x}^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - \overline{x}^2 \\ \Rightarrow \frac{1}{n} \sum_{i=1}^n x_i^2 = s_x^2+\overline{x}^2\]
- Sample standard deviation (SD): $s_x \triangleq \sqrt{s_x^2}$
- Sample covariance: measures how strongly $\{x_i\}$ and $\{y_i\}$ vary together
\[s_{xy} \triangleq \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})\]
Equivalently:
\[s_{xy} = \frac{1}{n} \sum_{i=1}^n x_iy_i - \overline{x}\,\overline{y} \\ \Rightarrow \frac{1}{n}\sum_{i=1}^n x_iy_i = s_{xy} + \overline{x}\,\overline{y}\]
- Sample (Pearson) correlation coefficient: $\rho_{xy} \triangleq \dfrac{s_{xy}}{s_x s_y}$
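A minimal numpy sketch checking the two "equivalently" identities and the correlation definition above; the toy data and variable names (`s_x2`, `s_xy`, ...) are my own, not from the notes:

```python
import numpy as np

# Toy data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

x_bar, y_bar = x.mean(), y.mean()
s_x2 = ((x - x_bar) ** 2).mean()           # sample variance, 1/n convention
s_y2 = ((y - y_bar) ** 2).mean()
s_xy = ((x - x_bar) * (y - y_bar)).mean()  # sample covariance, 1/n convention
rho = s_xy / np.sqrt(s_x2 * s_y2)          # Pearson correlation coefficient

# The two identities derived above
assert np.isclose((x ** 2).mean(), s_x2 + x_bar ** 2)
assert np.isclose((x * y).mean(), s_xy + x_bar * y_bar)
# rho matches numpy's; the 1/n vs 1/(n-1) factors cancel in the ratio
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
```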
Simple Linear Regression
Model
The model ($x$ is a scalar):
\[y \approx \beta_0 + \beta_1 x \triangleq \hat{y}\]
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\hat{y}$ is the predicted label (the prediction).
We call $\boldsymbol{\beta} = [\beta_0, \beta_1]^T$ the model's parameters, also known as coefficients or weights.
Residual
However, $x$ does not predict $y$ exactly; $y \approx \beta_0+\beta_1 x$ holds only approximately, so we further introduce a residual $\epsilon$:
\[y = \beta_0+\beta_1 x + \epsilon\]
Thus for the $i$-th sample we have
- Predicted value: $\hat{y}_i = \beta_0 + \beta_1 x_i$
- Residual: $\epsilon_i = y_i - \hat{y}_i$
In the figure below, the residual $\epsilon_i$ is the vertical distance from $y_i$ to the red fitted line.
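A quick numeric illustration of these two definitions; the data and parameter values are made up for the example:

```python
import numpy as np

# Toy data with hand-picked (not fitted) parameters, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
beta0, beta1 = 0.1, 0.95

y_hat = beta0 + beta1 * x  # predicted values: y_hat_i = beta0 + beta1 * x_i
eps = y - y_hat            # residuals:        eps_i  = y_i - y_hat_i
print(eps)                 # one residual per sample
```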
Least-Squares Fit
We choose the model parameters $\boldsymbol{\beta} = [\beta_0, \beta_1]^T$ by minimizing the residual sum of squares (RSS):
\[RSS(\beta_0, \beta_1) \triangleq \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i-\hat{y}_i)^2\]
where $\epsilon_i$ and $\hat{y}_i$ are functions of $[\beta_0, \beta_1]$.
Geometrically, minimizing the RSS $\Leftrightarrow$ minimizing the sum of squared vertical distances from the sample points to the fitted line.
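To make the dependence on the parameters concrete, here is the RSS written as a plain function (toy data again; `rss` is my own helper, not a library call):

```python
import numpy as np

def rss(beta0, beta1, x, y):
    """Residual sum of squares, viewed as a function of (beta0, beta1)."""
    return np.sum((y - (beta0 + beta1 * x)) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
print(rss(0.0, 1.0, x, y), rss(0.1, 0.95, x, y))  # different parameters give different RSS
```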
Minimizing RSS
We can minimize the RSS by setting its partial derivatives to zero:
\[0 = \frac{\partial RSS(\beta_0, \beta_1)}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^n (y_i-\beta_0-\beta_1x_i) \qquad(1)\\ 0 = \frac{\partial RSS(\beta_0, \beta_1)}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^n x_i(y_i-\beta_0-\beta_1x_i) \quad(2)\]
Dividing (1) by $-2n$ gives $\beta_0 = \overline{y} - \beta_1 \overline{x} \quad (3)$
Dividing (2) by $-2n$ gives $0 = \dfrac{1}{n} \sum_{i=1}^n x_iy_i - \overline{x} \beta_0 - \beta_1 \dfrac{1}{n} \sum_{i=1}^n x_i^2 \quad (4)$
Substituting (3) into (4) gives $0=\dfrac{1}{n}\sum_{i=1}^n x_iy_i - \overline{x}\,\overline{y} - \beta_1\left(\dfrac{1}{n} \sum_{i=1}^n x_i^2 - \overline{x}^2\right) \Leftrightarrow \beta_1 = \dfrac{s_{xy}}{s_{xx}} \quad (5)$, where $s_{xx} \triangleq s_x^2$ (and similarly $s_{yy} \triangleq s_y^2$ below).
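A sketch of the closed-form solution (3) and (5) on synthetic data, cross-checked against `np.polyfit`; the data and ground-truth parameters (1.5, 0.7) are invented for the example:

```python
import numpy as np

# Synthetic data: y = 1.5 + 0.7 x + noise (made-up ground truth)
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 + 0.7 * x + 0.3 * rng.normal(size=200)

s_xy = ((x - x.mean()) * (y - y.mean())).mean()
s_xx = ((x - x.mean()) ** 2).mean()
beta1 = s_xy / s_xx                  # equation (5)
beta0 = y.mean() - beta1 * x.mean()  # equation (3)

# Cross-check against numpy's least-squares polynomial fit
b1_ref, b0_ref = np.polyfit(x, y, deg=1)  # returns [slope, intercept] for deg=1
assert np.allclose([beta0, beta1], [b0_ref, b1_ref])
```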
Substituting (3) and (5) back into the RSS yields
\[RSS(\beta_0, \beta_1) = n(1-\frac{s_{xy}^2}{s_{xx}s_{yy}})s_{yy} = n(1-\rho_{xy}^2)s_{yy}\]
$R^2$ Goodness-of-Fit
Define
\[R^2 \triangleq 1 - \frac{RSS/n}{s_y^2}\]
$R^2$ is also called the coefficient of determination.
- $R^2=1$: perfect predictor ($\hat{y}_i = y_i$)
- $R^2=0$: no better than predicting the mean ($\hat{y}_i = \overline{y}$)
- $R^2<0$: worse than predicting the mean
With the least-squares coefficients, $R^2 = \rho_{xy}^2$.
Moreover, since the least-squares slope satisfies $\beta_1 = \dfrac{\rho_{xy}s_y}{s_x}$, we have $\operatorname{sgn}(\beta_1) = \operatorname{sgn}(\rho_{xy})$.
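A numeric check of both the closed-form minimal RSS and $R^2 = \rho_{xy}^2$; the data and ground-truth parameters are invented:

```python
import numpy as np

# Synthetic data: y = -1 + 2x + noise (made-up ground truth)
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = -1.0 + 2.0 * x + rng.normal(size=500)
n = len(x)

s_xy = ((x - x.mean()) * (y - y.mean())).mean()
s_xx = np.var(x)  # default ddof=0, i.e. the 1/n convention of these notes
s_yy = np.var(y)

beta1 = s_xy / s_xx                  # least-squares slope (5)
beta0 = y.mean() - beta1 * x.mean()  # least-squares intercept (3)
rss = np.sum((y - beta0 - beta1 * x) ** 2)

r2 = 1 - (rss / n) / s_yy
rho2 = s_xy ** 2 / (s_xx * s_yy)
assert np.isclose(r2, rho2)                    # R^2 = rho_xy^2
assert np.isclose(rss, n * (1 - rho2) * s_yy)  # minimal RSS in closed form
assert np.sign(beta1) == np.sign(s_xy)         # sgn(beta1) = sgn(rho_xy)
```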
Multiple Linear Regression
Notation
- $\mathbf{X}$: feature matrix