Ridge regression and lasso regression are regression analysis methods that suppress overfitting by adding a penalty term to the loss function according to the magnitude of the regression coefficients (regularization).
The idea behind this is that the larger the coefficients, the more the output of the regression equation fluctuates in response to changes in the input, which tends to lead to overfitting, so it is preferable to keep the coefficients as small as possible.
The difference between ridge regression and lasso regression is the form of the regularization term:
ridge regression penalizes the "sum of squares of the coefficients," while lasso regression penalizes the "sum of absolute values of the coefficients." In other words, they measure how far the coefficient vector is from zero in different ways: ridge regression uses Euclidean distance (an L2 regularization term) and lasso regression uses Manhattan distance (an L1 regularization term).
Equivalently, these regressions can be thought of as "finding the coefficients that minimize the sum of squared errors within a constraint region defined by Euclidean or Manhattan distance."
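Written out explicitly (a sketch matching the implementation below, where y_i are the observed values, ŷ_i the predictions of the regression equation, w_j the coefficients, n the number of data points, and λ, written lamda in the code, the adjustment coefficient of the regularization term):

E_{\mathrm{ridge}}(w) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda \sum_{j} w_j^{2}

E_{\mathrm{lasso}}(w) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda \sum_{j} |w_j|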
■Parameter (coefficient) derivation method
Let's take ridge regression as an example. The coefficients are found at the point where the derivative of the ridge regression loss above with respect to w equals 0, which is where the error is minimized. The derivation is carried out in matrix form, handling transposed matrices along the way.
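A sketch of that derivation in matrix form (notation assumed here: X is the matrix of input data, y the vector of observed values, I the identity matrix; the 1/n factor is dropped because it does not change the location of the minimum):

E(w) = (y - Xw)^{\top}(y - Xw) + \lambda\, w^{\top} w

\frac{\partial E}{\partial w} = -2\,X^{\top}(y - Xw) + 2\lambda w = 0

(X^{\top}X + \lambda I)\, w = X^{\top} y \quad\Rightarrow\quad w = (X^{\top}X + \lambda I)^{-1} X^{\top} y

Note that the absolute values in the lasso term are not differentiable at 0, so lasso has no such closed-form solution; its coefficients are found numerically, as in the implementation below.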
■Ridge regression implementation example (python)
Here is an explanation of an example ridge regression implementation in Python. The implementation environment and libraries are as follows.
・Python version: 3.9
・Required libraries: numpy, matplotlib, scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

def ridge(w, x, y, lamda):
    y_ = np.poly1d(w)(x)  # Predictions of the polynomial regression equation with coefficients w
    error = np.sum((y - y_) ** 2) / len(y_) + lamda * np.sum(w ** 2)  # Loss function (ridge regression)
    # error = np.sum((y - y_) ** 2) / len(y_) + lamda * np.sum(np.abs(w))  # Loss function (Lasso regression)
    return error

# Training data x, y (assumed here for illustration: a noisy sine curve)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + np.random.normal(0, 0.2, len(x))

d = 5  # Order of the regression equation
lamda = 0.01  # Adjustment coefficient of the regularization term
w_init = np.ones(d + 1)  # Initial values of the coefficients
result = minimize(ridge, w_init, args=(x, y, lamda), method="Nelder-Mead")  # Minimize the loss function
w = result.x  # Fitted coefficients

# Plot the data and the fitted regression curve
xs = np.linspace(0, 1, 200)
plt.scatter(x, y)
plt.plot(xs, np.poly1d(w)(xs))
plt.show()
<Program execution results>
The results are as follows. In the program, you can set the order of the regression equation with d and the adjustment coefficient of the regularization term with lamda.
If you set lamda to 0, the fit reduces to the ordinary least squares method rather than ridge regression.
Compared to the result of a plain least-squares fit, ridge regression can use a higher-order regression equation without overfitting; see the sketch below for how to run that comparison.
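To reproduce the comparison yourself, run the fit twice with different lamda values. A minimal sketch reusing the ridge function and the sample data from the program above:

w_ols = minimize(ridge, w_init, args=(x, y, 0.0), method="Nelder-Mead").x   # lamda = 0: plain least squares
w_rdg = minimize(ridge, w_init, args=(x, y, 0.01), method="Nelder-Mead").x  # lamda = 0.01: ridge regression
xs = np.linspace(0, 1, 200)
plt.scatter(x, y)
plt.plot(xs, np.poly1d(w_ols)(xs), label="lamda = 0 (least squares)")
plt.plot(xs, np.poly1d(w_rdg)(xs), label="lamda = 0.01 (ridge)")
plt.legend()
plt.show()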
<Note>
If you set lamda too large, the fitting will not work properly, as shown below. The following is the case when lamda=0.1.
■Lasso regression implementation example (python)
All you have to do is change the loss function part of the above program to the lasso regression formula (swap the ridge error line for the commented-out lasso line).
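As a minimal sketch, the changed function would look like this (mirroring the ridge function above; only the regularization term differs):

def lasso(w, x, y, lamda):
    y_ = np.poly1d(w)(x)
    # Sum of absolute values of the coefficients (L1 / Manhattan distance) instead of the sum of squares
    error = np.sum((y - y_) ** 2) / len(y_) + lamda * np.sum(np.abs(w))  # Loss function (Lasso regression)
    return error

result = minimize(lasso, w_init, args=(x, y, lamda), method="Nelder-Mead")
w = result.x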