Added Q: Linear Regression using OLS
Jeet009 committed Sep 16, 2025
commit 57e1293a1c23495202e228eb0b0faf865b799896
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
### Problem

Implement simple linear regression using Ordinary Least Squares (OLS). Given 1D inputs `X` and targets `y`, compute the slope `m`, intercept `b`, and use them to predict on a provided test input.

You should implement the closed-form OLS solution:

$$
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},\quad
b = \bar{y} - m\,\bar{x}.
$$

Then, given `X_test`, output predictions `y_pred = m * X_test + b`.

Return values: `m`, `b`, and `y_pred`.

@@ -0,0 +1,7 @@
{
"input": "X_train = [1, 2, 3]; y_train = [2, 2.5, 3.5]; X_test = [4]",
"output": "m = 0.75, b = 1.166667, y_pred = [4.166667]",
"reasoning": "Using OLS: m = Cov(X,Y)/Var(X) = 1.5/2 = 0.75 and b = y_bar - m*x_bar = (8/3) - 0.75*2 = 1.166667. Prediction for X_test=[4] is 0.75*4 + 1.166667 = 4.166667."
}


39 changes: 39 additions & 0 deletions questions/186_linear_regression_ordinary_least_squares/learn.md
@@ -0,0 +1,39 @@
## Learning: Ordinary Least Squares for Simple Linear Regression

### Idea and formula
- **Goal**: Fit a line $y = m x + b$ that minimizes the sum of squared errors.
- **Closed-form OLS solution** for 1D features:

$$
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},\quad
b = \bar{y} - m\,\bar{x}
$$

### Intuition
- The numerator is the sample covariance between $x$ and $y$; the denominator is the sample variance of $x$.
- So $m = \operatorname{Cov}(x,y) / \operatorname{Var}(x)$ measures how much $y$ changes per unit change in $x$.
- The intercept $b$ anchors the best-fit line so it passes through the mean point $(\bar{x},\bar{y})$.

### Algorithm steps
1. Compute $\bar{x}$ and $\bar{y}$.
2. Accumulate numerator $\sum_i (x_i-\bar{x})(y_i-\bar{y})$ and denominator $\sum_i (x_i-\bar{x})^2$.
3. Compute $m = \text{numerator}/\text{denominator}$ (guard against zero denominator).
4. Compute $b = \bar{y} - m\,\bar{x}$.
5. Predict: $\hat{y} = m\,x + b$ for any new $x$.

### Edge cases and tips
- If all $x_i$ are identical, $\operatorname{Var}(x)=0$ and the slope is undefined. In practice, return $m=0$ and $b=\bar{y}$ or raise an error.
- Centering data helps numerical stability but is not required for the closed form.
- Outliers can strongly influence OLS; consider robust alternatives if needed.

### Worked example
Given $X = [1,2,3]$ and $y = [2,2.5,3.5]$:

- $\bar{x} = 2$, $\bar{y} = 8/3$.
- $\sum (x_i-\bar{x})(y_i-\bar{y}) = (1-2)(2-8/3) + (2-2)(2.5-8/3) + (3-2)(3.5-8/3) = 1.5$
- $\sum (x_i-\bar{x})^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2$
- $m = 1.5/2 = 0.75$
- $b = \bar{y} - m\,\bar{x} = 8/3 - 0.75\cdot 2 = 1.166666\ldots$

Prediction for $X_{\text{test}} = [4]$: $y_{\text{pred}} = 0.75\cdot 4 + 1.1666\ldots = 4.1666\ldots$
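The arithmetic above can be replayed exactly with the stdlib `fractions` module, which sidesteps any rounding questions:

```python
from fractions import Fraction as F

xs = [F(1), F(2), F(3)]
ys = [F(2), F(5, 2), F(7, 2)]  # 2, 2.5, 3.5 as exact fractions

x_bar = sum(xs) / 3   # 2
y_bar = sum(ys) / 3   # 8/3
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 3/2
den = sum((x - x_bar) ** 2 for x in xs)                       # 2
m = num / den          # 3/4
b = y_bar - m * x_bar  # 8/3 - 3/2 = 7/6
y_pred = m * 4 + b     # 3 + 7/6 = 25/6 = 4.1666...
```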

16 changes: 16 additions & 0 deletions questions/186_linear_regression_ordinary_least_squares/meta.json
@@ -0,0 +1,16 @@
{
"id": "186",
"title": "Linear Regression via Ordinary Least Squares (OLS)",
"difficulty": "hard",
"category": "Machine Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/Jeet009",
"name": "Jeet Mukherjee"
}
]
}

23 changes: 23 additions & 0 deletions questions/186_linear_regression_ordinary_least_squares/solution.py
@@ -0,0 +1,23 @@
from typing import List, Tuple


def fit_and_predict(X_train: List[float], y_train: List[float], X_test: List[float]) -> Tuple[float, float, List[float]]:
n = len(X_train)
x_mean = sum(X_train) / n
y_mean = sum(y_train) / n

num = 0.0
den = 0.0
for i in range(n):
dx = X_train[i] - x_mean
dy = y_train[i] - y_mean
num += dx * dy
den += dx * dx

m = num / den if den != 0 else 0.0
b = y_mean - m * x_mean

y_pred = [m * x + b for x in X_test]
return m, b, y_pred


@@ -0,0 +1,14 @@
from typing import List, Tuple


def fit_and_predict(X_train: List[float], y_train: List[float], X_test: List[float]) -> Tuple[float, float, List[float]]:
"""
Implement simple linear regression (OLS) to compute slope m, intercept b,
and predictions on X_test.

Returns (m, b, y_pred).
"""
# Your code here
pass


28 changes: 28 additions & 0 deletions questions/186_linear_regression_ordinary_least_squares/tests.json
@@ -0,0 +1,28 @@
[
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([1,2,3],[2,2.5,3.5],[4]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "0.75 1.166667 [4.166667]"
},
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([0,1,2,3],[1,3,5,7],[4,5]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "2.0 1.0 [9.0, 11.0]"
},
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([0,1,2],[5,2,-1],[3]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "-3.0 5.0 [-4.0]"
},
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([2,2,2],[1,4,7],[10]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "0.0 4.0 [4.0]"
},
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([1,2,3,4],[1.1,1.9,3.05,3.9],[5]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "0.955 0.1 [4.875]"
},
{
"test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([3],[7],[10]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
"expected_output": "0.0 7.0 [7.0]"
}
]