Add new Deep-ML problem 188
preethamak committed Oct 14, 2025
commit 091048a90d53be604a2c3903c5829b6909d9c95c
46 changes: 46 additions & 0 deletions build/186.json
@@ -0,0 +1,46 @@
{
"id": "186",
"title": "Gaussian Process for Regression",
"difficulty": "medium",
"category": "Machine Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/Coder1010ayush",
"name": "Ayush"
}
],
"description": "## Problem\n\nProblem Statement: Task is to implement GaussianProcessRegression class which is a guassian process model for prediction regression problems.",
"learn_section": "# **Gaussian Processes (GP): From-Scratch Regression Example**\n\n## **1. What’s a Gaussian Process?**\n\nA **Gaussian Process** defines a distribution over functions $f(\\cdot)$.\nFor any finite set of inputs $( X = {x_i}_{i=1}^n )$, the function values $f(X)$ follow a multivariate normal:\n\n$$\nf(X) \\sim \\mathcal{N}\\big(0,; K(X,X)\\big)\n$$\n\nwhere ( K ) is a **kernel** (covariance) function encoding similarity between inputs.\nWith noisy targets $( y = f(X) + \\varepsilon, \\varepsilon \\sim \\mathcal{N}(0,\\sigma_n^2 I) )$,\nGP regression yields a closed-form posterior predictive mean and variance at new points $( X_* )$.\n\n---\n\n## **2. The Implementation at a Glance**\n\nThe provided code builds a minimal yet complete GP regression stack:\n\n* **Kernels implemented**\n\n * Radial Basis Function (RBF / Squared Exponential)\n * Matérn $(( \\nu = 0.5, 1.5, 2.5 ), or general ( \\nu ))$\n * Periodic\n * Linear\n * Rational Quadratic\n\n* **Core GP classes**\n\n * `_GaussianProcessBase`: kernel selection & covariance matrix computation\n * `GaussianProcessRegression`:\n\n * `fit`: $builds ( K )$, does **Cholesky decomposition**, $solves ( \\alpha )$\n * `predict`: returns posterior mean & variance\n * `log_marginal_likelihood`: computes GP evidence\n * `optimize_hyperparameters`: basic optimizer (for RBF hyperparams)\n\n---\n\n## **3. Kernel Cheat-Sheet**\n\nLet $( x, x' \\in \\mathbb{R}^d ), ( r = \\lVert x - x' \\rVert )$.\n\n* **RBF (SE):**\n $$\n k_{\\text{RBF}}(x,x') = \\sigma^2 \\exp!\\left(-\\tfrac{1}{2}\\tfrac{r^2}{\\ell^2}\\right)\n $$\n\n* **Matérn (( \\nu = 1.5 )):**\n $$\n k(x,x') = \\Big(1 + \\tfrac{\\sqrt{3},r}{\\ell}\\Big)\\exp!\\Big(-\\tfrac{\\sqrt{3},r}{\\ell}\\Big)\n $$\n\n* **Periodic:**\n $$\n k(x,x') = \\sigma^2 \\exp!\\left(-\\tfrac{2}{\\ell^2}\\sin^2!\\Big(\\tfrac{\\pi r}{p}\\Big)\\right)\n $$\n\n* **Linear:**\n $$\n k(x,x') = \\sigma_b^2 + \\sigma_v^2,x^\\top x'\n $$\n\n* **Rational Quadratic:**\n $$\n k(x,x') = \\sigma^2\\Big(1 + \\tfrac{r^2}{2\\alpha \\ell^2}\\Big)^{-\\alpha}\n $$\n\n---\n\n## **4. GP Regression Mechanics**\n\n### **Training**\n\n1. Build covariance:\n $$\n K = K(X,X) + \\sigma_n^2 I\n $$\n\n2. Cholesky factorization:\n $$\n K = L L^\\top\n $$\n\n3. Solve ( \\alpha ):\n $$\n L L^\\top \\alpha = y\n $$\n\n### **Prediction**\n\nAt new inputs ( X_* ):\n\n* $( K_* = K(X, X_*) ), ( K_{**} = K(X_*, X_*) )$\n\n* **Mean:**\n $$\n \\mu_* = K_*^\\top \\alpha\n $$\n\n* **Covariance:**\n $$\n \\Sigma_* = K_{**} - V^\\top V, \\quad V = L^{-1} K_*\n $$\n\n### **Model Selection**\n\n* **Log Marginal Likelihood (LML):**\n $$\n \\log p(y \\mid X) = -\\tfrac{1}{2} y^\\top \\alpha - \\sum\\nolimits_i \\log L_{ii} - \\tfrac{n}{2}\\log(2\\pi)\n $$\n\n---\n\n## **5. Worked Example (Linear Kernel)**\n\n```python\nimport numpy as np\ngp = GaussianProcessRegression(kernel='linear',\n kernel_params={'sigma_b': 0.0, 'sigma_v': 1.0},\n noise=1e-8)\n\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9]) # y = 2x + 1\ngp.fit(X_train, y_train)\n\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\") # -> 7.0000\n```\n\n---\n\n## **6. When to Use GP Regression**\n\n* **Small-to-medium datasets** where uncertainty estimates are valuable\n* Cases requiring **predictive intervals** (not just point predictions)\n* **Nonparametric modeling** with kernel priors\n* Automatic hyperparameter tuning via **marginal likelihood**\n\n---\n\n## **7. 
Practical Tips**\n\n* Always add **jitter** $10^{-6}$ to the diagonal for numerical stability\n* **Standardize inputs/outputs** before training\n* Be aware: Exact GP has complexity **$\\mathcal{O}(n^3)$** in time and **$\\mathcal{O}(n^2)$** in memory\n* Choose kernels to match problem structure:\n\n * **RBF:** smooth functions\n * **Matérn:** rougher functions\n * **Periodic:** seasonal/cyclical data\n * **Linear:** global linear trends",
"starter_code": "import math # ---------------------------------------- utf-8 encoding ---------------------------------\n\n# This file contains Gaussian Process implementation.\nimport numpy as np\nimport math\n\n\ndef matern_kernel(x: np.ndarray, x_prime: np.ndarray, length_scale=1.0, nu=1.5):\n pass\n\n\ndef rbf_kernel(x: np.ndarray, x_prime, sigma=1.0, length_scale=1.0):\n pass\n\n\ndef periodic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, period=1.0\n):\n pass\n\n\ndef linear_kernel(x: np.ndarray, x_prime: np.ndarray, sigma_b=1.0, sigma_v=1.0):\n pass\n\n\ndef rational_quadratic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, alpha=1.0\n):\n pass\n\n\n# --- BASE CLASS -------------------------------------------------------------\n\n\nclass _GaussianProcessBase:\n def __init__(self, kernel=\"rbf\", noise=1e-5, kernel_params=None):\n pass\n\n def _select_kernel(self, x1, x2):\n \"\"\"Selects and computes the kernel value for two single data points.\"\"\"\n pass\n\n def _compute_covariance(self, X1, X2):\n \"\"\"\n Computes the covariance matrix between two sets of points.\n This method fixes the vectorization bug from the original code.\n \"\"\"\n pass\n\n\n# --- REGRESSION MODEL -------------------------------------------------------\nclass GaussianProcessRegression(_GaussianProcessBase):\n def fit(self, X, y):\n pass\n\n def predict(self, X_test, return_std=False):\n pass\n\n def log_marginal_likelihood(self):\n pass\n\n def optimize_hyperparameters(self):\n pass",
"solution": "# ---------------------------------------- utf-8 encoding ---------------------------------\n# This file contains Gaussian Process implementation.\nimport numpy as np\nimport math\nfrom scipy.spatial.distance import euclidean\nfrom scipy.special import kv as bessel_kv\nfrom scipy.special import gamma\nfrom scipy.linalg import cholesky, solve_triangular\nfrom scipy.optimize import minimize\nfrom scipy.special import expit, softmax\n\n\n# --- KERNEL FUNCTIONS --------------------------------------------------------\ndef matern_kernel(x: np.ndarray, x_prime: np.ndarray, length_scale=1.0, nu=1.5):\n d = euclidean(x, x_prime)\n if d == 0:\n return 1.0 # Covariance with self is 1 before scaling\n if nu == 0.5:\n return np.exp(-d / length_scale)\n elif nu == 1.5:\n return (1 + np.sqrt(3) * d / length_scale) * np.exp(\n -np.sqrt(3) * d / length_scale\n )\n elif nu == 2.5:\n return (\n 1 + np.sqrt(5) * d / length_scale + 5 * d**2 / (3 * length_scale**2)\n ) * np.exp(-np.sqrt(5) * d / length_scale)\n else:\n factor = (2 ** (1 - nu)) / gamma(nu)\n scaled_d = np.sqrt(2 * nu) * d / length_scale\n return factor * (scaled_d**nu) * bessel_kv(nu, scaled_d)\n\n\ndef rbf_kernel(x: np.ndarray, x_prime, sigma=1.0, length_scale=1.0):\n # This is a squared exponential kernel\n\n # Calculate the squared euclidean distance\n sq_norm = np.linalg.norm(x - x_prime) ** 2\n\n # Correctly implement the formula\n return sigma**2 * np.exp(-sq_norm / (2 * length_scale**2))\n\n\ndef periodic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, period=1.0\n):\n return sigma**2 * np.exp(\n -2 * np.sin(np.pi * np.linalg.norm(x - x_prime) / period) ** 2 / length_scale**2\n )\n\n\ndef linear_kernel(x: np.ndarray, x_prime: np.ndarray, sigma_b=1.0, sigma_v=1.0):\n return sigma_b**2 + sigma_v**2 * np.dot(x, x_prime)\n\n\ndef rational_quadratic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, alpha=1.0\n):\n return sigma**2 * (\n 1 + np.linalg.norm(x - x_prime) ** 2 / (2 * alpha * length_scale**2)\n ) ** (-alpha)\n\n\n# --- BASE CLASS -------------------------------------------------------------\n\n\nclass _GaussianProcessBase:\n def __init__(self, kernel=\"rbf\", noise=1e-5, kernel_params=None):\n self.kernel_name = kernel\n self.noise = noise\n self.kernel_params = kernel_params if kernel_params else {}\n self.X_train = None\n self.y_train = None\n self.K = None\n\n def _select_kernel(self, x1, x2):\n \"\"\"Selects and computes the kernel value for two single data points.\"\"\"\n if self.kernel_name == \"rbf\":\n return rbf_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"matern\":\n return matern_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"periodic\":\n return periodic_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"linear\":\n return linear_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"rational_quadratic\":\n return rational_quadratic_kernel(x1, x2, **self.kernel_params)\n else:\n raise ValueError(\n \"Unsupported kernel. 
Choose from ['rbf', 'matern', 'periodic', 'linear', 'rational_quadratic'].\"\n )\n\n def _compute_covariance(self, X1, X2):\n \"\"\"\n Computes the covariance matrix between two sets of points.\n This method fixes the vectorization bug from the original code.\n \"\"\"\n # Ensuring X1 and X2 are 2D arrays\n X1 = np.atleast_2d(X1)\n X2 = np.atleast_2d(X2)\n\n n1, _ = X1.shape\n n2, _ = X2.shape\n K = np.zeros((n1, n2))\n for i in range(n1):\n for j in range(n2):\n K[i, j] = self._select_kernel(X1[i], X2[j])\n return K\n\n\n# --- REGRESSION MODEL -------------------------------------------------------\nclass GaussianProcessRegression(_GaussianProcessBase):\n def fit(self, X, y):\n self.X_train = np.asarray(X)\n self.y_train = np.asarray(y)\n self.K = self._compute_covariance(\n self.X_train, self.X_train\n ) + self.noise * np.eye(len(self.X_train))\n\n # Compute Cholesky decomposition for stable inversion\n self.L = cholesky(self.K, lower=True)\n # alpha = K_inv * y\n self.alpha = solve_triangular(\n self.L.T, solve_triangular(self.L, self.y_train, lower=True)\n )\n\n def predict(self, X_test, return_std=False):\n X_test = np.atleast_2d(X_test)\n K_s = self._compute_covariance(self.X_train, X_test)\n K_ss = self._compute_covariance(X_test, X_test)\n\n # Compute predictive mean\n mu = K_s.T @ self.alpha\n\n # Compute predictive variance\n v = solve_triangular(self.L, K_s, lower=True)\n cov = K_ss - v.T @ v\n\n if return_std:\n return mu, np.sqrt(np.diag(cov))\n return mu\n\n def log_marginal_likelihood(self):\n return (\n -0.5 * (self.y_train.T @ self.alpha)\n - np.sum(np.log(np.diag(self.L)))\n - len(self.X_train) / 2 * np.log(2 * np.pi)\n )\n\n def optimize_hyperparameters(self):\n # NOTE: This is a simplified optimizer for 'rbf' kernel's params.\n def objective(params):\n self.kernel_params = {\n \"length_scale\": np.exp(params[0]),\n \"sigma\": np.exp(params[1]),\n }\n self.fit(self.X_train, self.y_train)\n return -self.log_marginal_likelihood()\n\n init_params = np.log(\n [\n self.kernel_params.get(\"length_scale\", 1.0),\n self.kernel_params.get(\"sigma\", 1.0),\n ]\n )\n res = minimize(\n objective, init_params, method=\"L-BFGS-B\", bounds=[(-5, 5), (-5, 5)]\n )\n\n self.kernel_params = {\n \"length_scale\": np.exp(res.x[0]),\n \"sigma\": np.exp(res.x[1]),\n }\n # Re-fit with optimal hyperparameters\n self.fit(self.X_train, self.y_train)\n\n\nif __name__ == \"__main__\":\n gp = GaussianProcessRegression(\n kernel=\"linear\", kernel_params={\"sigma_b\": 0.0, \"sigma_v\": 1.0}, noise=1e-8\n )\n X_train = np.array([[1], [2], [4]])\n y_train = np.array([3, 5, 9])\n gp.fit(X_train, y_train)\n X_test = np.array([[3.0]])\n mu = gp.predict(X_test)",
"example": {
"input": "import numpy as np\ngp = GaussianProcessRegression(kernel='linear', kernel_params={'sigma_b': 0.0, 'sigma_v': 1.0}, noise=1e-8)\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9])\ngp.fit(X_train, y_train)\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"output": "7.0000",
"reasoning": "A Gaussian Process with a linear kernel is trained on perfectly linear data that follows the function y = 2x + 1. When asked to predict the value at x=3, the model perfectly interpolates the linear function it has learned, resulting in a prediction of 2*3 + 1 = 7. The near-zero noise ensures the prediction is exact."
},
"test_cases": [
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0], [7.5], [10.0]])\ny_train = np.sin(X_train).ravel()\ngp.fit(X_train, y_train)\nX_test = np.array([[1.25]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "0.2814"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0], [7.5], [10.0]])\ny_train = np.sin(X_train).ravel()\ngp.fit(X_train, y_train)\nX_test = np.array([[1.25]])\nmu, std = gp.predict(X_test, return_std=True)\nprint(f\"mu={mu[0]:.4f}, std={std[0]:.4f}\")",
"expected_output": "mu=0.2814, std=0.7734"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0]])\ny_train = np.array([1.0, 3.0, 1.5])\ngp.fit(X_train, y_train)\nX_test = np.array([[2.5]])\nmu, std = gp.predict(X_test, return_std=True)\nprint(f\"mu={mu[0]:.4f}, std={std[0]:.4f}\")",
"expected_output": "mu=3.0000, std=0.0001"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='linear', kernel_params={'sigma_b': 0.1, 'sigma_v': 1.0}, noise=1e-8)\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9])\ngp.fit(X_train, y_train)\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "7.0000"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.5}, noise=1e-8)\nX_train = np.array([[1, 2], [3, 4], [5, 1]])\ny_train = np.sum(X_train, axis=1)\ngp.fit(X_train, y_train)\nX_test = np.array([[2, 3]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "5.5553"
}
]
}
78 changes: 78 additions & 0 deletions build/188.json
@@ -0,0 +1,78 @@
{
"id": "188",
"title": "Log-Softmax & Cross-Entropy",
"difficulty": "medium",
"category": "Deep Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://www.linkedin.com/in/preetham-a-k-18b97931b/",
"name": "Preetham AK"
}
],
"pytorch_difficulty": "medium",
"description": "# Log-Softmax and Cross-Entropy Loss Implementation\n\nImplement a numerically stable log-softmax function and use it to compute the cross-entropy loss from scratch in PyTorch.\n\nYou are **not allowed** to use `torch.nn.functional.log_softmax` or `torch.nn.functional.cross_entropy`. You may only use basic PyTorch tensor operations.\n\nYour function should support:\n- 1D or 2D input tensors\n- Batch computation\n- Numerical stability using the log-sum-exp trick",
"learn_section": "# Learning Goals\n\n1. Understand how **softmax** converts logits to probabilities.\n2. Learn why **log-softmax** is preferred for numerical stability.\n3. Implement **cross-entropy loss** manually.\n4. Apply batch computations in PyTorch.\n5. Understand the connection between logits, probabilities, and loss functions.",
"starter_code": "import torch\nfrom typing import List\n\ndef log_softmax_stable(x: torch.Tensor) -> torch.Tensor:\n \"\"\"\n Compute log-softmax of x in a numerically stable way.\n Args:\n x: torch.Tensor of shape (N,) or (N, C)\n Returns:\n log-softmax tensor of same shape\n \"\"\"\n # Your code here\n pass\n\ndef cross_entropy_loss(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:\n \"\"\"\n Compute cross-entropy loss given true labels and predicted logits.\n Args:\n y_true: torch.Tensor of shape (N,)\n y_pred: torch.Tensor of shape (N, C)\n Returns:\n scalar mean loss\n \"\"\"\n # Your code here\n pass",
"solution": "import torch\n\ndef log_softmax_stable(x: torch.Tensor) -> torch.Tensor:\n x_max = x.max(dim=-1, keepdim=True).values\n log_probs = x - x_max - torch.log(torch.exp(x - x_max).sum(dim=-1, keepdim=True))\n return log_probs\n\ndef cross_entropy_loss(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:\n log_probs = log_softmax_stable(y_pred)\n loss = -log_probs[range(len(y_true)), y_true].mean()\n return loss",
"example": {
"input": {
"y_true": [
0,
1
],
"y_pred": [
[
2.0,
1.0,
0.1
],
[
0.5,
2.5,
0.3
]
]
},
"output": {
"log_softmax": [
[
-0.5514,
-1.5514,
-2.4514
],
[
-2.2808,
-0.2808,
-2.4808
]
],
"cross_entropy_loss": 0.4161
},
"reasoning": "Compute log-softmax using the numerical stability trick (subtract max of each row before exponentiating), then compute the mean negative log probability of the correct classes to get the cross-entropy loss."
},
"test_cases": [
{
"test": "import torch\nfrom starter_code import log_softmax_stable, cross_entropy_loss\n\ny_true = torch.tensor([0,1])\ny_pred = torch.tensor([[2.0,1.0,0.1],[0.5,2.5,0.3]])\n\n# Test log-softmax\nexpected_log_softmax = torch.tensor([[-0.5514,-1.5514,-2.4514],[-2.2808,-0.2808,-2.4808]])\nassert torch.allclose(log_softmax_stable(y_pred), expected_log_softmax, atol=1e-3)\n\n# Test cross-entropy loss\nexpected_loss = 0.4161\nassert abs(cross_entropy_loss(y_true, y_pred).item() - expected_loss) < 1e-3",
"expected_output": "pass"
}
],
"tinygrad_starter_code": "def your_function(...):\n pass",
"tinygrad_solution": "def your_function(...):\n ...",
"tinygrad_test_cases": [
{
"test": "print(your_function(...))",
"expected_output": "..."
}
],
"pytorch_starter_code": "def your_function(...):\n pass",
"pytorch_solution": "def your_function(...):\n ...",
"pytorch_test_cases": [
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]
}
10 changes: 10 additions & 0 deletions questions/188_logsoftmax_crossentropy/description.md
@@ -0,0 +1,10 @@
# Log-Softmax and Cross-Entropy Loss Implementation

Implement a numerically stable log-softmax function and use it to compute the cross-entropy loss from scratch in PyTorch.

You are **not allowed** to use `torch.nn.functional.log_softmax` or `torch.nn.functional.cross_entropy`. You may only use basic PyTorch tensor operations.

Your function should support:
- 1D or 2D input tensors
- Batch computation
- Numerical stability using the log-sum-exp trick
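
For intuition on the last point, here is a minimal sketch of the log-sum-exp trick on hypothetical logits (illustration only, not the required `log_softmax_stable` implementation): subtracting the row maximum before exponentiating keeps every `exp` argument at or below zero, so the computation stays finite even when the raw logits would overflow.

```python
import torch

# Hypothetical logits large enough that exp() overflows float32
x = torch.tensor([1000.0, 1001.0])

# Naive log-softmax: exp(1000) -> inf, so the result is nan
naive = torch.log(torch.exp(x) / torch.exp(x).sum())

# Shifted by the max first: exp() only sees values <= 0
shifted = x - x.max()
stable = shifted - torch.log(torch.exp(shifted).sum())

print(naive)   # tensor([nan, nan])
print(stable)  # tensor([-1.3133, -0.3133])
```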