|
104 | 104 | "metadata": {}, |
105 | 105 | "source": [ |
106 | 106 | "# for class\n", |
107 | | - "do simple linearly separable case (hard margin) with y = 1/2x + 3 or something.\n", |
| 107 | + "Define hyperplane. Write equation = 0. Sides of hyperplanes determine classification. Not a probabilistic model but can use distance from hyperplane to be a proxy for certainty. Show that coefficients point orthogonal to hyperplane\n", |
| 108 | + "\n", |
| 109 | + "Dario - Maximum margin classifiers\n", |
| 110 | + "\n", |
| 111 | + "do simple linearly separable case (hard margin) with y = 1/2x + 1 with points (1,4) and (3, 0) as the support vectors\n", |
108 | 112 | "\n", |
109 | 113 | "Write data points (x1, x2), y where y is -1 or 1\n", |
110 | 114 | "\n", |
111 | | - "Make data points in a manner that one additional point of one class close to another class has tremendous influence on the line." |
| 115 | + "Make data points in a manner that one additional point of one class close to another class has tremendous influence on the line.\n", |
| 116 | + "\n", |
| 117 | + "Set up problem specification: Maximize margin subject to norm of weights = 1 and y(xb) >= M.\n", |
| 118 | + "\n", |
| 119 | + "When norm of weights =1 then y(xb) gives the distance from the point to the hyperplane. and xb = M give the equation to the support vector\n", |
| 120 | + "\n", |
| 121 | + "Support vector classifiers\n", |
| 122 | + "Non-separable case. allow for error. extremely sensitive to one data point. Soft margin classifier. Want robustness, generalization. \n", |
| 123 | + "\n", |
| 124 | + "Make specification: Maximize M, subject to norm of weights = 1 and y(xb) > M(1 - e) where sum(e) < C, errors are called slack variables. Hyperplane is still boundary for classification. \n", |
| 125 | + "\n", |
| 126 | + "Slack variables: if e = 0, on correct side of margin. if e between 0 and 1 then between margin and hyperplane. If e > 1 then misclassified.\n", |
| 127 | + "\n", |
| 128 | + "C: Budget, \"the bank\". If C = 0 then need linear separability. Chosen via cv. \n", |
| 129 | + "\n", |
| 130 | + "Only observations that lie on the margin or violate are the support vectors and the only observation that affect the model\n", |
| 131 | + "\n", |
| 132 | + "Gerardo - Support vector machines\n", |
| 133 | + "Needed for non-linear decision boundaries. Can enlarge feature space by using polynomial, interaction terms and a linear classifier can again be used. Kernel approach is very efficient computationally. The linear support vector classifier is just sum of inner product of X and each observation times a constant, but only non-zero constants are the support vectors.\n", |
| 134 | + "\n", |
| 135 | + "Instead of just the inner product, a kernel function can be used. The linear kernel is just the inner product. Kernels measure similarity." |
112 | 136 | ] |
| 137 | + }, |
| 138 | + { |
| 139 | + "cell_type": "code", |
| 140 | + "execution_count": null, |
| 141 | + "metadata": { |
| 142 | + "collapsed": true |
| 143 | + }, |
| 144 | + "outputs": [], |
| 145 | + "source": [] |
113 | 146 | } |
114 | 147 | ], |
115 | 148 | "metadata": { |
116 | 149 | "anaconda-cloud": {}, |
117 | 150 | "kernelspec": { |
118 | | - "display_name": "Python [Root]", |
| 151 | + "display_name": "Python 3", |
119 | 152 | "language": "python", |
120 | | - "name": "Python [Root]" |
| 153 | + "name": "python3" |
121 | 154 | }, |
122 | 155 | "language_info": { |
123 | 156 | "codemirror_mode": { |
|
129 | 162 | "name": "python", |
130 | 163 | "nbconvert_exporter": "python", |
131 | 164 | "pygments_lexer": "ipython3", |
132 | | - "version": "3.5.2" |
| 165 | + "version": "3.5.1" |
133 | 166 | } |
134 | 167 | }, |
135 | 168 | "nbformat": 4, |
|