Chapter1_Introduction/Chapter1_Introduction.ipynb
+19 −19 (19 additions & 19 deletions)
@@ -213,7 +213,7 @@
 "source": [
 "We can see that the biggest gains, if we observe the $X$ tests passed, are when the prior probability, $p$, is low. Let's settle on a specific value for the prior. I'm a (I think) strong programmer, so I'm going to give myself a realistic prior of 0.20; that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated and large the code is, but let's pin it at 0.20. Then my updated belief that my code is bug-free is 0.33. \n",
 "\n",
-"Let's not forget from the idea that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*. What does our prior probability distribution look like?\n",
+"Recall that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*.\n",
 "\n",
 "Similarly, our posterior is also a probability distribution, with $P(A | X)$ the probability there is no bug *given we saw all tests pass*; hence $1-P(A|X)$ is the probability there is a bug *given all tests passed*. What does our posterior probability distribution look like? Below is a graph of both the prior and the posterior distributions. \n"
 ]
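The 0.20 → 0.33 update above follows directly from Bayes' rule. A minimal sketch of the arithmetic, assuming (from the numbers in the text) that a bug-free program always passes the test while a buggy one passes with probability 0.5; the function name is made up for illustration:

```python
# Sketch of the Bayes-rule update described above. Assumption inferred from
# the arithmetic in the text: P(pass | no bugs) = 1, P(pass | bugs) = 0.5.
def posterior_bug_free(p, pass_given_bug=0.5):
    """P(no bugs | tests pass) = P(pass | no bugs) * P(no bugs) / P(pass)."""
    evidence = 1.0 * p + pass_given_bug * (1 - p)  # total P(tests pass)
    return 1.0 * p / evidence

print(round(posterior_bug_free(0.20), 2))  # 0.33, matching the text
```

Increasing the prior `p` (or observing more passing tests, which shrinks `pass_given_bug` toward 0) pushes the posterior toward 1.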
@@ -256,10 +256,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Notice that after we observed $X$ occur, the probability of no bugs present increased. By increasing the number of tests, we can approach confidence (probability 1) that there are not bugs.\n",
+"Notice that after we observed $X$ occur, the probability of bugs being absent increased. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.\n",
 "\n",
 "\n",
-"This was a very simple example, but the mathematics from here only becomes difficult except for artifically constructed instances. We will see that this math is actually unnecessary. First we must broaden our modeling tools. "
+"This was a very simple example of Bayesian inference and Bayes' rule. Unfortunately, the mathematics necessary to perform more complicated Bayesian inference only becomes more difficult, except for artificially constructed cases. We will later see that this type of mathematical analysis is actually unnecessary. First we must broaden our modeling tools. "
-"We will use this property often, so it's something useful to remember. Below we plot the probablity mass distribution for different $\\lambda$ values. The first thing to notice is that by increasing $\\lambda$ we add more probability to larger values occuring. The second notice is although the graph ends at 15, the distributions do not. They assign positive probability to every integer."
+"We will use this property often, so it's something useful to remember. Below we plot the probability mass function for different $\\lambda$ values. The first thing to notice is that increasing $\\lambda$ adds more probability to larger values occurring. Secondly, notice that although the graph ends at 15, the distributions do not: they assign positive probability to every non-negative integer."
 ]
},
{
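Both claims in the revised text can be checked numerically. A small sketch using `scipy.stats`; the $\lambda$ values here are arbitrary examples, not necessarily the ones plotted in the notebook:

```python
import numpy as np
from scipy import stats

k = np.arange(16)  # the plot described in the text stops at k = 15
for lam in (1.5, 4.25):  # example rates; the notebook's values may differ
    # Larger lambda shifts probability mass toward larger counts:
    print(f"lambda={lam}: P(Z > 10) = {1 - stats.poisson.cdf(10, lam):.6f}")

# ...and the support never ends: every non-negative integer receives
# strictly positive probability, even far beyond the plotted range.
print(stats.poisson.pmf(50, 4.25) > 0)  # True
```

The "property" mentioned in the text, $E[Z] = \lambda$ for $Z \sim \text{Poisson}(\lambda)$, also falls out directly: `stats.poisson.mean(lam)` returns `lam`.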
@@ -454,15 +454,15 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Before we begin, with resepect to the plt above, do you say think there is a change in behaviour? \n",
+"Before we begin: with respect to the plot above, do you think there is a change in behaviour?\n",
 "\n",
-"How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting a day $i$'s text-message count $C_i$, \n",
+"How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting day $i$'s text-message count by $C_i$, \n",
 "\n",
 "$$ C_i \\sim \\text{Poisson}(\\lambda) $$\n",
 "\n",
-"We are not sure about what the $\\lambda$ parameter is though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalently saying the parameter $\\lambda$ increases at some later date (recall a higher $\\lambda$ means more probability on larger outcomes).\n",
+"We are not sure what the $\\lambda$ parameter is, though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalent to saying that the parameter $\\lambda$ increases at some later date (recall that a higher $\\lambda$ means more probability on larger outcomes, that is, a higher probability of many texts).\n",
 "\n",
-"How can we mathematically represent this? We can think, that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters, one for before the day $\\tau$, and one for after. In literature, a sudden transition like this would be called a *switchpoint*:\n",
+"How can we mathematically represent this? We can suppose that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters: one for behaviour before $\\tau$, and one for behaviour after. In the literature, a sudden transition like this is called a *switchpoint*:\n",
 "\n",
 "$$\n",
 "\\lambda = \n",
@@ -473,20 +473,20 @@
 "$$\n",
 "\n",
 "\n",
-" If, in reality, no sudden change occurred, the $\\lambda$'s should look about equal. What would be a good prior distribution on $\\lambda_{1}$ and $\\lambda_2$?\n",
-"\n",
-"Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable has a density function for any positive number. This would be a good choice to model $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
+" If, in reality, no sudden change occurred and indeed $\\lambda_1 = \\lambda_2$, then the posterior distributions of the $\\lambda$s should look about equal. \n",
+"\n",
+"What would be good prior distributions for $\\lambda_1$ and $\\lambda_2$? Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable has a density function for any positive number. This would be a good choice to model $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
-"$\\alpha$ is called a *hyper-parameter*, literally a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
+"$\\alpha$ is called a *hyper-parameter*, or *parent* variable: literally, a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
-"Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda$; creating two exponential distributions with different $\\alpha$ values reflects our belief was that the rate changed (increased) after some period.\n",
+"Alternatively, and something I encourage the reader to try, is to have two priors, one for each $\\lambda_i$: creating two exponential distributions with different $\\alpha$ values reflects our belief that the rate changed (increased) after some period.\n",
 "\n",
 "What about $\\tau$? Well, due to the randomness, it is too difficult to pick out when $\\tau$ might have occurred. Instead, we can assign a *uniform prior belief* to every possible day. This is equivalent to saying\n",
 "\n",
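The pieces described in this hunk (the switchpoint $\tau$, the two rates, and the rule-of-thumb $\alpha$) can be sketched in plain NumPy. This is not the notebook's actual PyMC model; the count data and variable names below are invented placeholders for illustration:

```python
import numpy as np

def rate(t, tau, lambda_1, lambda_2):
    """lambda_1 before the switchpoint tau, lambda_2 on/after it."""
    return np.where(t < tau, lambda_1, lambda_2)

# Rule of thumb for the exponential hyper-parameter: alpha = 1 / mean(counts).
count_data = np.array([13, 24, 8, 24, 7, 35, 14, 11])  # fake counts, for illustration
alpha = 1.0 / count_data.mean()

# With tau = 4, days 0-3 use lambda_1 and days 4-7 use lambda_2:
print(rate(np.arange(8), tau=4, lambda_1=18.0, lambda_2=23.0))
print(f"alpha = {alpha:.4f}")
```

The uniform prior on $\tau$ then simply says each of the observed days is equally likely to be the switchpoint.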
@@ -631,7 +631,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The above code will be explained in the Chapter 2, but this is where our results come from. The machinery being employed is called *Monte Carlo Markov Chains*. It returns random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. Be can plot a histogram of the random variables to see what the posterior distribution looks like. Below, we collect the samples (called *traces* in MCMC literature). "
+"The above code will be explained in Chapter 3, but this is where our results come from. The machinery being employed is called *Markov Chain Monte Carlo* (MCMC), which I delay explaining until Chapter 3. It returns thousands of random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. We can plot a histogram of the random variables to see what the posterior distributions look like. Below, we collect the samples (called *traces* in the MCMC literature) in histograms. "
 ]
},
{
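What "collecting the traces in histograms" amounts to can be sketched as follows. The sample array below is a NumPy stand-in for what the sampler returns, so the snippet runs on its own; in the notebook, the traces come from the MCMC machinery:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder trace; an MCMC run would produce an array like this.
lambda_1_samples = rng.gamma(shape=18.0, scale=1.0, size=5000)

# The normalised histogram of the trace approximates the posterior density.
density, edges = np.histogram(lambda_1_samples, bins=30, density=True)
width = edges[1] - edges[0]
print(f"density integrates to {density.sum() * width:.3f}")  # 1.000
```

Plotting `density` against the bin edges (as the notebook does with `plt.hist`) gives the posterior pictures discussed in the Interpretation section.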
@@ -702,11 +702,11 @@
 "source": [
 "### Interpretation\n",
 "\n",
-"Recall that the Bayesian methodology returns a *distribution*, hence we now have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what a plausible value for the parameters might be. What other observations can you make? Look at the data again, do they seem reasonable? The distributions of the two $\\lambda$s look very different, indicating that it's likely there was a change in the user's text-message behavior.\n",
+"Recall that the Bayesian methodology returns *distributions*; hence we now have distributions describing the unknown $\\lambda$s and $\\tau$. What have we gained? Immediately, we can see the uncertainty in our estimates: the more variance in a distribution, the less certain our posterior belief should be. We can also say what plausible values for the parameters might be: $\\lambda_1$ is around 18 and $\\lambda_2$ is around 23. What other observations can you make? Look at the data again: do these seem reasonable? The distributions of the two $\\lambda$s are positioned very differently, indicating that it's likely there was a change in the user's text-message behaviour.\n",
 "\n",
-"Also notice that posteriors' distributions do not look like any Poisson distributions. They are really not anything we recognize. But this is OK. This is one of the benefits of taking a computational point-of-view. If we had instead done this mathematically, we would have been stuck with a very intractable (and messy) distribution. Via computations, we are agnostic to the tractability.\n",
+"Also notice that the posterior distributions do not look like any Poisson distributions, though we originally started modelling with Poisson random variables. They are really not anything we recognize, but this is OK: it is one of the benefits of taking a computational point of view. Had we instead done this analysis mathematically, we would have been stuck with a very analytically intractable (and messy) distribution. Via computation, we are agnostic to tractability.\n",
 "\n",
-"Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or the change been gradual, the posterior distribution of $\\tau$ would have been more spread out. On the contrary, it is very peaked. It appears that near day 50, the individual's text-message behavior suddenly changed. "
+"Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or had the change been gradual over time, the posterior distribution of $\\tau$ would have been more spread out, reflecting that many days would be plausible candidates for $\\tau$. Instead, it is very peaked: it appears that near day 45, the individual's text-message behaviour suddenly changed. "
 ]
},
{
@@ -718,7 +718,7 @@
 "\n",
 "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finish by using posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le 70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$; the question is then equivalent to asking *what is the expected value of $\\lambda$ at time $t$?*\n",
 "\n",
-"In the code below, we are calculating the following: Let $i$ index a particular sample from the posterior distributions. Given a day $t$, we average over all $\\lambda_i$ on that day $t$, using $\\lambda_{1,i}$ if $t \\lt \\tau_i$ else we use $\\lambda_{2,i}$. \n",
+"In the code below, we are calculating the following: let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\\lambda_i$ for that day $t$, using $\\lambda_i = \\lambda_{1,i}$ if $t \\lt \\tau_i$ (that is, if the behaviour change had not yet happened), else we use $\\lambda_i = \\lambda_{2,i}$. \n",
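That averaging step can be sketched directly. Again, the trace arrays below are placeholder draws so the snippet is self-contained; the notebook uses the sampler's actual output:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
lambda_1_samples = rng.gamma(18.0, 1.0, n)  # placeholder posterior draws
lambda_2_samples = rng.gamma(23.0, 1.0, n)  # placeholder posterior draws
tau_samples = rng.integers(44, 47, n)       # placeholder switchpoint draws

n_days = 70
expected_texts = np.empty(n_days)
for t in range(n_days):
    # Sample i contributes lambda_{1,i} if the switch has not happened by
    # day t, and lambda_{2,i} otherwise; average over all posterior samples.
    lam = np.where(t < tau_samples, lambda_1_samples, lambda_2_samples)
    expected_texts[t] = lam.mean()

print(expected_texts[0], expected_texts[-1])  # roughly 18 vs roughly 23
```

The resulting curve jumps from the $\lambda_1$ level to the $\lambda_2$ level around the days where the $\tau$ samples concentrate, which is exactly the "expected texts per day" plot the chapter builds.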