
Commit 2a5cf6b

Merge pull request CamDavidsonPilon#6 from pmagwene/master
Minor editorial tweaks to Chap 1
2 parents: 6a82a28 + a8ab303; commit 2a5cf6b

File tree

2 files changed: +15 −15 lines changed


Chapter1_Introduction/Chapter1_Introduction.ipynb

Lines changed: 9 additions & 9 deletions
@@ -101,7 +101,7 @@
   "\n",
   "Bayesian inference differs from more traditional statistical analysis by preserving *uncertainty* about our beliefs. At first, this sounds like a bad statistical technique. Isn't statistics all about deriving *certainty* from randomness? The Bayesian method interprets probability as measure of *believability in an event*, that is, how confident one is in an event occuring. In fact, we will see in a moment that this is the natural interpretation of probability. \n",
   "\n",
-  "For this to be clearer, we consider an alternative interpretation of probability: *Frequentist* methods assume that probability is the long-run frequency of events (hence the bestowed title). For example, the *probability of plane accidents* under a frequentist philosophy is interpreted as the *long-term frequency of plane accidents*. This makes logical sense for many probabilities and events, but becomes more difficult to understand when events have no long-term frequency of occurances. Consider: we often assign probabilities to outcomes of presidential elections, but the election itself only happens once! Frequentists get around this by invoking alternative realities and saying across all these universes, the frequency of occurances is the probability. \n",
+  "For this to be clearer, we consider an alternative interpretation of probability: *Frequentist* methods assume that probability is the long-run frequency of events (hence the bestowed title). For example, the *probability of plane accidents* under a frequentist philosophy is interpreted as the *long-term frequency of plane accidents*. This makes logical sense for many probabilities and events, but becomes more difficult to understand when events have no long-term frequency of occurrences. Consider: we often assign probabilities to outcomes of presidential elections, but the election itself only happens once! Frequentists get around this by invoking alternative realities and saying across all these universes, the frequency of occurrences is the probability. \n",
   "\n",
   "Bayesians, on the other hand, have a more intuitive approach. Bayesians interpret a probability as measure of *belief*, or confidence, of an event occurring. An individual who assigns a belief of 0 to an event has no confidence that the event will occur; conversely, assigning a belief of 1 implies that the individual is absolutely certain of an event occurring. Beliefs between 0 and 1 allow for weightings of other outcomes. This definition agrees with the probability of a plane accident example, for having observed the frequency of plane accidents, an individual's belief should be equal to that frequency. Similarly, under this definition of probability being equal to beliefs, it is clear how we can speak about probabilities (beliefs) of presidential election outcomes. \n",
   "\n",
@@ -111,15 +111,15 @@
   "\n",
   "- Your code either has a bug in it or not, but we do not know for certain which is true. Though we have a belief about the presence or absence of a bug. \n",
   "\n",
-  "- A medical patient is exhibiting symptoms $x$, $y$ and $z$. There are a number of diseases that could be causing all of them, but only has a single disease is present. A doctor has beliefs about which disease.\n",
+  "- A medical patient is exhibiting symptoms $x$, $y$ and $z$. There are a number of diseases that could be causing all of them, but only a single disease is present. A doctor has beliefs about which disease.\n",
   "\n",
-  "- You believe that the pretty girl in your English class doesn't have a crush you. You assign a low probability that she does. Her, on the other hand, knows for certain that she *does indeed* like you. She (implicitly) assigns a probability 1. \n",
+  "- You believe that the pretty girl in your English class doesn't have a crush on you. You assign a low probability that she does. She, on the other hand, knows for certain that she *does indeed* like you. She (implicitly) assigns a probability 1. \n",
   "\n",
   "This philosophy of treating beliefs as probability is natural to humans. We employ it constantly as we interact with the world and only see partial evidence. Alternatively, you have to be *trained* to think like a frequentist. \n",
   "\n",
   "To align ourselves with traditional probability notation, we denote our belief about event $A$ as $P(A)$.\n",
   "\n",
-  "John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even -especially- if the evidence is counter to what was initialed believed, it cannot be ignored. We denote our updated belief as $P(A |X )$, interpreted as the probability of $A$ given the evidence $X$. We call it the *posterior probability* so as to contrast the pre-evidence *prior probability*. Consider the posterior probabilities (read: posterior belief) of the above examples, after observing evidence $X$.:\n",
+  "John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even -especially- if the evidence is counter to what was initially believed, it cannot be ignored. We denote our updated belief as $P(A |X )$, interpreted as the probability of $A$ given the evidence $X$. We call it the *posterior probability* so as to contrast the pre-evidence *prior probability*. Consider the posterior probabilities (read: posterior belief) of the above examples, after observing evidence $X$:\n",
   "\n",
   "1\\. $P(A): \\;\\;$ This big, complex code likely has a bug in it. $P(A | X): \\;\\;$ The code passed all $X$ tests; there still might be a bug, but its presence is less likely now.\n",
   "\n",
@@ -151,7 +151,7 @@
   "This is very different from the answer the frequentist function returned. Notice that the Bayesian function accepted an additional argument: *\"Often my code has bugs\"*. This parameter, the *prior*, is that intuition in your head that says \"wait- something looks different with this situation\", or conversely \"yes, this is what I expected\". In our example, the programmer often sees debugging tests fail, but this time we didn't, which signals an alert in our head. By including the prior parameter, we are telling the Bayesian function to include our personal intuition. Technically this parameter in the Bayesian function is optional, but we will see excluding it has its own consequences. \n",
   "\n",
   "\n",
-  "As we aquire more and more instances of evidence, our prior belief is *washed out* by the new evidence. This is to be expected. For example, if your prior belief is something ridiculous, like \"I expect the sun to explode today\", and each day you are proved wrong, you would hope that any inference would correct you, or at least align your beliefs. \n",
+  "As we acquire more and more instances of evidence, our prior belief is *washed out* by the new evidence. This is to be expected. For example, if your prior belief is something ridiculous, like \"I expect the sun to explode today\", and each day you are proved wrong, you would hope that any inference would correct you, or at least align your beliefs. \n",
   "\n",
   "\n",
   "Denote $N$ as the number of instances of evidence we possess. As we gather an *infinite* amount of evidence, say as $N \\rightarrow \\infty$, our Bayesian results align with frequentist results. Hence for large $N$, statistical inference is more or less objective. On the other hand, for small $N$, inference is much more *unstable*: frequentist estimates have more variance and larger confidence intervals. This is where Bayesian analysis excels. By introducing a prior, and returning a distribution (instead of an scalar estimate), we *preserve the uncertainity* to reflect the instability of stasticial inference of a small $N$ dataset. \n",
@@ -261,7 +261,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "We can see the biggest gains if we observe the $X$ tests passed are when the prior probability, $p$, is low. Let's settle on a specific value for the prior. I'm a (I think) strong programmer, so I'm going to give myself a realistic prior of 0.20, that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated is code is and large the code is, but let's pin it at 0.20. Then my updated belief that my code is bug-free is 0.33. \n",
+  "We can see the biggest gains if we observe the $X$ tests passed are when the prior probability, $p$, is low. Let's settle on a specific value for the prior. I'm a (I think) strong programmer, so I'm going to give myself a realistic prior of 0.20, that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated and large the code is, but let's pin it at 0.20. Then my updated belief that my code is bug-free is 0.33. \n",
   "\n",
   "Let's not forget from the idea that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*. What does our prior probability distribution look like?\n",
   "\n",
@@ -465,7 +465,7 @@
   "_____\n",
   "Let's try to model a more interesting example, concerning text-message rates:\n",
   "\n",
-  "> You are given a series of text-message counts from a user of your system. The data, plotted over time, appears in the graph below. You are curious if the user's text-messaging habits changed over time, either gradually or suddenly. How can you model this? (This is infact my own text-message data. Judge my popularity as you wish.)\n"
+  "> You are given a series of text-message counts from a user of your system. The data, plotted over time, appears in the graph below. You are curious if the user's text-messaging habits changed over time, either gradually or suddenly. How can you model this? (This is in fact my own text-message data. Judge my popularity as you wish.)\n"
   ]
   },
   {
@@ -743,7 +743,7 @@
   "source": [
   "### Interpretation\n",
   "\n",
-  "Recall that the Bayesian methodology returns a *distribution*, hence why we have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what a plausible value for the parameters might be. What other observations can you make? Look at the data again, do they seem reasonable? The distributions of the two $\\lambda$s look very different, suggesting likely there was a change in the user's text-message behavior.\n",
+  "Recall that the Bayesian methodology returns a *distribution*, hence we now have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what a plausible value for the parameters might be. What other observations can you make? Look at the data again, do they seem reasonable? The distributions of the two $\\lambda$s look very different, indicating that it's likely there was a change in the user's text-message behavior.\n",
   "\n",
   "Also notice that posteriors' distributions do not look like any Poisson distributions. They are really not anything we recognize. But this is OK. This is one of the benefits of taking a computational point-of-view. If we had instead done this mathematically, we would have been stuck with a very intractable (and messy) distribution. Via computations, we are agnostic to the tractability.\n",
   "\n",
@@ -757,7 +757,7 @@
   "Why would I want samples from the posterior, anyways?\n",
   "-------\n",
   "\n",
-  "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finishing with using posterior samples to answer the follow question: what is the expected number of texts at day $t, \\; 0 \\le t \\le70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$, then the question is equivalent to *what is the expected value of $\\lambda$ at time $t$*?\n",
+  "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finish by using posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$, so the question is equivalent to *what is the expected value of $\\lambda$ at time $t$*?\n",
   "\n",
   "In the code below, we are calculating the following: Let $i$ index a particular sample from the posterior distributions. Given a day $t$, we average over all $\\lambda_i$ on that day $t$, using $\\lambda_{1,i}$ if $t \\lt \\tau_i$ else we use $\\lambda_{2,i}$. \n",
   "\n",

Chapter2_MorePyMC/MorePyMC.ipynb

Lines changed: 6 additions & 6 deletions
@@ -89,13 +89,13 @@
   "\n",
   "### Parent and Child relationships\n",
   "\n",
-  "To assist with terminology, and to be consistent with PyMC's documentation, we introduce *parent and children* variables. \n",
+  "To assist with terminology, and to be consistent with PyMC's documentation, we introduce *parent and child* variables. \n",
   "\n",
   "* *parent variables* are variables that influence another variable. \n",
   "\n",
-  "* *children variable* are variables that are affected by other variables, i.e. are the subject of parent variables. \n",
+  "* *child variables* are variables that are affected by other variables, i.e. are the subject of parent variables. \n",
   "\n",
-  "Variables can be both parent and children variables. For example, consider the PyMC code below"
+  "Variables can be both parents and children. For example, consider the PyMC code below"
   ]
   },
   {
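
To make the parent/child terminology concrete, a small sketch in PyMC's style (my own example; the notebook's actual code cell follows this markdown cell and may use different names):

    import pymc as pm

    # lambda_ has no parents; it is a parent of data_generator.
    lambda_ = pm.Exponential("poisson_param", 1)

    # data_generator is a child of lambda_: its distribution depends on lambda_'s value.
    data_generator = pm.Poisson("data_generator", lambda_)

    # data_plus_one makes data_generator both a child (of lambda_) and a parent.
    data_plus_one = data_generator + 1
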
@@ -178,7 +178,7 @@
   "source": [
   "### PyMC Variables\n",
   "\n",
-  "All PyMC variables also expose a `value` attribute. This method produces the *current* (possible random) value of the variable, given the variable's parents. To use the same variables from before:"
+  "All PyMC variables also expose a `value` attribute. This method produces the *current* (possibly random) value of the variable, given the variable's parents. To use the same variables from before:"
   ]
   },
   {
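
Continuing the sketch above (again my own illustration, assuming the `lambda_` and `data_generator` variables defined there): `value` returns the variable's current value given its parents, and `random()` draws a fresh one.

    print("lambda_.value = %.3f" % lambda_.value)
    print("data_generator.value = %d" % data_generator.value)

    data_generator.random()            # resample; .value is updated in place
    print("after random(): %d" % data_generator.value)
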
@@ -240,7 +240,7 @@
   "\n",
   "Rather than creating a Python array of stochastic variables, addressing the `size` keyword in the call to a `Stochastic` variable creates multivariate array of (independent) stochastic variables. The array behaves like a Numpy array when used like one, and references to its `value` attribute return Numpy arrays. \n",
   "\n",
-  "The also solves the annoying case where you may have many variables $\\beta_i, \\; i = 1,...,N$ you wish to model. Instead of creating arbitrary names and variables for each one, like:\n",
+  "The `size` argument also solves the annoying case where you may have many variables $\\beta_i, \\; i = 1,...,N$ you wish to model. Instead of creating arbitrary names and variables for each one, like:\n",
   "\n",
   "    beta_1 = mc.Uniform( \"beta_1\", 0, 1)\n",
   "    beta_2 = mc.Uniform( \"beta_2\", 0, 1)\n",
@@ -349,7 +349,7 @@
   "source": [
   "#### Determinstic variables\n",
   "\n",
-  "Since most variables you will be modeling are stochastic, we distinguish deterministic variables with a `pymc.deterministic` wrapper. If you are unfamiliar with Python wrappers, that's no problem. Just preppend the `pymc.deterministic` and your good to go. No need to know know more. Preprending with the wrapper is the easist way, but not the only way, to create deterministic variables. This is not completely true: elementary operations, like addition, exponentials etc. implicity create determinsitic variables. "
+  "Since most variables you will be modeling are stochastic, we distinguish deterministic variables with a `pymc.deterministic` wrapper. If you are unfamiliar with Python wrappers (also called decorators), that's no problem. Just prepend the `pymc.deterministic` decorator and you're good to go. No need to know more. Prepending the wrapper is the easiest way, but not the only way, to create deterministic variables. This is not completely true: elementary operations, like addition, exponentials etc. implicitly create deterministic variables. "
   ]
   },
   {
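
A minimal sketch of the decorator usage described in the added line (my own example; the variable names are hypothetical):

    import pymc as pm

    lambda_1 = pm.Exponential("lambda_1", 1)
    lambda_2 = pm.Exponential("lambda_2", 1)

    # The pymc.deterministic decorator turns an ordinary function into a
    # Deterministic variable whose value is recomputed from its parents.
    @pm.deterministic
    def lambda_sum(l1=lambda_1, l2=lambda_2):
        return l1 + l2

    # Elementary operations create deterministic variables implicitly:
    lambda_sum_implicit = lambda_1 + lambda_2
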
