Chapter2_MorePyMC/MorePyMC.ipynb (+21 −21 lines changed)
@@ -15,7 +15,7 @@
 "======\n",
 "______\n",
 "\n",
-"This chapter introduces more PyMC syntax and design patterns, and ways to think about how to model a system from a Bayesian perspective. It also contains tips and data visualization techniques for assesing goodness-of-fit for your Bayesian model."
+"This chapter introduces more PyMC syntax and design patterns, and ways to think about how to model a system from a Bayesian perspective. It also contains tips and data visualization techniques for assessing goodness-of-fit for your Bayesian model."
 ]
 },
 {
@@ -188,7 +188,7 @@
 "source": [
 "PyMC is concerned with two types of programming variables: `stochastic` and `deterministic`.\n",
 "\n",
-"* *stochastic variables* are variables that are not deterministic, i.e., even if you knew all the values of the variables' parents (if it even has any parents), it would still be random. Included in this catagory are instances of classes `Poisson`, `DiscreteUniform`, and `Exponential`.\n",
+"* *stochastic variables* are variables that are not deterministic, i.e., even if you knew all the values of the variables' parents (if it even has any parents), it would still be random. Included in this category are instances of classes `Poisson`, `DiscreteUniform`, and `Exponential`.\n",
 "\n",
 "* *deterministic variables* are variables that are not random if the variables' parents were known. This might be confusing at first: a quick mental check is *if I knew all of variable `foo`'s parent variables, I could determine what `foo`'s value is.* \n",
 "\n",
@@ -335,7 +335,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Determinstic variables\n",
+"#### Deterministic variables\n",
 "\n",
 "Since most variables you will be modeling are stochastic, we distinguish deterministic variables with a `pymc.deterministic` wrapper. (If you are unfamiliar with Python wrappers (also called decorators), that's no problem. Just prepend the `pymc.deterministic` decorator before the variable declaration and you're good to go. No need to know more. ) The declaration of a deterministic variable uses a Python function:\n",
 "\n",
@@ -345,7 +345,7 @@
 "\n",
 "For all purposes, we can treat the object `some_deterministic_var` as a variable and not a Python function. \n",
 "\n",
-"Prepending with the wrapper is the easiest way, but not the only way, to create deterministic variables. This is not completely true: elementary operations, like addition, exponentials etc. implicity create determinsitic variables. For example, the following returns a deterministic variable:"
+"Prepending with the wrapper is the easiest way, but not the only way, to create deterministic variables. This is not completely true: elementary operations, like addition, exponentials etc. implicitly create deterministic variables. For example, the following returns a deterministic variable:"
 ]
 },
 {
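
Assuming the `lambda_1` and `lambda_2` variables from the sketch above, the implicit creation this hunk describes can be verified directly:

    lambda_sum = lambda_1 + lambda_2    # no decorator involved
    print(type(lambda_sum))             # pymc.PyMCObjects.Deterministic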
@@ -464,7 +464,7 @@
 "source": [
 "To frame this in the notation of the first chapter, though this is a slight abuse of notation, we have specified $P(A)$. Our next goal is to include data/evidence/observations $X$ into our model. \n",
 "\n",
-"PyMC stochastic variables have a keyword argument `observed` which accepts a boolean (`False` by default). The keyword `observed` has a very simple role: fix the variable's current value, i.e. make `value` immutable. We have to specify an intial `value` in the variable's creation, equal to the observations we wish to include, typically an array (and it should be an Numpy array for speed). For example:"
+"PyMC stochastic variables have a keyword argument `observed` which accepts a boolean (`False` by default). The keyword `observed` has a very simple role: fix the variable's current value, i.e. make `value` immutable. We have to specify an initial `value` in the variable's creation, equal to the observations we wish to include, typically an array (and it should be an Numpy array for speed). For example:"
"PyMC, and other probablistic programming languages, have been designed to tell these data-generation *stories*. More generally, B. Cronin writes [5]:\n",
591
+
"PyMC, and other probabilistic programming languages, have been designed to tell these data-generation *stories*. More generally, B. Cronin writes [5]:\n",
592
592
"\n",
593
593
"> Probabilistic programming will unlock narrative explanations of data, one of the holy grails of business analytics and the unsung hero of scientific persuasion. People think in terms of stories - thus the unreasonable power of the anecdote to drive decision-making, well-founded or not. But existing analytics largely fails to provide this kind of story; instead, numbers seemingly appear out of thin air, with little of the causal context that humans prefer when weighing their options."
594
594
]
@@ -759,7 +759,7 @@
 "source": [
 "## An algorithm for human deceit\n",
 "\n",
-"Likely the most common statistical task is estimating the frequency of events. However, there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occuring. For example, the true frequency of rolling a 1 on a 6-sided die is 0.166. Knowing the frequency of events like baseball home runs, frequency of social attributes, fraction of internet users with cats etc. are common requests we ask of Nature. Unfortunately, in general Nature hides the true frequency from us and we must *infer* it from observed data.\n",
+"Likely the most common statistical task is estimating the frequency of events. However, there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occurring. For example, the true frequency of rolling a 1 on a 6-sided die is 0.166. Knowing the frequency of events like baseball home runs, frequency of social attributes, fraction of internet users with cats etc. are common requests we ask of Nature. Unfortunately, in general Nature hides the true frequency from us and we must *infer* it from observed data.\n",
 "\n",
 "The *observed frequency* is then the frequency we observe: say rolling the die 100 times you may observe 20 rolls of 1. The observed frequency, 0.2, differs from the true frequency, 0.166. We can use Bayesian statistics to infer probable values of the true frequency using an appropriate prior and observed data.\n",
 "\n",
@@ -769,7 +769,7 @@
 "\n",
 "### The Binomial Distribution\n",
 "\n",
-"The binomial distribution is one of the most popular distributions, mostly because of its simplicity and usefulness. Unlike the other distributions we have encountered thus far in the book, the binomial distribution has 2 parameters: $N$, a positive integer representing $N$ trials or number of instances of potential events, and $p$, the probability of an event occuring in a single trial. Like the Poisson distribution, it is a discrete distribution, but unlike the Poisson distribution, it only weighs integers from $0$ to $N$. The mass distribution looks like:\n",
+"The binomial distribution is one of the most popular distributions, mostly because of its simplicity and usefulness. Unlike the other distributions we have encountered thus far in the book, the binomial distribution has 2 parameters: $N$, a positive integer representing $N$ trials or number of instances of potential events, and $p$, the probability of an event occurring in a single trial. Like the Poisson distribution, it is a discrete distribution, but unlike the Poisson distribution, it only weighs integers from $0$ to $N$. The mass distribution looks like:\n",
 "\n",
 "$$P( X = k ) = {{N}\\choose{k}} p^k(1-p)^{N-k}$$\n",
 "\n",
@@ -829,7 +829,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Example: Cheating amoung students\n",
+"##### Example: Cheating among students\n",
 "\n",
 "We will use the binomial distribution to determine the frequency of students cheating during an exam. If we let $N$ be the total number of students who took the exam, and assuming each student is interviewed post-exam (answering without consequence), we will receive integer $X$ \"Yes I did cheat\" answers. We then find the posterior distribution of $p$, given $N$, some specified prior on $p$, and observed data $X$. \n",
 "\n",
@@ -844,7 +844,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Suppose 100 students are being surveyed for cheating, and we wish to find $p$, the proportion of cheaters. There a few ways we can model this in PyMC. I'll demonstrate the most explict way, and later show a simplified version. Both versions arrive at the same inference. In our data-generation model, we sample $p$, the true proportion of cheaters, from a prior. Since we are quite ignorant about $p$, we will assign it a $\\text{Uniform}(0,1)$ prior."
+"Suppose 100 students are being surveyed for cheating, and we wish to find $p$, the proportion of cheaters. There a few ways we can model this in PyMC. I'll demonstrate the most explicit way, and later show a simplified version. Both versions arrive at the same inference. In our data-generation model, we sample $p$, the true proportion of cheaters, from a prior. Since we are quite ignorant about $p$, we will assign it a $\\text{Uniform}(0,1)$ prior."
 ]
 },
 {
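
In PyMC 2 syntax, the prior just described is a single line (the variable name follows the notebook's cheating example):

    import pymc as mc

    N = 100
    p = mc.Uniform("freq_cheating", 0, 1)   # ignorance prior on the cheating frequency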
@@ -1115,9 +1115,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"I could have typed `p_skewed = 0.5*p + 0.25` instead for a one-liner, as the elementary operations of addition and scalar multiplication will implicity create a `deterministic` variable, but I wanted to make the deterministic boilerplate explicit for clarity's sake. \n",
+"I could have typed `p_skewed = 0.5*p + 0.25` instead for a one-liner, as the elementary operations of addition and scalar multiplication will implicitly create a `deterministic` variable, but I wanted to make the deterministic boilerplate explicit for clarity's sake. \n",
 "\n",
-"If we know the probability of respondents saying \"Yes\", which is `p_skewed`, and we have $N=100$ students, the number of \"Yes\" responses is a binomial random variable with paramters `N` and `p_skewed`.\n",
+"If we know the probability of respondents saying \"Yes\", which is `p_skewed`, and we have $N=100$ students, the number of \"Yes\" responses is a binomial random variable with parameters `N` and `p_skewed`.\n",
 "\n",
 "This is were we include our observed 35 \"Yes\" responses. In the declaration of the `mc.Binomial`, we include `value = 35` and `observed = True`."
 ]
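
Assuming `p_skewed` has been built from `p` as this hunk describes, the observed Binomial is declared as:

    # 35 "Yes" answers out of N = 100 interviews, held fixed via observed=True
    yes_responses = mc.Binomial("number_cheaters", 100, p_skewed,
                                value=35, observed=True)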
@@ -1269,7 +1269,7 @@
 " alpha = 0.5) \n",
 "plt.yticks([0,1])\n",
 "plt.ylabel(\"Damage Incident?\")\n",
-"plt.xlabel(\"Outside temperature (Farhenhit)\" )\n",
+"plt.xlabel(\"Outside temperature (Fahrenheit)\" )\n",
 "plt.title(\"Defects of the Space Shuttle O-Rings vs temperature\");\n"
 ],
 "language": "python",
@@ -1323,7 +1323,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"It looks clear that *the probability* of damage incidents occuring increases as the outside temperature decreases. We are interested in modeling the probability here because it does not look like there is a strict cutoff point between temperature and a damage incident ocurring. The best we can do is ask \"At temperature $t$, what is the probability of a damage incident?\". The goal of this example is to answer that question.\n",
+"It looks clear that *the probability* of damage incidents occurring increases as the outside temperature decreases. We are interested in modeling the probability here because it does not look like there is a strict cutoff point between temperature and a damage incident occurring. The best we can do is ask \"At temperature $t$, what is the probability of a damage incident?\". The goal of this example is to answer that question.\n",
 "\n",
 "We need a function of temperature, call it $p(t)$, that is bounded between 0 and 1 (so as to model a probability) and changes from 1 to 0 as we increase temperature. There are actually many such functions, but the most popular choice is the *logistic function.*\n",
"where $p(t)$ is our logistic function and $t_i$ are the temperatures we have observations about. Notice in the above code we had to set the values of `beta` and `alpha` to 0. The reason for this is that if `beta` and `alpha` are very large, they make `p` equal to 1 or 0. Unfortunately, `mc.Bernoulli` does not like probabilities of exactly 0 or 1, though they are mathematically well-defined probabilties. So by setting the coefficient values to `0`, we set the variable `p` to be a reasonable starting value. This has no effect on our results, nor does it mean we are including any additional information in our prior. It is simply a computational caveat in PyMC. "
1509
+
"where $p(t)$ is our logistic function and $t_i$ are the temperatures we have observations about. Notice in the above code we had to set the values of `beta` and `alpha` to 0. The reason for this is that if `beta` and `alpha` are very large, they make `p` equal to 1 or 0. Unfortunately, `mc.Bernoulli` does not like probabilities of exactly 0 or 1, though they are mathematically well-defined probabilities. So by setting the coefficient values to `0`, we set the variable `p` to be a reasonable starting value. This has no effect on our results, nor does it mean we are including any additional information in our prior. It is simply a computational caveat in PyMC. "
1510
1510
]
1511
1511
},
1512
1512
{
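
Concretely, coefficient declarations consistent with this hunk look like the following sketch; `value=0` only sets the sampler's starting point, not the prior itself:

    # diffuse Normal priors (precision tau = 0.001) with a safe starting value
    beta = mc.Normal("beta", 0, 0.001, value=0)
    alpha = mc.Normal("alpha", 0, 0.001, value=0)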
@@ -1716,7 +1716,7 @@
 "source": [
 "### What about the day of the Challenger disaster?\n",
 "\n",
-"On the day of the Challenger disaster, the outside temperature was 31 degrees fahrenheit. What is the posterior distribution of a defect occuring, given this temperature? The distribution is plotted below. It looks almost guaranteed that the Challenger was going to be subject to defective O-rings."
+"On the day of the Challenger disaster, the outside temperature was 31 degrees Fahrenheit. What is the posterior distribution of a defect occurring, given this temperature? The distribution is plotted below. It looks almost guaranteed that the Challenger was going to be subject to defective O-rings."
 ]
 },
 {
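
A sketch of how that posterior is formed, assuming the `logistic` function from above and posterior trace arrays `beta_samples` and `alpha_samples` from the fitted model:

    import matplotlib.pyplot as plt

    # one defect probability per posterior draw of (beta, alpha)
    prob_31 = logistic(31, beta_samples, alpha_samples)
    plt.hist(prob_31, bins=30, density=True)
    plt.xlim(0, 1)
    plt.title("Posterior probability of defect at 31 degrees F");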
@@ -1748,11 +1748,11 @@
 "source": [
 "### Is our model appropriate?\n",
 "\n",
-"The skeptical reader will say \"You deliberately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which guarantees a defect always occurring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specificed all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n",
+"The skeptical reader will say \"You deliberately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which guarantees a defect always occurring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specified all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n",
 "\n",
-"We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artifical dataset which we can simulate. The rational is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n",
+"We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artificial dataset which we can simulate. The rational is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n",
 "\n",
-"Previously in this Chapter, we simulated artifical dataset for the SMS example. To do this, we sampled values from the priors. We saw how varied the resulting datasets looked like, and rarely did they mimic our observed dataset. In the current example, we should sample from the *posterior* distributions to create *very plausible datasets*. Luckily, our Bayesian framework makes this very easy. We only need to create a new `Stochastic` variable, that is exactly the same as our variable that stored the observations, but minus the observations themselves. If you recall, our `Stochastic` variable that stored our observed data was:\n",
+"Previously in this Chapter, we simulated artificial dataset for the SMS example. To do this, we sampled values from the priors. We saw how varied the resulting datasets looked like, and rarely did they mimic our observed dataset. In the current example, we should sample from the *posterior* distributions to create *very plausible datasets*. Luckily, our Bayesian framework makes this very easy. We only need to create a new `Stochastic` variable, that is exactly the same as our variable that stored the observations, but minus the observations themselves. If you recall, our `Stochastic` variable that stored our observed data was:\n",
 "\n",
 "    observed = mc.Bernoulli( \"bernoulli_obs\", p, value = D, observed=True)\n",
 "\n",
@@ -2006,7 +2006,7 @@
 "plt.title(\"Temperature-dependent model\")\n",
 "\n",
 "# perfect model\n",
-"# i.e. the probability of defect is equal to if a defect occured or not.\n",
+"# i.e. the probability of defect is equal to if a defect occurred or not.\n",