Chapter1_Introduction/Chapter1_Introduction.ipynb
+19 −19 (19 additions & 19 deletions)
@@ -213,7 +213,7 @@
 "source": [
 "We can see that the biggest gains, if we observe the $X$ tests passed, are when the prior probability, $p$, is low. Let's settle on a specific value for the prior. I'm a (I think) strong programmer, so I'm going to give myself a realistic prior of 0.20; that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated and large the code is, but let's pin it at 0.20. Then my updated belief that my code is bug-free is 0.33. \n",
 "\n",
-"Let's not forget from the idea that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*. What does our prior probability distribution look like?\n",
+"Recall that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*.\n",
 "\n",
 "Similarly, our posterior is also a probability distribution, with $P(A | X)$ the probability there is no bug *given we saw all tests pass*; hence $1-P(A|X)$ is the probability there is a bug *given all tests passed*. What does our posterior probability distribution look like? Below is a graph of both the prior and the posterior distributions. \n"
 ]
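The 0.20 → 0.33 update above follows directly from Bayes' rule. A minimal sketch of the arithmetic, assuming (from the numbers in the text) that a bug-free program always passes the test while a buggy one passes with probability 0.5; the function name is made up for illustration:

```python
# Sketch of the Bayes-rule update described above. Assumption inferred from
# the arithmetic in the text: P(pass | no bugs) = 1, P(pass | bugs) = 0.5.
def posterior_bug_free(p, pass_given_bug=0.5):
    """P(no bugs | tests pass) = P(pass | no bugs) * P(no bugs) / P(pass)."""
    evidence = 1.0 * p + pass_given_bug * (1 - p)  # total P(tests pass)
    return 1.0 * p / evidence

print(round(posterior_bug_free(0.20), 2))  # 0.33, matching the text
```

Increasing the prior `p` (or observing more passing tests, which shrinks `pass_given_bug` toward 0) pushes the posterior toward 1.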
@@ -256,10 +256,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Notice that after we observed $X$ occur, the probability of no bugs present increased. By increasing the number of tests, we can approach confidence (probability 1) that there are not bugs.\n",
+"Notice that after we observed $X$ occur, the probability of bugs being absent increased. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.\n",
 "\n",
 "\n",
-"This was a very simple example, but the mathematics from here only becomes difficult except for artifically constructed instances. We will see that this math is actually unnecessary. First we must broaden our modeling tools. "
+"This was a very simple example of Bayesian inference and Bayes' rule. Unfortunately, the mathematics necessary to perform more complicated Bayesian inference only becomes more difficult, except for artificially constructed cases. We will later see that this type of mathematical analysis is actually unnecessary. First we must broaden our modeling tools. "
-"We will use this property often, so it's something useful to remember. Below we plot the probablity mass distribution for different $\\lambda$ values. The first thing to notice is that by increasing $\\lambda$ we add more probability to larger values occuring. The second notice is although the graph ends at 15, the distributions do not. They assign positive probability to every integer."
+"We will use this property often, so it's something useful to remember. Below we plot the probability mass function for different $\\lambda$ values. The first thing to notice is that increasing $\\lambda$ adds more probability to larger values occurring. Secondly, notice that although the graph ends at 15, the distributions do not: they assign positive probability to every non-negative integer."
 ]
},
{
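Both claims in the revised text can be checked numerically. A small sketch using `scipy.stats`; the $\lambda$ values here are arbitrary examples, not necessarily the ones plotted in the notebook:

```python
import numpy as np
from scipy import stats

k = np.arange(16)  # the plot described in the text stops at k = 15
for lam in (1.5, 4.25):  # example rates; the notebook's values may differ
    # Larger lambda shifts probability mass toward larger counts:
    print(f"lambda={lam}: P(Z > 10) = {1 - stats.poisson.cdf(10, lam):.6f}")

# ...and the support never ends: every non-negative integer receives
# strictly positive probability, even far beyond the plotted range.
print(stats.poisson.pmf(50, 4.25) > 0)  # True
```

The "property" mentioned in the text, $E[Z] = \lambda$ for $Z \sim \text{Poisson}(\lambda)$, also falls out directly: `stats.poisson.mean(lam)` returns `lam`.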
@@ -454,15 +454,15 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Before we begin, with resepect to the plt above, do you say think there is a change in behaviour? \n",
+"Before we begin: with respect to the plot above, do you think there is a change in behaviour?\n",
 "\n",
-"How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting a day $i$'s text-message count $C_i$, \n",
+"How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting day $i$'s text-message count by $C_i$, \n",
 "\n",
 "$$ C_i \\sim \\text{Poisson}(\\lambda) $$\n",
 "\n",
-"We are not sure about what the $\\lambda$ parameter is though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalently saying the parameter $\\lambda$ increases at some later date (recall a higher $\\lambda$ means more probability on larger outcomes).\n",
+"We are not sure what the $\\lambda$ parameter is, though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalent to saying that the parameter $\\lambda$ increases at some later date (recall that a higher $\\lambda$ means more probability on larger outcomes, that is, a higher probability of many texts).\n",
 "\n",
-"How can we mathematically represent this? We can think, that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters, one for before the day $\\tau$, and one for after. In literature, a sudden transition like this would be called a *switchpoint*:\n",
+"How can we mathematically represent this? We can suppose that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters: one for behaviour before $\\tau$, and one for behaviour after. In the literature, a sudden transition like this is called a *switchpoint*:\n",
 "\n",
 "$$\n",
 "\\lambda = \n",
@@ -473,20 +473,20 @@
 "$$\n",
 "\n",
 "\n",
-" If, in reality, no sudden change occurred, the $\\lambda$'s should look about equal. What would be a good prior distribution on $\\lambda_{1}$ and $\\lambda_2$?\n",
-"\n",
-"Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable has a density function for any positive number. This would be a good choice to model $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
+" If, in reality, no sudden change occurred and indeed $\\lambda_1 = \\lambda_2$, then the posterior distributions of the $\\lambda$s should look about equal. \n",
+"\n",
+"What would be good prior distributions for $\\lambda_1$ and $\\lambda_2$? Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable has a density function for any positive number. This would be a good choice to model $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
-"$\\alpha$ is called a *hyper-parameter*, literally a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
+"$\\alpha$ is called a *hyper-parameter*, or *parent* variable: literally, a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
-"Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda$; creating two exponential distributions with different $\\alpha$ values reflects our belief was that the rate changed (increased) after some period.\n",
+"Alternatively, and something I encourage the reader to try, is to have two priors, one for each $\\lambda_i$: creating two exponential distributions with different $\\alpha$ values reflects our belief that the rate changed (increased) after some period.\n",
 "\n",
 "What about $\\tau$? Well, due to the randomness, it is too difficult to pick out when $\\tau$ might have occurred. Instead, we can assign a *uniform prior belief* to every possible day. This is equivalent to saying\n",
 "\n",
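The pieces described in this hunk (the switchpoint $\tau$, the two rates, and the rule-of-thumb $\alpha$) can be sketched in plain NumPy. This is not the notebook's actual PyMC model; the count data and variable names below are invented placeholders for illustration:

```python
import numpy as np

def rate(t, tau, lambda_1, lambda_2):
    """lambda_1 before the switchpoint tau, lambda_2 on/after it."""
    return np.where(t < tau, lambda_1, lambda_2)

# Rule of thumb for the exponential hyper-parameter: alpha = 1 / mean(counts).
count_data = np.array([13, 24, 8, 24, 7, 35, 14, 11])  # fake counts, for illustration
alpha = 1.0 / count_data.mean()

# With tau = 4, days 0-3 use lambda_1 and days 4-7 use lambda_2:
print(rate(np.arange(8), tau=4, lambda_1=18.0, lambda_2=23.0))
print(f"alpha = {alpha:.4f}")
```

The uniform prior on $\tau$ then simply says each of the observed days is equally likely to be the switchpoint.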
@@ -631,7 +631,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The above code will be explained in the Chapter 2, but this is where our results come from. The machinery being employed is called *Monte Carlo Markov Chains*. It returns random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. Be can plot a histogram of the random variables to see what the posterior distribution looks like. Below, we collect the samples (called *traces* in MCMC literature). "
+"The above code will be explained in Chapter 3, but this is where our results come from. The machinery being employed is called *Markov Chain Monte Carlo* (MCMC), which I delay explaining until Chapter 3. It returns thousands of random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. We can plot a histogram of the random variables to see what the posterior distributions look like. Below, we collect the samples (called *traces* in the MCMC literature) in histograms. "
 ]
},
{
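What "collecting the traces in histograms" amounts to can be sketched as follows. The sample array below is a NumPy stand-in for what the sampler returns, so the snippet runs on its own; in the notebook, the traces come from the MCMC machinery:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder trace; an MCMC run would produce an array like this.
lambda_1_samples = rng.gamma(shape=18.0, scale=1.0, size=5000)

# The normalised histogram of the trace approximates the posterior density.
density, edges = np.histogram(lambda_1_samples, bins=30, density=True)
width = edges[1] - edges[0]
print(f"density integrates to {density.sum() * width:.3f}")  # 1.000
```

Plotting `density` against the bin edges (as the notebook does with `plt.hist`) gives the posterior pictures discussed in the Interpretation section.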
@@ -702,11 +702,11 @@
 "source": [
 "### Interpretation\n",
 "\n",
-"Recall that the Bayesian methodology returns a *distribution*, hence we now have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what a plausible value for the parameters might be. What other observations can you make? Look at the data again, do they seem reasonable? The distributions of the two $\\lambda$s look very different, indicating that it's likely there was a change in the user's text-message behavior.\n",
+"Recall that the Bayesian methodology returns *distributions*; hence we now have distributions describing the unknown $\\lambda$s and $\\tau$. What have we gained? Immediately, we can see the uncertainty in our estimates: the more variance in a distribution, the less certain our posterior belief should be. We can also say what plausible values for the parameters might be: $\\lambda_1$ is around 18 and $\\lambda_2$ is around 23. What other observations can you make? Look at the data again: do these seem reasonable? The distributions of the two $\\lambda$s are positioned very differently, indicating that it's likely there was a change in the user's text-message behaviour.\n",
 "\n",
-"Also notice that posteriors' distributions do not look like any Poisson distributions. They are really not anything we recognize. But this is OK. This is one of the benefits of taking a computational point-of-view. If we had instead done this mathematically, we would have been stuck with a very intractable (and messy) distribution. Via computations, we are agnostic to the tractability.\n",
+"Also notice that the posterior distributions do not look like any Poisson distributions, though we originally started modelling with Poisson random variables. They are really not anything we recognize, but this is OK: it is one of the benefits of taking a computational point of view. Had we instead done this analysis mathematically, we would have been stuck with a very analytically intractable (and messy) distribution. Via computation, we are agnostic to tractability.\n",
 "\n",
-"Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or the change been gradual, the posterior distribution of $\\tau$ would have been more spread out. On the contrary, it is very peaked. It appears that near day 50, the individual's text-message behavior suddenly changed. "
+"Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or had the change been gradual over time, the posterior distribution of $\\tau$ would have been more spread out, reflecting that many days would be plausible candidates for $\\tau$. Instead, it is very peaked: it appears that near day 45, the individual's text-message behaviour suddenly changed. "
 ]
},
{
@@ -718,7 +718,7 @@
 "\n",
 "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finish by using posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le 70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$; the question is then equivalent to asking *what is the expected value of $\\lambda$ at time $t$?*\n",
 "\n",
-"In the code below, we are calculating the following: Let $i$ index a particular sample from the posterior distributions. Given a day $t$, we average over all $\\lambda_i$ on that day $t$, using $\\lambda_{1,i}$ if $t \\lt \\tau_i$ else we use $\\lambda_{2,i}$. \n",
+"In the code below, we are calculating the following: let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\\lambda_i$ for that day $t$, using $\\lambda_i = \\lambda_{1,i}$ if $t \\lt \\tau_i$ (that is, if the behaviour change had not yet happened), else we use $\\lambda_i = \\lambda_{2,i}$. \n",
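That averaging step can be sketched directly. Again, the trace arrays below are placeholder draws so the snippet is self-contained; the notebook uses the sampler's actual output:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
lambda_1_samples = rng.gamma(18.0, 1.0, n)  # placeholder posterior draws
lambda_2_samples = rng.gamma(23.0, 1.0, n)  # placeholder posterior draws
tau_samples = rng.integers(44, 47, n)       # placeholder switchpoint draws

n_days = 70
expected_texts = np.empty(n_days)
for t in range(n_days):
    # Sample i contributes lambda_{1,i} if the switch has not happened by
    # day t, and lambda_{2,i} otherwise; average over all posterior samples.
    lam = np.where(t < tau_samples, lambda_1_samples, lambda_2_samples)
    expected_texts[t] = lam.mean()

print(expected_texts[0], expected_texts[-1])  # roughly 18 vs roughly 23
```

The resulting curve jumps from the $\lambda_1$ level to the $\lambda_2$ level around the days where the $\tau$ samples concentrate, which is exactly the "expected texts per day" plot the chapter builds.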