Chapter1_Introduction/Chapter1_Introduction.ipynb (21 additions, 22 deletions)
@@ -63,7 +63,7 @@
    "\n",
    "- A medical patient is exhibiting symptoms $x$, $y$ and $z$. There are a number of diseases that could be causing all of them, but only a single disease is present. A doctor has beliefs about which disease it is.\n",
    "\n",
-   "- You believe that the pretty girl in your English class doesn't have a crush you. You assign a low probability that she does. She, on the other hand, knows for certain that she *does indeed* like you. She (implicitly) assigns a probability 1. \n",
+   "- You believe that the beautiful girl in your English class doesn't have a crush on you. You assign a low probability that she does. She, on the other hand, knows for certain that she *does indeed* like you. She (implicitly) assigns a probability of 1. \n",
    "\n",
    "This philosophy of treating beliefs as probabilities is natural to humans. We employ it constantly as we interact with the world and only see partial evidence. By contrast, you have to be *trained* to think like a frequentist. \n",
    "\n",
@@ -75,7 +75,7 @@
    "\n",
    "2\\. $P(A):\\;\\;$ The patient could have any number of diseases. $P(A | X):\\;\\;$ Performing a blood test generated evidence $X$, ruling out some of the possible diseases from consideration.\n",
    "\n",
-   "3\\. $P(A):\\;\\;$ That girl in your class probably doesn't have a crush on you. $P(A | X): \\;\\;$ She sent you an SMS message about some statistics homework. Maybe she does like me... \n",
+   "3\\. $P(A):\\;\\;$ That beautiful girl in your class probably doesn't have a crush on you. $P(A | X): \\;\\;$ She sent you an SMS message about this Friday night. Interesting... \n",
    "\n",
    "It's clear that in each example we did not completely discard the prior belief after seeing new evidence, but we *re-weighted the prior* to incorporate the new evidence (i.e. we put more weight, or confidence, on some beliefs versus others). \n",
    "\n",
@@ -213,7 +213,7 @@
    "source": [
    "We can see that the biggest gains, upon observing the $X$ tests passed, come when the prior probability, $p$, is low. Let's settle on a specific value for the prior. I'm a (I think) strong programmer, so I'm going to give myself a realistic prior of 0.20, that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated and large the code is, but let's pin it at 0.20. Then my updated belief that my code is bug-free is 0.33. \n",
    "\n",
-   "Let's not forget from the idea that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*. What does our prior probability distribution look like?\n",
+   "Recall that the prior is a probability distribution: $p$ is the prior probability that there *are no bugs*, so $1-p$ is the prior probability that there *are bugs*.\n",
    "\n",
    "Similarly, our posterior is also a probability distribution, with $P(A | X)$ the probability there is no bug *given we saw all tests pass*, hence $1-P(A|X)$ is the probability there is a bug *given all tests passed*. What does our posterior probability distribution look like? Below is a graph of both the prior and the posterior distributions. \n"
    ]
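As a quick check of that 0.20 → 0.33 update, here is a minimal sketch of the Bayes' rule computation. The likelihoods are assumptions for illustration (they come from the usual form of this example, not from the hunk above): bug-free code always passes the tests, and buggy code still passes half the time.

```python
# Bayes' rule: P(A | X) = P(X | A) P(A) / P(X)
p = 0.20                    # prior P(A): the code is bug-free
p_pass_given_bugfree = 1.0  # assumed: bug-free code always passes the tests
p_pass_given_buggy = 0.5    # assumed: buggy code still passes half the time

p_pass = p_pass_given_bugfree * p + p_pass_given_buggy * (1 - p)
posterior = p_pass_given_bugfree * p / p_pass
print(round(posterior, 2))  # 0.33, matching the updated belief in the text
```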
@@ -256,10 +256,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-   "Notice that after we observed $X$ occur, the probability of no bugs present increased. By increasing the number of tests, we can approach confidence (probability 1) that there are not bugs.\n",
+   "Notice that after we observed $X$ occur, the probability that no bugs are present increased. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.\n",
    "\n",
-   "\n",
-   "This was a very simple example, but the mathematics from here only becomes difficult except for artifically constructed instances. We will see that this math is actually unnecessary. First we must broaden our modeling tools. "
+   "This was a very simple example of Bayesian inference and Bayes' rule. Unfortunately, the mathematics necessary to perform more complicated Bayesian inference only becomes more difficult, except for artificially constructed cases. We will later see that this type of mathematical analysis is actually unnecessary. First we must broaden our modeling tools."
"We will use this property often, so it's something useful to remember. Below we plot the probablity mass distribution for different $\\lambda$ values. The first thing to notice is that by increasing $\\lambda$ we add more probability to larger values occuring. The second notice is although the graph ends at 15, the distributions do not. They assign positive probability to every integer."
296
+
"We will use this property often, so it's something useful to remember. Below we plot the probablity mass distribution for different $\\lambda$ values. The first thing to notice is that by increasing $\\lambda$ we add more probability to larger values occuring. Secondly, notice that although the graph ends at 15, the distributions do not. They assign positive probability to every non-negative integer.."
298
297
]
299
298
},
300
299
{
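A minimal sketch of the plot this cell describes, using scipy's Poisson PMF. The two $\lambda$ values are illustrative choices, not taken from the hunk above:

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

k = np.arange(16)  # the graph stops at 15, but the support is all non-negative integers
for lam in (1.5, 4.25):  # illustrative rate parameters
    plt.bar(k, stats.poisson.pmf(k, lam), alpha=0.6, label=r"$\lambda = %.2f$" % lam)
plt.xlabel("$k$")
plt.ylabel("$P(Z = k)$")
plt.legend()
plt.show()
```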
@@ -454,15 +453,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-   "Before we begin, with resepect to the plt above, do you say think there is a change in behaviour? \n",
+   "Before we begin, with respect to the plot above, do you think there is a change in behaviour? \n",
    "\n",
-   "How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting a day $i$'s text-message count $C_i$, \n",
+   "How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting day $i$'s text-message count by $C_i$, \n",
    "\n",
    "$$ C_i \\sim \\text{Poisson}(\\lambda) $$\n",
    "\n",
-   "We are not sure about what the $\\lambda$ parameter is though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalently saying the parameter $\\lambda$ increases at some later date (recall a higher $\\lambda$ means more probability on larger outcomes).\n",
+   "We are not sure what the $\\lambda$ parameter is, though. Looking at the chart above, it appears that the rate might become higher at some later date, which is equivalent to saying the parameter $\\lambda$ increases at some later date (recall that a higher $\\lambda$ means more probability on larger outcomes, that is, a higher probability of many texts).\n",
    "\n",
-   "How can we mathematically represent this? We can think, that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters, one for before the day $\\tau$, and one for after. In literature, a sudden transition like this would be called a *switchpoint*:\n",
+   "How can we mathematically represent this? We can think that at some later date (call it $\\tau$), the parameter $\\lambda$ suddenly jumps to a higher value. So we create two $\\lambda$ parameters, one for behaviour before the day $\\tau$, and one for behaviour after. In the literature, a sudden transition like this would be called a *switchpoint*:\n",
    "\n",
    "$$\n",
    "\\lambda = \n",
@@ -473,20 +472,20 @@
    "$$"
    "\n",
    "\n",
-   " If, in reality, no sudden change occurred, the $\\lambda$'s should look about equal. What would be a good prior distribution on $\\lambda_{1}$ and $\\lambda_2$? \n",
+   " If, in reality, no sudden change occurred and indeed $\\lambda_1 = \\lambda_2$, the posterior distributions of the $\\lambda$'s should look about equal.\n",
    "\n",
-   "Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable has a density function for any positive number. This would be a good choice to model $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
+   "What would be good prior distributions for $\\lambda_1$ and $\\lambda_2$? Recall that $\\lambda_i, \\; i=1,2,$ can be any positive number. The *exponential* random variable assigns positive density to any positive number, so it would be a good choice for modelling $\\lambda_i$. But again, we need a parameter for this exponential distribution: call it $\\alpha$.\n",
@@ ... @@
-   "$\\alpha$ is called a *hyper-parameter*, literally a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
+   "$\\alpha$ is called a *hyper-parameter*, or a *parent variable*: literally, a parameter that influences other parameters. Its influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n",
@@ ... @@
-   "Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda$; creating two exponential distributions with different $\\alpha$ values reflects our belief was that the rate changed (increased) after some period.\n",
+   "Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda_i$; creating two exponential distributions with different $\\alpha$ values reflects our belief that the rate changed (increased) after some period.\n",
    "\n",
    "What about $\\tau$? Well, due to the randomness, it is too difficult to pick out when $\\tau$ might have occurred. Instead, we can assign a *uniform prior belief* to every possible day. This is equivalent to saying\n",
    "\n",
@@ -631,7 +630,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-   "The above code will be explained in the Chapter 2, but this is where our results come from. The machinery being employed is called *Monte Carlo Markov Chains*. It returns random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. Be can plot a histogram of the random variables to see what the posterior distribution looks like. Below, we collect the samples (called *traces* in MCMC literature). "
+   "The above code will be explained in Chapter 3, but this is where our results come from. The machinery being employed is called *Markov chain Monte Carlo* (MCMC). It returns thousands of random variables from the posterior distributions of $\\lambda_1, \\lambda_2$ and $\\tau$. We can plot a histogram of the random variables to see what the posterior distributions look like. Below, we collect the samples (called *traces* in the MCMC literature) in histograms."
    ]
   },
   {
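A sketch of the histogram step. The trace arrays below are fake stand-ins (centred roughly where the chapter's results land) so the snippet runs on its own; in the notebook they would come from the MCMC sampler:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in traces; replace with the sampler's actual output arrays.
rng = np.random.default_rng(1)
lambda_1_samples = rng.normal(18, 1, 5000)
lambda_2_samples = rng.normal(23, 1, 5000)
tau_samples = rng.integers(44, 47, 5000)

fig, axes = plt.subplots(3, 1, figsize=(8, 8))
traces = [lambda_1_samples, lambda_2_samples, tau_samples]
labels = [r"$\lambda_1$", r"$\lambda_2$", r"$\tau$"]
for ax, samples, label in zip(axes, traces, labels):
    ax.hist(samples, bins=30, density=True)  # normalized histogram of the trace
    ax.set_title("posterior of " + label)
plt.tight_layout()
plt.show()
```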
@@ -702,11 +701,11 @@
    "source": [
    "### Interpretation\n",
    "\n",
-   "Recall that the Bayesian methodology returns a *distribution*, hence we now have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what a plausible value for the parameters might be. What other observations can you make? Look at the data again, do they seem reasonable? The distributions of the two $\\lambda$s look very different, indicating that it's likely there was a change in the user's text-message behavior.\n",
+   "Recall that the Bayesian methodology returns a *distribution*, hence we now have distributions to describe the unknown $\\lambda$'s and $\\tau$. What have we gained? Immediately we can see the uncertainty in our estimates: the more variance in the distribution, the less certain our posterior belief should be. We can also say what plausible values for the parameters might be: $\\lambda_1$ is around 18 and $\\lambda_2$ is around 23. What other observations can you make? Look at the data again: do these values seem reasonable? The distributions of the two $\\lambda$s are positioned very differently, indicating that it's likely there was a change in the user's text-message behaviour.\n",
    "\n",
-   "Also notice that posteriors' distributions do not look like any Poisson distributions. They are really not anything we recognize. But this is OK. This is one of the benefits of taking a computational point-of-view. If we had instead done this mathematically, we would have been stuck with a very intractable (and messy) distribution. Via computations, we are agnostic to the tractability.\n",
+   "Also notice that the posterior distributions do not look like any Poisson distributions, though we originally started modelling with Poisson random variables. They are really not anything we recognize. But this is OK. This is one of the benefits of taking a computational point of view. If we had instead done this mathematically, we would have been stuck with a very analytically intractable (and messy) distribution. Via computation, we are agnostic to tractability.\n",
    "\n",
-   "Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or the change been gradual, the posterior distribution of $\\tau$ would have been more spread out. On the contrary, it is very peaked. It appears that near day 50, the individual's text-message behavior suddenly changed."
+   "Our analysis also returned a distribution for what $\\tau$ might be. Had no change occurred, or had the change been gradual over time, the posterior distribution of $\\tau$ would have been more spread out, reflecting that many days are plausible candidates for $\\tau$. On the contrary, it is very peaked. It appears that near day 45, the individual's text-message behaviour suddenly changed."
    ]
   },
   {
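The "plausible values" in the interpretation are just summaries of the traces. Assuming arrays like those in the earlier stand-in sketch, they can be read off directly:

```python
import numpy as np

# Point summaries of the posterior traces (array names as in the sketch above).
print(np.mean(lambda_1_samples))  # around 18 in the chapter's run
print(np.mean(lambda_2_samples))  # around 23
print(np.std(lambda_1_samples))   # spread: the larger this is, the less certain we are
```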
@@ -718,7 +717,7 @@
    "\n",
    "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finish by using posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le 70$? Recall that the expected value of a Poisson random variable is equal to its parameter $\\lambda$; the question is therefore equivalent to *what is the expected value of $\\lambda$ at time $t$*?\n",
    "\n",
-   "In the code below, we are calculating the following: Let $i$ index a particular sample from the posterior distributions. Given a day $t$, we average over all $\\lambda_i$ on that day $t$, using $\\lambda_{1,i}$ if $t \\lt \\tau_i$ else we use $\\lambda_{2,i}$. \n",
+   "In the code below, we are calculating the following: Let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\\lambda_i$ for that day $t$, using $\\lambda_i = \\lambda_{1,i}$ if $t \\lt \\tau_i$ (that is, if the behaviour change hadn't occurred yet), else we use $\\lambda_i = \\lambda_{2,i}$. \n",
    "\n",
    "\n"
    ]
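The notebook's own code for this step is not shown in the diff; here is a sketch of the averaging the paragraph describes, again assuming the trace arrays from the earlier stand-in sketch and a 70-day record:

```python
import numpy as np

n_days = 70
n_samples = tau_samples.shape[0]
expected_texts = np.zeros(n_days)
for day in range(n_days):
    # For posterior sample i, the rate on this day is lambda_{1,i} if the
    # switchpoint hasn't happened yet (day < tau_i), else lambda_{2,i}.
    before = day < tau_samples  # boolean mask over all posterior samples
    expected_texts[day] = (lambda_1_samples[before].sum() +
                           lambda_2_samples[~before].sum()) / n_samples
```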
@@ -776,7 +775,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-   "Our analysis shows strong support that the user's behavior suddenly changed, versus no change ($\\lambda_1$ would appear like $\\lambda_2$ had this been true), versus a gradual change (more variation in the posterior of $\\tau$ had this been true). We can speculate what might have caused this: a cheaper text-message rate, a recent weather-2-text subscription, or a new relationship. (The 45th day corresponds to Christmas, and I moved away to Toronto the next month leaving a lovely girlfriend behind ;)\n"
+   "Our analysis shows strong support that the user's behavior suddenly changed, versus no change ($\\lambda_1$ would appear like $\\lambda_2$ had this been true), versus a gradual change (more variation in the posterior of $\\tau$ had this been true). We can speculate about what might have caused this: a cheaper text-message rate, a recent weather-2-text subscription, or a new relationship. (The 45th day corresponds to Christmas, and I moved away to Toronto the next month, leaving a girlfriend behind.)\n"