Commit a4180aa

Merge remote-tracking branch 'upstream/master'
2 parents 53244a6 + dbff64b commit a4180aa

File tree

10 files changed: +95 −85 lines changed


Chapter1_Introduction/Chapter1_Introduction.ipynb

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,7 @@
  "========\n",
  "\n",
  "#####Version 0.1\n",
- "Welcome to *Bayesian Methods for Hackers*. The full Github repository is available at [github/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers). The other chapters can be found on the projects [homepage](camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/). We hope you enjoy the book, and we encourage any contributions!"
+ "Welcome to *Bayesian Methods for Hackers*. The full Github repository is available at [github/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers). The other chapters can be found on the project's [homepage](camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/). We hope you enjoy the book, and we encourage any contributions!"
  ]
 },
 {
@@ -427,6 +427,7 @@
  "a = np.arange(16)\n",
  "poi = stats.poisson\n",
  "lambda_ = [1.5, 4.25]\n",
+ "colours = [\"#348ABD\", \"#A60628\"]\n",
  "\n",
  "plt.bar(a, poi.pmf(a, lambda_[0]), color=colours[0],\n",
  "        label=\"$\\lambda = %.1f$\" % lambda_[0], alpha=0.60,\n",

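The hunk above adds the missing `colours` definition that the later `plt.bar` calls reference. As a self-contained check of what that cell computes, here is a minimal sketch of the bar-chart data, assuming the book's usual `numpy`/`scipy` imports; the plotting calls themselves are truncated in the hunk, so this stops at the bar heights:

```python
import numpy as np
from scipy import stats

a = np.arange(16)
lambda_ = [1.5, 4.25]
colours = ["#348ABD", "#A60628"]  # the line the commit adds; one colour per lambda

# heights of the two bar charts the cell draws: Poisson pmf over support 0..15
pmf_low = stats.poisson.pmf(a, lambda_[0])
pmf_high = stats.poisson.pmf(a, lambda_[1])

# each pmf sums to (almost) 1 over the truncated support
print(round(pmf_low.sum(), 4), round(pmf_high.sum(), 4))
```

Without the added `colours` line the cell would raise a `NameError`, which is presumably what the commit fixes.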
Chapter2_MorePyMC/MorePyMC.ipynb

Lines changed: 6 additions & 6 deletions
@@ -444,7 +444,7 @@
  "source": [
  "### Finally...\n",
  "\n",
- "We wrap all the created variables into a `mc.Model` class. With this `Model` class, we can analyze the variables as a single unit. This is an optional step, as the fitting algorithms can be sent an array of the variables rather than a `Model` class. I may or may not use this class in future examples ;)"
+ "We wrap all the created variables into a `pm.Model` class. With this `Model` class, we can analyze the variables as a single unit. This is an optional step, as the fitting algorithms can be sent an array of the variables rather than a `Model` class. I may or may not use this class in future examples ;)"
  ]
 },
 {
@@ -662,7 +662,7 @@
  "\n",
  "A/B testing is a statistical design pattern for determining the difference of effectiveness between two different treatments. For example, a pharmaceutical company is interested in the effectiveness of drug A vs drug B. The company will test drug A on some fraction of their trials, and drug B on the other fraction (this fraction is often 1/2, but we will relax this assumption). After performing enough trials, the in-house statisticians sift through the data to determine which drug yielded better results. \n",
  "\n",
- "Similarly, front-end web developers are interested in which design of their website yields more sales or some other metric of interest. They will route some fraction of visitors to site A, and the other fraction to site B, and record if the visit yielded a sale of not. The data is recorded (in real-time), and analyzed afterwards. \n",
+ "Similarly, front-end web developers are interested in which design of their website yields more sales or some other metric of interest. They will route some fraction of visitors to site A, and the other fraction to site B, and record if the visit yielded a sale or not. The data is recorded (in real-time), and analyzed afterwards. \n",
  "\n",
  "Often, the post-experiment analysis is done using something called a hypothesis test like *difference of means test* or *difference of proportions test*. This involves often misunderstood quantities like a \"Z-score\" and even more confusing \"p-values\" (please don't ask). If you have taken a statistics course, you have probably been taught this technique (though not necessarily *learned* this technique). And if you were like me, you may have felt uncomfortable with their derivation -- good: the Bayesian approach to this problem is much more natural. \n",
  "\n",
@@ -854,7 +854,7 @@
  "\n",
  "### *A* and *B* Together\n",
  "\n",
- "A similar analysis can be done for site B's response data to determine the analgous $p_B$. But what we are really interested in is the *difference* between $p_A$ and $p_B$. Let's infer $p_A$, $p_B$, *and* $\\text{delta} = p_A - p_B$, all at once. We can do this using PyMC's deterministic variables. (We'll assume for this exercise that $p_B = 0.04$, so $\\text{delta} = 0.01$, $N_B = 750$ (signifcantly less than $N_A$) and we will simulate site B's data like we did for site A's data )"
+ "A similar analysis can be done for site B's response data to determine the analogous $p_B$. But what we are really interested in is the *difference* between $p_A$ and $p_B$. Let's infer $p_A$, $p_B$, *and* $\\text{delta} = p_A - p_B$, all at once. We can do this using PyMC's deterministic variables. (We'll assume for this exercise that $p_B = 0.04$, so $\\text{delta} = 0.01$, $N_B = 750$ (significantly less than $N_A$), and we will simulate site B's data like we did for site A's data.)"
  ]
 },
 {
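The hunk above describes inferring $p_A$, $p_B$, and $\text{delta} = p_A - p_B$ at once. This is not the book's PyMC model; it is a minimal conjugate-Beta Monte Carlo sketch of the same inference, assuming $p_A = 0.05$ (implied by $p_B = 0.04$ and $\text{delta} = 0.01$) and an assumed $N_A = 1500$, since the text fixes only $N_B = 750$:

```python
import numpy as np

rng = np.random.default_rng(0)

# rates from the text: p_B = 0.04 and delta = 0.01 imply p_A = 0.05
p_A_true, p_B_true = 0.05, 0.04
N_A, N_B = 1500, 750  # N_A is an assumption; the text only fixes N_B = 750

obs_A = rng.binomial(1, p_A_true, size=N_A)
obs_B = rng.binomial(1, p_B_true, size=N_B)

# with a Uniform(0, 1) prior, each rate's posterior is Beta(1 + successes, 1 + failures)
post_A = rng.beta(1 + obs_A.sum(), 1 + N_A - obs_A.sum(), size=20000)
post_B = rng.beta(1 + obs_B.sum(), 1 + N_B - obs_B.sum(), size=20000)
delta = post_A - post_B  # posterior samples of p_A - p_B

print("P(p_A > p_B) ~", (delta > 0).mean())
```

In the notebook this posterior comes out of MCMC with a PyMC deterministic variable for `delta`; the conjugate form here gives the same kind of sample cloud without any sampler.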
@@ -1316,7 +1316,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Next we need a dataset. After performing our coin-flipped interviews the researchers received 35 \"Yes\" responses. To put this into a relative perspective, if there truly were no cheaters, we should expect to see on average 1/4 of all responses being a \"Yes\" (half chance of having first coin land Tails, and another half chance of having second coin land Heads), so about 25 responses in a cheat-free world. On the other hand, if *all students cheated*, we should expected to see on approximately 3/4 of all response be \"Yes\". \n",
+ "Next we need a dataset. After performing our coin-flipped interviews the researchers received 35 \"Yes\" responses. To put this into a relative perspective, if there truly were no cheaters, we should expect to see on average 1/4 of all responses being a \"Yes\" (half chance of having first coin land Tails, and another half chance of having second coin land Heads), so about 25 responses in a cheat-free world. On the other hand, if *all students cheated*, we should expect to see approximately 3/4 of all responses be \"Yes\". \n",
  "\n",
  "The researchers observe a Binomial random variable, with `N = 100` and `p = observed_proportion` with `value = 35`: "
  ]
@@ -1460,7 +1460,7 @@
  "\n",
  "If we know the probability of respondents saying \"Yes\", which is `p_skewed`, and we have $N=100$ students, the number of \"Yes\" responses is a binomial random variable with parameters `N` and `p_skewed`.\n",
  "\n",
- "This is were we include our observed 35 \"Yes\" responses. In the declaration of the `mc.Binomial`, we include `value = 35` and `observed = True`."
+ "This is where we include our observed 35 \"Yes\" responses. In the declaration of the `pm.Binomial`, we include `value = 35` and `observed = True`."
  ]
 },
 {
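The two hunks above concern the privacy algorithm's skewed "Yes" probability: half the students answer truthfully and half answer by a second coin flip, so $P(\text{Yes}) = \tfrac{1}{2}p + \tfrac{1}{4}$, giving the 1/4 and 3/4 extremes quoted in the text. A small sketch of that arithmetic plus one simulated run of 100 interviews, with a hypothetical 30% true cheating rate chosen purely for illustration:

```python
import numpy as np

def p_yes(p_cheat):
    # half the students answer truthfully ("Yes" with prob p_cheat);
    # the other half answer by a second coin flip ("Yes" with prob 1/2)
    return 0.5 * p_cheat + 0.5 * 0.5

print(p_yes(0.0))  # no cheaters -> 0.25, i.e. ~25 "Yes" in 100 interviews
print(p_yes(1.0))  # all cheat   -> 0.75

# one simulated run of 100 interviews at an assumed 30% cheating rate
rng = np.random.default_rng(1)
truthful = rng.integers(0, 2, 100)   # first coin: 1 means answer truthfully
cheater = rng.binomial(1, 0.3, 100)  # true (hidden) cheating status
second = rng.integers(0, 2, 100)     # second coin: 1 means answer "Yes"
yes = np.where(truthful == 1, cheater, second)
print(yes.sum(), "Yes responses out of 100")
```

The observed count of 35 "Yes" responses sits between the two extremes, which is why the notebook then treats it as a Binomial observation with `p = p_skewed`.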
@@ -1902,7 +1902,7 @@
  "\n",
  "$$ \\text{Defect Incident, $D_i$} \\sim \\text{Ber}( \\;p(t_i)\\; ), \\;\\; i=1..N$$\n",
  "\n",
- "where $p(t)$ is our logistic function and $t_i$ are the temperatures we have observations about. Notice in the above code we had to set the values of `beta` and `alpha` to 0. The reason for this is that if `beta` and `alpha` are very large, they make `p` equal to 1 or 0. Unfortunately, `mc.Bernoulli` does not like probabilities of exactly 0 or 1, though they are mathematically well-defined probabilities. So by setting the coefficient values to `0`, we set the variable `p` to be a reasonable starting value. This has no effect on our results, nor does it mean we are including any additional information in our prior. It is simply a computational caveat in PyMC. "
+ "where $p(t)$ is our logistic function and $t_i$ are the temperatures we have observations about. Notice in the above code we had to set the values of `beta` and `alpha` to 0. The reason for this is that if `beta` and `alpha` are very large, they make `p` equal to 1 or 0. Unfortunately, `pm.Bernoulli` does not like probabilities of exactly 0 or 1, though they are mathematically well-defined probabilities. So by setting the coefficient values to `0`, we set the variable `p` to be a reasonable starting value. This has no effect on our results, nor does it mean we are including any additional information in our prior. It is simply a computational caveat in PyMC. "
  ]
 },
 {

Chapter2_MorePyMC/separation_plot.py

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
 
 def separation_plot( p, y, **kwargs ):
     """
-    This function creates a separation plot for logitisc and probit classification.
+    This function creates a separation plot for logistic and probit classification.
     See http://mdwardlab.com/sites/default/files/GreenhillWardSacks.pdf
 
     p: The proportions/probabilities, can be a nxM matrix which represents M models.
@@ -52,4 +52,4 @@ def separation_plot( p, y, **kwargs ):
     return
 
 
-
+

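The script above draws a separation plot (Greenhill, Ward & Sacks): observed outcomes are ordered by the model's predicted probability, so a well-calibrated model stacks the positive outcomes at the high-probability end. The plotting itself is omitted here; this sketch shows only that core ordering step, on hypothetical toy data:

```python
import numpy as np

def separation_order(p, y):
    # core of a separation plot: order the observed outcomes y by the
    # model's predicted probability p, lowest probability first
    order = np.argsort(p, kind="stable")
    return np.asarray(y)[order]

p = np.array([0.9, 0.1, 0.8, 0.3, 0.6])  # toy predictions
y = np.array([1, 0, 1, 0, 1])            # toy outcomes this model ranks perfectly
print(separation_order(p, y))            # -> [0 0 1 1 1]
```

In the real plot each reordered outcome becomes a coloured vertical strip, with the sorted probability curve drawn on top; a clean 0-to-1 split like the one here is the ideal case.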