|
111 | 111 | "\n",
|
112 | 112 | "Denote $N$ as the number of instances of evidence we possess. As we gather an *infinite* amount of evidence, say as $N \\rightarrow \\infty$, our Bayesian results align with frequentist results. Hence for large $N$, statistical inference is more or less objective. On the other hand, for small $N$, inference is much more *unstable*: frequentist estimates have more variance and larger confidence intervals. This is where Bayesian analysis excels. By introducing a prior, and returning a distribution (instead of a scalar estimate), we *preserve the uncertainity* to reflect the instability of stasticial inference of a small $N$ dataset. \n",
|
113 | 113 | "\n",
|
114 |
| - "One may think that for large $N$, one can be indifferent between the two techniques, and might lean towards the computational-simpler, frequentist methods. An analysist in this position should consider the following quote by Andrew Gelman (2005)[1], before making such a decision:\n", |
| 114 | + "One may think that for large $N$, one can be indifferent between the two techniques, and might lean towards the computational-simpler, frequentist methods. An analyst in this position should consider the following quote by Andrew Gelman (2005)[1], before making such a decision:\n", |
115 | 115 | "\n",
|
116 | 116 | "> Sample sizes are never large. If $N$ is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once $N$ is \"large enough,\" you can start subdividing the data to learn more (for example, in a public opinion poll, once you have a good estimate for the entire country, you can estimate among men and women, northerners and southerners, different age groups, etc etc). $N$ is never enough because if it were \"enough\" you'd already be on to the next problem for which you need more data.\n",
|
117 | 117 | "\n",
|
|
247 | 247 | "cell_type": "markdown",
|
248 | 248 | "metadata": {},
|
249 | 249 | "source": [
|
250 |
| - "Notice that after we observed $X$ occur, the probability of bugs being absent. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.\n", |
| 250 | + "Notice that after we observed $X$ occur, the probability of bugs being absent increased. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.\n", |
251 | 251 | "\n",
|
252 | 252 | "This was a very simple example of Bayesian inference and Bayes rule. Unfortunately, the mathematics necessary to perform more complicated Bayesian inference only becomes more difficult, except for artifically constructed cases. We will later see that this type of mathematical anaylsis is actually unnecessary. First we must broaden our modeling tools."
|
253 | 253 | ]
|
|
405 | 405 | "n_count_data = len(count_data)\n",
|
406 | 406 | "plt.bar( np.arange( n_count_data ), count_data, color =\"#348ABD\" )\n",
|
407 | 407 | "plt.xlabel( \"Time (days)\")\n",
|
408 |
| - "plt.ylabel(\"# Text-msg recieved\")\n", |
| 408 | + "plt.ylabel(\"# Text-msg received\")\n", |
409 | 409 | "plt.title(\"Did the user's texting habits change over time?\")\n",
|
410 | 410 | "plt.xlim( 0, n_count_data );"
|
411 | 411 | ],
|
|
430 | 430 | "cell_type": "markdown",
|
431 | 431 | "metadata": {},
|
432 | 432 | "source": [
|
433 |
| - "Before we begin, with resepect to the plot above, do you say think there is a change in behaviour? \n", |
| 433 | + "Before we begin, with resepect to the plot above, would you say there was a change in behaviour\n", |
| 434 | + "during the time period? \n", |
434 | 435 | "\n",
|
435 | 436 | "How can we start to model this? Well, as I conveniently already introduced, a Poisson random variable would be a very appropriate model for this *count* data. Denoting day $i$'s text-message count by $C_i$, \n",
|
436 | 437 | "\n",
|
|
458 | 459 | "&\\lambda_2 \\sim \\text{Exp}( \\alpha )\n",
|
459 | 460 | "\\end{align}\n",
|
460 | 461 | "\n",
|
461 |
| - "$\\alpha$ is called a *hyper-parameter*, or a *parent-variable*, literally a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since \n", |
| 462 | + "$\\alpha$ is called a *hyper-parameter*, or a *parent-variable*, literally a parameter that influences other parameters. The influence is not too strong, so we can choose $\\alpha$ liberally. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data, since we're modeling $\\\\lambda$ using an Exponential distribution we can use the expected value identity shown earlier to get:\n", |
462 | 463 | "\n",
|
463 | 464 | "$$\\frac{1}{N}\\sum_{i=0}^N \\;C_i \\approx E[\\; \\lambda \\; |\\; \\alpha ] = \\frac{1}{\\alpha}$$ \n",
|
464 | 465 | "\n",
|
465 |
| - "Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda_i$; creating two exponential distributions with different $\\alpha$ values reflects our belief was that the rate changed (increased) after some period.\n", |
| 466 | + "Alternatively, and something I encourage the reader to try, is to have two priors: one for each $\\lambda_i$; creating two exponential distributions with different $\\alpha$ values reflects a prioer belief that the rate changed after some period.\n", |
466 | 467 | "\n",
|
467 | 468 | "What about $\\tau$? Well, due to the randomness, it is too difficult to pick out when $\\tau$ might have occurred. Instead, we can assign an *uniform prior belief* to every possible day. This is equivalent to saying\n",
|
468 | 469 | "\n",
|
|
479 | 480 | "\n",
|
480 | 481 | "PyMC is a Python library for programming Bayesian analysis [3]. It is a fast, well-maintained library. The only unfortunate part is that documentation can be lacking in areas, especially the bridge between problem to solution. One this book's main goals is to solve that problem, and also to demonstrate why PyMC is so cool.\n",
|
481 | 482 | "\n",
|
482 |
| - "We will model the above problem using the PyMC library. This type of programming is called *probabilistic programming*, an unfortunate misnomer that invokes ideas of randomly-generated code and has likely confused and frightened users from this field. The code is not random. The title is given because we create probability models using pogramming variables as the model's components. This will be the last time I use the term *probablistic programming*. Instead, I'll simply use *programming*, as that is what it really is. \n", |
| 483 | + "We will model the above problem using the PyMC library. This type of programming is called *probabilistic programming*, an unfortunate misnomer that invokes ideas of randomly-generated code and has likely confused and frightened users away from this field. The code is not random. The title is given because we create probability models using pogramming variables as the model's components. This will be the last time I use the term *probablistic programming*. Instead, I'll simply use *programming*, as that is what it really is. \n", |
483 | 484 | "\n",
|
484 | 485 | "\n",
|
485 | 486 | "The PyMC code is easy to follow along: the only novel thing should be the syntax, and I will interrupt the code to explain sections. Simply remember we are representing the model's components ($\\tau, \\lambda_1, \\lambda_2$ ) as variables:"
|
|
540 | 541 | "def lambda_( tau = tau, lambda_1 = lambda_1, lambda_2 = lambda_2 ):\n",
|
541 | 542 | " out = np.zeros( n ) \n",
|
542 | 543 | " out[:tau] = lambda_1 #lambda before tau is lambda1\n",
|
543 |
| - " out[tau:] = lambda_2 #lambda after tau is lambda1\n", |
| 544 | + " out[tau:] = lambda_2 #lambda after tau is lambda2\n", |
544 | 545 | " return out"
|
545 | 546 | ],
|
546 | 547 | "language": "python",
|
|
632 | 633 | "figsize(12.5, 10)\n",
|
633 | 634 | "#histogram of the samples:\n",
|
634 | 635 | "\n",
|
635 |
| - "plt.subplot(311)\n", |
| 636 | + "ax = plt.subplot(311)\n", |
| 637 | + "ax.set_autoscaley_on(False)\n", |
636 | 638 | "\n",
|
637 | 639 | "plt.hist( lambda_1_samples, histtype='stepfilled', bins = 70, alpha = 0.85, \n",
|
638 | 640 | " label = \"posterior of $\\lambda_1$\", color = \"#A60628\",normed = True )\n",
|
639 |
| - "plt.legend()\n", |
| 641 | + "plt.legend(loc = \"upper left\")\n", |
640 | 642 | "plt.title(r\"Posterior distributions of the variables $\\lambda_1,\\;\\lambda_2,\\;\\tau$\")\n",
|
| 643 | + "plt.xlim([15,30])\n", |
| 644 | + "plt.xlabel(\"$\\lambda$ value\")\n", |
| 645 | + "plt.ylabel(\"probability\")\n", |
641 | 646 | "\n",
|
642 |
| - "\n", |
643 |
| - "plt.subplot(312)\n", |
| 647 | + "ax = plt.subplot(312)\n", |
| 648 | + "ax.set_autoscaley_on(False)\n", |
644 | 649 | "\n",
|
645 | 650 | "plt.hist( lambda_2_samples,histtype='stepfilled', bins = 70, alpha = 0.85, \n",
|
646 | 651 | " label = \"posterior of $\\lambda_2$\",color=\"#7A68A6\", normed = True )\n",
|
647 | 652 | "plt.legend(loc = \"upper left\")\n",
|
648 |
| - "\n", |
| 653 | + "plt.xlim([15,30])\n", |
| 654 | + "plt.xlabel(\"$\\lambda$ value\")\n", |
| 655 | + "plt.ylabel(\"probability\")\n", |
649 | 656 | "\n",
|
650 | 657 | "plt.subplot(313)\n",
|
651 | 658 | "\n",
|
652 | 659 | "plt.hist( tau_samples, bins = n_count_data, alpha = 0.85, \n",
|
653 | 660 | " label = r\"posterior of $\\tau$\",\n",
|
654 | 661 | " color=\"#467821\", normed = True, histtype='stepfilled' )\n",
|
655 |
| - "plt.legend(loc = \"upper left\");\n" |
| 662 | + "plt.legend(loc = \"upper left\");\n", |
| 663 | + "plt.ylim([0,10])\n", |
| 664 | + "plt.xlim([0, len(count_data)])\n", |
| 665 | + "plt.xlabel(\"days\")\n", |
| 666 | + "plt.ylabel(\"probability\")" |
656 | 667 | ],
|
657 | 668 | "language": "python",
|
658 | 669 | "metadata": {},
|
|
691 | 702 | "###Why would I want samples from the posterior, anyways?\n",
|
692 | 703 | "\n",
|
693 | 704 | "\n",
|
694 |
| - "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's finishing with using posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$, then the question is equivalent to *what is the expected value of $\\lambda$ at time $t$*?\n", |
| 705 | + "We will deal with this question for the remainder of the book, and it is an understatement to say we can perform amazingly useful things. For now, let's , let's end this chapter with one more example. We'll use the posterior samples to answer the following question: what is the expected number of texts at day $t, \\; 0 \\le t \\le70$? Recall that the expected value of a Poisson is equal to its parameter $\\lambda$, then the question is equivalent to *what is the expected value of $\\lambda$ at time $t$*?\n", |
695 | 706 | "\n",
|
696 |
| - "In the code below, we are calculating the following: Let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\\\\lambda_i$ for that day $t$, using $\\lambda_i = \\lambda_{1,i}$ if $t \\lt \\tau_i$ (that is, if the behaviour change hadn't occured yet), else we use $\\lambda_i = \\lambda_{2,i}$. \n", |
| 707 | + "In the code below, we are calculating the following: Let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\\lambda_i$ for that day $t$, using $\\lambda_i = \\lambda_{1,i}$ if $t \\lt \\tau_i$ (that is, if the behaviour change hadn't occured yet), else we use $\\lambda_i = \\lambda_{2,i}$. \n", |
697 | 708 | "\n",
|
698 | 709 | "\n"
|
699 | 710 | ]
|
|
711 | 722 | "collapsed": false,
|
712 | 723 | "input": [
|
713 | 724 | "figsize( 12.5, 4)\n",
|
| 725 | + "# tau_samples, lambda_1_samples, lambda_2_samples contain\n", |
| 726 | + "# N samples from the corresponding posterior distribution\n", |
714 | 727 | "N = tau_samples.shape[0]\n",
|
715 | 728 | "expected_texts_per_day = np.zeros(n_count_data)\n",
|
716 | 729 | "for day in range(0, n_count_data):\n",
|
| 730 | + " # ix is a bool index of all tau samples corresponding to\n", |
| 731 | + " # the switchpoint occuring prior to value of 'day'\n", |
717 | 732 | " ix = day < tau_samples\n",
|
| 733 | + " # Each posterior sample corresponds to a value for tau.\n", |
| 734 | + " # for each day, that value of tau indicates whether we're \"before\" (in the lambda1 \"regime\")\n", |
| 735 | + " # or \"after\" (in the lambda2 \"regime\") the switchpoint.\n", |
| 736 | + " # by taking the posterior sample of lambda1/2 accordingly, we can average\n", |
| 737 | + " # over all samples to get an expected value for lambda on that day.\n", |
| 738 | + " # As explained, the \"message count\" random variable is Poisson distributed, \n", |
| 739 | + " # and therefore lambda (the poisson parameter) is the expected value of \"message count\"\n", |
718 | 740 | " expected_texts_per_day[day] = (lambda_1_samples[ix].sum() \n",
|
719 | 741 | " + lambda_2_samples[~ix].sum() ) /N\n",
|
720 | 742 | "\n",
|
|
723 | 745 | "plt.xlim( 0, n_count_data )\n",
|
724 | 746 | "plt.xlabel( \"Day\" )\n",
|
725 | 747 | "plt.ylabel( \"Expected # text-messages\" )\n",
|
726 |
| - "plt.title( \"Expected number of text-messages recieved\")\n", |
| 748 | + "plt.title( \"Expected number of text-messages received\")\n", |
727 | 749 | "#plt.ylim( 0, 35 )\n",
|
728 | 750 | "plt.bar( np.arange( len(count_data) ), count_data, color =\"#348ABD\", alpha = 0.5,\n",
|
729 | 751 | " label=\"observed texts per day\")\n",
|
|
751 | 773 | "cell_type": "markdown",
|
752 | 774 | "metadata": {},
|
753 | 775 | "source": [
|
754 |
| - "Our analysis shows strong support that the user's behavior suddenly changed, versus no change ($\\lambda_1$ would appear like $\\lambda_2$ had this been true), versus a gradual change (more variation in the posterior of $\\tau$ had this been true). We can speculate what might have caused this: a cheaper text-message rate, a recent weather-2-text subscription, or a new relationship. (The 45th day corresponds to Christmas, and I moved away to Toronto the next month leaving a girlfriend behind)\n" |
| 776 | + "Our analysis shows strong support for believing the user's behavior did change ($\\lambda_1$ would have been been close in value to $\\lambda_2$ had this been true), and the the change was sudden rather then gradual (demonstrated by $\\tau$ 's strongly peaked posterior distribution). We can speculate what might have caused this: a cheaper text-message rate, a recent weather-2-text subscription, or a new relationship. (The 45th day corresponds to Christmas, and I moved away to Toronto the next month leaving a girlfriend behind)\n" |
755 | 777 | ]
|
756 | 778 | },
|
757 | 779 | {
|
|