|
35 | 35 | "The Philosophy of Bayesian Inference\n",
|
36 | 36 | "------\n",
|
37 | 37 | " \n",
|
38 |
| - "> You are a skilled programmer, but bugs still slip into your code. After a particularly difficult implementation of an algorithm, you decide to test your code on a trivial example. It passes. You test the code on a harder problem. It passes once again. And it passes the next, *even more difficult*, test too! You are starting to believe that there are may be no bugs present...\n", |
| 38 | + "> You are a skilled programmer, but bugs still slip into your code. After a particularly difficult implementation of an algorithm, you decide to test your code on a trivial example. It passes. You test the code on a harder problem. It passes once again. And it passes the next, *even more difficult*, test too! You are starting to believe that there may be no bugs in this code...\n", |
39 | 39 | "\n",
|
40 |
| - "If you think this way, then congratulations, you already are a Bayesian practitioner! Bayesian inference is simply updating your beliefs after considering new evidence. A Bayesian can rarely be certain about a result, but he or she can be very confident. Just like in the example above, we can never be 100% sure that our code is bug-free unless we test it on every possible problem; something rarely possible in practice. Instead, we can test it on a large number of problems, and if it succeeds we can feel more *confident* about our code. Bayesian inference works identically: we can only update our beliefs about an outcome, rarely can we be absolutely sure unless we rule out all other alternatives. We will see that being uncertain can have its advantages. \n" |
| 40 | + "If you think this way, then congratulations, you already are a Bayesian practitioner! Bayesian inference is simply updating your beliefs after considering new evidence. A Bayesian can rarely be certain about a result, but he or she can be very confident. Just like in the example above, we can never be 100% sure that our code is bug-free unless we test it on every possible problem; something rarely possible in practice. Instead, we can test it on a large number of problems, and if it succeeds we can feel more *confident* about our code. Bayesian inference works identically: we update our beliefs about an outcome; rarely can we be absolutely sure unless we rule out all other alternatives. \n" |
41 | 41 | ]
|
42 | 42 | },
|
43 | 43 | {
|
|
48 | 48 | "###The Bayesian state of mind\n",
|
49 | 49 | "\n",
|
50 | 50 | "\n",
|
51 |
| - "Bayesian inference differs from more traditional statistical analysis by preserving *uncertainty* about our beliefs. At first, this sounds like a bad statistical technique. Isn't statistics all about deriving *certainty* from randomness? The Bayesian method interprets probability as measure of *believability in an event*, that is, how confident one is in an event occuring. In fact, we will see in a moment that this is the natural interpretation of probability. \n", |
| 51 | + "Bayesian inference differs from more traditional statistical inference by preserving *uncertainty* about our beliefs. At first, this sounds like a bad statistical technique. Isn't statistics all about deriving *certainty* from randomness? To reconcile this, we need to start thinking like Bayesians. \n", |
52 | 52 | "\n",
|
53 |
| - "For this to be clearer, we consider an alternative interpretation of probability: *Frequentist* methods assume that probability is the long-run frequency of events (hence the bestowed title). For example, the *probability of plane accidents* under a frequentist philosophy is interpreted as the *long-term frequency of plane accidents*. This makes logical sense for many probabilities and events, but becomes more difficult to understand when events have no long-term frequency of occurences. Consider: we often assign probabilities to outcomes of presidential elections, but the election itself only happens once! Frequentists get around this by invoking alternative realities and saying across all these universes, the frequency of occurences is the probability. \n", |
| 53 | + "The Bayesian world-view interprets probability as measure of *believability in an event*, that is, how confident we are in an event occuring. In fact, we will see in a moment that this is the natural interpretation of probability. \n", |
54 | 54 | "\n",
|
55 |
| - "Bayesians, on the other hand, have a more intuitive approach. Bayesians interpret a probability as measure of *belief*, or confidence, of an event occurring. An individual who assigns a belief of 0 to an event has no confidence that the event will occur; conversely, assigning a belief of 1 implies that the individual is absolutely certain of an event occurring. Beliefs between 0 and 1 allow for weightings of other outcomes. This definition agrees with the probability of a plane accident example, for having observed the frequency of plane accidents, an individual's belief should be equal to that frequency. Similarly, under this definition of probability being equal to beliefs, it is clear how we can speak about probabilities (beliefs) of presidential election outcomes. \n", |
| 55 | + "For this to be clearer, we consider an alternative interpretation of probability: *Frequentist* methods assume that probability is the long-run frequency of events (hence the bestowed title). For example, the *probability of plane accidents* under a frequentist philosophy is interpreted as the *long-term frequency of plane accidents*. This makes logical sense for many probabilities of events, but becomes more difficult to understand when events have no long-term frequency of occurences. Consider: we often assign probabilities to outcomes of presidential elections, but the election itself only happens once! Frequentists get around this by invoking alternative realities and saying across all these universes, the frequency of occurences defines the probability. \n", |
56 | 56 | "\n",
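As a minimal sketch of this long-run-frequency interpretation (the seed and flip counts below are arbitrary choices for illustration), the following simulation flips a fair coin many times; the running frequency of heads settles toward 1/2 as the number of flips grows.

```python
# Sketch: frequentist probability as a long-run frequency.
# The running fraction of heads approaches 1/2 for a fair coin.
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible

for n in [10, 100, 1_000, 10_000, 100_000]:
    flips = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(f"{n:>7} flips: frequency of heads = {flips.mean():.4f}")
```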
|
57 |
| - "Notice in the paragraph above, I assigned the belief (probability) measure to an *individual*, not to Nature. This is very interesting, as this definition leaves room for conflicting beliefs between individuals. Again, this is appropriate for what naturally occurs: different individuals have different beliefs of events occuring, because they possess different *information* about the world.\n", |
| 57 | + "Bayesians, on the other hand, have a more intuitive approach. Bayesians interpret a probability as measure of *belief*, or confidence, of an event occurring. An individual who assigns a belief of 0 to an event has no confidence that the event will occur; conversely, assigning a belief of 1 implies that the individual is absolutely certain of an event occurring. Beliefs between 0 and 1 allow for weightings of other outcomes. This definition agrees with the probability of a plane accident example, for having observed the frequency of plane accidents, an individual's belief should be equal to that frequency, exluding any outside information. Similarly, under this definition of probability being equal to beliefs, it is clear how we can speak about probabilities (beliefs) of presidential election outcomes: how confident are you candidate A will win?\n", |
58 | 58 | "\n",
|
59 |
| - "- I flip a coin, and we both guess the result. We would both agree, assuming the coin is fair, that the probability of heads if 1/2. Assume, then, that I peek at the coin. Now I know for certain what the result is: I assign probability 1.0 to either heads or tails. Now what is *your* belief that the coin is heads? My knowledge of the outcome has not changed the coin's results. Thus we assign different probabilities to the result. \n", |
| 59 | + "Notice in the paragraph above, I assigned the belief (probability) measure to an *individual*, not to Nature. This is very interesting, as this definition leaves room for conflicting beliefs between individuals. Again, this is appropriate for what naturally occurs: different individuals have different beliefs of events occuring, because they possess different *information* about the world. Consider the following examples demonstrating the relationship between individual beliefs and probabilties:\n", |
| 60 | + "\n", |
| 61 | + "- I flip a coin, and we both guess the result. We would both agree, assuming the coin is fair, that the probability of heads is 1/2. Assume, then, that I peek at the coin. Now I know for certain what the result is: I assign probability 1.0 to either heads or tails. Now what is *your* belief that the coin is heads? My knowledge of the outcome has not changed the coin's results. Thus we assign different probabilities to the result. \n", |
60 | 62 | "\n",
|
61 | 63 | "- Your code either has a bug in it or not, but we do not know for certain which is true. Though we have a belief about the presence or absence of a bug. \n",
|
62 | 64 | "\n",
|
|
66 | 68 | "\n",
|
67 | 69 | "This philosophy of treating beliefs as probability is natural to humans. We employ it constantly as we interact with the world and only see partial evidence. Alternatively, you have to be *trained* to think like a frequentist. \n",
|
68 | 70 | "\n",
|
69 |
| - "To align ourselves with traditional probability notation, we denote our belief about event $A$ as $P(A)$.\n", |
| 71 | + "To align ourselves with traditional probability notation, we denote our belief about event $A$ as $P(A)$. We call this quantity the *prior probability*.\n", |
70 | 72 | "\n",
|
71 |
| - "John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even --especially-- if the evidence is counter to what was initially believed, the evidence cannot be ignored. We denote our updated belief as $P(A |X )$, interpreted as the probability of $A$ given the evidence $X$. We call the updated belief the *posterior probability* so as to contrast it with the *prior probability*. For example, consider the posterior probabilities (read: posterior belief) of the above examples, after observing some evidence $X$.:\n", |
| 73 | + "John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even --especially-- if the evidence is counter to what was initially believed, the evidence cannot be ignored. We denote our updated belief as $P(A |X )$, interpreted as the probability of $A$ given the evidence $X$. We call the updated belief the *posterior probability* so as to contrast it with the prior probability. For example, consider the posterior probabilities (read: posterior beliefs) of the above examples, after observing some evidence $X$.:\n", |
72 | 74 | "\n",
|
73 |
| - "1\\. $P(A): \\;\\;$ the coin has a 50 percent chance of being heads. $P(A | X):\\;\\;$ You look at the coin, observe a heads, denote this information $X$, and trivially assign probability 1.0 to heads and 0.0 to tails.\n", |
| 75 | + "1\\. $P(A): \\;\\;$ the coin has a 50 percent chance of being heads. $P(A | X):\\;\\;$ You look at the coin, observe a heads has landed, denote this information $X$, and trivially assign probability 1.0 to heads and 0.0 to tails.\n", |
74 | 76 | "\n",
|
75 | 77 | "2\\. $P(A): \\;\\;$ This big, complex code likely has a bug in it. $P(A | X): \\;\\;$ The code passed all $X$ tests; there still might be a bug, but its presence is less likely now.\n",
|
76 | 78 | "\n",
|
77 | 79 | "3\\. $P(A):\\;\\;$ The patient could have any number of diseases. $P(A | X):\\;\\;$ Performing a blood test generated evidence $X$, ruling out some of the possible diseases from consideration.\n",
|
78 | 80 | "\n",
|
79 | 81 | "4\\. $P(A):\\;\\;$ You believe that the probability that the lovely girl in your class likes you is low. $P(A | X): \\;\\;$ She sent you an SMS message about this Friday night. Interesting... \n",
|
80 | 82 | "\n",
|
81 |
| - "It's clear that in each example we did not completely discard the prior belief after seeing new evidence, but we *re-weighted the prior* to incorporate the new evidence (i.e. we put more weight, or confidence, on some beliefs versus others). \n", |
| 83 | + "It's clear that in each example we did not completely discard the prior belief after seeing new evidence $X$, but we *re-weighted the prior* to incorporate the new evidence (i.e. we put more weight, or confidence, on some beliefs versus others). \n", |
82 | 84 | "\n",
|
83 | 85 | "By introducing prior uncertainity about events, we are already admitting that any guess we make is potentially very wrong. After observing data, evidence, or other information, and we update our beliefs, our guess becomes *less wrong*. This is the alternative side of the prediction coin, where typically we try to be *more right*.\n"
|
84 | 86 | ]
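To see this re-weighting in numbers, here is a minimal sketch of example 2, applying Bayes' theorem sequentially as tests pass. The 0.2 prior and the 50% chance that buggy code slips past a single test are assumed figures chosen purely for illustration, not values from the text.

```python
# Sketch of example 2: updating the belief that code is bug-free as
# tests pass. The prior (0.2) and the chance that buggy code passes a
# single test (0.5) are assumed numbers, chosen only for illustration.
prior_bug_free = 0.2       # initial belief P(A): the code is bug-free
p_pass_if_bug_free = 1.0   # bug-free code always passes a test
p_pass_if_buggy = 0.5      # buggy code can still pass a test by luck

belief = prior_bug_free
for test in range(1, 11):
    # Bayes' theorem: P(A | passes) = P(passes | A) * P(A) / P(passes)
    numerator = p_pass_if_bug_free * belief
    denominator = numerator + p_pass_if_buggy * (1.0 - belief)
    belief = numerator / denominator  # posterior becomes the next prior
    print(f"after test {test:2d} passes: P(bug-free) = {belief:.3f}")
```

Each passed test re-weights, rather than replaces, the previous belief: confidence grows toward 1 but never reaches it, which is exactly the "less wrong" behavior described above.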