ZhenghaoXiao32/propensity_scores_r


Propensity score analysis using R

Introduction

Rubin's Causal Model

We start with Rubin's causal model. The causal effect of a treatment ($T$) over the control ($C$), for a given individual or unit ($u$) and an interval of time from 0 to 1 ($t_0$ and $t_1$), is the difference between what would have happened at time 1 if the unit had been exposed to treatment initiated at time 0 ($Y_{t_1}(T, u)$) and what would have happened at time 1 if the unit had been exposed to control initiated at time 0 ($Y_{t_1}(C, u)$).
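
The definition above can be written as a single equation. The notation is an assumption chosen for illustration: $T$ and $C$ label treatment and control, $u$ is the unit, $t_1$ is time 1, and $\delta_u$ is a hypothetical name for the unit-level causal effect.

```latex
\[
  \delta_u = Y_{t_1}(T, u) - Y_{t_1}(C, u)
\]
```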

In Rubin's causal model, we have in fact four potential outcomes:

  • at time 0 with control: $Y_{t_0}(C, u)$
  • at time 0 with treatment: $Y_{t_0}(T, u)$
  • at time 1 with control: $Y_{t_1}(C, u)$
  • at time 1 with treatment: $Y_{t_1}(T, u)$

For example, suppose we want to test whether a new drug reduces Tom's blood pressure. We measure his blood pressure at time 0 to get a baseline reading, ask him to take the pill, and measure again after an hour, which gives his outcome at time 1 under treatment, $Y_{t_1}(T, u)$. This is not enough to determine the causal effect: we also need $Y_{t_1}(C, u)$, Tom's blood pressure at time 1 if he had not taken the pill (or had taken a placebo).

Notice that in this example, what would have happened at time 1 under control is impossible to observe for an individual who was in fact exposed to treatment at time 0. We cannot observe the results of two different conditions on the same person at the same time, yet both are needed to calculate the causal effect of a treatment. This is called the fundamental problem of causal inference.
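
A minimal R sketch can make the missing-data nature of the problem concrete. Everything here is hypothetical: the five simulated people, the assumed constant drug effect of -8, and the variable names are illustrative only. Both potential outcomes are visible in the simulation only because we generate them ourselves.

```r
# Hypothetical simulated data: in real studies only one potential outcome
# per person is ever observed.
set.seed(1)
n <- 5
y_control <- round(rnorm(n, mean = 140, sd = 10))  # blood pressure at time 1 without the drug
y_treated <- y_control - 8                         # assumed constant drug effect of -8
took_pill <- c(1, 0, 1, 1, 0)                      # who actually received the drug

# What an analyst would actually see: the unobserved outcome is missing (NA).
seen_treated <- ifelse(took_pill == 1, y_treated, NA)
seen_control <- ifelse(took_pill == 0, y_control, NA)

# The individual effect needs both columns, so it is NA for every person.
observed <- data.frame(took_pill, seen_control, seen_treated,
                       individual_effect = seen_treated - seen_control)
print(observed)
```

Every row of `individual_effect` is `NA`, because no person contributes both a treated and a control measurement.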

Stable Unit Treatment Value Assumption

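We need an assumption to make progress on the fundamental problem. The assumption is: "The potential outcome of the observation on one unit should be unaffected by the particular assignment of treatments to the other units" (Cox 1958). A related condition that is usually stated as a formula is strong ignorability of treatment assignment: given observed covariates $X$, the potential outcomes are independent of the treatment indicator $Z$, that is, $(Y(T), Y(C)) \perp Z \mid X$ (Rosenbaum & Rubin, 1983).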

An example illustrates a violation: suppose Tom's change in blood pressure depends on whether or not Jerry receives the drug. Assume Tom and Jerry live together and Jerry always cooks. The drug changes Jerry's blood pressure and makes him want to eat salty food, so Jerry cooks with more salt, which raises Tom's blood pressure. In that situation, Tom's potential outcome depends on which treatment Jerry receives. That is a violation of the stable unit treatment value assumption.
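
The Tom and Jerry scenario can be sketched in a few lines of R. The outcome function `tom_bp` and its numbers (a baseline of 140, a drug effect of -8, a salty-food effect of +5) are assumptions made up for illustration, not estimates from any data.

```r
# Hypothetical outcome function: Tom's blood pressure depends on his own
# treatment AND on Jerry's treatment (through the saltier cooking).
tom_bp <- function(tom_treated, jerry_treated) {
  140 - 8 * tom_treated + 5 * jerry_treated  # assumed numbers
}

# Enumerate every joint assignment of the two housemates.
assignments <- expand.grid(tom_treated = c(0, 1), jerry_treated = c(0, 1))
assignments$tom_bp <- with(assignments, tom_bp(tom_treated, jerry_treated))
print(assignments)
```

Under the assumption of no interference, Tom would have two potential outcomes (treated and untreated). Here he has four, one per joint assignment, so "Tom's outcome under treatment" is not a single well-defined quantity.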

How We Solve The Problem with Experimental Design

In essence, there are two ways to address the fundamental problem of causal inference. The first is to repeat the experiment on the same individual or unit (repeated measures). Because the unit is the same, the main source of noise is the difference in time. We generally need to assume that the responses of the individual, say Tom, are independent of each other. This assumption may fail in practice: if the blood concentration of the drug takes some time (maybe a month) to decrease and the interval between measurements is shorter than that, the assumption is violated. However, in experimental design we can use a particular design to cancel this bias out:

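| Month      | First month | Second month | Third month | Fourth month |
|------------|-------------|--------------|-------------|--------------|
| Last month | control     | treatment    | treatment   | control      |
| This month | treatment   | treatment    | control     | control      |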

The drawback of this design is that it increases the number of treatment assignments and makes the average causal effect harder to calculate.

The second method is to use multiple subjects (groups). We can recruit multiple people for the hypertension drug test and assign them to a treatment group and a control group. Random assignment balances the differences between individuals across the groups on average. However, we cannot control everything: in most studies, neither random assignment nor manipulation of conditions is possible. Research designs that estimate treatment effects without random assignment to conditions are called quasi-experimental or nonexperimental designs. Propensity scores, and the methods that apply them, are designed to reduce bias in quasi-experimental designs.
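
The contrast between random assignment and a quasi-experimental setting can be sketched with a small simulation. All quantities are hypothetical: the baseline blood pressure covariate, the assumed true effect of -8 for everyone, and the selection rule in which people with higher baseline pressure are more likely to take the drug.

```r
set.seed(42)
n <- 10000
baseline  <- rnorm(n, mean = 140, sd = 15)  # hypothetical covariate: baseline blood pressure
y_control <- baseline + rnorm(n, sd = 5)    # potential outcome without the drug
y_treated <- y_control - 8                  # assumed true effect: -8 for everyone

# Naive estimator: difference in observed group means.
diff_in_means <- function(z) {
  y_obs <- ifelse(z == 1, y_treated, y_control)
  mean(y_obs[z == 1]) - mean(y_obs[z == 0])
}

# (a) Random assignment: a coin flip, unrelated to baseline.
z_random   <- rbinom(n, 1, 0.5)
est_random <- diff_in_means(z_random)

# (b) Self-selection: higher baseline pressure makes treatment more likely.
z_select   <- rbinom(n, 1, plogis((baseline - 140) / 10))
est_select <- diff_in_means(z_select)

print(c(true_effect = -8, randomized = est_random, self_selected = est_select))
```

With random assignment the naive difference in means lands close to the assumed true effect. With self-selection the treated group starts out with higher blood pressure, so the same estimator is badly biased; this is the kind of bias that propensity score methods aim to reduce.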

Types of Treatment Effects

Before we get started with propensity scores, we still need to know about the types of treatment effects:

  • The average treatment effect (ATE) is the difference between the expected values of the potential outcomes of all individuals in the treated and untreated conditions, where $Y(T)$ and $Y(C)$ denote an individual's potential outcomes under treatment and control: $ATE = E[Y(T)] - E[Y(C)]$
  • The average treatment effect on the treated (ATT) is the difference between the expected values of the potential outcomes of treated individuals ($Z = 1$): $ATT = E[Y(T) \mid Z = 1] - E[Y(C) \mid Z = 1]$
  • The average treatment effect on the untreated (control) (ATC) is the difference between the expected values of the potential outcomes of the untreated individuals ($Z = 0$): $ATC = E[Y(T) \mid Z = 0] - E[Y(C) \mid Z = 0]$

In experimental designs, the ATE is equal to the ATT and the ATC because of the balance achieved by random assignment. In quasi-experimental designs, these values can differ substantially, so which type of treatment effect we can estimate depends on whether balance was achieved for the particular group.
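
The three effects can be computed directly in a simulation where both potential outcomes are known. This is a sketch under stated assumptions: the `severity` covariate, the effect that grows with severity, and the selection rule are all hypothetical and chosen only so that the effects differ.

```r
set.seed(7)
n <- 10000
severity <- rnorm(n)                           # hypothetical covariate
y0 <- 140 + 10 * severity + rnorm(n, sd = 5)   # potential outcome under control
y1 <- y0 - 8 - 4 * severity                    # assumed: larger benefit for severe cases

# Average the unit-level effects over everyone, the treated, and the untreated.
effects <- function(z) {
  c(ATE = mean(y1 - y0),
    ATT = mean((y1 - y0)[z == 1]),
    ATC = mean((y1 - y0)[z == 0]))
}

# Quasi-experimental setting: severe cases are more likely to be treated.
z_select <- rbinom(n, 1, plogis(severity))
eff_select <- effects(z_select)

# Experimental setting: treatment assigned by coin flip.
z_random <- rbinom(n, 1, 0.5)
eff_random <- effects(z_random)

print(rbind(self_selected = eff_select, randomized = eff_random))
```

Under self-selection the treated group contains more severe cases, so the ATT is more negative than the ATE and the ATC is less negative. Under random assignment the three values agree up to sampling noise. In both settings the ATE is the average of the ATT and the ATC weighted by the share of treated and untreated individuals.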

Propensity Scores

Propensity Scores Matching

Propensity Scores Weighting

Propensity Scores Stratification

Contents

  • Propensity scores weighting for ATT
  • Propensity scores weighting for ATE
  • Propensity scores stratification for ATT
  • Propensity scores stratification for ATE
  • Propensity scores matching for ATT
  • Propensity scores matching for ATE

Maintenance

  • Fix unclear variable names in stratification part

License

The contents of this repository are covered under the MIT License.
