class: center, middle, inverse, title-slide

# Observational studies
## What If: Chapter 3
### Elena Dudukina
### 2020-11-12

---

# Observational studies as conditionally randomized experiments

- If three assumptions hold
  * Consistency: well-defined intervention (or all versions of the treatment are captured)
  * Exchangeability: conditional probability of receiving each level of the treatment depends only on measured covariate(s), L
  * Positivity: the probability of receiving each level of treatment conditional on L is greater than zero, i.e., positive
- In addition, non-interference: the potential outcome (PO) of one individual does not depend on the treatment values of other individuals
- These conditions are identifiability conditions
- Causal interpretation = data + assumptions
- Identifiability assumptions can be tracked on a DAG
- In ideal randomized experiments the identifiability conditions hold by design

---

# Instrumental variables

- Require different assumptions and a different set of identifiability conditions

---

# Exchangeability

- `\(Y^{a}\perp\perp\ A\)`
- Had the treated been untreated, their risk of PO `\(Y^{a}\)` would have been the same as in the untreated
- Confounding is a lack of exchangeability
- Confounders are variables that, when adjusted for, restore exchangeability (remove confounding)
- Untestable

---

```r
library(dplyr)  # provides %>%, group_by(), count(), mutate(), filter()

# association: crude risk of Y_obs = 1 by treatment A
greek_gods_condrand %>% 
  group_by(A) %>% 
  count(Y_obs) %>% 
  mutate(
    denominator = sum(n),               # group size within each level of A
    risk = round(n/sum(n), digits = 2)
  ) %>% 
  filter(Y_obs == 1)
```

```
## # A tibble: 2 x 5
## # Groups:   A [2]
##       A Y_obs     n denominator  risk
##   <dbl> <dbl> <int>       <int> <dbl>
## 1     0     1     3           7  0.43
## 2     1     1     7          13  0.54
```

---

```r
# when controlling confounding by L using stratification:
# risk of Y_obs = 1 by treatment A within each stratum of L
greek_gods_condrand %>% 
  group_by(L, A) %>% 
  count(Y_obs) %>% 
  mutate(
    denominator = sum(n),               # group size within each (L, A) cell
    risk = round(n/sum(n), digits = 2)
  ) %>% 
  filter(Y_obs == 1)
```

```
## # A tibble: 4 x 6
## # Groups:   L, A [4]
##       L     A Y_obs     n denominator  risk
##   <dbl> <dbl> <dbl> <int>       <int> <dbl>
## 1     0     0     1     1           4  0.25
## 2     0     1     1     1           4  0.25
## 3     1     0     1     2           3  0.67
## 4     1     1     1     6           9  0.67
```

---

# Conditionally randomized experiment

- If L is the only source of confounding and conditional exchangeability holds `\(Y^{a}\perp\perp\ A|L\)`
  * this is "an observational study in which the probability of treatment A = 1 is 0.75 among those with L = 1 and 0.50 among those with L = 0"
  * this is "a (non blinded) conditionally randomized experiment in which investigators randomly assigned treatment A = 1 with probability 0.75 to those with L = 1 and 0.50 to those with L = 0"

```r
# probability of treatment A = 1 within each stratum of L
greek_gods_condrand %>% 
  group_by(L) %>% 
  count(A) %>% 
  mutate(
    denominator = sum(n),               # stratum size
    pr_A = round(n/sum(n), digits = 2)
  ) %>% 
  filter(A == 1)
```

```
## # A tibble: 2 x 5
## # Groups:   L [2]
##       L     A     n denominator  pr_A
##   <dbl> <dbl> <int>       <int> <dbl>
## 1     0     1     4           8  0.5 
## 2     1     1     9          12  0.75
```

---

# Expert knowledge

- Since exchangeability is untestable, domain knowledge is needed to judge whether the exchangeability assumption is plausible

---

# Positivity

- Positive probability of observing each level of treatment in each stratum of L
- `\(Pr[A=a|L=l] > 0\)` for all `\(a\)` and `\(l\)`
- Only relevant for variables L required for exchangeability
- Can be empirically verified (see chapter 12)

---

# Consistency

- We observe one PO: the one under the treatment actually received
- `\(Pr[Y^{a=1}=1|A=1] = Pr[Y=1|A=1]\)`
- Unpacking consistency
  * definition of `\(Y^{a=1}\)` via detailed `\(a\)` (given value of treatment)
  * linking the observed and the counterfactual outcome

---

# Well-defined intervention paradigm `\(Y^{a}\)`

- Treatment as several versions of the intervention
- Are all versions observed and measured?
- Do all versions of the treatment have the same causal effect?
- Ill-defined values of `\(a\)` lead to ill-defined POs `\(Y^{a}\)` under the levels of treatment, so the causal contrast `\(Pr[Y^{a=1}=1] - Pr[Y^{a=0}=1]\)` is ill-defined as well
- Obesity/weight-loss example: duration, frequency, intensity, and type of the intervention that makes individuals "less obese"
- Challenging causal questions involving biological and social constructs/SES (p. 34)
- Sufficiently well-defined: specified in enough detail for causal inference
- Domain knowledge
- Communication of the results

---

# Counterfactuals and observed data

- "Hypothetical intervention" must be linked to the actually observed version of treatment; otherwise the mathematical statement of consistency `\(Pr[Y^{a=1}=1|A=1] = Pr[Y=1|A=1]\)` cannot be translated into the "real world" and no causal inference is possible

--

- Data granularity

--

- When dealing with treatments with multiple versions --> assuming treatment variation irrelevance

--

- Transparency

---

# Target trial

- Causal effect: a contrast between average counterfactual outcomes under different treatment values

--

- Imagine a (hypothetical) randomized experiment to quantify it

--

- Components of the "protocol"
  - Eligibility criteria
  - Interventions (or treatment strategies)
  - Outcome(s)
  - Follow-up
  - Causal contrast
  - Statistical analysis

--

- Oversimplified analysis example
  - Contrasting the risk of death in obese vs non-obese individuals means emulating a target trial in which obese individuals instantaneously become non-obese

---

# Causal effect, Pearl 2018

- Does Obesity Shorten Life? Or is it the Soda? On Non-manipulable Causes
- Factors that do not fit the experimentalist concept of causation

**44% of U.S. adults could be obese by 2030, compared to 35.7% in 2012**

- $66 billion a year in obesity-related medical costs
- New York City adopted a regulation banning the sale of sugary drinks in containers larger than 16 ounces at restaurants
- Is obesity well-defined, and do "obesity-related medical costs" exist?
---

# Arguments against obesity as an exposure

- BMI, used to measure the degree of obesity, is only a proxy for it

--

- Obesity is not a well-defined "intervention" --> consistency does not hold

--

- Ill-defined intervention may undermine the exchangeability logic
  - If we cannot define the exposure, we cannot define what may confound its effect on the outcome

--

- Ill-defined intervention may threaten positivity
  - Restricting the data to confounder levels within which positivity holds may result in a population different from the original one

---

# Arguments for obesity as an exposure (in short)

- Practical interventions may have side effects --> yet, are deemed well-defined

--

- Root of obesity being an ill-defined "intervention": the consequences of obesity depend on how we "manipulate" it

--

- At the same time, the quantity `\(Pr(mortality|do(obesity))\)` (notation of PO using the do-operator; means the same as `\(Pr[mortality^{obesity=1}=1]\)`) describes an intervention set by nature (via complex processes)

--

- Causal effects of anatomical/physiological conditions may be described in terms of their presence/absence, not necessarily via the means by which they can be manipulated

---

# Take home messages

- Define the causal question
- Define the exposure. Does it have one version or several? Is inference still possible?
- Can conditional exchangeability be reached given current domain knowledge?
- Is prediction a better target when the exposure cannot be sufficiently well-defined?
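
---

# Appendix: checking positivity empirically (sketch)

The positivity condition `\(Pr[A=a|L=l] > 0\)` from the earlier slide can be inspected by cross-tabulating treatment against the covariate. Below is a minimal base R sketch, not the code behind these slides. The data frame `greek_gods_sketch` is a hypothetical reconstruction: its cell sizes are copied from the `pr_A` output shown earlier, because the original `greek_gods_condrand` object is not defined in the deck.

```r
# Hypothetical reconstruction: one row per individual, with (L, A) cell sizes
# copied from the slide output (L = 0: 4 of 8 treated; L = 1: 9 of 12 treated).
greek_gods_sketch <- data.frame(
  L = c(rep(0, 8), rep(1, 12)),
  A = c(rep(0, 4), rep(1, 4), rep(0, 3), rep(1, 9))
)

# Cross-tabulate L by A; a combination of observed levels with no
# individuals shows up as a zero cell instead of being dropped.
cell_counts <- table(L = greek_gods_sketch$L, A = greek_gods_sketch$A)

# Positivity holds empirically if no (L, A) cell is empty
positivity_holds <- all(cell_counts > 0)

# Estimated Pr[A = a | L = l]: row proportions within each stratum of L
pr_A_given_L <- prop.table(cell_counts, margin = 1)

print(cell_counts)
print(positivity_holds)
print(round(pr_A_given_L, 2))
```

In this reconstructed data all four cells are non-empty, and the estimated probabilities of `A = 1` are 0.5 for `L = 0` and 0.75 for `L = 1`, matching the values on the conditionally randomized experiment slide. An empty cell would only flag a violation in the observed data; whether it is structural or due to chance remains a matter of domain knowledge.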