Definitions and Terminology
Understanding the terminology and concepts of causal inference is essential for grasping its methods and their applications. This chapter is designed as a vade mecum, a quick reference you can consult whenever you have a doubt. Whether you are just starting a causal inference course or revisiting the concepts after a few months, this glossary offers a refresher on the terms you need to remember.
Causal Structures and Graphical Representations
Causal graph: a visual representation of causal relationships among a set of variables, often depicted as a directed acyclic graph (DAG).
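Collider: a variable on a path that is a common effect of two variables on that path (for example, C in X → C ← Y). Conditioning on a collider, or on one of its descendants, opens the path and can induce a spurious association between its causes (collider bias).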
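A small simulation can make collider bias concrete. The sketch below assumes a hypothetical linear model in which X and Y are independent standard normal variables and C = X + Y + noise is their common effect; "conditioning" is mimicked by keeping only units with C > 1, a threshold chosen purely for illustration.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 20_000
# X and Y are independent causes; C is their common effect (a collider).
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
c = [xi + yi + rng.gauss(0, 0.5) for xi, yi in zip(x, y)]

# Marginally, X and Y are (close to) uncorrelated.
r_all = corr(x, y)

# Selecting on the collider opens the path X -> C <- Y.
selected = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
r_selected = corr([p[0] for p in selected], [p[1] for p in selected])

print(f"corr(X, Y) overall:     {r_all:+.3f}")
print(f"corr(X, Y) given C > 1: {r_selected:+.3f}")
```

Among the selected units, a high X "explains away" the high C, so X and Y become negatively correlated even though neither causes the other.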
Confounder: a variable that influences both the treatment and the outcome, potentially biasing the estimated effect of the treatment.
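The following sketch assumes a hypothetical linear model where a confounder Z drives both the treatment T and the outcome Y, while T has no effect on Y at all. The naive regression slope of Y on T is clearly non-zero; partialling Z out of both variables (the Frisch-Waugh-Lovell approach to adjusting for Z in a linear regression) brings it back to roughly zero.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(ys, xs):
    """Residuals from regressing ys on xs (with intercept)."""
    b = slope(xs, ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return [y - my - b * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(1)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]          # confounder
t = [zi + rng.gauss(0, 1) for zi in z]           # treatment depends on Z
y = [2 * zi + rng.gauss(0, 1) for zi in z]       # outcome depends on Z only

naive = slope(t, y)                              # picks up the path T <- Z -> Y
adjusted = slope(residuals(t, z), residuals(y, z))  # Z partialled out of T and Y
print(f"naive slope: {naive:.3f}, adjusted slope: {adjusted:.3f}")
```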
Directed acyclic graph (DAG): a graphical representation of causal relationships between variables. Each node represents a variable, and each directed edge represents a direct causal effect of one variable on another. The graph is acyclic: no directed path starts and ends at the same node, so no variable can be a cause of itself.
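Acyclicity can be checked mechanically. The sketch below uses Kahn's algorithm: a directed graph is acyclic exactly when all of its nodes can be placed in a topological order. Representing the graph as a dictionary mapping each node to its children is an assumption made for illustration, as are the example node names.

```python
from collections import deque

def topological_order(edges):
    """Return a topological order of the nodes, or None if there is a directed cycle.

    `edges` maps each node to the list of nodes it points to (its children).
    Kahn's algorithm repeatedly removes nodes that have no remaining incoming edges.
    """
    nodes = set(edges) | {c for cs in edges.values() for c in cs}
    indegree = {v: 0 for v in nodes}
    for children in edges.values():
        for c in children:
            indegree[c] += 1
    queue = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in edges.get(v, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes never released lie on, or downstream of, a directed cycle.
    return order if len(order) == len(nodes) else None

# Z confounds T and Y; T affects Y through the mediator M.
dag = {"Z": ["T", "Y"], "T": ["M"], "M": ["Y"]}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(topological_order(dag))
print(topological_order(cyclic))
```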
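D-separation: a graphical criterion for reading off conditional independencies between two sets of nodes X and Y in a directed acyclic graph (DAG), given a third set of nodes Z. A path is blocked by Z if it contains a chain (A → B → C) or a fork (A ← B → C) whose middle node is in Z, or a collider (A → B ← C) such that neither B nor any of its descendants is in Z. X and Y are d-separated by Z if every path between them is blocked.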
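Checking every path by hand quickly becomes tedious. The sketch below instead uses the moralised ancestral graph test, a criterion known to be equivalent to d-separation: keep only the ancestors of X, Y and Z, connect parents that share a child, drop edge directions, delete Z, and check whether X can still reach Y. The function name and the dictionary-of-parents input format are illustrative choices, not a specific library's API.

```python
from collections import deque

def d_separated(parents, xs, ys, zs):
    """Return True if node sets xs and ys are d-separated by zs in a DAG.

    `parents` maps each node to the list of its parents.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Ancestral set of all nodes involved (including the nodes themselves).
    ancestors, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in ancestors:
            ancestors.add(v)
            stack.extend(parents.get(v, []))
    # 2. Moralise: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in ancestors}
    for child in ancestors:
        ps = parents.get(child, [])
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # 3. Delete the conditioning set and search for a path from xs to ys.
    seen = set(xs - zs)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False  # still connected, so not d-separated
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

collider = {"C": ["X", "Y"], "D": ["C"]}   # X -> C <- Y, C -> D
chain = {"M": ["X"], "Y": ["M"]}           # X -> M -> Y
print(d_separated(collider, {"X"}, {"Y"}, set()))   # collider blocks the path
print(d_separated(collider, {"X"}, {"Y"}, {"D"}))   # conditioning on a descendant opens it
print(d_separated(chain, {"X"}, {"Y"}, {"M"}))      # conditioning on the middle node blocks it
```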
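Instrumental variable (IV): a variable used to estimate causal effects when controlled experiments are not feasible and the treatment is confounded by unobserved factors. A valid instrument (1) affects the treatment (relevance), (2) has no effect on the outcome except through the treatment (exclusion restriction), and (3) shares no unobserved common causes with the outcome (independence).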
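As a sketch under a hypothetical linear model, the simulation below has an unobserved confounder U that biases the ordinary regression slope, while an instrument Z satisfying the three conditions above recovers the true effect (set to 1.5 here). With a single instrument and a single treatment, the ratio cov(Z, Y) / cov(Z, T), often called the Wald estimator, coincides with two-stage least squares.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

rng = random.Random(2)
n = 20_000
u = [rng.gauss(0, 1) for _ in range(n)]                    # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]                    # instrument, independent of U
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]    # treatment
y = [1.5 * ti + 2 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]  # true effect = 1.5

naive_ols = cov(t, y) / cov(t, t)   # biased: absorbs part of U's effect on Y
iv_wald = cov(z, y) / cov(z, t)     # IV estimate using only variation driven by Z
print(f"OLS: {naive_ols:.3f}, IV: {iv_wald:.3f}")
```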
Local Markov assumption: states that a node in a DAG is conditionally independent of its non-descendants given its parents. This assumption allows for the decomposition of the joint probability distribution of all variables in the DAG into simpler conditional distributions.
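Written out, the decomposition implied by the local Markov assumption expresses the joint distribution as a product of each variable's distribution given its parents, denoted here by pa(x_i). For instance, in a DAG with edges Z → T, Z → Y and T → Y it reads:

```latex
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid \mathrm{pa}(x_i)\right),
\qquad
P(z, t, y) = P(z)\, P(t \mid z)\, P(y \mid t, z).
```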
Mediator: a variable that lies on the causal path between the treatment and the outcome, helping to explain the mechanism through which the treatment affects the outcome.
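The role of a mediator can be illustrated with a hypothetical linear structural model (coefficients chosen arbitrarily, noise omitted because it cancels in these differences). Intervening on T while either letting M respond or holding it fixed splits the total effect into a natural direct effect and a natural indirect effect through M; in this linear case the indirect part equals the product of the two path coefficients.

```python
# Hypothetical linear structural model (noise terms omitted for clarity):
#   M = A_TM * T               treatment -> mediator
#   Y = C_TY * T + B_MY * M    direct path plus path through the mediator
A_TM, B_MY, C_TY = 0.8, 2.0, 0.5

def mediator(t):
    return A_TM * t

def outcome(t, m):
    return C_TY * t + B_MY * m

# Total effect: move T from 0 to 1 and let M respond.
total = outcome(1, mediator(1)) - outcome(0, mediator(0))
# Natural direct effect: change T but keep M at the value it takes under T = 0.
direct = outcome(1, mediator(0)) - outcome(0, mediator(0))
# Natural indirect effect: keep T = 1 but move M from its T = 0 value to its T = 1 value.
indirect = outcome(1, mediator(1)) - outcome(1, mediator(0))
print(total, direct, indirect)
```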
Minimality assumption: asserts that the DAG representing the causal structure is minimal: the data distribution satisfies the Markov condition with respect to the DAG, but would no longer satisfy it if any edge were removed. It ensures that every edge in the model encodes a dependence that is actually present.
Experimental Design and Analysis
A/B testing: a randomised controlled experiment that compares two versions (A and B) of something, such as a web page, an email, or a product feature, to determine which performs better on a chosen metric; commonly used in marketing and product development.
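A common way to analyse an A/B test on a binary metric such as conversion is a two-proportion z-test. The sketch below uses the pooled standard error under the null hypothesis of equal rates; the function name and the conversion counts are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)       # rate under H0: p_a == p_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Illustrative counts: 200/5000 conversions for A, 250/5000 for B.
lift, z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p:.4f}")
```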
Control group and treatment group: in experimental design, the control group is the group that does not receive the treatment or intervention, while the treatment group is the group that receives the treatment or intervention. Comparing the outcomes of these two groups helps to estimate the causal effect of the treatment.
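Counterfactuals: statements about what would have happened to an outcome if a variable (typically the treatment) had taken a different value from the one it actually took, holding everything else constant. Counterfactual reasoning formalises such questions; for each unit only one of the possible outcomes is ever observed, and the others remain counterfactual.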
Design of experiments (DoE): a systematic approach to planning, conducting, analysing, and interpreting controlled experiments to ensure valid, reliable, and replicable results.
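Ignorability/exchangeability assumption: assumes that treatment assignment is independent of the potential outcomes, possibly after conditioning on observed covariates X, written (Y(0), Y(1)) ⊥ T | X. In other words, the way individuals were assigned to treatments can be ignored, as if assignment had been random within levels of X. This is crucial for identifying causal effects from observational data.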
Placebo effect: a phenomenon where subjects experience a perceived or actual improvement in their condition despite receiving a non-active treatment, due to their expectations of the treatment’s efficacy.
Randomisation: the process of assigning subjects to treatment and control groups by chance. It reduces bias and makes the groups comparable in expectation, on both observed and unobserved characteristics.
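Regression discontinuity design: a quasi-experimental design used when treatment is assigned according to whether a continuous running variable crosses a cutoff or threshold. Comparing units just above and just below the cutoff, which are assumed to be similar in all other respects, yields an estimate of the causal effect at the threshold.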
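A minimal sketch of a sharp regression discontinuity estimate: fit a separate straight line on each side of the cutoff within a bandwidth and take the difference of the two fitted values at the threshold. The data-generating process (true jump of 2 at a cutoff of 0) and the bandwidth of 0.5 are assumptions for illustration; applied work often uses a triangular kernel and a data-driven bandwidth rather than the uniform window used here.

```python
import random

def fit_line(xs, ys):
    """OLS intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def sharp_rdd(xs, ys, cutoff, bandwidth):
    """Jump in E[Y | X] at the cutoff, from separate linear fits within the bandwidth."""
    left = [(x - cutoff, y) for x, y in zip(xs, ys) if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(xs, ys) if cutoff <= x <= cutoff + bandwidth]
    # Centring X at the cutoff makes each intercept the fitted value at the threshold.
    a_left, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    a_right, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    return a_right - a_left

rng = random.Random(3)
n = 20_000
x = [rng.uniform(-1, 1) for _ in range(n)]   # running variable, cutoff at 0
treated = [xi >= 0 for xi in x]              # sharp assignment rule
y = [1 + 0.5 * xi + 2.0 * ti + rng.gauss(0, 1) for xi, ti in zip(x, treated)]
estimate = sharp_rdd(x, y, cutoff=0.0, bandwidth=0.5)
print(f"estimated jump at the cutoff: {estimate:.3f}")
```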
Selection bias: a bias that occurs when the subjects included in a study are not representative of the population intended to be analysed, leading to incorrect conclusions.
Synthetic control method: a method for estimating causal effects by comparing treated units to a weighted combination of untreated units that best approximates the characteristics of the treated units.
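The weighting step can be sketched as a constrained least-squares problem: find non-negative donor weights summing to one whose weighted average best reproduces the treated unit's pre-treatment outcomes. This sketch matches pre-treatment outcomes only (the full method typically also matches covariates through a weighting matrix) and solves the problem with projected gradient descent, a solver chosen here for brevity. All data and function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def synthetic_control_weights(treated_pre, donors_pre, iters=5000):
    """Convex weights minimising ||treated_pre - sum_j w_j * donors_pre[j]||^2.

    The step size 1 / (2 * ||X||_F^2) is a safe bound on the inverse Lipschitz
    constant of the gradient, so projected gradient descent converges.
    """
    J, T = len(donors_pre), len(treated_pre)
    step = 1.0 / (2 * sum(v * v for d in donors_pre for v in d))
    w = [1.0 / J] * J
    for _ in range(iters):
        fitted = [sum(w[j] * donors_pre[j][t] for j in range(J)) for t in range(T)]
        resid = [treated_pre[t] - fitted[t] for t in range(T)]
        grad = [-2 * sum(donors_pre[j][t] * resid[t] for t in range(T)) for j in range(J)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(J)])
    return w

# Toy pre-treatment outcomes (5 periods) for three untreated donor units.
donors_pre = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 3.0, 3.0, 3.0, 3.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
# The treated unit's pre-period path is constructed as 0.3 * donor 0 + 0.7 * donor 1.
treated_pre = [2.4, 2.7, 3.0, 3.3, 3.6]
w = synthetic_control_weights(treated_pre, donors_pre)

# After treatment, the synthetic control stands in for the unobserved counterfactual.
donors_post = [[6.0, 7.0], [3.0, 3.0], [2.0, 5.0]]
treated_post = [6.0, 6.5]
synthetic_post = [sum(w[j] * donors_post[j][t] for j in range(3)) for t in range(2)]
effect = [a - b for a, b in zip(treated_post, synthetic_post)]
print([round(v, 3) for v in w], [round(e, 3) for e in effect])
```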
Effects and Estimates
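Average treatment effect (ATE): the average of the individual treatment effects across the population, E[Y(1) - Y(0)]. It measures the overall impact of the treatment on the population. Under randomisation (or ignorability), it can be estimated by the difference in average outcomes between the treatment group and the control group.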
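The distinction between the ATE and its estimate can be sketched with simulated potential outcomes. In this hypothetical example every unit's effect is exactly 2, so the true ATE is known; in practice only one potential outcome per unit is observed, and randomisation is what makes the simple difference in means a good estimate.

```python
import random

rng = random.Random(4)
n = 10_000
# Simulated potential outcomes: Y0 without treatment, Y1 with treatment.
x = [rng.gauss(0, 1) for _ in range(n)]
y0 = [xi + rng.gauss(0, 1) for xi in x]
y1 = [v + 2.0 for v in y0]                          # every unit's effect is 2

true_ate = sum(a - b for a, b in zip(y1, y0)) / n   # requires BOTH potential outcomes

# Randomised assignment: only one potential outcome is observed per unit.
t = [rng.random() < 0.5 for _ in range(n)]
observed = [a if ti else b for a, b, ti in zip(y1, y0, t)]
treated = [o for o, ti in zip(observed, t) if ti]
control = [o for o, ti in zip(observed, t) if not ti]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
print(f"true ATE: {true_ate:.3f}, difference in means: {diff_in_means:.3f}")
```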
Causal effect: the change in an outcome directly attributable to a change in a treatment or exposure.
Conditional average treatment effect: the average treatment effect for a specific subset of the population, defined by certain characteristics or conditions. It provides a more granular understanding of how the treatment effect varies across different groups within the population.
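A sketch of how CATEs can differ from the overall ATE, assuming a randomised experiment with two hypothetical subgroups whose true effects are 1 and 3. Within each subgroup, the difference in means estimates that subgroup's CATE, while pooling everyone gives an ATE near 2 that describes neither group well.

```python
import random

def diff_in_means(ys, ts):
    """Mean outcome among treated minus mean outcome among controls."""
    treated = [v for v, ti in zip(ys, ts) if ti]
    control = [v for v, ti in zip(ys, ts) if not ti]
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(5)
effect = {"group_a": 1.0, "group_b": 3.0}   # hypothetical subgroup-specific effects
groups, t, y = [], [], []
for _ in range(20_000):
    g = rng.choice(["group_a", "group_b"])
    ti = rng.random() < 0.5                 # randomised treatment
    groups.append(g)
    t.append(ti)
    y.append(rng.gauss(0, 1) + (effect[g] if ti else 0.0))

ate = diff_in_means(y, t)
cate = {
    g: diff_in_means(
        [v for v, gi in zip(y, groups) if gi == g],
        [ti for ti, gi in zip(t, groups) if gi == g],
    )
    for g in effect
}
print(f"ATE ~ {ate:.2f}; CATE: " + ", ".join(f"{g} ~ {c:.2f}" for g, c in cate.items()))
```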
Confounding bias: a bias that occurs when the treatment effect is mixed with the effect of a confounder, leading to an incorrect estimation of the causal effect.
Endogeneity: a situation in which an explanatory variable is correlated with the error term in a regression model, leading to biased and inconsistent parameter estimates.
Exogeneity: a condition where an explanatory variable is not correlated with the error term; it is a key requirement for unbiased and consistent parameter estimates.
External validity: the extent to which the results of a study can be generalised to other settings, populations, or time periods.
Heterogeneity: the variation in treatment effects across different subgroups or individuals within a population.
Hidden common causes: variables that are not observed but influence multiple observed variables, leading to dependent error terms. Addressing hidden common causes is crucial for accurate causal inference.
Internal validity: the extent to which the observed effects in a study are due to the treatment and not to other factors.
Propensity score: the probability of a unit being assigned to the treatment group given a set of observed covariates. Used to reduce bias in the estimation of treatment effects in observational studies.
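A minimal sketch of using the propensity score through inverse probability weighting (IPW), assuming a hypothetical model with one binary confounder X that raises both the chance of treatment and the outcome. Because X is discrete, the propensity score is estimated simply as the treated share within each level of X; with continuous covariates one would usually fit a model such as logistic regression instead.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(6)
n = 20_000
x, t, y = [], [], []
for _ in range(n):
    xi = int(rng.random() < 0.5)                    # binary confounder
    ti = int(rng.random() < (0.8 if xi else 0.2))   # X raises the chance of treatment
    yi = 2.0 * ti + 3.0 * xi + rng.gauss(0, 1)      # true treatment effect = 2
    x.append(xi)
    t.append(ti)
    y.append(yi)

# Estimated propensity score e(x) = P(T = 1 | X = x): treated share in each stratum.
e_hat = {v: mean([ti for xi, ti in zip(x, t) if xi == v]) for v in (0, 1)}

# Naive comparison of group means, confounded by X.
naive = mean([yi for yi, ti in zip(y, t) if ti]) - mean([yi for yi, ti in zip(y, t) if not ti])

# Inverse probability weighting (Horvitz-Thompson form).
ipw = sum(
    ti * yi / e_hat[xi] - (1 - ti) * yi / (1 - e_hat[xi])
    for xi, ti, yi in zip(x, t, y)
) / n
print(f"naive: {naive:.3f}, IPW: {ipw:.3f}")
```

Weighting each unit by the inverse probability of the treatment it actually received creates a pseudo-population in which X no longer predicts treatment, which is why the IPW estimate lands near 2 while the naive difference does not.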
Robustness check: a test of whether the estimated effects remain stable under different model specifications or assumptions.
Sensitivity analysis: a technique used to determine how the results of a study or model change when the assumptions or parameters are varied.
Spillover effect: an effect where the treatment of one group or individual influences the outcomes of another group or individual, potentially biasing the estimated treatment effect.
SUTVA: the stable unit treatment value assumption (SUTVA) is a fundamental assumption in causal inference and experimental design. It has two components:
No interference: the treatment assigned to one unit does not affect the outcomes of other units, so each unit’s outcome depends only on its own treatment.
No hidden versions of the treatment: there is only one version of the treatment, applied uniformly across all treated units.