Not a citable source: Do not quote the graph as an authority. Edge labels and importance scores are interpretive judgments by the generating agent. Any claim worth citing must be traced back to the original source.
Reliability note: Headline structure and importance-5 nodes are stable across runs. Mid-tier nodes (importance 2–3) and edge type distinctions are interpretive and may differ between runs. Nodes whose source is marked "training memory" or "inferred" were not directly verified against the source document.
LOOMUS™ and the Knowledge-Loom methodology are proprietary. Visual system is original to LOOMUS.
Knowledge Graph: The Book of Why: The New Science of Cause and Effect (Judea Pearl & Dana Mackenzie, 2018)
Editorial spotlight: the Ladder of Causation, three rungs that separate correlation machines from causal reasoners
Concepts
Pearl's Ladder of Causation (importance 5): Three-level hierarchy: seeing (association), doing (intervention), imagining (counterfactuals). Each rung requires fundamentally different mathematics. Modern ML is stuck at rung one. Source: (from training memory of book).
Pearl's structural causal models (SCM) (importance 5): Mathematical objects combining causal graphs with functional equations. Each variable defined as a function of its parents plus noise. Foundation for counterfactual reasoning. Source: (from training memory of book).
Rung One: Association (seeing) (importance 4): Statistical correlation, pattern recognition, curve fitting. What current ML excels at. P(Y|X): probability of Y given that we observe X. Source: (from training memory of book).
Rung Two: Intervention (doing) (importance 4): Causal questions about actions. What happens if we do X? P(Y|do(X)): probability of Y if we force X to happen. Requires causal models, not just data. Source: (from training memory of book).
Rung Three: Counterfactuals (imagining) (importance 4): Retrospective causal reasoning. What if I had done X instead? Why did Y happen? Requires structural causal models and the ability to rewind causal chains. Source: (from training memory of book).
causal identification problem (importance 4): Can we estimate causal effect from observational data given a causal graph? Do-calculus provides algorithmic answer. Not always possible. Source: (from training memory of book).
Pearl's counterfactual Y_x(u) (importance 4): Value that Y would take in unit u if we set X to x, keeping all else (background factors u) the same. Requires structural model. Source: (from training memory of book).
confounder (Pearl definition) (importance 3): Variable that affects both treatment and outcome, creating spurious association. Opens a backdoor path. Must be blocked to estimate causal effect. Source: (from training memory of book).
collider (Pearl definition) (importance 3): Variable caused by two others. Conditioning on a collider creates spurious association between its parents. The opposite of a confounder: controlling for it introduces bias rather than removing it. Source: (from training memory of book).
confounding bias (importance 3): Difference between observed association and true causal effect. Arises from open backdoor paths. Zero when all confounders are measured and controlled. Source: (from training memory of book).
selection bias as collider conditioning (importance 3): Many forms of selection bias arise from conditioning on colliders. Recognizing the collider structure prevents spurious conclusions. Source: (from training memory of book).
probability of causation (PN, PS, PNS) (importance 3): Quantifying legal/attribution questions: probability that X was necessary for Y, sufficient for Y, or both. Computable from SCM even when not identifiable from data alone. Source: (from training memory of book).
external validity and transportability (importance 3): Can we generalize causal effect from study population to target population? Pearl's do-calculus provides conditions for valid transport across populations. Source: (from training memory of book).
causal discovery/learning (importance 3): Inferring causal structure from data. Harder than parameter estimation. Requires assumptions like causal sufficiency and faithfulness. Active research area. Source: (from training memory of book).
causal representation learning (importance 3): Learning representations that capture causal structure, not just correlations. Promises better generalization and interpretability. Active research combining deep learning with causality. Source: (from training memory of book).
causal modularity principle (importance 3): Each variable's mechanism is autonomous. Intervening on X doesn't change how Y's parents affect Y. Foundation of graphical surgery and transportability. Source: (from training memory of book).
mediator (importance 2): Variable on the causal path from treatment to outcome. Part of the mechanism. Controlling for it blocks the effect you're trying to measure. Source: (from training memory of book).
Rubin's conditional ignorability (importance 2): Assumption that treatment assignment is independent of potential outcomes given covariates. Corresponds to blocking all backdoor paths in Pearl's framework. Source: (from training memory of book).
natural direct effect (NDE) (importance 2): Effect of treatment on outcome while holding mediator at its natural value (what it would be without treatment). Rung 3 quantity. Source: (from training memory of book).
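natural indirect effect (NIE) (importance 2): Effect transmitted through mediator. Change in outcome from changing mediator from control to treatment value while holding treatment at control. NDE + NIE = total effect in linear models without interaction; in general the total effect equals the NDE minus the NIE of the reverse transition (treatment to control). Source: (from training memory of book).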
faithfulness assumption (importance 2): Parameters don't conspire to create conditional independencies beyond those implied by graph structure. Needed for causal discovery. Can be violated in practice. Source: (from training memory of book).
causal sufficiency assumption (importance 2): All common causes of measured variables are themselves measured. No unmeasured confounding. Strong assumption, often violated. Relaxed by allowing latent variables. Source: (from training memory of book).
Markov equivalence class (importance 2): Set of causal graphs that imply the same conditional independencies. Observational data alone often cannot distinguish between graphs in the same class. Need interventions or assumptions. Source: (from training memory of book).
effect modification (interaction) (importance 2): When causal effect of X on Y differs across levels of Z. Not the same as confounding. Important for personalized medicine and targeting. Source: (from training memory of book).
average treatment effect (ATE) (importance 2): Expected causal effect of treatment across population. E[Y|do(X=1)] - E[Y|do(X=0)]. Rung 2 quantity. Most common causal estimand. Source: (from training memory of book).
conditional ATE (CATE) (importance 2): Average treatment effect within subgroup defined by covariates. Used for personalization. E[Y|do(X=1),Z=z] - E[Y|do(X=0),Z=z]. Source: (from training memory of book).
sparse mechanism shift principle (importance 2): When environment changes, only a few causal mechanisms change. Exploited for causal discovery and robust prediction. Causal models compose across contexts. Source: (from training memory of book).
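Several of the concept nodes above (rung one vs rung two, confounder, confounding bias, ATE) can be illustrated on one small example. The following is a minimal sketch, assuming a toy binary SCM with graph Z -> X, Z -> Y, X -> Y. All probabilities and the function names `seeing` and `doing` are hypothetical choices for illustration, not taken from the book.

```python
from fractions import Fraction as F

# Toy SCM with graph Z -> X, Z -> Y, X -> Y. All numbers are hypothetical.
P_Z1 = F(1, 2)                            # P(Z=1)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}   # P(X=1 | Z=z)

def p_y1(x, z):
    """P(Y=1 | X=x, Z=z): the mechanism for Y."""
    return F(1, 10) + F(2, 10) * x + F(5, 10) * z

def bern(p1, value):
    """Probability that a binary variable with P(1)=p1 takes `value`."""
    return p1 if value == 1 else 1 - p1

def seeing(x):
    """Rung one: P(Y=1 | X=x), computed from the observational joint."""
    num = sum(bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * p_y1(x, z) for z in (0, 1))
    den = sum(bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) for z in (0, 1))
    return num / den

def doing(x):
    """Rung two: P(Y=1 | do(X=x)). X's own mechanism is dropped; the others stay."""
    return sum(bern(P_Z1, z) * p_y1(x, z) for z in (0, 1))

observed_difference = seeing(1) - seeing(0)   # association
ate = doing(1) - doing(0)                     # average treatment effect
confounding_bias = observed_difference - ate

print(seeing(1), doing(1))                          # 7/10 11/20
print(observed_difference, ate, confounding_bias)   # 1/2 1/5 3/10
```

In this toy model the observed difference (1/2) overstates the causal effect (1/5) because Z pushes both X and Y upward; the gap between them is the confounding bias.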
Claims
Pearl's Causal Revolution thesis (importance 5): Statistics has been shackled by the ban on causal language since the 1930s. We now have the mathematics to answer causal questions rigorously. This changes everything. Source: (from training memory of book).
strong AI requires causal reasoning (importance 5): Current ML is curve fitting. Human-level intelligence requires asking why, planning interventions, learning from sparse data. All require climbing the ladder of causation. Source: (from training memory of book).
statistics' causal language taboo (1930s-1990s) (importance 4): Statistics textbooks taught 'correlation does not imply causation' as a commandment. Causal questions were deemed unscientific. Fisher enforced this doctrine. Source: (from training memory of book).
Bayesian networks → causal diagrams leap (importance 4): Pearl realized Bayesian network edges could represent causal relationships, not just conditional independence. The key insight that launched causal inference as a mathematical discipline. Source: (from training memory of book).
optimization ≠ causation (importance 4): Modern ML optimizes loss functions. This is curve fitting. Understanding requires causal models. You cannot optimize your way to causal knowledge. Source: (from training memory of book).
deep learning is stuck at rung 1 (importance 4): Neural networks are universal function approximators. But functions are rung 1 (association). To reach rung 2, you need a model of interventions. To reach rung 3, you need structural equations. Source: (from training memory of book).
data cannot replace causal assumptions (importance 4): No amount of observational data can tell you causal direction without assumptions. Big data does not solve causal inference. You need a model (graph or equations). Source: (from training memory of book).
AGI requires intervention reasoning (importance 4): To plan, an agent must predict consequences of actions. This is rung 2 reasoning. An agent that only predicts what will happen (rung 1) cannot choose actions strategically. Source: (from training memory of book).
AGI requires counterfactual reasoning (importance 4): To learn from mistakes, an agent must ask 'what should I have done?' To attribute credit, it must reason about alternative actions. Rung 3 is necessary for sophisticated learning. Source: (from training memory of book).
RCT limitations (Pearl argument) (importance 3): RCTs are often unethical, expensive, or impossible. Observational causal inference with do-calculus can answer the same questions from passive data when graph structure is known. Source: (from training memory of book).
Pearl-Rubin equivalence with graph transparency advantage (importance 3): Pearl argues causal diagrams make assumptions explicit and checkable, while Rubin's framework hides them in untestable ignorability conditions. Source: (from training memory of book).
Fisher's rejection of causal graphs (importance 3): R.A. Fisher opposed Wright's path diagrams, calling them unscientific speculation. Insisted only randomized experiments could establish causation. Source: (from training memory of book).
Pearl's graphical IV conditions (importance 3): An IV must: affect treatment, not affect outcome except through treatment, share no unmeasured causes with outcome. Easily verified on a causal diagram. Source: (from training memory of book).
causal graphs have testable implications (importance 3): d-separation implies conditional independencies that can be tested against data. Falsifiable predictions without knowing parameters. Distinguishes causal graphs from pure speculation. Source: (from training memory of book).
counterfactuals resolve legal causation (importance 3): Courts ask 'but-for' questions. Did defendant's action cause harm? Requires counterfactual reasoning. Pearl's framework formalizes what law has done intuitively. Source: (from training memory of book).
interventions identify causal direction (importance 3): Observational data cannot always determine arrow direction. Intervening breaks incoming arrows, revealing structure. This is why experiments are powerful. Source: (from training memory of book).
causality enables transfer learning (importance 3): Causal relationships are stable across environments; correlations are not. Agents that learn causal models can generalize to new contexts. Pure pattern matching fails. Source: (from training memory of book).
explanations are counterfactual (importance 3): To explain Y, we must say what would have happened without X. This is inherently counterfactual (rung 3). Black-box ML cannot explain because it cannot climb to rung 3. Source: (from training memory of book).
good controls vs bad controls (importance 3): Controlling for a confounder removes bias. Controlling for a mediator blocks the effect. Controlling for a collider creates bias. Graph structure determines which is which. Source: (from training memory of book).
interventional distributions factor on mutilated graph (importance 3): P(v|do(x)) factors according to the graph with X's incoming edges removed. Interventions preserve local mechanisms except for the intervened variable. Source: (from training memory of book).
attribution is a counterfactual question (importance 3): Did this specific instance of X cause this specific instance of Y? Requires unit-level counterfactual reasoning. Cannot be answered at population level (rung 2). Needs rung 3. Source: (from training memory of book).
causal meta-analysis (importance 2): Combining studies requires causal assumptions, not just statistical ones. Different populations may have different confounders. Causal diagrams make combination rules explicit. Source: (from training memory of book).
missing data as selection bias (importance 2): Missingness mechanism creates selection bias. Whether data are missing completely at random (MCAR), at random (MAR), or not at random (MNAR) corresponds to graph structure. Source: (from training memory of book).
free will requires counterfactual reasoning (importance 2): Choosing requires imagining consequences of different actions. If we could not reason about 'what if I do X?', we could not make deliberate choices. Rung 2 minimum for agency. Source: (from training memory of book).
moral responsibility requires rung 3 (importance 2): Holding someone responsible requires 'would the harm have occurred without their action?' This is counterfactual. Determinism does not eliminate responsibility if defined counterfactually. Source: (from training memory of book).
heterogeneous treatment effects (importance 2): Effects vary across individuals. Average effect may hide important subgroup differences. Requires rung 2 reasoning about specific groups or rung 3 for individuals. Source: (from training memory of book).
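The "good controls vs bad controls" claim above can be checked by exact enumeration. This is a minimal sketch under stated assumptions: two hypothetical binary models, one where C is a collider (X -> C <- Y, with no effect of X on Y) and one where M is a mediator (X -> M -> Y). The probabilities and the helper names `prob`, `cond`, and `contrast` are illustrative only.

```python
from fractions import Fraction as F
from itertools import product

def bern(p1, value):
    """Probability that a binary variable with P(1)=p1 takes `value`."""
    return p1 if value == 1 else 1 - p1

def prob(joint, **fixed):
    """Total probability of the assignments that match the fixed values."""
    return sum(p for a, p in joint if all(a[k] == v for k, v in fixed.items()))

def cond(joint, target, **given):
    """P(target=1 | given)."""
    return prob(joint, **{target: 1}, **given) / prob(joint, **given)

def contrast(joint, **control):
    """P(Y=1 | X=1, control) - P(Y=1 | X=0, control)."""
    return cond(joint, "y", x=1, **control) - cond(joint, "y", x=0, **control)

# Collider model: X and Y are independent fair coins, C = X or Y.
collider = [({"x": x, "y": y, "c": int(x or y)}, F(1, 4))
            for x, y in product((0, 1), repeat=2)]

# Mediator model: X -> M -> Y, no other paths (hypothetical probabilities).
mediator = [({"x": x, "m": m, "y": y},
             F(1, 2)
             * bern(F(3, 4) if x else F(1, 4), m)
             * bern(F(3, 4) if m else F(1, 4), y))
            for x, m, y in product((0, 1), repeat=3)]

print(contrast(collider))        # 0: X has no effect on Y
print(contrast(collider, c=1))   # -1/2: spurious association from conditioning on C
print(contrast(mediator))        # 1/4: total effect of X on Y
print(contrast(mediator, m=1))   # 0: controlling for M blocks the effect
```

Because neither model contains a confounder, the unadjusted contrast is the causal effect in both; adding the "control" is what damages the estimate.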
Empirical results
Simpson's Paradox (importance 4): Statistical phenomenon where an association reverses when data is stratified. Resolved by causal diagrams: depends on whether the stratifying variable is a confounder or mediator. Source: (from training memory of book).
birth weight paradox (importance 3): Smoking appears protective for low-birth-weight babies. Actually a collider bias: conditioning on birth weight creates spurious negative association. Source: (from training memory of book).
identification algorithm completeness (importance 3): Shpitser-Pearl 2006: complete algorithm for determining if causal effect is identifiable from graph. If algorithm fails, no formula exists. Settles the identification problem. Source: (from training memory of book).
Berkson's paradox (importance 2): Negative correlation between diseases in hospitalized patients even when uncorrelated in population. Hospital admission is a collider. Source: (from training memory of book).
Table 2 Fallacy (importance 2): Common mistake in epidemiology: controlling for everything in a regression. Often includes mediators and colliders, biasing causal estimates. Graphs prevent this. Source: (from training memory of book).
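The reversal described in the Simpson's Paradox node can be reproduced with a small table. The sketch below uses hypothetical counts (not data from the book): treatment X looks better inside each stratum of Z but worse in the pooled data. The function names are illustrative.

```python
from fractions import Fraction as F

# counts[(x, z)] = (recovered, total). Hypothetical counts for illustration.
counts = {
    (1, 0): (90, 100),   (0, 0): (800, 1000),   # stratum z=0
    (1, 1): (500, 1000), (0, 1): (40, 100),     # stratum z=1
}

def rate_in_stratum(x, z):
    """Recovery rate for treatment x within stratum z."""
    recovered, total = counts[(x, z)]
    return F(recovered, total)

def aggregated_rate(x):
    """Recovery rate for treatment x with the strata pooled."""
    recovered = sum(counts[(x, z)][0] for z in (0, 1))
    total = sum(counts[(x, z)][1] for z in (0, 1))
    return F(recovered, total)

def p_z(z):
    """Share of the whole population in stratum z."""
    everyone = sum(total for _, total in counts.values())
    return F(sum(counts[(x, z)][1] for x in (0, 1)), everyone)

def adjusted_rate(x):
    """Stratum rates reweighted by P(z); a causal estimate only if Z is a confounder."""
    return sum(rate_in_stratum(x, z) * p_z(z) for z in (0, 1))

print(rate_in_stratum(1, 0), rate_in_stratum(0, 0))   # 9/10 4/5
print(rate_in_stratum(1, 1), rate_in_stratum(0, 1))   # 1/2 2/5
print(aggregated_rate(1), aggregated_rate(0))         # 59/110 42/55
print(adjusted_rate(1) - adjusted_rate(0))            # 1/10
```

Which number to report is not decided by the table. If Z is a confounder (it influences who gets treated), the adjusted contrast is the causal one; if Z is a mediator (treatment changes Z), the pooled contrast is.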
Methods
Pearl's do-calculus (importance 5): Three formal rules for manipulating causal expressions. Allows converting observational data into answers about interventions under certain graphical conditions. Source: (from training memory of book).
Causal diagrams (DAGs) (importance 5): Directed acyclic graphs encoding causal relationships. Nodes are variables, arrows are direct causal influences. The foundation of Pearl's framework. Source: (from training memory of book).
Pearl's do-operator (importance 5): Mathematical operator do(X=x) representing an intervention that sets X to x. The bridge from seeing to doing. Not reducible to conditioning. Requires causal model to evaluate. Source: (from training memory of book).
Pearl's backdoor criterion (importance 4): Graphical rule for identifying which variables to control for to estimate causal effects. Blocks all spurious paths between treatment and outcome. Source: (from training memory of book).
Pearl's Bayesian networks (1980s) (importance 4): Probabilistic graphical models for efficient inference. Pearl's early work on uncertainty in AI. Foundation for later causal diagrams. Source: (from training memory of book).
Pearl's d-separation (importance 4): Graphical criterion for conditional independence. If X and Y are d-separated given Z in the graph, they're conditionally independent in all compatible distributions. Source: (from training memory of book).
Pearl's frontdoor criterion (importance 3): Alternative identification strategy when backdoor fails. Uses mediating variables to estimate causal effects even with unmeasured confounders. Source: (from training memory of book).
randomized controlled trial (RCT) (importance 3): Gold standard for causal inference. Randomization breaks all backdoor paths by design. Implements do(X) in the real world. Source: (from training memory of book).
Rubin's potential outcomes framework (importance 3): Dominant statistical framework before Pearl. Focuses on what-if comparisons between treatment and control. Algebraically equivalent to Pearl's framework but less transparent about assumptions. Source: (from training memory of book).
Bradford Hill's causal criteria (importance 3): Nine guidelines for inferring causation from association: strength, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, analogy. Pre-Pearl heuristics. Source: (from training memory of book).
instrumental variable (IV) (importance 3): Variable that affects outcome only through treatment. Allows causal estimation despite unmeasured confounding. Must satisfy exclusion restriction. Source: (from training memory of book).
Pearl's twin network method (importance 3): Technique for answering counterfactual queries. Create parallel copy of causal graph, one for factual world and one for counterfactual intervention. Linked by shared noise terms. Source: (from training memory of book).
Pearl's causal mediation analysis (importance 3): Decompose total effect into direct effect and indirect effect through mediator. Answers 'how much of the effect goes through this mechanism?' Requires cross-world counterfactuals. Source: (from training memory of book).
Pearl's adjustment formula (importance 3): P(y|do(x)) = Σ_z P(y|x,z)P(z) when Z satisfies backdoor criterion. Converts intervention to observational conditioning. First major do-calculus result. Source: (from training memory of book).
truncated factorization formula (importance 3): P(v|do(x)) = Π_{i: V_i ∉ X} P(v_i|pa_i) for values v consistent with x, and 0 otherwise. Intervention removes the factor for X (the edges into X); every other conditional probability is kept unchanged. Fundamental to do-calculus. Source: (from training memory of book).
graph surgery (intervention as edge removal) (importance 3): do(X=x) implemented by removing all incoming edges to X and fixing its value. Mutilated graph. Shows interventions change the causal structure. Source: (from training memory of book).
Wright's path coefficient (importance 2): Quantitative measure of direct causal influence along a path in a diagram. Computed from correlations using structural equations. Source: (from training memory of book).
Rosenbaum-Rubin propensity score (importance 2): Probability of treatment given covariates. Matching or weighting by propensity score can estimate causal effects under ignorability. Pearl shows when this works graphically. Source: (from training memory of book).
encouragement design (importance 2): Randomize encouragement to take treatment rather than treatment itself. Encouragement serves as instrumental variable. Estimates effect of treatment on compliers. Source: (from training memory of book).
PC algorithm (Spirtes-Glymour-Scheines) (importance 2): Constraint-based causal discovery algorithm. Uses conditional independence tests to infer graph structure. Assumes causal sufficiency and faithfulness. Source: (from training memory of book).
difference-in-differences (importance 2): Quasi-experimental method comparing change over time in treatment group vs control group. Assumes parallel trends. Pearl shows graphical assumptions. Source: (from training memory of book).
regression discontinuity design (importance 2): Quasi-experimental method exploiting threshold in treatment assignment. Compares units just above and below threshold. Local randomization assumption. Source: (from training memory of book).
natural experiments (importance 2): Observational studies where nature or policy creates quasi-random assignment. Examples: draft lottery, policy changes. Approximate instrumental variables. Source: (from training memory of book).
sensitivity analysis for unmeasured confounding (importance 2): How much unmeasured confounding would it take to explain away the observed effect? Quantifies robustness. Pearl: easier to express graphically. Source: (from training memory of book).
frontdoor adjustment formula (importance 2): P(y|do(x)) = Σ_m P(m|x) Σ_x' P(y|x',m)P(x') when M is a frontdoor set. Eliminates unmeasured confounding via mediator. Source: (from training memory of book).
invariant causal prediction (importance 2): Method for finding causal parents of target variable by looking for predictive relationships that are stable across environments. Exploits that causation is stable, correlation is not. Source: (from training memory of book).
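The frontdoor adjustment formula listed above, P(y|do(x)) = Σ_m P(m|x) Σ_x' P(y|x',m)P(x'), can be verified numerically. The sketch below assumes a hypothetical binary model with an unobserved confounder U (U -> X, U -> Y) and a mediator M (X -> M -> Y). It compares the formula, evaluated only on the observed joint P(x, m, y), against the ground truth obtained by graph surgery on the full model. All probabilities and function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def bern(p1, value):
    """Probability that a binary variable with P(1)=p1 takes `value`."""
    return p1 if value == 1 else 1 - p1

# Hypothetical mechanisms. U is never observed by the analyst.
P_U1 = F(1, 2)
def p_x1(u): return F(4, 5) if u else F(1, 5)                    # P(X=1 | u)
def p_m1(x): return F(3, 4) if x else F(1, 4)                    # P(M=1 | x)
def p_y1(m, u): return F(1, 10) + F(4, 10) * m + F(4, 10) * u    # P(Y=1 | m, u)

# Observed joint P(x, m, y), with U summed out.
observed = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = bern(P_U1, u) * bern(p_x1(u), x) * bern(p_m1(x), m) * bern(p_y1(m, u), y)
    observed[(x, m, y)] = observed.get((x, m, y), 0) + p

def p_obs(**fixed):
    """Marginal probability of the fixed values under the observed joint."""
    idx = {"x": 0, "m": 1, "y": 2}
    return sum(p for key, p in observed.items()
               if all(key[idx[name]] == v for name, v in fixed.items()))

def frontdoor(x):
    """Frontdoor formula for P(Y=1 | do(X=x)), using observed quantities only."""
    total = 0
    for m in (0, 1):
        p_m_given_x = p_obs(x=x, m=m) / p_obs(x=x)
        inner = sum(p_obs(x=x2, m=m, y=1) / p_obs(x=x2, m=m) * p_obs(x=x2)
                    for x2 in (0, 1))
        total += p_m_given_x * inner
    return total

def truth(x):
    """Ground truth by graph surgery: drop X's mechanism, keep U, M, Y."""
    return sum(bern(P_U1, u) * bern(p_m1(x), m) * p_y1(m, u)
               for u, m in product((0, 1), repeat=2))

def naive(x):
    """Plain conditioning P(Y=1 | X=x), biased by the hidden U."""
    return p_obs(x=x, y=1) / p_obs(x=x)

print(frontdoor(1), truth(1), naive(1))   # 3/5 3/5 18/25
```

The frontdoor estimate matches the surgery-based truth without ever using U, while plain conditioning does not.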
Entities
smoking-lung cancer controversy (1950s-60s) (importance 4): Tobacco industry argued correlation doesn't prove causation. Fisher proposed genetic confounders. Resolved by combining Hill's criteria with causal reasoning and RCTs on animals. Source: (from training memory of book).
Sewall Wright's path analysis (1920s) (importance 4): Geneticist who invented causal diagrams and path coefficients decades before statisticians. Rejected by Fisher and the statistics establishment. Pearl's intellectual ancestor. Source: (from training memory of book).
Berkeley admissions case (1973) (importance 3): Famous Simpson's Paradox example. Overall admission rates favored men, but within departments favored women. Department choice was a mediator, not a confounder. Source: (from training memory of book).
Galton's regression (1886) (importance 3): Discovery of regression to the mean. Galton interpreted it causally but couldn't distinguish causation from correlation. Birth of statistics as correlation-focused field. Source: (from training memory of book).
Karl Pearson's correlation coefficient (importance 3): Quantifies linear association. Pearson explicitly rejected causal interpretation, setting statistics on a century-long detour. Source: (from training memory of book).
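Monty Hall problem (importance 2): Famous probability puzzle. Pearl uses it to illustrate collider bias: the door the host opens is a common effect of the contestant's choice and the car's location, so observing it makes the two dependent. The answer depends on the process by which the host chooses a door, not only on the observed data. Source: (from training memory of book).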
firing squad example (importance 2): Pearl's favorite toy example. Captain's order causes both soldiers to fire, which causes death. Illustrates confounding, mediation, counterfactuals. Source: (from training memory of book).
Neyman-Rubin model (importance 2): Potential outcomes framework for causal inference. Jerzy Neyman's 1923 agricultural experiment paper, formalized by Donald Rubin in 1970s-80s. Dominant in statistics. Source: (from training memory of book).
Rubin Causal Model (RCM) (importance 2): Donald Rubin's formalization of potential outcomes. Emphasizes assignment mechanism and ignorability. Less transparent than Pearl's graphs about confounding structure. Source: (from training memory of book).
econometrics tradition (importance 2): Economics developed causal inference methods (IV, panel data, difference-in-differences) independently. Expressed in equations rather than graphs. Pearl shows graphical equivalents. Source: (from training memory of book).
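The firing squad entry above lends itself to a tiny structural model for a rung-three query. The following is a minimal sketch, assuming simplified deterministic equations in which the captain's order is the only background variable, both soldiers fire exactly when ordered, and either shot is fatal. The function names and the three-step layout (abduction, action, prediction) are an illustrative rendering, not code from the book.

```python
def firing_squad(order, do_a=None):
    """Structural equations: order -> A, order -> B, (A or B) -> death.

    Passing do_a replaces soldier A's equation with a fixed value,
    which is the graph-surgery reading of do(A=do_a).
    """
    a = order if do_a is None else do_a
    b = order
    death = int(a or b)
    return {"order": order, "a": a, "b": b, "death": death}

def counterfactual_death(evidence, do_a):
    """Would the prisoner be dead had A been set to do_a, given the evidence?"""
    # Step 1, abduction: keep the background values consistent with the evidence.
    compatible = [u for u in (0, 1)
                  if all(firing_squad(u)[k] == v for k, v in evidence.items())]
    # Steps 2 and 3, action and prediction: rerun the modified model on those values.
    return {firing_squad(u, do_a=do_a)["death"] for u in compatible}

# The prisoner is dead. Had soldier A held fire, would he still be dead?
print(counterfactual_death({"death": 1}, do_a=0))   # {1}
```

Seeing the prisoner dead implies the order was given, so soldier B fires in the counterfactual world as well and the outcome is unchanged. In this toy model A's shot was therefore not a but-for cause of the death.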
Relations
Pearl's Ladder of Causation exemplifies Rung One: Association (seeing)
Pearl's Ladder of Causation exemplifies Rung Two: Intervention (doing)
Pearl's Ladder of Causation exemplifies Rung Three: Counterfactuals (imagining)
Rung Two: Intervention (doing) builds-on Rung One: Association (seeing)