Saturday, December 19, 2009

Spurious Relationship

Source
In statistics, a spurious relationship (or, sometimes, spurious correlation or spurious regression) is a mathematical relationship in which two occurrences have no causal connection, yet it may be inferred that they are causally connected because of a third, unseen factor (referred to as a "confounding factor" or "lurking variable"). The spurious relationship gives the impression of a meaningful link between two variables, an impression that proves invalid when objectively examined.

The misleading correlation between two variables is produced through the operation of a third causal variable. Suppose we find a correlation between A and B. There are then three possible relationships:

A causes B,
B causes A,
-OR-
C causes both A and B.

The last is a spurious correlation. In a regression model where A is regressed on B, but C is in fact the true causal factor behind both, the misleading choice of B as the explanatory variable is called specification error. It is therefore often said that "Correlation does not imply causation".
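To make the third case concrete, here is a minimal simulation sketch in Python. It is not taken from the source: the coefficients (2.0 and 1.5), the noise model, and the helper names `pearson` and `residuals` are all assumptions chosen for illustration. C drives both A and B, A and B never influence each other, and yet they come out strongly correlated. Once the linear effect of C is removed from both, the correlation all but disappears.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def residuals(y, x):
    """Residuals of y after a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed data-generating process: C causes A, C causes B, nothing else.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [2.0 * ci + rng.gauss(0, 1) for ci in c]
b = [1.5 * ci + rng.gauss(0, 1) for ci in c]

# Raw correlation between A and B (population value: 3 / sqrt(16.25), about 0.74).
r_ab = pearson(a, b)

# Correlation between what is left of A and B once C is accounted for.
r_ab_given_c = pearson(residuals(a, c), residuals(b, c))

print(f"corr(A, B)             = {r_ab:.3f}")
print(f"corr(A, B | C removed) = {r_ab_given_c:.3f}")
```

Regressing A on B alone in this setup would report a sizeable slope even though B has no effect on A, which is the specification error described above.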

The true causal chain may be

C => A => B

or even

A => C => B

or, as illustrated above,

C => A and C => B
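The three structures can be compared with a small simulation. The sketch below is an illustration under assumed conditions (linear effects with unit coefficients, standard normal noise, hypothetical helper names `simulate` and `partial_corr`), not a general test for causality. For each structure it prints the raw correlation of A and B and their partial correlation after controlling for C.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / (vx * vy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

def simulate(structure, n=5000, seed=1):
    """Draw (a, b, c) from one of three assumed linear causal structures."""
    rng = random.Random(seed)
    noise = lambda: [rng.gauss(0, 1) for _ in range(n)]
    add = lambda u, v: [p + q for p, q in zip(u, v)]
    if structure == "C => A => B":
        c = noise(); a = add(c, noise()); b = add(a, noise())
    elif structure == "A => C => B":
        a = noise(); c = add(a, noise()); b = add(c, noise())
    elif structure == "C => A and C => B":
        c = noise(); a = add(c, noise()); b = add(c, noise())
    else:
        raise ValueError(f"unknown structure: {structure}")
    return a, b, c

results = {}
for s in ("C => A => B", "A => C => B", "C => A and C => B"):
    a, b, c = simulate(s)
    results[s] = (pearson(a, b), partial_corr(a, b, c))
    print(f"{s:18s} corr(A,B) = {results[s][0]:.3f}   "
          f"corr(A,B | C) = {results[s][1]:.3f}")
```

In all three cases A and B are correlated. Controlling for C leaves a substantial correlation only in the first chain, where A really does act on B. In the other two the partial correlation is close to zero, so this check alone cannot tell the chain A => C => B apart from the common cause C.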