Friday, May 25, 2007

On Causation, Correlation, Confounding Factors and Post Hoc Ergo Propter Hoc

I am not a social scientist, much less a statistician (except for the stuff I did in math class, which was possibly the most useful thing I learned, considering my eventual decision to enter law school). What I understand about numerical and statistical literacy is almost entirely self-taught and, I fear, superficial at best. The purpose of this disclaimer is to proclaim my ignorance and amateur status in this area, and I hope I don't screw up the following discussion too badly.

One of the biggest difficulties in the social sciences, as they relate to making policy, is determining causality, i.e. what causes Y? In the natural sciences, the ability to repeat experiments, the fact that the Universe appears mechanical, and the fact that we have no objection to experimenting on water (as opposed to humans) make it fairly easy to determine causality, even in complex systems. In the social sciences (or the humanities), however, dealing with humans and the interactions between them makes experimentation more difficult and extrapolation from data much more complicated.

So the answer to any social problem is unlikely to be reducible to any large extent. This is not to say that we cannot operate on a reductionist model, but we have to be very careful that the model does not do more harm than good by basing itself on one erroneous cause to the exclusion of other possible or likely causes.

To take a simple case, and one close to my heart: my grades have drastically improved since I came over to UW. Like any student concerned about his grades, given that grades determine one's career to a large extent in this industry, I would find it eminently useful to know the cause or causes of this when I have to return to NUS.

If I had remained at NUS this year and my grades had improved (as seems to be the case for almost all law students), one might well hypothesize that this was due to the end of compulsory classes, which allows students to take classes they are personally interested in; presumably that interest translates into a better understanding of the topic and therefore better grades. Or it might be that, now that classes are smaller and there is no need to mark on a curve, students are able to show that they are in fact just as proficient as the former A students and no longer get a lower grade simply because of the mandatory grade distribution. Or it might simply be that, after 2 years of law school experience, one who always had the capability has finally figured out how to ace the examination. Or it might be a result of all three, plus a combination of luck, fortitude, etc.

Now translate that into doing a law program in a different country under a fairly different system, and the potential factors affecting the improvement in grades multiply. Offhand, differences between UW and NUS that might be contributing factors include i) the opportunity to use a laptop for the final exams, so that handwriting is no longer a factor in the marking, ii) a lack of a social life and extracurricular activities, which gives more time for studies, iii) actually spending more time on studies, iv) a change in studying technique, which involves a painstaking transcription of notes onto the laptop from the various sources, and v) everything mentioned in the prior paragraph.

Now everything I have said so far is probably causative in one form or another. There is a reasonable mechanism by which each may translate into better grades.

The contrast to that is correlation, wherein there might well be a statistically significant link between two things, but that does not necessarily mean one causes the other. A good example with regard to the above-mentioned situation might be the claim that "because" I am hanging around more foreigners, my grades have improved. There is a strong correlation if one plots Shaun's association with foreigners on one axis and Shaun's law grades on the other. But it should be clear that there is no necessary causal link between the two. Or, another example near and dear to my heart (because I'm a Pastafarian): the decrease in the total number of pirates worldwide is closely correlated with the increase in mean global temperatures. Now that is primarily satirical, but given the abuses of correlation (attempting to conflate it with causation), I hope that this is something we can keep in mind.
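
For the numerically inclined, the pirate example can be played out in a few lines of Python. This is only a rough sketch from an amateur: every number in it is invented for illustration (these are not real pirate counts or temperature records), and the names `pearson` and `detrend` are my own helpers, not anything from a statistics library. Both made-up series simply drift with the passage of time, which acts as the confounding factor; neither series is computed from the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def detrend(ts, ys):
    """Residuals of ys after removing a least-squares straight line in ts."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum(
        (t - mt) ** 2 for t in ts
    )
    return [y - (my + slope * (t - mt)) for t, y in zip(ts, ys)]


# Invented data: both series depend only on time plus independent noise.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = list(range(50))
pirates = [1000 - 15 * t + rng.gauss(0, 40) for t in years]
temperature = [14 + 0.02 * t + rng.gauss(0, 0.1) for t in years]

# Correlation of the raw series: strongly negative, purely via the shared trend.
raw_r = pearson(pirates, temperature)

# Correlation once the effect of time is removed from both series.
adjusted_r = pearson(detrend(years, pirates), detrend(years, temperature))

print(f"raw correlation:      {raw_r:.2f}")
print(f"after removing time:  {adjusted_r:.2f}")
```

The raw correlation comes out strongly negative even though nothing in the code lets pirates influence temperature. Once the shared drift over time is subtracted from both series, what remains is just the correlation between two independent sets of noise, which should be much weaker.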

Somewhat related to this problem is the logical fallacy of Post Hoc, Ergo Propter Hoc, i.e. After It, therefore because of It. Or to rephrase it, just because B occurred after A does not mean A caused B. This fallacy crops up because of the way we view and experience time as linear, and because of our strong innate tendency (possibly a biological imperative) to associate any occurrence (effect) with a cause.




