Tuesday, July 20, 2010

(Statistical) bias

I think I can say quite safely that nobody in our organisation has ever suggested that substituting a voluntary survey for the mandatory long form of the census could in any circumstances provide the same kind of information, no matter how much money is spent or how large a mail-out sample is attempted. ... No statistician in their right mind would believe that this provides an equivalent information base.
- Don McLeish, president of the Statistical Society of Canada

Accurate data collection and analysis depend on tested statistical methods. At the core of statistical science is appreciating that any introduction of voluntary choice must automatically bias the results. It cannot be otherwise. Once a census is made voluntary, bias is built into the results.

Taking larger samples cannot remove or in any way negate that bias. Quite the contrary: a larger sample only narrows the margin of error around the same skewed figure, which makes the bias look more authoritative than it is.
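The point about sample size can be checked with a toy simulation. What follows is only a sketch under invented assumptions: the population, the 30% "hard to reach" group, the group averages and the response rates (20% versus 80%) are all hypothetical numbers chosen for illustration, not census figures, and the function names are my own. It compares a mandatory survey, where everyone sampled answers, with a voluntary one, where each person sampled decides whether to return the form.

```python
import random
import statistics


def make_population(size, rng):
    """Build a hypothetical population of (value, response_probability) pairs.

    Assumption: 30% of people are "hard to reach", average 40 on the
    quantity being measured, and return a voluntary form 20% of the time.
    Everyone else averages 60 and returns it 80% of the time.
    """
    people = []
    for _ in range(size):
        hard_to_reach = rng.random() < 0.30
        value = rng.gauss(40.0 if hard_to_reach else 60.0, 10.0)
        respond_prob = 0.20 if hard_to_reach else 0.80
        people.append((value, respond_prob))
    return people


def mandatory_estimate(population, n, rng):
    """Mean of a simple random sample in which every person answers."""
    sample = rng.sample(population, n)
    return statistics.fmean([value for value, _ in sample])


def voluntary_estimate(population, n, rng):
    """Mean of the forms actually returned from a mail-out of size n."""
    sample = rng.sample(population, n)
    returned = [value for value, prob in sample if rng.random() < prob]
    return statistics.fmean(returned)


def simulate(sizes=(500, 5000, 50000), seed=1):
    """Return the true mean and, per sample size, (mandatory, voluntary) errors."""
    rng = random.Random(seed)
    population = make_population(200_000, rng)
    true_mean = statistics.fmean([value for value, _ in population])
    results = {}
    for n in sizes:
        mandatory_error = mandatory_estimate(population, n, rng) - true_mean
        voluntary_error = voluntary_estimate(population, n, rng) - true_mean
        results[n] = (mandatory_error, voluntary_error)
    return true_mean, results


if __name__ == "__main__":
    true_mean, results = simulate()
    print(f"true mean: {true_mean:.2f}")
    for n, (mandatory_error, voluntary_error) in results.items():
        print(f"n={n:>6}  mandatory error={mandatory_error:+.2f}  "
              f"voluntary error={voluntary_error:+.2f}")
```

Under these assumptions the mandatory error drifts toward zero as the mail-out grows, while the voluntary error settles at a fixed offset. That offset is set by who chooses to answer, not by how many forms are sent.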

Replacing a mandatory form with a voluntary opt-in specifically biases results toward people with stable lifestyles, especially those who hold extreme views in favour of trusting the status quo. Those who don't trust the status quo are less likely to give up private information. Those with moderate views are less likely to find data collection relevant enough to voluntarily give up the extra hour or two to fill out an "unnecessary" form. Those without stable lifestyles are less likely to be drawn into any kind of voluntary data collection.

With a voluntary opt-in, there is no longer an obligation to explicitly make the data collection form accessible to all groups. Those who fell between the cracks before are even more likely to do so now, with the result that previously underrepresented groups become even more underrepresented.

Taken together, the primary bias introduced by converting a mandatory system into a voluntary system is a tendency to conceal change and reinforce existing beliefs.

If the purpose of government is to serve the greater good of the country, one wonders why any government would deliberately undermine the very data it needs to do so. Yet quietly on a summer weekend, without consultation or any advance warning, the current government has scrapped Canada's previous mandatory long-form census and replaced it with a voluntary opt-in.

Curiously, at the same time, representatives from Statistics Canada are no longer permitted to give independent interviews -- although, on those questions they are still permitted to answer, StatsCan officials have been very careful to emphasise that their only role is to provide options at the government's request and to execute orders. They have no part in the decision-making. That lies with cabinet.

In the absence of further enlightenment from StatsCan, we are left only with Tony Clement's quixotic statement that a voluntary census was one of StatsCan's recommendations, without any context whatsoever for that statement. We have been told only that StatsCan recommended a voluntary long-form census. We have not been told what the original request actually was.

So, let us take a hypothetical scenario in which the government had told StatsCan that, because of privacy issues, there would no longer be a mandatory long-form census, and had asked it to find options within that constraint. Since StatsCan cannot initiate recommendations, only respond to government orders, what would StatsCan's remaining options be? I can think of only three:
  1. Abandon the long form entirely.
  2. Identify which particular questions on the long form are problematic, and omit those and only those.
  3. Keep the long form, but make it voluntary.
#1 is politically highly undesirable, for it makes the government responsible for having removed the mechanism for obtaining all future long-form data. #2 avoids this by retaining most of the data, but it requires the government to analyse the long form and decide which particular questions are problematic. The monetary cost of this option is higher. Also, reasons for each choice may be required, which may lead to unwanted and potentially complicating debate, awkward in light of an approaching election.

Politically, #3 is the best of the lot. The government will not have to account for having dropped specific questions or long-form data altogether. The only explanation needed is respecting the individual's privacy (and who would not want that?). From StatsCan's perspective, #3 will permanently bias all future long-form data, but that is not the political party's problem (and may even act to its benefit).

Coincidentally, Tony Clement has mentioned that StatsCan offered three options, one of them the voluntary long-form census the government has chosen.

The study of politics does not itself require statistics. (That is what polling agencies are for.) The study of politics is how to make best use of statistics in shaping public opinion for the purpose of being re-elected. Nowhere does politics inherently require that statistics be as accurately representative as possible -- no, not even for the purpose of serving the greater good of the country as a whole. That is the realm of ethics.


