The “selection fallacy”

Following on from my post on publication bias:

https://da-boss.com/2012/06/04/the-publication-bias/

here is some disturbing news on another type of bias in scientific research. It does not, as yet, have a proper name, but Steve McIntyre refers to it as the “selection fallacy” (“selection bias” is a different, largely unrelated problem). I am out of my depth with advanced statistics, so you are encouraged to check out the following source links:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2841687/pdf/nihms-184032.pdf

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1850704

http://climateaudit.org/2005/02/06/jacoby-1-a-few-good-series/

http://climateaudit.org/2012/06/10/more-on-screening-in-gergis-et-al-2012/

http://cooley.libarts.wsu.edu/schwartj/pdf/Geddes1.pdf

http://www.scientificamerican.com/article.cfm?id=an-epidemic-of-false-claims

As you can see, the problem has been reported in neuroscience, psychology, climatology, political science and pharmacology. For those too busy to read the linked articles and papers, I will try to present my understanding of what the issue is about.

In scientific research we try to uncover the rules governing some process using observational data. By nature, the data available for analysis is “noisy”: it contains the fingerprint of the underlying process as well as various systematic and random errors. Typically, scientists do an initial screening to eliminate data that is likely to contain too much noise, and the analysis proper is carried out only on what is left. This can lead to two kinds of problems.

In the first scenario, an arbitrary set of rules is applied to weed out the “bad” (tainted) data. Some of these rules appear quite rational, others are more subjective, but there is no universally accepted standard of data quality that can be applied. The fact that a researcher has the freedom to decide which data are “in” invites suspicion of bias. Let’s assume someone sets out to prove a certain hypothesis (drug “A” is more effective than drug “B”, the 20th century was the warmest of the last millennium, etc.). If there are several plausible selection criteria for mining the “good” data, the researcher can run a preliminary analysis with each of them and, in the published version of the paper, include only the one producing “favourable” results.
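To make this concrete, here is a toy simulation (my own, not taken from any of the linked papers) of what happens when a researcher tries several plausible exclusion rules and reports only the one with the most “favourable” result. The two “drugs”, the sample sizes and the three exclusion rules are all invented for illustration; both groups are pure noise, so any “significant” difference is a false positive.

```python
# Sketch: trying several screening rules and reporting the best one
# inflates the false positive rate, even when there is no real effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations = 2000
n_per_group = 100

# Three hypothetical, individually defensible exclusion rules.
def no_exclusion(x):
    return x

def drop_outliers(x):
    # Drop observations more than 2 standard deviations from the mean.
    return x[np.abs(x - x.mean()) < 2 * x.std()]

def drop_low_scores(x):
    # Drop the bottom 10% of observations.
    return x[x > np.quantile(x, 0.1)]

rules = [no_exclusion, drop_outliers, drop_low_scores]

false_positives_honest = 0   # exclusion rule fixed before the analysis
false_positives_cherry = 0   # whichever rule gives the smallest p-value

for _ in range(n_simulations):
    a = rng.normal(0, 1, n_per_group)   # "drug A" outcomes: no real effect
    b = rng.normal(0, 1, n_per_group)   # "drug B" outcomes: no real effect

    p_values = [stats.ttest_ind(rule(a), rule(b)).pvalue for rule in rules]

    if p_values[0] < 0.05:          # honest: the rule was fixed in advance
        false_positives_honest += 1
    if min(p_values) < 0.05:        # cherry-picked: best of all rules
        false_positives_cherry += 1

print("False positive rate, rule fixed in advance:", false_positives_honest / n_simulations)
print("False positive rate, best rule reported:   ", false_positives_cherry / n_simulations)
```

The honest rate stays near the nominal 5%, while the cherry-picked rate is noticeably higher, even though all three rules look perfectly reasonable on their own.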

Alternatively, individual data sets can be screened according to how well they fit the trend shown by all the available data. The statistics involved here are somewhat above my head, but from what I understand this method can lead to “auto-correlation” problems. Here is a simplified explanation of the issue which I found on one of the climate blogs. Imagine we are trying to determine the growth rate of the US, and to do this we have collected statistical data from all 50 states. We then screen the data as follows. First, the (weighted) average growth rate of the 50 states is calculated. Then the 10 states closest to that average are kept and the others rejected. Finally, a second (weighted) average is calculated, this time only over the 10 states which passed the original screening. Now, which average is closer to the real growth rate of the US: the one derived from the full population of data, or the one from the 10 states admitted to the final analysis? It is obvious that, in this example, the full data set carries more information and produces a more meaningful average.
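Here is a small numerical sketch of that thought experiment (my own illustration; I have dropped the weighting for simplicity). The “true” national growth rate, the number of states kept and the amount of state-to-state variation are all made-up parameters.

```python
# Sketch: screening out the states furthest from the mean does not get us
# closer to the true growth rate, but it does make the estimate look far
# more precise than it really is.
import numpy as np

rng = np.random.default_rng(1)
true_growth = 2.0          # hypothetical national growth rate, percent
n_states, n_kept = 50, 10

# Each state's measured growth = true rate + state-level variation/noise.
state_growth = true_growth + rng.normal(0, 1.5, n_states)

full_mean = state_growth.mean()

# Screening step: keep only the 10 states closest to the full-sample mean.
distance = np.abs(state_growth - full_mean)
kept = state_growth[np.argsort(distance)[:n_kept]]
screened_mean = kept.mean()

print(f"True growth rate:            {true_growth:.2f}")
print(f"Mean of all 50 states:       {full_mean:.2f}  (sample std: {state_growth.std(ddof=1):.2f})")
print(f"Mean of 10 screened states:  {screened_mean:.2f}  (sample std: {kept.std(ddof=1):.2f})")
# The screened mean is, on average, no closer to the truth than the full
# mean, but its much smaller spread gives a misleading impression of
# certainty about the result.
```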

There are ways of dealing with the “selection fallacy” problem. Here are some excerpts from one of the papers linked above:

We propose the following six requirements for authors.

1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article. Following this requirement may mean reporting the outcome of power calculations or disclosing arbitrary rules, such as “we decided to collect 100 observations” or “we decided to collect as many observations as we could before the end of the semester.” The rule itself is secondary, but it must be determined ex ante [beforehand] and be reported.

(…)

4. Authors must report all experimental conditions, including failed manipulations. This requirement prevents authors from selectively choosing only to report the condition comparisons that yield results that are consistent with their hypothesis.

(…)

5. If observations are eliminated, authors must also report what the statistical results are if those observations are included. This requirement makes transparent the extent to which a finding is reliant on the exclusion of observations, puts appropriate pressure on authors to justify the elimination of data, and encourages reviewers to explicitly consider whether such exclusions are warranted. Correctly interpreting a finding may require some data exclusions; this requirement is merely designed to draw attention to those results that hinge on ex post decisions about which data to exclude.

What is critical is that the data sets rejected during the initial screening have also been “used” in the research, so they must be reported, along with the criteria used to reject them. These criteria must be set BEFORE the analysis begins, without trial runs to see which approach yields “favourable” results.
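As a final illustration of requirement 5 above, here is a minimal sketch (again my own, with invented data and an invented cutoff) of what reporting the results both with and without the excluded observations might look like in practice.

```python
# Sketch: report the analysis with all observations included AND with the
# pre-registered exclusion rule applied, so readers can see how much the
# result depends on the exclusions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical reaction-time data for two conditions (milliseconds).
condition_a = rng.normal(500, 50, 60)
condition_b = rng.normal(515, 50, 60)

# Exclusion rule decided BEFORE the analysis: drop responses above 650 ms.
CUTOFF_MS = 650

def report(label, a, b):
    res = stats.ttest_ind(a, b)
    print(f"{label}: n = {len(a)} + {len(b)}, "
          f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")

report("All observations included  ", condition_a, condition_b)
report("After pre-registered cutoff", condition_a[condition_a < CUTOFF_MS],
       condition_b[condition_b < CUTOFF_MS])
```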
