A/B testing and irreducible complexity

The Philosopher Developer

January 13, 2013

I was raised in a devout Christian family, which resulted in a fair amount of inner conflict and soul-searching throughout my academic life, particularly with respect to my ninth-grade education on evolution [1]. This in turn ultimately led me to read a book called Darwin’s Black Box by Michael Behe, which argues in favor of intelligent design [2] on the basis of a concept called irreducible complexity. It is actually a pretty reasonable argument, in my opinion (though I’m admittedly no expert on the subject), at least in that its premise seems plausible. To summarize in one sentence: Behe argues that there are systems in present-day organisms consisting of interacting parts, each of which on its own would provide no reproductive advantage to an individual, and so these systems cannot be explained purely by Darwinian natural selection. Only taken as a whole do they provide reproductive advantages, and so some other process must have generated them (which is where intelligent design enters the picture).

Behe provides plenty of low-level biochemical examples that I won’t bother you with, primarily because I don’t remember them. But whether or not you agree with his argument, and my limited research leads me to believe that (surprise!) most of the scientific community does not, I think the concept of irreducible complexity is a useful one. Even if Behe is wrong with respect to evolution, we all know and probably to some extent accept the idea that the whole is greater than the sum of its parts. Not everything in this world is the end result of some sequence of perfectly incremental changes, each coherent and explicable in its own right. Moreover, if a whole can be greater than the sum of its parts, this leaves open the possibility that any given part on its own could even have negative effects, and only contribute towards a positive whole in concert with other parts.

This is a particularly important lesson for software developers, we who are practically hard-wired to test the validity of every assumption and break all problems into smaller pieces. We do love our A/B testing; but as Robert J. Moore recently wrote in an article on TechCrunch, it can be taken too far. I have been disheartened on more than one occasion by data-driven minds pushing to validate a large feature through A/B testing each of its smaller parts individually, only to “discover” that the feature had no impact, or even a negative impact, on whatever was being measured. I can’t prove it (without buy-in, that is), but my suspicion is often that the larger feature in its complete form might still have yielded positive results in these cases.

It’s difficult to make this argument, though. The obsessively data-driven approach is actually a very scientific way of tackling a problem: as we all learned in science class, the only true way to test a variable is in isolation, with all other potential factors held constant. One of the problems with applying this scientific methodology to a software project, of course, is that you cannot possibly hold all factors but one constant. The market, your competitors, your users—everything is changing around you at all times. But even if you could somehow contain all that, there remains that nagging possibility that Behe was right, and you risk breaking a big good thing into many small bad things.
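
To make that worry concrete, here is a toy sketch in Python. Everything in it is hypothetical: the two parts A and B, the conversion rates, and the helper names are invented for illustration and do not come from any real experiment. It only shows that testing each part in isolation can point the opposite way from testing the complete feature.

```python
# Toy model of a feature made of two parts, A and B.
# All numbers are invented for illustration; they are not real test results.

BASELINE = 0.100  # hypothetical conversion rate with neither part shipped

# Hypothetical conversion rate for each combination of
# (part A shipped, part B shipped).
RATES = {
    (False, False): BASELINE,
    (True, False): 0.095,  # A alone does slightly worse than baseline
    (False, True): 0.097,  # B alone does slightly worse than baseline
    (True, True): 0.120,   # A and B together do better than baseline
}


def lift(a: bool, b: bool) -> float:
    """Relative change in conversion versus shipping neither part."""
    return (RATES[(a, b)] - BASELINE) / BASELINE


def sum_of_isolated_lifts() -> float:
    """What you would predict by adding up the two part-by-part tests."""
    return lift(True, False) + lift(False, True)


if __name__ == "__main__":
    print(f"A alone:        {lift(True, False):+.1%}")
    print(f"B alone:        {lift(False, True):+.1%}")
    print(f"Sum of parts:   {sum_of_isolated_lifts():+.1%}")
    print(f"A and B shipped: {lift(True, True):+.1%}")
```

With these made-up numbers, each part looks like a loser when tested alone, and adding the two isolated results predicts a loss, yet the whole feature comes out ahead. Whether real features behave this way is exactly the thing I can’t prove.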

How do you draw the line? I’m afraid I don’t have a satisfying answer to that. But from experience, I think I prefer to lean closer to the “test the whole feature” side of the spectrum than the “test each part by itself” side.


  1. Not that my parents were Biblical literalists. I never heard either my mom or my dad argue with any passion for a Young Earth, for example. I’m inclined to believe my sense of friction between religion and science during my formative years was as much a result of anti-religious sentiments among my science teachers (and peers) as anything else. 

  2. Not necessarily of theistic origin.