Being agile with continuous integration

The Philosopher Developer

January 27, 2012

Something's been stirring around in my head lately, and it's going to seem somewhat heretical to those firmly planted in a particular school of thought. Essentially I've been questioning the way continuous integration is done on some of the software projects I've seen, at ThoughtWorks and elsewhere.

More broadly, I guess my skepticism isn't specifically about continuous integration, but rather about test-driven development more generally: both TDD and, more importantly, BDD (i.e., using a language like Gherkin/Cucumber to define feature requirements and a UI automation tool such as Selenium to drive acceptance tests).

To be clear, I view TDD and BDD as extremely useful techniques; and CI is a powerful tool to maximize their effectiveness. I would even accept that we are better off using a test-driven approach for 100% of development than we would be if we used no TDD. But that isn't the same as saying 100% TDD is optimal or that we should get started with BDD immediately on every project. Like most everything else in life, I believe it is ultimately a matter of achieving the right balance, which requires good judgment.

As a bit of background, I have had conversations with colleagues who have agreed with me that sometimes testing is not necessary or appropriate. The attitude I have picked up as a general rule of thumb is that for spiking out software prototypes—throwaway versions that we acknowledge in advance will probably differ dramatically from any actual release—we can often move faster by coding features without testing them. TDD might still be useful in this context, however, as one of its benefits is that it assists the developer in thinking through the implementation of a tricky piece of logic. But its usefulness here is actually in increasing development speed (which is important for a prototype), not in ensuring robustness (which arguably does not matter very much at this stage). Therefore it makes sense not to TDD everything, but only those things that can actually be expedited with TDD.

If we are in agreement on that point, let me suggest that we should also be wary of overdoing it with early use of BDD and continuous integration on certain types of projects.

Specifically, I am thinking of projects where:

  1. Getting a fast start is important (e.g., for instilling confidence in the customer or building up a strong user base early), and
  2. Requirements are volatile (e.g., because the project is experimental and based on hypotheses that need to be challenged)

Here is how I believe we can fall into a trap with CI. Even if we acknowledge the volatility of the requirements and the importance of moving quickly, it can be tempting to cling to BDD and CI as a safety net. These tools have served us well in the past on enterprise projects, where regression failures and unstable builds are our mortal enemies. But we need to think about our priorities here and consider whether we are getting the greatest return on investment for our time.

Think about what happens when the build goes from green to red. This is a stop-the-line event, requiring an expensive context switch for potentially every developer on the team. (Even if we adopt "he who broke it must fix it" as a rule, the rest of the team may still be blocked from checking in.) And of course, in plenty of cases, on plenty of projects, it should be disruptive to the team. But early in a project where requirements are in flux, what does it actually mean for the build to be red?

Literally speaking, of course, it means one or more tests have failed. Are these tests critical? Are they testing functionality that we know we want, and which is well defined? Or is the customer or product team still debating exactly how this feature should work? How much time have we invested in writing the tests? If we've used BDD to capture the requirement as it stands now and have wired up our tests (e.g., Cucumber feature files) to UI automation code, how much of that work will just be thrown away in a week or a month's time?
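To make that investment concrete, here is a minimal sketch of the kind of wiring involved, using behave (a Cucumber-style BDD tool for Python) and Selenium. The sign-up scenario, URL, and element names are all hypothetical, purely to illustrate the glue code that would have to be rewritten or discarded if the requirement changes:

    # features/steps/signup_steps.py -- hypothetical step definitions wiring a
    # Gherkin scenario ("a visitor signs up with a valid email") to Selenium.
    from behave import given, when, then
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    @given("a visitor is on the sign-up page")
    def visit_signup_page(context):
        # Launch a browser and load the (hypothetical) sign-up page.
        context.driver = webdriver.Firefox()
        context.driver.get("http://localhost:3000/signup")

    @when("they submit a valid email address")
    def submit_email(context):
        context.driver.find_element(By.NAME, "email").send_keys("someone@example.com")
        context.driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

    @then("they see a confirmation message")
    def see_confirmation(context):
        assert "Thanks" in context.driver.find_element(By.ID, "flash").text
        context.driver.quit()

Every locator and page flow baked into steps like these is coupled to a UI that, at this stage, the product team may still be rethinking weekly.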

Perhaps more importantly, even if the test is correct, how much value is it delivering right now? This question is especially relevant to teams with one or more dedicated QAs. A QA has a nuanced understanding of what is being tested and how final a requirement is. He or she knows how important a given feature is and whether it is actually broken at any given point (as opposed to suffering from the bane of every team that uses CI: a flaky or intermittent failure). And early in a project, it is relatively inexpensive for this QA to test all the functionality he or she knows to be important with nearly every revision.

Just as we try to avoid over-specifying requirements upfront in any agile software project, shouldn't we also avoid over-BDDing our features too early?

I'm not articulating myself very well, but hopefully my concern is clear enough. It isn't that I question the value of BDD-style testing or of CI in general. I think any project benefits from BDD and CI from some starting point through to the end of the project. I suppose my uncertainty is primarily about identifying that starting point. Perhaps it isn't always Day 1 on every project. Maybe on some projects it makes sense to write code vigorously for the first several weeks, with only unit tests and very few functional tests, and to start introducing BDD with Cucumber et al. only once features have become better defined through team conversations and feedback from users.

Another possibility I've been considering is a slightly different CI paradigm. What if, for the early stages of a project, we didn't use CI as a binary green/red indicator, potentially stopping the line and forcing a context switch on everyone, but as more of a "heat map" highlighting areas that may require our attention? In other words, instead of loudly alerting the team that "the build is broken [red]", this different kind of CI could point out that "tests have been failing in this functional area for several commits in a row now," shifting us along a spectrum from green to yellow.
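As a rough sketch of what I mean (the thresholds and data shapes here are just assumptions, not a proposal for any particular CI server), imagine something that tallies consecutive failures per functional area and reports a colour on a spectrum rather than a single pass/fail bit:

    # Hypothetical "heat map" summary: given recent build results grouped by
    # functional area, report green/yellow/red per area instead of one global bit.
    from typing import Dict, List

    def area_status(recent_failures: List[bool], yellow_after: int = 2, red_after: int = 5) -> str:
        """recent_failures is ordered oldest-to-newest; True means the area's
        tests failed in that build. Colour depends on the current failure streak."""
        streak = 0
        for failed in reversed(recent_failures):
            if not failed:
                break
            streak += 1
        if streak >= red_after:
            return "red"
        if streak >= yellow_after:
            return "yellow"
        return "green"

    def heat_map(history: Dict[str, List[bool]]) -> Dict[str, str]:
        return {area: area_status(failures) for area, failures in history.items()}

    # Example: sign-up has failed three builds in a row -> "yellow", a prompt
    # for a team conversation rather than a stop-the-line event.
    print(heat_map({
        "sign-up":  [False, False, True, True, True],
        "checkout": [False, True, False, False, False],
    }))

Whether two or five consecutive failures is the right threshold would be up to the team; the point is only that the signal is graduated rather than binary.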

The team would discuss this area and evaluate the appropriate course of action. Maybe the requirements being tested are actually still uncertain, in which case we don't sacrifice valuable time wrestling with the build instead of continuing work on preliminary features. Maybe the requirements have taken a clear shape by this point, in which case we revisit the tests, fixing them where it makes sense and rewriting them where they have become outdated. Or maybe the requirements have changed outright and the tests have become obsolete, in which case we can discard them altogether.

I understand that there are very logical objections to some of what I'm saying here. One such objection is that by focusing less on proper testing and CI at the start of a project, we risk laying an unstable foundation on which to expand the code base, which may jeopardize the quality of the product. We also incur greater infrastructure costs later on, when we do choose to focus on these aspects of the project, which may result in a slowdown at an unexpected or inopportune point (which is basically any point, since slowdowns are always undesirable). So this is far from a viewpoint I am passionately espousing. It is merely a thought, one I intend to consider much more carefully in the days ahead and for which I'd love to hear any and all feedback.