While the available documentation for unit testing is extensive and quite mature, discrepancies arise when talking about testing at higher levels of abstraction. Take for instance the way tests are classified. You can find integration tests, system tests, component acceptance tests, end-to-end tests, etc. One problem I find with the attempts to classify higher-level tests is that they do not convey specific information. If I talk to a colleague about a system test or an integration test, it is unlikely that she is going to understand me precisely. Unlike the refactorings catalog or the design patterns catalog, those terms don’t convey a precise meaning and, therefore, they are not useful for communication. Furthermore, I think they paint a picture that is at the very least incomplete, which suggests that further research is required on this topic.
With this post, I want to describe some concepts and the testing methodology that my team and I are currently following. These concepts may be useful for dealing with testing at higher abstraction levels. The methodology has been applied in the context of an object-oriented application in a particularly complex domain.
Software applications have a hierarchical structure. No matter what application you consider, you can see that in the way code is organized or the way components are plugged together. Moreover, each level in the hierarchy is delimited by some kind of boundary. Following the hierarchy, you can see how boundaries have increasing abstraction levels, from the most specific to the most generic. From the low level operations to the high level services offered.
Each level in the application hierarchy very likely offers an interface, a point from which you can exercise a certain part of the application. Therefore, each abstraction level offers a test point. The granularity of the test case scenarios varies depending on the abstraction level at which you test. At lower levels, you can implement fine-grained tests, covering all behaviours of a given method or class. At higher levels, you can test the collaboration between the different elements of the application, but you lose the capacity to fully exercise all possible code paths, at least with reasonable effort, so you have to test with a coarser grain. Execution times also vary with the level at which you test: lower levels run faster, higher levels slower.
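To make the two kinds of test points concrete, here is a minimal sketch. All names (PriceCalculator, CheckoutService) are invented for illustration; the point is the contrast between exhaustively covering one method and exercising a collaboration with a coarser grain.

```python
class PriceCalculator:
    """Low-level unit: every branch can be covered cheaply."""

    def discounted(self, price: float, percent: float) -> float:
        if percent < 0 or percent > 100:
            raise ValueError("percent must be in [0, 100]")
        return price * (1 - percent / 100)


class CheckoutService:
    """Higher-level element: a collaboration between parts."""

    def __init__(self, calculator: PriceCalculator):
        self._calculator = calculator

    def total(self, items) -> float:
        # items: list of (price, discount_percent) pairs.
        return sum(self._calculator.discounted(p, d) for p, d in items)


def unit_level_checks() -> bool:
    """Fine-grained test point: exercise all branches of one method."""
    calc = PriceCalculator()
    assert calc.discounted(100.0, 0) == 100.0
    assert calc.discounted(100.0, 50) == 50.0
    try:
        calc.discounted(100.0, 150)
        return False
    except ValueError:
        return True


def service_level_check() -> bool:
    """Coarse-grained test point: exercise the collaboration as a whole."""
    service = CheckoutService(PriceCalculator())
    return service.total([(100.0, 50), (10.0, 0)]) == 60.0
```

The unit-level checks enumerate every branch, including the error path; the higher-level check only verifies one representative collaboration, which is exactly the trade-off described above.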
In summary, the architecture of the tests depends on the architecture of the software. The number of levels at which you can test depends on the number of abstraction levels of the application. I know that this may sound obvious, but I’d like to emphasize the widely different software architectures that are out there, from the Linux kernel, to AWS, to TensorFlow, to JUnit, to embedded systems (which in turn have a very wide range of architectures), to Android applications, to others that I don’t even know of. All of them are software applications, but some of them have GUI while others don’t, some of them depend on direct hardware collaboration while others don’t, some of them are very generic, while others are very specific. Accordingly, there must exist an equally large number of testing architectures. All of this suggests that a different approach is needed for the generalization of testing concepts other than attempting to classify types of tests.
The Test Ladder
I’d like to introduce this concept with an example of a real software architecture I’ve worked with (albeit simplified). In this example, I’m going to refer to the application as the box. The box contains a number of services; each service is itself an application (not to be confused with the service pattern as defined in DDD), with varying degrees of complexity. Each service is composed of a number of components. In turn, each component is composed of one or more classes. Finally, the box is something that will run on an even larger system.
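The hierarchy, and the way each level exposes a test point, can be sketched in a few lines. The names (Box, Service, Component and "billing"/"invoicing") are invented placeholders, not the real system.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    classes: list = field(default_factory=list)  # class names, lowest level


@dataclass
class Service:
    name: str
    components: list = field(default_factory=list)


@dataclass
class Box:
    services: list = field(default_factory=list)

    def test_points(self):
        """Walk the hierarchy; each level offers a point to test from."""
        points = ["box"]
        for service in self.services:
            points.append(f"service:{service.name}")
            for component in service.components:
                points.append(f"component:{component.name}")
        return points


box = Box(services=[Service("billing", [Component("invoicing")])])
```

Listing the test points of such a structure makes it obvious that the testing architecture mirrors the application architecture, one step per level.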
Initially, we started with just two testing levels in our ladder. At the top, we have the box acceptance tests. At the bottom, the unit tests. The background colors represent roughly the different abstraction levels from classes, to components, to services, to the application as a whole.
This testing architecture was useful but, as you can guess, it had a major problem: the gap between the two levels was too large, as it skipped many of the layers defined by the software architecture. The top-level tests were too slow, leading to long feedback times, while the unit tests could not account for many of the complex collaborations happening within the services.
The solution was to introduce tests at the service level. That resulted in a major improvement in our testing architecture. Services could be tested at the proper abstraction level. We could test with a much finer grain, which allowed us to cover all of the corner cases. All of that, with execution times much closer to those of unit tests.
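A hedged sketch of what "covering corner cases at the service level" might look like. RatingService and its rules are invented for illustration; the real service would be assembled from its actual internal components, but still exercised without booting the whole box.

```python
class RatingService:
    """An invented service exercised in isolation from the box."""

    def rate(self, age: int, claims: int) -> str:
        if age < 0 or claims < 0:
            raise ValueError("inputs must be non-negative")
        if claims > 2:
            return "high"
        if age < 25:
            return "medium"
        return "low"


# At this level, enumerating corner cases is cheap; the box-level
# tests could not afford this many scenarios.
SCENARIOS = [
    ((30, 0), "low"),
    ((20, 0), "medium"),
    ((40, 3), "high"),
    ((24, 3), "high"),  # the claims rule wins over the age rule
]


def run_scenarios() -> bool:
    service = RatingService()
    return all(service.rate(*args) == expected
               for args, expected in SCENARIOS)
```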
At some point, we found that it would also be very convenient to include acceptance tests for especially complex components. Again, this produced really good results, as we could now test the component behaviours that were required by some of the service features. Curiously enough, we have few tests at this level, as depicted in the narrower ladder step.
If we go even further, beyond our control, there is a fifth step above the current ladder, where the box is integrated with the other systems. I have not included that level as I don’t have enough information to properly introduce it.
ATDD with the Test Ladder
We introduced Acceptance Test Driven Development at the same time we created the service-level acceptance tests. From that time on, we have integrated the test ladder into our ATDD workflow, looping multiple times through all the levels in the test ladder. Sidenote: you may have noticed that I have talked about acceptance tests at every level except the unit tests. I think that the term acceptance test (like the term unit test) is somewhat standard in the industry and conveys a precise meaning: validating the implementation of a feature.
We start at the topmost level (box acceptance tests), and we write an acceptance test. Then, we start descending the ladder, writing more specific acceptance tests as we go. At the topmost level, we drive the development of each feature with a happy-path end-to-end test. At the service level, we are capable of driving the design of the application, including all of the business rules and scenarios. It is actually very hard to produce many relevant scenarios at the topmost level. Since we are able to cover more scenarios, the lower levels of the ladder will typically contain more tests. Sometimes, at the component level, we find it very useful to drive the design with component acceptance tests; consequently, we don’t always loop over the component acceptance test step of the ladder.
This methodology has emerged after many iterations and improvements. We descend the testing ladder with acceptance tests of increasing granularity, and then we climb it as we finish the implementation. Once finished, the ladder also provides an order of execution for the tests in the Continuous Integration pipeline, climbing up the ladder.
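The execution order the ladder gives to the CI pipeline can be expressed trivially: run the suites from the bottom of the ladder up, so the fastest, most specific suites fail first. The suite names below are invented placeholders.

```python
# Each suite is tagged with its ladder level (1 = bottom of the ladder).
LADDER = [
    ("unit", 1),
    ("component-acceptance", 2),
    ("service-acceptance", 3),
    ("box-acceptance", 4),
]


def ci_order(suites):
    """Return suite names in the order CI should run them: climbing
    the ladder, lowest abstraction level (and fastest feedback) first."""
    return [name for name, _ in sorted(suites, key=lambda s: s[1])]
```

The point of the ordering is feedback economics: a failing unit test stops the pipeline in seconds, before the slow box acceptance suite is even started.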
Non-ATDD Tests
Once the testing architecture is established, the test ladder provides the means to implement other tests besides feature-driving tests. For example, the box application from the example has a number of non-functional requirements that call for a set of tests that cannot drive development. Those tests include, among others, performance, stress and stability tests, and in our case they are absolutely required to verify that the system performs as expected.
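As one hedged example of such a non-feature-driving test, a minimal performance check might look like this. The operation (`process_batch`) and the time budget are invented; in practice both would come from the box's actual non-functional requirements.

```python
import time


def process_batch(items):
    # Stand-in for a real service operation under test.
    return [i * 2 for i in items]


def within_budget(workload_size: int, budget_seconds: float) -> bool:
    """Run the operation on a workload and check it meets its budget."""
    start = time.perf_counter()
    process_batch(list(range(workload_size)))
    elapsed = time.perf_counter() - start
    return elapsed <= budget_seconds
```

Stress and stability tests follow the same shape: they assert on resource and timing behaviour rather than on feature correctness, so they verify the system but cannot drive its design.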
Trying to classify tests is the same as trying to classify components in a given software design. The current attempts to classify tests as, for instance, the various testing pyramids out there, are a bit misleading. The testing architecture has to be designed, and it emerges from the design of the application itself. The testing points are given by the architecture of the application.
I have also observed that we are using the same structures and strategies for high-level testing that we use for unit tests. For instance, we make use of test doubles, but mocking elements at different abstraction levels. Indeed, I have found that high-level test doubles can be very powerful tools for testing. Of course, I’m not saying that you can use them in the same way you would use mocks in outside-in TDD. In our case, we have mocked strategic parts of the system, like external services out of our control.
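A sketch of a high-level test double, using `unittest.mock` from the standard library. The names (PricingService, a rate client with a `rate` method) are invented; the point is that the double replaces a strategic boundary, an external service out of our control, rather than an internal collaborator.

```python
from unittest.mock import Mock


class PricingService:
    """Invented service that depends on an external exchange-rate API."""

    def __init__(self, rate_client):
        self._rates = rate_client

    def price_in(self, amount: float, currency: str) -> float:
        return amount * self._rates.rate("EUR", currency)


# The double stands in for the whole external service, so the rest of
# the system under test runs with its real parts.
fake_rates = Mock()
fake_rates.rate.return_value = 1.25
service = PricingService(fake_rates)
```

Pinning the boundary with a double keeps the high-level test deterministic and fast, without dictating the internal design the way fine-grained mocks in outside-in TDD do.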
Finally, I think that the testing architecture points towards the definition of a catalog of testing patterns. Such a catalog could describe recurrent structures found in testing architectures that could be reused, similarly to those described in the design patterns catalog.
There is a lot more to be said in this direction, and I hope this post provides insights for further development in high level software testing.