Thinking out loud: Automation vs Complication
In this post I want to do some stream-of-consciousness brainstorming on a question that has been in the back of my mind for a long time.
To put it in a very coarse way that I hope to gradually refine in this essay, there seems to be a contradiction in the fundamental pitch for automation.
On the one hand, it's pitched as a way to improve efficiency dramatically. When you have a process or operation that is manual, has way too many steps, leaves way too much opportunity for human error, and takes way too much valuable human time, then the correct way forward is to make or buy a tool to automate that process. The human operator should only input the essential information for the task, and the tool should do everything else, including checking the inputs for sanity. This pitch is repeated with variations by every single automation vendor and every single internal tooling project. What's more, on the face of it, it seems to check out.
On the other hand, every tool added to one's stack adds to complexity, to the maintenance and integration burden, to the training of new team members, and to the opportunity for other kinds of error where the abstractions are ill-understood. Every tool also has its own set of dependencies, each of which can become a single point of failure for your organisation as well. But the most painful side of all this is the loss of flexibility to modify the process. A mature tool will offer a certain amount of internal "wiggle room" through configuration options and through context-sensitivity. No tool, however, can offer infinite flexibility: if it did, it would be able to replace every other tool, unless of course that flexibility reopened the original set of problems around human error and inefficiency we were trying to resolve.
Organisations adopt automation to improve their efficiency, yet end up inefficient because the tools introduce a different kind of friction, some of it around using, integrating and maintaining the tools themselves, and a lot of it around not being able to operate in the optimal way for the organisation due to limitations introduced by the tools. At some point it starts to feel like automation is a masked form of the quote:
"All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections." - David Wheeler
This seemingly fundamental contradiction in the generic pitch for automation is what I want to explore, to understand whether it applies only to certain tools or kinds of tools. If not, perhaps it indicates that there is a "sweet spot" for automation. If, for instance, 20% of the tooling provides 80% of the efficiency gains, then perhaps that 20% is worth the maintenance overhead, while for the rest of the inefficiency, which may seem like fertile ground for automation, the standard advice to automate is actually wrong. The two answers (and any others that may come up along the way) may of course be combined into more complex strategies for any given organisation, but we'll see what happens when we get there.
Let's begin with a fairly simple example, however. Let's imagine a completely informal business (where everything lives in the mind of the owner and no real records are kept), for instance a small grocery store. Let's imagine our grocery store is transitioning to a paper-based system, where transactions are recorded. Conventional wisdom suggests that this transition will be an unmitigated step forward, once the transition pain is dealt with. Besides the potential efficiency and correctness improvements, it will allow the organisation to reach a new level of scale. The transition from informal to paper-based formal allows more people to be involved in the business (at the cost of having to write things down for every transaction).
Ignoring the added friction, however, it's important to understand the flexibility cost that the business incurs. When transactions have to be recorded, for instance, the owner loses the opportunity to let certain people buy "on credit". Suppose the owner manages to add some extra structure to the system for amounts received and amounts promised. They then have to clarify who in the business can make the decision to accept an "on credit" purchase. To do that, they may also have to determine some rules for which customers are trusted with those transactions, and because the rules have to be defined somewhat generically, they will end up being wrong in some cases (or worse, gamed), which will make people unhappy. That unhappiness can of course be managed through some escalation process whereby the cashier calls the owner to speak to a particular customer, and maybe that leads to the rules being modified after a lot of thought. When the rules are modified, every cashier has to re-learn them and make sure the change is understood and applied correctly, and so on.
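To make that rigidity concrete, here is a minimal sketch of what a written-down credit rule might look like once it is pinned down precisely. Everything in it is an assumption made up for illustration: the `Customer` fields, the two thresholds, and the `may_buy_on_credit` function are hypothetical, not a description of how any real shop works.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    years_known: int          # how long the shop has known this customer
    outstanding_debt: float   # what they currently owe


# Hypothetical thresholds: once written down, every cashier applies
# exactly these numbers, whether or not they fit the situation.
MIN_YEARS_KNOWN = 2
MAX_OUTSTANDING_DEBT = 50.0


def may_buy_on_credit(customer: Customer, amount: float) -> bool:
    """Apply the generic rule with no room for judgement."""
    return (
        customer.years_known >= MIN_YEARS_KNOWN
        and customer.outstanding_debt + amount <= MAX_OUTSTANDING_DEBT
    )


# A neighbour of ten years who always pays is refused over a small excess,
# a case the owner would have waved through without a second thought.
regular = Customer("long-time neighbour", years_known=10, outstanding_debt=45.0)
print(may_buy_on_credit(regular, 10.0))  # False: 45 + 10 exceeds the limit of 50
```

Changing either threshold is a one-line edit in code, but in the shop it means retraining every cashier, which is exactly the cost described above.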
It's important to understand here that none of these problems existed when the owner had everything in their head. They could make up or modify the process and the rules as they went, and perhaps an apprentice could pick up how this worked by watching the boss. The rules would evolve much more quickly and reach a better place faster, because changing them is as close to frictionless as it gets. This kind of calculus comes into effect when many tools are introduced, and to some degree explains why bureaucracies and large corporations can misfire spectacularly and make decisions no sane person would make. They are not a person, and they don't have the ability to learn and adapt at a moment's notice. Outdated rules, tools, and policies remain in use, sometimes for decades longer than they should, because other systems and people and data depend on them, and change is very expensive and hard to justify before a catastrophic failure.
There may be an interesting class of tool that we can exempt from the above effects: stateless tools. Let's take the humble calculator. It can be integrated into an informal process, or into a paper-based system, with seemingly little drawback, as it can be circumvented or used as needed. If we ignore the fact that the calculator presupposes knowledge of and compliance with basic arithmetic, it can be considered a simple, almost-free tool to adopt. The properties that make it so are that it is stateless and therefore does not lock information inside it, which in turn makes it easy to circumvent and also to adopt incrementally. It is conceptually well-generalised with well-understood use cases, and as a technology it has matured into combining low cost with high performance. It is also generative: it can be used for things its inventors never expected it to be used for, and this is a characteristic shared with tools such as the telephone, the internet, and the personal computer.
The calculator stands as an example of a tool that imposes minimal constraints on its user, offering benefits while any claimed drawback is hard to substantiate in context. It of course did not start this way: the first calculators were ridiculously expensive, occupied entire rooms, and were extremely hard to operate, never mind operate reliably. But we've reached the point where none of those things are true. The solar calculator doesn't even require a battery!
There is also another class of tools that fail time and time again. These are tools with hidden side-effects that are not obvious when the tool is proposed and are only discovered while it is being adopted. Their abstractions ignore or silence barriers that the user should be aware of. We have for instance produced several platforms and standards suites that promise to shield the individual programmer from the burden of thinking about distributed systems as such, instead allowing them to be manipulated as if they were centralised. From the old RPC protocols, to CORBA, XML-RPC, SOAP, and the 57 Web Services protocols, we've seen this approach fail every time. The current generation of cloud computing seems to be accomplishing parts of that vision, though with edge cases, but it seems like physics is reasserting itself when it comes to the internet of things. The repeated failures to create tools that cross certain abstraction barriers teach us that there are configuration and management steps that appear to be trivial and automatable, but lead to failures that are as hard to foresee as they are inevitable.
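A toy sketch may help show what "silencing a barrier" looks like in practice. The `RemoteAdder` class below is a hypothetical stand-in for an RPC proxy, with the network simulated by a list of call numbers that fail. It is not the API of CORBA, SOAP, or any real library, only an assumption-laden illustration of a remote call dressed up as a local one.

```python
class NetworkError(Exception):
    """Failure mode that a purely local function call never has."""


def local_add(a, b):
    return a + b


class RemoteAdder:
    """Hypothetical proxy that makes a remote call look like a local one.

    The network is simulated: calls whose 1-based number appears in
    `failing_calls` raise NetworkError instead of returning.
    """

    def __init__(self, failing_calls=()):
        self._failing_calls = set(failing_calls)
        self._calls = 0

    def add(self, a, b):
        self._calls += 1
        if self._calls in self._failing_calls:
            raise NetworkError(f"call {self._calls} timed out")
        return a + b


def total(add_fn, numbers):
    """Written as if add_fn were local: no retries, no partial-failure handling."""
    result = 0
    for n in numbers:
        result = add_fn(result, n)
    return result


print(total(local_add, [1, 2, 3]))  # 6

try:
    total(RemoteAdder(failing_calls=[2]).add, [1, 2, 3])
except NetworkError as err:
    # The caller never planned for this, because the interface hid the network.
    print("failed halfway through:", err)
```

The two `add` functions have the same signature, which is the whole promise of the abstraction, yet `total` is only correct for one of them.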
So we've discussed three types of tools so far: some that are obvious wins, some that imply subtle tradeoffs, and some that are doomed to fail, and often take their users with them. While this essay hasn't made any huge steps forward, and certainly won't stand up to any serious scrutiny for completeness or rigour, it does give me an outline for thinking about automation not as an unmitigated good, but as a case for understanding the tool that's being introduced, weighing its impact, and making a conscious decision to go for it or not.