One is the difference between optimizing for MTBF and MTTR (respectively, mean time between failures and mean time to repair). Quality gates improve the former but make the latter worse.
I think optimizing for MTTR (and also minimizing blast radius) is much more effective in the long term, even at preventing bugs. There are many reasons, but a big one is that quality gates can only ever catch the bugs you expect; it isn't until you ship to real people that you catch the bugs you didn't expect. And the value of optimizing for fast turnaround isn't just avoiding bugs. It's increasing value delivery and organizational learning ability.
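For concreteness, here's a minimal sketch of how I think about the two metrics (Python, with made-up incident data; the function and variable names are just illustrative, not from any particular monitoring tool): MTTR depends only on how quickly you recover, while MTBF depends only on how often you fail.

    # Incidents recorded as (failure_start, service_restored), in hours.
    def mttr(incidents):
        # Mean time to repair: average downtime per incident.
        return sum(end - start for start, end in incidents) / len(incidents)

    def mtbf(incidents, observation_window):
        # Mean time between failures: total uptime divided by failure count.
        downtime = sum(end - start for start, end in incidents)
        return (observation_window - downtime) / len(incidents)

    # Three incidents in a 720-hour month:
    incidents = [(100.0, 100.5), (340.0, 342.0), (610.0, 610.25)]
    print(mttr(incidents))         # ~0.92 hours to recover, on average
    print(mtbf(incidents, 720.0))  # ~239 hours of uptime between failures

Quality gates aim at the failure count (fewer incidents, further apart); they do nothing to shrink the downtime per incident, and heavier gates often grow it by slowing down the fix-and-deploy loop.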
The other is that I think this grows out of an important cultural difference: the balance between blame for failure and reward for improvement. Organizations that are blame-focused are much less effective at innovation and value delivery. But they're also less effective at actual safety. [1]
To me, the attitude in, "Getting a call that production is not working is the event that I am trying to prevent by all means possible," sounds like it's adaptive in a blame-avoidance environment, but not in actual improvement. Yes, we should definitely use lots of automated tests and all sorts of other quality-improvement practices. And let's definitely work to minimize the impact of bugs. But we must not be afraid of production issues, because those are how we learn what we've missed.
[1] For those unfamiliar, I recommend Dekker's "The Field Guide to Understanding Human Error": https://www.amazon.com/Field-Guide-Understanding-Human-Error...
I've found it very beneficial, and the concepts we learned have helped me in almost every aspect of understanding the complicated world we live in. I've taken these concepts to two other companies now to great effect.
https://www.amazon.com/Field-Guide-Understanding-Human-Error...
https://codeascraft.com/2012/05/22/blameless-postmortems/
https://codeascraft.com/2016/11/17/debriefing-facilitation-g...
https://www.oreilly.com/library/view/velocity-conference-201...
When organizations scale up, and especially when they're dealing with risks, it's easy for them to shift toward the controlling end of things. This is especially true when internally people can score points by assigning or shifting blame.
Controlling and blaming are terrible for creative work, though. And they're also terrible for increasing safety beyond a certain pretty low level. (For those interested, I strongly recommend Sidney Dekker's "Field Guide to Understanding Human Error" [1], a great book on how to investigate airplane accidents, and how blame-focused approaches deeply harm real safety efforts.) So it's great to see Slack finding a way to scale up without losing something that has allowed them to make such a lovely product.
[1] https://www.amazon.com/Field-Guide-Understanding-Human-Error...
You could argue that we should now train pilots to carefully pause and consider whether the thing they are about to hit is safe to hit. But for that, you'd have to show that the extra deliberation is really net safer, despite the added reaction time when avoiding collisions. And if you did argue that, you couldn't judge the current pilots by your proposed new standard.
By the way, for those interested in really thinking through accident retrospectives, I strongly recommend Sidney Dekker's "The Field Guide to Understanding Human Error": https://www.amazon.com/Field-Guide-Understanding-Human-Error...
I read it just out of curiosity, but it turned out to be very applicable to software development.
Amazon: https://www.amazon.com/Field-Guide-Understanding-Human-Error...
Older PDF (paperback is well worth it, in my opinion): http://www.leonardo-in-flight.nl/PDF/FieldGuide%20to%20Human...
Trying to find the 'real cause' is a fool's errand, because there are many points at which, and many ways in which, the outcome could have been avoided.
I do take your meaning: reducing speed and following well-established rules would almost certainly have saved them.
0. PDF: http://www.leonardo-in-flight.nl/PDF/FieldGuide%20to%20Human...
Amazon: https://www.amazon.com/Field-Guide-Understanding-Human-Error...
There is no reason to say that these people "got it wrong". They were unlucky. Suppose the same shitheels broke a window, climbed in, unlocked the door, and had a big party on a weekend when the owners were away. One inclined to superiority-by-hindsight could say, "Well duh, why didn't they have bars on their windows?"
After a rare negative occurrence, one can always look back with hindsight, find some way the bad outcome could theoretically have been averted, and then say, "Well duh." Always. It is a great way to sound and feel smart. But it never actually fixes anything. Indeed, it can prevent the fixing of things because, having blamed someone, we mostly stop looking for useful lessons to learn.
If you want the book-length version of this, Sidney Dekker's "Field Guide to Understanding Human Error" has a great explanation of why retrospective blame ends up being immensely harmful: http://www.amazon.com/Field-Guide-Understanding-Human-Error/...
It's a brilliant book written by Sidney Dekker, a "Professor of Human Factors and Flight Safety". The basic point is that the default way of understanding bad outcomes is what he calls "the Old View or the Bad Apple Theory". He instead argues for the New View, where "human error is a symptom of trouble deeper inside a system".
Normally with a book like this, I read the first couple of chapters, say, "Ok, I get the idea," and can ignore the rest. After all, I both agree with and understand the basic thesis. But so far every chapter has been surprisingly useful; I keep discovering that I have Old View notions hidden away. E.g., when I discover a systemic flaw, I'm inclined to blame "bad design". But he points out that's a fancy way of calling the problem human error, just a different human and a different error than normal.
Even the driest parts are helped by his frequent use of examples, often taken from real-world aviation accident reports. There are also fascinating bits like a system for high-resolution markup of dialog transcripts to indicate timing (down to 1/10th second), speech inflection, and emphasis. I'll never use it myself, but I will definitely use the mindset that it requires.
Given how much time software projects spend dealing with bugs, I believe we need a new way to think about them, and for me this book describes a big piece of that.
1: https://codeascraft.com/2012/05/22/blameless-postmortems/
2: http://www.amazon.com/Field-Guide-Understanding-Human-Error/...
http://www.amazon.com/Field-Guide-Understanding-Human-Error/...
Fantastic read about the futility of placing blame on a single human in a catastrophe like this. It makes a strong case for why more automation often causes more work. Definitely worth checking out; Etsy has applied it to their engineering work, using it to facilitate blameless postmortems:
It's an incredibly thorough treatment of the incentives and psychology that lead to people labeling process failures as 'human error'.
Most of the book deals with manufacturing, aviation, and air control failures, but the principles generalize so easily to software development that it's a treat to read. One thing that makes it so good is that I was vaguely aware of most of what he covers before having read it, but reading him stitch it all together brought me to the point of intuitively understanding the concepts that had been floating in the back of my mind, and being able to see them all around me at work. He puts it together so smoothly that after having read it, it felt like I always knew what I had just learned.
It's super expensive on Amazon (https://www.amazon.com/Field-Guide-Understanding-Human-Error...) but available on all the online library sites that aren't for linking in polite company. It's also on Audible.