A book I read more than a decade ago helped me learn about this: Release It! by Michael T. Nygard. https://www.oreilly.com/library/view/release-it/978168050026... I wonder whether its content is dated by now, but I'm not sure.
I believe that load-testing components in concert, identifying the weak spots, and then reproducing and fixing each problem in isolation may be a way to find such issues preemptively. That work has a cost, and it assumes you can build a workload representative of the ones that actually cause issues in production. I believe this is a skill that people and teams can learn, but mistakes are easy to make. I also think one should be careful with using production data in tests, both for privacy reasons and to avoid accidentally performing real-world actions with consequences. That is another area of software development I feel is worth considering: a conscious, well-designed separation of environments into development, testing, and production.
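To make the load-testing idea a bit more concrete, here is a rough sketch of what I mean, not a real tool. It hammers a callable from several threads and reports latency percentiles. The names `run_load`, `percentile`, and `fake_component` are made up for illustration, and `fake_component` is only a stand-in for a call into whatever system is being tested (ideally one living in a test environment, not production).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, calls=200, workers=20):
    """Invoke `target` `calls` times across `workers` threads.

    Returns the per-call latencies in seconds, sorted ascending.
    """

    def timed_call(_):
        start = time.perf_counter()
        target()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(timed_call, range(calls)))


def percentile(sorted_values, fraction):
    """Simple index-based percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


def fake_component():
    # Hypothetical stand-in for the component under test.
    time.sleep(0.001)


if __name__ == "__main__":
    latencies = run_load(fake_component)
    p50 = percentile(latencies, 0.50)
    p99 = percentile(latencies, 0.99)
    print(f"p50={p50:.4f}s p99={p99:.4f}s")
```

A large gap between the median and the tail as `workers` goes up is usually the hint that something is contending, and that is the spot I would then try to reproduce in isolation.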