Computer Engineering: A DEC View of Hardware Systems Design by C. Gordon Bell, J. Craig Mudge, and John E. McNamara
(If you're not familiar with the first and last authors, your study of DEC history should fill you in, as cafard notes elsewhere in this thread.)
http://www.amazon.com/Computer-Engineering-Hardware-Systems-...
https://archive.org/details/ComputerEngineeringADecViewOfHar...
This is what I'm trying to recapture with my ARM project. Basically, an ARM Cortex-M4 is of the same order of complexity as a large minicomputer "back in the day," when you could (and often did) learn all the basics of computers, from architecture to compiler construction. I realized that I had a tremendous advantage learning about computers because you could hold the entire PDP-11 architecture in your head while you were writing code, but you can't so easily do that with even an Atom version of the Pentium. Combine that with a straightforward I/O system that kept to a small number of principles, used repeatedly, and you did not have "needless"[1] complexity getting in the way of learning.
Another good reference for seeing how things were built is "Computer Engineering: A DEC View of Hardware Systems Design" [2], which discusses all sorts of trade-offs in computer design; once you understand them, things like superscalar execution units make much more sense.
[1] It is all useful complexity, but before you know what you don't know, it is just a wall of confusing concepts and jargon.
[2] http://www.amazon.com/Computer-Engineering-Hardware-Systems-...