We already had complex, deeply pipelined superscalar architectures with the Pentium Pro and the Pentium 4 (NetBurst/Prescott). The current Core architecture has a much simpler pipeline, based on the Pentium III/M. (see http://en.wikipedia.org/wiki/Clock_rate )
I favor a fast single-core CPU over a slow many-core CPU.
Have you ever coded a many-core application that runs on thousands of CPUs? I have, using http://en.wikipedia.org/wiki/Cilk , http://en.wikipedia.org/wiki/OpenMP (and CUDA and OpenCL on GPUs), as well as the traditional way with operating system processes and threads.
You need new algorithms that work on massively parallel computers. Converting an algorithm from serial to massively parallel is possible in many cases, but it is really hard scientific work (I have done it).
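To illustrate why the conversion usually means a different algorithm and not just "add threads", here is a minimal sketch using prefix sums. The function names are made up for this example, and the parallel rounds are only simulated with a list comprehension, so this shows the restructuring, not real speedup. The serial loop carries a dependency from one iteration to the next; the Hillis-Steele scan removes it by working in about log2(n) rounds where every element can be updated independently.

```python
def serial_prefix_sum(values):
    """Inclusive prefix sum. Step i needs the result of step i-1,
    so the loop cannot be split across cores as written."""
    out = []
    running = 0
    for v in values:
        running += v
        out.append(running)
    return out


def hillis_steele_scan(values):
    """Inclusive prefix sum restructured into rounds (Hillis-Steele).

    Within one round, each output element depends only on the previous
    round's array, so on parallel hardware all n updates could run at
    the same time. Here the round is simulated sequentially.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        # Build the next array purely from `current`: no element of
        # `nxt` depends on another element of `nxt`.
        nxt = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        current = nxt
        offset *= 2
    return current


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(serial_prefix_sum(data))
    print(hillis_steele_scan(data))
```

Note the trade-off: the scan does O(n log n) additions in total versus O(n) for the serial loop, so it only pays off when there are enough cores to run each round in parallel.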
Nevertheless, for one specific domain I would need a really fast single-core CPU.
A good book on the topic is "Inside the Machine" from Ars Technica: http://www.amazon.com/Inside-Machine-Introduction-Microproce... Various university lectures are also worth a look.