The biggest challenge in producing a unified programming model across all of these is moving data around. CPUs, GPUs, and FPGAs all have very different memory access characteristics. Gary Sabot was the first to try to tackle memory layout and locality at the language level, with what he called the Paralation model (http://www.amazon.com/The-Paralation-Model-Architecture-Inde...), and a lot of that work carried over into NESL (http://www.cs.cmu.edu/~scandal/nesl.html). Hadoop is a very lobotomized, hard-to-use modern version of these ideas.
I wouldn't knock DSLs - a lot of people are getting good results using DSLs to program FPGAs (for example, that's pretty much what http://www.novasparks.com/ does - http://lisp-univ-etc.blogspot.com/2013/06/lisp-hackers-marc-...).