Anyway, just to practice what I've learned so far I will try to answer some of your questions off the top of my head; apologies in advance for my verbosity:
First of all, let's define functional (in fact, to be strict, declarative; more on this below):
An operation (i.e. a code fragment with a clearly defined input and output) is functional if for a given input it always gives the same output, regardless of all other execution state. It behaves just like a mathematical function, hence the name.
This gives a declarative operation the following properties:
1) Independence: nothing going on in the rest of the world will ever affect it.
2) Statelessness (same as immutability): there is no observable internal state; the output is the same every single time it is invoked with the same input.
3) Determinism: the output depends exclusively on the input and is always the same for a given input.
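To make these properties concrete, here is a minimal Java sketch (the class and method names are mine, purely for illustration): the first method is functional, the second is not, because its output also depends on hidden mutable state.

    // A minimal illustration; names are made up for this example only.
    public final class PurityDemo {

        // Functional: the result depends only on the argument, so square(3)
        // is always 9, no matter what else the program is doing.
        static int square(int x) {
            return x * x;
        }

        private static int callCount = 0;

        // Not functional: the result also depends on hidden, mutable state
        // (callCount), so two calls with the same argument can differ.
        static int squarePlusCalls(int x) {
            callCount++;
            return x * x + callCount;
        }

        public static void main(String[] args) {
            System.out.println(square(3));          // 9
            System.out.println(square(3));          // 9 again: deterministic
            System.out.println(squarePlusCalls(3)); // 10
            System.out.println(squarePlusCalls(3)); // 11: the hidden state leaks out
        }
    }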
So what is the difference between functional and declarative? Functional is just a subset of declarative: roughly, functional = declarative minus dataflow variables.
These properties give a functional program the following key benefits:
1) It is easier to design, implement and test. This is because of the above properties. For instance, because the output will never vary between different invocations, each input only needs to be tested once.
2) Easier to reason about (to prove correct). Algebraic reasoning (applying referential transparency, for instance: if f(a)=a^2 then all occurrences of f(a) can be replaced with a^2) and logical reasoning can be applied.
To explore the practical implications further: since a functional program consists of a hierarchy of components (clearly defined program fragments connected to other components exclusively through their inputs and outputs), understanding a functional program only requires understanding each of its components in isolation.
Basically, even though other programming models have more mindshare (without, as far as I can tell, being much better understood, and that includes by me ;), the above properties make functional programming fundamentally simpler than more expressive models, like OO and other models with explicit state.
Another very important point is that it is perfectly acceptable and feasible to write functional programs in non-strictly-functional languages like Java or C++ (although not in C; I won't explain why, it's complicated, but the core reason has to do with how memory is managed in C).
This is because functional programming is not restricted to functional languages (where the program will be functional by definition no matter how much you mess up).
A program is functional if it is observably functional, i.e. if it behaves in the way specified above.
This can be achieved in, say, Java, with some discipline and if you know what you are doing; the Interpreter and Visitor design patterns exist exactly for this, and one of the key operations needed for higher-order programming, procedural abstraction, can easily be done using objects (see the excellent MIT OCW course https://ocw.mit.edu/courses/electrical-engineering-and-compu... for more on this).
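As a minimal sketch of that last point (the interface and class names are mine; in modern Java the java.util.function interfaces play the same role), procedural abstraction is just an object with a single method, so functions can be passed around like any other value:

    // Procedural abstraction with plain objects: a "function" is an object
    // with one apply method, so it can be passed to higher-order code.
    interface IntFn {
        int apply(int x);
    }

    public final class HigherOrderDemo {

        // A higher-order "procedure": takes a function object and applies it twice.
        static int applyTwice(IntFn f, int x) {
            return f.apply(f.apply(x));
        }

        public static void main(String[] args) {
            IntFn addOne = new IntFn() {
                public int apply(int x) { return x + 1; }
            };
            System.out.println(applyTwice(addOne, 5)); // 7
        }
    }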
That said, functional programming has its limits: it is often impossible to write a purely functional program, because the real world is stateful and concurrent. For instance, it is impossible to write a purely functional client-server application. How about IO or a GUI? Nope. I don't know Haskell yet; apparently they somehow pull it off with monads, but that approach, although impressive, certainly doesn't feel natural.
Garbage collection is a good thing. Its main benefit to functional languages is that it totally avoids dangling references by design, which is key to making determinism possible. Of course, automatically managing inactive memory to avoid most leaks is nice too (but not all leaks: think of references to unused variables inside a data structure, or of external resources).
However, functional programs can indeed have a higher allocation rate (bytes allocated per second, as opposed to memory usage, the minimum amount of memory the program needs to run), which can be an issue in domains like simulators; in that case a good garbage collector is required.
Certain specialised domains, like hard real time where lives are at stake, require specialised hardware and software anyway, never mind whether the language is functional or not.
So, for me, for the reasons above, the take home lesson so far is:
Program in the functional style wherever possible; it is easier to get right because it is fundamentally simpler. Where state (and concurrency) is unavoidable, restrict it and encapsulate it inside an abstraction (this common technique is called impedance matching).
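One simple shape of that idea, sketched in Java (the example and names are mine): the mutable cache below is an internal implementation detail, so the component stays observably functional and callers only ever see "same input, same output".

    import java.util.HashMap;
    import java.util.Map;

    // Observably functional from the outside: fib(n) always returns the same
    // value for the same n. The mutable cache is encapsulated inside the
    // abstraction; no caller can see it or depend on it.
    public final class Fib {
        private final Map<Integer, Long> cache = new HashMap<>();

        public long fib(int n) {
            if (n < 2) return n;
            Long cached = cache.get(n);
            if (cached != null) return cached;
            long result = fib(n - 1) + fib(n - 2);
            cache.put(n, result);
            return result;
        }

        public static void main(String[] args) {
            System.out.println(new Fib().fib(40)); // 102334155, each subproblem computed once
        }
    }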
Every programming problem, or component, involves some degree of up-front design, or modelling, or description, whichever word you prefer; it is all the same thing. There are decisions you must make before coding, TDD or no TDD.
What paradigm you choose should depend first on the nature of the problem, not on the language. Certain problems are more easily (that is, more naturally) described in a functional way, as recursive functions on data structures; that part of the program should be implemented functionally if your language of choice allows it.
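For example, a minimal Java sketch of that style (the list type is mine): an immutable linked list with a recursive sum, and no mutation anywhere.

    // An immutable singly linked list and a purely functional, recursive sum.
    public final class IntList {
        final int head;
        final IntList tail; // null marks the empty list

        IntList(int head, IntList tail) {
            this.head = head;
            this.tail = tail;
        }

        static int sum(IntList xs) {
            return xs == null ? 0 : xs.head + sum(xs.tail);
        }

        public static void main(String[] args) {
            IntList xs = new IntList(1, new IntList(2, new IntList(3, null)));
            System.out.println(sum(xs)); // 6
        }
    }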
Other programs are more easily modelled as an object graph, or as a state diagram (awesome for IO among other things), and this is the way they should be designed and implemented if possible. But even in this case, some components can be designed in a functional way, and they should be wherever possible.
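A tiny sketch of the state-diagram style in Java (the states and events are made up for illustration); note that even here the transition function itself can be a pure function from (state, event) to the next state:

    // A small state machine as a Java enum: each state maps an event to the
    // next state. The transition function is pure; only the variable holding
    // the current state is mutable.
    enum ConnState {
        IDLE, CONNECTED, CLOSED;

        ConnState on(String event) {
            switch (this) {
                case IDLE:      return event.equals("open")  ? CONNECTED : this;
                case CONNECTED: return event.equals("close") ? CLOSED    : this;
                default:        return this;
            }
        }
    }

    public final class StateDemo {
        public static void main(String[] args) {
            ConnState s = ConnState.IDLE;
            for (String e : new String[] {"open", "data", "close"}) {
                s = s.on(e);
            }
            System.out.println(s); // CLOSED
        }
    }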
There is no one superior way, no silver bullet, it all depends on the context. It is better to know multiple programming paradigms without preferring one over the other, and apply the right one to the right problem.
[1] Concepts, Techniques, and Models of Computer Programming (MIT Press) - URL: https://www.amazon.com/Concepts-Techniques-Models-Computer-P...
[2] The Big Book of Concepts (MIT Press) - URL: https://www.amazon.com/Big-Book-Concepts-MIT-Press/dp/026263...
[3] Gödel, Escher, Bach: An Eternal Golden Braid - URL: https://www.amazon.com/Gödel-Escher-Bach-Eternal-Golden/dp/0...
Language & library recs:
Java is actually a pretty shitty language to learn concurrency on, because the concurrency primitives built into the language & stdlib are stuck in the 1970s. There've been some more recent attempts to bolt more modern concurrency patterns on as libraries (java.util.concurrent is one; Akka is another; Quasar is a third), but you're still very limited by the language definition. Some other languages to study:
Erlang, for message-passing & distributed system design patterns. Go has a similar concurrency model, but not as pure.
Haskell for STM.
Python 3.5/ES2017/C#, for async/await & promises. Actually, for a purer implementation of promises, check out E or the Cap'n Proto RPC framework.
Rust, for mutable borrowing. Rust's concurrency story is fairly unique; they try to prove that data races can't exist by ensuring that only one reference is mutable at once.
JoCaml for the join calculus. Indeed, learning formal models like CSP, the pi-calculus, or the join-calculus can really help improve your intuitions about concurrency.
Hadoop for MapReduce-style concurrency. In particular, learning how you might represent, say, graph algorithms on a MapReduce system is a great teacher. Also look at real-time generalizations of MapReduce paradigms like Storm or Spark.
Paxos & Raft for the thorny problems in distributed consensus.
Vector clocks, operational transforms, and CRDTs. One approach to concurrency is to make it not matter, by designing your algorithms so that each stage can be applied in arbitrary order (or can compensate for other operations that have occurred in the meantime). That's the idea behind these techniques, and it perhaps has the most long-term promise.
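As a tiny taste of the CRDT idea, here is a grow-only counter sketched in Java (class and method names are mine): each replica only increments its own slot, and merging takes the per-replica maximum, so merges commute and the delivery order stops mattering.

    import java.util.HashMap;
    import java.util.Map;

    // A grow-only counter (G-Counter) CRDT: each replica increments only its
    // own entry; merge takes the per-replica max, so it is commutative,
    // associative and idempotent, and replicas converge regardless of order.
    public final class GCounter {
        private final String replicaId;
        private final Map<String, Long> counts = new HashMap<>();

        public GCounter(String replicaId) {
            this.replicaId = replicaId;
        }

        public void increment() {
            counts.merge(replicaId, 1L, Long::sum);
        }

        public void merge(GCounter other) {
            other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
        }

        public long value() {
            return counts.values().stream().mapToLong(Long::longValue).sum();
        }

        public static void main(String[] args) {
            GCounter a = new GCounter("a");
            GCounter b = new GCounter("b");
            a.increment(); a.increment();
            b.increment();
            a.merge(b);
            b.merge(a);
            System.out.println(a.value() + " " + b.value()); // 3 3, whichever order the merges happen
        }
    }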
Project & job recs:
The best way to really learn concurrency is to take a job at a company that has to operate at significant scale. Google or Facebook are the most prominent, but any of the recent fast-growers (AirBnB, Uber, Dropbox, probably even Instacart or Zenefits) will have a lot of problems of this type.
Failing that, I've found that implementing a crawler is one giant rabbit hole for learning new concurrency techniques. The interesting thing about crawlers is that you can implement a really simple, sequential one in about 15 minutes using a language's standard library, but then each step brings a new problem that you need to solve with a new concurrency technique. For example:
You don't want to wait on the network I/O, so you create multiple threads to crawl multiple sites at once.
You quickly end up exhausting your memory, because the number of URLs found on pages grows exponentially, and so you transition to a bounded thread pool (a rough sketch combining this and a few of the later steps follows the list).
You add support for robots.txt and sitemaps. Now you have immutable data that must be shared across threads.
You discover some URLs are duplicates; now you need shared mutable state between your fetch threads.
You start getting 429 and 403 response codes from the sites, telling you to back off and stop crawling them so quickly. Now you need a feedback mechanism from the crawl threads to the crawl scheduler, probably best implemented with message queues.
You want to process the results of the crawl. Now you need to associate the results of multiple fetches together to run analyses on them; this is what MapReduce is for.
You need to write out the results to disk. This is another source of I/O, but with different latency & locking characteristics. You either need another thread pool, or you want to start looking into promises.
You want to run this continuously and update a data store. Now you need to think about transactions.
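To make a few of those middle steps concrete, here is a heavily simplified Java sketch (everything here is made up for illustration, not a real crawler): a bounded pool of fetch workers, a shared concurrent set for URL de-duplication, and a blocking queue that feeds discovered URLs back to a single scheduler loop.

    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.*;

    // Toy skeleton only: bounded thread pool for fetches, shared visited set,
    // and a queue feeding discovered URLs back to the scheduler.
    public final class TinyCrawler {
        private final ExecutorService fetchers = Executors.newFixedThreadPool(4);
        private final Set<String> seen = ConcurrentHashMap.newKeySet();
        private final BlockingQueue<String> frontier = new LinkedBlockingQueue<>();

        void crawl(String seed, int maxPages) throws InterruptedException {
            frontier.put(seed);
            int submitted = 0;
            while (submitted < maxPages) {
                String url = frontier.poll(2, TimeUnit.SECONDS);
                if (url == null) break;            // frontier drained, stop
                if (!seen.add(url)) continue;      // duplicate URL, skip
                submitted++;
                fetchers.submit(() -> {
                    for (String link : fetch(url)) {
                        frontier.offer(link);      // feedback to the scheduler
                    }
                });
            }
            fetchers.shutdown();
            fetchers.awaitTermination(30, TimeUnit.SECONDS);
        }

        // Stand-in for real network I/O and HTML parsing.
        private List<String> fetch(String url) {
            return List.of(url + "/a", url + "/b");
        }

        public static void main(String[] args) throws InterruptedException {
            new TinyCrawler().crawl("https://example.com", 20);
        }
    }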
http://www.amazon.com/Concepts-Techniques-Models-Computer-Pr...
I'll definitely be paying closer attention to this as it agrees with a lot of hunches that I have about what's wrong with programming as we currently practice it.
Thanks so much for the link. I'm taking notes :)
EDIT: Anyone who likes Avail, and particularly the points made in the History section, should probably read CTM. [1]
[1] http://www.amazon.com/Concepts-Techniques-Models-Computer-Pr...
While not a book, an alternative strategy that might be helpful would be to explore some projects like TorqueBox (Ruby) or Immutant (Clojure) that pull together a lot of different solutions (web server, application server, messaging, caching, transactions and scheduling) into a suite.
Scala is a "better Java" and you can learn both the JVM and functional programming (take Odersky's course on Coursera). Clojure is a great Lisp but the Java stuff will be very confusing if you haven't seen it before (the JVM-interop functions like proxy, gen-class, and reify don't have the easiest APIs).
This (Structure and Interpretation of Computer Programs) is a great book to get started on the deeper aspects of CS: http://mitpress.mit.edu/sicp/full-text/book/book.html
Also, I like this one: http://www.amazon.com/Concepts-Techniques-Models-Computer-Pr...
A 2004 textbook by the OP:
Concepts, Techniques, and Models of Computer Programming
https://www.amazon.com/Concepts-Techniques-Models-Computer-P...
And his 6 week edX course:
Paradigms of Computer Programming – Fundamentals
https://www.edx.org/course/paradigms-of-computer-programming...