ISBN: 0195126688
Found in 5 comments on Hacker News
andyxor · 2021-07-17 · Original thread
> What area of machine learning do you feel is closer to how natural cognition works?

None. The prevalent ideas in ML are a) "training" a model via supervised learning, and b) optimizing model parameters via function minimization/backpropagation/the delta rule.
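
To make (b) concrete, the delta rule is just an error-driven, iterative weight nudge. A rough sketch in plain numpy (my own toy notation, not any particular library):

    import numpy as np

    # Delta rule (Widrow-Hoff) for a single linear unit: nudge the weights
    # a little after every example to reduce the output error.
    def delta_rule_step(w, x, target, lr=0.1):
        y = w @ x                   # the unit's prediction
        error = target - y          # "teacher" signal supplied from outside
        return w + lr * error * x   # the delta rule update

    w = np.zeros(3)
    examples = [(np.array([1.0, 0.0, 1.0]), 1.0),
                (np.array([0.0, 1.0, 1.0]), 0.0)]
    for x, t in examples * 50:      # trial & error, over and over
        w = delta_rule_step(w, x, t)
    print(w)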

There is no evidence for trial & error iterative optimization in natural cognition. If you tried to map it to cognition research, the closest thing would be the behaviorist theories of B.F. Skinner from the 1930s. These theories of 'reward and punishment' as the primary mechanism of learning have long been discredited in cognitive psychology. It's a black-box, backwards-looking view that disregards the complexity of the problem (the most thorough and influential critique of this approach was Chomsky's, back in the 50s).

The ANN model goes back to the McCulloch & Pitts paper and is based on the neurophysiological evidence available in 1943. The ML community largely ignores the fundamental neuroscience findings made since (for a good overview see https://www.amazon.com/Brain-Computations-Edmund-T-Rolls/dp/... )

I don't know if it has to do with arrogance or ignorance (or both), but the way "AI" is currently developed is by inventing arbitrary model contraptions with complete disregard for the constraints and inner workings of living intelligent systems, basically throwing things at the wall until something sticks, instead of learning from nature the way, say, physics does. Saying "but we don't know much about the brain" is just being lazy.

The best description of biological constraints from a computer science perspective is in Leslie Valiant's work on the "neuroidal model" and his book "Circuits of the Mind" (he is also the author of PAC learning theory, influential in ML theory circles): https://web.stanford.edu/class/cs379c/archive/2012/suggested... , https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

If you're really interested in intelligence I'd suggest starting with the representation of time and space in the hippocampus via place cells, grid cells and time cells, which form a sort of coordinate system for navigation in both real and abstract/conceptual spaces. This will likely have the same importance for actual AI as the Cartesian coordinate system has in other hard sciences. See https://www.biorxiv.org/content/10.1101/2021.02.25.432776v1 and https://www.sciencedirect.com/science/article/abs/pii/S00068...

Also see research on temporal synchronization via "phase precession", as a hint at how lower-level computational primitives work in the brain: https://www.sciencedirect.com/science/article/abs/pii/S00928...

And generally, look into memory research in cogsci and neuro: learning & memory are highly intertwined in natural cognition, and you can't really talk about learning before understanding lower-level memory organization, formation and representational "data structures". Here are a few good memory labs to seed your firehose:

https://twitter.com/MemoryLab

https://twitter.com/WiringTheBrain

https://twitter.com/TexasMemory

https://twitter.com/ptoncompmemlab

https://twitter.com/doellerlab

https://twitter.com/behrenstimb

https://twitter.com/neurojosh

https://twitter.com/MillerLabMIT

andyxor · 2021-04-05 · Original thread
The back-prop learning algorithm requires information non-local to the synapse to be propagated from the output of the network backwards to affect neurons deep in the network.
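
A toy two-layer sketch of what that means (plain numpy, my own notation): the update for a first-layer synapse needs the error measured at the output and the weights of the layer above it, i.e. information that synapse has no local access to.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)              # input
    W1 = rng.normal(size=(5, 4))        # first-layer weights
    W2 = rng.normal(size=(1, 5))        # output-layer weights

    # forward pass
    h = np.tanh(W1 @ x)                 # hidden activity
    y = W2 @ h                          # network output
    err = y - 1.0                       # error measured at the OUTPUT

    # backward pass: the W1 gradient needs the output error AND W2,
    # i.e. information that is non-local to the W1 synapses.
    delta_hidden = (W2.T @ err) * (1 - h**2)
    grad_W1 = np.outer(delta_hidden, x)
    W1 -= 0.01 * grad_W1                # deep weights updated via global feedback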

There is simply no evidence for this global feedback loop, or global error correction, or delta rule training in neurophysiological data collected in the last 80 years of intensive research. [1]

As for "why", biological learning it is primarily shaped by evolution driven by energy expenditures constraints and survival of the most efficient adaptation engines. One can speculate that iterative optimization akin to the one run by GPUs in ANNs is way too energy inefficient to be sustainable in a living organism.

A good discussion of the biological constraints on learning (from a CompSci perspective) can be found in Leslie Valiant's book [2]. Prof. Valiant is the author of PAC [3], one of the few theoretically sound models of modern ML, so he's worth listening to.

[1] https://news.ycombinator.com/item?id=26700536

[2] https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

[3] https://en.wikipedia.org/wiki/Probably_approximately_correct...

andyxor · 2021-04-02 · Original thread
Exactly. Besides ignoring the innate structures, heuristics and biases hardcoded via evolution, the whole notion of "learning" has become highly intertwined with reinforcement-style learning, i.e. trial & error, the stimulus-and-response behaviorist terms popularized by Pavlov and Skinner a century ago, which is just one type in a large repertoire of adaptation mechanisms.

Memory in these models is used as an afterthought, or some side utility for complex iterative routines based on the calculus of function optimization, while in living organisms memory and its "hardcoded" shortcuts allow cutting through the search space quickly, as in a large database index.

Speaking in database terms, we have something like "materialized views" over acquired and genetically inherited knowledge, built from compressed and hierarchically organized sensory data and prior related actions and associations, including causal links. Causality is just a way to associate items in the memory graph.

Error correction doesn't play as big a role in storing and retrieving information, or in pattern recognition, as current machine learning models may lead you to believe.

Instead, something akin to self-organized clustering is going on, with new info embedded in the existing "concept" graph via associations and generalizations, through simple LINK and JOIN mechanisms on a massive scale [1]. The formation of this graph in long-term memory is tightly coupled with sleep cycles and memory consolidation, while short-term memory serves as a kind of cache.

Knowledge is organized hierarchically, starting from principal components [2] of sensory data from e.g. visual receptive fields, and increasing in level of abstraction via "chunking": connecting objects A and B to form a new object C via the JOIN mechanism, or associating objects A and B via the LINK mechanism. Both LINK and JOIN outputs are "persisted" to memory via Hebbian plasticity.

All knowledge, including causal links, is expressed via this simple mechanism. Generating a prediction given a new sensory signal is just LINKing the signal with an existing cluster by similarity.
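
To illustrate the flavor of it, here is a toy sketch of LINK and JOIN over symbolic items (my own drastic simplification; Valiant's actual neuroidal model [1] implements these over random graphs of threshold neurons):

    import itertools

    # Toy sketch: JOIN builds a new item that is "active" only when both of
    # its constituents are active; LINK records a directed association.
    _fresh = itertools.count()

    def join(memory, a, b):
        """New chunk C that fires iff both A and B fire."""
        c = f"chunk_{next(_fresh)}"
        memory['join'][c] = (a, b)
        return c

    def link(memory, a, b):
        """Associate a -> b so activating a also brings up b."""
        memory['link'].setdefault(a, set()).add(b)

    def active(memory, item, firing):
        """Is `item` active given the set of currently firing base items?"""
        if item in firing:
            return True
        if item in memory['join']:
            a, b = memory['join'][item]
            return active(memory, a, firing) and active(memory, b, firing)
        return False

    memory = {'join': {}, 'link': {}}
    C = join(memory, 'A', 'B')            # chunk two items into a new object
    link(memory, C, 'A')                  # and associate the chunk back to one part

    print(active(memory, C, {'A', 'B'}))  # True: both constituents firing
    print(active(memory, C, {'A'}))       # False: JOIN needs both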

Navigation in this abstract space is facilitated by a coordinate system similar, or perhaps identical, to the one hippocampal place & grid cells provide for spatial navigation. Similarity between objects is determined as similarity between their "embeddings" in this abstract concept space.

It's possible that innate structures representing high-level "schemas" are genetically pre-wired into this graph, such as an innate language grammar which distinguishes e.g. verb from noun, a visual object grammar which distinguishes "up" from "down", etc. It is also possible these are embodied, i.e. connected to some representation of motor and sensory embeddings, and serve to bootstrap the graph structure for subsequent knowledge acquisition. I.e. no blank slate.

Information is passed, stored and retrieved via several (analogue) means, both point-to-point and broadcast communication, with electromagnetic oscillations playing the primary role in synchronization of neural assemblies, facilitating e.g. speech segmentation (or boundary detection in general) and coupling an input signal's "embedding" to existing knowledge embeddings in short-term memory; while neural plasticity/LTP/STDP serve as the storage mechanisms at the single-neuron level.

[1] See Leslie Valiant "neuroidal" model and his book https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

[2] See Oja Rule http://www.scholarpedia.org/article/Oja_learning_rule

and Olshausen & Field classic work on sparse coding http://www.scholarpedia.org/article/Sparse_coding

bra-ket · 2020-12-06 · Original thread
Deep learning is mostly irrelevant for AGI, but the best part of the article is that it brings up the "recursive process called Merge".

This Merge [0] is called "chunking" in cognitive psychology [1, 2], first mentioned in the classic paper "The Magical Number Seven" by George A. Miller [3].

In the original Chomsky work [0] it is buried so deep in linguistics jargon that it's easy to miss the centrality of this concept, which is the essence of the generalization capability of the biological mind.

It's the JOIN in Leslie Valiant's LINK/JOIN model [4, 5]:

"The first basic function, JOIN, implements memory formation of a new item in terms of two established items: If two items A and B are already represented in the neural system, the task of JOIN is to modify the circuit so that at subsequent times there is the representation of a new item C that will fire if and only if the representations of both A and B are firing."

Papadimitriou & Vempala [6] extend it to "predictive join" (PJOIN) model.

Edit: as I think about it, deep learning might be useful in implementing this "Merge" by doing nonlinear PCA (Principal Component Analysis) via stacked sparse autoencoders, kind of like in that "cat face detection" paper by Quoc Le [7]. The only thing missing is a hierarchical memory representation for those principal components, where NEW objects emerge by joining the most similar existing objects.
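
To be clear this is speculation, but a minimal single-layer version of the idea looks something like the sketch below (PyTorch, my own toy code, nothing like the scale of [7]): an L1 penalty on the hidden code pushes it toward a sparse, roughly PCA-like basis, stacking such layers gives the "stacked sparse autoencoder", and the hierarchical memory over the learned codes is the part that's missing.

    import torch
    import torch.nn as nn

    # Single-layer sparse autoencoder: reconstruct the input through a
    # hidden code that is penalized for being dense.
    class SparseAE(nn.Module):
        def __init__(self, n_in=64, n_hidden=32):
            super().__init__()
            self.enc = nn.Linear(n_in, n_hidden)
            self.dec = nn.Linear(n_hidden, n_in)

        def forward(self, x):
            h = torch.relu(self.enc(x))
            return self.dec(h), h

    model = SparseAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(256, 64)            # stand-in for image patches

    for step in range(200):
        x_hat, h = model(x)
        loss = ((x_hat - x) ** 2).mean() + 1e-3 * h.abs().mean()  # reconstruction + sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()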

[0] https://en.wikipedia.org/wiki/Merge_(linguistics)

[1] https://en.wikipedia.org/wiki/Chunking_(psychology)

[2] http://www.columbia.edu/~nvg1/Wickelgren/papers/1979cWAW.pdf

[3] https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus...

[4] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208...

[5] https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

[6] https://arxiv.org/pdf/1412.7955.pdf

[7] https://ieeexplore.ieee.org/abstract/document/6639343

bra-ket · 2020-10-21 · Original thread
I applaud the effort, but the problem with RL as a model of learning is in the definition of RL itself. The idea of using "rewards" as the primary learning mechanism and a path to actual cognition is just wrong, full stop. It's the wrong level of abstraction and is too wasteful in energy spent.

Looking at it from a CogSci perspective, it is essentially an offshoot of behaviorism, using a coarse and extremely inefficient model of learning as reward and punishment, an iterative trial-and-error process.

This 'Skinnerism' was discredited in cognitive psychology decades ago and makes absolutely no biological sense whatsoever, for the simple reason that any organism trying to adapt in this way would be eaten by predators before minimizing its "error function" sufficiently.

Living learning organisms have limited resources (energy and time), and they cut the search space drastically through shortcuts and heuristics and hardcoded biases instead of doing some kind of brute force optimization.

This is a case where computational efficiency [1] comes first and sets the constraints under which the cognitive apparatus has to develop.

As for actual cognition models, a good place to start is not the ML/AI field (which tends to get stuck in local minima as a whole), but state-of-the-art cognitive psychology, and maybe research in "distributional semantics", "concept spaces", "sparse representations", "small-world networks" and "learning and memory" neuroscience.

You'd be surprised how much knowledge we have gained about the mind since those RL & ANN models were developed in the 1940s.

[1] https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...