What is the central concept of The Deep Learning Revolution?

Backpropagation's 1986 triumph over symbolic AI. Sejnowski-Hinton backpropagation (1986): Rumelhart, Hinton, and Williams published the backpropagation algorithm, which solved the multilayer learning problem that had stalled perceptrons. Sejnowski was a close collaborator.

What is AlexNet victory (2012) in The Deep Learning Revolution?

Krizhevsky, Sutskever, and Hinton won ImageNet 2012 by a margin of roughly 10 percentage points in top-5 error using deep ConvNets on GPUs. Definitively proved deep learning superiority on vision.

What is learned representations in The Deep Learning Revolution?

Central insight: networks learn hierarchical features automatically from data. Replaced hand-engineering with end-to-end learning.

What is paradigm shift: learning beats engineering in The Deep Learning Revolution?

Core thesis: deep learning's victory represents fundamental shift from hand-designed systems to learned representations. Data + compute + brain-inspired architecture won.

What is the main argument of The Deep Learning Revolution?

Minsky-Papert XOR critique. The 1969 book Perceptrons proved that single-layer perceptrons cannot compute XOR, and doubted that multilayer networks could be trained effectively. Triggered first AI winter for connectionism.

The Deep Learning Revolution · Knowledge Graph

Knowledge Graph: The Deep Learning Revolution (Terrence Sejnowski, 2018)

Editorial spotlight: backpropagation's 1986 triumph over symbolic AI

Concepts

GPU computing revolution (importance 5): Graphics cards provided massive parallelism needed for deep learning. NVIDIA CUDA enabled 10-50x speedups over CPUs for neural network training. Source: (from training memory of book).
learned representations (importance 5): Central insight: networks learn hierarchical features automatically from data. Replaced hand-engineering with end-to-end learning. Source: (from training memory of book).
symbolic AI paradigm (importance 4): Knowledge represented as explicit rules and logic. Dominated AI from 1970s-2000s. Required hand-engineering of features and rules. Source: (from training memory of book).
end-to-end learning (importance 4): Train entire system jointly from raw input to output. Avoids pipeline of hand-designed components. Source: (from training memory of book).
hand-crafted feature engineering (importance 3): Traditional ML required domain experts to design features. Deep learning replaced this with learned representations. Source: (from training memory of book).
Hebbian learning (importance 3): Neurons that fire together wire together. Biological learning principle that inspired early neural network algorithms. Source: (from training memory of book).
big data era (importance 3): Internet provided unprecedented amounts of data. Photos, text, clicks. Enabled data-hungry deep learning approaches. Source: (from training memory of book).
transfer learning (importance 3): Pretrain on large dataset, fine-tune on small target task. Leverages learned representations across domains. Source: (from training memory of book).
interpretability challenge (importance 3): Deep networks are black boxes. Understanding what they learn and why they fail remains difficult. Source: (from training memory of book).
hardware-software codesign (importance 3): Google TPUs, specialized AI chips. Hardware designed specifically for neural network operations. Source: (from training memory of book).
AI ethics concerns (importance 3): Bias in training data, job displacement, autonomous weapons. Growing awareness by 2018 of societal impacts. Source: (from training memory of book).
unsupervised learning frontier (importance 3): Learn from unlabeled data. Humans learn mostly without supervision; deep learning still relies heavily on labels. Source: (from training memory of book).
future architecture evolution (importance 3): As of 2018: Transformers emerging, capsule networks proposed, graph networks developing. Architecture search automating design. Source: (from training memory of book).
few-shot learning (importance 2): Learn from very few examples, like humans do. Still challenging for deep learning as of 2018. Source: (from training memory of book).
weight initialization schemes (importance 2): Xavier/He initialization critical for training deep networks. Poor initialization prevents learning. Source: (from training memory of book).
overfitting problem (importance 2): Networks memorize training data instead of learning patterns. Requires regularization and large datasets. Source: (from training memory of book).
neuromorphic hardware (importance 2): Brain-inspired chips using spiking neurons and analog computation. Promise of ultra-low power AI. Source: (from training memory of book).
AGI timeline debates (importance 2): Will deep learning lead to artificial general intelligence? Optimists say 10-20 years, skeptics say fundamental gaps remain. Source: (from training memory of book).
embodied cognition gap (importance 2): Networks lack grounding in physical world. Humans learn through interaction and embodiment. Source: (from training memory of book).
inductive biases (importance 2): Architectural choices encode assumptions about problem structure. ConvNets encode spatial locality, RNNs encode temporal structure. Source: (from training memory of book).
learning to learn (importance 2): Train networks to quickly adapt to new tasks. Goal: match human ability to generalize from few examples. Source: (from training memory of book).
multitask learning (importance 2): Train one network on multiple tasks simultaneously. Shares representations, improves generalization. Source: (from training memory of book).
catastrophic forgetting (importance 2): Networks forget old tasks when trained on new ones. Humans accumulate knowledge without forgetting. Source: (from training memory of book).
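
The Hebbian learning entry in the list above ("fire together, wire together") can be made concrete with a small sketch. This is the plain textbook form of the rule, where a weight grows in proportion to the product of input and output activity. The function name `hebbian_step`, the learning rate, and the toy activity pattern are illustrative assumptions, not something taken from the book.

```python
# Minimal sketch of a plain Hebbian update: dw_i = lr * x_i * y.
# The learning rate and the toy activity pattern are illustrative assumptions.

def hebbian_step(weights, inputs, output, lr=0.1):
    """Return new weights after one Hebbian update."""
    return [w + lr * x * output for w, x in zip(weights, inputs)]


weights = [0.0, 0.0, 0.0]

# Inputs 0 and 1 are always active together with the output; input 2 never is.
for _ in range(10):
    weights = hebbian_step(weights, inputs=[1, 1, 0], output=1)

# The co-active connections have strengthened; the silent one has not moved.
print(weights)
```

Note that this bare rule only ever increases weights for co-active units. Practical variants add some form of normalization or decay, which the sketch leaves out.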

Claims

Minsky-Papert XOR critique (importance 5): The 1969 book Perceptrons proved that single-layer perceptrons cannot compute XOR, and doubted that multilayer networks could be trained effectively. Triggered first AI winter for connectionism. Source: (from training memory of book).
learning from brain architecture (importance 5): Deep learning's success came from mimicking brain's hierarchical organization, not from hand-designed logic. Vindicated connectionist philosophy. Source: (from training memory of book).
paradigm shift: learning beats engineering (importance 5): Core thesis: deep learning's victory represents fundamental shift from hand-designed systems to learned representations. Data + compute + brain-inspired architecture won. Source: (from training memory of book).
vanishing gradient problem (importance 4): Deep networks couldn't train effectively because gradients became exponentially small in early layers. Blocked scaling for 20 years. Source: (from training memory of book).
deep learning data hunger (importance 4): Deep networks require massive labeled datasets to work well. ImageNet scale (millions of examples) was critical to 2012 breakthrough. Source: (from training memory of book).
neuroscience-AI virtuous cycle (importance 4): AI learns from brain; AI models help understand brain. Sejnowski's career exemplifies this bidirectional inspiration. Source: (from training memory of book).
SVM decade (1995-2005) (importance 3): Support Vector Machines dominated machine learning. Had theoretical guarantees neural networks lacked. Kernel trick provided nonlinearity without backprop. Source: (from training memory of book).
Moravec's paradox (importance 3): Hard things for humans (chess, math) are easy for computers. Easy things for humans (vision, movement) are hard for computers. Deep learning began to narrow this gap on perception tasks. Source: (from training memory of book).
theory lag behind practice (importance 3): Deep learning works far better than theory predicts. Optimization and generalization not well understood mathematically. Source: (from training memory of book).
common sense reasoning gap (importance 3): Deep learning excels at pattern matching but struggles with reasoning and common sense that humans find trivial. Source: (from training memory of book).
algorithmic bias problem (importance 2): Networks learn biases from training data. Face recognition worse on minorities, word embeddings encode stereotypes. Source: (from training memory of book).
brain energy efficiency gap (importance 2): Brain runs on 20 watts. AlphaGo used megawatts. Biological computation far more efficient than current deep learning. Source: (from training memory of book).
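
The XOR critique listed above can be checked directly. The sketch below trains a single-layer perceptron with the classic error-correction rule on AND (linearly separable) and on XOR (not linearly separable), then shows a hand-wired two-layer network of threshold units that does compute XOR. The function names, the zero initialization, the epoch limit, and the hand-picked thresholds are all illustrative assumptions.

```python
# Sketch: a single-layer perceptron learns AND but never fits XOR,
# while a hand-wired two-layer threshold network computes XOR.

def step(s):
    """Threshold unit: fires (1) when the weighted sum is positive."""
    return 1 if s > 0 else 0


def train_perceptron(data, max_epochs=100):
    """Return True if the perceptron rule reaches zero errors within max_epochs."""
    w = [0, 0]
    b = 0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in data:
            err = target - step(w[0] * x1 + w[1] * x2 + b)
            if err != 0:
                # Error-correction update: move the boundary toward the example.
                w[0] += err * x1
                w[1] += err * x2
                b += err
                errors += 1
        if errors == 0:
            return True
    return False


def two_layer_xor(x1, x2):
    """Hidden units compute OR and AND; the output fires for OR-but-not-AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND learned:", train_perceptron(AND))
print("XOR learned:", train_perceptron(XOR))
print("two-layer XOR:", [two_layer_xor(a, b) for (a, b), _ in XOR])
```

The two-layer weights here are set by hand. Finding such weights automatically for hidden units is the multilayer learning problem that backpropagation later addressed.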

Empirical results

NETtalk (Sejnowski-Rosenberg 1987) (importance 5): Neural network that learned to pronounce English text. First widely publicized success of backpropagation. Demonstrated learning from examples vs hand-coded rules. Source: (from training memory of book).
AlexNet victory (2012) (importance 5): Krizhevsky, Sutskever, and Hinton won ImageNet 2012 by a margin of roughly 10 percentage points in top-5 error using deep ConvNets on GPUs. Definitively proved deep learning superiority on vision. Source: (from training memory of book).
AlphaGo defeats Lee Sedol (2016) (importance 5): DeepMind's AlphaGo beat world champion at Go, game considered too complex for computers. Combined deep learning with Monte Carlo tree search. Source: (from training memory of book).
First AI Winter (1970s) (importance 4): Funding dried up for neural networks after Minsky-Papert critique. Symbolic AI dominated research funding for 15 years. Source: (from training memory of book).
DeepMind Atari DQN (2013) (importance 4): Deep Q-Network learned to play Atari games from pixels using reinforcement learning. Single architecture mastered diverse games. Source: (from training memory of book).
Second AI Winter (early 1990s) (importance 3): Neural networks again lost funding and credibility. Couldn't scale to real-world problems, beaten by SVMs and other kernel methods. Source: (from training memory of book).
Word2Vec embeddings (2013) (importance 3): Learned dense vector representations of words that captured semantic relationships. Showed deep learning could work for language, not just vision. Source: (from training memory of book).
adversarial examples (importance 3): Imperceptible perturbations can fool neural networks. Reveals brittleness and gap from human perception. Source: (from training memory of book).
neural machine translation (importance 3): Seq2seq with attention replaced statistical MT. Google Translate switched in 2016, with a reported reduction in translation errors of about 60% on several language pairs. Source: (from training memory of book).
deep learning speech recognition (importance 3): Deep networks reached human parity on conversational speech. Enabled Siri, Alexa, Google Assistant. Source: (from training memory of book).
self-driving cars (importance 3): Deep learning for perception (LiDAR, camera fusion). By 2018, partial autonomy deployed in Teslas. Source: (from training memory of book).
medical image diagnosis (importance 2): Networks match radiologists on specific tasks (diabetic retinopathy, lung cancer). FDA approvals beginning 2017-2018. Source: (from training memory of book).
AlphaFold protein prediction (importance 2): DeepMind applied deep learning to protein structure prediction. Hinted at future scientific applications. Source: (from training memory of book).

Methods

Sejnowski-Hinton backpropagation (1986) (importance 5): Rumelhart, Hinton, and Williams published the backpropagation algorithm, which solved the multilayer learning problem that had stalled perceptrons. Sejnowski was a close collaborator. Source: (from training memory of book).
Hinton's layer-wise pretraining (2006) (importance 5): Greedy layer-by-layer unsupervised pretraining using RBMs. Broke through vanishing gradient problem, enabled training deep networks. Source: (from training memory of book).
LeCun's convolutional networks (importance 4): Yann LeCun developed convolutional neural networks for handwriting recognition at Bell Labs. Used in check-reading systems. Source: (from training memory of book).
Deep Belief Networks (importance 4): Stacked Restricted Boltzmann Machines pretrained layer-wise. First successful deep architecture, sparked 'deep learning' terminology. Source: (from training memory of book).
Transformer architecture (2017) (importance 4): Attention-based architecture from Google. Replaced recurrence with self-attention, enabled massive parallelization and scaling. Source: (from training memory of book).
ResNet skip connections (2015) (importance 4): Residual connections allowed training networks with 100+ layers. Won ImageNet 2015, showed depth was key to performance. Source: (from training memory of book).
attention mechanism (importance 4): Allow networks to focus on relevant parts of input. Key innovation for translation and later Transformers. Source: (from training memory of book).
Hopfield networks (1982) (importance 3): Energy-based recurrent networks that could store and retrieve patterns. Brought physicists into neural network research. Source: (from training memory of book).
Boltzmann machine (importance 3): Stochastic version of Hopfield nets using simulated annealing. Could learn hidden representations but was computationally expensive. Source: (from training memory of book).
LSTM (Hochreiter-Schmidhuber 1997) (importance 3): Long Short-Term Memory networks solved vanishing gradient for sequences using gating mechanisms. Enabled recurrent networks to learn long-range dependencies. Source: (from training memory of book).
Dropout regularization (importance 3): Randomly dropping units during training prevented overfitting in large networks. Key technique enabling AlexNet success. Source: (from training memory of book).
ReLU activation (importance 3): Rectified Linear Units replaced sigmoid/tanh. Avoided vanishing gradients, enabled much faster training of deep networks. Source: (from training memory of book).
GANs (Goodfellow 2014) (importance 3): Generative Adversarial Networks pit generator against discriminator. Produced realistic images without explicit probability models. Source: (from training memory of book).
sequence-to-sequence models (importance 3): Encoder-decoder architecture for variable-length inputs/outputs. Enabled neural machine translation. Source: (from training memory of book).
stochastic gradient descent (importance 3): Update weights using small random batches. Noisy, but enables online learning, and the noise can help escape poor local minima. Source: (from training memory of book).
self-supervised pretraining (importance 3): Create labels automatically from data structure (predict next word, rotate image). Emerging as key technique 2017-2018. Source: (from training memory of book).
expert systems (importance 2): Rule-based systems encoding human expertise. Popular in 1980s but brittle and expensive to maintain. Source: (from training memory of book).
Fukushima's Neocognitron (1980) (importance 2): Early hierarchical neural network inspired by visual cortex. Precursor to modern convolutional networks but lacked backprop. Source: (from training memory of book).
Batch Normalization (importance 2): Normalize activations within mini-batches. Enabled much faster training and higher learning rates. Source: (from training memory of book).
Neural Turing Machines (importance 2): Networks with external memory and attention. Could learn simple algorithms like sorting. Source: (from training memory of book).
Adam optimizer (importance 2): Adaptive learning rates per parameter. Combines momentum and RMSprop, became default optimizer. Source: (from training memory of book).
data augmentation (importance 2): Artificially expand dataset with transformations (rotations, crops, etc.). Reduces overfitting. Source: (from training memory of book).
neural architecture search (importance 2): Automatically discover network architectures via evolution or RL. Found models competitive with human designs. Source: (from training memory of book).
spiking neural networks (importance 1): More biologically realistic than the continuous-activation networks trained with backprop. Use binary spikes instead of continuous activations. Promising for neuromorphic hardware. Source: (from training memory of book).
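
The ReLU entry in the list above says rectified units avoided vanishing gradients. A rough numerical sketch of why: during backpropagation the gradient reaching an early layer is multiplied by one activation derivative per layer, and the sigmoid derivative never exceeds 0.25, while the ReLU derivative is exactly 1 for an active unit. The sketch deliberately ignores the weight matrices (treated as 1), and the depth and pre-activation values are hypothetical choices made only for illustration.

```python
import math

# Sketch: product of activation derivatives along one backprop path.
# Weights are assumed to be 1, which a real network would not guarantee.

def sigmoid_grad(z):
    """Derivative of the logistic sigmoid; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for active units, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def backprop_factor(grad_fn, pre_activations):
    """Multiply the activation derivatives across all layers on the path."""
    factor = 1.0
    for z in pre_activations:
        factor *= grad_fn(z)
    return factor


depth = 20
# Hypothetical pre-activations, chosen positive so every ReLU unit is active.
zs = [0.5] * depth

sigmoid_factor = backprop_factor(sigmoid_grad, zs)
relu_factor = backprop_factor(relu_grad, zs)

print("sigmoid path factor:", sigmoid_factor)
print("ReLU path factor:", relu_factor)
```

With these assumptions the sigmoid factor shrinks geometrically with depth while the ReLU factor stays at 1. A ReLU unit that is inactive passes no gradient at all, so the comparison only holds along active paths.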

Entities

Geoffrey Hinton (importance 5): Kept neural networks alive during AI winters. Co-authored the 1986 paper that popularized backprop; co-invented Boltzmann machines, dropout, and layer-wise pretraining. 'Godfather of deep learning.' Source: (from training memory of book).
Rosenblatt's Perceptron (1958) (importance 4): First learning algorithm for neural networks; used the perceptron learning rule, an error-correction rule for single-layer networks. Sparked initial neural network enthusiasm. Source: (from training memory of book).
PDP volumes (Rumelhart-McClelland 1986) (importance 4): Two-volume Parallel Distributed Processing books. Became the bible of connectionism, trained a generation of researchers. Source: (from training memory of book).
ImageNet dataset (Fei-Fei Li 2009) (importance 4): 14 million labeled images across 20,000 categories. Scale was orders of magnitude beyond previous vision datasets. Source: (from training memory of book).
Yann LeCun (importance 4): Pioneered convolutional networks at Bell Labs and NYU. Became Facebook AI Research director. Source: (from training memory of book).
Yoshua Bengio (importance 4): Developed sequence models and attention mechanisms. Key figure in Montreal deep learning community. Source: (from training memory of book).
MNIST dataset (importance 3): 60,000 training and 10,000 test images of handwritten digits. Became standard benchmark for vision algorithms throughout 1990s-2000s. Source: (from training memory of book).
Hubel-Wiesel visual cortex (importance 3): Hierarchical organization of visual cortex discovered by Hubel and Wiesel. Simple cells → complex cells. Inspired convolutional architectures. Source: (from training memory of book).
Andrew Ng (importance 3): Popularized deep learning at Google and Baidu. Created Coursera deep learning courses reaching millions. Source: (from training memory of book).
Ilya Sutskever (importance 3): Hinton's student, co-created AlexNet. Co-founded OpenAI, became chief scientist. Source: (from training memory of book).
Demis Hassabis (importance 3): Co-founded DeepMind, led AlphaGo project. Neuroscientist background informed AI research approach. Source: (from training memory of book).
Fei-Fei Li (importance 3): Created ImageNet dataset. Organized annual competition that drove computer vision progress 2010-2017. Source: (from training memory of book).
Google TPU chips (importance 2): Tensor Processing Units optimized for matrix multiply. 10x more efficient than GPUs for inference. Source: (from training memory of book).
Searle's Chinese Room (importance 2): Philosophical argument that symbol manipulation isn't understanding. Debate continues whether deep learning truly 'understands.' Source: (from training memory of book).

Relations

Rosenblatt's Perceptron (1958) precedes Minsky-Papert XOR critique
Minsky-Papert XOR critique enables First AI Winter (1970s)
First AI Winter (1970s) enables symbolic AI paradigm
Hopfield networks (1982) enables Boltzmann machine
Boltzmann machine precedes Sejnowski-Hinton backpropagation (1986)
Sejnowski-Hinton backpropagation (1986) enables NETtalk (Sejnowski-Rosenberg 1987)
Sejnowski-Hinton backpropagation (1986) evidences PDP volumes (Rumelhart-McClelland 1986)
NETtalk (Sejnowski-Rosenberg 1987) exemplifies learned representations
PDP volumes (Rumelhart-McClelland 1986) supports learned representations
vanishing gradient problem enables Second AI Winter (early 1990s)
vanishing gradient problem motivates Hinton's layer-wise pretraining (2006)
LeCun's convolutional networks requires MNIST dataset
Hubel-Wiesel visual cortex motivates LeCun's convolutional networks
Hubel-Wiesel visual cortex motivates Fukushima's Neocognitron (1980)
Fukushima's Neocognitron (1980) precedes LeCun's convolutional networks
SVM decade (1995-2005) supports Second AI Winter (early 1990s)
Hinton's layer-wise pretraining (2006) enables Deep Belief Networks
Deep Belief Networks refutes vanishing gradient problem
ImageNet dataset (Fei-Fei Li 2009) enables AlexNet victory (2012)
GPU computing revolution enables AlexNet victory (2012)
AlexNet victory (2012) requires Dropout regularization
AlexNet victory (2012) requires ReLU activation
AlexNet victory (2012) refutes symbolic AI paradigm
AlexNet victory (2012) evidences paradigm shift: learning beats engineering
Word2Vec embeddings (2013) exemplifies learned representations
DeepMind Atari DQN (2013) exemplifies end-to-end learning
AlphaGo defeats Lee Sedol (2016) evidences paradigm shift: learning beats engineering
Transformer architecture (2017) builds-on attention mechanism
sequence-to-sequence models enables attention mechanism
symbolic AI paradigm exemplifies expert systems
symbolic AI paradigm requires hand-crafted feature engineering
hand-crafted feature engineering contradicts learned representations
learned representations enables end-to-end learning
Hebbian learning precedes Sejnowski-Hinton backpropagation (1986)
learning from brain architecture supports paradigm shift: learning beats engineering
Hubel-Wiesel visual cortex evidences learning from brain architecture
Batch Normalization enables ResNet skip connections (2015)
ResNet skip connections (2015) refutes vanishing gradient problem
deep learning data hunger requires big data era
big data era enables AlexNet victory (2012)
ImageNet dataset (Fei-Fei Li 2009) exemplifies deep learning data hunger
transfer learning builds-on learned representations
GANs (Goodfellow 2014) exemplifies learned representations
sequence-to-sequence models requires LSTM (Hochreiter-Schmidhuber 1997)
attention mechanism enables Transformer architecture (2017)
Geoffrey Hinton evidences Sejnowski-Hinton backpropagation (1986)
Geoffrey Hinton evidences Hinton's layer-wise pretraining (2006)
Geoffrey Hinton evidences Dropout regularization
Yann LeCun evidences LeCun's convolutional networks
Yoshua Bengio evidences attention mechanism
Ilya Sutskever evidences AlexNet victory (2012)
Demis Hassabis evidences AlphaGo defeats Lee Sedol (2016)
Demis Hassabis evidences DeepMind Atari DQN (2013)
Fei-Fei Li evidences ImageNet dataset (Fei-Fei Li 2009)
few-shot learning contradicts deep learning data hunger
interpretability challenge motivates adversarial examples
theory lag behind practice contradicts Deep Belief Networks
stochastic gradient descent requires Sejnowski-Hinton backpropagation (1986)
Adam optimizer builds-on stochastic gradient descent
weight initialization schemes refutes vanishing gradient problem
overfitting problem motivates Dropout regularization
overfitting problem motivates data augmentation
neural machine translation requires sequence-to-sequence models
neural machine translation requires attention mechanism
deep learning speech recognition requires LSTM (Hochreiter-Schmidhuber 1997)
self-driving cars builds-on LeCun's convolutional networks
medical image diagnosis builds-on AlexNet victory (2012)
AlphaFold protein prediction builds-on AlphaGo defeats Lee Sedol (2016)
hardware-software codesign exemplifies Google TPU chips
GPU computing revolution precedes hardware-software codesign
neuromorphic hardware requires spiking neural networks
AI ethics concerns exemplifies algorithmic bias problem
AGI timeline debates motivates common sense reasoning gap
Searle's Chinese Room motivates interpretability challenge
embodied cognition gap supports common sense reasoning gap
inductive biases exemplifies LeCun's convolutional networks
learning to learn enables few-shot learning
unsupervised learning frontier exemplifies self-supervised pretraining
self-supervised pretraining builds-on unsupervised learning frontier
multitask learning builds-on transfer learning
catastrophic forgetting motivates multitask learning
brain energy efficiency gap motivates neuromorphic hardware
spiking neural networks contradicts Sejnowski-Hinton backpropagation (1986)
neuroscience-AI virtuous cycle supports learning from brain architecture
neuroscience-AI virtuous cycle supports paradigm shift: learning beats engineering
future architecture evolution exemplifies Transformer architecture (2017)
neural architecture search enables future architecture evolution
Sejnowski-Hinton backpropagation (1986) enables learned representations
NETtalk (Sejnowski-Rosenberg 1987) contradicts symbolic AI paradigm
MNIST dataset precedes ImageNet dataset (Fei-Fei Li 2009)
LeCun's convolutional networks precedes AlexNet victory (2012)
LSTM (Hochreiter-Schmidhuber 1997) refutes vanishing gradient problem
ReLU activation refutes vanishing gradient problem
data augmentation enables AlexNet victory (2012)
ResNet skip connections (2015) builds-on AlexNet victory (2012)
Moravec's paradox refutes AlexNet victory (2012)
Moravec's paradox refutes DeepMind Atari DQN (2013)
Word2Vec embeddings (2013) exemplifies self-supervised pretraining
Rosenblatt's Perceptron (1958) builds-on Hebbian learning
Hopfield networks (1982) builds-on Hebbian learning
learned representations supports paradigm shift: learning beats engineering
end-to-end learning supports paradigm shift: learning beats engineering
learning from brain architecture contradicts symbolic AI paradigm
hand-crafted feature engineering contradicts end-to-end learning
ImageNet dataset (Fei-Fei Li 2009) cites Fei-Fei Li
AlexNet victory (2012) cites Ilya Sutskever
AlexNet victory (2012) cites Geoffrey Hinton
DeepMind Atari DQN (2013) cites Demis Hassabis
AlphaGo defeats Lee Sedol (2016) cites Demis Hassabis
attention mechanism cites Yoshua Bengio
LeCun's convolutional networks cites Yann LeCun
Second AI Winter (early 1990s) evidences SVM decade (1995-2005)
Hinton's layer-wise pretraining (2006) cites Geoffrey Hinton
Dropout regularization cites Geoffrey Hinton
Batch Normalization precedes ResNet skip connections (2015)
GPU computing revolution enables big data era
transfer learning builds-on AlexNet victory (2012)
GANs (Goodfellow 2014) motivates adversarial examples
Neural Turing Machines builds-on attention mechanism
adversarial examples evidences interpretability challenge
common sense reasoning gap contradicts AlphaGo defeats Lee Sedol (2016)
unsupervised learning frontier refutes deep learning data hunger
brain energy efficiency gap contradicts GPU computing revolution
spiking neural networks enables neuromorphic hardware
neuroscience-AI virtuous cycle builds-on Hubel-Wiesel visual cortex
neural architecture search builds-on ResNet skip connections (2015)
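
Each relation line above has the shape "subject predicate object", so the list can be read as a set of triples. The sketch below shows one possible way to parse such lines and query them. The predicate vocabulary is copied from the lines above, while the helper name `parse_relation` and the choice of sample lines are assumptions made for illustration; this is not the format or tooling the page itself uses.

```python
# Sketch: parse "subject predicate object" relation lines into triples.
# PREDICATES lists the relation words that appear in the list above.

PREDICATES = (
    "precedes", "enables", "evidences", "exemplifies", "supports",
    "motivates", "requires", "refutes", "builds-on", "contradicts", "cites",
)


def parse_relation(line):
    """Split a relation line on the first known predicate found in it."""
    for pred in PREDICATES:
        marker = f" {pred} "
        if marker in line:
            subject, obj = line.split(marker, 1)
            return subject, pred, obj
    raise ValueError(f"no known predicate in: {line!r}")


# A few lines copied from the relation list.
sample_lines = [
    "ImageNet dataset (Fei-Fei Li 2009) enables AlexNet victory (2012)",
    "GPU computing revolution enables AlexNet victory (2012)",
    "AlexNet victory (2012) requires ReLU activation",
    "Transformer architecture (2017) builds-on attention mechanism",
]

triples = [parse_relation(line) for line in sample_lines]

# Query: which nodes are recorded as enabling the AlexNet result?
enablers = [s for s, p, o in triples
            if p == "enables" and o == "AlexNet victory (2012)"]
print(enablers)
```

Splitting on the predicate with surrounding spaces works here because no node name in the list contains one of the predicate words. A node name that did would need a stricter format, such as an explicit delimiter.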

not a citable source