Cover
Copyright Information
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is Searching for Authors Like You
Preface
What this book covers
To get the most out of this book
Get in touch
Chapter 1. What is Reinforcement Learning?
Learning – supervised, unsupervised, and reinforcement
RL formalisms and relations
Markov decision processes
Summary
Chapter 2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
The random CartPole agent
The extra Gym functionality – wrappers and monitors
Summary
Chapter 3. Deep Learning with PyTorch
Tensors
Gradients
NN building blocks
Custom layers
Final glue – loss functions and optimizers
Monitoring with TensorBoard
Example – GAN on Atari images
Summary
Chapter 4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
Chapter 5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
Chapter 6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
DQN on Pong
Summary
Chapter 7. DQN Extensions
The PyTorch Agent Net library
Basic DQN
N-step DQN
Double DQN
Noisy networks
Prioritized replay buffer
Dueling DQN
Categorical DQN
Combining everything
Summary
References
Chapter 8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Models
Training code
Results
Things to try
Summary
Chapter 9. Policy Gradients – An Alternative
Values and policy
The REINFORCE method
REINFORCE issues
PG on CartPole
PG on Pong
Summary
Chapter 10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Summary
Chapter 11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C – data parallelism
A3C – gradients parallelism
Summary
Chapter 12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Training of seq2seq
The chatbot example
Summary
Chapter 13. Web Navigation
Web navigation
OpenAI Universe
Simple clicking approach
Human demonstrations
Adding text description
Things to try
Summary
Chapter 14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Deterministic policy gradients
Distributional policy gradients
Things to try
Summary
Chapter 15. Trust Regions – TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Proximal Policy Optimization
Trust Region Policy Optimization
A2C using ACKTR
Summary
Chapter 16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
ES on HalfCheetah
Genetic algorithms
GA on CartPole
GA tweaks
GA on Cheetah
Summary
References
Chapter 17. Beyond Model-Free – Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
I2A on Atari Breakout
Experiment results
Summary
References
Chapter 18. AlphaGo Zero
Board games
The AlphaGo Zero method
Connect4 bot
Connect4 results
Summary
References
Book summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index