PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
reinforcement-learning
deep-learning
deep-reinforcement-learning
pytorch
atari
hessian
second-order
continuous-control
actor-critic
ale
mujoco
proximal-policy-optimization
ppo
advantage-actor-critic
a2c
acktr
natural-gradients
roboschool
kfac
kronecker-factored-approximation
Updated Mar 3, 2020 - Python
The following is a minimal working example showing that all of the environments produce observations outside their observation space. It iterates over each environment in ML1, samples and sets a task for that environment, then takes random actions and tests whether the observations lie inside the observation space, and at which indices (if any) an obs
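
The per-index bounds check described above can be sketched in plain Python. This is not the example from the original report: `ToyBox`, `ToyEnv`, `out_of_bounds_indices`, and `collect_violations` are hypothetical names introduced for illustration, and the toy environment is an assumed stand-in for the ML1 environments so that the sketch runs without any dependencies. Task sampling is omitted because the toy environment has no tasks.

```python
import random


class ToyBox:
    """Hypothetical stand-in for a box-shaped observation space."""

    def __init__(self, low, high):
        self.low = list(low)
        self.high = list(high)


def out_of_bounds_indices(obs, space):
    """Return the indices at which obs falls outside [low, high]."""
    return [
        i
        for i, (x, lo, hi) in enumerate(zip(obs, space.low, space.high))
        if not (lo <= x <= hi)
    ]


class ToyEnv:
    """Hypothetical environment whose last observation entry is never clipped."""

    def __init__(self, seed=0):
        self.observation_space = ToyBox([-1.0, -1.0, 0.0], [1.0, 1.0, 1.0])
        self._rng = random.Random(seed)
        self._state = [0.0, 0.0, 0.5]

    def sample_action(self):
        # Random action with one component per clipped state entry.
        return [self._rng.uniform(-1.0, 1.0) for _ in range(2)]

    def step(self, action):
        # The first two entries are clipped to their declared bounds.
        for i in range(2):
            moved = self._state[i] + 0.1 * action[i]
            self._state[i] = max(-1.0, min(1.0, moved))
        # The third entry grows without clipping, so it eventually exceeds 1.0.
        self._state[2] += 0.1
        return list(self._state)


def collect_violations(env, n_steps):
    """Take random actions and collect every index that left the space."""
    violated = set()
    for _ in range(n_steps):
        obs = env.step(env.sample_action())
        violated.update(out_of_bounds_indices(obs, env.observation_space))
    return violated


if __name__ == "__main__":
    print(sorted(collect_violations(ToyEnv(seed=0), n_steps=20)))  # prints [2]
```

Reporting the offending indices, rather than a single pass/fail flag, makes it easier to see which part of the observation vector has bounds that are declared too tightly.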