Functions

| | Function |
|---|---|
| def | `rendertrial(maxiter=100)` |
Variables

| | Variable | Description |
|---|---|---|
| float | `DECAY_RATE = 0.99` | |
| | `env = DPendulum()` | Environment |
| list | `h_rwd = []` | |
| float | `LEARNING_RATE = 0.85` | |
| int | `NEPISODES = 500` | Hyperparameters |
| int | `NSTEPS = 50` | |
| | `NU = env.nu` | |
| | `NX = env.nx` | |
| | `Q = np.zeros([env.nx, env.nu])` | |
| | `Qref = reward + DECAY_RATE*np.max(Q[x2,:])` | |
| | `RANDOM_SEED = int((time.time()%10)*1000)` | Random seed |
| | `reward` | |
| float | `rsum = 0.0` | |
| | `u = np.argmax(Q[x,:] + np.random.randn(1,NU)/episode)` | |
| | `x = env.reset()` | |
| | `x2` | |
Example of Q-table learning with a simple discretized 1-pendulum environment.
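The member list above outlines a complete tabular Q-learning loop. Below is a minimal sketch of how those members plausibly fit together, assuming `env.step(u)` returns a `(next_state, reward)` pair and that `DPendulum` is importable from a local `dpendulum` module; neither detail is stated on this page. The target `Qref` and the noisy greedy action `u` are taken verbatim from the variable list.

```python
import time
import numpy as np
from dpendulum import DPendulum  # assumed module path; only DPendulum() is shown above

RANDOM_SEED = int((time.time() % 10) * 1000)  # Random seed
np.random.seed(RANDOM_SEED)

# --- Hyperparameters
NEPISODES = 500        # Number of training episodes
NSTEPS = 50            # Maximum steps per episode
LEARNING_RATE = 0.85   # Step size of the Q-table update
DECAY_RATE = 0.99      # Discount factor

# --- Environment
env = DPendulum()
NX = env.nx                 # Number of discretized states
NU = env.nu                 # Number of discretized controls
Q = np.zeros([env.nx, env.nu])  # Q-table, initialized flat

h_rwd = []  # History of cumulated rewards, one entry per episode
for episode in range(1, NEPISODES + 1):
    x = env.reset()
    rsum = 0.0
    for step in range(NSTEPS):
        # Greedy action plus exploration noise that shrinks as episodes pass.
        u = np.argmax(Q[x, :] + np.random.randn(1, NU) / episode)
        x2, reward = env.step(u)  # assumed (next_state, reward) signature

        # Q-learning update toward the one-step lookahead target.
        Qref = reward + DECAY_RATE * np.max(Q[x2, :])
        Q[x, u] += LEARNING_RATE * (Qref - Q[x, u])

        x = x2
        rsum += reward
    h_rwd.append(rsum)
```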
def qtable.rendertrial | ( | maxiter = 100 | ) |
qtable.RANDOM_SEED = int((time.time()%10)*1000) |
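The page also lists `rendertrial(maxiter=100)`. A plausible reading, given the members above, is a greedy rollout used to visualize the learned policy; note that the `env.render()` call below is an assumption about the `DPendulum` API, not something this page confirms.

```python
def rendertrial(maxiter=100):
    """Roll out the greedy policy from a random state, displaying each step.

    Sketch only: assumes env.step(u) returns (next_state, reward) and that
    DPendulum exposes a render() method.
    """
    x = env.reset()
    for _ in range(maxiter):
        u = np.argmax(Q[x, :])   # Greedy action, no exploration noise
        x, reward = env.step(u)
        env.render()             # assumed DPendulum display call
```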