| Functions | |
|---|---|
| def | rendertrial(maxiter=100) |
| Variables | | |
|---|---|---|
| float | DECAY_RATE = 0.99 | |
| | env = DPendulum() | Environment |
| list | h_rwd = [] | |
| float | LEARNING_RATE = 0.85 | |
| int | NEPISODES = 500 | Hyperparameters |
| int | NSTEPS = 50 | |
| | NU = env.nu | |
| | NX = env.nx | |
| | Q = np.zeros([env.nx,env.nu]) | |
| float | Qref = reward + DECAY_RATE*np.max(Q[x2,:]) | |
| | RANDOM_SEED = int((time.time()%10)*1000) | Random seed |
| | reward | |
| float | rsum = 0.0 | |
| | u = np.argmax(Q[x,:] + np.random.randn(1,NU)/episode) | |
| | x = env.reset() | |
| | x2 | |
Example of Q-table learning with a simple discretized 1-pendulum environment. The sketch below shows how the members listed above combine into the training loop.
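Taken together, these members implement tabular Q-learning: at each step the greedy control (perturbed by exploration noise that decays with the episode number) is applied, and the Q-table entry is pulled toward the one-step lookahead target `Qref`. The following is a minimal reconstruction from the declarations on this page, not the verbatim source; the `dpendulum` module path and the `(next_state, reward)` return signature of `env.step` are assumptions.

```python
import time
import numpy as np
from dpendulum import DPendulum  # assumed module path for the discretized pendulum

RANDOM_SEED = int((time.time() % 10) * 1000)  # random seed, as declared above
np.random.seed(RANDOM_SEED)

# Hyperparameters, values as declared above
NEPISODES = 500        # number of training episodes
NSTEPS = 50            # maximum steps per episode
LEARNING_RATE = 0.85   # step size of the temporal-difference update
DECAY_RATE = 0.99      # discount factor on future rewards

env = DPendulum()               # environment
NX = env.nx                     # number of discretized states
NU = env.nu                     # number of discretized controls
Q = np.zeros([env.nx, env.nu])  # Q-table, one value per (state, control) pair

h_rwd = []  # history of the cumulated reward of each episode
for episode in range(1, NEPISODES + 1):
    x = env.reset()
    rsum = 0.0
    for step in range(NSTEPS):
        # Greedy control with exploration noise that decays over episodes
        u = np.argmax(Q[x, :] + np.random.randn(1, NU) / episode)
        x2, reward = env.step(u)  # assumed to return (next state, reward)
        # One-step lookahead target, then temporal-difference update of Q
        Qref = reward + DECAY_RATE * np.max(Q[x2, :])
        Q[x, u] += LEARNING_RATE * (Qref - Q[x, u])
        x = x2
        rsum += reward
    h_rwd.append(rsum)
```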
`def qtable.rendertrial(maxiter=100)`

`qtable.RANDOM_SEED = int((time.time()%10)*1000)`
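`rendertrial` is documented here only by its signature. A plausible minimal body, assuming the `env` and `Q` globals above and an `env.render()` display method (not confirmed by this page), replays the learned greedy policy for at most `maxiter` steps:

```python
def rendertrial(maxiter=100):
    """Roll out the greedy policy from a fresh reset, rendering each step."""
    x = env.reset()
    for _ in range(maxiter):
        u = np.argmax(Q[x, :])  # pure greedy control, no exploration noise
        x, reward = env.step(u)
        env.render()            # assumed display method of DPendulum
```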