Example of Q-table learning with a simple discretized 1-pendulum environment.

Functions

  def rendertrial(maxiter=100)

Variables

  float DECAY_RATE = 0.99
  env = DPendulum()  (environment)
  list h_rwd = []
  float LEARNING_RATE = 0.85
  int NEPISODES = 500  (hyperparameters)
  int NSTEPS = 50
  NU = env.nu
  NX = env.nx
  Q = np.zeros([env.nx, env.nu])
  float Qref = reward + DECAY_RATE * np.max(Q[x2, :])
  RANDOM_SEED = int((time.time() % 10) * 1000)  (random seed)
  reward
  float rsum = 0.0
  u
  x = env.reset()
  x2
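The `Qref` expression in the listing is the standard temporal-difference target for Q-learning: the Q-table entry for the visited state-action pair is pulled toward `reward + DECAY_RATE * max(Q[x2, :])` at rate `LEARNING_RATE`. The sketch below shows that update loop with the listed hyperparameters. Since `DPendulum` itself is not shown here, a hypothetical 5-state deterministic chain stands in for the discretized pendulum; the `step` dynamics and the epsilon value are assumptions, not part of the original script.

```python
import numpy as np

# Hyperparameters taken from the member listing above.
NX, NU = 5, 2                 # state / control discretization (toy sizes)
LEARNING_RATE = 0.85          # alpha
DECAY_RATE = 0.99             # gamma
NEPISODES, NSTEPS = 500, 50

rng = np.random.default_rng(0)
Q = np.zeros([NX, NU])        # Q-table, as in the listing


def step(x, u):
    """Hypothetical stand-in for DPendulum.step: action 1 moves right,
    action 0 moves left; reward 1 while sitting at the last state."""
    x2 = min(x + 1, NX - 1) if u == 1 else max(x - 1, 0)
    reward = 1.0 if x2 == NX - 1 else 0.0
    return x2, reward


h_rwd = []                    # history of per-episode summed rewards
for episode in range(NEPISODES):
    x = int(rng.integers(NX))  # env.reset() stand-in: random start state
    rsum = 0.0
    for _ in range(NSTEPS):
        # Epsilon-greedy action choice (epsilon = 0.1 is an assumption).
        if rng.random() < 0.1:
            u = int(rng.integers(NU))
        else:
            u = int(np.argmax(Q[x, :]))
        x2, reward = step(x, u)
        # Temporal-difference target, exactly the Qref expression listed above.
        Qref = reward + DECAY_RATE * np.max(Q[x2, :])
        Q[x, u] += LEARNING_RATE * (Qref - Q[x, u])
        rsum += reward
        x = x2
    h_rwd.append(rsum)

print(np.argmax(Q, axis=1))   # greedy policy after learning
```

After training, the greedy policy `np.argmax(Q, axis=1)` drives every state toward the rewarding end of the chain, which is the same diagnostic role `rendertrial(maxiter=100)` plays in the original script: rolling out the learned greedy policy while rendering the pendulum.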