
Implement an evaluator that can unroll a model into an RL environment and get metrics #3

Open
AntreasAntoniou opened this issue May 7, 2024 · 3 comments

@AntreasAntoniou

Write an evaluator that can receive an environment, a model, and a seed (plus any other configuration), then unroll the model in the environment and collect rewards and other metrics.

@AntreasAntoniou AntreasAntoniou self-assigned this May 7, 2024
@AntreasAntoniou (Author)

Waiting for a template to be posted by either @AdamJelley or @trevormcinroe so I can get started.

@AdamJelley

Hi @AntreasAntoniou, here's a simple eval function that you can use as a template:

import gym
import numpy as np
import torch

# Actor is the policy network (a torch.nn.Module) mapping state -> action;
# its act() method handles any state/action device transfer internally.


@torch.no_grad()
def eval_actor(
    env: gym.Env, actor: Actor, device: str, n_episodes: int, seed: int
) -> np.ndarray:
    env.seed(seed)
    actor.eval()
    episode_rewards = []
    for _ in range(n_episodes):
        # Roll out one full episode and accumulate its undiscounted return.
        state, done = env.reset(), False
        episode_reward = 0.0
        while not done:
            action = actor.act(state, device)
            state, reward, done, _ = env.step(action)
            episode_reward += reward
        episode_rewards.append(episode_reward)

    actor.train()
    return np.array(episode_rewards)

It takes in an env and an actor (a network that maps state -> action). Hopefully it's pretty straightforward (not much has changed since 2019 here!).

The interesting part is probably device usage. The env normally lives on the cpu and expects a np.ndarray action, but the actor is normally on the gpu (unless you have a fancy jax/gpu env setup). So you can either: 1) move the actor to the cpu at the beginning and do everything on the cpu, or 2) move the state to the gpu, do the actor forward pass there, and move the action back to the cpu at each timestep. For the small networks often used in RL, 1) tends to be faster, but for the larger networks we'll likely be using we may need 2), as is done above (the act function is a wrapper around the forward pass that handles transferring the state to the gpu and the action back to the cpu; see the sketch below). Hope that helps!
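
For concreteness, here is a minimal sketch of what such an act wrapper might look like. The Actor architecture below (a placeholder MLP with state_dim/action_dim/hidden_dim arguments) is purely illustrative and not part of the template above; only the act(state, device) signature is taken from it.

import numpy as np
import torch
import torch.nn as nn


class Actor(nn.Module):
    # Hypothetical policy network: forward() maps a batched state tensor to an action tensor.
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

    @torch.no_grad()
    def act(self, state: np.ndarray, device: str) -> np.ndarray:
        # Move the cpu-side state to the actor's device, run the forward pass there,
        # then bring the action back to the cpu as a numpy array for env.step().
        state_t = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
        action = self(state_t)
        return action.squeeze(0).cpu().numpy()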

@trevormcinroe commented May 9, 2024

Small nitpick @AdamJelley @AntreasAntoniou -- it might be good to pass seeds: Sequence[int] and then randomly select a starting seed at the top of the eval loop. Perhaps this could allow for a more robust eval of the agent.
Maybe like:

for _ in range(n_episodes):
    state, done = env.reset(seed=int(np.random.choice(seeds))), False

AFAIK, all envs can take a seed in .reset(), but not all actually use it. If that is the case here, then perhaps the random seed choice would go something like:

for _ in range(n_episodes):
    env.seed(int(np.random.choice(seeds)))
    state, done = env.reset(), False
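
Putting this together, here is a sketch of the template above adapted to take seeds: Sequence[int]. The function name eval_actor_multi_seed is just illustrative, and it assumes the same Actor/act interface as the template and sketch earlier in the thread.

from typing import Sequence

import gym
import numpy as np
import torch


@torch.no_grad()
def eval_actor_multi_seed(
    env: gym.Env, actor: Actor, device: str, n_episodes: int, seeds: Sequence[int]
) -> np.ndarray:
    actor.eval()
    episode_rewards = []
    for _ in range(n_episodes):
        # Pick a fresh seed per episode for a broader spread of initial states.
        env.seed(int(np.random.choice(seeds)))
        state, done = env.reset(), False
        episode_reward = 0.0
        while not done:
            action = actor.act(state, device)
            state, reward, done, _ = env.step(action)
            episode_reward += reward
        episode_rewards.append(episode_reward)

    actor.train()
    return np.array(episode_rewards)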
