
Implement an evaluator that can unroll a model into an RL environment and get metrics #3

Open
AntreasAntoniou opened this issue May 7, 2024 · 3 comments

@AntreasAntoniou

Write an evaluator that can receive an environment, a model, and a seed (plus any other configuration), then unroll the model in the environment and collect rewards and other metrics.

@AntreasAntoniou AntreasAntoniou self-assigned this May 7, 2024
@AntreasAntoniou (Author)

Waiting for a template to be posted by either @AdamJelley or @trevormcinroe so I can get started.

@AdamJelley

Hi @AntreasAntoniou, here's a simple eval function that you can use as a template:

import gym
import numpy as np
import torch

# Actor is the policy network (a torch.nn.Module) mapping state -> action;
# its act() method handles any state/action device transfer internally.


@torch.no_grad()
def eval_actor(
    env: gym.Env, actor: Actor, device: str, n_episodes: int, seed: int
) -> np.ndarray:
    env.seed(seed)
    actor.eval()
    episode_rewards = []
    for _ in range(n_episodes):
        # Roll out one full episode and accumulate its undiscounted return.
        state, done = env.reset(), False
        episode_reward = 0.0
        while not done:
            action = actor.act(state, device)
            state, reward, done, _ = env.step(action)
            episode_reward += reward
        episode_rewards.append(episode_reward)

    actor.train()
    return np.array(episode_rewards)

It takes in an env and an actor (a network that maps state -> action). Hopefully it's pretty straightforward (not much has changed since 2019 here!).

The interesting part is probably device usage. The env normally lives on the cpu and expects a np.ndarray action, but the actor is normally on the gpu (unless you have a fancy jax/gpu env setup). So you can either: 1) move the actor to the cpu at the beginning and do everything on the cpu, or 2) move the state to the gpu, do the actor forward pass there, and move the action back to the cpu at each timestep. For the small networks often used in RL, 1) tends to be faster, but for the larger networks we'll likely be using we may need 2), as is done above (the act function is a wrapper around the forward pass that handles transferring the state to the gpu and the action back to the cpu; see the sketch below). Hope that helps!
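
For concreteness, here is a minimal sketch of what such an act wrapper might look like. The Actor architecture below (a placeholder MLP with state_dim/action_dim/hidden_dim arguments) is purely illustrative and not part of the template above; only the act(state, device) signature is taken from it.

import numpy as np
import torch
import torch.nn as nn


class Actor(nn.Module):
    # Hypothetical policy network: forward() maps a batched state tensor to an action tensor.
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

    @torch.no_grad()
    def act(self, state: np.ndarray, device: str) -> np.ndarray:
        # Move the cpu-side state to the actor's device, run the forward pass there,
        # then bring the action back to the cpu as a numpy array for env.step().
        state_t = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
        action = self(state_t)
        return action.squeeze(0).cpu().numpy()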

@trevormcinroe commented May 9, 2024

Small nitpick @AdamJelley @AntreasAntoniou -- it might be good to pass seeds: Sequence[int] and then randomly select a starting seed at the top of the eval loop. Perhaps this could allow for a more robust eval of the agent.
Maybe like:

for _ in range(n_episodes):
    state, done = env.reset(seed=int(np.random.choice(seeds))), False

AFAIK, all envs can take a seed in .reset(), but not all actually use it. If that is the case here, then perhaps the random seed choice would go something like:

for _ in range(n_episodes):
    env.seed(int(np.random.choice(seeds)))
    state, done = env.reset(), False
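
Putting this together, here is a sketch of the template above adapted to take seeds: Sequence[int]. The function name eval_actor_multi_seed is just illustrative, and it assumes the same Actor/act interface as the template and sketch earlier in the thread.

from typing import Sequence

import gym
import numpy as np
import torch


@torch.no_grad()
def eval_actor_multi_seed(
    env: gym.Env, actor: Actor, device: str, n_episodes: int, seeds: Sequence[int]
) -> np.ndarray:
    actor.eval()
    episode_rewards = []
    for _ in range(n_episodes):
        # Pick a fresh seed per episode for a broader spread of initial states.
        env.seed(int(np.random.choice(seeds)))
        state, done = env.reset(), False
        episode_reward = 0.0
        while not done:
            action = actor.act(state, device)
            state, reward, done, _ = env.step(action)
            episode_reward += reward
        episode_rewards.append(episode_reward)

    actor.train()
    return np.array(episode_rewards)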
