Adding SugarScape IG (polars with loops) #71

Merged

Conversation

@adamamer20 (Collaborator) commented Aug 21, 2024

This PR adds the Sugarscape IG implementation with AgentSetPolars and GridPolars.

@rht: Sorry, but I made a mistake with the last performance plot: it measured allocation/setup, not the actual run of the model. The good news is that mesa-frames with Polars still scales better than native Python, as the plot shows, though not as well as flame2gpu (its CPU implementation, that is; mesa-frames simply doesn't scale as well).
The implementation can still be made faster with some simple tweaks (e.g., lazy queries and a GridPolars implementation with pl.ranges), but there isn't much time left to optimize further before GSoC ends.
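For context, a minimal sketch of the lazy-query idea in Polars (toy data and column names, not the actual Sugarscape code):

```python
import polars as pl

# Toy stand-in for the grid: a sugar level per cell plus the id of
# the occupying agent (null = empty cell).
cells = pl.DataFrame({"sugar": [1, 4, 2], "agent_id": [None, 7, None]})

# Eager: every step materializes an intermediate DataFrame.
empty_sugar = cells.filter(pl.col("agent_id").is_null()).select("sugar")

# Lazy: describe the whole query first so the optimizer can fuse the
# steps (e.g., projection pushdown), then materialize once.
empty_sugar_lazy = (
    cells.lazy()
    .filter(pl.col("agent_id").is_null())
    .select("sugar")
    .collect()
)
```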
The bad news is that pandas scales horribly, probably because the DataFrames reach or exceed the size of RAM, which pandas really struggles to handle (I am profiling right now to confirm). I am thinking about a solution (e.g., using Dask, which should expose the same API as pandas), and I will keep you posted.
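A rough sketch of why Dask looks promising here: its dataframe API mirrors pandas for most operations while partitioning the data, so frames larger than RAM can be processed chunk by chunk (toy data and column names, not the actual model schema):

```python
import pandas as pd
import dask.dataframe as dd

agents = pd.DataFrame({"agent_id": range(6), "sugar": [3, 1, 4, 1, 5, 9]})

# Split the frame into partitions; subsequent operations run
# per-partition and out-of-core instead of needing the whole
# frame in memory at once.
dagents = dd.from_pandas(agents, npartitions=2)

# The expression is the same one you would write in pandas;
# compute() materializes the result back into a pandas DataFrame.
rich = dagents[dagents["sugar"] > 3].compute()
```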

@adamamer20 linked an issue Aug 21, 2024 that may be closed by this pull request
codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

@adamamer20 self-assigned this Aug 21, 2024
@adamamer20 added the examples (additions or modifications to sample models) label Aug 21, 2024
@adamamer20 requested a review from rht August 21, 2024 15:25
@adamamer20 added this to the 0.1.0 Alpha Release milestone Aug 21, 2024
@rht (Contributor) commented Aug 22, 2024

> The bad news is that pandas scales horribly, probably because the DataFrames reach or exceed the size of RAM, which pandas really struggles to handle.

Can I take a look at the current pandas result?

@adamamer20 (Collaborator, Author)

> > The bad news is that pandas scales horribly, probably because the DataFrames reach or exceed the size of RAM, which pandas really struggles to handle.
>
> Can I take a look at the current pandas result?

Sorry, I've been working on the API docs. I will send you the graphs tomorrow.

@adamamer20 (Collaborator, Author) commented Aug 25, 2024

> > The bad news is that pandas scales horribly, probably because the DataFrames reach or exceed the size of RAM, which pandas really struggles to handle.
>
> Can I take a look at the current pandas result?

These are the current timings:

Execution times:
---------------
mesa:
  Number of agents: 1, Time: 0.00 seconds
  Number of agents: 10001, Time: 6.00 seconds
  Number of agents: 20001, Time: 12.42 seconds
  Number of agents: 30001, Time: 19.06 seconds
  Number of agents: 40001, Time: 26.48 seconds
  Number of agents: 50001, Time: 33.85 seconds
---------------
---------------
mesa-frames (pd concise):
  Number of agents: 1, Time: 0.09 seconds
  Number of agents: 10001, Time: 15.23 seconds
  Number of agents: 20001, Time: 29.51 seconds
  Number of agents: 30001, Time: 45.61 seconds
  Number of agents: 40001, Time: 59.79 seconds
  Number of agents: 50001, Time: 79.37 seconds
---------------
---------------
mesa-frames (pl concise):
  Number of agents: 1, Time: 0.51 seconds
  Number of agents: 10001, Time: 2.28 seconds
  Number of agents: 20001, Time: 2.92 seconds
  Number of agents: 30001, Time: 3.56 seconds
  Number of agents: 40001, Time: 4.23 seconds
  Number of agents: 50001, Time: 5.31 seconds
---------------

Most of pandas' time is spent on merge operations; profiling information on speed and memory usage is here.
Interestingly, memory usage isn't a problem.
I tried speeding up the merge operations a bit with the tips here, but the performance improvement was marginal.
One potential optimization: we could store agent positions and cells in the same DF. This would let us skip a merge operation when getting property cells (self.space.cells), though it might use a bit more memory if there are multiple agents per cell.
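A minimal sketch of that idea with toy frames (hypothetical column names, not the actual mesa-frames schema):

```python
import pandas as pd

# Current layout: positions and cell properties live in separate
# frames, so fetching each agent's cell properties needs a merge.
agents = pd.DataFrame({"agent_id": [0, 1], "dim_0": [0, 1], "dim_1": [1, 1]})
cells = pd.DataFrame({"dim_0": [0, 1, 2], "dim_1": [1, 1, 1], "sugar": [3, 5, 2]})
per_agent = agents.merge(cells, on=["dim_0", "dim_1"], how="left")

# Proposed layout: the occupying agent is stored directly on the
# cells frame, so the same lookup is a plain selection. The cost is
# a wider frame (and duplicated cell rows if cells can hold several
# agents).
unified = cells.assign(agent_id=[0, 1, None])
per_agent2 = unified[unified["agent_id"].notna()]
```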
I will now try:

  1. Performance with Dask
  2. Performance with cuDF / Dask-cuDF
  3. Performance with a unified agents/cells dataframe in space

@adamamer20 (Collaborator, Author) commented Aug 26, 2024

UPDATE:

  1. Dask and Dask-cuDF require significant refactoring. Dask can be added as a backend in the future, but it is not a viable solution right now.
  2. cuDF does not have the combine_first method, so we either need to reimplement it with other cuDF operations or it will fall back to pandas (a possible reimplementation is sketched after this list). Most of the time in cuDF is also spent on merge operations. I need to dig deeper, because it seems odd that merge takes so much time, especially on GPU where it should be faster.
  3. This is the best approach right now: it does not require significant refactoring and should speed up operations. I'm working on it in the unique-agents-cells-space branch.
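On point 2, a minimal sketch of how combine_first could be rebuilt from where and notna, both of which cuDF provides as well as pandas; this only covers the aligned case, and treating it as fully cuDF-compatible is an assumption to verify:

```python
def combine_first_aligned(a, b):
    """Take values from a, filling its nulls from b.

    Sketch assuming a and b already share the same index and columns;
    pandas' combine_first additionally reindexes to the union of both
    frames, which would need an extra step here.
    """
    # where() keeps a's value wherever the condition holds, else b's.
    return a.where(a.notna(), b)
```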

@rht (Contributor) commented Aug 27, 2024

> Dask and Dask-cuDF require significant refactoring. Dask can be added as a backend in the future, but it is not a viable solution right now.

Will the refactoring complicate the implementation, or is it mainly a matter of replacing the ops with ones supported by Dask/Dask-cuDF?

@adamamer20 (Collaborator, Author)

> > Dask and Dask-cuDF require significant refactoring. Dask can be added as a backend in the future, but it is not a viable solution right now.
>
> Will the refactoring complicate the implementation, or is it mainly a matter of replacing the ops with ones supported by Dask/Dask-cuDF?

It's just a matter of implementing the operations in a DaskMixin; the model implementation should stay the same.
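To illustrate the mixin idea (the class and method names here are illustrative, not mesa-frames' actual internal API): the model-facing classes call a small set of backend-specific DataFrame operations, so supporting Dask would mean implementing just those operations once.

```python
import dask.dataframe as dd
import pandas as pd

class DaskMixin:
    """Hypothetical backend mixin: implements the backend-specific
    DataFrame operations with Dask, so the AgentSet/Grid logic that
    calls them can stay unchanged."""

    def _df_constructor(self, data, npartitions: int = 4):
        # Build a partitioned Dask frame from in-memory data.
        return dd.from_pandas(pd.DataFrame(data), npartitions=npartitions)

    def _df_join(self, left, right, on):
        # Dask mirrors the pandas merge signature.
        return left.merge(right, on=on, how="left")
```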

@adamamer20 (Collaborator, Author)

I will merge this PR and open a dedicated issue for the pandas speedup (and the DaskMixin).

@adamamer20 merged commit 98af5c8 into main Aug 28, 2024
7 checks passed
@adamamer20 deleted the 65-sugarscape-instantaneous-growback-polars-with-loops branch August 28, 2024 06:21