Refactor: Use Async DB instead of Sync #817

Open: 1yam wants to merge 17 commits into main

@1yam (Member) commented Jul 7, 2025

To bring back message parallelization, we need to use the async DB session instead of the sync one.

Related ClickUp or Jira tickets: ALEPH-526

Self proofreading checklist

  • Is my code clear enough and well documented?
  • Are my files well typed?
  • New translations have been added or updated if new strings have been introduced in the frontend
  • Database migration files are included
  • Are there enough tests?
  • Documentation has been included (for new features)

Changes

This pull request updates database session handling across multiple files to use asynchronous sessions and session factories, so that database calls can be awaited instead of blocking the event loop. The changes mainly replace synchronous session handling with its asynchronous equivalent and update method calls accordingly.

Transition to Asynchronous Database Sessions:

  • Updated session factories and types: Replaced DbSessionFactory and DbSession with AsyncDbSessionFactory and AsyncDbSession across several modules (src/aleph/api_entrypoint.py, src/aleph/chains/bsc.py, src/aleph/chains/ethereum.py, src/aleph/chains/nuls2.py, src/aleph/chains/connector.py, src/aleph/chains/indexer_reader.py, src/aleph/chains/chain_data_service.py).

  • Replaced synchronous session management: Methods now open sessions with `async with` instead of `with`, so that asynchronous database operations are handled correctly. This affects methods such as add_pending_tx, upsert_chain_sync_status, and others (src/aleph/chains/chain_data_service.py, src/aleph/chains/ethereum.py, src/aleph/chains/indexer_reader.py).
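A minimal sketch of the before/after shape of this change. `InMemoryAsyncSession` is a hypothetical stand-in written for illustration (it is not SQLAlchemy's `AsyncSession` and not the project's `AsyncDbSessionFactory`), and this `add_pending_tx` only borrows the name of the real method; its signature is assumed.

```python
import asyncio
from typing import Any, Callable, List


class InMemoryAsyncSession:
    """Hypothetical stand-in for an async DB session (not SQLAlchemy's AsyncSession)."""

    def __init__(self, store: List[Any]) -> None:
        self._store = store  # committed rows live here
        self._pending: List[Any] = []  # rows added but not yet committed
        self.closed = False

    async def __aenter__(self) -> "InMemoryAsyncSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Drop anything not committed and mark the session closed,
        # loosely mimicking what closing a real session does.
        self._pending.clear()
        self.closed = True

    def add(self, obj: Any) -> None:
        # add() only stages the object, so it needs no await.
        self._pending.append(obj)

    async def commit(self) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as a real DB round trip would
        self._store.extend(self._pending)
        self._pending.clear()


AsyncSessionFactory = Callable[[], InMemoryAsyncSession]


async def add_pending_tx(session_factory: AsyncSessionFactory, tx: Any) -> None:
    # Before: `with session_factory() as session: ...; session.commit()`
    # After: the context manager is entered with `async with` and commit is awaited.
    async with session_factory() as session:
        session.add(tx)
        await session.commit()
```

`add()` stays synchronous while `commit()` must be awaited; SQLAlchemy's `AsyncSession` draws the line in the same place, which is why only some calls gain an `await` in this PR.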

Migration of Database Operations to Async:

  • Async database operations: Replaced synchronous calls to database methods (e.g., session.commit(), upsert_file) with their asynchronous counterparts (await session.commit(), await upsert_file) in src/aleph/chains/chain_data_service.py, src/aleph/chains/ethereum.py, and src/aleph/chains/indexer_reader.py.

  • Async helper methods: Helper methods such as get_last_height, count_pending_txs, and get_unconfirmed_messages are now asynchronous and are awaited at their call sites (src/aleph/chains/ethereum.py, src/aleph/chains/indexer_reader.py).
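Because these helpers become coroutines, every call site must await them too, and a forgotten `await` does not raise an exception at the call site. A minimal sketch of that pitfall; the in-memory table and the simplified `get_last_height(chain)` signature are assumptions for illustration, not the code from this PR.

```python
import asyncio
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the table a helper like get_last_height reads.
_LAST_HEIGHTS: Dict[str, int] = {"ETH": 19_000_000}


async def get_last_height(chain: str) -> Optional[int]:
    """Hypothetical async helper; the real one takes a session and queries the DB."""
    await asyncio.sleep(0)  # stands in for `await session.execute(...)`
    return _LAST_HEIGHTS.get(chain)


async def demo_missing_await() -> Tuple[bool, Optional[int]]:
    # Calling the coroutine function without await returns a coroutine object,
    # not the height, so a check like `height is None` can take the wrong branch.
    forgotten = get_last_height("ETH")
    was_coroutine = asyncio.iscoroutine(forgotten)
    forgotten.close()  # close it explicitly to avoid a "never awaited" warning

    # The correct call site awaits the helper.
    height = await get_last_height("ETH")
    return was_coroutine, height
```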

Consistency Across Chain Modules:

  • Standardized session handling: Applied the same async session handling approach across all chain modules (src/aleph/chains/bsc.py, src/aleph/chains/ethereum.py, src/aleph/chains/nuls2.py, src/aleph/chains/connector.py). This ensures uniformity in database interaction patterns.

These updates keep database calls from blocking the event loop, which is a prerequisite for bringing back parallel message processing.

Notes

This PR is the first part of message parallelization. It could have side effects, so intensive testing will be needed.

In some rare cases during local testing, I got an error about psycopg2. It shouldn't be used at all, since the async engine uses asyncpg. I'm not sure whether this only affects the tests or could also happen in production.
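One possible explanation, not verified against this codebase: SQLAlchemy picks the DB driver from the URL scheme, so a plain `postgresql://` URL selects the default psycopg2 driver, while `postgresql+asyncpg://` selects asyncpg. If a test fixture, a migration runner, or a leftover sync engine builds its engine from a URL without the `+asyncpg` suffix, psycopg2 would be loaded. A minimal sketch of a hypothetical helper that normalizes the URL before it reaches the async engine:

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes that SQLAlchemy resolves to the psycopg2 driver.
_SYNC_POSTGRES_SCHEMES = {"postgresql", "postgresql+psycopg2"}
_ASYNC_POSTGRES_SCHEME = "postgresql+asyncpg"


def to_asyncpg_url(url: str) -> str:
    """Hypothetical helper: force the asyncpg driver in a PostgreSQL URL."""
    parts = urlsplit(url)
    if parts.scheme == _ASYNC_POSTGRES_SCHEME:
        return url  # already using asyncpg, nothing to do
    if parts.scheme in _SYNC_POSTGRES_SCHEMES:
        # Keep user, password, host, port, and database; swap only the scheme.
        return urlunsplit(parts._replace(scheme=_ASYNC_POSTGRES_SCHEME))
    raise ValueError(f"not a PostgreSQL URL: {url!r}")
```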

@1yam requested review from amalcaraz and nesitor on July 8, 2025 12:38
@1yam (Member, Author):

It would make sense for @amalcaraz to review these parts (mostly the update part).

@1yam force-pushed the 1yam-sync-db-to-async branch from f113cd1 to 64c1b98 on July 8, 2025 12:56
@nesitor (Member) left a comment:

Apart from the comments I made, everything looks fine to me, but before merging to main we have to do extensive testing on staging to make sure we don't introduce race conditions or other side effects.

Comment on lines +81 to 84
select(
AggregateDb.key,
AggregateDb.content,
AggregateDb.creation_datetime.label("created"),
@nesitor (Member):

We need to ensure that this is equivalent to the previous query.

Comment on lines +97 to +101
select(AggregateDb.key, AggregateDb.content)
.filter(where_clause)
.order_by(AggregateDb.key)
)
- result = query.all()
+ result = (await session.execute(query)).all()
@nesitor (Member):

Same as above

Comment on lines +21 to +23
if dapp:
    # For some reason asyncpg doesn't handle it if dapp is None
    query = query.where(AlephBalanceDb.dapp == dapp)
@nesitor (Member):

Suggested change
- if dapp:
-     # For some reason asyncpg doesn't handle it if dapp is None
-     query = query.where(AlephBalanceDb.dapp == dapp)
+ if dapp:
+     # For some reason asyncpg doesn't handle it if dapp is None
+     query = query.where(AlephBalanceDb.dapp == dapp)
+ else:
+     query = query.where(AlephBalanceDb.dapp.is_(None))

This is not correct: if dapp is None, the query is not filtered at all, but it should only match rows where dapp is NULL in the DB.
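In SQL, comparing with NULL never matches: `dapp = NULL` evaluates to NULL rather than true, so the `None` case needs an explicit `IS NULL` (in SQLAlchemy, `AlephBalanceDb.dapp.is_(None)`). A minimal sketch with the standard library's sqlite3 showing the three behaviors; the `balances` table is a hypothetical simplification of `AlephBalanceDb`, and SQLite stands in for PostgreSQL here only because both follow the same NULL comparison rules.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def dapp_clause(dapp: Optional[str]) -> Tuple[str, Tuple[Any, ...]]:
    """Return a WHERE fragment and its parameters, mapping None to IS NULL."""
    if dapp is None:
        return "dapp IS NULL", ()
    return "dapp = ?", (dapp,)


def balances_for(conn: sqlite3.Connection, dapp: Optional[str]) -> List[Tuple[str, float]]:
    clause, params = dapp_clause(dapp)
    sql = f"SELECT address, balance FROM balances WHERE {clause} ORDER BY address"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (address TEXT, dapp TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?)",
    [("0xa", None, 1.0), ("0xb", "voucher", 2.0)],
)

# No filter at all (what the PR code does when dapp is None): every row comes back.
unfiltered = conn.execute("SELECT COUNT(*) FROM balances").fetchone()[0]
# Binding None with "=" yields "dapp = NULL", which matches no rows.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM balances WHERE dapp = ?", (None,)
).fetchone()[0]
```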

Comment on lines +268 to +269
select_stmt += " WHERE address = ANY(:addresses)"
parameters = {"addresses": list(addresses)}
@nesitor (Member):

I'm not sure this is fully equivalent. Maybe we need to keep the IN operator?
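For reference: in PostgreSQL, `address = ANY(:addresses)` with a single array parameter selects the same rows as `address IN (...)` over the same non-empty list; with an empty list it simply matches nothing, whereas a literal `IN ()` is a syntax error in PostgreSQL. If the `IN` form is preferred, each address needs its own placeholder (SQLAlchemy's `bindparam(..., expanding=True)` does this for `text()` queries). A minimal sketch of the placeholder-expanding variant with stdlib sqlite3, which has no array type; the `balances` table is hypothetical:

```python
import sqlite3
from typing import Iterable, List, Tuple


def balances_in(conn: sqlite3.Connection, addresses: Iterable[str]) -> List[Tuple[str, float]]:
    """Select balances with an IN list, one placeholder per address."""
    addrs = list(addresses)
    if not addrs:
        # Behave like `= ANY('{}')`: an empty list matches no rows,
        # and we never emit `IN ()`, which PostgreSQL rejects.
        return []
    placeholders = ", ".join("?" for _ in addrs)
    sql = (
        "SELECT address, balance FROM balances "
        f"WHERE address IN ({placeholders}) ORDER BY address"
    )
    return conn.execute(sql, addrs).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (address TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?)",
    [("0xa", 1.0), ("0xb", 2.0), ("0xc", 3.0)],
)
```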


return {
column.name: getattr(self, column.name)
for column in self.__table__.columns
- if column.name not in exclude_set
+ if column.name not in exclude_set and column.name not in insp.unloaded
@nesitor (Member):

Why was that condition added?

@@ -49,9 +49,14 @@ def update_balances(session: DbSession, content: Mapping[str, Any]) -> None:
dapp = content.get("dapp")

LOGGER.info("Updating balances for %s (dapp: %s)", chain, dapp)
print(f"Chain type: {type(chain)}, Chain value: {chain.value}, Full chain: {chain}")
@nesitor (Member):

This print should be removed

Comment on lines +55 to +57
print(f"Number of balances to update: {len(balances)}")
for addr, bal in list(balances.items())[:3]:  # Print first 3 for debug
    print(f" {addr}: {bal} bal type {type(bal)}")
@nesitor (Member):

These ones too, since I think they are debug messages.
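If this diagnostic output is still useful, a sketch of routing it through the module's existing `LOGGER` at DEBUG level instead of `print()`; `log_balance_debug` is a hypothetical helper, not part of the PR:

```python
import logging
from typing import Any, Mapping

LOGGER = logging.getLogger(__name__)


def log_balance_debug(chain: Any, balances: Mapping[str, Any], sample_size: int = 3) -> None:
    """Log what the debug prints showed, at DEBUG level with lazy %-formatting."""
    LOGGER.debug("Chain type: %s, chain: %s", type(chain), chain)
    LOGGER.debug("Number of balances to update: %d", len(balances))
    # Skip building the sample list entirely when DEBUG is disabled.
    if LOGGER.isEnabledFor(logging.DEBUG):
        for addr, bal in list(balances.items())[:sample_size]:
            LOGGER.debug("  %s: %s (type %s)", addr, bal, type(bal))
```

With the production log level above DEBUG, none of these lines are emitted, so the output no longer ends up on stdout.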

return None

# TODO: fix this pydanticV2 issue
@nesitor (Member):

What exactly is the issue?
