How to handle hard deletes that happened while Stitch wasn't running? #813

Open
vitorbaptista opened this issue Sep 14, 2022 · 1 comment

Comments

@vitorbaptista

I have a MySQL RDS database that I replicate with Stitch using Log-based Incremental replication, saving the results to an S3 bucket. It works well, but a few times I hit my plan's row limit for the billing period and Stitch stopped replicating for a while. During that pause, some rows were hard deleted from the source MySQL RDS database. When Stitch was back up (after I upgraded my plan), it resumed replicating the data, but it didn't catch those hard deletions. As a result, these rows never got a _sdc_deleted_at value, even though they were deleted in the source DB.

How can I handle this? If I reset the table, will Stitch understand that some rows were deleted and add the _sdc_deleted_at? Or is there another way?

@vitorbaptista
Author

I tried resetting the table, but Stitch still didn't pick up the deleted rows. I ended up building a process that compares all IDs in my data warehouse against the IDs in my source DB, and then deletes the warehouse rows whose IDs no longer appear in the source DB. It's very hackish, and I'd love to hear about a better solution.
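
For readers wanting to try the same workaround, here is a minimal sketch of that ID reconciliation step. It is not the process actually used in this thread: the function names, the `id` key, and the ISO timestamp format are all assumptions for illustration, and the ID lists would come from your own queries against the source DB and the warehouse. As a variation, the second function marks rows with `_sdc_deleted_at` rather than deleting them.

```python
from datetime import datetime, timezone
from typing import Iterable


def find_missing_ids(source_ids: Iterable[int], warehouse_ids: Iterable[int]) -> list[int]:
    """Return IDs that exist in the warehouse but no longer exist in the source DB."""
    return sorted(set(warehouse_ids) - set(source_ids))


def mark_deleted(rows: list[dict], missing_ids: Iterable[int], deleted_at: datetime) -> list[dict]:
    """Return copies of rows with _sdc_deleted_at set on rows missing from the source.

    Rows that already carry a _sdc_deleted_at value are left untouched.
    The "id" key and the ISO 8601 timestamp format are assumptions.
    """
    missing = set(missing_ids)
    marked = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        if row["id"] in missing and row.get("_sdc_deleted_at") is None:
            row["_sdc_deleted_at"] = deleted_at.isoformat()
        marked.append(row)
    return marked


if __name__ == "__main__":
    # Hypothetical data: in practice these come from SELECT id FROM ... on each side.
    source_ids = [1, 2, 4]
    warehouse_rows = [{"id": i} for i in (1, 2, 3, 4, 5)]

    missing = find_missing_ids(source_ids, (r["id"] for r in warehouse_rows))
    now = datetime.now(timezone.utc)
    for row in mark_deleted(warehouse_rows, missing, now):
        print(row)
```

Note that a set difference like this loads every ID into memory, so for very large tables the comparison may be better done in batches or inside the warehouse itself.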
