TODO tracker #2

Open
7 of 20 tasks
lffg opened this issue Feb 14, 2024 · 0 comments

  • Graceful shutdown
    • Pass cancellation token to job's context (Pass cancellation token to job's context #3)
    • Propagate cancellation confirmation up the task hierarchy tree (4077065)
    • Study whether we should provide a wrapper type over tokio_util's CancellationToken; otherwise, a downstream user who wants to implement graceful shutdown would need to depend directly on tokio_util. (4077065) See the wrapper sketch after this list.
  • Improve error handling (see the error-type sketch after this list)
    • External error type without any external (third-party) types associated with it (bab4625)
    • Internal error type (bab4625)
      • Study whether we should introduce a "level" field on the internal error
      • The actual problem is deciding which internal errors should be reported to the user. I think we should only expose errors related to job executions ("user-provided code")
    • Refactor the current lifecycle implementation to accommodate the new error types (fe6f7c4)
    • Decide how to handle InternalErrors that arise from the job lifecycle implementation
      • We should probably just expose internal errors through the error handler and let the user do something with them (e.g. log them and then file a bug report on fila's issue tracker). I can't see a more sensible option.
    • Handle panics in job executors (ec68b69)
    • Apply timeout
    • Expose error handler
  • Create the Maintainer process tree (a Subscriber subcomponent)
    • Cleaner, to remove completed and cancelled jobs that are past a certain age
    • Rescuer, to recover jobs:
      • That are stuck in the processing state. This mostly happens when the node that was processing a job goes down while the job is executing, so the job lifecycle implementation can't finish.
      • That are stuck in the available state. This happens when a job is published while there aren't any listening subscribers, which causes the Postgres notification to get lost.
    • Scheduler (not for now; see long term)
  • Do some load tests
  • Abstract over the DB driver (this library should be compatible with both sqlx and tokio-postgres, under different feature flags). Traits (see the trait sketch after this list):
    • Executor, to execute queries
    • Transaction: Executor, to run a transaction
    • Pool
    • Listener
  • Testing stuff (under mod fila::test)
    • Test-friendly send API which also allows one to set the state
    • Create a "mock" implementation for the above database traits so that we may mock job publishing during tests without actually talking to the database
    • Create assertion helpers which also query the database state, e.g. to fetch a specific job's status.
      • We'll probably have to make fila::send return an opaque JobId so the job can be identified later.
  • Guide-level documentation
    • Do not forget to mention that the current architecture guarantees only at-least-once job execution. In the future we may introduce a TransactionalJob trait that receives the same transaction as the job lifecycle runner to ensure exactly-once semantics (see the TransactionalJob sketch after this list).
  • Long-term
    • Job scheduling
      • Cron-style scheduling
      • Replace the current default retry strategy (immediate retry) with exponential backoff (see the backoff sketch after this list)
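
Regarding the CancellationToken wrapper item in the graceful-shutdown section above, here is a minimal sketch of how fila could wrap tokio_util::sync::CancellationToken so downstream users don't need a direct tokio-util dependency. The name ShutdownSignal and its method set are assumptions for illustration, not the committed design.

```rust
use tokio_util::sync::CancellationToken;

/// Wrapper that hides the `tokio_util` dependency from downstream users.
#[derive(Clone, Default)]
pub struct ShutdownSignal {
    inner: CancellationToken,
}

impl ShutdownSignal {
    /// Built by fila from its own token; not part of the public surface.
    pub(crate) fn new(inner: CancellationToken) -> Self {
        Self { inner }
    }

    /// Completes once a graceful shutdown has been requested.
    pub async fn cancelled(&self) {
        self.inner.cancelled().await;
    }

    /// Returns `true` if shutdown has already been requested.
    pub fn is_cancelled(&self) -> bool {
        self.inner.is_cancelled()
    }
}
```

A job handler would then receive a ShutdownSignal through its context and select! on signal.cancelled() to wind down cleanly.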
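
Regarding the error-handling items above, here is a sketch of one way the external/internal split could look: the public Error exposes job failures without leaking driver types, while InternalError (with the possible "level" field) is only surfaced through the error handler. All names are illustrative; bab4625 is the authoritative change.

```rust
use std::fmt;

/// Public error type: no sqlx/tokio-postgres types leak through it.
#[derive(Debug)]
pub enum Error {
    /// The job's own ("user-provided") code failed.
    Job(Box<dyn std::error::Error + Send + Sync>),
    /// Something went wrong inside fila; reported via the error handler.
    Internal(InternalError),
}

/// Internal error type, kept opaque to users.
#[derive(Debug)]
pub struct InternalError {
    pub message: String,
    /// The "level" field discussed above, if it turns out to be useful.
    pub level: ErrorLevel,
}

#[derive(Debug, Clone, Copy)]
pub enum ErrorLevel {
    Warning,
    Fatal,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Job(e) => write!(f, "job execution failed: {e}"),
            Error::Internal(e) => write!(f, "internal error: {}", e.message),
        }
    }
}

impl std::error::Error for Error {}
```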
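
Regarding the DB-driver abstraction item above, here is a rough sketch of how the four traits could be shaped so that sqlx- and tokio-postgres-backed implementations can live behind feature flags. The signatures and the DriverError placeholder are guesses for illustration, not the final API.

```rust
use std::future::Future;

/// Placeholder error type for the sketch.
#[derive(Debug)]
pub struct DriverError;

/// Executes individual queries.
pub trait Executor {
    fn execute(&mut self, sql: &str) -> impl Future<Output = Result<u64, DriverError>> + Send;
}

/// An executor that is also a transaction.
pub trait Transaction: Executor {
    fn commit(self) -> impl Future<Output = Result<(), DriverError>> + Send;
    fn rollback(self) -> impl Future<Output = Result<(), DriverError>> + Send;
}

/// Hands out transactions (and, eventually, plain connections).
pub trait Pool {
    type Tx: Transaction;
    fn begin(&self) -> impl Future<Output = Result<Self::Tx, DriverError>> + Send;
}

/// Wraps Postgres LISTEN/NOTIFY so the subscriber can wait for new jobs.
pub trait Listener {
    fn listen(&mut self, channel: &str) -> impl Future<Output = Result<(), DriverError>> + Send;
    fn recv(&mut self) -> impl Future<Output = Result<String, DriverError>> + Send;
}
```

A mock implementation of these traits (per the testing section) could then publish and "execute" jobs entirely in memory, without talking to the database.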
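
Regarding the at-least-once note in the documentation item, this is a hypothetical shape for the future TransactionalJob idea: the job runs against the same transaction as the lifecycle bookkeeping, so its writes commit or roll back together with the state transition. Tx is a stand-in for whatever transaction handle the driver abstraction ends up exposing.

```rust
use std::future::Future;

/// Stand-in for the driver abstraction's transaction handle.
pub struct Tx;

/// A job whose side effects share the lifecycle runner's transaction,
/// giving exactly-once semantics for transactional work.
pub trait TransactionalJob {
    type Error;

    fn run<'a>(
        &'a self,
        tx: &'a mut Tx,
    ) -> impl Future<Output = Result<(), Self::Error>> + Send + 'a;
}
```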
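
Regarding the long-term retry item, here is a sketch of an exponential-backoff delay that could replace the current immediate-retry default. The base delay and cap are illustrative constants.

```rust
use std::time::Duration;

/// Delay before the `attempt`-th retry (attempt starts at 1):
/// doubles every attempt and is capped at a maximum.
pub fn backoff_delay(attempt: u32) -> Duration {
    const BASE: Duration = Duration::from_secs(1);
    const MAX: Duration = Duration::from_secs(5 * 60);

    let exp = attempt.saturating_sub(1).min(16); // keep the shift in range
    BASE.saturating_mul(1u32 << exp).min(MAX)
}
```

With these constants the delays would be 1 s, 2 s, 4 s, ..., capped at five minutes; jitter could be layered on top later.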