Use sentinel to mark dag as removed on reserialization #39825

dstandish
Contributor

@dstandish dstandish commented May 24, 2024

When we pass a task over an RPC call, we can't include the dag as part of the task without running into recursion trouble. So we "elide" the dag from the task object, require that it be passed separately, and then reattach it to the task after deserialization. We introduce a sentinel so that we can raise a helpful error if someone tries to access the dag attr after it has been elided. The sentinel also lets us short-circuit some logic in the task.dag setter that is not helpful when re-setting the dag.
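To make the idea concrete, here is a minimal sketch of the sentinel pattern described above. It is not the code from this PR: `REMOVED`, `_Removed`, `Dag`, `Task`, and the `registered` list are hypothetical stand-ins, and the "extra setter logic" is reduced to a single registration step purely for illustration.

```python
from __future__ import annotations


class _Removed:
    """Hypothetical sentinel type: marks an attribute dropped before serialization."""

    def __repr__(self) -> str:
        return "<REMOVED>"


# Single shared instance, compared by identity (``is``), never by equality.
REMOVED = _Removed()


class Dag:
    """Stand-in for a DAG; it only records which tasks registered with it."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.registered: list[str] = []


class Task:
    """Stand-in for an operator with a ``dag`` property."""

    def __init__(self, task_id: str, dag: Dag | None = None) -> None:
        self.task_id = task_id
        self._dag: Dag | _Removed | None = None
        if dag is not None:
            self.dag = dag

    @property
    def dag(self) -> Dag | None:
        if self._dag is REMOVED:
            # Helpful error instead of a confusing None or AttributeError.
            raise RuntimeError(
                f"dag was removed from task {self.task_id!r} during "
                "serialization; reattach it before accessing task.dag"
            )
        return self._dag

    @dag.setter
    def dag(self, value: Dag) -> None:
        if self._dag is REMOVED:
            # Reattaching after deserialization: skip the registration
            # side effects, since the dag already knows about this task.
            self._dag = value
            return
        # Normal path: extra logic that only makes sense on first assignment.
        value.registered.append(self.task_id)
        self._dag = value
```

With this shape, the receiving side of the RPC call can do a plain `task.dag = dag` for any task type, and code that forgets to reattach gets an explicit error rather than silently working with a missing dag.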

The motivator for this PR is avoiding conditional code like this:

    # when taking task over RPC, we need to add the dag back
    if isinstance(task, MappedOperator):
        if not task.dag:
            task.dag = dag
    elif not task._dag:
        task._dag = dag

The idea for this approach came from a discussion with @uranusjr.

The addition of the get_relevant_upstream_map_indexes method to the pydantic TI model is just a drive-by change, and I can split it out if necessary.

@dstandish
Contributor Author

@uranusjr here's the task.dag sentinel PR

@dstandish dstandish requested a review from jscheffl May 24, 2024 21:23
@dstandish dstandish marked this pull request as draft May 24, 2024 21:24
@dstandish dstandish force-pushed the use-sentinel-to-elide-the-dag-object-on-reserialization branch from 27f2623 to 4a1152c Compare May 24, 2024 21:25
@dstandish
Contributor Author

darn, realized that this needs to be merged first: #39259

(i can't cherry pick the sentinel bit into independent PR just yet)

@dstandish dstandish force-pushed the use-sentinel-to-elide-the-dag-object-on-reserialization branch from 4a1152c to e3518a4 Compare May 30, 2024 18:53
@dstandish dstandish force-pushed the use-sentinel-to-elide-the-dag-object-on-reserialization branch from 67fece5 to 1284974 Compare June 28, 2024 16:19
@dstandish dstandish marked this pull request as ready for review June 28, 2024 17:00
Contributor

@jscheffl jscheffl left a comment

The general approach looks good to me, but one pytest is still failing.

We don't serialize the dag on the task.dag attr when making RPC calls. By marking it with a sentinel value, we can tell when we're dealing with a deserialized object, and then re-set the dag attr while skipping some of the extra code applied in the setter.
@dstandish dstandish force-pushed the use-sentinel-to-elide-the-dag-object-on-reserialization branch from 70a5331 to 4aa024e Compare July 8, 2024 19:38
@dstandish
Contributor Author

ok pushed some test fixes let's see 🤞

@dstandish
Contributor Author

green @jscheffl

@jscheffl
Contributor

jscheffl commented Jul 9, 2024

This all looks reasonable and green to me. After the term change (I dislike "Elided" as well :-D) I would rate this LGTM.

@dstandish dstandish changed the title Use sentinel to elide the dag object on reserialization Use sentinel to mark dag as removed on reserialization Jul 10, 2024
@dstandish
Contributor Author

@potiuk let us know if this gets your blessing, when you have a moment.

Member

@potiuk potiuk left a comment

Perfect. May it be merged.

@dstandish dstandish merged commit 7d7a4cd into apache:main Jul 10, 2024
51 of 52 checks passed
@dstandish dstandish deleted the use-sentinel-to-elide-the-dag-object-on-reserialization branch July 10, 2024 22:23
@utkarsharma2 utkarsharma2 added this to the Airflow 2.10.0 milestone Jul 12, 2024
@utkarsharma2 utkarsharma2 added the type:new-feature Changelog: New Features label Jul 12, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
4 participants