-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect data_interval_start due to scheduling time change #21715
Conversation
Should we call |
195d738
to
abac9fe
Compare
Thank you for your review, I made changes. Looking forward to your new comment 😄
|
airflow/timetables/interval.py
Outdated
def _prev_align(self, current: DateTime) -> DateTime: | ||
"""Get the prev scheduled time. | ||
|
||
This is ``current - interval``, unless ``current`` falls right on the | ||
interval boundary, when ``current`` is returned. | ||
""" | ||
prev_time = self._get_prev(current) | ||
if self._get_next(prev_time) != current: | ||
return prev_time | ||
return current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be always the same as self._get_prev(self._get_next(current))
?
Say a CRON schedule runs one every midnight. For this implementation:
- If
current
is at midnight, the return value is at midnight - If
current
is on 1am, the return value is at midnight
For self._get_prev(self._get_next(current))
:
- If
current
is at midnight,_get_next
returns the next midnight, and_get_prev
returns this midnight - If
current
is at 1am,_get_next
returns the next midnight, and_get_prev
returns this midnight
And for the timedelta interval, self._get_prev(self._get_next(current))
is always current
.
So if I’m not mistaken, this function is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are different ha
For self._prev_align()
- If current is at 1am, the return value is at midnight
For self._align()
- If current is at 1am, the return value is next midnight.
It is correct that data_schedule_start
is equal to the data_schedule_end
of the previous one when no scheduling time changes. However, when the scheduling time changes, The first data_schedule_start
for a new schedule may not be the same as the data_schedule_end
before the schedule was changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the breakdown. From this analysis it seems like a _prev_align
is unavoidable. However, perhaps we could rename the functions to descibe things better. Maybe we could rename _align
to _align_to_next
, and name this new function _align_to_prev
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we need to document pretty thoroughly why we need both functions.
Hi, do you plan to work on this PR? |
I'm very sorry, I forgot about it. |
6813963
to
0bcca55
Compare
This PR fixes the incorrect first run of data_interval_start after
changing the scheduling time.
New version of the timetables issue
eg:
At first I had a scheduling time of 30 * * * * *
After running it twice, I changed it to 32 * * * *
The actual first run time after the change is fine, but because the data interval starts from the end of the previous interval, it has already been determined and not refreshed, so the displayed execution_date is incorrect. This can cause problems with Cross-dag dependencies.
Correct:
My current solution is to add
start = self._get_prev(end)
belowend = self._get_next(start)
to reset data_interval_start.I'm not sure if this is appropriate 😃
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.