Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonization DateTime incorrect handling of naive time strings #2278

Closed
gethvi opened this issue Nov 22, 2022 · 2 comments · Fixed by #2279
Closed

Harmonization DateTime incorrect handling of naive time strings #2278

gethvi opened this issue Nov 22, 2022 · 2 comments · Fixed by #2279

Comments

@gethvi
Copy link
Contributor

gethvi commented Nov 22, 2022

Considering this function of DateTime class:

@staticmethod
def __parse(value: str) -> Optional[str]:
try:
DateTime.parse_utc_isoformat(value)
except ValueError:
pass
else:
return utils.decode(value)
try:
value = dateutil.parser.parse(value, fuzzy=True)
value = value.astimezone(pytz.utc)
value = value.isoformat()
except (ValueError, OverflowError):
return None
return utils.decode(value)

In particular the line:

value = value.astimezone(pytz.utc) 

This causes that naive time strings (without timezone information) are being considered as being in a timezone of the local computer. My computer is set to a UTC +01:00 timezone and this is the result:

DateTime.sanitize("2021-01-17 07:44:46")    # this calls DateTime.__parse

'2021-01-17T06:44:46+00:00'

Because the parsed timestamps are usually produced by feeds outside of the local computer I believe it makes much more sense to assume they are produced in UTC rather than processing them based on whatever is the local timezone of the IntelMQ installation.


On a side note: There are several cases of this kind of code:

if not value.tzinfo and sys.version_info <= (3, 6):
    value = pytz.utc.localize(value)
elif not value.tzinfo:
    value = value.astimezone(pytz.utc)

Such expressions are not equal in what they do. The first one (pytz.utc.localize) merely attaches a time zone object to a datetime without adjustment of date and time data. The second one (astimezone(pytz.utc)) adjusts time and date based on local time (docs). Example (with the computer in UTC +01:00):

value = datetime.datetime.strptime("2021-01-17 07:44:46", "%Y-%m-%d %H:%M:%S")
value.isoformat()
'2021-01-17T07:44:46'

pytz.utc.localize(value).isoformat()
'2021-01-17T07:44:46+00:00'

value.astimezone(pytz.utc).isoformat()
'2021-01-17T06:44:46+00:00'

I think this is also the reason why the following test works only when the machine running the test is in UTC timezone:

@unittest.skipIf(time.timezone != 0, 'Test only works with UTC')
def test_datetime_convert(self):
self.assertEqual('2019-07-01T15:15:15+00:00',
harmonization.DateTime.convert('15 15 15 07 01 2019',
format='from_format|%M %H %S %m %d %Y'))
self.assertEqual('2019-07-01T00:00:00+00:00',
harmonization.DateTime.convert('07-01-2019',
'from_format_midnight|%m-%d-%Y'))
self.assertEqual('2011-02-01T02:43:11.572760+00:00',
harmonization.DateTime.convert(129410017915727600,
'windows_nt'))

PROPOSED SOLUTION:
I think the better way of handling of naive time strings is to assume they are in UTC rather than local timezone of IntelMQ installation:

if not value.tzinfo:
    value = value.replace(tzinfo=datetime.timezone.utc)
else:
    value = value.astimezone(tz=datetime.timezone.utc)

The function value.replace(tzinfo=datetime.timezone.utc) does effectively the same thing as pytz.utc.localize(value): it attaches a time zone object to a datetime without adjustment of date and time data. And without needing pytz as dependency.

Quick tests showed that my solution seems to work and passes all the tests including the one previously skipped.

@gethvi
Copy link
Contributor Author

gethvi commented Nov 22, 2022

Also if we are phasing out Python 3.6 (#2272) some additional functions can be dropped. Such as DateTime.parse_utc_isoformat can be replaced by built in datetime.datetime.fromisoformat (introduced in 3.7).

gethvi added a commit to gethvi/intelmq that referenced this issue Nov 22, 2022
gethvi added a commit to gethvi/intelmq that referenced this issue Nov 22, 2022
gethvi added a commit to gethvi/intelmq that referenced this issue Nov 22, 2022
@kamil-certat
Copy link
Contributor

I cannot be sure, but I think it was the original intention - to assume the UTC. But as you mentioned, the astimezone isn't the right approach to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants