-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid ISO 8601 in example (and it probably shouldn't parse) #38
Comments
Thanks for reporting! I agree - we should disallow these or split this style off as a "best guess"-style parser. |
Just to make sure I'm understanding correctly. According to this spec, the parser should treat times like this:
Is that your interpretation as well? |
My understanding of the spec has one difference relative to what you wrote:
The time zone in that is not basic, as I understand it. The basic version would not include a colon in the time zone:
|
Sounds good! I'll take a look and see how much effort this would take to enforce. If it's relatively straightforward I'll try to knock it out today. |
I think that it would require making two versions of the ISO 8601 regexp here, one for basic and one for extended. That would then necessitate either giving an option to choose basic or extended to When I've tried to put it all into one regexp, the regexp was big and effectively unusable: tidyverse/lubridate#867 (comment) And, I don't think that it's worth going down the path of a parser. (Having written one for ISO 8601, it is likely more complex than the value here.) Lines 5 to 35 in fad29b7
|
I think the easiest way would just be to permit the invalid forms in the regex, but enforce the restrictions once the regex capture groups have been converted into a matrix: There are more cases to consider, but just to communicate the simplest form of the idea, it would happen in post-processing like: m[is.na(m[,"dash"]) != is.na(m[,"colon"]),] <- NA |
I think that logic allow dates to look like these (which are also invalid): |
I'll have to test, but I don't think so. The backreferences to the earlier To handle it all in regex, it looks like it could use a conditional capture group. I've never used this regex behavior before, so I'm inclined to go with the R-side approach to not make the regex grow into a completely unintelligible mess with tricky-to-debug performance implications. |
Overall, as long as we have an accurate, maintainable solution, I'm good with it. I haven't used backreferences like that before (and I've minimally used backreferences overall), so as long as it works, great! |
There are several examples in the code where the character string
"20150101T08:35:32.123+05:30"
is used as an example of condensed ISO 8601. According to Wikipedia, this is not valid ISO 8601 because both the date and time parts must use the extended format (https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).Locations where I found it are:
parttime/README.Rmd
Line 153 in 28a9496
parttime/R/class_partial_time_coercion.R
Line 99 in 28a9496
parttime/tests/testthat/setup_test_data.R
Line 16 in 74282db
I think that it would be useful if the default regexp only allowed for one or the other of basic or extended forms. (In reality and for simplicity, that would be better as two regexps.)
The text was updated successfully, but these errors were encountered: