Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TimeFormat option for output/csv with 'unix' (default) and 'rfc3399' options #2274

Merged

Conversation

rpocklin
Copy link
Contributor

@rpocklin rpocklin commented Dec 1, 2021

Simple PR to increase time unit granularity with CSV file output to be more in-line with JSON file output.

file.csv

Before:
http_req_receiving,1638331259,...

After:
http_req_receiving,1638331421334748,...

@CLAassistant
Copy link

CLAassistant commented Dec 1, 2021

CLA assistant check
All committers have signed the CLA.

na--
na-- previously requested changes Dec 1, 2021
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good catch and it probably should have been the default. Unfortunately, now that the csv output has been released for quite a few versions already, we can't simply make a breaking change like this... 😞 It will probably disrupt the workflows of everyone that uses it.

Please add a new option to the output that controls this behavior, with 1-second Unix timestamps being the default and microsecond ones having to be explicitly enabled.

@codebien
Copy link
Contributor

codebien commented Dec 1, 2021

Please add a new option to the output that controls this behavior, with 1-second Unix timestamps being the default and microsecond ones having to be explicitly enabled.

@na-- Considering that Go has Unix{Milli | Micro | Nano} I guess we can expand the set of accepted values?

@na--
Copy link
Member

na-- commented Dec 1, 2021

milliseconds and microseconds sure, nanoseconds are probably not very useful 🤷‍♂️

Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if it could be useful to create a type Precision that should wrap this concept, Duration doesn't cover it very well and defining a new set of strings anytime could be not the best, it could be added during #883 implementation.

@yorugac
Copy link
Contributor

yorugac commented Dec 10, 2021

Slightly connected to the topic of configurable precision. Values of all builtin durations are all scaled up to milliseconds right now:

k6/stats/units.go

Lines 27 to 33 in ee8d9bd

const timeUnit = time.Millisecond
// D formats a duration for emission.
// The reverse of D() is ToD().
func D(d time.Duration) float64 {
return float64(d) / float64(timeUnit)
}

IIRC, summary relies on floats to scale down to microseconds, i.e. 0.5ms = 500µs, etc.
Similarly, custom metrics seem to assume milliseconds as values by default (when isTime = true). I guess for most cases it is fine as is but if there'll be a new option, it might make sense to extend it to values too.

@rpocklin
Copy link
Contributor Author

Hi,

From what i'm hearing it would be good to be able to specify a time granularity option (eg. --granularity msec | milli | micro | nano) which changes the timestamp granularity consistently (eg. in summary report, CSV and JSON outputs).

This is a bit beyond this current PR and my current Go expertise but would really like to see this feature if possible.

I am open to working on this if you can show me a good place for the CLI options and how to pass them into the output parsers.

@na--
Copy link
Member

na-- commented Dec 13, 2021

I'm sorry if I wasn't clear, but I don't think we should touch the global options for this change. Instead, this should be just an option for the CSV output, and only this file needs to be changed to add it: https://github.com/grafana/k6/blob/master/output/csv/config.go

@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch 3 times, most recently from 6b03bbb to 75d8c05 Compare March 10, 2022 03:54
@rpocklin rpocklin requested a review from na-- March 16, 2022 04:21
@rpocklin
Copy link
Contributor Author

rpocklin commented Mar 16, 2022

I'm sorry if I wasn't clear, but I don't think we should touch the global options for this change. Instead, this should be just an option for the CSV output, and only this file needs to be changed to add it: https://github.com/grafana/k6/blob/master/output/csv/config.go

@na-- I have updated the PR making CSV output flag useISO8601 (which is opt-in) and sets the time granularity to micro seconds (same as JSON formatter). This does not affect any metric values mentioned by @yorugac but that could be done separately as a separate PR.

I have included unit tests, also to use this feature simply run k6 with the following command-line argument:
--out csv=fileName=results.csv,useISO8601=true

The default behaviour is unchanged as requested, so it will be backwards compatible.

The CSV output will be as follows:

Before
http_reqs,1646885003,

After
http_reqs,2022-03-10T17:42:12.586544+11:00,

@mstoykov
Copy link
Contributor

My two cents: It might be worth taking into consideration that iso8601 timestamps are a lot more cpu intensive to marshal than pure integers.

I would argue that given that in most cases those files will be read by computers not humans, the performance likely should take precedence and we should just be dumping ms/ns as integers instead of anything else.

@na--
Copy link
Member

na-- commented Mar 16, 2022

👍 at the very least, these should be separate options 🤔 One can be called timeUnit or granularity or something like that, with time.Duration values, the default being 1s. Then you can have a separate timeFormat option with values like timestamp (the default), iso8601, etc.

@rpocklin
Copy link
Contributor Author

rpocklin commented Mar 16, 2022

My two cents: It might be worth taking into consideration that iso8601 timestamps are a lot more cpu intensive to marshal than pure integers.

I would argue that given that in most cases those files will be read by computers not humans, the performance likely should take precedence and we should just be dumping ms/ns as integers instead of anything else.

The original goal of this PR was to allow these result logs to be analysed under second-level granularity, this PR (when running k6 with the new flag) simply aligns the CSV time output format with JSON, which already uses ISO 8601 time format by default and micro-second time granularity :)

@rpocklin
Copy link
Contributor Author

rpocklin commented Mar 16, 2022

From what i'm hearing it would be good to be able to specify a time granularity option (eg. --granularity msec | milli | micro | nano) which changes the timestamp granularity consistently (eg. in summary report, CSV and JSON outputs).

👍 at the very least, these should be separate options 🤔 One can be called timeUnit or granularity or something like that, with time.Duration values, the default being 1s. Then you can have a separate timeFormat option with values like timestamp (the default), iso8601, etc.

The main goal of this PR is to align the output formats between CSV and JSON, in both time granularity and date format. They could be separated for more customisation and happy to do another PR for that in the future but right now i'm looking at a sensible step forward to solve our immediate issue in parsing these logs consistently between CSV and JSON file formats and visualising in graphs - second-level time granularity isn't acceptable.

At the end of the day the flag is opt-in (as you requested it to be a non-breaking change) and this makes it consistent with the JSON output format (which is what anyone using this library would expect, and a good step forward for the library).

I'd suggest any customisations for time precisions should be considered as a global option so you don't have to implement different flags across the output writers.

If you could review this again and run a test comparing CSV and JSON outputs (before vs. after with the flag) it may help you understand my point a little better.

@rpocklin rpocklin changed the title Change time unit granularity in csv output from seconds to microseconds Change time unit granularity in CSV output from seconds to microseconds and align to JSON ISO 8601 time output Mar 16, 2022
@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch from 75d8c05 to 8dfc03d Compare March 16, 2022 09:40
@na--
Copy link
Member

na-- commented Apr 1, 2022

Sorry for the delay in responding, @rpocklin. I understand and mostly agree with your desire to unify the CSV and JSON outputs, it makes a lot of sense, but I disagree with the way you propose to go about doing it.

I'd still prefer to add 2 separate options to the csv output:

  • timePrecision - 1s (current value and future default), 1ms, etc.
  • timeFormat - timestamp (current value and future default), iso8601, etc.

I am fine with merging a PR that adds both of these options, as well as merging a PR that only adds one of them. However, I don't think we should merge a PR for a single new option that conflates the two. That will be harder to maintain and reason about in the long term.

@na-- na-- removed their request for review April 5, 2022 12:07
@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch 3 times, most recently from 2685ae8 to 05301b4 Compare April 7, 2022 02:32
@rpocklin rpocklin changed the title Change time unit granularity in CSV output from seconds to microseconds and align to JSON ISO 8601 time output Add TimeFormat option for output/csv with 'unix' (default) and 'rfc3399' options Apr 7, 2022
@rpocklin rpocklin requested a review from na-- April 7, 2022 02:46
@rpocklin
Copy link
Contributor Author

rpocklin commented Apr 7, 2022

Sorry for the delay in responding, @rpocklin. I understand and mostly agree with your desire to unify the CSV and JSON outputs, it makes a lot of sense, but I disagree with the way you propose to go about doing it.

I'd still prefer to add 2 separate options to the csv output:

  • timePrecision - 1s (current value and future default), 1ms, etc.
  • timeFormat - timestamp (current value and future default), iso8601, etc.

I am fine with merging a PR that adds both of these options, as well as merging a PR that only adds one of them. However, I don't think we should merge a PR for a single new option that conflates the two. That will be harder to maintain and reason about in the long term.

Thanks @na-- I have re-worked this PR based on this approach, such that now there is a CSV output option:
timeFormat = [ 'unix' | 'rfc3399' ]

I'd appreciate feedback on the custom enum type TimeFormat in output/csv/consts.go.

Once compiled, you can run with this flag set as follows:

./k6 --out csv=fileName=results.csv,timeFormat=rfc3399 script.js

As requested, if you run it without timeFormat defined, or with it set to timeFormat=unix you will get the default (current) behaviour.

@codebien codebien self-requested a review April 11, 2022 10:19
Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rpocklin,
sorry for the late response. Thanks for your effort! 🙏

It seems you have pushed two unexpected files (go-bin-rpm-amd64.deb and 1), can you remove them, please?

output/csv/consts.go Outdated Show resolved Hide resolved
Comment on lines +34 to +19
func (timeFormat TimeFormat) IsValid() bool {
switch timeFormat {
case Unix, RFC3399:
return true
}
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this, it could be directly on the parser.

Copy link
Contributor Author

@rpocklin rpocklin May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed to be idiomatic for Go to have isValid() methods on types, and allows for a clean extension of TimeFormat types in the future.

Copy link
Contributor

@codebien codebien May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a timeFormat exists it must be valid, for that I prefer to maintain this logic on the parsing side so we can keep it centralized. @mstoykov opinion?

I think we should move this check in the output constructor so we can check the consolidated value and we can validate also values from json and env.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can make a suggestion that might help illustrate what you mean?

Copy link
Contributor

@codebien codebien May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it directly on the right line so it should be easier to explain #2274 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enums in general we use enumer as for example with

//go:generate enumer -type=CompatibilityMode -transform=snake -trimprefix CompatibilityMode -output compatibility_mode_gen.go
type CompatibilityMode uint8

Which generates https://github.com/grafana/k6/blob/1679c6843c7a5578b25aaba228d98476926467c2/lib/compatibility_mode_gen.go and is then used

if cm, err = CompatibilityModeString(val); err != nil {
var compatValues []string
for _, v := range CompatibilityModeValues() {
compatValues = append(compatValues, v.String())
}
err = fmt.Errorf(`invalid compatibility mode "%s". Use: "%s"`,
val, strings.Join(compatValues, `", "`))
}

Copy link
Contributor Author

@rpocklin rpocklin Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enums in general we use enumer as for example with

//go:generate enumer -type=CompatibilityMode -transform=snake -trimprefix CompatibilityMode -output compatibility_mode_gen.go
type CompatibilityMode uint8

Which generates https://github.com/grafana/k6/blob/1679c6843c7a5578b25aaba228d98476926467c2/lib/compatibility_mode_gen.go and is then used

if cm, err = CompatibilityModeString(val); err != nil {
var compatValues []string
for _, v := range CompatibilityModeValues() {
compatValues = append(compatValues, v.String())
}
err = fmt.Errorf(`invalid compatibility mode "%s". Use: "%s"`,
val, strings.Join(compatValues, `", "`))
}

@mstoykov I spent a long time learning enumer and how it works for Go. I ended up with the follow error:
enumer: can't handle non-integer constant type TimeFormat

From what I can understand, this would only work on integer-based enums, whereas in this case we need to preserve the string literals as they are passed in by the user (via the command line) and compared to the matching TimeFormat const.

Let me know if this is still possible with your suggestion, otherwise i'm at a loss how to implement what you are suggesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpocklin yes, as you noticed, enumer is integer-based but it shouldn't be an issue. As you can see from the code posted above, enumer generates a function for mapping the type from a string. In the linked code is if cm, err = CompatibilityModeString(val); err != nil { where in case of error it means that a value hasn't been found.

Let me know if it helps.

output/csv/output.go Outdated Show resolved Hide resolved
output/csv/output.go Outdated Show resolved Hide resolved
output/csv/output.go Outdated Show resolved Hide resolved
@na-- na-- added this to the v0.39.0 milestone Apr 26, 2022
@na-- na-- removed their request for review April 26, 2022 07:09
@na-- na-- dismissed their stale review April 26, 2022 07:12

LGTM except for the comments already stated by @codebien (especially the one about go-bin-rpm-amd64.deb!), but I'll likely be on vacation for the next review cycle so someone else needs to take over.

@na-- na-- requested a review from mstoykov April 26, 2022 07:12
@codebien
Copy link
Contributor

Hi @rpocklin, we would like to get your contribution merged during the current cycle so we will have it in the next release. Do you plan to address the request changes? We will do our best to support you as fast as we can.

@rpocklin
Copy link
Contributor Author

Hi @rpocklin, we would like to get your contribution merged during the current cycle so we will have it in the next release. Do you plan to address the request changes? We will do our best to support you as fast as we can.

Thanks for the nudge, i'll get this done in the next day or two :)

@rpocklin
Copy link
Contributor Author

Hi @rpocklin, sorry for the late response. Thanks for your effort! 🙏

It seems you have pushed two unexpected files (go-bin-rpm-amd64.deb and 1), can you remove them, please?

Yep will remove these!

@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch from 05301b4 to ab9eedf Compare May 26, 2022 12:39
@@ -97,6 +100,7 @@ func newOutput(params output.Params) (*Output, error) {
csvWriter: stdoutWriter,
row: make([]string, 3+len(resTags)+1),
saveInterval: saveInterval,
timeFormat: timeFormat,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
timeFormat: timeFormat,
timeFormat: config.TimeFormat,

It seems we can avoid allocating the timeFormat variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, nice tip will remove.

@@ -87,6 +89,7 @@ func newOutput(params output.Params) (*Output, error) {

saveInterval := config.SaveInterval.TimeDuration()
fname := config.FileName.String
timeFormat := config.TimeFormat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
timeFormat := config.TimeFormat
if config.TimeFormat != Unix && config.TimeFormat != RFC3339 {
return fmt.Errorf("the timeFormat provided is not supported")
}

Copy link
Contributor Author

@rpocklin rpocklin Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this continue to grow into a long and unmaintainable if statement as we add more time formats?

This was the reason I made an isValid() method is so that it isolates the validity test from the parser itself, and keeps the validation and type definitions within the same file output/csv/consts.go. It's also easier to unit test as-is.

@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch from ab9eedf to b85b662 Compare June 1, 2022 09:31
@codecov-commenter
Copy link

codecov-commenter commented Jun 1, 2022

Codecov Report

Merging #2274 (ab9eedf) into master (42f2e60) will increase coverage by 2.42%.
The diff coverage is 65.72%.

❗ Current head ab9eedf differs from pull request most recent head d630eff. Consider uploading reports for the commit d630eff to get more accurate results

@@            Coverage Diff             @@
##           master    #2274      +/-   ##
==========================================
+ Coverage   72.71%   75.14%   +2.42%     
==========================================
  Files         184      204      +20     
  Lines       14571    16242    +1671     
==========================================
+ Hits        10596    12205    +1609     
+ Misses       3333     3275      -58     
- Partials      642      762     +120     
Flag Coverage Δ
ubuntu 75.08% <65.72%> (+2.43%) ⬆️
windows 74.79% <65.49%> (+2.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
api/common/context.go 100.00% <ø> (ø)
api/v1/client/client.go 0.00% <0.00%> (ø)
api/v1/client/metrics.go 0.00% <0.00%> (ø)
api/v1/client/status.go 0.00% <0.00%> (ø)
api/v1/group.go 54.05% <ø> (-35.01%) ⬇️
api/v1/metric.go 54.54% <ø> (-45.46%) ⬇️
api/v1/routes.go 55.31% <ø> (ø)
api/v1/status.go 100.00% <ø> (ø)
cmd/login_cloud.go 19.76% <0.00%> (+19.76%) ⬆️
cmd/login_influxdb.go 11.66% <0.00%> (+11.66%) ⬆️
... and 249 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ece4633...d630eff. Read the comment docs.

output/csv/output.go Outdated Show resolved Hide resolved
output/csv/consts.go Outdated Show resolved Hide resolved
Comment on lines +34 to +19
func (timeFormat TimeFormat) IsValid() bool {
switch timeFormat {
case Unix, RFC3399:
return true
}
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enums in general we use enumer as for example with

//go:generate enumer -type=CompatibilityMode -transform=snake -trimprefix CompatibilityMode -output compatibility_mode_gen.go
type CompatibilityMode uint8

Which generates https://github.com/grafana/k6/blob/1679c6843c7a5578b25aaba228d98476926467c2/lib/compatibility_mode_gen.go and is then used

if cm, err = CompatibilityModeString(val); err != nil {
var compatValues []string
for _, v := range CompatibilityModeValues() {
compatValues = append(compatValues, v.String())
}
err = fmt.Errorf(`invalid compatibility mode "%s". Use: "%s"`,
val, strings.Join(compatValues, `", "`))
}

@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch from b85b662 to 2fc1f3f Compare June 6, 2022 09:22
@rpocklin rpocklin force-pushed the improve-time-unit-granularity-for-csv-output branch from 2fc1f3f to d630eff Compare June 6, 2022 09:27
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks for working on this @rpocklin 🙇

@mstoykov mstoykov merged commit 27f2ead into grafana:master Jun 8, 2022
@mstoykov
Copy link
Contributor

Just a note here that I did some fixes in #2560 and noticed that we haven't noticed that the rfc is with three 3 not with two and two 9.

This has been fixed, but if you are using it locally @rpocklin, just a heads-up that the option is also named correctly now.

Thanks again for the working on this 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants