Feature/2108 - csv parser #4439

Merged
merged 27 commits into master from feature/2108 on Aug 24, 2018

Conversation

@maxunt (Contributor) commented on Jul 19, 2018

closes: #2108

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

if p.Delimiter != "" {
	runeStr := []rune(p.Delimiter)
	if len(runeStr) > 1 {
		return nil, fmt.Errorf("delimiter must be a single character, got: %v", p.Delimiter)

Contributor:

Pedantic, but can you use the non-default verb to print (%s in this case)?
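
For reference, the suggested form is the same line with %s in place of %v:

return nil, fmt.Errorf("delimiter must be a single character, got: %s", p.Delimiter)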

}

for _, fieldName := range p.FieldColumns {
	if recordFields[fieldName] == "" {

Contributor:

Consider using value, ok := recordFields[fieldName] and then checking if !ok and returning; line 115 becomes unnecessary.
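
A sketch of the comma-ok form (assuming fields is the metric's field map being built):

for _, fieldName := range p.FieldColumns {
	value, ok := recordFields[fieldName]
	if !ok {
		return nil, fmt.Errorf("could not find field: %s", fieldName)
	}
	fields[fieldName] = value
}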

return metrics, nil
}

//does not use any information in header and assumes DataColumns is set

Contributor:

If a comment is directly above an exported function, start it with // FunctionName ...
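
For example, applied to the comment quoted above (sketch only):

// ParseLine parses a single line of CSV data; it does not use any
// information in the header and assumes DataColumns is set.
func (p *CSVParser) ParseLine(line string) (telegraf.Metric, error) {
	// ...
}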

	nameColumn string,
	timestampColumn string,
	timestampFormat string,
	defaultTags map[string]string) (Parser, error) {

Contributor:

Returning an error isn't useful here. Since it's an unexported function not matching any interface, I'd remove the error return, or remove the function altogether and just instantiate a CSVParser on line 154.
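
A sketch of the direct instantiation (field names are assumptions based on the constructor parameters above):

parser := &CSVParser{
	NameColumn:      nameColumn,
	TimestampColumn: timestampColumn,
	TimestampFormat: timestampFormat,
	DefaultTags:     defaultTags,
}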

"github.com/influxdata/telegraf/metric"
)

type CSVParser struct {

Contributor:

From https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#parsing:

I think you need to allow comments, quote characters, column skipping, row skipping, header row count, and trimming.

Contributor:

It doesn't look like the quote character can be customized when using the Go csv parser, and we would want to stick to this implementation.

The other options sound great, but I don't think they are must have and we could add them later depending on available time.

Contributor (author):

I can definitely add features to allow comments and trimming. I know quote characters aren't supported by the Go csv parser. Regarding column skipping, there is already functionality for that: simply don't add the column name to either csv_tag_columns or csv_field_columns. The header row count raises a few issues about how a header with more than one line would be interpreted, unless we decide to skip it entirely; it would most likely not mesh well with the function that extracts column names from the header. We would have to decide how to configure that; I think we could probably pair it with the row-skipping configuration.

Contributor:

Here is some clarification on how these options should work:

The csv_skip_rows option is an integer that controls the number of lines at the beginning of the file that should be skipped over. I would think you would want a bufio.Reader so you can call ReadLine() this many times before passing this reader into csv.Reader.

The csv_skip_columns option is an integer that controls the number of columns, from the left, to skip over.

Finally, csv_header_row_count would replace csv_header; it would be an integer giving the number of rows to treat as the column names, with the values concatenated for each column. This is applied after csv_skip_rows. Here is an example:

foo,bar
1,2,3

This would produce the column names: ["foo1", "bar1", "3"]. Make sure to allow for lines of differing length.
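
A sketch of how the two options could compose, assuming a CSVParser struct with integer SkipRows and HeaderRowCount fields (ReadString is used here instead of ReadLine for simplicity):

import (
	"bufio"
	"bytes"
	"encoding/csv"
)

// readColumnNames consumes p.SkipRows lines, then concatenates the next
// p.HeaderRowCount rows into column names.
func (p *CSVParser) readColumnNames(buf []byte) ([]string, *csv.Reader, error) {
	br := bufio.NewReader(bytes.NewReader(buf))
	for i := 0; i < p.SkipRows; i++ {
		if _, err := br.ReadString('\n'); err != nil {
			return nil, nil, err
		}
	}
	csvReader := csv.NewReader(br)
	csvReader.FieldsPerRecord = -1 // tolerate rows of differing length
	var names []string
	for i := 0; i < p.HeaderRowCount; i++ {
		row, err := csvReader.Read()
		if err != nil {
			return nil, nil, err
		}
		for j, v := range row {
			if j < len(names) {
				names[j] += v
			} else {
				names = append(names, v)
			}
		}
	}
	return names, csvReader, nil
}

On the example above, this yields the column names ["foo1", "bar1", "3"].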


//does not use any information in header and assumes DataColumns is set
func (p *CSVParser) ParseLine(line string) (telegraf.Metric, error) {
	r := bytes.NewReader([]byte(line))

Contributor:

ParseLine does not do the same validation as Parse; I don't see delimiter set, for example.

Perhaps extract building the csv reader into a new function that both Parse and ParseLine use.
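
A sketch of such a helper (p.Comment is an assumed field; Comma and Comment on csv.Reader are runes):

func (p *CSVParser) compileReader(r io.Reader) *csv.Reader {
	csvReader := csv.NewReader(r)
	csvReader.FieldsPerRecord = -1
	if p.Delimiter != "" {
		csvReader.Comma = []rune(p.Delimiter)[0]
	}
	if p.Comment != "" {
		csvReader.Comment = []rune(p.Comment)[0]
	}
	return csvReader
}

Both Parse and ParseLine would then call this, so the reader setup lives in one place.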

@danielnelson (Contributor):

@maxunt Don't forget to rebase/merge so that the unrelated grok documentation is no longer present.

## By default, this is the name of the plugin
## the `name_override` config overrides this
# csv_name_column = ""

Contributor:

Call this csv_measurement_column

## Columns listed here will be added as fields
## the field type is inferred from the value of the field
csv_field_columns = []

Contributor:

I think we should add all non-tag columns as fields. If someone wants to skip a field they can use fieldpass/fielddrop

## as there are columns of data
## If `csv_header` is set to false, this config must be used
csv_data_columns = []

Contributor:

Call this csv_column_names

	val, _ := strconv.ParseBool(str.Value)
	c.CSVTrimSpace = val
} else {
	//for config with quotes

Contributor:

No need to have these else clauses; if it's not a bool then it should be an error. This is actually a bug throughout this function: when the type is wrong for the field name, it looks like we currently delete the field, when we should return an error and refuse to start Telegraf.
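
In other words, roughly (a sketch assuming the toml ast types Telegraf's config loader already uses):

if str, ok := node.(*ast.Boolean); ok {
	val, err := strconv.ParseBool(str.Value)
	if err != nil {
		return nil, err
	}
	c.CSVTrimSpace = val
} else {
	// wrong type in config: fail instead of silently dropping the option
	return nil, fmt.Errorf("csv_trim_space must be a boolean")
}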

"github.com/influxdata/telegraf/metric"
)

type CSVParser struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some clarification on how these options should work:

The csv_skip_rows option is an integer that controls the number of lines at the beginning of the file that should be skipped over. I would think you would want a bufio.Reader so you can call ReadLine() this many times before passing this reader into csv.Reader.

The csv_skip_columns option is an integer that controls the number of columns, from the left, to skip over.

Finally csv_header_row_count would replace csv_header, it would be an integer that is the number of rows to treat as the column names, the values would be concatenated for each column. This is applied after you csv_skip_rows, here is an example:

foo,bar
1,2,3

This would produce the column names: ["foo1", "bar1", "3"]. Make sure to allow for lines of differing length.

require.NoError(t, err2)

//deep equal fields
require.True(t, reflect.DeepEqual(goodMetric.Fields(), returnedMetric.Fields()))

Contributor:

require.Equal(t, goodMetric.Fields(), returnedMetric.Fields())

Call these expected/actual, or want/got.


metrics, err := p.Parse([]byte(testCSV))
require.NoError(t, err)
require.Equal(t, true, reflect.DeepEqual(expectedFields, metrics[0].Fields()))

Contributor:

require.Equal(t, expectedFields, metrics[0].Fields())

Check other tests and make sure you are using this everywhere.

metrics, err := p.Parse([]byte(testCSV))
for k := range metrics[0].Fields() {
log.Printf("want: %v, %T", expectedFields[k], expectedFields[k])
log.Printf("got: %v, %T", metrics[0].Fields()[k], metrics[0].Fields()[k])

Contributor:

Boo! No logging in tests.

}

func (p *CSVParser) parseRecord(record []string) (telegraf.Metric, error) {
	recordFields := make(map[string]string)

Contributor:

I think you won't need this intermediate map if you make fields implicit.
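
That is, build the field map directly as the record is walked (a sketch; convert is a hypothetical string-to-typed-value helper):

fields := make(map[string]interface{})
for i, name := range p.ColumnNames {
	if i >= len(record) {
		break
	}
	fields[name] = convert(record[i])
}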

// attempt type conversions
if iValue, err := strconv.Atoi(value); err == nil {
	fields[fieldName] = iValue
} else if fValue, err := strconv.ParseFloat(value, 64); err == nil {

Contributor:

This will require all floats to have a decimal part to avoid type mismatch errors. @goller Is this going to work for us in the future?
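
For context, the full chain under discussion would look roughly like this (the bool and string fallbacks are assumptions):

if iValue, err := strconv.Atoi(value); err == nil {
	fields[fieldName] = iValue
} else if fValue, err := strconv.ParseFloat(value, 64); err == nil {
	fields[fieldName] = fValue
} else if bValue, err := strconv.ParseBool(value); err == nil {
	fields[fieldName] = bValue
} else {
	fields[fieldName] = value // nothing parsed; keep the raw string
}

With this ordering "3" becomes an int while "3.0" becomes a float, which is the type-mismatch concern above.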

@@ -2,6 +2,17 @@

Telegraf is able to parse the following input data formats into metrics:

<<<<<<< HEAD

Contributor:

This file needs to be fixed up due to merge issues. I just merged your updates to the JSON parser, so you may need to update again too.

Contributor (author):

I'm not sure I correctly followed your new format for the INPUT_DATA_FORMATS file when I resolved the merge conflict; could you take a look at that? I think the CSV section is missing the proper link.

//concatenate header names
for i := range header {
	name := header[i]
	name = strings.Trim(name, " ")

Contributor:

I think we may not want to trim the strings.

csvReader := csv.NewReader(r)
// ensures that the reader reads records of different lengths without an error
csvReader.FieldsPerRecord = -1
if p.Delimiter != "" {

Contributor:

Let's compute the effective delimiter and comment values as part of a New() function; then when Parse is called we can create the reader without needing to redo this work.
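
A sketch of such a New() (p.Comment is an assumed field; error messages illustrative):

func New(p CSVParser) (*CSVParser, error) {
	if p.Delimiter != "" && len([]rune(p.Delimiter)) > 1 {
		return nil, fmt.Errorf("delimiter must be a single character, got: %s", p.Delimiter)
	}
	if p.Comment != "" && len([]rune(p.Comment)) > 1 {
		return nil, fmt.Errorf("comment must be a single character, got: %s", p.Comment)
	}
	if p.HeaderRowCount == 0 && len(p.ColumnNames) == 0 {
		return nil, fmt.Errorf("there must be a header if `csv_column_names` is not specified")
	}
	return &p, nil
}

Parse and ParseLine can then treat these checks as struct invariants, as the comments below suggest.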

}
} else if p.HeaderRowCount == 0 && len(p.ColumnNames) == 0 {
	// if there is no header and no DataColumns, that's an error
	return nil, fmt.Errorf("there must be a header if `csv_data_columns` is not specified")

Contributor:

This test can also go in the New function.


// if there is nothing in DataColumns, ParseLine will fail
if len(p.ColumnNames) == 0 {
return nil, fmt.Errorf("[parsers.csv] data columns must be specified")

Contributor:

Can go in New function

for i, fieldName := range p.ColumnNames {
	if i < len(record) {
		value := record[i]
		value = strings.Trim(value, " ")

Contributor:

Not sure if we want to trim

// will default to plugin name
measurementName := p.MetricName
if recordFields[p.MeasurementColumn] != nil {
	measurementName = recordFields[p.MeasurementColumn].(string)

Contributor:

This could panic if the column is not a string, perhaps we should pull the value from record?
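
A panic-safe sketch using the comma-ok form of the type assertion:

measurementName := p.MetricName
if v, ok := recordFields[p.MeasurementColumn]; ok {
	if s, ok := v.(string); ok {
		measurementName = s
	}
}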

if recordFields[tagName] == nil {
return nil, fmt.Errorf("could not find field: %v", tagName)
}
tags[tagName] = recordFields[tagName].(string)

Contributor:

Will panic if not a string.

if recordFields[p.TimestampColumn] == nil {
return nil, fmt.Errorf("timestamp column: %v could not be found", p.TimestampColumn)
}
tStr := recordFields[p.TimestampColumn].(string)

Contributor:

Could panic; maybe use the two-return-value form on the map:

timeColumn, ok := recordFields[p.TimestampColumn]
if ok {
	if timeString, ok := timeColumn.(string); ok {
		// use timeString
	}
}

Might be easier to deal with errors if you put this into a function.
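
A sketch of such a helper (parseTimestamp is a hypothetical name; uses time.Parse):

func parseTimestamp(recordFields map[string]interface{}, column, format string) (time.Time, error) {
	v, ok := recordFields[column]
	if !ok {
		return time.Time{}, fmt.Errorf("timestamp column: %s could not be found", column)
	}
	s, ok := v.(string)
	if !ok {
		return time.Time{}, fmt.Errorf("timestamp column: %s is not a string", column)
	}
	return time.Parse(format, s)
}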

return nil, fmt.Errorf("timestamp column: %v could not be found", p.TimestampColumn)
}
tStr := recordFields[p.TimestampColumn].(string)
if p.TimestampFormat == "" {

Contributor:

You can verify this in the New function, and then treat this as a struct invariant.

@danielnelson danielnelson added this to the 1.8.0 milestone Aug 24, 2018
@danielnelson danielnelson merged commit 889745a into master Aug 24, 2018
@danielnelson danielnelson deleted the feature/2108 branch August 24, 2018 23:40
rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019

Successfully merging this pull request may close these issues.

CSV data input format (for use with exec, tail)