Logparser common log format error (nginx/apache) #1810

Closed
tympanix opened this issue Sep 24, 2016 · 4 comments · Fixed by #1864

tympanix commented Sep 24, 2016

Bug report

The logparser plugin fails to parse nginx access log entries for HTTP basic auth requests when the username contains digits or spaces.

This applies to both the COMMON_LOG_FORMAT and COMBINED_LOG_FORMAT grok patterns. The issue may affect Apache logs as well.

Relevant telegraf.conf:

# Stream and parse log file(s).
[[inputs.logparser]]
  files = ["/var/log/nginx/access.log"]
  from_beginning = false

  [inputs.logparser.grok]
    patterns = ["%{COMMON_LOG_FORMAT}"]
    measurement = "nginx_access_log"

System info:

Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u2 (2016-01-02) x86_64 GNU/Linux
Telegraf - version 1.0.0
nginx version: nginx/1.6.2

Steps to reproduce:

  1. Set up the telegraf.conf file as above
  2. Append the example lines to the log file (see Additional info)
  3. Telegraf will not match the grok pattern against the new lines

Expected behavior:

Telegraf matches each log line using either COMMON_LOG_FORMAT or COMBINED_LOG_FORMAT and passes the parsed entry on to the outputs.

Actual behavior:

When the username contains digits, the log line is ignored. When it contains spaces, the words are parsed as other attributes (e.g. client_ip is set to one of the words).

Additional info:

Here are some example log lines that cause the error:

Using digits in the HTTP basic auth username:

127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"

Using spaces in the HTTP basic auth username:

127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
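
Both failure modes can be reproduced outside Telegraf with a plain regular expression. The sketch below is only an illustration: the two character classes and the `common_log_regex` helper are hypothetical stand-ins, not copies of Telegraf's grok patterns, and the log lines are shortened versions of the examples above. It assumes the username class lacks digits and that the pattern is searched for anywhere in the line rather than anchored at the start.

```python
import re

# Hypothetical username character classes, for illustration only.
LETTERS_ONLY = r"[a-zA-Z.@\-+_%]+"
WITH_DIGITS = r"[a-zA-Z0-9.@\-+_%]+"


def common_log_regex(user_class):
    """Build an unanchored regex: client, ident, auth, [ts], "request", code, bytes."""
    return re.compile(
        r'(?P<client_ip>\S+) (?P<ident>' + user_class + r') (?P<auth>' + user_class + r') '
        r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<resp_code>\d{3}) (?P<resp_bytes>\d+|-)'
    )


DIGITS_LINE = '127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590'
SPACES_LINE = '127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590'

strict = common_log_regex(LETTERS_ONLY)
relaxed = common_log_regex(WITH_DIGITS)

# Digits in the username: the letters-only class finds no match at all.
print(strict.search(DIGITS_LINE))

# Allowing digits fixes that case.
print(relaxed.search(DIGITS_LINE).group("auth"))

# Spaces in the username: the unanchored search slides right and the
# fields shift, so client_ip ends up holding one of the username words.
print(relaxed.search(SPACES_LINE).groupdict())
```

Anchoring the expression at the start of the line would turn the shifted parse into a clean non-match, but it would not recover the username.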

sparrc added the "bug" label (unexpected problem or unintended behavior) and the 1.1.0 milestone on Sep 26, 2016.

andrecrt commented Oct 6, 2016

@tympanix thanks for finding the bug! I was going nuts trying to understand why some requests weren't appearing in Influx. With that knowledge, I updated my logparser config to skip both ident and auth (see the first two %{DATA}), and now all requests seem to be logged properly!

[[inputs.logparser]]
  ## files to tail.
  files = ["/var/log/nginx/access.log"]
  ## Read file from beginning.
  from_beginning = true
  ## Override the default measurement name, which would be "logparser_grok"
  name_override = "nginx_access_log"
  ## For parsing logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{CUSTOM_LOG}"]
    custom_patterns = '''
      CUSTOM_LOG %{CLIENT:client_ip} %{DATA} %{DATA} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-)
    '''
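
To see why skipping the two fields works, here is a rough Python stand-in for the CUSTOM_LOG pattern above. It assumes `%{DATA}` behaves like a lazy `.*?`, and it approximates `%{CLIENT}`, `%{HTTPDATE}` and `%{NUMBER}` with simpler expressions, so treat it as a sketch rather than an exact translation. The `parse` helper and the shortened log lines are made up for the example.

```python
import re

# Approximation of CUSTOM_LOG: the two lazy ".*?" groups stand in for the
# %{DATA} fields that swallow ident and auth without capturing them.
SKIP_IDENT_AUTH = re.compile(
    r'(?P<client_ip>\S+) .*? .*? \[(?P<ts>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<http_version>[\d.]+))?" '
    r'(?P<resp_code>\d+) (?P<resp_bytes>\d+|-)'
)


def parse(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = SKIP_IDENT_AUTH.match(line)
    return m.groupdict() if m else None


DIGITS_LINE = '127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590'
SPACES_LINE = '127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590'

for line in (DIGITS_LINE, SPACES_LINE):
    print(parse(line))
```

The trade-off is that the username is no longer available as a field, since neither skipped group is captured.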

tympanix (author) commented Oct 6, 2016

Great solution. I've done something similar by adding digits (0-9) and spaces to the NGUSER pattern to work around this issue. There is a potential problem when both ident and auth contain multiple words, though: you wouldn't be able to tell them apart. I've never seen this in practice.
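
The ambiguity is easy to show: once spaces are allowed in both fields, every space in the span between the client address and the timestamp is a possible boundary. A small sketch, where `possible_splits` is a hypothetical helper written for this illustration:

```python
def possible_splits(middle):
    """Return every (ident, auth) pair that a space-separated span could encode."""
    words = middle.split(" ")
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(1, len(words))
    ]


# The span between the client address and the timestamp in the example line.
for ident, auth in possible_splits("- my username here"):
    print(repr(ident), repr(auth))
```

Nothing in the line itself says which of these pairs is the right one.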

sparrc (contributor) commented Oct 7, 2016

@tympanix are HTTP ident and auth allowed to have spaces in them? If so, I'm not sure there's anything we can do. I will definitely fix the case of numbers in the ident & auth for release 1.1.

tympanix (author) commented Oct 7, 2016

I have tested this on my own nginx server, and the HTTP basic auth module does not seem to complain about spaces. The following is logged when using "my username here" as the username:

127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"

I would point out that this is an edge case. Also, the password (which I assume is the first dash) is never shown, regardless of the input. If that is always the case, we should be able to parse the log unambiguously. I don't know whether this applies to Apache as well.
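
If the field before the username really is always a literal dash, the username can be captured even when it contains spaces. The sketch below relies on exactly that unverified assumption. The regex is a hypothetical illustration, not the fix that went into Telegraf, and it would misparse any line where that field holds something other than "-".

```python
import re

# Assumes the field before the username is always "-" (unverified).
DASH_THEN_USER = re.compile(
    r'^(?P<client_ip>\S+) - (?P<auth>.+?) \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<resp_code>\d{3}) (?P<resp_bytes>\d+|-)'
)

DIGITS_LINE = '127.0.0.1 - username123 [25/Sep/2016:00:19:43 +0200] "GET / HTTP/1.1" 401 590'
SPACES_LINE = '127.0.0.1 - my username here [25/Sep/2016:00:17:36 +0200] "GET / HTTP/1.1" 401 590'

for line in (DIGITS_LINE, SPACES_LINE):
    m = DASH_THEN_USER.match(line)
    print(m.group("client_ip"), "|", m.group("auth"))
```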

Thank you for the commit 👍
