Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple SIGHUPs in short succession causes a panic #5879

Closed
Tommoa opened this issue May 20, 2019 · 2 comments
Closed

Multiple SIGHUPs in short succession causes a panic #5879

Tommoa opened this issue May 20, 2019 · 2 comments
Assignees
Labels
area/agent bug unexpected problem or unintended behavior
Milestone

Comments

@Tommoa
Copy link

Tommoa commented May 20, 2019

Relevant telegraf.conf:

Default configuration from ./telegraf config --output-filter file as well as

# Telegraf Configuration
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

[[outputs.file]]
   files = ["stdout"]
   data_format = "influx"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]

System info:

Linux kernel: 5.0.11-arch1-1-ARCH
Telegraf version: Telegraf unknown (git: master e6dd8536)

Steps to reproduce:

  1. Start telegraf
  2. Run while [ true ]; do kill -s SIGHUP $(pidof telegraf); done

Expected behavior:

Configuration reloads on a loop.

Actual behavior:

E! [telegraf] Error running agent: context canceled

Additional info:

This bug actually doesn't occur if the config being used is small enough; I can reproduce this using the default generated config however if I remove all of the commented lines from that config (as per the configuration above), then the error never occurs.

@danielnelson danielnelson added area/agent bug unexpected problem or unintended behavior labels May 20, 2019
@danielnelson danielnelson self-assigned this May 20, 2019
@sgtsquiggs
Copy link
Contributor

sgtsquiggs commented May 30, 2019

Not too sure about the unintended side effects, but this can be mitigated in cmd/telegraf/telegraf.go:

diff --git a/cmd/telegraf/telegraf.go b/cmd/telegraf/telegraf.go
index a3fae740..7fead672 100644
--- a/cmd/telegraf/telegraf.go
+++ b/cmd/telegraf/telegraf.go
@@ -100,7 +100,9 @@ func reloadLoop(
                }()
 
                err := runAgent(ctx, inputFilters, outputFilters)
-               if err != nil {
+               if err == context.Canceled {
+                       log.Printf("E! [telegraf] Error running agent: %v", err)
+               } else if err != nil {
                        log.Fatalf("E! [telegraf] Error running agent: %v", err)
                }
        }

Since the context is only cancelled via SIGHUP or stop, I believe we can safely ignore contex.Canceled error from runAgent.

@danielnelson
Copy link
Contributor

Yeah that should basically work, though I think we just want to ignore context.Canceled. 3d7a718

@danielnelson danielnelson added this to the 1.11.0 milestone May 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/agent bug unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants