Graceful shutdown #170

Open
kao73 opened this issue Feb 9, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@kao73

kao73 commented Feb 9, 2021

How should I configure the CloudWatch Agent to make sure all collected data is pushed out when I stop the agent?
I tried many options, but wasn't able to make it work as expected.

We use metrics to collect statistics in our application, so we expect the metrics to be precise.
We have several ASG clusters with the CloudWatch Agent and its StatsD plugin installed on each EC2 instance. Several modules send metrics to the agent. We found that some metrics are lost when an ASG scales down. During investigation and some manual tests, we found that the agent doesn't publish the collected data even if the amazon-cloudwatch-agent service is stopped gracefully.

Could this just be an incorrect configuration, or is it working as designed (WAD)?

My configuration:

{
  "agent": {
    "metrics_collection_interval": 60,
    "omit_hostname": true,
    "logfile": "/var/log/amazon-cloudwatch-agent.log"
  },
  "metrics": {
    "namespace": "MyNameSpace",
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "metrics_collected": {
      "statsd": {
        "metrics_collection_interval": 60,
        "metrics_aggregation_interval": 0
      },
      "mem": {
        "metrics_collection_interval": 60,
        "metrics_aggregation_interval": 0,
        "measurement": [
          "used_percent"
        ]
      }
    }
  }
}

For the manual test, we did the following:

  • start the agent service
  • send 1000 data points for a single metric using netcat
  • wait about 20 seconds
  • stop the agent service
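The "send 1000 data points" step above can also be scripted. This is a minimal Python sketch, not the exact netcat command used in the test. It assumes the agent's StatsD listener is on UDP port 8125 on localhost (adjust if your `service_address` differs); the metric name and the choice of a counter (`|c`) with value 1 are hypothetical.

```python
import socket


def statsd_lines(metric, count, metric_type="c"):
    """Build `count` StatsD datagrams of the form b"<metric>:1|<type>"."""
    return [f"{metric}:1|{metric_type}".encode("ascii") for _ in range(count)]


def send_datapoints(metric, count=1000, host="127.0.0.1", port=8125):
    """Send one UDP datagram per data point and return how many were sent.

    Assumption: the StatsD listener is reachable at host:port over UDP.
    UDP is fire-and-forget, so a successful send does not prove receipt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in statsd_lines(metric, count):
            sock.sendto(line, (host, port))
    return count


# Example (hypothetical metric name):
#   send_datapoints("test.graceful_shutdown.count")
```

After sending, the sum of the metric in CloudWatch can be compared against the number of data points sent to see whether anything was dropped when the agent stopped.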

Thanks.

@pingleig
Member

pingleig commented Feb 9, 2021

I think if you send a SIGTERM to the agent, it should flush the data in its buffer. By the way, we are not following Telegraf closely; Telegraf supports flushing without shutting down via SIGUSR1 (influxdata/telegraf#7366).
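To make the two behaviours concrete (flush on SIGTERM before exiting, and flush on SIGUSR1 without exiting), here is a minimal Python sketch of the general pattern. It is not the CloudWatch Agent's or Telegraf's code (both are written in Go); `SignalFlusher`, `sink`, and `tick` are hypothetical names, and the sketch is POSIX-only because it uses SIGUSR1.

```python
import signal
import time


class SignalFlusher:
    """Buffers data points and publishes them via `sink` (a hypothetical callable).

    The signal handlers only set plain flags; all real work happens in the
    main loop, which avoids taking locks inside a signal handler.
    """

    def __init__(self, sink):
        self._sink = sink
        self._pending = []
        self._stop = False
        self._flush_now = False

    def install(self):
        signal.signal(signal.SIGTERM, self._on_sigterm)
        signal.signal(signal.SIGUSR1, self._on_sigusr1)

    def _on_sigterm(self, signum, frame):
        self._stop = True  # ask the loop to exit after a final flush

    def _on_sigusr1(self, signum, frame):
        self._flush_now = True  # ask for a flush, keep running

    def add(self, datapoint):
        # Called from the main thread only in this sketch.
        self._pending.append(datapoint)

    def flush(self):
        batch, self._pending = self._pending, []
        if batch:
            self._sink(batch)
        return len(batch)

    def run(self, interval=60.0, tick=0.1):
        """Flush every `interval` seconds, on SIGUSR1, and once more on SIGTERM."""
        deadline = time.monotonic() + interval
        while not self._stop:
            time.sleep(tick)
            if self._flush_now or time.monotonic() >= deadline:
                self._flush_now = False
                self.flush()
                deadline = time.monotonic() + interval
        self.flush()  # final flush so buffered data is not lost on shutdown
```

The point of the final `flush()` after the loop is that data buffered since the last interval still gets published when the process is asked to stop; whether the agent's aggregated StatsD values go through an equivalent path is exactly what this issue is asking.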

@Shigerello

Fluentd supports a variety of flushing and shutting-down options using signals as well.

https://docs.fluentd.org/deployment/signals

SIGUSR1
Forces the buffered messages to be flushed and reopens Fluentd's log. Fluentd will try to flush the current buffer (both memory and file) immediately, and keep flushing at flush_interval.

@mihaileu

I have the same issue. On shutdown, the agent logs

[agent] Hang on, flushing any cached metrics before shutdown

but it doesn't flush the last aggregated stats. Is there a workaround or a fix for this?
