Output Cloudwatch Logs not using Region #11963

alec-medcrypt · 2022-10-07T17:18:54Z

I have a Telegraf config set up on a node without the AWS CLI installed. Below is the config:

```toml
# Generic, basic /usr/local/etc/telegraf.conf file for FreeBSD
#
# Gathers some basic metrics and transmits them to cloudwatch
#
# Be sure to set the region below

[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = false

[[processors.aws_ec2]]
imds_tags = ["instanceId"]

[[inputs.tail]]
name_override = "app_logs"
files = ["/var/log/app/*.log"]
data_format = "grok"
grok_patterns = ['%{GREEDYDATA:message}']

[[outputs.cloudwatch_logs]]
region = "us-east-2"
log_group = "/namespace/env/app"
log_stream = "tag:instanceId"
log_data_metric_name = "app_logs"
log_data_source = "field:message"
```

I am getting the following errors:

```
2022-10-07T17:16:36Z E! [telegraf] Error running agent: connecting output outputs.cloudwatch_logs: Error connecting to output "outputs.cloudwatch_logs": operation error CloudWatch Logs: DescribeLogGroups, failed to resolve service endpoint, an AWS region is required, but was not found
2022-10-07T17:16:37Z E! [agent] Failed to connect to [outputs.cloudwatch_logs], retrying in 15s, error was 'operation error CloudWatch Logs: DescribeLogGroups, failed to resolve service endpoint, an AWS region is required, but was not found'
```


powersj · 2022-10-11T20:20:26Z

Hi,

What are you using for credentials to log in to AWS?

The typical config stanza includes the following options:

```toml
  ## Amazon Credentials
  ## Credentials are loaded in the following order
  ## 1) Web identity provider credentials via STS if role_arn and
  ##    web_identity_token_file are specified
  ## 2) Assumed credentials via STS if role_arn is specified
  ## 3) explicit credentials from 'access_key' and 'secret_key'
  ## 4) shared profile from 'profile'
  ## 5) environment variables
  ## 6) shared credentials file
  ## 7) EC2 Instance Profile
  # access_key = ""
  # secret_key = ""
  # token = ""
  # role_arn = ""
  # web_identity_token_file = ""
  # role_session_name = ""
  # profile = ""
  # shared_credential_file = ""
```

Are you certain whatever credentials you are using also have access to that region?

Thanks!

telegraf-tiger · 2022-10-26T18:09:42Z

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

starek4 · 2023-07-17T20:31:24Z

@alec-medcrypt did you solve this? I'm having the same issue.

I am using these options:

- `region`
- `access_key`
- `secret_key`
- `log_group`
- `log_stream`
- `log_data_metric_name`
- `log_data_source`

The very same config works on my local desktop but not on an IoT ARM device, with exactly the behaviour you described. My local desktop has the AWS CLI installed; the IoT device does not.

starek4 · 2023-07-17T20:50:50Z

Hmm, I figured it out. Even though I had correctly set the AWS region in the config file via the `region` key, it only started to work after I also set the `AWS_REGION` environment variable.

colinbut · 2023-08-17T20:05:48Z

Can we re-open this issue please?

Agree with @starek4. Unless I'm missing something else, I also found that setting the `region` key doesn't seem to work; it only works when the `AWS_REGION` env variable is set.

If the env variable is required, that defeats the purpose of having the `region` key in the conf file at all. I feel we should fix this so that the `region` key in the conf file works without needing to set the env variable.

powersj · 2023-08-17T20:19:42Z

@colinbut @starek4,

Can you find what the last working version of telegraf was so we can look at what changed?

Looking at the AWS `credentials.go`, we set the region from the value in the TOML config if no `RoleARN` was set.

colinbut · 2023-08-18T22:27:16Z

@powersj,

I'm using the latest version and profile credentials. Not sure about @starek4: what credentials method are you using?

The error we're seeing appears to be from the go aws sdk:

```
operation error CloudWatch Logs: DescribeLogGroups, failed to resolve service endpoint, an AWS region is required,
```

Regardless, I had a look at the lines of code you referenced.

To be honest, I don't really understand Go and am not proficient in it, but doesn't the following code read as "try to load the options into the `options` variable and, if there is an error, execute `configV2.WithRegion(c.Region)`"?

If my reading of the code is correct, that would mean the region never gets set as long as loading the options succeeds.

```go
options := []func(*configV2.LoadOptions) error{
	configV2.WithRegion(c.Region),
}
```

Shouldn't it be something like this instead?

```go
options, err := configV2.LoadOptions
if err != nil {
	//...
}

options = append(options, configV2.WithRegion(c.Region))
```

starek4 · 2023-08-19T05:06:00Z

I am using the access key and secret key combination, also on the latest version, just the armhf (arm32v7) build.

powersj · 2023-08-19T15:06:39Z

> Shouldn't it be something like this instead?

No, this is a common pattern. Essentially, the client (Telegraf) builds an array of options, such as connection timeouts, credentials to load, etc., and passes that entire array to the connect call. The AWS library then loads all of those options, sets up the necessary settings, and connects.

You can see examples of this on the AWS SDK for Go v2 Configuration page. There they explain that the region can be set in one of two ways: either as we do with `WithRegion()`, or via the environment variable.

On Monday, I'll build a custom version of telegraf and we can see what your platforms are reporting.
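The option-slice pattern described above can be sketched with the standard library alone. The `LoadOptions`, `Option`, `WithRegion`, `Config` and `Load` names below are hypothetical stand-ins modelled on the `configV2` snippet quoted earlier, not the real AWS SDK types. The point is that every option in the slice is applied during loading, not only on error. The precedence (an explicit option beats the `AWS_REGION` environment variable) is an assumption of this sketch, and unlike the real SDK, which reports the missing region while resolving the service endpoint, this model checks for it at load time.

```go
package main

import (
	"errors"
	"fmt"
)

// LoadOptions is a hypothetical, trimmed-down stand-in for configV2.LoadOptions;
// only the field relevant to this issue is modelled.
type LoadOptions struct {
	Region string
}

// Option has the same shape as the snippet above: func(*LoadOptions) error.
type Option func(*LoadOptions) error

// WithRegion is a hypothetical option constructor modelled on configV2.WithRegion.
func WithRegion(region string) Option {
	return func(o *LoadOptions) error {
		o.Region = region
		return nil
	}
}

// Config is a hypothetical resolved configuration.
type Config struct {
	Region string
}

// Load applies every option in order and only returns early if an option fails.
func Load(getenv func(string) string, opts ...Option) (Config, error) {
	var lo LoadOptions
	for _, opt := range opts {
		if err := opt(&lo); err != nil {
			return Config{}, err
		}
	}
	region := lo.Region
	if region == "" {
		// Assumption: fall back to the environment only when no option set a region.
		region = getenv("AWS_REGION")
	}
	if region == "" {
		// Simplification: the real SDK reports this during endpoint resolution.
		return Config{}, errors.New("an AWS region is required, but was not found")
	}
	return Config{Region: region}, nil
}

func main() {
	noEnv := func(string) string { return "" } // simulate an unset AWS_REGION
	options := []Option{WithRegion("eu-west-1")}
	cfg, err := Load(noEnv, options...)
	fmt.Println(cfg.Region, err)
}
```

Under these assumptions, the region from the config file is set even when `AWS_REGION` is empty, which matches the debug output later in this thread showing the loaded config using the expected region.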


powersj · 2023-08-21T16:58:05Z

@starek4, @colinbut,

I have put up PR #13803, which includes some additional logging. Can you please download the artifacts and reproduce the issue? Those artifacts should be up in 5-10 minutes.

Please provide the full logs. I would like to see what was set, when and where.

colinbut · 2023-08-27T13:20:52Z

@powersj

```
2023-08-26T09:13:07Z D! [agent] Connecting outputs
2023-08-26T09:13:07Z D! [agent] Attempting connection to [outputs.cloudwatch_logs]
linux
arm64
env AWS_REGION is ""
setting region to: eu-west-1
loading with root credentials
loaded config is using region: eu-west-1
2023-08-26T09:13:07Z E! [agent] Failed to connect to [outputs.cloudwatch_logs], retrying in 15s, error was "operation error CloudWatch Logs: DescribeLogGroups, failed to resolve service endpoint, an AWS region is required, but was not found"
linux
arm64
env AWS_REGION is ""
setting region to: eu-west-1
loading with root credentials
loaded config is using region: eu-west-1
2023-08-26T09:13:22Z E! [telegraf] Error running agent: connecting output outputs.cloudwatch_logs: error connecting to output "outputs.cloudwatch_logs": operation error CloudWatch Logs: DescribeLogGroups, failed to resolve service endpoint, an AWS region is required, but was not found
```

powersj · 2023-08-28T13:22:12Z

@colinbut - is eu-west-1 the expected region? If so, that does in fact look like we are reading it correctly and setting the value.

Can you show the same example but with the AWS region set via the environment?

colinbut · 2023-08-28T15:01:49Z

@powersj

Yes, I'm using the eu-west-1 region.

Here's the log with region set as env var:

```
[root@ip-172-31-34-228 bin]# AWS_REGION=eu-west-1 ./telegraf --config ../../etc/telegraf/telegraf.conf
2023-08-28T14:57:04Z I! Loading config: ../../etc/telegraf/telegraf.conf
2023-08-28T14:57:04Z I! Starting Telegraf 1.28.0-938ed112
2023-08-28T14:57:04Z I! Available plugins: 239 inputs, 9 aggregators, 28 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-08-28T14:57:04Z I! Loaded inputs: cpu disk diskio docker_log kernel mem processes swap system
2023-08-28T14:57:04Z I! Loaded aggregators:
2023-08-28T14:57:04Z I! Loaded processors:
2023-08-28T14:57:04Z I! Loaded secretstores:
2023-08-28T14:57:04Z I! Loaded outputs: cloudwatch_logs
2023-08-28T14:57:04Z I! Tags enabled: host=###
2023-08-28T14:57:04Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"###", Flush Interval:10s
2023-08-28T14:57:04Z D! [agent] Initializing plugins
2023-08-28T14:57:04Z D! [outputs.cloudwatch_logs] Log data: key "field", source "message"...
2023-08-28T14:57:04Z D! [outputs.cloudwatch_logs] Log stream "/aws/ec2/####"...
2023-08-28T14:57:04Z D! [agent] Connecting outputs
2023-08-28T14:57:04Z D! [agent] Attempting connection to [outputs.cloudwatch_logs]
linux
arm64
env AWS_REGION is "eu-west-1"
setting region to: eu-west-1
loading with root credentials
loaded config is using region: eu-west-1
2023-08-28T14:57:04Z D! [outputs.cloudwatch_logs] Found log group "/aws/ec2/####"
2023-08-28T14:57:04Z D! [agent] Successfully connected to outputs.cloudwatch_logs
2023-08-28T14:57:04Z D! [agent] Starting service inputs
2023-08-28T14:57:14Z D! [outputs.cloudwatch_logs] Processing metric 'docker_log map[com.docker.
```

Note: I've masked certain data to protect my company's IP, but the logs clearly show that the outputs.cloudwatch_logs output plugin can connect.

powersj · 2023-08-28T15:59:47Z

I'm thinking this is worth an upstream issue then. Both scenarios appear to load a config using the correct region and you appear to have permissions for the region. Nothing appears to be obviously wrong when we make the request.

Can you file an issue upstream: https://github.com/aws/aws-sdk-go-v2/issues

colinbut · 2023-08-30T17:43:43Z

I can certainly try...

Agreed; it appears the Go AWS SDK could be the problem if the code is doing what you expect.

Have raised issue on aws-sdk-go-v2 - aws/aws-sdk-go-v2#2260

colinbut · 2023-09-02T10:30:59Z

For anyone else encountering this problem, the workaround for the time being is to invoke Telegraf with the AWS region supplied as an environment variable, e.g.:

```
AWS_REGION=eu-west-1 ./telegraf --config /etc/telegraf/telegraf.conf
```

Alternatively, if you manage Telegraf via systemd, you can edit the default service file, e.g.:

```ini
[Unit]
Description=Telegraf
Documentation=https://github.com/influxdata/telegraf
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/telegraf
User=telegraf
Environment="AWS_REGION=eu-west-1"
ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=control-group
LimitMEMLOCK=8M:8M

[Install]
WantedBy=multi-user.target
```

powersj · 2023-09-05T16:46:35Z

Thanks for raising the issue with them. I have responded on the thread as well.


powersj · 2023-09-05T19:52:02Z

@colinbut,

After trying to learn more about the different AWS config loading options, I have put up another PR for you to try. Looking at #10841, it does look like a change occurred in how credentials are loaded.

Could you give #13868 a try once artifacts are posted?

colinbut · 2023-09-06T09:48:26Z

@powersj

I now get a different error:

```
2023-09-06T09:45:23Z D! [outputs.cloudwatch_logs] Log data: key "field", source "message"...
2023-09-06T09:45:23Z D! [outputs.cloudwatch_logs] Log stream "log-group"...
2023-09-06T09:45:23Z D! [agent] Connecting outputs
2023-09-06T09:45:23Z D! [agent] Attempting connection to [outputs.cloudwatch_logs]
linux
arm64
env AWS_REGION is ""
setting region to: eu-west-1
loading with root credentials
loaded config is using region: eu-west-1
2023-09-06T09:45:26Z E! [agent] Failed to connect to [outputs.cloudwatch_logs], retrying in 15s, error was "operation error CloudWatch Logs: DescribeLogGroups, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"/\": unsupported protocol scheme \"\""
linux
arm64
env AWS_REGION is ""
setting region to: eu-west-1
loading with root credentials
loaded config is using region: eu-west-1
2023-09-06T09:45:44Z E! [telegraf] Error running agent: connecting output outputs.cloudwatch_logs: error connecting to output "outputs.cloudwatch_logs": operation error CloudWatch Logs: DescribeLogGroups, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "/": unsupported protocol scheme ""
```

powersj · 2023-09-06T12:36:52Z

@colinbut

Two questions, as I'm not sure I follow the limited logs:

1. Can you provide your config with any secrets removed?
2. The log messages show some comments about the log data and stream, which means it did connect, but then I see "Connecting outputs". Are these log messages from two different runs or a single run?

colinbut · 2023-09-07T09:09:42Z

@powersj

Just like last time, it is a single run.

I'm running Telegraf on an EC2 instance that has a Docker container running inside it, so I'm using the docker_log input plugin to capture the container's stdout logs and send them to AWS CloudWatch via the cloudwatch_logs output plugin.

My telegraf config:

```toml
[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  hostname = ""
  omit_hostname = false

[[outputs.cloudwatch_logs]]
  region = "eu-west-1"
  profile = "######"
  log_group = "#######"
  log_stream = "log-group"
  log_data_metric_name = "docker_log"
  log_data_source = "field:message"

[[inputs.docker_log]]
  endpoint = "unix:///var/run/docker.sock"
  from_beginning = true
  timeout = "5s"
  source_tag = false
```


colinbut · 2024-03-01T15:06:01Z

@powersj Hi, just wondering whether there is any update on this, since nearly 6 months have passed.

I can see the last activity is your commit adding debug logs to an open branch of yours?

powersj · 2024-03-01T15:29:53Z

Hi,

I have no update, as I have not looked into this further. There was a branch you were testing to help me understand what was going on, but I am not sure anything was learned from it. Additionally, when we asked upstream for help, their suggestion was for us to use a debugger. That means the next step is for me or someone else to reproduce this and walk through what is or isn't going on.

There is a workaround via the environment variable, which is nice, but I understand it is not ideal. I can add this back to my list, as I did forget about it, but it could be faster if someone with more knowledge of AWS looked into it as well.

powersj added the `waiting for response` label Oct 11, 2022

telegraf-tiger bot closed this as completed Oct 26, 2022

telegraf-tiger bot removed the `waiting for response` label Jul 17, 2023

powersj reopened this Aug 17, 2023

powersj added the `waiting for response` label Aug 17, 2023

telegraf-tiger bot removed the `waiting for response` label Aug 18, 2023

powersj added a commit to powersj/telegraf that referenced this issue Aug 21, 2023

fix: Debug for cloudwatch output region

cd8d941

fixes: influxdata#11963

powersj mentioned this issue Aug 21, 2023

fix: Debug for cloudwatch output region #13803 (Closed)

powersj added the `waiting for response` label Aug 21, 2023

telegraf-tiger bot removed the `waiting for response` label Aug 27, 2023

powersj added the `waiting for response` label Aug 28, 2023

telegraf-tiger bot removed the `waiting for response` label Aug 28, 2023

powersj added the `waiting for response` and `upstream` labels Aug 28, 2023

telegraf-tiger bot removed the `waiting for response` label Aug 30, 2023

colinbut mentioned this issue Sep 2, 2023

LoadDefaultConfig not programmatically setting region, need AWS Region set as environment variable aws/aws-sdk-go-v2#2260 (Closed)

powersj added a commit to powersj/telegraf that referenced this issue Sep 5, 2023

fix: Debug for cloudwatch output region

2c09c33

fixes: influxdata#11963

powersj mentioned this issue Sep 5, 2023

fix: debug for cloudwatch output region #13868 (Closed)

powersj added the `waiting for response` label Sep 5, 2023

telegraf-tiger bot removed the `waiting for response` label Sep 6, 2023

powersj added the `waiting for response` label Sep 6, 2023

telegraf-tiger bot removed the `waiting for response` label Sep 7, 2023

powersj added a commit to powersj/telegraf that referenced this issue Sep 7, 2023

fix: Debug for cloudwatch output region

da9f95b

fixes: influxdata#11963

powersj mentioned this issue May 30, 2024

Use of include_linked_accounts causes an IndexOutOfRange error when requesting SES metrics #15422 (Closed)
