CloudWatch Agent 1.247351.0 Fails to Start on Windows Server 2022 #471

Closed
SignalRichard opened this issue May 18, 2022 · 11 comments
Labels
bug Something isn't working os/windows Windows

Comments

@SignalRichard

Describe the bug
The CloudWatch Agent fails to start on Windows Server 2022 when using the amazon-cloudwatch-agent-ctl.ps1 to set the configuration file.

Steps to reproduce
Start the CloudWatch agent with a local configuration file, using a command similar to:
& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:C:\asg-cloudwatch\amazon-cloudwatch-agent.json

What did you expect to see?
Expected the CloudWatch agent to start.

What did you see instead?
The agent does not start. The log file indicates that, when the metadata service is not yet available on the first attempt, the agent does not re-attempt to start after it later detects that the instance is EC2 (see the log files and additional context below).

What version did you use?
Version: 1.247351.0b251861
Downloaded from: https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi

Please note - there is an anomaly here. When a regional endpoint is used instead of the global endpoint, the following version is downloaded, which DOES NOT produce the problematic behavior: 1.247350.0b251780
Example regional endpoint: https://s3.us-west-2.amazonaws.com/amazoncloudwatch-agent-us-west-2/windows/amd64/latest/amazon-cloudwatch-agent.msi
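
To make the two download locations concrete, here is a small Python sketch that builds either URL. `msi_url` is a hypothetical helper, and the assumption that other regions follow the same pattern as the us-west-2 example is not confirmed anywhere in this thread.

```python
from typing import Optional

# Path shared by both URLs quoted in this issue.
BASE_PATH = "windows/amd64/latest/amazon-cloudwatch-agent.msi"


def msi_url(region: Optional[str] = None) -> str:
    """Return the global download URL, or a regional one when a region is given."""
    if region is None:
        return f"https://s3.amazonaws.com/amazoncloudwatch-agent/{BASE_PATH}"
    # Assumption: other regions mirror the us-west-2 example from this issue.
    return f"https://s3.{region}.amazonaws.com/amazoncloudwatch-agent-{region}/{BASE_PATH}"


print(msi_url())
print(msi_url("us-west-2"))
```

Since the two endpoints served different builds of "latest" at the time, recording which URL an AMI build used makes version comparisons easier.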

What config did you use?
Config:

{
        "logs": {
                "logs_collected": {
                        "files": {
                                "collect_list": [
                                        {
                                                "file_path": "C:\\example\\_logs\\*.log",
                                                "log_group_name": "example-group/log-group",
                                                "log_stream_name": "{instance_id}"
                                        }
                                ]
                        }
                }
        },
        "metrics": {
                "namespace": "ExampleNamespace",
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "InstanceId": "${aws:InstanceId}"
                },
                "metrics_collected": {
                        "LogicalDisk": {
                                "measurement": [
                                        "% Free Space"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "Memory": {
                                "measurement": [
                                        "% Committed Bytes In Use"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "Paging File": {
                                "measurement": [
                                        "% Usage"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "PhysicalDisk": {
                                "measurement": [
                                        "% Disk Time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "Processor": {
                                "measurement": [
                                        "% User Time",
                                        "% Idle Time",
                                        "% Interrupt Time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "_Total"
                                ]
                        }
                }
        }
}
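
As a quick sanity check before handing a config like this to the agent, the following Python sketch parses a trimmed excerpt of it and lists the log paths and counters it declares. `summarize_config` is a hypothetical helper; it only confirms that the JSON parses and is not the agent's own schema validation.

```python
import json

# Trimmed excerpt of the config above, shortened for illustration.
CONFIG_TEXT = r'''
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "C:\\example\\_logs\\*.log",
            "log_group_name": "example-group/log-group",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "ExampleNamespace",
    "metrics_collected": {
      "Memory": {"measurement": ["% Committed Bytes In Use"]},
      "Processor": {"measurement": ["% User Time", "% Idle Time"], "resources": ["_Total"]}
    }
  }
}
'''


def summarize_config(text):
    """Parse the config and report its log paths and per-object counters."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    files = (
        config.get("logs", {})
        .get("logs_collected", {})
        .get("files", {})
        .get("collect_list", [])
    )
    collected = config.get("metrics", {}).get("metrics_collected", {})
    return {
        "log_paths": [entry["file_path"] for entry in files],
        "counters": {name: body.get("measurement", []) for name, body in collected.items()},
    }


summary = summarize_config(CONFIG_TEXT)
print(summary["log_paths"])
print(summary["counters"])
```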

Environment
OS: Windows Server 2022 (10.0.20348 Build 707)

Additional context
This issue appears to happen only on Windows Server 2022; I have also tested the version in question on Windows Server 2019 without issue.

Here are the log files from various configurations:

Windows Server 2022 with CloudWatch Agent 1.247351.0

PS C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs> cat .\amazon-cloudwatch-agent.log
2022/05/17 18:55:55 I! 2022/05/17 18:55:55 D! [EC2] Found active network interface
2022/05/17 18:55:55 E! ec2metadata is not available
I! Detected the instance is OnPremise
2022/05/17 18:55:55 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/17 18:55:55 I! Return exit error: exit code=99
2022/05/17 18:55:55 I! there is no json configuration when running translator
2022/05/17 19:03:39 I! 2022/05/17 19:03:39 D! [EC2] Found active network interface
I! Detected the instance is EC2
2022/05/17 19:03:39 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/17 19:03:39 I! Return exit error: exit code=99
2022/05/17 19:03:39 I! there is no json configuration when running translator

Windows Server 2022 with CloudWatch Agent 1.247350.0

PS C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs> cat .\amazon-cloudwatch-agent.log
2022/05/17 16:34:30 I! 2022/05/17 16:34:30 E! ec2metadata is not available
I! Detected the instance is OnPrem
2022/05/17 16:34:30 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/17 16:34:30 I! Return exit error: exit code=99
2022/05/17 16:34:30 I! there is no json configuration when running translator
2022/05/17 16:39:38 I! I! Detected the instance is EC2
2022/05/17 16:39:38 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/17 16:39:39 I! Return exit error: exit code=99
2022/05/17 16:39:39 I! there is no json configuration when running translator
2022/05/17 16:43:20 I! I! Detected the instance is EC2
2022/05/17 16:43:20 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/05/17 16:43:20 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs\file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
No csm configuration found.
No windows event log configuration found.
Configuration validation first phase succeeded

2022/05/17 16:43:20 I! Config has been translated into TOML C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.toml
2022-05-17T16:43:20Z I! Starting AmazonCloudWatchAgent 1.247350.0
2022-05-17T16:43:20Z I! AWS SDK log level not set
2022-05-17T16:43:20Z I! Loaded inputs: logfile win_perf_counters
2022-05-17T16:43:20Z I! Loaded aggregators:
2022-05-17T16:43:20Z I! Loaded processors: ec2tagger
2022-05-17T16:43:20Z I! Loaded outputs: cloudwatchlogs cloudwatch
2022-05-17T16:43:20Z I! Tags enabled: host=EC2AMAZ-*******
2022-05-17T16:43:20Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"EC2AMAZ-*******", Flush Interval:1s
2022-05-17T16:43:20Z I! [logagent] starting
... [truncated] ...

Windows Server 2019 with CloudWatch Agent 1.247351.0

PS C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs> cat .\amazon-cloudwatch-agent.log
2022/05/18 12:06:46 I! 2022/05/18 12:06:39 D! [EC2] Found active network interface
2022/05/18 12:06:46 E! ec2metadata is not available
I! Detected the instance is OnPremise
2022/05/18 12:06:46 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/18 12:06:46 I! Return exit error: exit code=99
2022/05/18 12:06:46 I! there is no json configuration when running translator
2022/05/18 12:15:01 I! 2022/05/18 12:14:55 D! [EC2] Found active network interface
2022/05/18 12:15:01 E! ec2metadata is not available
I! Detected the instance is OnPremise
2022/05/18 12:15:01 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
No json config files found, please provide config, exit now

2022/05/18 12:15:02 I! Return exit error: exit code=99
2022/05/18 12:15:02 I! there is no json configuration when running translator
2022/05/18 12:22:10 I! 2022/05/18 12:22:10 D! [EC2] Found active network interface
I! Detected the instance is EC2
2022/05/18 12:22:10 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/05/18 12:22:10 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs\file_amazon-cloudwatch-agent.json ...
2022/05/18 12:22:10 I! Valid Json input schema.
No csm configuration found.
No windows event log configuration found.
Configuration validation first phase succeeded

2022/05/18 12:22:10 I! Config has been translated into TOML C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.toml
2022-05-18T12:22:10Z I! Starting AmazonCloudWatchAgent 1.247351.0
2022-05-18T12:22:10Z I! AWS SDK log level not set
2022-05-18T12:22:10Z I! Loaded inputs: logfile win_perf_counters
2022-05-18T12:22:10Z I! Loaded aggregators:
2022-05-18T12:22:10Z I! Loaded processors: ec2tagger
2022-05-18T12:22:10Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-05-18T12:22:10Z I! Tags enabled: host=EC2AMAZ-2GJIQIF
2022-05-18T12:22:10Z I! [logagent] starting
... [truncated] ...
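
To compare logs like these side by side, a minimal Python sketch can split a log into start attempts and record what each one detected and whether the agent actually started. `summarize_attempts` is a hypothetical helper, the sample is excerpted from the Windows Server 2022 / 1.247351.0 log above, and treating each "Detected the instance is" line as the start of a new attempt is an assumption based on these logs only.

```python
import re

EXIT_CODE = re.compile(r"exit code=(\d+)")

# Excerpt of the Windows Server 2022 / 1.247351.0 log from this issue.
SAMPLE_LOG = """\
2022/05/17 18:55:55 E! ec2metadata is not available
I! Detected the instance is OnPremise
No json config files found, please provide config, exit now
2022/05/17 18:55:55 I! Return exit error: exit code=99
2022/05/17 19:03:39 I! 2022/05/17 19:03:39 D! [EC2] Found active network interface
I! Detected the instance is EC2
No json config files found, please provide config, exit now
2022/05/17 19:03:39 I! Return exit error: exit code=99
"""


def summarize_attempts(log_text):
    """Group log lines into attempts, keyed on the instance-detection line."""
    attempts = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if "Detected the instance is" in line:
            # Assumption: one detection line marks the start of one attempt.
            current = {"detected": line.rsplit(" ", 1)[-1], "exit_code": None, "started": False}
            attempts.append(current)
        elif current is None:
            continue  # lines before the first detection line are ignored
        elif "Starting AmazonCloudWatchAgent" in line:
            current["started"] = True
        else:
            match = EXIT_CODE.search(line)
            if match:
                current["exit_code"] = int(match.group(1))
    return attempts


for attempt in summarize_attempts(SAMPLE_LOG):
    print(attempt)
```

On the excerpt, both attempts end with exit code 99 and neither reaches the "Starting AmazonCloudWatchAgent" line, even though the second one detects EC2.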
@SaxyPandaBear
Contributor

Are you running this in PowerShell or PowerShell ISE? I recall that a change that pipes debug logs to stderr instead of stdout causes ISE to think that there is an error with the executable, and it stops.

@SignalRichard
Author

It is run in PowerShell as part of the user data script for an EC2 instance.

@SaxyPandaBear added the bug and os/windows labels on May 19, 2022
@fivehorizons

+1 for issue

@SaxyPandaBear
Contributor

Considering that it works on Windows Server 2019, it seems like there's some change/regression at the OS level. That being said, I'm fairly certain that this is related to the issue I mentioned, since I think the only change that would have impacted this is the switch from using the fmt lib to using log, which pipes the output to stderr instead of stdout. I would have to investigate, though.

@khanhntd
Contributor

Hey @SignalRichard,
I cannot replicate your issue on Windows Server 2022 in either of two cases: a valid JSON configuration and an invalid JSON configuration. However, with the invalid JSON configuration, it gives me this error:

Start-Service : Service 'Amazon CloudWatch Agent (AmazonCloudWatchAgent)' cannot be started due to the following
error: Cannot start service AmazonCloudWatchAgent on computer '.'.

Could you show us the full reproduction process, starting from the AMI? Moreover, at least we know that it is not related to Windows PowerShell ISE versus PowerShell, since CWAgent does not stop at

2022/05/17 18:55:55 I! 2022/05/17 18:55:55 D! [EC2] Found active network interface

@SaxyPandaBear
Contributor

@SignalRichard @fivehorizons are either of you able to provide the EC2 user data script you used to install the CloudWatch agent? As far as I know, installing the agent manually on a Windows Server 2022 EC2 instance works. Seeing your existing user data would help us reproduce the exact issue.

@SignalRichard
Author

I will try to get you a succinct setup to reproduce this issue. Currently, the CloudWatch agent is installed on a machine, a new AMI is created from it, and when an instance starts, the user data initializes and configures the CloudWatch agent. Unfortunately, the AMI creation process and the AMI itself are complicated, so I want to provide a boiled-down example. To note: I downgraded the CloudWatch agent on the existing AMI that I was having issues with to the previous version, and it starts and works as expected.

@khanhntd
Contributor

khanhntd commented May 25, 2022

@SignalRichard @fivehorizons Another thing we would like you to provide is the error info shown in the Event Viewer when the CWAgent fails to start.

@SaxyPandaBear
Contributor

I think I've been able to root cause it. #473 accounts for this in that it wraps a call to the config-downloader executable in cmd /c, so it pipes the output to stdout instead of letting the debug logs go to stderr.
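
To illustrate the general failure mode, here is an analogy in Python; it is not the agent's code and does not reproduce PowerShell or `cmd /c` behavior. The hypothetical `run_strict` caller treats any stderr output as a failure even when the exit code is 0, while `run_merged` folds stderr into stdout so a debug line no longer looks like an error.

```python
import subprocess
import sys

# Stand-in child process: writes a debug line to stderr, then exits with code 0.
CHILD = (
    "import sys; "
    "print('D! [EC2] Found active network interface', file=sys.stderr); "
    "print('ok')"
)
COMMAND = [sys.executable, "-c", CHILD]


def run_strict(cmd):
    """Hypothetical caller: any stderr output counts as failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    failed = proc.returncode != 0 or proc.stderr != ""
    return failed, proc.stdout, proc.stderr


def run_merged(cmd):
    """Merge stderr into stdout, so only the exit code signals failure."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return proc.returncode != 0, proc.stdout


print(run_strict(COMMAND))
print(run_merged(COMMAND))
```

The child succeeds in both cases; only the caller's interpretation of the stderr stream differs.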

I was able to reproduce the issue on the 351 release, and used this user data script to check if I fixed the issue:

<powershell>
cd "C:\Users\Administrator"
Start-Process msiexec.exe -ArgumentList '/i https://awscli.amazonaws.com/AWSCLIV2.msi' -Wait

Start-Process 'C:\Program Files\Amazon\AWSCLIV2\aws.exe' -ArgumentList 's3 cp s3://[my-bucket]/amazon-cloudwatch-agent.zip .' -Wait

Expand-Archive -Path .\amazon-cloudwatch-agent.zip -DestinationPath .

cd amazon-cloudwatch-agent/

& '.\install.ps1'

New-Item -Path C:\asg-cloudwatch -ItemType Directory

Add-Content -path "C:\asg-cloudwatch\amazon-cloudwatch-agent.json" @'
{
        "logs": {
                "logs_collected": {
                        "files": {
                                "collect_list": [
                                        {
                                                "file_path": "C:\\example\\_logs\\*.log",
                                                "log_group_name": "example-group/log-group",
                                                "log_stream_name": "{instance_id}"
                                        }
                                ]
                        }
                }
        },
        "metrics": {
                "namespace": "ExampleNamespace",
                "append_dimensions": {
                        "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
                        "InstanceId": "${aws:InstanceId}"
                },
                "metrics_collected": {
                        "LogicalDisk": {
                                "measurement": [
                                        "% Free Space"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "Memory": {
                                "measurement": [
                                        "% Committed Bytes In Use"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "Paging File": {
                                "measurement": [
                                        "% Usage"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "PhysicalDisk": {
                                "measurement": [
                                        "% Disk Time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "Processor": {
                                "measurement": [
                                        "% User Time",
                                        "% Idle Time",
                                        "% Interrupt Time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "_Total"
                                ]
                        }
                }
        }
}
'@

& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:C:\asg-cloudwatch\amazon-cloudwatch-agent.json
</powershell>
<persist>true</persist>

I got onto the host and ran the following to see if it was running, and it was.

PS C:\Windows\system32> & 'C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1' -a status
{
  "status": "running",
  "starttime": "2022-05-25T17:09:56",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247351.0-25-g4f65be41"
}
PS C:\Windows\system32> cat 'C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs\amazon-cloudwatch-agent.log'
2022/05/25 17:09:56 I! 2022/05/25 17:09:56 D! [EC2] Found active network interface
I! Detected the instance is EC2
2022/05/25 17:09:56 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json ...
C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/05/25 17:09:56 Reading json config file path: C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs\file_amazon-cloudwatch-agent.json ...
2022/05/25 17:09:56 I! Valid Json input schema.
I! Trying to detect region from ec2
No csm configuration found.
No windows event log configuration found.
Configuration validation first phase succeeded

2022/05/25 17:09:56 I! Config has been translated into TOML C:\ProgramData\Amazon\AmazonCloudWatchAgent\\amazon-cloudwatch-agent.toml
2022-05-25T17:09:57Z I! Starting AmazonCloudWatchAgent 1.247351.0-25-g4f65be41
2022-05-25T17:09:57Z I! AWS SDK log level not set
2022-05-25T17:09:57Z I! Loaded inputs: logfile win_perf_counters
2022-05-25T17:09:57Z I! Loaded aggregators:
2022-05-25T17:09:57Z I! Loaded processors: ec2tagger
2022-05-25T17:09:57Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-05-25T17:09:57Z I! Tags enabled: host=EC2AMAZ-CNN18AC
2022-05-25T17:09:57Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"EC2AMAZ-CNN18AC", Flush Interval:1s
2022-05-25T17:09:57Z I! [logagent] starting
2022-05-25T17:09:57Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-05-25T17:09:57Z I! [logagent] found plugin logfile is a log collection
2022-05-25T17:09:57Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-05-25T17:09:57Z I! cloudwatch: get unique roll up list []
2022-05-25T17:09:57Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 49s
2022-05-25T17:09:57Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-05-25T17:09:57Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
PS C:\Windows\system32>
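
For scripted checks, the status output above is plain JSON, so it can be parsed rather than read by eye. A minimal Python sketch, assuming the JSON has already been captured as text; `agent_is_healthy` is a hypothetical helper, and treating "running" plus "configured" as healthy is an assumption for illustration.

```python
import json

# Status output copied from the console session above.
STATUS_OUTPUT = """\
{
  "status": "running",
  "starttime": "2022-05-25T17:09:56",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247351.0-25-g4f65be41"
}
"""


def agent_is_healthy(status_text):
    """Return True when the agent reports itself as running and configured."""
    status = json.loads(status_text)
    return status.get("status") == "running" and status.get("configstatus") == "configured"


print(agent_is_healthy(STATUS_OUTPUT))
```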

@SaxyPandaBear
Contributor

#473 accounts for this.

As for the discrepancy between the regional and global S3 endpoints for the "latest" version: we release in waves, caught an issue, halted, and rolled back to v350. As of 5/20, downloading and installing the "latest" CloudWatch agent should give you v350, not v351.

Given that we rolled back the release, root-caused the issue in what we were going to release, and have the fix merged and ready for our next release, I'm closing this.


Thank you for reaching out promptly to identify this issue, and for the upfront analysis that isolated it to Windows Server 2022.

I dug into it a little more and I think the root cause is a behavior change between the service that executes the user data script on Windows Server 2022 compared to prior OSes. Windows Server 2022 uses EC2Launch v2 by default, whereas previous OSes use v1.

@rbarbosa-inetum

To solve the problem, I followed the steps below.

1 - Install the CloudWatch agent via SSM using this document: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-on-EC2-Instance-fleet.html

2 - After installation, run these commands:

& $Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -m ec2 -a start

& $Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -m ec2 -a status
