Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 1.2.3 Breaks on Oauth Token Retrieval from cfapi #302

Closed
gaigaslab-operations opened this issue Oct 21, 2021 · 16 comments · Fixed by #303
Closed

Release 1.2.3 Breaks on Oauth Token Retrieval from cfapi #302

gaigaslab-operations opened this issue Oct 21, 2021 · 16 comments · Fixed by #303

Comments

@gaigaslab-operations
Copy link

We upgraded to Tile 1.2.3 from 1.2.2
Immediately, the splunk-nozzle-1.2.3 app started crashing after reporting that it could not get a token from the api endpoint - authorization error.

I don't have the logs right now, since we reverted to 1.2.2 to recover. I will reproduce on our sandbox foundation and provide the logs.

@gaigaslab-operations
Copy link
Author

gaigaslab-operations commented Oct 21, 2021

Logs of splunk-firehose-nozzle-1.2.3 after upgrading from 1.2.2 to 1.2.3

2021-10-21T14:01:47.753-04:00 [CELL/0] [OUT] Downloading droplet... 2021-10-21T14:01:47.760-04:00 [CELL/1] [OUT] Cell 615ce47c-66d0-4e88-a417-6e7ea554053d successfully created container for instance ea1a13fa-a03c-4c90-6502-1696 2021-10-21T14:01:47.962-04:00 [CELL/0] [OUT] Downloaded droplet (5.9M) 2021-10-21T14:01:48.113-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839308.113173723","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Running splunk-firehose-nozzle","log_level":1,"data":{}} 2021-10-21T14:01:48.113-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839308.113698244","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Apps are not being cached. When apps are not cached, the org and space caching TTL is ineffective","log_level":1,"data":{}} 2021-10-21T14:01:48.114-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839308.114690542","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Running splunk-firehose-nozzle with following configuration variables ","log_level":1,"data":{"add-app-info":"[\"AppName\", \"OrgName\", \"SpaceName\"]","add-tags":false,"api-endpoint":"https://api.****","app-cache-ttl":0,"app-limits":0,"batch-size":100,"boltdb-path":"cache.db","branch":"master","buildos":"Darwin","commit":"fcbda2da45cb659c677174b95b7a06c0afa81a96","debug":false,"drop-warn-threshold":1000,"extra-fields":"foundation:cfsb-1","flush-interval":2000000000,"hec-workers":8,"ignore-missing-apps":true,"job-host":"","job-index":"-1","job-name":"splunk-nozzle","keep-alive":25000000000,"mem-queue-monitor-interval":0,"missing-app-cache-ttl":0,"org-space-cache-ttl":32400000000000,"queue-size":10000,"retries":3,"skip-ssl-cf":false,"skip-ssl-splunk":false,"splunk-host":"https://cloud.us.humio.com:443/api/v1/ingest/hec","splunk-index":"cfsb-1","subscription-id":"splunk-firehose","trace-logging":false,"version":"","wanted-events":"[\"LogMessage\", \"Error\"]"}} 2021-10-21T14:01:48.129-04:00 [CELL/1] [OUT] Downloading droplet... 2021-10-21T14:01:48.330-04:00 [CELL/1] [OUT] Downloaded droplet (5.9M) 2021-10-21T14:01:48.395-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839308.395050049","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Retrieving apps from remote","log_level":1,"data":{}} 2021-10-21T14:01:48.495-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839308.495010138","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Running splunk-firehose-nozzle","log_level":1,"data":{}} 2021-10-21T14:01:48.496-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839308.495557308","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Apps are not being cached. When apps are not cached, the org and space caching TTL is ineffective","log_level":1,"data":{}} 2021-10-21T14:01:48.496-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839308.496112347","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Running splunk-firehose-nozzle with following configuration variables ","log_level":1,"data":{"add-app-info":"[\"AppName\", \"OrgName\", \"SpaceName\"]","add-tags":false,"api-endpoint":"https://api.****","app-cache-ttl":0,"app-limits":0,"batch-size":100,"boltdb-path":"cache.db","branch":"master","buildos":"Darwin","commit":"fcbda2da45cb659c677174b95b7a06c0afa81a96","debug":false,"drop-warn-threshold":1000,"extra-fields":"foundation:cfsb-1","flush-interval":2000000000,"hec-workers":8,"ignore-missing-apps":true,"job-host":"","job-index":"-1","job-name":"splunk-nozzle","keep-alive":25000000000,"mem-queue-monitor-interval":0,"missing-app-cache-ttl":0,"org-space-cache-ttl":32400000000000,"queue-size":10000,"retries":3,"skip-ssl-cf":false,"skip-ssl-splunk":false,"splunk-host":"https://cloud.us.humio.com:443/api/v1/ingest/hec","splunk-index":"cfsb-1","subscription-id":"splunk-firehose","trace-logging":false,"version":"","wanted-events":"[\"LogMessage\", \"Error\"]"}} 2021-10-21T14:01:48.749-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839308.748988390","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Retrieving apps from remote","log_level":1,"data":{}} 2021-10-21T14:01:58.751-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839318.751422882","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to open App Cache","log_level":2,"data":{}} 2021-10-21T14:01:58.817-04:00 [APP/PROC/WEB/0] [OUT] {"timestamp":"1634839318.817667723","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle","log_level":2,"data":{"error":"Error requesting apps: Get \"https://api.****/v2/apps?inline-relations-depth=0\": oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"Bad credentials\"}"}} 2021-10-21T14:01:59.130-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839319.130104303","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to open App Cache","log_level":2,"data":{}} 2021-10-21T14:01:59.197-04:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1634839319.197067022","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle","log_level":2,"data":{"error":"Error requesting apps: Get \"https://api.****/v2/apps?inline-relations-depth=0\": oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"unauthorized\",\"error_description\":\"Bad credentials\"}"}} 2021-10-21T14:02:08.770-04:00 [APP/PROC/WEB/0] [OUT] Exit status 0 2021-10-21T14:02:09.444-04:00 [APP/PROC/WEB/1] [OUT] Exit status 0 2021-10-21T14:02:13.946-04:00 [CELL/0] [OUT] Cell f5d6edfb-179d-49fa-99ef-895ee714d7ba stopping instance 1054312b-c536-4816-5fcd-ae25 2021-10-21T14:02:13.947-04:00 [CELL/0] [OUT] Cell f5d6edfb-179d-49fa-99ef-895ee714d7ba destroying container for instance 1054312b-c536-4816-5fcd-ae25 2021-10-21T14:02:13.953-04:00 [API/0] [OUT] Process has crashed with type: "web" 2021-10-21T14:02:13.961-04:00 [API/0] [OUT] App instance exited with guid 3dbfc916-94fd-4419-8d45-2b8891ef7b88 payload: {"instance"=>"1054312b-c536-4816-5fcd-ae25", "index"=>0, "cell_id"=>"f5d6edfb-179d-49fa-99ef-895ee714d7ba", "reason"=>"CRASHED", "exit_description"=>"Codependent step exited", "crash_count"=>2, "crash_timestamp"=>1634839333935809636, "version"=>"ca55885a-3c59-447a-bdd0-58a2211bd966"} 2021-10-21T14:02:13.966-04:00 [CELL/0] [OUT] Cell 0996735d-3a84-4f5e-b0bd-310cbd0ffbc4 creating container for instance 5cbdc801-d202-49aa-57a9-c815 2021-10-21T14:02:14.193-04:00 [PROXY/0] [OUT] Exit status 137 2021-10-21T14:02:14.226-04:00 [CELL/0] [OUT] Cell f5d6edfb-179d-49fa-99ef-895ee714d7ba successfully destroyed container for instance 1054312b-c536-4816-5fcd-ae25

@gaigaslab-operations
Copy link
Author

I have confirmed that the environment variables for CLIENT_ID and CLIENT_SECRET are the same on 1.2.2 (because it is working) and on 1.2.3

@xyloman
Copy link

xyloman commented Oct 21, 2021

I can confirm that we experienced the same behavior in our test environment after taking 1.2.3.

@gaigaslab-operations
Copy link
Author

I just removed all of 1.2.3, 1.2.2 and deployed a fresh instance of 1.2.3.

The problem is identical. This indicates that it is probably not due to the migration/upgrade changes.

I couldn't see anything in the release diffs that would cause this issue.

Thank you @xyloman for confirming.

@labrown
Copy link

labrown commented Oct 22, 2021

Perhaps an update in one of the libraries the app uses?

@JuergenSu
Copy link

HI, i can confirm we are on Nozzle release 1.2.3 and Tanzu Applicaiton Services 2.11.8 which is CC api 2.164.0 and the nozzle runs fine without this issue.

i know from an other user that they also dont experience this issue with TAS 2.11.X

@kashyap-splunk
Copy link
Collaborator

@gaigaslab-operations Thanks for reporting this. Can you please provide more details of the environment ? like TAS version, so that we can try to reproduce. (We had verified in TAS 2.11 and 2.7 LTS, and this issue was not there.)

@gaigaslab-operations
Copy link
Author

TAS 2.11.8 LTS

@gaigaslab-operations
Copy link
Author

Since the problem is about authorization, how is the app getting a token? Is it hitting uaa.?

@gaigaslab-operations
Copy link
Author

@kashyap-splunk @JuergenSu
Can you send over a redacted version of your environment for the splunk-firehose-nozzle-1.2.3? (cf env)

@JuergenSu
Copy link

Hi, this is our env

cf7 env splunk-firehose-nozzle-1.2.3
Getting env variables for app splunk-firehose-nozzle-1.2.3 in org Splunk-Nozzle-Org / space splunk-nozzle-space as [*****]...
System-Provided:
VCAP_SERVICES: {}

VCAP_APPLICATION: {
 "application_id": "6ac0ee8f-b8f3-4469-aa2a-9f087f667026",
 "application_name": "splunk-firehose-nozzle-1.2.3",
 "application_uris": [],
 "cf_api": "[API ENDPOINT]",
 "limits": {
  "fds": 16384
 },
 "name": "splunk-firehose-nozzle-1.2.3",
 "organization_id": "685cac63-fd0e-4a43-af77-fd7ca5939a46",
 "organization_name": "splunk-nozzle-org",
 "space_id": "1e1f6592-5bd8-46d8-b38e-2cb72c424066",
 "space_name": "splunk-nozzle-space",
 "uris": [],
 "users": null
}

User-Provided:
ADD_APP_INFO: ["AppName", "OrgName", "OrgGuid", "SpaceName", "SpaceGuid"]
ADD_TAGS: false
API_ENDPOINT: [*****]
APP_CACHE_INVALIDATE_TTL: 60s
APP_LIMITS: 5000
AUTHOR: Shubham Jain
CC_HOST: [*****]
CF_ORG: splunk-nozzle-org
CF_SKIP_SSL: false
CF_SPACE: splunk-nozzle-space
CF_TARGET: [*****]
CLIENT_ID: splunk-nozzle
CLIENT_SECRET: [*****]
CONSUMER_QUEUE_SIZE: 10000
DROP_WARN_THRESHOLD: 1000
ENABLE_EVENT_TRACING: true
EVENTS: ["HttpStartStop", "LogMessage", "Error", "ContainerMetric"]
EXTRA_FIELDS: longevity:testing
FIREHOSE_KEEP_ALIVE: 25s
FIREHOSE_SUBSCRIPTION_ID: splunk-firehose
FLUSH_INTERVAL: 5s
GOPACKAGENAME: main
HEC_BATCH_SIZE: 250
HEC_RETRIES: 5
HEC_WORKERS: 9
IGNORE_MISSING_APP: false
LOGIN_HOST: [*****]
MISSING_APP_CACHE_INVALIDATE_TTL: 300s
NOZZLE_MEMORY: 256M
ORG_SPACE_CACHE_INVALIDATE_TTL: 300s
ROOT: $HOME
SCALE_OUT_NOZZLE: 12
SCHEME: https
SECURITY_USER_NAME: [*****]
SECURITY_USER_PASSWORD: [*****]
SKIP_SSL_VALIDATION_CF: false
SKIP_SSL_VALIDATION_SPLUNK: false
SPLUNK_HOST: [*****]
SPLUNK_INDEX: [*****]
SPLUNK_TOKEN: [*****]
STATUS_MONITOR_INTERVAL: 0s
UAA_HOST: [*****]
VERIFY_SSL: false

Running Environment Variable Groups:
FOUNDATION: DEV

No staging env variables have been set

and its running fine with 2.11.8-build.13

@kashyap-splunk
Copy link
Collaborator

@gaigaslab-operations for auth, app uses the Client ID/secret provided to get token from uaa. Here, the Client ID and secret pair is generated using either 'uaac' or deployment manifest. (For ref: pivotal docs or Github docs)

And, I have attached the env details of the setup I used for testing.
PCF Splunk nozzle test environment.pdf

Can you try generating new client id/secret pair using above steps and use them in your test/dev setup ?

@gaigaslab-operations
Copy link
Author

@kashyap-splunk We had already created a uaa oauth client and were using it with 1.2.2. When we upgraded to 1.2.3, that's where the problem started. When we backlevel to 1.2.2, it works again.

The differences between my environment and that of @JuergenSu are as follows:

(my configuration on the left, @JuergenSu in parenthesis following)

ADD_APP_INFO: ["AppName", "OrgName", "SpaceName"] (["AppName", "OrgName", "OrgGuid", "SpaceName", "SpaceGuid"])
APP_CACHE_INVALIDATE_TTL: 0s (60s)
APP_LIMITS: 0 (5000)
ENABLE_EVENT_TRACING: false (true)
FIREHOSE_SUBSCRIPTION_ID: (splunk-firehose)
FLUSH_INTERVAL: 2s (5s)
HEC_BATCH_SIZE: 100 (250)
HEC_RETRIES: 3 (5)
HEC_WORKERS: 8 (9)
IGNORE_MISSING_APP: true (false)
MISSING_APP_CACHE_INVALIDATE_TTL: 0s (300s)
ORG_SPACE_CACHE_INVALIDATE_TTL: 72h (300s)

I modified my config to match, restaged, and still had the problem.

My go buildpack is:
go_buildpack-cached-cflinuxfs3-v1.9.37+1634589969.zip

I validated that the CLIENT_ID and CLIENT_SECRET are correct by retrieving a token through curl:

curl -d 'grant_type=client_credentials&client_id=humio-splunk-nozzle&client_secret=***&token_format=jwt&response_type=token' https://uaa.sys.****/oauth/token
{"access_token":"","token_type":"bearer","expires_in":43199,"scope":"cloud_controller.admin_read_only doppler.firehose","jti":"2914ccb3b97e4f29967b38b34e8db2e7"}

Now, I do have special characters in the secret such as *!#$&@^. Is it possible that these are problematic?

@gaigaslab-operations
Copy link
Author

gaigaslab-operations commented Oct 26, 2021

That was it!

I changed the CLIENT_SECRET on the uaac client and in the config (to match) and we can now start and run 1.2.3.

There must have been a change in the CLIENT_SECRET handling so that special characters are no longer handled correctly.

It would be good to get this fixed or documented. Any valid UAA client secret should be valid in the firehose-nozzle

@kashyap-splunk
Copy link
Collaborator

Yes, I am able to reproduce this now with special characters. And I verified it was not there in 1.2.2.

We have not changed anything related to special characters handling. But from initial analysis, it seems that this is due to an issue in the specific version of the Oauth2 library used by 'cfclient' library we have used for authentication. (which was automatically determined by Go mods during migration from Glide). I will dig deeper and update.

@luckyj5
Copy link
Collaborator

luckyj5 commented Dec 21, 2021

Fixed and available with 1.2.4 release

@luckyj5 luckyj5 closed this as completed Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants