This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

Fix nslookup cannot work well in initContainerTemplate #216

Merged
k8s-ci-robot merged 1 commit into kubeflow:master from hougangliu:fix-worker-init on Sep 10, 2019

Conversation

hougangliu
Member

A PyTorchJob worker's initContainer checks whether the master pod is up by running nslookup in a loop. However, the nslookup in the default image, busybox:1.31.0, seems too old to work reliably. On ppc64le its exit code is always 1, even when it resolves the master service's DNS name. On amd64 it is unstable: as the session below shows, repeated lookups of the same name sometimes succeed and sometimes fail. When I change the image to alpine:3.10, it works well on both amd64 and ppc64le.

/ # nslookup katib-suggestion-hyperband
Server:         10.0.0.10
Address:        10.0.0.10:53

Name:   katib-suggestion-hyperband.kubeflow.svc.cluster.local
Address: 10.0.223.142

*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer
*** Can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer

/ # echo $?
0
/ # nslookup katib-suggestion-hyperband
Server:         10.0.0.10
Address:        10.0.0.10:53

** server can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: NXDOMAIN

*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer
*** Can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer

/ # echo $?
1
/ # nslookup katib-suggestion-hyperband
Server:         10.0.0.10
Address:        10.0.0.10:53

** server can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: NXDOMAIN

*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer
*** Can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer

/ # echo $?
1
/ # nslookup katib-suggestion-hyperband
Server:         10.0.0.10
Address:        10.0.0.10:53

Name:   katib-suggestion-hyperband.kubeflow.svc.cluster.local
Address: 10.0.223.142

*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer
*** Can't find katib-suggestion-hyperband.kubeflow.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.svc.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.cluster.local: No answer
*** Can't find katib-suggestion-hyperband.fyre.ibm.com: No answer

/ # echo $?
0
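
For context, the kind of wait loop affected by this behaviour can be sketched roughly as follows. This is a minimal illustration, not the operator's actual initContainerTemplate: the function name `wait_for_dns` and the `LOOKUP_CMD` override are hypothetical and exist only to make the sketch easy to try out. Because the loop trusts the lookup command's exit status, an nslookup that returns 1 despite resolving the name would keep a worker waiting until the attempts run out.

```shell
# Sketch: retry a DNS lookup until it succeeds or the attempts run out.
# wait_for_dns and LOOKUP_CMD are hypothetical names used for illustration.
# Usage: wait_for_dns NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_dns() {
  name="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Success is decided purely by the exit status of the lookup command.
    if ${LOOKUP_CMD:-nslookup} "$name" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

An init container could then run something like `wait_for_dns "$MASTER_ADDR" 60 2`, where `MASTER_ADDR` is an assumed variable holding the master service name.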

@hougangliu
Member Author

/cc @johnugeorge

Member

@gaocegege gaocegege left a comment

/lgtm
/assign @johnugeorge

@coveralls

Coverage Status

Coverage remained the same at 85.345% when pulling 7c7eeaf on hougangliu:fix-worker-init into c53647c on kubeflow:master.

@johnugeorge
Member

Thanks @hougangliu
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

@k8s-ci-robot k8s-ci-robot merged commit dfbbc18 into kubeflow:master Sep 10, 2019

5 participants