Remote hostnames that default to non-FQDNs are unreachable #5083

Closed
evenbrenden opened this issue Jun 9, 2021 · 18 comments

@evenbrenden

evenbrenden commented Jun 9, 2021

Version Information
Version of Akka.NET?

1.4.21-beta1

Which Akka.NET Modules?

  • Akka.Cluster
  • Akka.Remote

Describe the bug

As part of a host migration we need to use fully qualified domain names for our seed nodes, i.e. changing from

akka.cluster.seed-nodes = [ "akka.tcp://Cluster@host1:1234", "akka.tcp://Oddjob@Cluster@host2:1234" ]

to

akka.cluster.seed-nodes = [ "akka.tcp://Cluster@host1.domain.com:1234", "akka.tcp://Oddjob@Cluster@host2.domain.com:1234" ]

host1:1234 and host2:1234 are our Lighthouse services, and are both configured with

akka.remote.dot-netty.tcp {    
  port = 1234    
  hostname = ""
}

In our case, akka.remote.dot-netty.tcp.hostname defaults to host1 and host2 for their respective nodes, i.e. not FQDNs. With this configuration, the seed nodes are not found and the cluster does not come up. Setting the hostnames to the FQDNs host1.domain.com and host2.domain.com fixes this, but leaves us with a hardcoded set of hostnames and a separate configuration per node.

Expected behavior

My question is whether this is expected behaviour (given a host that does not provide an FQDN for itself) and/or whether there is a way to keep using hostname-agnostic configurations for akka.remote.dot-netty.tcp.

Actual behavior

An unreachable cluster. I am assuming that this also applies to remote non-seed nodes, i.e. a node on port 2345 is unreachable if its hostname is not an FQDN.

To Reproduce

Unfortunately I do not have a minimal example, but the changes described above should be sufficient to reproduce the problem.

Environment

  • Windows Server 2016 Standard
  • .NET 5.0
@Aaronontheweb
Member

Thanks @evenbrenden - this is exactly why we made 1.4.21-beta1 a beta. Looks to me like our new address parser must not be handling this case correctly. We'll get it fixed before 1.4.21 goes stable.

@Aaronontheweb
Member

Actually... This doesn't look like a bug with Akka.NET:

akka.cluster.seed-nodes = [ "akka.tcp://Cluster@host1.domain.com:1234", "akka.tcp://Oddjob@Cluster@host2.domain.com:1234" ]

Oddjob@Cluster@ is not a valid Uri. I've added a reproduction for parsing FQDNs in v1.4.21-beta1 and they parse fine, no problem. Our IPv4 address parsing would barf too if we couldn't support them.

Can you double check your HOCON and make sure that everything is in order there?
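To make the parsing point concrete, here is a minimal Python sketch of a strict `protocol://system@host:port` check. It is a hypothetical illustration, not Akka.NET's `Address.Parse` (which is built on .NET's `System.Uri`). The regex and the names `parse_akka_address` and `AkkaAddress` are assumptions chosen for the example.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, simplified grammar: exactly one '@' separates the actor system
# name from the host. This is NOT Akka.NET's Address.Parse.
_ADDRESS_RE = re.compile(
    r"^(?P<protocol>akka(?:\.[a-z]+)?)://"  # e.g. "akka.tcp"
    r"(?P<system>[^@/:]+)@"                 # actor system name; '@' not allowed
    r"(?P<host>[^@/:]+):"                   # hostname, FQDN or IPv4; '@' not allowed
    r"(?P<port>\d+)$"                       # TCP port
)


class AkkaAddress(NamedTuple):
    protocol: str
    system: str
    host: str
    port: int


def parse_akka_address(text: str) -> Optional[AkkaAddress]:
    """Return the parsed address, or None if `text` does not fit the simplified grammar."""
    m = _ADDRESS_RE.match(text)
    if m is None:
        return None
    return AkkaAddress(m["protocol"], m["system"], m["host"], int(m["port"]))


print(parse_akka_address("akka.tcp://Cluster@host1.domain.com:1234"))
print(parse_akka_address("akka.tcp://Oddjob@Cluster@host2.domain.com:1234"))  # None
```

An FQDN host is just a dotted label sequence, so it parses like any short name. The doubled `@` is what makes the second seed-node string invalid.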

Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue Jun 9, 2021
@object
Contributor

object commented Jun 9, 2021

@Aaronontheweb I believe @evenbrenden made a typo when writing the address in the issue. Here's what changed in our seed-nodes configuration:

BEFORE:
seed-nodes = [ "akka.tcp://Oddjob@maodatest01:1963", "akka.tcp://Oddjob@maodatest02:1963" ]

AFTER:
seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]

@evenbrenden can you double-check the issue and correct the addresses if necessary?

@Aaronontheweb
Member

ok, let me give that a shot in my reproduction.

Worth noting: the Address.Parse code we use to parse these values for Akka.Cluster wasn't actually changed in v1.4.21. It still relies on System.Uri parsing under the covers and isn't used in the Akka.Remote deserialization pipeline.

@Aaronontheweb
Member

Yep, that line works just fine in my reproduction spec also.

@Aaronontheweb
Member

@object @evenbrenden do you have some logs to go along with this error? It's possible that the issue here is a DNS reachability problem, in which case you might want to try:

akka.remote.dot-netty.tcp {    
  port = 1234    
  hostname = "0.0.0.0"
  public-hostname = "maodatest02.felles.ds.nrk.no"
}
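The config above separates two concerns. `hostname` is where the listening socket binds, and `public-hostname` is the host the node writes into its own `akka.tcp://System@host:port` address, which other nodes must use verbatim. The Python model below is only a simplification. The function name is hypothetical, and the fallback to `hostname` when `public-hostname` is empty is an assumption, not something confirmed from Akka.NET's source.

```python
from typing import Tuple


def bind_and_advertise(hostname: str, public_hostname: str) -> Tuple[str, str]:
    """Hypothetical model: return (bind_host, advertised_host).

    - bind_host: where the socket listens; "0.0.0.0" means all IPv4 interfaces.
    - advertised_host: the host that goes into this node's own address, so it
      must match what other nodes use (e.g. in seed-nodes).
    ASSUMPTION: an empty public_hostname falls back to hostname.
    """
    advertised = public_hostname if public_hostname else hostname
    return hostname, advertised


bind, advertised = bind_and_advertise("0.0.0.0", "maodatest02.felles.ds.nrk.no")
print(f"bind={bind} advertise={advertised}")
```

Binding to `0.0.0.0` while advertising the FQDN means the socket accepts connections on every interface. The address other nodes see is still the fully qualified name.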

Can you also check if maodatest02.felles.ds.nrk.no and maodatest02 resolve to the same IP address on:

  • The machine binding to it
  • The machine connecting to it

That can be another fun source of DNS trouble. I've seen situations where the resolution differs depending on where it's performed (last time it was the result of a Kubernetes DNS caching error).
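A quick way to run this check is to resolve both names on each machine and compare the results. This is a Python sketch using the standard `socket.getaddrinfo`. The helper names are hypothetical, and only IPv4 results are compared, which is an assumption that fits the `0.0.0.0` binding above.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(name: str) -> Set[str]:
    """All IPv4 addresses the local resolver returns for `name`."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}  # info[4] is the (address, port) sockaddr


def same_addresses(short_name: str, fqdn: str,
                   resolve: Callable[[str], Set[str]] = resolve_ipv4) -> bool:
    """True if both names resolve to exactly the same set of IPv4 addresses."""
    return resolve(short_name) == resolve(fqdn)


# Run on both the binding machine and the connecting machine, e.g.:
# same_addresses("maodatest02", "maodatest02.felles.ds.nrk.no")
```

The `resolve` parameter exists so the comparison can be exercised without real DNS. In practice the default resolver is what matters, because the result depends on where it runs.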

Aaronontheweb added a commit that referenced this issue Jun 9, 2021
@evenbrenden
Author

Can you double check your HOCON and make sure that everything is in order there?

Was trying to make a concise example, but it backfired :) Yes, that is a typo.

akka {
  remote.dot-netty.tcp {
    port = 1963    
    hostname = "0.0.0.0"
    public-hostname = "maodatest01.felles.ds.nrk.no"
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

This partly works, but most nodes are struggling. I can't be sure why; the logs show some associations followed by disassociations. But this certainly works:

akka {
  remote.dot-netty.tcp {    
    port = 1963    
    hostname = "maodatest01.felles.ds.nrk.no"
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

This does not work:

akka {
  remote.dot-netty.tcp {    
    port = 1963    
    hostname = "maodatest01"
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

And neither does this:

akka {
  remote.dot-netty.tcp {    
    port = 1963    
    hostname = "" # defaults to machine hostname
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

I believe - but can't really confirm - that the machine defaults to a hostname that is not fully qualified. Could that cause the nodes to be unreachable, even if the machines can actually reach each other without FQDNs?
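One way to confirm what the machine reports about itself is to compare its plain host name with its fully qualified name. This Python sketch uses the standard `socket` module as a stand-in. The .NET calls that Akka.NET or DotNetty actually make may behave differently, and `socket.getfqdn()` depends on the resolver's reverse lookup. `is_fully_qualified` is a hypothetical heuristic that treats any dotted, non-IP name as qualified.

```python
import ipaddress
import socket


def is_fully_qualified(name: str) -> bool:
    """Heuristic: a dotted DNS name (not an IP literal) counts as fully qualified."""
    name = name.rstrip(".")
    try:
        ipaddress.ip_address(name)
        return False  # an IP literal is not a DNS name at all
    except ValueError:
        pass
    return "." in name


def hostname_report() -> dict:
    short = socket.gethostname()  # what a bare host-name lookup returns
    fqdn = socket.getfqdn()       # resolver's best guess at the qualified name
    return {"hostname": short, "fqdn": fqdn,
            "hostname_is_fqdn": is_fully_qualified(short)}
```

Calling `hostname_report()` on each node should show whether the default name is the short form (e.g. `maodatest01`) rather than the FQDN used in seed-nodes.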

Can you also check if maodatest02.felles.ds.nrk.no and maodatest02 resolve to the same IP address on:

  • The machine binding to it
  • The machine connecting to it

Can confirm that the IPs are the same for these two, in both directions.

I'll try to gather some logs, but at first glance there are AssociationErrors everywhere, as expected.

@Aaronontheweb
Member

Ah ok, I see what's going on here:

akka {
  remote.dot-netty.tcp {    
    port = 1963    
    hostname = "maodatest01"
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

This won't work because Akka.Remote has no idea that maodatest01.felles.ds.nrk.no and maodatest01 are the same address, so you should see some "dropping message ... for non-local recipient" messages in the Akka.Remote logs.

akka {
  remote.dot-netty.tcp {    
    port = 1963    
    hostname = "maodatest01.felles.ds.nrk.no"
  }
  cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}

This is the right way to do it: Akka.Remote can tell that the address the node binds to and the address other nodes use to reach it are the same in this scenario.
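The difference between the two configs can be modelled as a plain equality check. A node accepts a message only if the recipient address matches one of its own addresses, and no DNS lookup is involved. This Python sketch is a simplified model for illustration, not Akka.Remote's actual code, and the `Address` and `accepts` names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Address(NamedTuple):
    protocol: str
    system: str
    host: str
    port: int


def accepts(recipient: Address, local_addresses: Iterable[Address]) -> bool:
    """Simplified model: deliver locally only if the recipient address equals one
    of this node's own addresses, compared field by field as plain strings/ints.
    No DNS resolution happens, so 'maodatest01' != 'maodatest01.felles.ds.nrk.no'."""
    return recipient in set(local_addresses)


seed = Address("akka.tcp", "Oddjob", "maodatest01.felles.ds.nrk.no", 1963)

short_name_node = [Address("akka.tcp", "Oddjob", "maodatest01", 1963)]
fqdn_node = [Address("akka.tcp", "Oddjob", "maodatest01.felles.ds.nrk.no", 1963)]

print(accepts(seed, short_name_node))  # False: treated as a non-local recipient
print(accepts(seed, fqdn_node))        # True
```

Both names may resolve to the same IP, but the comparison never gets that far.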

@evenbrenden
Author

So unless we can get the hostname = "" configuration to default to exactly what is listed as the host in seed-nodes, we'll need an explicit, separate hostname configuration per node, right? For hostname = "", what does Akka.NET call under the hood to get the machine's hostname? I guess that might work on some machines/interfaces but not on others.

@Aaronontheweb
Member

Depends on the transport, but if that value is not set, Akka.NET will usually call Dns.GetHostName() and pick the first item off the top of the list. But yes, Akka.NET wants all of the hostnames to match 1:1.
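The fallback described here can be modelled in a few lines. An empty `hostname` is replaced by whatever the machine reports as its own host name, which is often the short form. This Python sketch uses `socket.gethostname()` as a stand-in for the .NET call. Exactly which call is made, and in which layer (Akka.NET or the transport), is transport-dependent and is not confirmed here. `effective_hostname` is a hypothetical name.

```python
import socket
from typing import Callable


def effective_hostname(configured: str,
                       get_host_name: Callable[[], str] = socket.gethostname) -> str:
    """Hypothetical model: an explicit hostname wins; an empty one falls back to
    the machine's own host name (often 'maodatest01' rather than the FQDN)."""
    return configured if configured else get_host_name()
```

If the fallback yields `maodatest01` while seed-nodes list `maodatest01.felles.ds.nrk.no`, the two no longer match 1:1. That mismatch is the failure described in this thread.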

@object
Contributor

object commented Jun 10, 2021

@Aaronontheweb so the only way to use an FQDN is to explicitly set hostname (or public-hostname) in the Akka HOCON, which requires setting these values during deployment, something that would be great to avoid. Is it possible to adopt a convention that tells Akka to use the FQDN as the hostname? For example:

In addition to

akka.remote.dot-netty.tcp {
  port = 1234
  hostname = "0.0.0.0"
  public-hostname = "maodatest02.felles.ds.nrk.no"
}

also support

akka.remote.dot-netty.tcp {
  port = 1234
  hostname = "0.0.0.0"
  public-hostname = "" # defaults to fully qualified machine domain name
}

I.e. if hostname is set to "0.0.0.0" and public-hostname is set to an empty string, Akka would concatenate the DNS host name with the domain name.
If this sounds reasonable, we can make a PR.

What do you think, Aaron?
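The proposed convention boils down to joining the machine's bare host name with its DNS domain. This is a minimal Python sketch of that joining step, using a hypothetical helper `fully_qualified`. On .NET, the two inputs could plausibly come from `System.Net.NetworkInformation.IPGlobalProperties.GetIPGlobalProperties()` (its `HostName` and `DomainName` properties). The sketch simply takes them as arguments.

```python
def fully_qualified(host: str, domain: str) -> str:
    """Append the DNS domain to a bare host name, unless no domain is known or
    the host is already qualified with that domain (case-insensitive)."""
    host = host.rstrip(".")
    domain = domain.strip(".")
    if not domain or host.lower().endswith("." + domain.lower()):
        return host
    return f"{host}.{domain}"
```

Leaving the name untouched when no domain is known mirrors the current behaviour instead of inventing a suffix. The already-qualified check keeps the helper idempotent.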

@object
Contributor

object commented Jun 10, 2021

But I see that Akka.NET itself doesn't call Dns.GetHostName. This must be happening inside DotNetty, which is abandoned (sigh). So perhaps this is more complicated than I first thought.

@ismaelhamed
Member

@object just an idea, but you could read an environment variable with the correct FQDN and programmatically override the public-hostname during config loading.
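A sketch of that idea: read the FQDN from an environment variable and turn it into a one-line HOCON override. The variable name `AKKA_PUBLIC_HOSTNAME` and the helper name are hypothetical. In Akka.NET the resulting string would typically be parsed with `ConfigurationFactory.ParseString(...)` and placed in front of the base config via `.WithFallback(...)`. The Python below only builds the snippet.

```python
import json
import os
from typing import Mapping, Optional


def public_hostname_override(env: Optional[Mapping[str, str]] = None,
                             var: str = "AKKA_PUBLIC_HOSTNAME") -> str:
    """Build a HOCON line overriding public-hostname from an environment variable
    (hypothetical name). Returns "" when the variable is unset or blank, so the
    base configuration is left untouched."""
    env = os.environ if env is None else env
    value = env.get(var, "").strip()
    if not value:
        return ""
    # json.dumps yields a double-quoted, escaped string; HOCON quoted strings
    # follow JSON string syntax, so this is valid HOCON.
    return f"akka.remote.dot-netty.tcp.public-hostname = {json.dumps(value)}"
```

The deployment then only sets one variable per machine, while the checked-in HOCON stays identical across nodes.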

@object
Contributor

object commented Jun 10, 2021

@ismaelhamed Yes, that's probably the easiest in our case. Thanks for the tip.

@Aaronontheweb
Member

Aaronontheweb commented Jun 10, 2021

@object just an idea, but you could read an environment variable with the correct FQDN and programmatically override the public-hostname during config loading.

This is what we do ourselves.

Edit: not just for FQDNs - any hostname binding when we're running in production environments.

@object
Contributor

object commented Jun 10, 2021

Actually, we are already doing the same, so it's just a matter of modifying our code slightly.

Then IMHO it would be difficult (and probably unreasonable) to intercept DotNetty settings, so we should do it in code. Should we close the issue, @evenbrenden?

@evenbrenden
Author

I agree. As long as we need something other than a static config, we might as well handle it on the application side.

Thanks @Aaronontheweb @object @ismaelhamed!

@Zetanova
Contributor

Mixing hostnames and FQDNs is already a multihomed setup.
Related discussion: #4993
