Stuck Write Message when there is a issue "multiple Read calls return no data or error" #426

anhtuanluu36 · 2020-04-09T05:36:17Z

I have a strange issue
Reproduce Steps:

I call w.WriteMessages(ctx, msgs...)
I have an error "multiple Read calls return no data or error"
I call w.WriteMessages(ctx, msgs...) and it is stuck here

Here is my code:

 kafka.NewWriter(kafka.WriterConfig{
				Brokers:  brokers,
				Topic:    topic,
				Balancer: &kafka.Hash{},
				Dialer: &kafka.Dialer{
					Timeout:   10 * time.Second,
					DualStack: true,
					TLS:       tlsConfig,
				},
			})

Sometimes I see this error on Kafka:

 [2020-04-09 14:05:40,353] WARN Failed to send SSL Close message  (org.apache.kafka.common.network.SslTransportLayer)
java.io.IOException: Broken pipe
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
	at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
	at org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:210)
	at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:173)
	at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:718)
	at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:61)
	at org.apache.kafka.common.network.Selector.doClose(Selector.java:748)
	at org.apache.kafka.common.network.Selector.close(Selector.java:736)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:534)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
	at kafka.network.Processor.poll(SocketServer.scala:665)
	at kafka.network.Processor.run(SocketServer.scala:582)
	at java.base/java.lang.Thread.run(Thread.java:834)

Do you have any ideas?

The text was updated successfully, but these errors were encountered:

anhtuanluu36 · 2020-04-10T06:58:18Z

I think it relate to use SSL and send a lot of messages at the same time. It works fine when I only push 5 messages with the method "w.WriteMessages"

pandaadb · 2020-08-19T07:37:02Z

What version are you on?

I ran into a recent problem where any error on writing caused the driver to retry all messages. However the driver also wrote all errors to a dedicated error channel. Since the channel was read by the main routine, the error was immediately returned, however the go routine retrying the write request on the next error was stuck waiting on the channel to be read (which would never happen, because the reading routine had returned). This caused the writing to hang forever.

orn1983 · 2020-08-25T20:17:22Z

The same thing just happened to me, I believe. I had a network fault where the leader data for Kafka was stale. Then I saw this error, and my writer thread seemed to hang forever.

2020-08-25 14:47:37.713482500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): [6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date
2020-08-25 14:47:37.814596500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): multiple Read calls return no data or error
2020-08-25 14:47:38.214899500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): multiple Read calls return no data or error

Presumably I've encountered the same error as you have. I haven't been able to dig down into it just yet, but it looks like the same problem.

Edit: It was the same problem. Got fixed by updating.

achille-roussel · 2020-09-29T09:16:37Z

Hello, thank you all for reporting.

Would you be able to mention which version of kafka-go you're experiencing the issue with?

NitroCao · 2021-08-26T09:48:36Z

I also encountered this problem. The version of kafka-go is 0.4.17, the version of kafka is 2.13-2.8.0. I triggered the problem when writing plenty of data. My kafka doesn't config TLS.

achille-roussel · 2021-08-31T22:29:56Z

@pandaadb do you have more details to share about this issue? Is the channel you are referring to managed within kafka-go?

I ran into a recent problem where any error on writing caused the driver to retry all messages. However the driver also wrote all errors to a dedicated error channel. Since the channel was read by the main routine, the error was immediately returned, however the go routine retrying the write request on the next error was stuck waiting on the channel to be read (which would never happen, because the reading routine had returned). This caused the writing to hang forever.

vmihailenco · 2021-09-17T09:21:35Z

Hard to say if it is the same issue, but I get multiple Read calls return no data or error from batch.Close when closing a kafka.Batch with unread messages.

orn1983 · 2022-01-30T18:18:08Z

This is a recurring issue for me, though I am not able to reproduce it manually. It seems to always be the came cadence. Starts out with:

2022-01-29 16:47:30.393884500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): [6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date

Followed by:

2022-01-29 16:47:30.496844500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): multiple Read calls return no data or error
2022-01-29 16:47:30.897083500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): multiple Read calls return no data or error

There seems to have been a network fault and one of the zookeeper nodes lost connectivity to the cluster, to reconnect seconds later. This triggered a re-election and for some reason the client seems to be unaware of it and tries to write to a follower. Upon failing, it just gives up.

When I try to recreate this, the client always seems to just be aware of all of this and just recover. I can't figure out what's different.

Am I missing some obvious thing where I should be performing some manual action?

orn1983 · 2022-01-30T18:56:07Z

I see I wrote above (August 25th) that it got fixed by updating, but apparently I just wasn't able to recreate it.

achille-roussel · 2022-02-15T18:12:13Z

Network connectivity issues can manifest in complex ways, I think the work in #846 will help better surface the conditions that triggered those errors.

NehaSachan · 2023-01-09T05:34:15Z

i also got this same issue-
[6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date },

github.com/segmentio/kafka-go v0.4.38

There is no load in the kafka consumer OR publisher as such.

anhtuanluu36 added the bug label Apr 9, 2020

anhtuanluu36 mentioned this issue Apr 9, 2020

Consumer group offset getting reset occationaly #410

Closed

abuchanan-nr mentioned this issue May 21, 2020

Producer stuck on TCP connection error #451

Closed

achille-roussel self-assigned this Sep 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stuck Write Message when there is a issue "multiple Read calls return no data or error" #426

Stuck Write Message when there is a issue "multiple Read calls return no data or error" #426

anhtuanluu36 commented Apr 9, 2020 •

edited by achille-roussel

Loading

anhtuanluu36 commented Apr 10, 2020

pandaadb commented Aug 19, 2020

orn1983 commented Aug 25, 2020 •

edited by achille-roussel

Loading

achille-roussel commented Sep 29, 2020

NitroCao commented Aug 26, 2021 •

edited

Loading

achille-roussel commented Aug 31, 2021

vmihailenco commented Sep 17, 2021

orn1983 commented Jan 30, 2022 •

edited by achille-roussel

Loading

orn1983 commented Jan 30, 2022

achille-roussel commented Feb 15, 2022

NehaSachan commented Jan 9, 2023 •

edited

Loading

Stuck Write Message when there is a issue "multiple Read calls return no data or error" #426

Stuck Write Message when there is a issue "multiple Read calls return no data or error" #426

Comments

anhtuanluu36 commented Apr 9, 2020 • edited by achille-roussel Loading

anhtuanluu36 commented Apr 10, 2020

pandaadb commented Aug 19, 2020

orn1983 commented Aug 25, 2020 • edited by achille-roussel Loading

achille-roussel commented Sep 29, 2020

NitroCao commented Aug 26, 2021 • edited Loading

achille-roussel commented Aug 31, 2021

vmihailenco commented Sep 17, 2021

orn1983 commented Jan 30, 2022 • edited by achille-roussel Loading

orn1983 commented Jan 30, 2022

achille-roussel commented Feb 15, 2022

NehaSachan commented Jan 9, 2023 • edited Loading

anhtuanluu36 commented Apr 9, 2020 •

edited by achille-roussel

Loading

orn1983 commented Aug 25, 2020 •

edited by achille-roussel

Loading

NitroCao commented Aug 26, 2021 •

edited

Loading

orn1983 commented Jan 30, 2022 •

edited by achille-roussel

Loading

NehaSachan commented Jan 9, 2023 •

edited

Loading