[fromtree] bluetooth: fix bug when destroying tx buffers on disconnected #3026

alwa-nordic · 2025-07-07T11:18:27Z

Review guide:

The only purpose of this PR is to cherry-pick commit from upstream #90975.

So why are there 5 commits in this PR?

f9e37e5 Revert "[nrf noup] bluetooth: conn: Skip buffer ref count check in send_buf"
5394bcc Revert "[nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2cap_send_pdu"
8622c2d [nrf fromtree] bluetooth: fix bug when destroying tx queue buffers on disconnect
2aabbe6 [nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2cap_send_pdu
3f54299 [nrf noup] bluetooth: conn: Skip buffer ref count check in send_buf

The commit in the middle is the cherry-pick we want. The NCS CI required that the diff of all 'fromtree' is the same as the upstream commit. This means it's not possible to alter it, and also precludes a manual conflict resolution when applying that commit.

The upstream commit did not cherry-pick without conflict. It conflicts with some noup commits we have downstream. To resolve this will move the noup changes to after the cherry-pick, by reverting them and reapplying them, doing a manual conflict resolution in the noup changes.

So what do I need to review?

All reverted noup commits are reapplied and the merge is sensible.
That the fromtree cherry-pick is not a bad merge. The CI guarantees that the merge was automatic. But an automatic merge is still is a merge that may be broken.

Guide to the noup changes

[nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2cap_send_pdu

git range-diff --creation-factor=100 685c53acebd503a6931b8c2811d8a1e74e0e4828^! 2aabbe60c70ab4f3f1a65bc3e9e7764322db2df0^!

The diff-diff is a bit unwieldly because of the comments in the code. I have cheated and removed all diff-diff to do with code comments, and get the below:

1:  685c53acebd5 ! 1:  2aabbe60c70a [nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2cap_send_pdu
@@ Commit message

     Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no>
     (cherry picked from commit d6539f49a567b8539781356ecd03c749702b28ba)
+    (cherry picked from commit 685c53acebd503a6931b8c2811d8a1e74e0e4828)

  ## subsys/bluetooth/host/conn.c ##
 @@ subsys/bluetooth/host/conn.c: static int send_buf(struct bt_conn *conn, struct net_buf *buf,

  	uint16_t frag_len = MIN(conn_mtu(conn), len);

--	__ASSERT_NO_MSG(buf->ref == 1);
-+	if (buf->ref > 1 + (cb ? 1 : 0)) {
+-	__ASSERT_NO_MSG((buf->ref == 1) || (buf->ref == 2));
++	if (buf->ref > 2 + (cb ? 1 : 0)) {
 +		__ASSERT_NO_MSG(false);
 +	}

What you see there is that the previous version of this noup changes an assertion to an if-statement, adding + (cb ? 1 : 0) to what's already there. The new version of the noup does the same, but it converts the new || (buf->ref == 2) to just bumping up the number to 2. This should be equivalent.

[nrf noup] bluetooth: conn: Skip buffer ref count check in send_buf

git range-diff --creation-factor=100 827eb444b1947eb198d4dad28f40ab38e5d3f1eb^! 3f5429916746cc270c43fe6a4c45b78d43fcf5f8^!

Again, with the comment-stuff removed:

1:  827eb444b194 ! 1:  3f5429916746 [nrf noup] bluetooth: conn: Skip buffer ref count check in send_buf
@@ Commit message

     Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no>
     (cherry picked from commit e589307758ca108bd1d3f12501418e0d1d892f5c)
+    (cherry picked from commit 827eb444b1947eb198d4dad28f40ab38e5d3f1eb)

  ## subsys/bluetooth/host/conn.c ##
 @@ subsys/bluetooth/host/conn.c: static int send_buf(struct bt_conn *conn, struct net_buf *buf,

  	uint16_t frag_len = MIN(conn_mtu(conn), len);

--	if (buf->ref > 1 + (cb ? 1 : 0)) {
+-	if (buf->ref > 2 + (cb ? 1 : 0)) {
 -		__ASSERT_NO_MSG(false);
 -	}

-+	__ASSERT_NO_MSG(IS_ENABLED(CONFIG_BT_ATT_SENT_CB_AFTER_TX) || (buf->ref == 1));
++	__ASSERT_NO_MSG(IS_ENABLED(CONFIG_BT_ATT_SENT_CB_AFTER_TX) || (buf->ref == 1) ||
++			(buf->ref == 2));

This noup is removing the if-condition and replacing it with an assertion again, so we convert the > 2 back to a || (buf->ref == 2) in the assertion.

The validity of the cherry-pick itself

You can look at the upstream PR to understand the change and note any comments that could indicate potential problems.

It's worth noting that the upstream PR has a test that has not been cherry-picked in this PR. I did not put much though into this other than wanting to keep the cherry-pick small. Less is more. But maybe cherry-picking this test will increase our confidence for this fix?

…nd_buf" This reverts commit 827eb44. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>

…p_send_pdu" This reverts commit 685c53a. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>

… disconnect Channel tx_queue purging on disconnect was inconsistently handled by the different channels: iso, l2cap, l2cap_br. iso channels handled purging in the tx_data_pull hook. l2cap and l2cap_br did the purging in channel delete functions and did not expect tx_data_pull to be called for a disconnected channel. Their data_pull functions could return a ptr to a net_buf that was still on the tx_queue, which is problematic when the conn tx_processor unrefs the returned buffer resulting in multiple calls to the buf destroy function. To make things consistent and correct, remove the code that tries to purge tx_queues in the tx_processor and only do purging in the channels themselves when they are deleted/disconnected. Also refactor and clarify referencing of the net_buf returned by tx_data_pull. It was confusing who had a reference and when, which could vary depending on the length of the original buffer. There are three cases: the buffer length is less than the tx.mps, greater the mps but less than the mtu so requiring segementation but not fragmentation, or greater than both mps and mtu so requiring both segmentation and fragmentation. The conn layer would increase the refcnt if the length was greater than the mtu, but not have any awareness of whether the net_buf was still on the tx_queue or not. Now it is the tx_data_pull callbacks responsibitity to increment the reference count if it is returning a pointer to a net_buf that it is still keeping on the tx_queue for segmentation purposes. The conn layer will now always transfer that reference into a fragment view and not conditional it on the length relative to the mtu, and always decrement the reference to the parent when the fragment is destroyed. So there is no risk of decrementing a reference to a net buf that might still be on a tx_queue, which simplifies error handling in particular. Also add error handling paths for when asserts are not enabled. Signed-off-by: Mike J. Chen <mjchen@google.com> (cherry picked from commit a392c33c60286574b74fe708e1a182dd99666c8d)

Allow for an additional buffer reference if callback is provided. This can be used to extend lifetime of the net buffer until the data transmission is confirmed by ACK of the remote. Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no> (cherry picked from commit d6539f4) (cherry picked from commit 685c53a)

If ATT sent callback is delayed until data transmission is done by BLE controller, the transmitted buffer may have an additional reference. The reference is used to extend lifetime of the net buffer until the data transmission is confirmed by ACK of the remote. send_buf function can be called multiple times, if buffer has to be fragmented over HCI. In that case, the callback is provided as an argument only for the last transmitted fragment. The `buf->ref == 1` check is skipped because it's impossible to properly validate number of references for the sent fragments if buffers may have the additional reference. Jira: NCSDK-28624 Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no> (cherry picked from commit e589307) (cherry picked from commit 827eb44)

ludvigsj

Looks okay. I didn't do a detailed review of the fromtree commit itself, seeing as this has already been through a thorough review upstream. Handling of the noups looks like it's doing the correct thing as far as I can tell.

KyraLengfeld · 2025-07-10T09:31:02Z

subsys/bluetooth/host/l2cap.c

@@ -729,13 +729,17 @@ int bt_l2cap_send_pdu(struct bt_l2cap_le_chan *le_chan, struct net_buf *pdu,
 		return -ENOTCONN;
 	}

-	if (pdu->ref != 1) {
+	/* Allow for an additional buffer reference if callback is provided. This can be used to


I may have misunderstood your description, but could you elaborate a bit why make this change in this PR and what the change is fixing? While the change itself (increasing the reference count possible if there is a callback) does seem sensible, I am not finding why this is added here in particular, and also whether we have some test in place ensuring this doesn't cause resource issues in some other end?

Hi
The noups are related to NCS-specific (not accepted in upstream Zephyr) workaround for HID report flow over Bluetooth LE. This PR does not show the complete use-case for the noups. For more details, refer to:

The related Jira bug ticket: https://nordicsemi.atlassian.net/browse/NCSDK-27422
PRs introducing fixes to the NCS: nrfconnect/sdk-nrf#16027 and #1819

KyraLengfeld · 2025-07-10T09:31:06Z

subsys/bluetooth/host/conn.c

-	 * the only reference (e.g. buf was removed from the conn
-	 * tx_queue). It would be 2 if the tx_data_pull kept it on
-	 * the tx_queue for segmentation.
+	/* If ATT sent callback is delayed until data transmission


Nitpick: "letter count" (forgot proper word) in one line had been increased. So better to go with the new one. Probably you can just have two long lines and run clang-format.

KyraLengfeld · 2025-07-10T09:31:12Z

@alwa-nordic Have you looked more closely at the test you mention in the description? Because if it is relatively simple to cherry-pick, I'd say we should (we may have nooup changes in some other corner affecting the fix, or will have in the future, so better to have the test also downstream than not imo). Ofc, if the test needs to be amended or drags in lots of other changes, we can decide to rely on it being upstream only, guess we also want to cherry-pick this fix quick.

Add early_disconnect test cases for iso/cis and l2cap/stress to test the bluetooth stack handling of disconnects while transmit is still in progress. Signed-off-by: Mike J. Chen <mjchen@google.com> (cherry picked from commit 93997dab8eaa4777351d6d00df49695a35d41c76)

alwa-nordic · 2025-07-10T13:06:42Z

I cherry-picked the test.

MarekPieta · 2025-07-11T09:06:55Z

subsys/bluetooth/host/conn.c

-	if (buf->ref > 2 + (cb ? 1 : 0)) {
-		__ASSERT_NO_MSG(false);
-	}
+	__ASSERT_NO_MSG(IS_ENABLED(CONFIG_BT_ATT_SENT_CB_AFTER_TX) || (buf->ref == 1) ||


Can we squash the 2 noups? This one reverts part of the first noup

Discussed offline with @alwa-nordic . Will not squash in scope of this PR.

MarekPieta

Change looks good. Tested locally, no regression found

alwa-nordic and others added 5 commits July 7, 2025 12:40

Revert "[nrf noup] bluetooth: conn: Skip buffer ref count check in se…

f9e37e5

…nd_buf" This reverts commit 827eb44. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>

Revert "[nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2ca…

5394bcc

…p_send_pdu" This reverts commit 685c53a. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>

NordicBuilder mentioned this pull request Jul 7, 2025

manifest: sdk-zephyr: [fromtree] bluetooth: fix bug when destroying tx buffers on disconnected nrfconnect/sdk-nrf#23156

Open

alwa-nordic requested review from MarekPieta and ludvigsj July 8, 2025 11:54

ludvigsj approved these changes Jul 10, 2025

View reviewed changes

KyraLengfeld self-requested a review July 10, 2025 08:48

KyraLengfeld reviewed Jul 10, 2025

View reviewed changes

MarekPieta reviewed Jul 11, 2025

View reviewed changes

MarekPieta approved these changes Jul 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[fromtree] bluetooth: fix bug when destroying tx buffers on disconnected #3026

[fromtree] bluetooth: fix bug when destroying tx buffers on disconnected #3026

alwa-nordic commented Jul 7, 2025 •

edited

Loading

Uh oh!

ludvigsj left a comment

Uh oh!

KyraLengfeld Jul 10, 2025 •

edited

Loading

Uh oh!

MarekPieta Jul 11, 2025

Uh oh!

KyraLengfeld Jul 10, 2025 •

edited

Loading

Uh oh!

KyraLengfeld commented Jul 10, 2025

Uh oh!

alwa-nordic commented Jul 10, 2025

Uh oh!

MarekPieta Jul 11, 2025

Uh oh!

MarekPieta Jul 11, 2025

Uh oh!

MarekPieta left a comment

Uh oh!

Uh oh!

[fromtree] bluetooth: fix bug when destroying tx buffers on disconnected #3026

Are you sure you want to change the base?

[fromtree] bluetooth: fix bug when destroying tx buffers on disconnected #3026

Conversation

alwa-nordic commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review guide:

So why are there 5 commits in this PR?

So what do I need to review?

Guide to the noup changes

[nrf noup] bluetooth: conn: Allow for an extra ref in bt_l2cap_send_pdu

[nrf noup] bluetooth: conn: Skip buffer ref count check in send_buf

The validity of the cherry-pick itself

Uh oh!

ludvigsj left a comment

Choose a reason for hiding this comment

Uh oh!

KyraLengfeld Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MarekPieta Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

KyraLengfeld Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KyraLengfeld commented Jul 10, 2025

Uh oh!

alwa-nordic commented Jul 10, 2025

Uh oh!

MarekPieta Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

MarekPieta Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

MarekPieta left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alwa-nordic commented Jul 7, 2025 •

edited

Loading

KyraLengfeld Jul 10, 2025 •

edited

Loading

KyraLengfeld Jul 10, 2025 •

edited

Loading