Block delimiter: add new Scanner class. #44158

Open: jeherve wants to merge 5 commits into trunk

jeherve (Member) commented Jul 1, 2025

Proposed changes:

This adds a new class to the package. See matching PRs:

  • 185532-ghe-Automattic/wpcom
  • 185783-ghe-Automattic/wpcom

The Block_Delimiter class introduced next_delimiter() and scan_delimiters(), which made it possible to parse the block structure in a document in a memory-efficient way. Unfortunately, two fundamental interface choices, namely returning a new class instance for every block delimiter and relying on a generator function, limited the CPU performance frontier of that class as a replacement for parse_blocks().

This PR introduces Block_Scanner, modeled more directly after the HTML API and informed by refactors incorporating Block_Delimiter. The scanner mutates itself as it advances, so each scan requires a new instance. In exchange, it runs much faster while maintaining the same near-zero memory overhead.

A new class is introduced due to the scale of change in the interface and in order to provide seamless refactoring of code already relying on scan_delimiters().
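
To illustrate the interface difference described above, here is a minimal sketch under stated assumptions. `Toy_Delimiter`, `toy_scan_delimiters()`, and `Toy_Scanner` are hypothetical names invented for this illustration; they are not the package's `Block_Delimiter` or `Block_Scanner`, and only the `next_delimiter(): bool` shape is taken from this PR. The toy versions only find opening delimiters and read the block name.

```php
<?php
// Hypothetical toy classes for illustration only; NOT the package's code.

final class Toy_Delimiter {
	public string $name;

	public function __construct( string $name ) {
		$this->name = $name;
	}
}

// Generator style: allocates and yields a new object for every opener found.
function toy_scan_delimiters( string $html ): Generator {
	$at = 0;
	while ( 1 === preg_match( '/<!-- wp:([a-z][a-z0-9\/-]*)/', $html, $m, PREG_OFFSET_CAPTURE, $at ) ) {
		$at = $m[0][1] + strlen( $m[0][0] );
		yield new Toy_Delimiter( $m[1][0] );
	}
}

// Cursor style: a single object advances itself and only records offsets.
final class Toy_Scanner {
	private string $html;
	private int $at          = 0;
	private int $name_at     = 0;
	private int $name_length = 0;

	public function __construct( string $html ) {
		$this->html = $html;
	}

	// Moves to the next opening delimiter; returns false when none remain.
	public function next_delimiter(): bool {
		$opener = strpos( $this->html, '<!-- wp:', $this->at );
		if ( false === $opener ) {
			return false;
		}

		$this->name_at     = $opener + 8; // Length of '<!-- wp:'.
		$this->name_length = strspn( $this->html, 'abcdefghijklmnopqrstuvwxyz0123456789/-', $this->name_at );
		$this->at          = $this->name_at + $this->name_length;
		return true;
	}

	// The name is only sliced out of the source when the caller asks for it.
	public function get_name(): string {
		return substr( $this->html, $this->name_at, $this->name_length );
	}
}

$html    = '<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph --><!-- wp:core/image {"id":1} /-->';
$scanner = new Toy_Scanner( $html );
while ( $scanner->next_delimiter() ) {
	echo $scanner->get_name(), "\n";
}
```

In the cursor style, the scanner's state is consumed as it advances, which is why a finished scanner cannot be reused and a second pass over the same document needs a fresh instance.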

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

Does this pull request change what data or activity we track or use?

  • No

Testing instructions:

  • Is CI happy?

dmsnell and others added 4 commits July 1, 2025 11:44
The `Block_Delimiter` class introduced `next_delimiter()` and `scan_delimiters()`, which made it possible to parse the block structure in a document in a memory-efficient way. Unfortunately, fundamental choices for the interface, namely returning a new class instance on every block delimiter and relying on a generator function, limited the CPU performance frontier of that class as a replacement for `parse_blocks()`.

This new class introduces `Block_Scanner`, modeled more directly after the HTML API and informed by refactors incorporating `Block_Delimiter`. This class mutates itself and requires a new instance before scanning. The tradeoff is that it runs much faster while maintaining the same near-zero memory overhead.

A new class is introduced due to the scale of change in the interface and in order to provide seamless refactoring of code already relying on `scan_delimiters()`.
jeherve requested a review from Copilot July 1, 2025 10:22
jeherve self-assigned this Jul 1, 2025
jeherve added the [Type] Enhancement, [Status] In Progress, and [Pri] Normal labels Jul 1, 2025
github-actions bot (Contributor) commented Jul 1, 2025

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (Jetpack), and enable the add/block-scanner-delimiter branch.
  • To test on Simple, run the following command on your sandbox:
bin/jetpack-downloader test jetpack add/block-scanner-delimiter

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

github-actions bot (Contributor) commented Jul 1, 2025

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add a "[Type]" label (Bug, Enhancement, Janitorial, Task).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a new high-performance, mutable Block_Scanner class as a replacement for the legacy Block_Delimiter, updates tests to cover it, and adds stubs, documentation, and changelog entries to support it.

  • Add WP_HTML_Span stub and include it in test bootstrap
  • Implement Block_Scanner and extensive PHPUnit tests
  • Update README and changelog to describe and document the new scanner

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/stubs/class-wp-html-span.php | Add stub for WP_HTML_Span |
| tests/php/bootstrap.php | Load the new stub in test bootstrap |
| tests/php/Block_Scanner_Test.php | New PHPUnit tests covering all scanner behaviors |
| src/class-block-scanner.php | Implementation of Block_Scanner |
| changelog/add-block-scanner-delimiter | Changelog entry for the added scanner |
| README.md | Document new Block_Scanner and legacy Block_Delimiter |

Comments suppressed due to low confidence (1)

projects/packages/block-delimiter/src/class-block-scanner.php:235

  • The docblock for next_delimiter() describes support for a $freeform_blocks parameter, but the implementation currently ignores it. Please update the documentation to note that freeform scanning is not yet implemented or implement the parameter behavior to match the docs.
	public function next_delimiter( string $freeform_blocks = 'skip' ): bool { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable

}

$json_span = substr( $this->source_text, $this->json_at, $this->json_length );
$parsed = json_decode( $json_span, null, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );

Copilot AI Jul 1, 2025

Passing null as the second argument to json_decode() relies on default behavior. Explicitly pass true (for associative arrays) to make the intent clearer and avoid ambiguity.

Suggested change
- $parsed = json_decode( $json_span, null, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );
+ $parsed = json_decode( $json_span, true, 512, JSON_OBJECT_AS_ARRAY | JSON_INVALID_UTF8_SUBSTITUTE );
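
For context on this suggestion, the sketch below shows how `json_decode()` resolves its second argument. According to the PHP manual, when `$associative` is `null`, the `JSON_OBJECT_AS_ARRAY` flag decides whether objects are decoded as arrays. The sample JSON string is an assumption made up for this illustration, not data from the PR.

```php
<?php
// Sample input invented for illustration.
$json = '{"id":1,"meta":{"align":"wide"}}';

// null + JSON_OBJECT_AS_ARRAY: the flag decides, so objects become arrays.
$with_flag = json_decode( $json, null, 512, JSON_OBJECT_AS_ARRAY );

// true + JSON_OBJECT_AS_ARRAY: the explicit argument also yields arrays.
$with_true = json_decode( $json, true, 512, JSON_OBJECT_AS_ARRAY );

// null without the flag: objects are decoded as stdClass instances.
$no_flag = json_decode( $json, null, 512, 0 );

var_dump( is_array( $with_flag ) );      // bool(true)
var_dump( $with_flag === $with_true );   // bool(true)
var_dump( $no_flag instanceof stdClass ); // bool(true)
```

Under this reading, both lines in the suggested change decode to the same associative array, so the change would affect readability rather than behavior.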

Copilot uses AI. Check for mistakes.

* not surrounded by block delimiters. Defaults to `skip`.
* @return bool Whether a block delimiter was matched.
*/
public function next_delimiter( string $freeform_blocks = 'skip' ): bool { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable

Copilot AI Jul 1, 2025

[nitpick] The next_delimiter() method is very large and handles multiple responsibilities. Consider extracting parts of its logic (e.g., scanning for comment boundaries, parsing JSON spans) into private helper methods to improve readability and maintainability.


- The Block Delimiter package provides an efficient, streaming parser for working with WordPress block structure without the memory overhead of `parse_blocks()`. It's designed for scenarios where you need to inspect, find, or modify specific blocks without parsing the entire block tree.
+ The Block Delimiter package provides efficient, streaming parsers for working with WordPress block structure without the memory overhead of `parse_blocks()`. It's designed for scenarios where you need to inspect, find, or modify specific blocks without parsing the entire block tree.

Copilot AI Jul 1, 2025

[nitpick] Minor grammar: change “parsers” to singular “parser” since the sentence describes the package itself.

Suggested change
- The Block Delimiter package provides efficient, streaming parsers for working with WordPress block structure without the memory overhead of `parse_blocks()`. It's designed for scenarios where you need to inspect, find, or modify specific blocks without parsing the entire block tree.
+ The Block Delimiter package provides an efficient, streaming parser for working with WordPress block structure without the memory overhead of `parse_blocks()`. It's designed for scenarios where you need to inspect, find, or modify specific blocks without parsing the entire block tree.

jp-launch-control bot commented Jul 1, 2025

Code Coverage Summary

1 file is newly checked for coverage.

File Coverage
projects/packages/block-delimiter/src/class-block-scanner.php 161/229 (70.31%) 💚

Full summary · PHP report · JS report

jeherve added the [Status] Needs Review label and removed the [Status] In Progress label Jul 1, 2025
jeherve requested review from kraftbj and dmsnell July 1, 2025 10:55
Labels
Docs, [Package] Block Delimiter, [Pri] Normal, [Status] Needs Review, [Tests] Includes Tests, [Type] Enhancement
2 participants