Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding checksums to Jetpack_JSON_API_Sync_Status_Endpoint #12209

Merged
merged 19 commits into from
May 13, 2019

Conversation

aldavigdis
Copy link
Contributor

@aldavigdis aldavigdis commented Apr 30, 2019

Changes proposed in this Pull Request:

This PR adds the following attributes to the JSON API Sync Status Endpoint:

  • posts_checksum
  • comments_checksum
  • post_meta_checksum
  • comment_meta_checksum

Those values are derived from Jetpack_Sync_WP_Replicastore::checksum_all().

Testing instructions:

Does the /sites/%s/sync/status endpoint return the attributes listed above?
Does it match with the schema provided in json-endpoints/jetpack/json-api-jetpack-endpoints.php:548?

Proposed changelog entry for your changes:

Posts, Comments, Post Meta and Comment Meta checksums are now provided via the Sync Status Endpoint.

@aldavigdis aldavigdis added [Status] Needs Review To request a review from Crew. Label will be renamed soon. [Package] Sync labels Apr 30, 2019
@aldavigdis aldavigdis requested a review from a team April 30, 2019 16:51
@aldavigdis aldavigdis requested a review from a team as a code owner April 30, 2019 16:51
@jetpackbot
Copy link

jetpackbot commented Apr 30, 2019

Thank you for the great PR description!

When this PR is ready for review, please apply the [Status] Needs Review label. If you are an a11n, please have someone from your team review the code if possible. The Jetpack team will also review this PR and merge it to be included in the next Jetpack release.

Scheduled Jetpack release: May 14, 2019.
Scheduled code freeze: May 7, 2019

Generated by 🚫 dangerJS against 550f88f

@aldavigdis
Copy link
Contributor Author

aldavigdis commented Apr 30, 2019

The initial attempt was to provide an array, but the output on the WPCOM side would always result in the output including (object) Array, which is not ideal. The way to solve that was to flatten the array into separate string values.

@roccotripaldi roccotripaldi added this to the 7.4 milestone Apr 30, 2019
roccotripaldi
roccotripaldi previously approved these changes Apr 30, 2019
Copy link
Member

@roccotripaldi roccotripaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. I was not able to test, because i'm not sure how to run requests to the sync/status endpoint.

@roccotripaldi
Copy link
Member

I was able to test after getting instructions from @enejb in Slack. My approval stands!

self::initialize_sender();

$sync_module = Jetpack_Sync_Modules::get_module( 'full-sync' );
$queue = self::$sender->get_sync_queue();
$full_queue = self::$sender->get_full_sync_queue();
$cron_timestamps = array_keys( _get_cron_array() );
$next_cron = $cron_timestamps[0] - time();
$store = new Jetpack_Sync_WP_Replicastore();
$checksums = $store->checksum_all();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is this method? Is there a situation where we would call this endpoint but not care about the checksums?

If the answers are: "pretty expensive: it's slow" and "yes, we sometimes don't care about the checksums", then we should write this in such a way that if ?fields= is set and does not include any of the checksum fields, we don't bother calculating the checksums.

If it's cheap or if we know we always want this information, it doesn't matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. AFAIK, the SQL queries may be rather expensive as they calculate the checksum for all posts, comments and post_meta records.

However, calling this endpoint is slow anyway, but I could try timing it before and after this change.

I'm going to follow-up on this within Poseidon to coordinate on a way forward here. Using ?fields= is possibly a good idea.

Copy link
Contributor Author

@aldavigdis aldavigdis May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the research and crunched the numbers. Note that this is made on my recent MBP development machine, using our Jetpack Docker image.

The relative overhead created by Jetpack_Sync_WP_Replicastore::checksum_all(), while high for smaller instances, becomes lower and lower the higher the number of posts is.

Initially, up to 200 posts (with increments of ca 30), the overhead added ranges between 30%-45%.

However, when we get to the tens of thousands, with increments of 5000-10000, the overhead becomes closer to 5%-15%.

The absolute range of this overhead in milliseconds ranged from 0.0032 ms to 0.0070 ms, so I really don't think this adds any significant overhead.

Looks at this fancy graph!

Delta  %  vs Number of posts(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that on the WPCOM side, I could have done a similar timing test, but I think my internet connection in combination with tunneling services that we use such as Pagekite and ngrok add too much of a response lag to make be able to trust such tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aldavigdis
I don’t understand how the as the numbers of posts increases the call with the non checksums also gets slower.
Do you have an explanation for that?
Is it also possible to remove any dependence on the network in this performance tests. Since that can have a lot of variance in there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was all done within the Jetpack_Sync_Actions::get_sync_status function, the diff is here:

diff --git a/sync/class.jetpack-sync-actions.php b/sync/class.jetpack-sync-actions.php
index 79c47b9c1..7f229342a 100644
--- a/sync/class.jetpack-sync-actions.php
+++ b/sync/class.jetpack-sync-actions.php
@@ -440,6 +440,7 @@ class Jetpack_Sync_Actions {
        }
 
        static function get_sync_status() {
+               $time_start = microtime( true );
                require_once JETPACK__PLUGIN_DIR . 'sync/class.jetpack-sync-wp-replicastore.php';
                self::initialize_sender();
 
@@ -451,7 +452,13 @@ class Jetpack_Sync_Actions {
                $store           = new Jetpack_Sync_WP_Replicastore();
                $checksums       = $store->checksum_all();
 
+               $time_taken_before_sync = microtime( true ) - $time_start;
+
                $full_sync_status = ( $sync_module ) ? $sync_module->get_status() : array();
+
+               $time_taken = microtime( true ) - $time_start;
+               l( wp_count_posts()->publish . ' ' . $time_taken . ' ' . $time_taken_before_sync );
+
                return array_merge(
                        $full_sync_status,
                        array(

If there is a better way to do this without too much hassle, then I'm all ears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice fancy graph!

Copy link
Contributor Author

@aldavigdis aldavigdis May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I have done this as of ac1d2f8.

The checksums are not included and not crunched by the helper method unless those keys are specifically requested using the fields parameter.

Example:

/sites/site_id/sync/status?fields=posts_checksum,comments_checksum,post_meta_checksum,comment_meta_checksum

@kraftbj kraftbj added [Status] Needs Author Reply We would need you to make some changes or provide some more details about your PR. Thank you! [Type] Enhancement Changes to an existing feature — removing, adding, or changing parts of it and removed [Status] Needs Review To request a review from Crew. Label will be renamed soon. labels May 1, 2019
tyxla
tyxla previously approved these changes May 2, 2019
Copy link
Member

@tyxla tyxla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests well for me, code also looks good 👍

Left a non-blocking question, but am also curious to see how the checksums affect the performance.

json-endpoints/jetpack/json-api-jetpack-endpoints.php Outdated Show resolved Hide resolved
@aldavigdis aldavigdis dismissed stale reviews from tyxla and roccotripaldi via 319caae May 2, 2019 17:44
@kraftbj kraftbj added [Status] Needs Review To request a review from Crew. Label will be renamed soon. and removed [Status] Needs Author Reply We would need you to make some changes or provide some more details about your PR. Thank you! labels May 2, 2019
Copy link
Member

@tyxla tyxla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, but this looks as good as before.

Would be nice to make the checksums optionally requested with a query argument or something as @mdawaffe suggested - if there are cases where they make things 50% slower, it's an improvement worthy to be considered.

json-endpoints/jetpack/json-api-jetpack-endpoints.php Outdated Show resolved Hide resolved
json-endpoints/jetpack/json-api-jetpack-endpoints.php Outdated Show resolved Hide resolved
Copy link
Member

@tyxla tyxla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still tests well for me 👍 Would be nice to add a test or update an existing one with the checksum fields.

I have a concern about casting the checksums for post/comment meta - that might lead to unexpected results if an error occurs during checksum calculation. Seems like we should account for that, just in case.

Also, I wasn't sure if we're planning to make the checksums optionally triggered by a query arg like @mdawaffe suggested - is that in the plans for this PR, @aldavigdis?

sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
json-endpoints/jetpack/json-api-jetpack-endpoints.php Outdated Show resolved Hide resolved
@tyxla tyxla added [Status] Needs Author Reply We would need you to make some changes or provide some more details about your PR. Thank you! and removed [Status] Needs Review To request a review from Crew. Label will be renamed soon. labels May 7, 2019
sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
$checksums = array();

if ( is_null( $fields ) === false ) {
$fields_params = explode( ',', $fields );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the preparation (exploding, potentially trimming) happen in the endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really just a helper function. Messy but it works.

Copy link
Member

@jeherve jeherve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick note. As @tyxla suggested, I think it would be nice to extend our tests to cover this.

sync/class.jetpack-sync-actions.php Outdated Show resolved Hide resolved
@jeherve jeherve added [Status] Needs Author Reply We would need you to make some changes or provide some more details about your PR. Thank you! and removed [Status] Needs Review To request a review from Crew. Label will be renamed soon. labels May 10, 2019
Co-Authored-By: Jeremy Herve <jeremy@tagada.hu>
@aldavigdis
Copy link
Contributor Author

There don't seem to be any tests for Jetpack_JSON_API_Sync_Status_Endpoint specifically, so I think the most appropriate way to introduce a test here is to write or amend one for the Jetpack_Sync_Actions::get_sync_status helper function.

@@ -439,7 +439,7 @@ static function cleanup_on_upgrade( $new_version = null, $old_version = null ) {
}
}

static function get_sync_status() {
static function get_sync_status( $fields = null ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if fields was always an array. instead of sometimes an array and sometimes a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a helper function. Query params need to be strings anyway and I think this is the most straightforward way to handle things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also support both array and string (like many other WP core functions and methods do), but we should make sure that's properly supported by adding specific tests for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we don't need that because this is just a helper function used in a single JSON API response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we never really using anything but a string.

@lezama lezama requested a review from jeherve May 13, 2019 12:00
tests/php/sync/test_class.jetpack-sync-integration.php Outdated Show resolved Hide resolved
$this->assertArrayNotHasKey( 'post_meta_checksum', $no_checksum );
$this->assertArrayNotHasKey( 'comment_meta_checksum', $no_checksum );

$kitchen_sink_checksum = Jetpack_Sync_Actions::get_sync_status(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be a good way to split this test to separate cases (we're currently testing several cases in the same test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nothing wrong with that.

$this->assertArrayHasKey( 'post_meta_checksum', $post_meta );
$this->assertArrayNotHasKey( 'comment_meta_checksum', $post_meta );

$comment_meta = Jetpack_Sync_Actions::get_sync_status(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some test cases for other sync status keys, other than the checksum fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. In this case, it's just testing a specific concept and while the test is long, I just wish there was a simpler way to write it. :)

@@ -439,7 +439,7 @@ static function cleanup_on_upgrade( $new_version = null, $old_version = null ) {
}
}

static function get_sync_status() {
static function get_sync_status( $fields = null ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also support both array and string (like many other WP core functions and methods do), but we should make sure that's properly supported by adding specific tests for that.

@tyxla tyxla dismissed jeherve’s stale review May 13, 2019 13:48

Tests have been added

$this->assertArrayNotHasKey( 'post_meta_checksum', $no_checksum );
$this->assertArrayNotHasKey( 'comment_meta_checksum', $no_checksum );

$kitchen_sink_checksum = Jetpack_Sync_Actions::get_sync_status(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about one test per Jetpack_Sync_Actions::get_sync_status call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple assertions in a single test is something I've considered to be fine until now.

As stated before, it's just testing a single concept. I'd love to have a simpler way to write this while maintaining human-readability, but PHPUnit doesn't seem to allow it.

Copy link
Member

@tyxla tyxla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With or without moving the tests to separate methods, this tests well and LGTM 🚢

@tyxla tyxla added [Status] Ready to Merge Go ahead, you can push that green button! and removed [Status] Needs Author Reply We would need you to make some changes or provide some more details about your PR. Thank you! labels May 13, 2019
Copy link
Member

@jeherve jeherve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and tests well on my end. 👍

@aldavigdis aldavigdis merged commit 19f9d99 into master May 13, 2019
@matticbot matticbot added [Status] Needs Changelog and removed [Status] Ready to Merge Go ahead, you can push that green button! labels May 13, 2019
@jeherve jeherve deleted the add/site-audit-simple-checksum branch May 13, 2019 17:40
jeherve added a commit that referenced this pull request May 17, 2019
jeherve added a commit that referenced this pull request May 23, 2019
jeherve added a commit that referenced this pull request May 27, 2019
* Kick off the changelog

* Add 7.3.1

* Update date and post link

* changelog: add #12219

* changelog: add #12170

* changelog: add #12184

* Changelog: add #12268

* Changelog: add #12081

* Changelog: add #12323

* Changelog: add #12204

* Changelog: add #12269

* Changelog: add #12332

* changelog: add #12339

* changelog: add #12209

* Changelog: add #12319

* Changelog: add #12357

* Changelog: add #12124

* Changelog: add #12373

* Changelog: add #12252

* Changelog: add #12383

* Changelog: add #12372

* changelog: add #12337

* Changelog: add #12290

* Changelog: add #12301

* Changelog: add #12061

* Testing list: add instructions for #12061

* Changelog: add #12393

* Update minimum supported version

See #12287

* Changelog: add #12406

* Testing list: add #12406

* Changelog: add #12277

* Changelog: add #12412

* Changelog: add #11318

* Changelog: add #12328

* Changelog: add #12425

* Changelog: add #12380

* Changelog: add #12428

* Changelog: add #12414

* Changelog: add #12395

* Changelog & Testing list: add #12416, #12417, #12418, and #12348

* changelog: add #12379

* Changelog: add #12341

* changelog: add #12444

* Changelog: add #12434

* Changelog: add #12454

* Changelog: add #12460

* Changelog: add #12463

* Changelog: add #12457

* Changelog / testing list: add #10333

* Changelog: add #12467


Co-authored-by: Jeremy Herve <jeremy@jeremy.hu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Package] Sync [Type] Enhancement Changes to an existing feature — removing, adding, or changing parts of it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants