Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add orderByPriority Method for Custom Sorting in Query Builder V2 #52535

Draft
wants to merge 10 commits into
base: 11.x
Choose a base branch
from

Conversation

Mushood
Copy link

@Mushood Mushood commented Aug 20, 2024

Building upon the work done in PR #52483

Explanation:
This commit introduces the orderByPriority method to Laravel's query builder, enabling developers to sort query results based on a custom priority array. For example, you can now order records by specific IDs in a preferred sequence like this:

Model::whereIn('id', [2, 4, 6, 1, 3, 5])->orderByPriority('id', [2, 4, 6, 1, 3, 5])->get();

For MYSQL and Mariadb
This method leverages SQL's FIELD() function, allowing developers to easily display results in a user-defined order.

For Postgres, Sqlite and SQL server, the CASE() function was used to achieve the same result.

Compared to the previous PR, I have placed the logic in the corresponding grammar class as suggested to allow for flexibility of implementation depending on the database server.

Impact:
This feature greatly enhances the flexibility of data retrieval, particularly when a specific order of items is crucial for the application's logic or user interface.
We have to note that if the larger the array of priority, we might introduce performance issues

For reference:

* @param $direction
* @return string
*/
public function orderByPriority($column, array $priority, $direction = 'asc')
Copy link
Contributor

@staudenmeir staudenmeir Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this implementation: MariaDbGrammar extends MySqlGrammar and inherits the method from there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I have removed the code from MariaDbGrammar.

@staudenmeir
Copy link
Contributor

staudenmeir commented Aug 20, 2024

Please lowercase all the generated SQL: CASE WHEN => case when etc.

Please also add tests for Builder::orderByPriority(), especially the exceptions.

{
$placeholders = implode(',', array_fill(0, count($priority), '?'));

return "FIELD($column, $placeholders) $direction";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column name needs to be wrapped with $this->wrap($column). This also affects the other grammars.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have applied the wrap function on all grammars and updated the tests accordingly.

/**
* Apply custom ordering to a query based on a priority array.
*
* @param $column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammars are missing the parameter types for $column and $direction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to add the type for column and direction

- apply wrap on columns
- lowercase SQL
- add new test for exceptions
- add test on builder for orderByPriority
@Mushood
Copy link
Author

Mushood commented Aug 21, 2024

@staudenmeir

Thank you for taking the time to review.

I have applied lowercase for all the generated SQL.
I have also added tests for exceptions and test for the function in the builder itself

@staudenmeir
Copy link
Contributor

staudenmeir commented Aug 21, 2024

I noticed that the two types of implementation behave differently with values that aren't part of the $priority array. If you use ->orderByPriority('id', ['c', 'b', 'a']) while the column also contains d and e, these records get treated differently depending on the database:

  • MySQL and MariaDB put them first because FIELD() returns 0 for unspecified values: 'd', 'e', 'c', 'b', 'a'
  • The CASE WHEN implementation puts them last because of the ELSE part: 'c', 'b', 'a', 'd', 'e'

I think it makes much more sense to put them last, but I don't see an elegant way to achieve this behavior on MySQL/MariaDB.

Not sure if that's an issue. You could argue that orderByPriority() shouldn't be used this way.

@ziadoz
Copy link
Contributor

ziadoz commented Aug 21, 2024

Would it be worth using ORDER BY array_position(ARRAY[...], field::text) in Postgres? https://www.crossingtheruby.com/2023/10/12/postgres-custom-sort-order

@Mushood
Copy link
Author

Mushood commented Aug 21, 2024

@staudenmeir
I concur with your observation

database present absent priority direction result
mysql/mariadb A B C D A B C asc D A B C
mysql/mariadb A B C D A B C desc C B A D
           
postgres/sqlite A B C D A B C asc D C B A
postgres/sqlite A B C D A B C desc A B C D

In order to make this field behavior consistent with the case behavior, it might make sense to change the "else" condition
$caseStatement = 'case '.implode(' ', $cases).' else 0 end';

However, I do agree that it does not feel right that these values come first.

@Mushood
Copy link
Author

Mushood commented Aug 21, 2024

Observation
I pondered on whether field was much better than case and by how much.

In a table of 5000 records and a priority array of 1000, CASE executed in 34 ms on average
Field instead took 23 ms on average

In the same table of 10000 records and a priority array of 1000, CASE executed in 35 ms on average
Field instead took 24 ms on average

As observed, FIELD is better but not by a significant amount. Of course, it might get significant depending on the query. I am leaning on using CASE only so as to have better consistency of behavior across all grammars.

Food for thought
I also wondered on whether if it would not be better to let the user decide on how he wants to deal with the values not found in priority by letting the user specify the default value to return in case the value was not found in the priority. This could lead to some interesting applications with this added flexibility.

For example, in the postgres grammar, I would set this instead

public function orderByPriority(string $column, array $priority, string $direction = 'asc', int $defaultIndex = 0)
{
    $column = $this->wrap($column);

    $cases = [];

    foreach ($priority as $index => $value) {
        $cases[] = "when {$column} = ? then {$index}";
    }

    $caseStatement = 'case '.implode(' ', $cases).' else '.$defaultIndex.' end';

    return "{$caseStatement} {$direction}";
}

@taylorotwell
Copy link
Member

I question if we really need this at the database layer? 😬

$ids = [9, 10, 7, 8, 4, 6, 5, 2, 3, 1];

return User::whereIn('id', $ids)
    ->get()
    ->sortBy(fn ($model) => array_search($model->id, $ids))
    ->values();

@taylorotwell taylorotwell marked this pull request as draft August 22, 2024 18:12
@taylorotwell
Copy link
Member

Drafting pending further discussion around inconsistencies around database drivers.

@bert-w
Copy link
Contributor

bert-w commented Aug 22, 2024

This kinda creates horrible queries for any query with over 10 records, but I think it would be good to actually provide a benchmark against a php-side sorting to see if it actually improves speed in a general sense (which I think it does).

Whenever sorting/aggregating/selecting can be done on the database side, it's usually a performance benefit.

@Mushood
Copy link
Author

Mushood commented Aug 23, 2024

@taylorotwell
I agree with sorting in php, if we are retrieving all the results.
However, in some cases, it might be needed before retrieval such as providing a paginated list.
I also think that when we think in terms of ids, it might be less intuitive.

Here is another example of a use case
`
$roles = ['moderator', 'admin', 'user'];

return User::orderByPriority('role', $roles)
->paginate(10);
`

@Mushood
Copy link
Author

Mushood commented Aug 23, 2024

@bert-w
I do agree that it can create horrible queries and if we are retrieving all results before ordering, I feel PHP ordering might have an edge.

However, lets assume i have a table with 1M+ records of users with a role column with the possible values :
'moderator'
'admin'
'user'

I want to display a list of users, with the order above. It might not make sense to load 1M+ records in memory before sorting.

Anyway I'll try to benchmark this and post the results here.

@ziadoz
Copy link
Contributor

ziadoz commented Aug 27, 2024

I can vouch for the performance. Using ORDER BY FIELD (…) and then $query->cursor() to iterate over the records is less memory intensive than sorting a collection, especially when you have lots of records to sort and lots of values to sort them by.

@Mushood
Copy link
Author

Mushood commented Aug 30, 2024

I have thought about this and my conclusion would be to use CASE even for mysql.
This would provide a consistent output, independent of database driver, for cases where the values is not present in the result set.

The behavior would be:

  1. Ascending: Values not in list would be at the end
  2. Descending: Values not in list would be first

Let me know what you think

If agreed, i will move the code back to builder itself instead of the corresponding builder grammar since it is the same implementation.

From my tests, I see that field is indeed faster, but marginally faster. I think consistency is more important.

image

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants