Use Opensearch for learning resource API results whenever feasible #189

mbertrand · 2023-11-06T16:19:31Z

Description/Context

Use opensearch for API results for lists when feasible (ie not for lists of learning paths requested by editors, perhaps other exceptions) and perhaps single object results as well.

Plan/Design

Option 1: OpenSearch endpoint for lists, database endpoints for individual objects and learning path CRUD

Continue using the database to return:

Published individual learning resource details (url: /api/v1/learning_resources/<pk>/, /api/v1/courses/<pk>/, etc)
Published child resources of a resource (url: /api/v1/learning_resources/<pk>/children/). Example: episodes in a podcast, resources in a learning path

Provide a new database-backed API endpoint - /api/v1/learning_path_admin/ - specifically for learning paths CRUD. This will return specific path details as well as lists of learning paths, both published and unpublished. Only users with editing permission will be allowed to access it.

Use opensearch for all read-only learning resource list endpoints (published only):

/api/v1/learning_resources/, /api/v1/courses/, /api/v1/programs/, etc
/api/v1/learning_resources/new/ (need to add created_on values to the index and sort by that)
api/v1/learning_resources/upcoming/ (need to build an opensearch query filtering by runs.published and (runs.start_date or runs.enrollment_start) then ordering by runs.start_date)

Separate views will be used for each:

learning_resources.views.LearningResourceViewSet(RetrieveModelMixin, viewsets.GenericViewSet) for individual objects and their children
learning_resources.views.LearningPathAdminViewSet(viewsets.ModelViewSet) for learning path CRUD
learning_resources_search.views.LearningResourcesSearchView(ESView) for lists of learning resources.

URL's would have to be loaded in the following order in open_discussions/urls.py:

    re_path(r"", include("learning_resources.urls")),
    re_path(r"", include("learning_resources_search.urls")),

Option 2: OpenSearch endpoint for lists and individual results, database for learning path editing only.

Similar to above, but use opensearch for individual object results and child resources of individual results

Would need to adjust serializers so that each child resource document in opensearch includes a parents field indicating the relation (and type of relation) to other resources. For example, /api/v1/podcasts/<id>/episodes could not rely on database relationships to get correct results, would need to query opensearch where parent_id=<id> and relation_type=podcast,

Separate views will be used for each:

learning_resources.views.LearningPathViewSet(viewsets.ModelViewSet) for learning path CRUD
learning_resources_search.views.LearningResourcesSearchView(ESView) for learning resources lists and individual objects

Option 3: Conditionally switch to an opensearch API view from a database API view, depending on parameters.

  @extend_schema(responses=LearningResourceSerializer(many=True))
  @action(methods=["GET"], detail=False, name="New Resources")
  def new(self, request: Request) -> QuerySet:  # noqa: ARG002
      if self.use_database():  # query param of db=True, or path editor requesting list of all paths
     
          page = self.paginate_queryset(self.get_queryset().order_by("-created_on"))
          serializer = self.get_serializer(page, many=True)
          return self.get_paginated_response(serializer.data)

      return LearningResourcesSearchView().get(request, sortby="-created_on")   
      # or insert the code from the view above here directly.

This does not seem like the best approach, for several reasons:

It mixes views from two different apps, which does not seem like good practice
Need to ensure that the different views always accept the same query parameters and apply filters consistently.
The views may have slightly different outputs. For example, the current search endpoint does not include next and previous page attributes, though maybe it should. And the current database-backed endpoint never includes this data structure:
```
"metadata": {
    "aggregations": {},
    "suggest": []
}
```
Though again, the outputs could probably be made consistent, up to a point (if using database, can't get actual values for those metadata fields, other than empty ones like above).

The text was updated successfully, but these errors were encountered:

abeglova · 2023-11-07T15:26:06Z

I would lean toward option 1 just for simplicity's sake. Also Option 1 is a subset of Option 2 and I agree that Option 3 is a bad idea for the reasons you describe. So we can start on implementing Option 1

Ferdi · 2023-11-07T15:37:14Z

I suggest we continue to do what we are doing and revisit this issue later on if needed.

ChristopherChudzicki · 2023-11-07T19:29:58Z

If we do hold off on using Elasticsearch for the "catalog listing" APIs (/learning_resources/, /courses/, etc), I think we should at least consider removing some filtering options from the catalog listing APIs. Currently /courses? offered_by=... and /search?offered_by=... accept different values. The easiest way to be consistent would be to only allow filtering the search API. And this seems equally powerful, too.

Ferdi · 2023-11-07T20:13:51Z

@ChristopherChudzicki I'm suggesting we leave both endpoints in place as they are and deal with inconsistencies in filters etc.
/search?offered_by would be used exclusively by search. /courses?offered_by would be used when accuracy of data is more important etc.

Currently /courses? offered_by=... and /search?offered_by=... accept different values

Why they accept different values ?

mbertrand self-assigned this Nov 6, 2023

mbertrand removed their assignment Nov 7, 2023

mbertrand mentioned this issue Nov 20, 2023

Platform/offered_by API filtering bugs/inconsistencies #237

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Opensearch for learning resource API results whenever feasible #189

Use Opensearch for learning resource API results whenever feasible #189

mbertrand commented Nov 6, 2023 •

edited

Loading

abeglova commented Nov 7, 2023

Ferdi commented Nov 7, 2023

ChristopherChudzicki commented Nov 7, 2023

Ferdi commented Nov 7, 2023

Use Opensearch for learning resource API results whenever feasible #189

Use Opensearch for learning resource API results whenever feasible #189

Comments

mbertrand commented Nov 6, 2023 • edited Loading

Description/Context

Plan/Design

Option 1: OpenSearch endpoint for lists, database endpoints for individual objects and learning path CRUD

Option 2: OpenSearch endpoint for lists and individual results, database for learning path editing only.

Option 3: Conditionally switch to an opensearch API view from a database API view, depending on parameters.

abeglova commented Nov 7, 2023

Ferdi commented Nov 7, 2023

ChristopherChudzicki commented Nov 7, 2023

Ferdi commented Nov 7, 2023

mbertrand commented Nov 6, 2023 •

edited

Loading