Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Opensearch for learning resource API results whenever feasible #189

Open
mbertrand opened this issue Nov 6, 2023 · 4 comments
Open

Comments

@mbertrand
Copy link
Member

mbertrand commented Nov 6, 2023

Description/Context

Use opensearch for API results for lists when feasible (ie not for lists of learning paths requested by editors, perhaps other exceptions) and perhaps single object results as well.

Plan/Design

Option 1: OpenSearch endpoint for lists, database endpoints for individual objects and learning path CRUD

Continue using the database to return:

  • Published individual learning resource details (url: /api/v1/learning_resources/<pk>/, /api/v1/courses/<pk>/, etc)
  • Published child resources of a resource (url: /api/v1/learning_resources/<pk>/children/). Example: episodes in a podcast, resources in a learning path

Provide a new database-backed API endpoint - /api/v1/learning_path_admin/ - specifically for learning paths CRUD. This will return specific path details as well as lists of learning paths, both published and unpublished. Only users with editing permission will be allowed to access it.

Use opensearch for all read-only learning resource list endpoints (published only):

  • /api/v1/learning_resources/, /api/v1/courses/, /api/v1/programs/, etc
  • /api/v1/learning_resources/new/ (need to add created_on values to the index and sort by that)
  • api/v1/learning_resources/upcoming/ (need to build an opensearch query filtering by runs.published and (runs.start_date or runs.enrollment_start) then ordering by runs.start_date)

Separate views will be used for each:

  • learning_resources.views.LearningResourceViewSet(RetrieveModelMixin, viewsets.GenericViewSet) for individual objects and their children
  • learning_resources.views.LearningPathAdminViewSet(viewsets.ModelViewSet) for learning path CRUD
  • learning_resources_search.views.LearningResourcesSearchView(ESView) for lists of learning resources.

URL's would have to be loaded in the following order in open_discussions/urls.py:

    re_path(r"", include("learning_resources.urls")),
    re_path(r"", include("learning_resources_search.urls")),

Option 2: OpenSearch endpoint for lists and individual results, database for learning path editing only.

Similar to above, but use opensearch for individual object results and child resources of individual results

  • Would need to adjust serializers so that each child resource document in opensearch includes a parents field indicating the relation (and type of relation) to other resources. For example, /api/v1/podcasts/<id>/episodes could not rely on database relationships to get correct results, would need to query opensearch where parent_id=<id> and relation_type=podcast,

Separate views will be used for each:

  • learning_resources.views.LearningPathViewSet(viewsets.ModelViewSet) for learning path CRUD
  • learning_resources_search.views.LearningResourcesSearchView(ESView) for learning resources lists and individual objects

Option 3: Conditionally switch to an opensearch API view from a database API view, depending on parameters.

  @extend_schema(responses=LearningResourceSerializer(many=True))
  @action(methods=["GET"], detail=False, name="New Resources")
  def new(self, request: Request) -> QuerySet:  # noqa: ARG002
      if self.use_database():  # query param of db=True, or path editor requesting list of all paths
     
          page = self.paginate_queryset(self.get_queryset().order_by("-created_on"))
          serializer = self.get_serializer(page, many=True)
          return self.get_paginated_response(serializer.data)

      return LearningResourcesSearchView().get(request, sortby="-created_on")   
      # or insert the code from the view above here directly.

This does not seem like the best approach, for several reasons:

  • It mixes views from two different apps, which does not seem like good practice

  • Need to ensure that the different views always accept the same query parameters and apply filters consistently.

  • The views may have slightly different outputs. For example, the current search endpoint does not include next and previous page attributes, though maybe it should. And the current database-backed endpoint never includes this data structure:

    "metadata": {
        "aggregations": {},
        "suggest": []
    }
    

    Though again, the outputs could probably be made consistent, up to a point (if using database, can't get actual values for those metadata fields, other than empty ones like above).

@mbertrand mbertrand self-assigned this Nov 6, 2023
@abeglova
Copy link
Contributor

abeglova commented Nov 7, 2023

I would lean toward option 1 just for simplicity's sake. Also Option 1 is a subset of Option 2 and I agree that Option 3 is a bad idea for the reasons you describe. So we can start on implementing Option 1

@Ferdi
Copy link

Ferdi commented Nov 7, 2023

I suggest we continue to do what we are doing and revisit this issue later on if needed.

@mbertrand mbertrand removed their assignment Nov 7, 2023
@ChristopherChudzicki
Copy link
Contributor

If we do hold off on using Elasticsearch for the "catalog listing" APIs (/learning_resources/, /courses/, etc), I think we should at least consider removing some filtering options from the catalog listing APIs. Currently /courses? offered_by=... and /search?offered_by=... accept different values. The easiest way to be consistent would be to only allow filtering the search API. And this seems equally powerful, too.

@Ferdi
Copy link

Ferdi commented Nov 7, 2023

@ChristopherChudzicki I'm suggesting we leave both endpoints in place as they are and deal with inconsistencies in filters etc.
/search?offered_by would be used exclusively by search. /courses?offered_by would be used when accuracy of data is more important etc.

Currently /courses? offered_by=... and /search?offered_by=... accept different values

Why they accept different values ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants