Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 2255: create endpoint to fetch many components and services #2623

Merged
merged 2 commits into from
Jul 8, 2023

Conversation

nathan-mittelette
Copy link
Contributor

@nathan-mittelette nathan-mittelette commented Mar 25, 2023

Description

In this PR I create two new endpoints GET /api/v1/dependencyGraph/project/{uuid}/directDependencies and
GET /api/v1/dependencyGraph/component/{uuid}/directDependencies.

The purpose of this endpoint it's to avoid the frontend from doing multiple requests to the backend to fetch each node of the Dependency Graph. With this endpoint, you can get all directDependencies of a project or a component by providing his UUID.

Addressed Issue

fixes #2255

Additional Details

The frontend changes are available in this PR : DependencyTrack/frontend#455

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@valentijnscholten
Copy link
Contributor

I think it's good to fix/improve this usecase, thanks for the PR.

I do think however there should be a way to specify a project as a parameter. Currently a list of uuids is required. For a project with 1000 components, this would be ~36kB of data being submitted to the server. And a 36kB SQL query being generated. Retrieving all components/services for a project at once is the primary usecase, so an efficient implementation of that should be present.

@nathan-mittelette
Copy link
Contributor Author

If I ever understand correctly, would you like me to make an endpoint that allows you to return all the services and components of a project at once?
The current behavior allows you to return only a specific list of components or services with the aim of loading only one layer at a time from the Dependency Graph. This was mainly to avoid projects that have a lot of components from having a big request each time the project was loaded.

However, if I ever understood correctly, I'm okay with implementing an endpoint that meets your request.

@valentijnscholten
Copy link
Contributor

I guess I meant all the direct dependencies, I think that's what currently being retrieved one-by-one?

@nathan-mittelette
Copy link
Contributor Author

Ok so if I pass the parent's UUID and return all the dependencies in one endpoint it's ok?
The main problem with that is that I will have to parse JSON contains in the column of DIRECT_DEPENDENCIES in the table COMPONENTS to get all child UUID. Is it okay to do that on the backend?

@nathan-mittelette nathan-mittelette force-pushed the issue_2255 branch 2 times, most recently from 49faee0 to f28850e Compare April 8, 2023 10:36
@nathan-mittelette
Copy link
Contributor Author

@valentijnscholten I updated the behavior.

Now, it's a GET request with UUID in the path. You can give a project or component UUID.
In response, you get all direct dependencies (component or service) of the project or component.

@nathan-mittelette nathan-mittelette force-pushed the issue_2255 branch 2 times, most recently from cf63d66 to a88c464 Compare April 22, 2023 10:27
@nathan-mittelette nathan-mittelette force-pushed the issue_2255 branch 2 times, most recently from 438a0a3 to 8c2dad3 Compare May 13, 2023 15:42
@nscuro
Copy link
Member

nscuro commented May 13, 2023

Thanks for the PR @nathan-mittelette!

Just wondering, did you test this with projects that were experiencing problems with the current implementation? How much of a difference does this make?

As the amount of data returned per component is not changed, I assume we're potentially still serving "too much" for the intended purpose of the new endpoints: Rendering the dependency graph. Perhaps we can make it perform even better by utilizing fetch groups or result classes (projections) to only query and serve the data required for the use case.

@nathan-mittelette
Copy link
Contributor Author

@nscuro

I just compare with the old implementation and it makes a huge difference. The fact that the old implementation does one request per child-dependency, you sometimes have a large number of requests sent to the backend. Even if the request is light, a backend can't handle easily a lot of requests at the same time. With the new implementation, it's really faster and light for the backend. It now does fewer requests to the database and only has one connection to handle.

I do some adjustments thanks to your feedback about fetch groups and projections. I choose to implement projections to have only one object that can handle both Component and ServiceComponent data.
I still have one little issue, for each Component I have to fetch the latest version available and I didn't find a way to do it without doing one request to the database per Component. The fact that it didn't have a logical link between RepositoryMetaData and Component made the task harder. Any idea to improve this part?

* @author Nathan Mittelette <mittelette.nathan@gmail.com>
* @since 4.9.0
*/
public final class DependencyGraphResponse implements Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can replace this class with a record. DataNucleus should be smart enough to use the record constructor when populating results.

@nscuro
Copy link
Member

nscuro commented May 26, 2023

That sounds great @nathan-mittelette!

RE: RepositoryMetaData, there's currently no way to get it in a single query via JOIN or similar. As you noted, its identity is decoupled from individual components. Querying it would require decomposition of the component's purl field into type, namespace, and name, which is not possible AFAICT.

A minor optimization could be to de-duplicate type-namespace-name pairs prior to querying for them. You could then query for RepositoryMetaComponents in batches. Pseudo-SQL:

SELECT "REPOSITORY_TYPE", "NAMESPACE", "NAME", "LATEST_VERSION" 
FROM "REPOSITORY_META_COMPONENT"
WHERE 
    ("REPOSITORY_TYPE" = :type1 AND "NAMESPACE" = :namespace1 AND "NAME" = :name1)
    OR ("REPOSITORY_TYPE" = :type2 AND "NAMESPACE" = :namespace2 AND "NAME" = :name2)
    OR ...

You could then correlate results from each batch with DependencyGraphResponse items and populate their latestVersion field accordingly.

By de-duplicating, you'd avoid querying the same metadata more than once, if the same component exists multiple times in a project, but with different versions.
By batching queries you achieve fewer DB-roundtrips. Just need to pay attention that queries don't get too large.

@nathan-mittelette
Copy link
Contributor Author

@nscuro

I updated the PR with some changes :

  • Convert DependencyGraphResponse to a record
  • Add a batch mechanism to fetch all RepositoryMetaComponent
  • Update the tests to also cover the RepositroyMetaComponent fetch

I also add a batchSize to 10 RepositoryMetaComponent to fetch at the same time. I don't have a strong opinion about this value.

Copy link
Contributor

@melba-lopez melba-lopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions @nathan-mittelette !!

@melba-lopez
Copy link
Contributor

@nscuro not sure why sonatype is flagging this lift as @nathan-mittelette actually sent command to sonatype to ignore this issue.

https://lift.sonatype.com/results/github.com/DependencyTrack/dependency-track/01H20ZVFRKZHN2DXCJJKGX52K9/?ph=PhaseIntroduced

@melba-lopez
Copy link
Contributor

@nathan-mittelette there's now a branch conflict. Can you fix?

Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, some final nitpicks left. Functionality-wise, I tested it and it works great!

@nathan-mittelette, could you please also update the description of this PR, so that users discovering it via our change logs get an accurate description of how the final implementation looks like?

@sonatype-lift
Copy link
Contributor

sonatype-lift bot commented Jul 3, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing.1

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/DependencyTrack/dependency-track/2623.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/DependencyTrack/dependency-track/2623.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

Signed-off-by: Nathan Mittelette <mittelette.nathan@gmail.com>
Signed-off-by: Nathan Mittelette <mittelette.nathan@gmail.com>
@melba-lopez
Copy link
Contributor

hi @nathan-mittelette. Can you check the 2 sonatype bugs identified

image

@nathan-mittelette
Copy link
Contributor Author

hi @nathan-mittelette. Can you check the 2 sonatype bugs identified

image

@melba-lopez
Hi, for me those two new issues are not relevant, there is no memory leak. I didn't close the ProjectQueryManager resource but the only unmanaged resource from this class is PersistenceManager. PersistenceManager is the same reference as QueryManager and QueryManager is closed at the end of try-with-resources, so all unmanaged resources are closed at the end of the endpoint.

This is what I explain under the sonartype comments

@sonatype-lift ignore

The resource ProjectQueryManager is not released, but it doesn't have to. The only critical resource of ProjectQueryManager is the PersistenceManager which is the same instance as QueryManager. So when QueryManager is close, all critical resources of ProjectQueryManager are closed. The QueryManager is close at the end of the try scope because it's a try-with-resources.

@nscuro
Copy link
Member

nscuro commented Jul 8, 2023

@melba-lopez @nathan-mittelette Yeah, this particular "finding" is popping up from time to time, but we can't ignore this particular case in Lift without ingoring all resource leak findings... No tool is perfect :)

Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nathan-mittelette, much appreciated!

@nscuro nscuro merged commit 51d513a into DependencyTrack:master Jul 8, 2023
7 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Frontend creates a lot of requests for projects with high amount of components
4 participants