Retrieve updatedAt, version, and keywords when searching through API #6300

Closed
tainguyenbui opened this issue Oct 21, 2019 · 7 comments · Fixed by #6441
@tainguyenbui
Contributor

tainguyenbui commented Oct 21, 2019

What we want to achieve
Retrieve the latest dataset information (updatedAt, dataset version, description, authors, name, keywords, and dataverse link) directly from search results, without performing extra requests. We believe this would cut down on these heavy operations.

Current scenario:
When searching for files contained in the latest datasets, I can retrieve a list of results. However, to get some of the fields above I have to iterate through the results and request the dataset information by DOI for each one.

Desired scenario:
The search results already include these fields, so no extra iteration over the results is necessary.
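
The current pattern could be sketched roughly as follows. This is a minimal illustration, not our actual client code: it assumes the Search API returns its results under `data.items` and that dataset details come from the native `/api/datasets/:persistentId/` endpoint. `fetch_json` is a hypothetical helper (passed in so the number of requests is easy to see), and `search_with_details` is an illustrative name.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper type: takes a URL path and query params, returns parsed JSON.
FetchJson = Callable[[str, Dict[str, str]], Dict[str, Any]]


def search_with_details(fetch_json: FetchJson, query: str) -> List[Dict[str, Any]]:
    """Current scenario: one search request plus one request per result."""
    search = fetch_json("/api/search", {"q": query, "type": "dataset"})
    enriched = []
    for item in search["data"]["items"]:
        # Extra round trip per result, keyed by the DOI in "global_id".
        detail = fetch_json(
            "/api/datasets/:persistentId/", {"persistentId": item["global_id"]}
        )
        enriched.append({**item, "dataset": detail["data"]})
    return enriched
```

With N search results this issues N + 1 requests. In the desired scenario only the first request would be needed.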


Current example of search output

{
                "name": "demo20190123a",
                "type": "dataset",
                "url": "https://doi.org/10.5072/FK2/H1DFMX",
                "global_id": "doi:10.5072/FK2/H1DFMX",
                "description": "demo",
                "published_at": "2019-01-23T16:56:36Z",
                "citationHtml": "Reilly, Grainne, 2019, \"demo20190123a\", <a href=\"https://doi.org/10.5072/FK2/H1DFMX\" target=\"_blank\">https://doi.org/10.5072/FK2/H1DFMX</a>, Demo Dataverse, V1, UNF:6:xR77FQWCFydmL55sRXuD+w== [fileUNF]",
                "identifier_of_dataverse": "demo",
                "name_of_dataverse": "Demo Dataverse",
                "citation": "Reilly, Grainne, 2019, \"demo20190123a\", https://doi.org/10.5072/FK2/H1DFMX, Demo Dataverse, V1, UNF:6:xR77FQWCFydmL55sRXuD+w== [fileUNF]",
                "entity_id": 301815,
                "authors": [
                    "Reilly, Grainne"
                ]
            }

An example of what might be useful

{
        "id": 3066,
        "doi": "doi:10.5072/FK2/ML0NCM",
        "publisher": "Demo Dataverse",
        "storageIdentifier": "file://10.5072/FK2/ML0NCM",
        "versionId": 807,
        "versionState": "RELEASED",
        "createdAt": "2016-05-24T15:32:55Z",
        "updatedAt": "2016-05-24T15:35:07Z",
        "title": "Test GeoConnect",
        "authors": [
            "Admin, Dataverse"
        ],
        "contacts": [
            {
                "fullname": "Admin, Dataverse",
                "email": "admin@mailinator.com"
            }
        ],
        "description": {
            "text": "1"
        },
        "subject": [
            "Arts and Humanities"
        ],
        "keywords": [],
        "publication": [],
        "producer": {},
        "relatedMaterial": [],
        "dataSource": [],
        "geographicCoverage": []
    }

Potentially, having the datasetId as part of the search result could also be beneficial.

What we are currently doing with some of the metadata blocks
We are mapping metadata blocks to a simpler structure in order to reduce complexity at the client side.
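
As a rough illustration of that kind of mapping (a sketch only, not our production code), the function below flattens a list of metadata block fields into a plain dictionary. It assumes each field carries `typeName`, `typeClass`, and `value` as in the native JSON export of a dataset; `flatten_citation` is a hypothetical name.

```python
from typing import Any, Dict, List


def flatten_citation(fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Map a list of metadata block fields to a flat {name: value} dict."""
    flat: Dict[str, Any] = {}
    for field in fields:
        name, value = field["typeName"], field["value"]
        if field.get("typeClass") == "compound":
            # Compound values are dicts of sub-fields; keep only their plain values.
            entries = value if isinstance(value, list) else [value]
            simplified = [
                {sub_name: sub["value"] for sub_name, sub in entry.items()}
                for entry in entries
            ]
            flat[name] = simplified if isinstance(value, list) else simplified[0]
        else:
            # Primitive and controlled-vocabulary values are passed through as-is.
            flat[name] = value
    return flat
```

A compound `author` field would come out as a list such as `[{"authorName": "Admin, Dataverse"}]`, while `title` and `subject` keep their original values.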

Thanks in advance! We are really enjoying working with Dataverse and collaborating with Dataverse developers

@djbrooke
Contributor

Thanks @tainguyenbui for suggesting an approach for reducing the number of API requests. We'll take a look at this during sprint planning and may be back with questions.

@tainguyenbui changed the title from "Retrieve dataset files, updatedAt, version, and more when searching" to "Retrieve updatedAt, version, and keywords when searching through API" on Oct 22, 2019
@tainguyenbui
Contributor Author

tainguyenbui commented Oct 22, 2019

I've removed the Files requirement because it could considerably increase the payload size, and it is usually not that important: the request for file information can be made later, when a user clicks on a dataset. The number of files, however, could be an interesting property to include.

@djbrooke djbrooke self-assigned this Oct 22, 2019
@djbrooke
Contributor

  • We would want to make sure we respect the setting that keeps email addresses from being exposed (consider making this the default if it's not already)
  • On performance, we'd be OK taking the hit on the indexing side instead of going to the DB every time
  • We should take a look if there are other requests for adding additional fields
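
To make the first two points concrete, here is a hedged sketch of assembling the extra fields once, at indexing time, with email addresses left out by default. Everything here is hypothetical: `build_index_doc`, the `exclude_email` flag, and the input shape (borrowed from the desired example earlier in this issue) are illustrative and do not reflect Dataverse's actual indexing code or setting names.

```python
from typing import Any, Dict


def build_index_doc(dataset: Dict[str, Any], exclude_email: bool = True) -> Dict[str, Any]:
    """Assemble the extra search fields once, at indexing time."""
    contacts = []
    for contact in dataset.get("contacts", []):
        entry = {"fullname": contact["fullname"]}
        # Only copy the email when the (hypothetical) privacy flag allows it.
        if not exclude_email and "email" in contact:
            entry["email"] = contact["email"]
        contacts.append(entry)
    return {
        "global_id": dataset["doi"],
        "updatedAt": dataset["updatedAt"],
        "keywords": list(dataset.get("keywords", [])),
        "contacts": contacts,
    }
```

Because the document is built when the dataset is indexed, a search request could return these fields without another database lookup.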

@djbrooke djbrooke removed their assignment Oct 23, 2019
@landreev landreev self-assigned this Nov 1, 2019
@pdurbin
Member

pdurbin commented Nov 12, 2019

> We should take a look if there are other requests for adding additional fields

We might want to take a look at this one: API search with subtree param :add more hierarchy information in result #6354

@tainguyenbui any interest in that? 😄

@tainguyenbui
Contributor Author

@pdurbin that would be an interesting approach 🤔

Definitely interested 😜 although I am a bit worried about the "very long list" of extra params being requested 😬

@sekmiller sekmiller self-assigned this Nov 20, 2019
@sekmiller sekmiller self-assigned this Nov 21, 2019
@pdurbin
Member

pdurbin commented Nov 22, 2019

> We should take a look if there are other requests for adding additional fields

To fix #6396 we could add "draft" or "1.1" or whatever, the dataset version. Perhaps also optionally the id from the "datasetversion" table (perhaps triggered by setting "show_entity_ids" to true).
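
A small sketch of what that could look like, under stated assumptions: the version record is assumed to expose `versionState`, `versionNumber`, `versionMinorNumber`, and `id`, while the output keys `version` and `versionId` and the function name `with_version` are hypothetical and not taken from any merged change.

```python
from typing import Any, Dict


def with_version(item: Dict[str, Any], version: Dict[str, Any],
                 show_entity_ids: bool = False) -> Dict[str, Any]:
    """Return a copy of a search item with version information attached."""
    result = dict(item)
    if version["versionState"] == "DRAFT":
        result["version"] = "draft"
    else:
        # Released versions are labelled "major.minor", e.g. "1.1".
        result["version"] = f'{version["versionNumber"]}.{version["versionMinorNumber"]}'
    if show_entity_ids:
        # Expose the database id only when explicitly requested.
        result["versionId"] = version["id"]
    return result
```

Keeping the id behind `show_entity_ids` would leave the default payload unchanged for existing clients.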

@pdurbin
Member

pdurbin commented Dec 5, 2019

I just wanted to note that this issue came up while discussing #6396 during tech hours on Tuesday. It may or may not make sense to make a single pull request for both issues. Time will tell.
