Merge remote-tracking branch 'IQSS/develop' into DANS-external_exporters
qqmyers committed Jul 5, 2023
2 parents fc710e2 + 6053fa3 commit 55ec209
Showing 140 changed files with 5,312 additions and 1,914 deletions.
21 changes: 12 additions & 9 deletions .github/workflows/container_app_pr.yml
@@ -58,36 +58,39 @@ jobs:
echo "IMAGE_TAG=$(echo "${{ github.event.client_payload.pull_request.head.ref }}" | tr '\\/_:&+,;#*' '-')" >> $GITHUB_ENV
# Necessary to split as otherwise the submodules are not available (deploy skips install)
- name: Build app container image with local architecture and submodules (profile will skip tests)
- name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
run: >
mvn -B -f modules/dataverse-parent
-P ct -pl edu.harvard.iq:dataverse -am
install
- name: Deploy multi-arch application container image
- name: Deploy multi-arch application and configbaker container image
run: >
mvn
-Dapp.image.tag=${{ env.IMAGE_TAG }} -Dbase.image.tag=${{ env.BASE_IMAGE_TAG }}
${{ env.REGISTRY }} -Ddocker.platforms=${{ env.PLATFORMS }}
-P ct deploy
-Ddocker.registry=ghcr.io -Ddocker.platforms=${{ env.PLATFORMS }}
-Pct deploy
- uses: marocchino/sticky-pull-request-comment@v2
with:
header: app-registry-push
header: registry-push
hide_and_recreate: true
hide_classify: "OUTDATED"
number: ${{ github.event.client_payload.pull_request.number }}
message: |
:package: Pushed preview application image as
:package: Pushed preview images as
```
ghcr.io/gdcc/dataverse:${{ env.IMAGE_TAG }}
```
:ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container/package/dataverse). Use by referencing with full name as printed above, mind the registry name.
```
ghcr.io/gdcc/configbaker:${{ env.IMAGE_TAG }}
```
:ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container). Use by referencing with full name as printed above, mind the registry name.
# Leave a note when things have gone sideways
- uses: peter-evans/create-or-update-comment@v3
if: ${{ failure() }}
with:
issue-number: ${{ github.event.client_payload.pull_request.number }}
body: >
:package: Could not push preview image :disappointed:.
See [log](https://github.com/IQSS/dataverse/actions/runs/${{ github.run_id }}) for details.
:package: Could not push preview images :disappointed:.
See [log](https://github.com/IQSS/dataverse/actions/runs/${{ github.run_id }}) for details.
25 changes: 18 additions & 7 deletions .github/workflows/container_app_push.yml
@@ -11,6 +11,7 @@ on:
- master
paths:
- 'src/main/docker/**'
- 'modules/container-configbaker/**'
- '.github/workflows/container_app_push.yml'

env:
@@ -42,7 +43,7 @@ jobs:
distribution: temurin
cache: maven

- name: Build app container image with local architecture and submodules (profile will skip tests)
- name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
run: >
mvn -B -f modules/dataverse-parent
-P ct -pl edu.harvard.iq:dataverse -am
@@ -52,7 +53,7 @@

hub-description:
needs: build
name: Push image description to Docker Hub
name: Push image descriptions to Docker Hub
# Run this when triggered via push or schedule as reused workflow from base / maven unit tests.
# Excluding PRs here means we will have no trouble with secrets access. Also avoid runs in forks.
if: ${{ github.event_name != 'pull_request' && github.ref_name == 'develop' && github.repository_owner == 'IQSS' }}
@@ -66,6 +67,13 @@ jobs:
repository: gdcc/dataverse
short-description: "Dataverse Application Container Image providing the executable"
readme-filepath: ./src/main/docker/README.md
- uses: peter-evans/dockerhub-description@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
repository: gdcc/configbaker
short-description: "Dataverse Config Baker Container Image providing setup tooling and more"
readme-filepath: ./modules/container-configbaker/README.md

# Note: Accessing, pushing tags etc. to DockerHub or GHCR will only succeed in upstream because secrets.
# We check for them here and subsequent jobs can rely on this to decide if they shall run.
@@ -130,12 +138,12 @@ jobs:
echo "REGISTRY='-Ddocker.registry=ghcr.io'" >> $GITHUB_ENV
# Necessary to split as otherwise the submodules are not available (deploy skips install)
- name: Build app container image with local architecture and submodules (profile will skip tests)
- name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
run: >
mvn -B -f modules/dataverse-parent
-P ct -pl edu.harvard.iq:dataverse -am
install
- name: Deploy multi-arch application container image
- name: Deploy multi-arch application and configbaker container image
run: >
mvn
-Dapp.image.tag=${{ env.IMAGE_TAG }} -Dbase.image.tag=${{ env.BASE_IMAGE_TAG }}
@@ -145,12 +153,15 @@
- uses: marocchino/sticky-pull-request-comment@v2
if: ${{ github.event_name == 'pull_request' }}
with:
header: app-registry-push
header: registry-push
hide_and_recreate: true
hide_classify: "OUTDATED"
message: |
:package: Pushed preview application image as
:package: Pushed preview images as
```
ghcr.io/gdcc/dataverse:${{ env.IMAGE_TAG }}
```
:ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container/package/dataverse). Use by referencing with full name as printed above, mind the registry name.
```
ghcr.io/gdcc/configbaker:${{ env.IMAGE_TAG }}
```
:ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container). Use by referencing with full name as printed above, mind the registry name.
35 changes: 29 additions & 6 deletions .github/workflows/shellcheck.yml
@@ -1,24 +1,47 @@
name: "Shellcheck"
on:
push:
branches:
- develop
paths:
- conf/solr/**
- modules/container-base/**
- conf/solr/**/*.sh
- modules/container-base/**/*.sh
- modules/container-configbaker/**/*.sh
pull_request:
branches:
- develop
paths:
- conf/solr/**
- modules/container-base/**
- conf/solr/**/*.sh
- modules/container-base/**/*.sh
- modules/container-configbaker/**/*.sh
jobs:
shellcheck:
name: Shellcheck
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: shellcheck
uses: reviewdog/action-shellcheck@v1
with:
github_token: ${{ secrets.github_token }}
reporter: github-pr-review # Change reporter.
fail_on_error: true
# Container base image uses dumb-init shebang, so nail to using bash
shellcheck_flags: "--shell=bash --external-sources"
shellcheck_flags: "--shell=bash --external-sources"
# Exclude old scripts
exclude: |
*/.git/*
conf/docker-aio/*
doc/*
downloads/*
scripts/database/*
scripts/globalid/*
scripts/icons/*
scripts/installer/*
scripts/issues/*
scripts/r/*
scripts/tests/*
scripts/vagrant/*
tests/*
2 changes: 2 additions & 0 deletions conf/solr/8.11.1/update-fields.sh
@@ -2,6 +2,8 @@

set -euo pipefail

# [INFO]: Update a prepared Solr schema.xml for Dataverse with a given list of metadata fields

#### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### ####
# This script will
# 1. take a file (or read it from STDIN) with all <field> and <copyField> definitions
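
A hedged usage sketch for this script (the admin endpoint and the schema path are assumptions; the script header documents the authoritative usage):

```bash
# Pull the current <field>/<copyField> definitions from a running Dataverse
# and patch them into the prepared Solr schema (paths are illustrative).
curl -s "http://localhost:8080/api/admin/index/solr/schema" \
  | ./update-fields.sh /usr/local/solr/server/solr/collection1/conf/schema.xml
```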
3 changes: 3 additions & 0 deletions doc/release-notes/6542-mdc-legacy-counts.md
@@ -0,0 +1,3 @@
### For installations using MDC (Make Data Count), it is now possible to display both the MDC metrics and the legacy access counts generated before MDC was enabled.

This is enabled via the new setting `:MDCStartDate`, which specifies the cutoff date. If a dataset has any legacy access counts collected prior to that date, those numbers will be displayed in addition to any MDC metrics recorded since then.
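
For context, a minimal sketch of enabling this, assuming the standard database settings API and an ISO date value:

```bash
# The cutoff date is illustrative; counts collected before it are shown as legacy.
curl -X PUT -d '2019-10-01' http://localhost:8080/api/admin/settings/:MDCStartDate
```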
3 changes: 3 additions & 0 deletions doc/release-notes/8889-filepids-in-collections.md
@@ -0,0 +1,3 @@
It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the [:FilePIDsEnabled](https://guides.dataverse.org/en/latest/installation/config.html#filepidsenabled) section of the Configuration guide for details.
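
A hedged sketch of toggling this per collection, assuming the collection attributes API referenced above (the attribute name and placeholders are assumptions):

```bash
# Enable file PID registration for one collection
# ($ALIAS and $API_TOKEN are placeholders).
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
  "http://localhost:8080/api/dataverses/$ALIAS/attribute/filePIDsEnabled?value=true"
```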
4 changes: 4 additions & 0 deletions doc/release-notes/9431-checksum-alg-in-direct-uploads.md
@@ -0,0 +1,4 @@
Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting.
External apps using the direct upload API can now query Dataverse to discover which algorithm should be used.

Sites using direct upload and/or dvwebloader with an algorithm other than MD5 may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/latest/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files.
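
A hedged sketch of the switch-and-rehash flow (the settings API is standard; the exact semantics of the `updateHashValues` path segment should be confirmed against the linked guide):

```bash
# Use SHA-512 for new uploads (the value is one assumed supported algorithm).
curl -X PUT -d 'SHA-512' http://localhost:8080/api/admin/settings/:FileFixityChecksumAlgorithm
# Replace existing MD5 hashes with the configured algorithm (assumed usage).
curl "http://localhost:8080/api/admin/updateHashValues/MD5"
```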
1 change: 1 addition & 0 deletions doc/release-notes/9480-h5web.md
@@ -0,0 +1 @@
A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files.
3 changes: 3 additions & 0 deletions doc/release-notes/9558-async-indexing.md
@@ -0,0 +1,3 @@
Performance improvements, especially for large datasets containing thousands of files.
Uploading files one by one to a dataset is now much faster, making it possible to upload thousands of files in an acceptable timeframe. All edit operations on datasets containing many files are faster as well.
The performance tweaks include indexing datasets in the background and reducing the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingest to finish. Ingest was already running in the background, but it took a lock, preventing dataset updates and degrading performance for datasets containing many files.
1 change: 1 addition & 0 deletions doc/release-notes/9573-configbaker.md
@@ -0,0 +1 @@
A new container called "configbaker" has been added; it configures Dataverse when running in containers, allowing developers to spin up Dataverse with a single command.
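
A hedged quickstart, assuming the compose file shipped in the repository root:

```bash
# Start Dataverse plus its configbaker-driven setup in one command
# (the compose file name is an assumption).
docker compose -f docker-compose-dev.yml up
```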
6 changes: 6 additions & 0 deletions doc/release-notes/9588-datasets-api-extension.md
@@ -0,0 +1,6 @@
The following APIs have been added:

- /api/datasets/summaryFieldNames
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
- /api/datasets/{datasetId}/versions/{version}/citation
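
Hedged example calls for the endpoints above (the server URL, token, and identifiers are placeholders):

```bash
export SERVER_URL=http://localhost:8080
curl "$SERVER_URL/api/datasets/summaryFieldNames"
curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN"
curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN/citation"
curl "$SERVER_URL/api/datasets/$DATASET_ID/versions/1.0/citation"
```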
5 changes: 5 additions & 0 deletions doc/release-notes/9656-api-optional-dataset-params.md
@@ -0,0 +1,5 @@
The following fields are now available in the native JSON output:

- alternativePersistentId
- publicationDate
- citationDate
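
A hedged way to inspect the new fields on a dataset (the JSON paths are assumptions):

```bash
# $DATASET_ID is a placeholder; the jq paths assume the fields sit under .data.
curl -s "http://localhost:8080/api/datasets/$DATASET_ID" \
  | jq '.data | {alternativePersistentId, publicationDate, citationDate}'
```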
@@ -2,5 +2,5 @@ Tool Type Scope Description
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse.
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project. See also :ref:`binder`.
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, HDF5, NetCDF, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
Data Curation Tool configure file A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions.
@@ -14,14 +14,14 @@
{
"locale":"{localeCode}"
}
],
"allowedApiCalls": [
{
"name":"retrieveDatasetJson",
"httpMethod":"GET",
"urlTemplate":"/api/v1/datasets/{datasetId}",
"timeOut":10
}
]
}
]
},
"allowedApiCalls": [
{
"name":"retrieveDatasetJson",
"httpMethod":"GET",
"urlTemplate":"/api/v1/datasets/{datasetId}",
"timeOut":10
}
]
}
@@ -21,14 +21,14 @@
{
"locale":"{localeCode}"
}
],
"allowedApiCalls": [
{
"name":"retrieveDataFile",
"httpMethod":"GET",
"urlTemplate":"/api/v1/access/datafile/{fileId}",
"timeOut":270
}
]
}
},
"allowedApiCalls": [
{
"name":"retrieveDataFile",
"httpMethod":"GET",
"urlTemplate":"/api/v1/access/datafile/{fileId}",
"timeOut":270
}
]
}
47 changes: 43 additions & 4 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -118,6 +118,28 @@ Creates a link between a dataset and a Dataverse collection (see the :ref:`datas

curl -H "X-Dataverse-key: $API_TOKEN" -X PUT http://$SERVER/api/datasets/$linked-dataset-id/link/$linking-dataverse-alias

List Collections that are Linked from a Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Lists the link(s) created between a dataset and a Dataverse collection (see the :ref:`dataset-linking` section of the User Guide for more information). ::

curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/datasets/$linked-dataset-id/links

It returns a list in the following format:

.. code-block:: json

    {
      "status": "OK",
      "data": {
        "dataverses that link to dataset id 56782": [
          "crc990 (id 18802)"
        ]
      }
    }

.. _unlink-a-dataset:

Unlink a Dataset
^^^^^^^^^^^^^^^^

@@ -131,15 +153,32 @@ Mint a PID for a File That Does Not Have One
In the following example, the database id of the file is 42::

export FILE_ID=42
curl http://localhost:8080/api/admin/$FILE_ID/registerDataFile
curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"

Mint PIDs for all unregistered published files in the specified collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mint PIDs for Files That Do Not Have Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following API will register PIDs for all as-yet-unregistered published files in the datasets **directly within the collection** specified by its alias::

If you have a large number of files, you might want to consider minting PIDs for files individually using the ``registerDataFile`` endpoint above in a for loop, sleeping between each registration::
curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}"

It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well. File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

This API will sleep for 1 second between registration calls by default. A longer sleep interval can be specified with an optional ``sleep=`` parameter::

curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}?sleep=5"

Mint PIDs for ALL unregistered files in the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will attempt to register the PIDs for all the published files in your instance that do not yet have them::

curl http://localhost:8080/api/admin/registerDataFileAll

The application will sleep for 1 second between registration attempts so as not to overload your persistent identifier service provider. Note that if you have a large number of files that need to be registered in your Dataverse installation, you may want to consider minting file PIDs within individual collections, or even for individual files using the ``registerDataFiles`` and/or ``registerDataFile`` endpoints above in a loop, with a longer sleep interval between calls; a sketch of such a loop is shown below.
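
A sketch of such a loop, assuming a plain-text file of database ids, one per line (the file name is illustrative)::

    for FILE_ID in $(cat file-ids.txt); do
        curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"
        sleep 5
    done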



Mint a New DOI for a Dataset with a Handle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
