diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index 7c0f362486a..5725cafa193 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -943,7 +943,7 @@ Some external tools are also ready to be translated, especially if they are usin
 
 Tools for Translators
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++
 
 The list below depicts a set of tools that can be used to ease the amount of work necessary for translating the Dataverse software by facilitating this collaborative effort and enabling the reuse of previous work:
 
@@ -1075,7 +1075,9 @@ BagIt file handler configuration settings:
 
 BagIt Export
 ------------
 
-Your Dataverse installation may be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant `_ zipped `BagIt `_ bags to `Chronopolis `_ via `DuraCloud `_ or alternately to any folder on the local filesystem.
+Your Dataverse installation may be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant `_ zipped `BagIt `_ archival Bags (sometimes called BagPacks) to `Chronopolis `_ via `DuraCloud `_ or alternately to any folder on the local filesystem.
+
+These archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or potentially in another RDA-conformant repository.
 
 The Dataverse Software offers an internal archive workflow which may be configured as a PostPublication workflow via an admin API call to manually submit previously published Datasets and prior versions to a configured archive such as Chronopolis. The workflow creates a `JSON-LD `_ serialized `OAI-ORE `_ map file, which is also available as a metadata export format in the Dataverse Software web interface.
 
@@ -1086,7 +1088,7 @@ At present, the DPNSubmitToArchiveCommand, LocalSubmitToArchiveCommand, and Goog
 
 Duracloud Configuration
 +++++++++++++++++++++++
 
-Also note that while the current Chronopolis implementation generates the bag and submits it to the archive's DuraCloud interface, the step to make a 'snapshot' of the space containing the Bag (and verify it's successful submission) are actions a curator must take in the DuraCloud interface.
+Also note that while the current Chronopolis implementation generates the archival Bag and submits it to the archive's DuraCloud interface, the steps to make a 'snapshot' of the space containing the archival Bag (and to verify its successful submission) are actions a curator must take in the DuraCloud interface.
 
 The minimal configuration to support an archiver integration involves adding a minimum of two Dataverse Software Keys and any required Payara jvm options. The example instructions here are specific to the DuraCloud Archiver\:
 
@@ -1110,7 +1112,7 @@ It also can use one setting that is common to all Archivers: :BagGeneratorThread
 
 ``curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads -X PUT -d '8'``
 
-By default, the Bag generator zips two datafiles at a time when creating the Bag. This setting can be used to lower that to 1, i.e. to decrease system load, or to increase it, e.g. to 4 or 8, to speed processing of many small files.
+By default, the Bag generator zips two datafiles at a time when creating the archival Bag. This setting can be used to lower that to 1, i.e. to decrease system load, or to increase it, e.g. to 4 or 8, to speed processing of many small files.
 
 Archivers may require JVM options as well. For the Chronopolis archiver, the username and password associated with your organization's Chronopolis/DuraCloud account should be configured in Payara:
 
@@ -1127,7 +1129,7 @@ ArchiverClassName - the fully qualified class to be used for archiving. For exam
 
 ``curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName``
 
-\:BagItLocalPath - the path to where you want to store BagIt. For example\:
+\:BagItLocalPath - the path to where you want to store the archival Bags. For example\:
 
 ``curl -X PUT -d /home/path/to/storage http://localhost:8080/api/admin/settings/:BagItLocalPath``
 
@@ -1142,7 +1144,7 @@ ArchiverClassName - the fully qualified class to be used for archiving. For exam
 
 Google Cloud Configuration
 ++++++++++++++++++++++++++
 
-The Google Cloud Archiver can send Dataverse Project Bags to a bucket in Google's cloud, including those in the 'Coldline' storage class (cheaper, with slower access)
+The Google Cloud Archiver can send archival Bags to a bucket in Google's cloud, including those in the 'Coldline' storage class (cheaper, with slower access)
 
 ``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand"``
 
@@ -1168,12 +1170,12 @@ For example:
 
 .. _Archiving API Call:
 
-API Call
-++++++++
+API Calls
++++++++++
 
-Once this configuration is complete, you, as a user with the *PublishDataset* permission, should be able to use the API call to manually submit a DatasetVersion for processing:
+Once this configuration is complete, you, as a user with the *PublishDataset* permission, should be able to use the admin API call to manually submit a DatasetVersion for processing:
 
-``curl -H "X-Dataverse-key: " http://localhost:8080/api/admin/submitDataVersionToArchive/{id}/{version}``
+``curl -X POST -H "X-Dataverse-key: " http://localhost:8080/api/admin/submitDatasetVersionToArchive/{id}/{version}``
 
 where:
 
@@ -1181,10 +1183,22 @@ where:
 
 ``{version}`` is the friendly version number, e.g. "1.2".
 
-The submitDataVersionToArchive API (and the workflow discussed below) attempt to archive the dataset version via an archive specific method. For Chronopolis, a DuraCloud space named for the dataset (it's DOI with ':' and '.' replaced with '-') is created and two files are uploaded to it: a version-specific datacite.xml metadata file and a BagIt bag containing the data and an OAI-ORE map file. (The datacite.xml file, stored outside the Bag as well as inside is intended to aid in discovery while the ORE map file is 'complete', containing all user-entered metadata and is intended as an archival record.)
+The submitDatasetVersionToArchive API (and the workflow discussed below) attempts to archive the dataset version via an archive-specific method. For Chronopolis, a DuraCloud space named for the dataset (its DOI with ':' and '.' replaced with '-') is created and two files are uploaded to it: a version-specific datacite.xml metadata file and a BagIt bag containing the data and an OAI-ORE map file. (The datacite.xml file, stored outside the Bag as well as inside, is intended to aid in discovery, while the ORE map file is 'complete', containing all user-entered metadata, and is intended as an archival record.)
 
 In the Chronopolis case, since the transfer from the DuraCloud front-end to archival storage in Chronopolis can take significant time, it is currently up to the admin/curator to submit a 'snap-shot' of the space within DuraCloud and to monitor its successful transfer. Once transfer is complete the space should be deleted, at which point the Dataverse Software API call can be used to submit a Bag for other versions of the same Dataset. (The space is reused, so that archival copies of different Dataset versions correspond to different snapshots of the same DuraCloud space.).
 
+A batch version of this admin API call is also available:
+
+``curl -X POST -H "X-Dataverse-key: " 'http://localhost:8080/api/admin/archiveAllUnarchivedDatasetVersions?listonly=true&limit=10&latestonly=true'``
+
+The archiveAllUnarchivedDatasetVersions call takes three optional query parameters:
+
+* listonly=true will cause the API to list the dataset versions that would be archived but will not take any action.
+* limit= will limit the number of dataset versions archived in one API call to at most the given value.
+* latestonly=true will limit archiving to only the latest published version of each dataset instead of archiving all unarchived versions.
+
+Note that because archiving is done asynchronously, the calls above will return OK even if the user does not have the *PublishDataset* permission on the dataset(s) involved. Failures are indicated in the log, and the archivalStatus calls in the native API can be used to check the status as well.
+
+
 PostPublication Workflow
 ++++++++++++++++++++++++
 
@@ -2578,7 +2592,7 @@ Number of errors to display to the user when creating DataFiles from a file uplo
 
 .. _:BagItHandlerEnabled:
 
 :BagItHandlerEnabled
-+++++++++++++++++++++
+++++++++++++++++++++
 
 Part of the database settings to configure the BagIt file handler. Enables the BagIt file handler. By default, the handler is disabled.
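As an illustration of the batch archiving call documented above, the usual pattern is a ``listonly=true`` dry run followed by the real submission once the list looks right. This is only a sketch: the ``$API_TOKEN`` placeholder and the specific ``limit`` value are assumptions for illustration, not values taken from this changeset::

    # Dry run: list up to 10 unarchived dataset versions (latest published versions only)
    # without submitting anything to the archive
    curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
      'http://localhost:8080/api/admin/archiveAllUnarchivedDatasetVersions?listonly=true&limit=10&latestonly=true'

    # Same call without listonly=true: actually submit those versions for archiving
    curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
      'http://localhost:8080/api/admin/archiveAllUnarchivedDatasetVersions?limit=10&latestonly=true'

Because submissions run asynchronously, the response only confirms that processing has started; per-version results appear in the server log as each Bag is generated and transferred.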
diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java b/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
index 63ecfd7cdc1..2fc97e23fb7 100644
--- a/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
+++ b/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -2754,7 +2754,7 @@ public String updateCurrentVersion() {
          */
         try {
             updateVersion = commandEngine.submit(archiveCommand);
-            if (updateVersion.getArchivalCopyLocation() != null) {
+            if (!updateVersion.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
                 successMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.success");
             } else {
                 errorMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.failure");
@@ -5562,9 +5562,14 @@ public void archiveVersion(Long id) {
         if (cmd != null) {
             try {
                 DatasetVersion version = commandEngine.submit(cmd);
-                logger.info("Archived to " + version.getArchivalCopyLocation());
+                if (!version.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
+                    logger.info("DatasetVersion id=" + version.getId() + " submitted to Archive, status: "
+                            + version.getArchivalCopyLocationStatus());
+                } else {
+                    logger.severe("Error submitting version " + version.getId() + " due to conflict/error at Archive");
+                }
                 if (version.getArchivalCopyLocation() != null) {
-                    resetVersionTabList();
+                    setVersionTabList(resetVersionTabList());
                     this.setVersionTabListForPostLoad(getVersionTabList());
                     JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("datasetversion.archive.success"));
                 } else {
diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java b/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
index 510cb2866e8..30815c43381 100644
--- a/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
+++ b/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
@@ -40,6 +40,8 @@
 import javax.persistence.Index;
 import javax.persistence.JoinColumn;
 import javax.persistence.ManyToOne;
+import javax.persistence.NamedQueries;
+import javax.persistence.NamedQuery;
 import javax.persistence.OneToMany;
 import javax.persistence.OneToOne;
 import javax.persistence.OrderBy;
@@ -60,6 +62,13 @@
  *
  * @author skraffmiller
  */
+
+@NamedQueries({
+    @NamedQuery(name = "DatasetVersion.findUnarchivedReleasedVersion",
+        query = "SELECT OBJECT(o) FROM DatasetVersion AS o WHERE o.dataset.harvestedFrom IS NULL and o.releaseTime IS NOT NULL and o.archivalCopyLocation IS NULL"
+    )})
+
+
 @Entity
 @Table(indexes = {@Index(columnList="dataset_id")},
        uniqueConstraints = @UniqueConstraint(columnNames = {"dataset_id,versionnumber,minorversionnumber"}))
diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetVersionServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DatasetVersionServiceBean.java
index df787ae1391..23fc1961b7d 100644
--- a/src/main/java/edu/harvard/iq/dataverse/DatasetVersionServiceBean.java
+++ b/src/main/java/edu/harvard/iq/dataverse/DatasetVersionServiceBean.java
@@ -1195,4 +1195,24 @@ private DatasetVersion getPreviousVersionWithUnf(DatasetVersion datasetVersion)
     public DatasetVersion merge( DatasetVersion ver ) {
         return em.merge(ver);
     }
+
+    /**
+     * Query for all released, non-harvested dataset versions that do not yet have an archival copy.
+     *
+     * @return the matching versions, or null if the query fails
+     */
+    public List<DatasetVersion> getUnarchivedDatasetVersions() {
+
+        try {
+            List<DatasetVersion> dsl = em.createNamedQuery("DatasetVersion.findUnarchivedReleasedVersion", DatasetVersion.class).getResultList();
+            return dsl;
+        } catch (javax.persistence.NoResultException e) {
+            logger.log(Level.FINE, "No unarchived DatasetVersions found");
+            return null;
+        } catch (EJBException e) {
+            logger.log(Level.WARNING, "EJBException exception: {0}", e.getMessage());
+            return null;
+        }
+    } // end getUnarchivedDatasetVersions
 } // end class
diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java
index f1f9c788f1e..ef08444af69 100644
--- a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java
+++ b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@@ -105,9 +105,6 @@
 import static edu.harvard.iq.dataverse.util.json.JsonPrinter.json;
 import static edu.harvard.iq.dataverse.util.json.JsonPrinter.rolesToJson;
 import static edu.harvard.iq.dataverse.util.json.JsonPrinter.toJsonArray;
-import java.math.BigDecimal;
-
-
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Date;
@@ -1805,23 +1802,28 @@ public Response validateDataFileHashValue(@PathParam("fileId") String fileId) {
 
     }
 
-    @GET
-    @Path("/submitDataVersionToArchive/{id}/{version}")
-    public Response submitDatasetVersionToArchive(@PathParam("id") String dsid, @PathParam("version") String versionNumber) {
+    @POST
+    @Path("/submitDatasetVersionToArchive/{id}/{version}")
+    public Response submitDatasetVersionToArchive(@PathParam("id") String dsid,
+            @PathParam("version") String versionNumber) {
 
         try {
             AuthenticatedUser au = findAuthenticatedUserOrDie();
-            // Note - the user is being set in the session so it becomes part of the
-            // DataverseRequest and is sent to the back-end command where it is used to get
-            // the API Token which is then used to retrieve files (e.g. via S3 direct
-            // downloads) to create the Bag
-            session.setUser(au); // TODO: Stop using session. Use createDataverseRequest instead.
+
             Dataset ds = findDatasetOrDie(dsid);
             DatasetVersion dv = datasetversionService.findByFriendlyVersionNumber(ds.getId(), versionNumber);
             if (dv.getArchivalCopyLocation() == null) {
                 String className = settingsService.getValueForKey(SettingsServiceBean.Key.ArchiverClassName);
-                AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className, dvRequestService.getDataverseRequest(), dv);
+                // Note - the user is being sent via the createDataverseRequest(au) call to the
+                // back-end command where it is used to get the API Token which is
+                // then used to retrieve files (e.g. via S3 direct downloads) to create the Bag
+                AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className,
+                        createDataverseRequest(au), dv);
+                // createSubmitToArchiveCommand() tries to find and instantiate a non-abstract
+                // implementation of AbstractSubmitToArchiveCommand based on the provided
+                // className. If a class with that name isn't found (or can't be instantiated), it
+                // will return null
                 if (cmd != null) {
                     if(ArchiverUtil.onlySingleVersionArchiving(cmd.getClass(), settingsService)) {
                         for (DatasetVersion version : ds.getVersions()) {
@@ -1834,9 +1836,10 @@ public Response submitDatasetVersionToArchive(@PathParam("id") String dsid, @Pat
                         public void run() {
                             try {
                                 DatasetVersion dv = commandEngine.submit(cmd);
-                                if (dv.getArchivalCopyLocation() != null) {
-                                    logger.info("DatasetVersion id=" + ds.getGlobalId().toString() + " v" + versionNumber + " submitted to Archive at: "
-                                            + dv.getArchivalCopyLocation());
+                                if (!dv.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
+                                    logger.info("DatasetVersion id=" + ds.getGlobalId().toString() + " v" + versionNumber
+                                            + " submitted to Archive, status: " + dv.getArchivalCopyLocationStatus());
                                 } else {
                                     logger.severe("Error submitting version due to conflict/error at Archive");
                                 }
@@ -1845,13 +1848,105 @@ public void run() {
                             }
                         }
                     }).start();
-                    return ok("Archive submission using " + cmd.getClass().getCanonicalName() + " started. Processing can take significant time for large datasets. View log and/or check archive for results.");
+                    return ok("Archive submission using " + cmd.getClass().getCanonicalName()
+                            + " started. Processing can take significant time for large datasets and requires that the user have permission to publish the dataset. View log and/or check archive for results.");
+                } else {
+                    logger.log(Level.SEVERE, "Could not find Archiver class: " + className);
+                    return error(Status.INTERNAL_SERVER_ERROR, "Could not find Archiver class: " + className);
+                }
+            } else {
+                return error(Status.BAD_REQUEST, "Version was already submitted for archiving.");
+            }
+        } catch (WrappedResponse e1) {
+            return error(Status.UNAUTHORIZED, "api key required");
+        }
+    }
+
+
+    /**
+     * Iteratively archives all unarchived dataset versions
+     *
+     * @param listonly   don't archive, just list unarchived versions
+     * @param limit      max number to process
+     * @param latestonly only archive the latest published version of each dataset
+     * @return
+     */
+    @POST
+    @Path("/archiveAllUnarchivedDatasetVersions")
+    public Response archiveAllUnarchivedDatasetVersions(@QueryParam("listonly") boolean listonly, @QueryParam("limit") Integer limit, @QueryParam("latestonly") boolean latestonly) {
+
+        try {
+            AuthenticatedUser au = findAuthenticatedUserOrDie();
+
+            List<DatasetVersion> dsl = datasetversionService.getUnarchivedDatasetVersions();
+            if (dsl != null) {
+                if (listonly) {
+                    JsonArrayBuilder jab = Json.createArrayBuilder();
+                    logger.fine("Unarchived versions found: ");
+                    int current = 0;
+                    for (DatasetVersion dv : dsl) {
+                        if (limit != null && current >= limit) {
+                            break;
+                        }
+                        if (!latestonly || dv.equals(dv.getDataset().getLatestVersionForCopy())) {
+                            jab.add(dv.getDataset().getGlobalId().toString() + ", v" + dv.getFriendlyVersionNumber());
+                            logger.fine("    " + dv.getDataset().getGlobalId().toString() + ", v" + dv.getFriendlyVersionNumber());
+                            current++;
+                        }
+                    }
+                    return ok(jab);
+                }
+                String className = settingsService.getValueForKey(SettingsServiceBean.Key.ArchiverClassName);
+                // Note - the user is being sent via the createDataverseRequest(au) call to the
+                // back-end command where it is used to get the API Token which is
+                // then used to retrieve files (e.g. via S3 direct downloads) to create the Bag
+                final DataverseRequest request = createDataverseRequest(au);
+                // createSubmitToArchiveCommand() tries to find and instantiate a non-abstract
+                // implementation of AbstractSubmitToArchiveCommand based on the provided
+                // className. If a class with that name isn't found (or can't be instantiated), it
+                // will return null
+                AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className, request, dsl.get(0));
+                if (cmd != null) {
+                    // Found an archiver to use
+                    new Thread(new Runnable() {
+                        public void run() {
+                            int total = dsl.size();
+                            int successes = 0;
+                            int failures = 0;
+                            for (DatasetVersion dv : dsl) {
+                                if (limit != null && (successes + failures) >= limit) {
+                                    break;
+                                }
+                                if (!latestonly || dv.equals(dv.getDataset().getLatestVersionForCopy())) {
+                                    try {
+                                        AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className, request, dv);
+
+                                        dv = commandEngine.submit(cmd);
+                                        if (!dv.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
+                                            successes++;
+                                            logger.info("DatasetVersion id=" + dv.getDataset().getGlobalId().toString() + " v" + dv.getFriendlyVersionNumber()
+                                                    + " submitted to Archive, status: " + dv.getArchivalCopyLocationStatus());
+                                        } else {
+                                            failures++;
+                                            logger.severe("Error submitting version due to conflict/error at Archive for " + dv.getDataset().getGlobalId().toString() + " v" + dv.getFriendlyVersionNumber());
+                                        }
+                                    } catch (CommandException ex) {
+                                        failures++;
+                                        logger.log(Level.SEVERE, "Unexpected Exception calling submit archive command", ex);
+                                    }
+                                }
+                                logger.fine(successes + failures + " of " + total + " archive submissions complete");
+                            }
+                            logger.info("Archiving complete: " + successes + " Successes, " + failures + " Failures. See prior log messages for details.");
+                        }
+                    }).start();
+                    return ok("Starting to archive all unarchived published dataset versions using " + cmd.getClass().getCanonicalName() + ". Processing can take significant time for large datasets / large numbers of dataset versions and requires that the user have permission to publish the dataset(s). View log and/or check archive for results.");
                 } else {
                     logger.log(Level.SEVERE, "Could not find Archiver class: " + className);
                     return error(Status.INTERNAL_SERVER_ERROR, "Could not find Archiver class: " + className);
                 }
             } else {
-                return error(Status.BAD_REQUEST, "Version already archived at: " + dv.getArchivalCopyLocation());
+                return error(Status.BAD_REQUEST, "No unarchived published dataset versions found");
             }
         } catch (WrappedResponse e1) {
             return error(Status.UNAUTHORIZED, "api key required");
diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
index 15b3cd2b9db..7941dfd70c8 100644
--- a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
+++ b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -1153,7 +1153,7 @@ public Response publishDataset(@PathParam("id") String id, @QueryParam("type") S
          */
         try {
             updateVersion = commandEngine.submit(archiveCommand);
-            if (updateVersion.getArchivalCopyLocation() != null) {
+            if (!updateVersion.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
                 successMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.success");
             } else {
                 successMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.failure");
@@ -3352,7 +3352,6 @@ public Response setDatasetVersionArchivalStatus(@PathParam("id") String datasetI
 
             dsv.setArchivalCopyLocation(JsonUtil.prettyPrint(update));
             dsv = datasetversionService.merge(dsv);
-            logger.fine("location now: " + dsv.getArchivalCopyLocation());
             logger.fine("status now: " + dsv.getArchivalCopyLocationStatus());
             logger.fine("message now: " + dsv.getArchivalCopyLocationMessage());
 
diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java
index d37d9e655b0..c7da2247a31 100644
--- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java
+++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java
@@ -63,7 +63,7 @@ public WorkflowStepResult performArchiveSubmission(DatasetVersion dv, ApiToken t
         // ToDo - change after HDC 3A changes to status reporting
         // This will make the archivalCopyLocation non-null after a failure which should
         // stop retries
-        dv.setArchivalCopyLocation("Attempted");
+
         if (dataset.getLockFor(Reason.finalizePublication) == null && dataset.getLockFor(Reason.FileValidationFailed) == null) {
 
             // Use Duracloud client classes to login