Datasets

Megan R Brett edited this page Nov 28, 2022 · 21 revisions

Managing datasets

Once you have created a project, you can begin adding datasets to the project. A project can have many datasets or only one.

All items in a dataset should share the same data model, because each dataset has a single form; to use a different form, you must create an additional dataset. Information is exported from DataScribe by dataset.

Adding Datasets

To add a dataset, go to the Project view for your project, then select “Add new dataset” from the dropdown in the upper right corner of the browser window.

This will take you to the form to build a new dataset. There are fields for:

  • Name (required): a name for the dataset
  • Description: A description of the dataset. This will display in the dataset metadata when viewing the project.
  • Guidelines: Instructions and useful information for transcribers of this dataset.
  • Item Set (required): Select from a dropdown of item sets in the Omeka S install.
  • Revert review status: checkbox. Check this to automatically revert an approved item's review status to "not reviewed" when a record belonging to the item is added or modified. The default behavior is to retain the approved review status.
    • This matters when exporting data, because only approved data is exported. If the box is unchecked, changes to approved items will automatically be included in the next export.
  • Export missing and illegible: checkbox. When this box is checked, the export will append columns for illegible and missing flags for each field.
  • Import form: import a DataScribe dataset form (json). This option is only available when adding a new dataset.
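The effect of the "Revert review status" checkbox on exports can be sketched as a small piece of logic. This is an illustrative model only, not DataScribe's code: the function names are hypothetical, and the status strings are taken from the description above.

```python
def review_status_after_change(current_status: str, revert_on_change: bool) -> str:
    """Return an item's review status after one of its records is added or modified."""
    if current_status == "approved" and revert_on_change:
        return "not reviewed"
    # Default behavior: the existing status is retained.
    return current_status


def is_exported(status: str) -> bool:
    """Only records belonging to approved items are exported."""
    return status == "approved"


# Box unchecked (default): the item stays approved, so the change reaches the export.
unchecked = review_status_after_change("approved", revert_on_change=False)
# Box checked: the item drops back to "not reviewed" and is left out until re-approved.
checked = review_status_after_change("approved", revert_on_change=True)

print(unchecked, is_exported(unchecked))
print(checked, is_exported(checked))
```

Checking the box trades convenience for control: edits to approved items must be reviewed again before they appear in an export.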

Using the eye icon in the upper right corner, you can set the dataset as private or public. Private datasets are only visible to Global Admins of the Omeka S install and those users who have been added to the project. A public dataset can be viewed by anyone who is logged in to the Omeka S installation.

Once you have created the dataset, you will be taken to the dataset browse page. The following message should appear in the main work area: “No items found. Sync this dataset. If you have recently synced, the sync job likely has not finished.”

To sync the dataset, either click the phrase “sync this dataset” or use the Actions menu in the upper right corner of the window and select the “Sync dataset” option. Syncing updates the dataset to match the items in the source Omeka S item set. The first sync will populate the dataset with items from the source item set. For more on syncing once a dataset is populated, see "Sync dataset" below.

Once you have synced the dataset, you will need to reload the page in order to see the items.

Browse datasets

You can see all the datasets which are part of a project in two places: on the DataScribe dashboard (if you are a user on that project) and in the project's browse dataset and project metadata view, found on the right side of the browser window.

On the dashboard, projects which show up in the "My projects" section will include areas for each dataset in the project. Within each dataset are links to interact with specific kinds of records and items.

You can access the browse dataset and project metadata view by clicking on the project's title from the dashboard. This will take you to a page with the project metadata in a side drawer on the right and a table of all datasets. The wayfinding at the top of the page should read "Dashboard > Projects > Datasets".

The dataset table lists:

  • The dataset’s title
  • The creator of the dataset (Omeka user)
  • The date the dataset was created

Clicking on the ellipsis (three dots) in a dataset’s row will change the side drawer to display the metadata for that dataset. Click the pencil icon to edit the dataset. Click the dataset's title to go to its browse items and records page.

You can toggle the side drawer open or closed by clicking on the triangle icon.

Dataset actions

Dataset actions are available from the dataset's browse items and browse records pages. The wayfinding at the top of the page should read "Dashboard > Projects > Datasets > Items" or "Dashboard > Projects > Datasets > Records".

The "Browse records" and "Edit dataset" buttons are always visible; other options are available from the "More actions" dropdown.

The options to export the dataset and the dataset form can be found at the very bottom of the right-hand drawer.

Browse toggle

This button is in the upper right-hand corner of the browser window, next to the "Edit dataset" button.

Depending on what your current browse view is, this will either display as “Browse records” or “Browse items”. Use this button to toggle between the two views.

Edit dataset

This button is in the upper right-hand corner of the browser window, next to the "More actions" button.

Editing the dataset opens a page with two tabs: General and Form Builder.

The General options are largely the same as when you created the dataset:

  • Name (required): a name for the dataset
  • Description: A description of the dataset. This will display in the dataset metadata when viewing the project.
  • Guidelines: Instructions and useful information for transcribers of this dataset.
  • Item Set: Be aware that changing the associated item set will result in data loss.
  • Revert review status: checkbox. Check this to automatically revert an approved item's review status to "not reviewed" when a record belonging to the item is added or modified. The default behavior is to retain the approved review status.
    • This matters when exporting data, because only approved data is exported. If the box is unchecked, changes to approved items will automatically be included in the next export.
  • Export missing and illegible: checkbox. When this box is checked, the export will append columns for illegible and missing flags for each field.

Note that you cannot import a form into an existing dataset.

The Form Builder tab allows you to create or edit the form for your dataset.

Sync dataset

This is the first option in the "More actions" dropdown.

Syncing updates the dataset with the information from the items in the item set. You should sync whenever items are added to or removed from the item set.

Be aware that any items which have been removed from the Omeka S item set will be deleted from the DataScribe dataset, along with any records which have been created for those items.

Syncing does not send any information from DataScribe to Omeka S.
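The sync behavior described above can be modeled as a one-way reconciliation from the Omeka S item set to the dataset. The sketch below is a simplified illustration, not DataScribe's implementation; `plan_sync`, `apply_sync`, and the dictionary layout are hypothetical names chosen for the example.

```python
def plan_sync(item_set_ids: set, dataset_item_ids: set) -> tuple:
    """Work out which items a sync would add to and remove from the dataset."""
    to_add = item_set_ids - dataset_item_ids      # in Omeka S, not yet in the dataset
    to_remove = dataset_item_ids - item_set_ids   # gone from the Omeka S item set
    return to_add, to_remove


def apply_sync(item_set_ids: set, records_by_item: dict) -> dict:
    """Return the dataset after a sync; the item set itself is never modified."""
    to_add, to_remove = plan_sync(item_set_ids, set(records_by_item))
    # Removed items take their records with them.
    synced = {
        item_id: records
        for item_id, records in records_by_item.items()
        if item_id not in to_remove
    }
    for item_id in to_add:
        synced[item_id] = []  # new items start with no records
    return synced


item_set = {1, 2, 4}
dataset = {1: ["record a"], 2: [], 3: ["record b", "record c"]}
synced = apply_sync(item_set, dataset)
print(synced)
```

In this example, item 3 is no longer in the item set, so it and its two records are dropped, while item 4 is added with no records. It is worth confirming that an item was removed from the item set intentionally before syncing.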

Validate dataset

This is the second option in the "More actions" dropdown.

Running a validation checks the inputs in all existing records against any settings you have in the form builder. For example, it will check whether required fields are filled, whether numbers match the required format, and whether data in a field matches any regular expressions you have implemented.
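As a rough illustration of these checks, the sketch below tests a record against per-field rules. It is not DataScribe's validation code: the rule format, field names, and `invalid_fields` function are all made up for the example.

```python
import re

# Hypothetical rules, standing in for settings made in the form builder.
FIELD_RULES = {
    "surname": {"required": True},
    "age": {"required": False, "pattern": r"\d+"},
    "date": {"required": True, "pattern": r"\d{4}-\d{2}-\d{2}"},
}


def invalid_fields(record: dict, rules: dict) -> list:
    """Return the names of fields whose values break a rule."""
    invalid = []
    for field, rule in rules.items():
        value = (record.get(field) or "").strip()
        if not value:
            # An empty value is only a problem when the field is required.
            if rule.get("required"):
                invalid.append(field)
            continue
        pattern = rule.get("pattern")
        if pattern and not re.fullmatch(pattern, value):
            invalid.append(field)
    return invalid


records = [
    {"surname": "Smith", "age": "42", "date": "1850-03-01"},
    {"surname": "", "age": "forty", "date": "1850-03-01"},
]
flagged = {index: invalid_fields(record, FIELD_RULES) for index, record in enumerate(records)}
print(flagged)
```

The second record is flagged twice: its required surname is empty and its age does not match the numeric pattern.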

Once you have run the validation, records with invalid values will be marked with a red alert, or exclamation point, on their browse rows. You will be able to filter by items and records “with values marked as invalid” in order to find them more quickly.

You should validate periodically: after fields are added to the form, after field validation rules change, and before exporting the dataset.

Export dataset

This is the third option in the "More actions" dropdown.

Use this option to generate a CSV file of the dataset. Note that the file will only include records from approved items, and any invalid values will be given as “null” in the CSV.
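When working with an exported file, you may want to treat these “null” entries as missing data rather than as text. The sketch below assumes that the CSV contains the literal text `null` in those cells; the sample data and the `read_export` function are hypothetical.

```python
import csv
import io

# Made-up sample standing in for an exported file.
SAMPLE_EXPORT = """surname,age
Smith,42
Jones,null
"""


def read_export(handle) -> list:
    """Read an export, turning the text "null" into Python's None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == "null" else value) for key, value in row.items()})
    return rows


rows = read_export(io.StringIO(SAMPLE_EXPORT))
print(rows)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample. A field whose transcribed text is literally "null" would be treated the same way by this sketch.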

Running an export will overwrite any previous export file.

Once the export is complete, you will be able to download the dataset from the link in the right-hand drawer.

Move dataset

This is the fourth option in the "More actions" dropdown.

This action allows you to move a dataset from its current project to a new project. The right-hand drawer will open with the header "Move dataset" and a dropdown of all the projects in your DataScribe installation. In the dropdown menu, select the project to which you want to move the dataset. Then click the "move dataset" button underneath the dropdown menu.

Download dataset

Once you have validated and exported the dataset, you can download the export file.

Exported files are stored in the Omeka install. They are made available in the Dataset metadata drawer on the right-hand side of the window. Once you have completed a successful export, the bottom option of the drawer will have the heading "Download dataset."

You will need to download the file in order to open it or save it locally; the link functions as a direct link to the file. You may need to right-click or control-click in order to be able to save the linked file to your computer - some browsers will simply open the link in-browser.

An exported CSV will have four columns in addition to the data created in the dataset form:

  • Omeka Item number: the item number for the Omeka item. This will allow you to connect exported DataScribe information with the items in your Omeka S installation.
  • DataScribe item number: item number in the DataScribe dataset. Does not correspond to Omeka S item number.
  • DataScribe Record number: record number for that record in DataScribe. Record numbers are assigned across DataScribe as a whole, so an item may not have sequential record numbers.
  • DataScribe Record Position: order of records within an item. If the insert before/after option has been used when editing or creating a record, the position number sequence may not match the record number sequence. For example, the records with positions 1, 2, 3, 4 could have record numbers 1, 3, 2, 4.
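To read records in the order they appear within each item, sort by record position rather than record number. The sketch below shows one way to do that. The column headers are assumptions based on the names listed above, so check them against the header row of your own export; the sample data is made up.

```python
import csv
import io
from collections import defaultdict

# Assumed header names; adjust to match your export's header row.
ITEM_COL = "DataScribe Item number"
RECORD_COL = "DataScribe Record number"
POSITION_COL = "DataScribe Record Position"

# Made-up sample: record 3 was inserted before record 2.
SAMPLE = """DataScribe Item number,DataScribe Record number,DataScribe Record Position,surname
7,1,1,Adams
7,2,3,Clark
7,3,2,Baker
7,4,4,Davis
"""


def records_in_position_order(handle) -> dict:
    """Group rows by DataScribe item and sort each group by record position."""
    by_item = defaultdict(list)
    for row in csv.DictReader(handle):
        by_item[row[ITEM_COL]].append(row)
    for rows in by_item.values():
        rows.sort(key=lambda row: int(row[POSITION_COL]))
    return dict(by_item)


ordered = records_in_position_order(io.StringIO(SAMPLE))
record_numbers = [row[RECORD_COL] for row in ordered["7"]]
print(record_numbers)
```

With this sample, the records for item 7 come out in position order with record numbers 1, 3, 2, 4, matching the example above.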

Export form

This action is available from the drawer in the Dataset view, near the download dataset option. You can export the form from the dataset to import when creating new datasets. Note: you cannot import a form to an existing dataset - only new datasets.

When you click on the link, your browser will prompt you to select where to save the form. The form will be called "form_export" by default - you can change this in the download and save dialog. Forms are exported in json format.
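Before importing a saved form into a new dataset, you may want to confirm that the file is well-formed JSON. The sketch below makes no assumptions about the internal structure of a DataScribe form export; it only parses the text and reports the top-level shape. The `summarize_form_export` function and the sample string are hypothetical.

```python
import json


def summarize_form_export(text: str) -> str:
    """Parse JSON text and describe its top-level shape."""
    data = json.loads(text)  # raises json.JSONDecodeError if the text is not valid JSON
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return f"array with {len(data)} entries"
    return f"single {type(data).__name__} value"


# Made-up stand-in; a real form export will look different.
sample = '{"fields": [], "name": "example"}'
summary = summarize_form_export(sample)
print(summary)
```

For a real export, read the saved file's contents as text and pass them to the function in place of `sample`.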

If you run an export and the file does not update or there is no file to download, check the Omeka S job for an error message.