Bruno Ferreira edited this page Jan 3, 2017 · 21 revisions

How it works

The Database Preservation Toolkit converts data from a source database format to a destination database format. Either format may be a database management system, a preservation format, or even plain text.

To retrieve from a source, the application uses an import module.

To write to a destination, the application uses an export module.

It is the pair composed of an import module and an export module that provides the conversion functionality. There are different modules which can be used and even configured to provide a conversion between database formats.

General usage

The command line application takes a series of arguments that can be provided in any order. These define the application's behavior.

General usage: java [properties] -jar dbptk-app-x.y.z.jar [plugin] <importModule> [import module options] <exportModule> [export module options]

How to specify the parameters

The general usage command is generic and cannot be used as is. Here is a list of the modifications that must be made:

  • java is the Java command; the full path to the executable may also be used
  • [properties] may be omitted or replaced with special configurations that influence the conversion (see "For the [properties] part" below)
  • -jar dbptk-app-x.y.z.jar tells java to execute the dbptk-app-x.y.z.jar file (the file name must be adjusted to match the one you have)
  • [plugin] is optional, and should be replaced with plugin configurations (if any)
  • <importModule> should be replaced with the import module specification, e.g. -i mysql or --import=postgresql
  • <exportModule> should be replaced with the export module specification, e.g. -e mysql or --export=postgresql
  • [import module options] should be replaced with parameters to specify the behavior of the import module, e.g. --import-username=username --import-password="p4ssw0rd" (to specify source database username and password)
  • [export module options] should be replaced with parameters to specify the behavior of the export module, e.g. --export-file=filename.siard --export-compress --export-pretty-xml (to specify the SIARD-2 export module behavior)
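
As an illustration of how the placeholders might be filled in, the following sketch assembles a MySQL to SIARD-2 conversion using only options listed on this page. The host, the database name (mydb), the credentials and the file names are placeholder assumptions, and the script only prints the command so that it can be reviewed before it is run.

```shell
# Sketch: MySQL to SIARD-2. All values below are placeholders.
JAR="dbptk-app-x.y.z.jar"   # adjust to the file name you actually have

set -- java -jar "$JAR" \
  --import=mysql \
  --import-hostname=localhost \
  --import-database=mydb \
  --import-username=username \
  --import-password="p4ssw0rd" \
  --export=siard-2 \
  --export-file=filename.siard \
  --export-compress \
  --export-pretty-xml

# Keep the assembled command in a variable and print it for review.
# To actually run the conversion, execute: "$@"
mysql_to_siard_cmd="$*"
printf '%s\n' "$mysql_to_siard_cmd"
```

Note that the printed line no longer shows the quotes around the password; they are consumed by the shell and are only needed when the value contains characters that the shell would otherwise interpret.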

Short/long parameter format

Parameters have two interchangeable formats: a long format for readability (e.g. --import-hostname=localhost) and a short format that is faster to type (e.g. -ih localhost). The two differ in the length of the parameter name and in the number of dashes (two for the long format, one for the short format). Either a space or an equal sign may be used to separate a parameter from its value.
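
To make the correspondence concrete, the sketch below writes the same MySQL import options once in the long format and once in the short format, using the parameter names from the tables further down. The database name (mydb) and the other values are placeholder assumptions.

```shell
# The same import options, first in the long format, then in the short format.
long_form="--import=mysql --import-hostname=localhost --import-database=mydb --import-username=username"
short_form="-i mysql -ih localhost -idb mydb -iu username"

printf 'long:  %s\n' "$long_form"
printf 'short: %s\n' "$short_form"
```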

Parameters

For the [plugin] part

-p, --plugin=plugin.jar    (optional) the file containing a plugin module. Several plugins can be specified, separated by a semi-colon (;)
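
One practical point when listing several plugins: most shells treat an unquoted semicolon as a command separator, so the list should be quoted. A minimal sketch, with hypothetical plugin file names:

```shell
# Hypothetical plugin file names. The quotes keep the shell from
# treating ";" as the end of the command.
plugin_option="--plugin=first-plugin.jar;second-plugin.jar"

# Show where the option goes in the general usage line.
printf 'java -jar dbptk-app-x.y.z.jar "%s" <importModule> [import module options] <exportModule> [export module options]\n' "$plugin_option"
```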

Available import modules, for the [import module options] part

Specify the import module with: -i <module>, --import=<module>

Import module: jdbc

-id, --import-driver=value    (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ic, --import-connection=value    (required) the connection URL to use in the connection
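
As a sketch of what the two jdbc options might look like in practice, the example below uses the driver class name and URL syntax of the PostgreSQL JDBC driver. The host, port, database name (mydb) and credentials are placeholder assumptions, and whether the driver must additionally be made available to Java is not covered here. The connection URL is quoted because it contains "&" and "?", which the shell would otherwise interpret.

```shell
# Sketch: generic JDBC import to a SIARD-2 file. All values are placeholders.
set -- java -jar dbptk-app-x.y.z.jar \
  --import=jdbc \
  --import-driver=org.postgresql.Driver \
  --import-connection="jdbc:postgresql://localhost:5432/mydb?user=username&password=p4ssw0rd" \
  --export=siard-2 \
  --export-file=filename.siard

# Print the assembled command for review; run it with: "$@"
jdbc_cmd="$*"
printf '%s\n' "$jdbc_cmd"
```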

Import module: microsoft-access

-if, --import-file=value    (required) path to the Microsoft Access file

Import module: microsoft-sql-server

-is, --import-server-name=value    (required) the name (host name) of the server
-idb, --import-database=value    (required) the name of the database we'll be accessing
-iu, --import-username=value    (required) the name of the user to use in the connection
-ip, --import-password=value    (required) the password of the user to use in the connection
-il, --import-use-integrated-login    (optional) use Windows login; by default the SQL Server login is used
-ide, --import-disable-encryption    (optional) use to turn off encryption in the connection
-iin, --import-instance-name=value    (optional) the name of the instance
-ipn, --import-port-number=value    (optional) the port number of the server instance, default is 1433

Import module: mysql

-ih, --import-hostname=value    (required) the hostname of the MySQL server
-idb, --import-database=value    (required) the name of the MySQL database
-iu, --import-username=value    (required) the name of the user to use in the connection
-ip, --import-password=value    (required) the password of the user to use in the connection
-ipn, --import-port-number=value    (optional) the port on which the MySQL server is listening

Import module: oracle

-is, --import-server-name=value    (required) the name (or IP address) of the Oracle server
-idb, --import-database=value    (required) the name of the database to use in the connection
-iu, --import-username=value    (required) the name of the user to use in the connection
-ip, --import-password=value    (required) the password of the user to use in the connection
-ipn, --import-port-number=value    (required) the port on which the Oracle server is listening
-ial, --import-accept-license    (optional) declare that you accept OTN License Agreement, which is necessary to use this module

Import module: postgresql

-ih, --import-hostname=value    (required) the name of the PostgreSQL server host (e.g. localhost)
-idb, --import-database=value    (required) the name of the database to connect to
-iu, --import-username=value    (required) the name of the user to use in the connection
-ip, --import-password=value    (required) the password of the user to use in the connection
-ide, --import-disable-encryption    (optional) use to turn off encryption in the connection
-ipn, --import-port-number=value    (optional) the port on which the PostgreSQL server is listening, default is 5432

Import module: siard-1

-if, --import-file=value    (required) Path to SIARD1 archive file

Import module: siard-2

-if, --import-file=value    (required) Path to SIARD2 archive file

Import module: siard-dk

-if, --import-folder=value    (required) Path to (the first) SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.1 Any additional parts of the archive (e.g. with suffixes .2 .3 etc.) referenced in the tableIndex.xml will also be processed.
-ias, --import-as-schema=value    (required) Name of the database schema to use when importing the SIARDDK archive. Suggested values: PostgreSQL:'public', MySQL:'<name of database>', MSSQL:'dbo'
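
For example, loading a SIARDDK archive into PostgreSQL might look like the sketch below, which follows the suggested schema value 'public' for PostgreSQL. The archive path (with a folder name chosen to match the expression above), the host, the database name (mydb) and the credentials are placeholder assumptions; the export options used are described in the export module section.

```shell
# Sketch: SIARDDK archive folder to a PostgreSQL database. All values are placeholders.
set -- java -jar dbptk-app-x.y.z.jar \
  --import=siard-dk \
  --import-folder=/path/to/AVID.SA.18005.1 \
  --import-as-schema=public \
  --export=postgresql \
  --export-hostname=localhost \
  --export-database=mydb \
  --export-username=username \
  --export-password="p4ssw0rd"

# Print the assembled command for review; run it with: "$@"
siarddk_cmd="$*"
printf '%s\n' "$siarddk_cmd"
```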

Available export modules, for the [export module options] part

Specify the export module with: -e <module>, --export=<module>

Export module: solr

-eh, --export-hostname=value    (optional) Solr Cloud server hostname or address
-ep, --export-port=value    (optional) Solr Cloud server port
-ezh, --export-zookeeper-hostname=value    (optional) Zookeeper server hostname or address
-ezp, --export-zookeeper-port=value    (optional) Zookeeper server port

Export module: jdbc

-ed, --export-driver=value    (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ec, --export-connection=value    (required) the connection URL to use in the connection

Export module: list-tables

-ef, --export-file=value    (required) Path to output file that can be read by SIARD2 export module

Export module: microsoft-sql-server

-es, --export-server-name=value    (required) the name (host name) of the server
-edb, --export-database=value    (required) the name of the database we'll be accessing
-eu, --export-username=value    (required) the name of the user to use in the connection
-ep, --export-password=value    (required) the password of the user to use in the connection
-el, --export-use-integrated-login    (optional) use Windows login; by default the SQL Server login is used
-ede, --export-disable-encryption    (optional) use to turn off encryption in the connection
-ein, --export-instance-name=value    (optional) the name of the instance
-epn, --export-port-number=value    (optional) the port number of the server instance, default is 1433

Export module: mysql

-eh, --export-hostname=value    (required) the hostname of the MySQL server
-edb, --export-database=value    (required) the name of the MySQL database
-eu, --export-username=value    (required) the name of the user to use in the connection
-ep, --export-password=value    (required) the password of the user to use in the connection
-epn, --export-port-number=value    (optional) the port on which the MySQL server is listening

Export module: oracle

-es, --export-server-name=value    (required) the name (or IP address) of the Oracle server
-edb, --export-database=value    (required) the name of the database to use in the connection
-eu, --export-username=value    (required) the name of the user to use in the connection
-ep, --export-password=value    (required) the password of the user to use in the connection
-epn, --export-port-number=value    (required) the port on which the Oracle server is listening
-eal, --export-accept-license    (optional) declare that you accept OTN License Agreement, which is necessary to use this module
-esc, --export-source-schema=value    (optional) the name of the source schema to export to the Oracle database. A schema with this name must exist in the Oracle database and it must be the default tablespace for the specified user. If omitted, the name of the first schema will be used

Export module: postgresql

-eh, --export-hostname=value    (required) the name of the PostgreSQL server host (e.g. localhost)
-edb, --export-database=value    (required) the name of the database to connect to
-eu, --export-username=value    (required) the name of the user to use in the connection
-ep, --export-password=value    (required) the password of the user to use in the connection
-ede, --export-disable-encryption    (optional) use to turn off encryption in the connection
-epn, --export-port-number=value    (optional) the port on which the PostgreSQL server is listening, default is 5432

Export module: siard-1

-ef, --export-file=value    (required) Path to SIARD1 archive file
-ec, --export-compress    (optional) use to compress the SIARD1 archive file with deflate method
-ep, --export-pretty-xml    (optional) write human-readable XML
-etf, --export-table-filter=value    (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-emd, --export-meta-description[=value]    (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value]    (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value]    (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value]    (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value]    (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value]    (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.

Export module: siard-2

-ef, --export-file=value    (required) Path to SIARD2 archive file
-ec, --export-compress    (optional) use to compress the SIARD2 archive file with deflate method
-ep, --export-pretty-xml    (optional) write human-readable XML
-etf, --export-table-filter=value    (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-eel, --export-external-lobs    (optional) Saves any LOBs outside the SIARD file.
-eelpf, --export-external-lobs-per-folder=value    (optional) The maximum number of files present in an external LOB folder. Default: 1000 files.
-eelfs, --export-external-lobs-folder-size=value    (optional) Divide LOBs across multiple external folders with (approximately) the specified maximum size (in Megabytes). Default: do not divide.
-emd, --export-meta-description[=value]    (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value]    (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value]    (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value]    (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value]    (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value]    (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.
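
The table filter option suggests a two-step workflow: first produce the table list with the list-tables export module, then pass that file to the siard-2 export module. The sketch below prints both commands. The MySQL connection values and the file names (tables.txt, filename.siard) are placeholder assumptions, and the format of the list file is not described here, so any manual editing between the two steps should be done with care.

```shell
# Sketch: export only selected tables to SIARD-2. All values are placeholders.
JAR="dbptk-app-x.y.z.jar"   # adjust to the file name you actually have
IMPORT_ARGS="--import=mysql --import-hostname=localhost --import-database=mydb --import-username=username --import-password=p4ssw0rd"

# Step 1: write the list of tables to tables.txt.
list_cmd="java -jar $JAR $IMPORT_ARGS --export=list-tables --export-file=tables.txt"

# Step 2 (after reviewing tables.txt): export only the tables listed in that file.
export_cmd="java -jar $JAR $IMPORT_ARGS --export=siard-2 --export-file=filename.siard --export-table-filter=tables.txt"

printf '%s\n' "$list_cmd" "$export_cmd"
```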

Export module: siard-dk

-ef, --export-folder=value    (required) Path to SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.[1-9][0-9]
-etf, --export-table-filter=value    (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-eai, --export-archiveIndex=value    (optional) Path to archiveIndex.xml input file
-eci, --export-contextDocumentationIndex=value    (optional) Path to contextDocumentationIndex.xml input file
-ecf, --export-contextDocumentationFolder=value    (optional) Path to contextDocumentation folder which should contain the context documentation for the archive

For the [properties] part

Several properties are available to modify specific conversion behaviour. You can consider them as knobs that can be turned to fine-tune the conversion.

The properties have a format like part1.part2.part3, with multiple lower-case parts separated by dots. All properties have a corresponding environment variable, like PART1_PART2_PART3 (corresponding to the previous example), with the same parts in upper-case and separated by underscores.

Properties are added to the command line like this:

... -Dpart1.part2.part3=value -Danother.property=othervalue ...

Note: on Windows, each property and value pair must be enclosed in double quotes, for example ... "-Dpart1.part2.part3=value" ...

If both the environment variable and the property are set, the property is used.

For simplicity, only the properties will be described, and the environment variables can be derived from those by using upper-cased letters and replacing the dots with underscores (as described above).
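
The naming rule can be expressed as a small helper. This is only a convenience sketch of the rule described above; the function name property_to_env is hypothetical and not part of the toolkit.

```shell
# Derive the environment variable name from a property name:
# upper-case the letters, then replace the dots with underscores.
property_to_env() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_'
}

property_to_env part1.part2.part3
printf '\n'
```

For the generic example used above, this prints PART1_PART2_PART3.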

Available properties

Fetch size

Controls the number of rows that are retrieved from the database and stored in memory at once.

  • dbptk.jdbc.fetchsize.default (Integer) - the first fetch size to try (default: 0, which means "use the default value suggested/calculated by the driver")
  • dbptk.jdbc.fetchsize.small (Integer) - the second fetch size to try, in case the first one caused an issue (default: 10)
  • dbptk.jdbc.fetchsize.minimum (Integer) - the last fetch size to try, in case the second one also caused an issue. This is the last try before giving up on fetching information from this table (default: 1)
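
The fallback order described in this list can be pictured with the following sketch. It is an illustration of the described behaviour, not the toolkit's implementation: read_table_with_fetch_size is a hypothetical stand-in that, for the sake of the example, only succeeds with a fetch size of 1.

```shell
# Hypothetical stand-in for "read a table using this fetch size".
# For illustration it pretends that only a fetch size of 1 works.
read_table_with_fetch_size() {
  [ "$1" -eq 1 ]
}

used_fetch_size=""

# Try the default, small and minimum fetch sizes in that order
# (0, 10 and 1 are the documented default values).
for size in 0 10 1; do
  if read_table_with_fetch_size "$size"; then
    used_fetch_size="$size"
    break
  fi
  printf 'fetch size %s failed\n' "$size" >&2
done

if [ -n "$used_fetch_size" ]; then
  printf 'table read with fetch size %s\n' "$used_fetch_size"
else
  printf 'giving up on this table\n' >&2
fi
```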

Setting dbptk.jdbc.fetchsize.default to 1 fetches one row at a time, using minimal memory during the conversion but taking longer to convert the database.
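
A sketch of how this setting might be passed on the command line, either as a Java property or as the corresponding environment variable. The module placeholders are left as in the general usage line, and the value 100 in the last command is an arbitrary assumption used only to show which setting wins.

```shell
JAR="dbptk-app-x.y.z.jar"   # adjust to the file name you actually have
TAIL="<importModule> [import module options] <exportModule> [export module options]"

# Option A: Java property, placed between "java" and "-jar".
with_property="java -Ddbptk.jdbc.fetchsize.default=1 -jar $JAR $TAIL"

# Option B: the corresponding environment variable, set for this one command.
with_env="DBPTK_JDBC_FETCHSIZE_DEFAULT=1 java -jar $JAR $TAIL"

# Both set: as stated earlier, the property (1) is used, not the variable (100).
with_both="DBPTK_JDBC_FETCHSIZE_DEFAULT=100 java -Ddbptk.jdbc.fetchsize.default=1 -jar $JAR $TAIL"

printf '%s\n' "$with_property" "$with_env" "$with_both"
```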

For more details, see https://github.com/keeps/db-preservation-toolkit/pull/292