This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

⛔ [DEPRECATED] Dockerfile for running example Senzing Jupyter notebooks.

senzing-garage/docker-jupyter

docker-jupyter

⛔ Deprecated

Although Senzing can be used with Jupyter, Jupyter notebooks are not part of the Senzing product.

If you are beginning your journey with Senzing, please start with Senzing Quick Start guides.

No Maintenance Intended

Overview

The docker-jupyter repository holds example Senzing Jupyter notebooks in the notebooks subdirectory.

The senzing/jupyter docker image is a Senzing-ready image hosting the example Senzing notebooks.

The image is built on the Docker images published by the Jupyter organization on DockerHub. The default base image is jupyter/minimal-notebook. More information is available at Jupyter Docker Stacks.

In addition, the Jupyter notebooks can be viewed on nbviewer.jupyter.org. For example, visit Senzing examples on NbViewer.

Related artifacts

  1. DockerHub

Contents

  1. Expectations
    1. Space
    2. Time
    3. Background knowledge
  2. Demonstrate using Docker
    1. Initialize Senzing
    2. Configuration
    3. Volumes
    4. Docker network
    5. Database support
    6. Run docker container
    7. Run Jupyter
    8. Guides and References
  3. Develop
    1. Prerequisite software
    2. Clone repository
    3. Develop notebooks on host system
    4. Build docker image for development
  4. Examples
  5. Errors
  6. References

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps you'll need to make some choices. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Expectations

Space

This repository and demonstration require 9 GB free disk space.

Time

Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.

Background knowledge

This repository assumes a working knowledge of:

  1. Jupyter
  2. Docker

Demonstrate using Docker

Initialize Senzing

  1. If Senzing has not been initialized, visit "How to initialize Senzing with Docker".

Configuration

Configuration values are specified by environment variable or command-line parameter.

Non-Senzing configuration can be seen at Jupyter Docker Stacks.

Volumes

  1. ✏️ Specify the directory containing the Senzing installation. Use the same SENZING_VOLUME value used when performing "How to initialize Senzing with Docker". Example:

    export SENZING_VOLUME=/opt/my-senzing
    1. Here's a simple test to see if SENZING_VOLUME is correct. The following commands should return file contents. Example:

      cat ${SENZING_VOLUME}/g2/g2BuildVersion.json
      cat ${SENZING_VOLUME}/data/3.0.0/libpostal/data_version
    2. ⚠️ macOS - File sharing must be enabled for SENZING_VOLUME.

    3. ⚠️ Windows - File sharing must be enabled for SENZING_VOLUME.

  2. Identify the data_version, etc, g2, and var directories. Example:

    export SENZING_DATA_VERSION_DIR=${SENZING_VOLUME}/data/3.0.0
    export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
    export SENZING_G2_DIR=${SENZING_VOLUME}/g2
    export SENZING_VAR_DIR=${SENZING_VOLUME}/var
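Before starting the container, it may help to confirm that each of these directories actually exists. The following is a minimal sketch, not part of the Senzing tooling: the function name `check_senzing_dirs` is hypothetical, and it assumes only the directory layout and the `3.0.0` data version used in the examples above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Senzing): report which of the expected
# subdirectories exist under a Senzing volume.
# Usage: check_senzing_dirs <senzing_volume> [data_version]
check_senzing_dirs() {
    local volume="$1"
    local data_version="${2:-3.0.0}"
    local missing=0
    local dir

    for dir in "data/${data_version}" etc g2 var; do
        if [ -d "${volume}/${dir}" ]; then
            echo "ok:      ${volume}/${dir}"
        else
            echo "missing: ${volume}/${dir}"
            missing=1
        fi
    done

    # Return 0 only when every directory was found.
    return "${missing}"
}

# Example call; prints a hint instead of failing when directories are absent.
check_senzing_dirs "${SENZING_VOLUME:-/opt/my-senzing}" \
    || echo "Check SENZING_VOLUME before running the container."
```

A non-zero return value means at least one directory is missing, which usually points to a wrong SENZING_VOLUME value or an incomplete initialization.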

Docker network

🤔 Optional: Use if docker container is part of a docker network.

  1. List docker networks. Example:

    sudo docker network ls
  2. ✏️ Specify docker network. Choose value from NAME column of docker network ls. Example:

    export SENZING_NETWORK=*nameofthe_network*
  3. Construct parameter for docker run. Example:

    export SENZING_NETWORK_PARAMETER="--net ${SENZING_NETWORK}"
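Because this step is optional, the parameter can be built so that it stays empty when no network is chosen. The sketch below is one way to do that; the function name `network_parameter` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the "--net" option only when a network name is
# given, so the variable is empty (and harmless) when no network is used.
network_parameter() {
    local network="$1"
    if [ -n "${network}" ]; then
        printf -- '--net %s' "${network}"
    fi
}

export SENZING_NETWORK_PARAMETER="$(network_parameter "${SENZING_NETWORK:-}")"
echo "SENZING_NETWORK_PARAMETER='${SENZING_NETWORK_PARAMETER}'"
```

With SENZING_NETWORK unset, the variable is an empty string and contributes nothing to the docker run command when it is expanded unquoted.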

Database support

🤔 Optional: Some databases need additional support. For other databases, these steps may be skipped.

  1. Db2: See Support Db2 instructions to set SENZING_OPT_IBM_DIR_PARAMETER.
  2. MS SQL: See Support MS SQL instructions to set SENZING_OPT_MICROSOFT_DIR_PARAMETER.

Run docker container

  1. ✏️ Set environment variables. Example:

    export JUPYTER_NOTEBOOKS_SHARED_DIR=$(pwd)
    export WEBAPP_PORT=8888
  2. 🤔 Optional: Run Jupyter without token authentication. Example:

    export JUPYTER_PARAMETERS="start.sh jupyter notebook --NotebookApp.token=''"
  3. Run docker container. Example:

    sudo docker run \
      --interactive \
      --name senzing-jupyter \
      --publish ${WEBAPP_PORT}:8888 \
      --rm \
      --tty \
      --volume ${JUPYTER_NOTEBOOKS_SHARED_DIR}:/notebooks/shared \
      --volume ${SENZING_DATA_VERSION_DIR}:/opt/senzing/data \
      --volume ${SENZING_ETC_DIR}:/etc/opt/senzing \
      --volume ${SENZING_G2_DIR}:/opt/senzing/g2 \
      --volume ${SENZING_VAR_DIR}:/var/opt/senzing \
      ${SENZING_NETWORK_PARAMETER} \
      ${SENZING_OPT_IBM_DIR_PARAMETER} \
      ${SENZING_OPT_MICROSOFT_DIR_PARAMETER} \
      senzing/jupyter ${JUPYTER_PARAMETERS}
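To see how the optional variables expand before starting anything, the command can be assembled and printed rather than run. This is a dry-run sketch under the assumption that the variables are set as in the steps above; `preview_docker_run` is a hypothetical name, and the function only prints text.

```shell
#!/usr/bin/env bash

# Hypothetical dry-run helper: print the docker command instead of running it.
# The optional parameters are deliberately left unquoted so that empty values
# disappear and a value such as "--net my-net" splits into two words.
preview_docker_run() {
    local args=(
        docker run
        --interactive
        --name senzing-jupyter
        --publish "${WEBAPP_PORT:-8888}:8888"
        --rm
        --tty
        --volume "${JUPYTER_NOTEBOOKS_SHARED_DIR:-$(pwd)}:/notebooks/shared"
        --volume "${SENZING_DATA_VERSION_DIR:-}:/opt/senzing/data"
        --volume "${SENZING_ETC_DIR:-}:/etc/opt/senzing"
        --volume "${SENZING_G2_DIR:-}:/opt/senzing/g2"
        --volume "${SENZING_VAR_DIR:-}:/var/opt/senzing"
        ${SENZING_NETWORK_PARAMETER:-}
        ${SENZING_OPT_IBM_DIR_PARAMETER:-}
        ${SENZING_OPT_MICROSOFT_DIR_PARAMETER:-}
        senzing/jupyter
        ${JUPYTER_PARAMETERS:-}
    )
    # Join all words with single spaces and print them on one line.
    printf '%s\n' "${args[*]}"
}

preview_docker_run
```

An empty path before a colon in the printed `--volume` options indicates that the corresponding directory variable has not been exported yet.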

Run Jupyter

  1. If token authentication is disabled, access your Jupyter notebooks at: http://127.0.0.1:8888/

  2. If token authentication is enabled, locate the URL in the Docker log. Example:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://(a152e5586fdc or 127.0.0.1):8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    Adjust the URL. Example:

    http://127.0.0.1:8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    Paste the URL into a web browser.
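The URL adjustment can also be done with a small filter. The sketch below assumes the log line has the "(container-id or 127.0.0.1)" form shown in the example; `adjust_jupyter_url` is a hypothetical name and `EXAMPLE_TOKEN` is a placeholder, not a real token.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a URL on standard input and replace the
# "(container-id or 127.0.0.1)" host that Jupyter prints with 127.0.0.1.
adjust_jupyter_url() {
    sed -E 's#\([^ )]+ or 127\.0\.0\.1\)#127.0.0.1#'
}

echo 'http://(a152e5586fdc or 127.0.0.1):8888/?token=EXAMPLE_TOKEN' \
    | adjust_jupyter_url
```

The log shows the port inside the container, so if WEBAPP_PORT was set to something other than 8888, the port in the URL needs to be changed to that value as well.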

Guides and References

The Jupyter notebooks in notebooks/senzing-examples are of two types:

  1. References - Information on specific method invocations and their parameters. Examples:
    1. G2Config reference
    2. G2Engine reference
  2. Guides - Illustrations of how to use methods to accomplish tasks. Guides often point to the appropriate "Reference" entries for specific method invocations. Examples:
    1. G2Config add data source
    2. G2Engine add record

Develop

Prerequisite software

The following software programs need to be installed:

  1. git
  2. make
  3. docker
  4. jupyter notebooks

Clone repository

For more information on environment variables, see Environment Variables.

  1. Set these environment variable values:

    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=docker-jupyter
    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
  2. Follow steps in clone-repository to install the Git repository.

Develop notebooks on host system

  1. Set environment variables for the Senzing directories. See Volumes. Example:

    export SENZING_VOLUME=/opt/my-senzing
    
    export SENZING_DATA_DIR=${SENZING_VOLUME}/data
    export SENZING_DATA_VERSION_DIR=${SENZING_DATA_DIR}/3.0.0
    export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
    export SENZING_G2_DIR=${SENZING_VOLUME}/g2
    export SENZING_VAR_DIR=${SENZING_VOLUME}/var
  2. Set environment variables. Example:

    export PYTHONPATH=${SENZING_G2_DIR}/python
    export LD_LIBRARY_PATH=${SENZING_G2_DIR}/lib:${SENZING_G2_DIR}/lib/debian
    export SENZING_SQL_CONNECTION="sqlite3://na:na@${SENZING_VAR_DIR}/sqlite/G2C.db"
  3. Start Jupyter notebook. Example:

    cd ${GIT_REPOSITORY_DIR}
    
    jupyter notebook
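If the notebooks fail to connect, one thing worth checking is whether the SQLite file named in SENZING_SQL_CONNECTION exists. The sketch below assumes the `sqlite3://na:na@/path` form used in the example above; the function name `sqlite_path_from_connection` is hypothetical and not part of Senzing.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract the file path from a connection string of the
# form sqlite3://user:password@/path/to/G2C.db. Returns 1 for other forms.
sqlite_path_from_connection() {
    local connection="$1"
    case "${connection}" in
        sqlite3://*@*)
            # Strip the shortest prefix ending at the first "@".
            printf '%s\n' "${connection#sqlite3://*@}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example call, falling back to the path used in the example above.
default_connection="sqlite3://na:na@/opt/my-senzing/var/sqlite/G2C.db"
if db_path="$(sqlite_path_from_connection "${SENZING_SQL_CONNECTION:-${default_connection}}")"; then
    if [ -f "${db_path}" ]; then
        echo "found:     ${db_path}"
    else
        echo "not found: ${db_path}"
    fi
else
    echo "SENZING_SQL_CONNECTION is not a sqlite3 connection string."
fi
```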

Build docker image for development

  1. Option #1: Using docker command and GitHub.

    sudo docker build --tag senzing/jupyter https://github.com/senzing/docker-jupyter.git#main
  2. Option #2: Using docker command and local repository.

    cd ${GIT_REPOSITORY_DIR}
    sudo docker build --tag senzing/jupyter .
  3. Option #3: Using make command.

    cd ${GIT_REPOSITORY_DIR}
    sudo make docker-build

    Note: sudo make docker-build-development-cache can be used to create cached docker layers.

Examples

Errors

  1. See docs/errors.md.

References

  1. A gallery of interesting Jupyter Notebooks
  2. Senzing notebooks
  3. IJava setup