pipe-segment

This repository contains the segment pipeline, a Dataflow pipeline which divides vessel tracks into contiguous "segments", separating out noise and signals that may come from two or more vessels broadcasting with the same MMSI at the same time.

If you are going to be contributing, jump directly to the How to contribute section. If you just want to run the pipeline, use the following instructions.

How to run

First, make sure you have git installed and an SSH key configured for GitHub. Then, clone the repository:

git clone git@github.com:GlobalFishingWatch/pipe-segment.git
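
Then change into the repository directory (git creates it with the repository's name):

cd pipe-segment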

Dependencies

Install Docker Engine following the official Docker instructions (avoid snap packages) and the docker compose plugin. No other dependencies are required.
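
You can check that both are available with the standard version commands:

docker --version
docker compose version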

Google Cloud setup

The pipeline reads its input from (and writes its output to) BigQuery, so you first need to authenticate with your Google Cloud account inside the Docker containers.

  1. Create an external volume to share GCP authentication across containers:
docker volume create --name=gcp
  2. Run the authentication service:
docker compose run gcloud auth application-default login
  3. Configure the project:
docker compose run gcloud config set project world-fishing-827
docker compose run gcloud auth application-default set-quota-project world-fishing-827
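
To confirm that the credentials are in place, you can list the authenticated accounts from inside the same container (gcloud auth list is a standard gcloud command; the gcloud service name is the one used in the steps above):

docker compose run gcloud auth list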

Building docker image

To build the docker image, run:

docker compose build

CLI

The pipeline includes a CLI that can be used to start both local test runs and remote full runs.

With docker compose run dev --help you can see the available processes:

$ docker compose run dev --help
Available Commands
  segment                     run the segmenter in dataflow
  segment_identity_daily      generate daily summary of identity messages
                              per segment
  segment_vessel_daily        generate daily vessel_ids per segment
  segment_info                create a segment_info table with one row
                              per segment
  vessel_info                 create a vessel_info table with one row
                              per vessel_id
  segment_vessel              Create a many-to-many table mapping between
                              segment_id, vessel_id and ssvid

If you want to know the parameters of one of the processes, run for example:

docker compose run dev segment --help
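
As an illustrative sketch only (not the documented invocation for this repository), a remote run of a Beam-based command is typically launched by combining the command's own parameters, as reported by --help, with the standard Apache Beam pipeline options; the option names below are the generic Beam/Dataflow ones and the bucket path is a placeholder:

# Generic Apache Beam/Dataflow options (assumed, not pipe-segment specific);
# add the pipeline-specific parameters reported by segment --help.
docker compose run dev segment \
  --runner=DataflowRunner \
  --project=world-fishing-827 \
  --temp_location=gs://your-bucket/dataflow-temp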

How to contribute

The Makefile should ease the development process.

Git Workflow

Please refer to our git workflow documentation to know how to manage branches in this repository.

Setup the environment

Create a virtual environment:

make venv
. .venv/bin/activate

Authenticate to Google Cloud and set up the project (not necessary if you have already done it on this machine):

make gcp

Install dependencies:

make install

Run unit tests:

make test
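
The test target is assumed to wrap pytest (a common setup for this kind of Makefile); under that assumption you can also run a subset of the unit tests directly from the activated virtual environment, selecting tests by name with pytest's -k option:

# Assumes the Makefile's test target wraps pytest; the -k expression is illustrative.
python -m pytest -k segment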

Alternatively, you can run the unit tests inside the docker container:

make build
make testdocker

To run all tests in Docker, including the ones that hit some GCP APIs (currently failing), run:

make testdocker-all

Updating dependencies

The requirements.txt contains all transitive dependencies pinned to specific versions. This file is compiled automatically with pip-tools, based on requirements/prod.in.

Use requirements/prod.in to specify high-level dependencies with restrictions. Do not modify requirements.txt manually.
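
For illustration only (the real contents of requirements/prod.in will differ), a high-level dependency with a restriction looks like the line below, and make requirements is assumed to wrap the standard pip-compile invocation from pip-tools:

# requirements/prod.in (illustrative entry, not the actual file)
apache-beam[gcp]<3

# what the make target is assumed to run
pip-compile requirements/prod.in --output-file requirements.txt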

To re-compile dependencies, just run

make requirements

If you want to upgrade all dependencies to the latest available versions (compatible with the declared restrictions), just run:

make upgrade-requirements

Schema

To get the schema of an existing BigQuery table, use something like this:

bq show --format=prettyjson world-fishing-827:pipeline_measures_p_p516_daily.20170923 | jq '.schema'
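
If you only need the field definitions without the rest of the table metadata, bq show can also emit the schema directly via its documented --schema flag, so jq is not required:

bq show --schema --format=prettyjson world-fishing-827:pipeline_measures_p_p516_daily.20170923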