Tutorial: Training LEAP from scratch

LEAP operates on images captured from any source. Here we'll walk through how to use LEAP when you're starting from just a single video of an animal you want to track.

Before starting, make sure you've installed all the requirements and downloaded this repository.

Note: LEAP and the documentation are a work in progress! We'll be updating this as development progresses.

Overview

The main steps of the workflow are as follows:

  1. Create a dataset for labeling via cluster sampling. This can be skipped if you want to work directly on your video frames or if your data are not egocentrically aligned (see The data section).
  2. Create a skeleton defining the points we want to track. This step is mainly necessary for visualization.
  3. Label a few frames and fast train. This is repeated a few times until the predictions from LEAP are accurate enough.
  4. Apply the trained network to new/unlabeled data.

The data

In this example, we'll be starting from an aligned movie of our beautiful housecat, Tyson (named after Neil deGrasse, not Mike), captured as a depth map using a Microsoft Kinect (v2):

Color corresponds to distance from the camera. Not visible in infrared: a dastardly laser point that seems to constantly elude capture by our protagonist.

Download: box.h5 (105 MB)

These data are stored in an HDF5 file, which is the primary data format that LEAP uses both for inputting data and saving results. You can think of these as MAT files if you're familiar with MATLAB, or NPY files if you're coming from Python/numpy. We use this format because it's supported in virtually every programming language and environment, in addition to being lossless (in contrast to MP4s, AVIs, etc.).

We'll be providing code and instructions for converting your videos to this format in an upcoming guide on preprocessing. Some handy tools in the meantime: VideoReader, h5create, h5write.
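
Until that guide is available, a minimal sketch of the conversion might look like the following (file names here are just placeholders, and the snippet assumes a short RGB video that fits in memory):

% Read a video into a height x width x 1 x frames array
vr = VideoReader('my_video.mp4');                    % placeholder file name
frames = {};
while hasFrame(vr)
    frame = readFrame(vr);                           % H x W x 3 uint8 (RGB video)
    frames{end+1} = single(rgb2gray(frame)) / 255;   % grayscale, scaled to [0, 1]
end
box = cat(4, frames{:});                             % H x W x 1 x N

% Write it out as a "box" dataset with the same layout as box.h5
h5create('my_box.h5', '/box', size(box), 'Datatype', 'single', ...
    'ChunkSize', [size(box,1) size(box,2) 1 1], 'Deflate', 1);
h5write('my_box.h5', '/box', box);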

Using the handy built-in MATLAB functions for working with HDF5 files, we can inspect the datasets contained in the file:

>> h5disp('box.h5')
HDF5 box.h5 
Group '/' 
    Attributes:
        'exptPath':  'D:\data\CatNect\20180217_225142'
        'dataPath':  'D:\data\CatNect\20180217_225142\20180217_225142.mat'
    Dataset 'box' 
        Size:  192x192x1x2085
        MaxSize:  192x192x1x2085
        Datatype:   H5T_IEEE_F32LE (single)
        ChunkSize:  192x192x1x2
        Filters:  deflate(1)
        FillValue:  0.000000
        Attributes:
            'dtype':  'single'
...

What is important to note here is the shape of the input dataset, which is preserved when we read it into MATLAB:

>> box = h5read('box.h5','/box');
>> size(box)
ans =
         192         192           1        2085

The dimensions correspond to the height (192), width (192), channels (1), and frames (2085). Make sure your data are stored in this format! The image dimensions must be divisible by 4 or LEAP will not work! (This is because the network performs down- and upsampling steps.)

The datatype here is single/float32, but note that uint8-formatted images work perfectly fine with LEAP too! Multi-channel (RGB) images are also supported but have not been thoroughly tested (try grayscale first to see if it works with your data -- it's faster!).
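
If your array isn't already in this shape, or the height/width isn't a multiple of 4, a rough MATLAB fix might look like this (variable names are just examples; padarray requires the Image Processing Toolbox):

% If a single-channel stack came in as height x width x frames,
% insert the singleton channel axis:
if ndims(box) == 3
    box = reshape(box, size(box,1), size(box,2), 1, []);
end

% Zero-pad height and width up to the nearest multiple of 4:
[h, w, ~, ~] = size(box);
box = padarray(box, [mod(-h, 4) mod(-w, 4) 0 0], 0, 'post');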

You can inspect the first frame in the dataset in MATLAB by displaying it with a scaled colormap, since the pixel values are normalized to the range [0, 1]:

>> imagesc(box(:,:,:,1)), axis image, colorbar

Another important note about data shape: MATLAB and Python order array dimensions differently (column-major vs. row-major), which means that the shape of HDF5 datasets appears reversed in Python compared to MATLAB. We can see this for our current dataset by opening it in Python:

λ ipython
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import h5py

In [2]: f = h5py.File("box.h5")

In [3]: f["box"]
Out[3]: <HDF5 dataset "box": shape (2085, 1, 192, 192), type "<f4">

Creating a labeling dataset

Once we have the data in a "box" dataset, we're ready to start labeling! Before we leap straight into it, however, it's worth considering whether we can do some sampling to ensure that the images we label are as diverse as possible.

In a video, consecutive frames tend to be quite similar, so labeling them won't be as instructive to the network as labeling two completely different frames would be.

Although we could try to skip around the video and find examples that appear different, we could also use a simple statistical technique for accomplishing this step: cluster sampling. In cluster sampling, the basic idea is to group the data by its intrinsic statistics, and then draw an equal number of samples from each group. This way, images that look unique but that are less common have a fair chance of appearing in the labeling set, allowing us to better prioritize our labeling efforts.

We provide a GUI for doing this in LEAP, accessible via the MATLAB command cluster_sample. After typing it in the console, you'll see a window that looks like this:

Select data

We're going to start by clicking Add file... and selecting the box.h5 file containing our video. If you have multiple videos you can add them all here individually or by adding a whole folder. After a few moments, the GUI will update to display a few statistics about how many frames are in the dataset.

Note that all videos must share the same image shape!

Load the data

Having added our video, we can now load some sample frames from it. Set the Samples per video field (top-right panel) so that the total across all videos is less than ~10,000 to ensure we don't run out of memory. Click Load samples and wait until a confirmation message is displayed below.

The GUI will try to load as many images as possible with an even stride so we get a uniform sampling across time, reducing the problem mentioned above regarding self-similarity in temporally adjacent images.
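
For reference, this even-stride sampling is roughly equivalent to picking evenly spaced frame indices yourself (the exact behavior of the GUI may differ):

numFrames = size(box, 4);
numSamples = min(2000, numFrames);                 % keep the total under ~10,000
idx = round(linspace(1, numFrames, numSamples));   % evenly spaced frame indices
samples = box(:, :, :, idx);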

PCA

Images are quite large: even at 192 x 192, that's 36,864 numbers to describe a single frame! Since most of the pixels don't change much, or are correlated when they do, we can use Principal component analysis as a quick and easy way to reduce the dimensionality of the images. PCA can be a little tricky to wrap our heads around, but what we're basically doing is finding a set of common "image weights" (eigenmodes) that best resemble most of the data.

Click Compute PCA and wait until a message is displayed to indicate it's finished processing. Warning: This may take several minutes with large amounts of data!

After it's done, we can inspect some properties about our data to evaluate how many dimensions faithfully describe our images. Click Plot variance explained to first inspect how the data variance is spread out across the Principal Components:

This plot describes how many dimensions are needed to explain increasing amounts of total variance in the dataset. The goal is to make sure that the blue line captures a reasonably large amount of variance in the data, while keeping the dimensionality low.

It's hard to see the individual components, so let's zoom into the first 100:

Now we see that nearly 90% of the variance is captured in the first 100 dimensions, with most of it being within the first 10. So let's stick with 100 and see how we do.
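
For reference, the quantities behind these plots can be reproduced outside the GUI with MATLAB's built-in pca (a rough sketch, assuming the loaded sample images are in a samples array as in the sketch above; the GUI's exact implementation may differ):

% Flatten each image into a row vector: (samples) x (pixels)
[h, w, c, n] = size(samples);
X = reshape(samples, h*w*c, n)';    % can be memory-hungry for many samples

% coeff: eigenmodes, score: projections, explained: % variance per PC
% (pca requires the Statistics and Machine Learning Toolbox)
[coeff, score, ~, ~, explained] = pca(X, 'NumComponents', 100);

plot(cumsum(explained)), xlabel('Number of PCs'), ylabel('Variance explained (%)')

% Visualize the first eigenmode as an image (single-channel data):
imagesc(reshape(coeff(:,1), h, w)), axis image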

Next, we can inspect "what" each PC is capturing by clicking Plot eigenmodes:

Each box here is a visualization of the weights that constitute each PC. Similar colors mean those pixels are correlated. Zooming in on the first row, we can see this a little more clearly:

Alright! Make sure the Use these projections for clustering box is checked before we go to the next step, otherwise the app will use the raw images for clustering.

Clustering

Now that we have preprocessed our images into 100-dimensional vectors, we're ready to group them into clusters of similar appearance.
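
Under the hood, this grouping is essentially k-means applied to the PCA projections. A minimal sketch, reusing the hypothetical score matrix from the PCA sketch above:

numClusters = 5;
clusterIdx = kmeans(score, numClusters, 'Replicates', 5);   % Statistics Toolbox
accumarray(clusterIdx, 1)'                                  % samples per cluster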

The Number of clusters to use is an arbitrary choice and depends on the data. A good place to start is 5; then see how well the data are separated. Start by clicking Compute clusters and waiting for the message to appear:

After it's done, the next thing we want to do is to Plot cluster centroids to get a sense for the average image in each group:

As we can see, the clusters do seem to capture a good variety of poses! If you think there is greater variability in your images that you want to separate out, try increasing the number of clusters and re-evaluating.

Now we're ready to save our sampled dataset! Select the number of Samples per cluster, which should be at most the size of your smallest cluster. If the smallest cluster has too few images (<10), try clustering with fewer groups.

We recommend leaving the Sample order on the default Cluster order option, which basically saves the images in alternating group order, such that each consecutive sample comes from a different cluster. The Shuffle order performs a random shuffling after the images are sampled for truly unbiased data ordering (but may take more labeled samples to generalize to all images).
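
For intuition, Cluster order sampling works roughly like this (a sketch reusing the clusterIdx, numClusters, and samples variables from the sketches above):

samplesPerCluster = 20;        % at most the size of your smallest cluster
sampledIdx = zeros(numClusters, samplesPerCluster);
for k = 1:numClusters
    inCluster = find(clusterIdx == k);
    sampledIdx(k,:) = inCluster(randperm(numel(inCluster), samplesPerCluster))';
end
sampledIdx = sampledIdx(:)';   % interleave: cluster 1, 2, ..., 5, 1, 2, ...
labelingSet = samples(:, :, :, sampledIdx);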

After setting these parameters, click Preview or Save... to generate a sampled dataset and save it to a new file which we'll use for labeling.

Creating a skeleton

In addition to the images we use for labeling, we also need to specify a "skeleton" file which describes the number of points that we'll be labeling for tracking. This file also allows us to specify connections between points for visualization and descriptive names that will be useful later to disambiguate which point is which during labeling and analysis.

To get started, we first launch the GUI by typing create_skeleton in the MATLAB command window:

We need to provide an example image before we can start defining the skeleton. Load up the first frame from the labeling dataset we generated previously:

>> I = h5readframes('catnect_labeling.h5','/box',1);
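
(If h5readframes is not on your MATLAB path, the built-in h5read can do the same thing with a hyperslab read:)

>> I = h5read('catnect_labeling.h5','/box',[1 1 1 1],[Inf Inf Inf 1]);   % first frame only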

On the left side of the GUI, click Refresh and you should see the variable containing the image we just loaded appear. Click Import from workspace to load it into the GUI:

The image will show up with the current skeleton overlaid:

The default skeleton is perhaps better suited to a hexapod, so we can just delete all of the body parts and fill out the table with the parts we'll be tracking on this data. Click Select all and then Delete selected to clear the table. Click Add new and edit the name and parent columns until all the parts and their connections are defined:

Reminder: The connections defined by the parent column are optional. They only serve to display the lines for ease of visualization.

The last thing we want to define in our skeleton is the default position of each body part. These are the locations at which the skeleton points will show up on any unlabeled image until they're initialized with network predictions. Specify these locations by clicking and dragging the points in the image:

Great! Now just click Save skeleton to file... and we're ready to do some labeling!

Labeling and training

This is the main step of the LEAP workflow. The goal is to label a few images, i.e., place the skeleton points at their correct locations within the image, and then train the network to predict these locations on new frames. The predicted locations are displayed as the initial positions of the points in unlabeled frames, so we just have to correct the ones that are wrong rather than positioning every point from scratch.

We'll start by opening the GUI from MATLAB: label_joints

We'll be prompted to select the sampled dataset from earlier, as well as the skeleton we created in the last step. We won't have to specify the skeleton again the next time we open this labeling dataset.

Once loaded, the full GUI will appear:

There's a lot going on here! First, some basic controls:

  • Click and drag the markers on the main image to place them in the correct location within the image
  • Use the arrow keys (Right/Left) to move forward and backward within the dataset

There are a few more utilities that this GUI provides for more advanced use, but for now let's just jump in and label some images! Move the points around until they are in their correct location:

After completing this frame, move to the next frame and you'll see that the status bar below updates with a green bar and several stats denoting how many images are labeled so far (only images where ALL points are labeled are counted). Each row of the status bar corresponds to a single part. The top bar spans the entire dataset, whereas the bottom one is a zoomed-in version centered on the current image index.

Our labels are automatically saved to a MAT file in the same folder as the labeling dataset every time we change frames, so don't worry about losing your work! Keep going until 10 or so frames are labeled.

Now we're ready to do a round of fast training! Click Fast train network to bring up the parameters:

We'll leave all the parameters at their defaults for this dataset, but if your data differ, adjust these accordingly. The only thing we recommend changing is the Rotation, which you should set to 0 if your data are not egocentrically aligned!

After clicking OK, you'll notice a whole lot of text begin to scroll up in the MATLAB Command Window:

This is the network training progress being reported as it chugs through the 15 epochs we specified. Everything is automated and the learned parameters and other data are saved into a folder created under models/fast_train/ in the parent LEAP directory.

It'll take about 5-20 minutes for LEAP to do its magic, so go grab a coffee while you wait. When you come back, you should see some text reporting the results of our fast training:

These are metrics based on the Euclidean distance error between the predicted and correct locations for all the images we've labeled so far. This gives us an overall sense of how well the network is doing, so we can judge whether we need to keep labeling and training.
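
At its core, the reported error is just the Euclidean distance between each predicted point and its labeled location; you could compute the same thing yourself along these lines (a sketch with hypothetical variable names for the predicted and ground-truth coordinates):

% positionsPred, positionsGT: (body parts) x 2 x (labeled frames)
err = squeeze(sqrt(sum((positionsPred - positionsGT).^2, 2)));   % parts x frames
mean(err(:))            % mean pixel error across all parts and frames
prctile(err(:), 90)     % 90th percentile error (Statistics Toolbox)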

Switching back to the GUI, when we go to an unlabeled frame we see that the markers are now colored yellow and are in different positions than the defaults we specified with our skeleton:

It looks like the network did pretty well, but it misplaced the tip of the tail (it likely got thrown off by the image artifacts). Reposition that point just as we did before by clicking and dragging, and then mark the remaining points as correctly positioned by pressing the F key on your keyboard. All the points will turn green and you'll see the labeled counter increase.

That's it! Just repeat this process a few times until the predictions are accurate enough for analysis.

Apply the trained network

After we've done sufficient labeling and training, we can easily apply our trained network to new data using one of the wrappers provided with LEAP.

In MATLAB, let's load up our original movie and generate predicted locations for each frame:

>> modelPath = 'D:\OneDrive\code\leap\models\fast_train\180603_220044-n=40\final_model.h5';
>> box = h5read('box.h5','/box');
>> preds = predict_box(box, modelPath)
...
preds = 
  struct with fields:

        Attributes: [1×1 struct]
         conf_pred: [6×2085 single]
    positions_pred: [6×2×2085 single]

The resulting structure contains the predicted positions in the preds.positions_pred field. This array has shape (body parts) x 2 x (frames), where the second dimension holds the (x, y) image coordinates of each part.

We can quickly visualize these predictions by plotting them with our helper function:

>> skeleton = load('cat-skeleton.mat');
>> imagesc(box(:,:,:,1)),axis image,hold on,plot_joints_single(preds.positions_pred(:,:,1),skeleton)

Or you can use our built-in custom video player (scroll with Left/Right arrow keys):

>> vplay(box, @(~,idx)plot_joints_single(preds.positions_pred(:,:,idx),skeleton))

You can also plot the trajectory of a single part by pulling out a single row:

>> figure,plot(squeeze(preds.positions_pred(3,1,:)),squeeze(preds.positions_pred(3,2,:)),'.-')
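
Once you're satisfied with the predictions, you may want to save them alongside the video for downstream analysis, e.g. (file names here are just examples):

>> save('cat_preds.mat', '-struct', 'preds')                          % one variable per field
>> csvwrite('part3_xy.csv', squeeze(preds.positions_pred(3,:,:))')    % frames x (x,y) for part 3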

If you prefer to use Python, check out the documentation in predict_box.py:

λ python predict_box.py -h
Usage: predict_box.py [OPTIONS] box-path model-path out-path 
 
Predict and save peak coordinates for a box. 
 
Arguments: 
  box-path          path to HDF5 file with box dataset 
  model-path        path to Keras weights file or run folder with weights 
                    subfolder 
  out-path          path to HDF5 file to save results to 
 
Options: 
  --box-dset=STR    name of HDF5 dataset containing box images (default: /box) 
  --epoch=STR       epoch to use if run folder provided instead of Keras 
                    weights file 
  --verbose=BOOL    if True, prints some info and statistics during procesing 
                    (default: True) 
  --overwrite       if True and out_path exists, file will be overwritten 
  --save-confmaps   if True, saves the full confidence maps as additional 
                    datasets in the output file (very slow) 
 
Other actions: 
  -h, --help        Show the help 

Happy LEAPing!