This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

【Unofficial】Using Pre-trained Models

Shan Yi edited this page Aug 31, 2018 · 1 revision

This document is no longer maintained.

A Complete Tutorial

This tutorial is a step-by-step guide on how to create an online handwriting recognition application using a pre-trained MNIST model.

The sections below describe each of the pre-trained models. We hope they help you create innovative applications.

1. word2vec

The PaddlePaddle program that trains the model is from Chapter 4 of the PaddlePaddle book.

For a pre-trained model, please download the following files:

We use the Penn Treebank (PTB) dataset (Tomas Mikolov's pre-processed version). The dictionary contains 2073 words.

The learned embedding table, which is part of the model parameters, represents each of the 2073 words as a vector of 32 float values.

Given the learned embedding table, we can compute the cosine distance (1 minus the cosine similarity) between two words:

from scipy import spatial
import numpy

def load_dict_and_embedding():
    # word_dict: one "word index" pair per line
    word_dict = dict()
    with open("word_dict", "r") as f:
        for line in f:
            key, value = line.split()
            word_dict[key] = int(value)

    # embedding_table: one comma-separated row of 32 floats per word,
    # where row i holds the embedding of the word with index i
    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
    return word_dict, embeddings

# load word dict and embedding table
word_dict, embedding_table = load_dict_and_embedding()

# spatial.distance.cosine returns the cosine distance, i.e. 1 - cosine similarity
print(spatial.distance.cosine(embedding_table[word_dict['car']], embedding_table[word_dict['world']]))
print(spatial.distance.cosine(embedding_table[word_dict['say']], embedding_table[word_dict['talking']]))

It would print

0.0698071067872
0.583393289346

Because spatial.distance.cosine returns a distance, a smaller value means the two embeddings are more similar. By this measure, "car" and "world" (0.070) are closer in the embedding space than "say" and "talking" (0.583).
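Because the interpretation depends on whether the value is read as a distance or a similarity, the following minimal sketch computes both with plain NumPy. The toy vectors are made up for illustration and are not taken from the trained embedding table.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, in [-1.0, 1.0]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    """The quantity scipy.spatial.distance.cosine returns: 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


# Toy 3-dimensional vectors, for illustration only
u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
print(cosine_similarity(u, v))  # about 0.7071
print(cosine_distance(u, v))    # about 0.2929
```

With the embedding table loaded earlier, `cosine_similarity(embedding_table[word_dict['car']], embedding_table[word_dict['world']])` should equal 1 minus the first printed value, up to floating-point rounding.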

2. Image Classification

The PaddlePaddle program that trains the model is from Chapter 3 of the PaddlePaddle book. Note that this program requires a CUDA-capable GPU.

For a pre-trained model, please download the following files:

The training data is the CIFAR-10 dataset, which includes 10 classes of 32x32 color images:

Class         Label
airplane      0
automobile    1
bird          2
cat           3
deer          4
dog           5
frog          6
horse         7
ship          8
truck         9

This model takes a 3072-dimensional (3x32x32) vector as input, which should be a flattened image: the first 1024 (32x32) values are the red channel, followed by the green and then the blue channel. Within each channel, values are stored in row-major order, so the first 32 values of each channel come from the first row. All values must be normalized into the range [0.0, 1.0].
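The input layout described above can be sketched as a small NumPy function. It assumes the image is already a 32x32 array in R, G, B channel order (PIL's convention; OpenCV loads images in B, G, R order); the function name `cifar_input_vector` is hypothetical.

```python
import numpy as np


def cifar_input_vector(img_hwc_rgb):
    """Flatten a 32x32 RGB image into the 3072-value model input.

    img_hwc_rgb: array of shape (32, 32, 3) with values in [0, 255],
    channels in R, G, B order.
    """
    img = np.asarray(img_hwc_rgb)
    if img.shape != (32, 32, 3):
        raise ValueError("expected shape (32, 32, 3), got %s" % (img.shape,))
    # [height, width, channel] -> [channel, height, width]
    chw = img.transpose(2, 0, 1)
    # Row-major flattening puts all red values first, then green, then blue;
    # dividing by 255 maps pixel values into [0.0, 1.0]
    return chw.astype(np.float32).flatten() / 255.0


# Synthetic example: a pure red 32x32 image
red = np.zeros((32, 32, 3), dtype=np.uint8)
red[:, :, 0] = 255
vec = cifar_input_vector(red)
print(vec.shape)  # (3072,)
```

For a real file, `np.asarray(PIL.Image.open(path).convert("RGB").resize((32, 32)))` yields an array in the expected shape and channel order.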

The model's output is a 10-element vector of class probabilities.

We can run a PaddlePaddle server in a Docker container to serve the model.

  1. Download the topology and parameter files to the current directory.

  2. Run the server container

    nvidia-docker run --name=my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
  3. Check that the server is up and running

    docker logs my_svr

    should print something like

    I0915 19:56:44.282585    66 Util.cpp:166] commandline:  --use_gpu=True
    
  4. Call the server. The following Python script processes an image and sends it to the server:

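    import json

    import cv2
    import numpy as np
    import requests

    img_file = "./img.png"
    BACKEND_URL = "http://localhost:8000"

    # cv2.imread returns a [height, width, channel] array in BGR order,
    # but the model expects the red channel first, so convert to RGB
    img = cv2.cvtColor(cv2.imread(img_file), cv2.COLOR_BGR2RGB)
    # The model takes 32x32 images
    img = cv2.resize(img, (32, 32))
    # Reorder [height, width, channel] -> [channel, height, width]
    img = np.swapaxes(img, 1, 2)
    img = np.swapaxes(img, 1, 0)
    # Flatten and normalize pixel values into [0.0, 1.0]
    arr = img.flatten() / 255.0
    req = {"image": arr.tolist()}
    resp = requests.post(BACKEND_URL, json=req)
    print(json.dumps(resp.json()))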

    It prints the class probabilities, for example:

    {
       "code": 0,
       "data": [
         [
           4.751561937155202e-05,
           3.0828364288026933e-06,
           1.4101417946221773e-05,
           0.9994580149650574,
           0.0001739991275826469,
           0.00024292869784403592,
           3.745338835869916e-05,
           1.409481956216041e-05,
           2.895288162108045e-06,
           5.973368843115168e-06
         ]
       ],
       "message": "success"
    }
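To turn the response into a label, pick the index with the highest probability and look it up in the class table above. This is a minimal sketch; `top_class` is a hypothetical helper, and it assumes the response holds a single image's probabilities in `data[0]`. Applied to the sample response, it selects index 3, "cat".

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def top_class(response):
    """Return (class_name, probability) for the most likely class.

    response: the decoded JSON body returned by the server.
    """
    if response.get("code") != 0:
        raise RuntimeError("server error: %s" % response.get("message"))
    probs = response["data"][0]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CIFAR10_CLASSES[best], probs[best]


# The sample response shown above
sample = {
    "code": 0,
    "message": "success",
    "data": [[4.751561937155202e-05, 3.0828364288026933e-06,
              1.4101417946221773e-05, 0.9994580149650574,
              0.0001739991275826469, 0.00024292869784403592,
              3.745338835869916e-05, 1.409481956216041e-05,
              2.895288162108045e-06, 5.973368843115168e-06]],
}
print(top_class(sample))  # the class name 'cat' and its probability
```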
    

3. Sentiment Classification

The PaddlePaddle program that trains this model comes from Chapter 6 of the PaddlePaddle book.

We can download the following files to get a pre-trained model:

The training data is the IMDB dataset. This model takes a sequence of word indices as input and outputs the probabilities of the sentiment being positive or negative.

We can run a PaddlePaddle server instance in a Docker container to serve this model:

  1. Download the topology and parameter files through the links above, and copy them to the current directory.

  2. Run the server

    docker run --name my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
  3. Check that the server is up and running

    docker logs my_svr

    should print something like

    I0915 19:56:44.282585    66 Util.cpp:166] commandline:  --use_gpu=False
    
  4. Call the server. According to the dictionary downloaded above, the words in the sentence "I like it" correspond to indices 8, 37, and 7. We can send this sentence to the server by running:

    curl -v -H "Content-Type: application/json" -X POST -d '{"word":[8,37,7]}' http://localhost:8000/

    The response should look like:

    {
      "code": 0,
      "data": [
        [
          0.9999890327453613,
          1.0963042768707965e-05
        ]
      ],
      "message": "success"
    }
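The curl call can also be made from Python. This sketch uses only the standard library; `build_sentiment_request`, `post_json`, the three-word dictionary, and the unknown-word index `unk_id` are hypothetical stand-ins for the real dictionary file downloaded above.

```python
import json
import urllib.request


def build_sentiment_request(words, word_dict, unk_id):
    """Map words to dictionary indices and wrap them in a {"word": [...]} body."""
    return {"word": [word_dict.get(w, unk_id) for w in words]}


def post_json(url, payload):
    """POST payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical three-entry dictionary matching the example indices above
toy_dict = {"I": 8, "like": 37, "it": 7}
body = build_sentiment_request("I like it".split(), toy_dict, unk_id=0)
print(body)  # {'word': [8, 37, 7]}
```

With the server from step 2 running, `post_json("http://localhost:8000/", body)` should return the same JSON as the curl command.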
    

4. Machine Translation

The PaddlePaddle program that trains the model comes from Chapter 8 of the PaddlePaddle book.

We can download the following files to get a pre-trained model:

The model is trained on the WMT-14 French-to-English dataset and achieves a BLEU score of 26.92. Both the source and target dictionaries contain 30,000 words.

We can run the PaddlePaddle server in a Docker container:

  1. Download the model files to the current directory.

  2. Run the server

    docker run --name my_svr -v $(pwd):/data -d -p 8000:80 -e WITH_GPU=0 -e OUTPUT_FIELD=prob,id paddlepaddle/book:serve
    
  3. Call the server. For example, the French sentence "le temps est bon", wrapped in the start and end markers <s> and <e>, corresponds to word IDs 0, 12, 169, 22, 631, and 1 according to the source dictionary. A Python client script sends the sentence to the server and prints the translation result, for example:

<s> le temps est bon <e>
0th 0.058041  Time is good . <e>
1th 0.051606  time is good . <e>
2th 0.015649  the time is good . <e>
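The client script itself is not reproduced on this page, but the dictionary lookups around it can be sketched. The following minimal sketch wraps a sentence in the <s> and <e> markers, converts it to IDs, and turns a list of target IDs back into words, stopping after <e>. The dictionaries are hypothetical toy versions (the source IDs come from the example above, the target IDs are made up), the `<unk>` entry at ID 2 is an assumption, and the real dictionaries come from the downloaded files.

```python
def encode(sentence, src_dict, unk="<unk>"):
    """Wrap a whitespace-tokenized sentence in <s> ... <e> and look up word IDs."""
    words = ["<s>"] + sentence.split() + ["<e>"]
    return [src_dict.get(w, src_dict[unk]) for w in words]


def decode(ids, trg_dict):
    """Turn target word IDs back into text, up to and including <e>."""
    id_to_word = {i: w for w, i in trg_dict.items()}
    words = []
    for i in ids:
        words.append(id_to_word.get(i, "<unk>"))
        if words[-1] == "<e>":
            break
    return " ".join(words)


# Hypothetical toy dictionaries
src_dict = {"<s>": 0, "<e>": 1, "<unk>": 2,
            "le": 12, "temps": 169, "est": 22, "bon": 631}
trg_dict = {"<s>": 0, "<e>": 1, "<unk>": 2,
            "time": 10, "is": 11, "good": 12, ".": 13}
print(encode("le temps est bon", src_dict))      # [0, 12, 169, 22, 631, 1]
print(decode([10, 11, 12, 13, 1, 5], trg_dict))  # time is good . <e>
```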

5. Recognize Digits

The PaddlePaddle program that trains this model comes from Chapter 2 of the PaddlePaddle book.

We can download the following files to get a pre-trained model:

The training dataset is the MNIST dataset, where each image is a 28x28-pixel grayscale image of a handwritten digit.

We can run a PaddlePaddle server instance in a Docker container to serve this model.

  1. Download model files to the current directory.
  2. Run the server
    nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
  3. Use this Gist to flatten and print an image. You can copy and paste the result into the following curl command to send it to the server:
    curl -v -H "Content-Type: application/json" -X POST -d '{"img":[/*your image goes here*/]}' http://localhost:8000/
    The final output should look like the following:
    {
      "code": 0,
      "data": [
        [
          0.03862910717725754,
          0.5247572064399719,
          0.04542972892522812,
          0.02226484753191471,
          0.09190762042999268,
          0.01627335511147976,
          0.04605291783809662,
          0.13657279312610626,
          0.04367228224873543,
          0.03444007411599159
        ]
      ],
      "message": "success"
    }
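A minimal Python sketch of the request and response handling for this model follows. It assumes the server expects 784 pixel values scaled to [-1.0, 1.0]; that scaling is an assumption, so check it against the Gist before relying on it. `mnist_request` and `predicted_digit` are hypothetical helpers.

```python
import numpy as np


def mnist_request(img_28x28):
    """Build the {"img": [...]} body from a 28x28 grayscale image.

    Assumption: pixel values in [0, 255] are scaled to [-1.0, 1.0].
    """
    img = np.asarray(img_28x28, dtype=np.float32)
    if img.shape != (28, 28):
        raise ValueError("expected shape (28, 28), got %s" % (img.shape,))
    scaled = img / 255.0 * 2.0 - 1.0
    return {"img": scaled.flatten().tolist()}


def predicted_digit(response):
    """Return the digit whose probability is highest in the server response."""
    probs = response["data"][0]
    return int(np.argmax(probs))


body = mnist_request(np.zeros((28, 28)))
print(len(body["img"]))  # 784
```

`json.dumps(mnist_request(pixels))` produces a body for the curl command, and `predicted_digit` applied to the sample output above returns 1, whose probability (about 0.525) is the largest.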
    

6. Object Detection

The PaddlePaddle program that trains this model comes from the PaddlePaddle model bank.


Feel free to check out this demo.

We can download the following files to get a pre-trained model:

The training dataset is the PASCAL VOC dataset.

The labels include:

Class          Label
background     0
aeroplane      1
bicycle        2
bird           3
boat           4
bottle         5
bus            6
car            7
cat            8
chair          9
cow            10
diningtable    11
dog            12
horse          13
motorbike      14
person         15
pottedplant    16
sheep          17
sofa           18
train          19
tvmonitor      20

We can run a PaddlePaddle server in a Docker container to serve this model. This model uses some layers that have no CPU-only implementation, so the server must run on a machine with a CUDA GPU:

wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/param.tar
wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/inference_topology.pkl
nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu

We can send an image to this server to get its predicted labels. Because images in the training dataset have mean pixel values of 104, 117, and 124 for the R, G, and B channels respectively, we need to subtract these means from the corresponding channels of the input image; the values are not rescaled to [0.0, 1.0]. The input image should also be resized or cropped to 300 x 300 pixels.

The server accepts a JSON object containing the processed input image:

{
  "image": [ -104, 108, 112, ...]
}

The image field should contain 3x300x300 values.

The following sample program loads, resizes, normalizes, and flattens an image and sends the result to the server for recognition:

import json

import numpy as np
import requests
from PIL import Image

# Change to your backend URL
BACKEND_URL = "http://127.0.0.1:8000"

# convert("RGB") guarantees exactly three channels (e.g. for grayscale or RGBA files)
img = Image.open("test.jpg").convert("RGB")
# Resize or crop to 300 x 300; img.resize((300, 300)) is the resizing alternative
img = img.crop((0, 0, 300, 300))
mean = np.array([104, 117, 124], dtype='float32')[:, np.newaxis, np.newaxis]

# The image shape should be [channel, height, width], i.e., [3, 300, 300]
img = np.asarray(img)
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)

# Subtract the per-channel mean and flatten to 3 * 300 * 300 values
img = (img - mean).flatten()
req = {"image": img.tolist()}

resp = requests.post(BACKEND_URL, json=req)
print(json.dumps(resp.json()))

It should print something like the following:

{
    "message": "success",
    "code": 0,
    "data": [
        [
            0,
            3,
            0.013827803544700146,
            0.914117693901062,
            0.6044294238090515,
            1,
            0.7246007323265076
        ],
        [ ... ],
        ...
    ]
}

The response includes a status message and code. The data field is an array of 7-element vectors; each vector corresponds to one detected object and contains the following elements:

  1. Always zero.
  2. The detected label. Note that 0 means background.
  3. The confidence score; higher means more confident.
  4. The x-coordinate of the upper-left corner of the detected object.
  5. The y-coordinate of the upper-left corner of the detected object.
  6. The x-coordinate of the bottom-right corner of the detected object.
  7. The y-coordinate of the bottom-right corner of the detected object.
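The seven-element vectors can be post-processed as in the following sketch, which drops background boxes and boxes under a confidence threshold and attaches class names from the label table above. `parse_detections` and its `min_score` default of 0.5 are hypothetical choices. Scaling boxes by the image size assumes the coordinates are relative to the image, which the sample values (all between 0 and 1) suggest but this page does not state.

```python
VOC_LABELS = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
              "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
              "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
              "train", "tvmonitor"]


def parse_detections(response, min_score=0.5, width=None, height=None):
    """Turn the server's 7-element vectors into readable detections.

    If width and height are given, coordinates are multiplied by them,
    assuming the server returns coordinates relative to the image size.
    """
    results = []
    for _, label, score, xmin, ymin, xmax, ymax in response["data"]:
        label = int(label)
        if label == 0 or score < min_score:
            continue  # skip background and low-confidence boxes
        box = (xmin, ymin, xmax, ymax)
        if width is not None and height is not None:
            box = (xmin * width, ymin * height, xmax * width, ymax * height)
        results.append({"label": VOC_LABELS[label], "score": score, "box": box})
    return results


# The first detection from the sample response above
sample = {"message": "success", "code": 0,
          "data": [[0, 3, 0.013827803544700146, 0.914117693901062,
                    0.6044294238090515, 1, 0.7246007323265076]]}
print(parse_detections(sample))                  # [] with the default threshold
print(parse_detections(sample, min_score=0.01))  # one "bird" detection
```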