
How to recognize custom objects


The real power of the convolutional neural network approach to image recognition comes from its flexibility. It's fairly easy to retrain the top levels of a network to spot new kinds of objects, even on a low-powered mobile device. I'll show you how you can use the LearningExample application to spot an object or logo you care about, and then how to add that capability to your own application.

Getting started

You'll need Xcode and an iPhone, preferably a 5 or 5S for best performance. Clone this git repository to your local machine, open the LearningExample project from the examples folder, build it, and run it on your iPhone.

Learning from positive examples

You should see a screen like this. The first thing you need to do to create your model is capture 100 frames that contain the object you want to recognize. These are the 'positive' images, and once you press the 'Start Learning' button, the phone will capture whatever's in the viewfinder as an example of what it should be looking for.

Picking good positive examples is an art form, not a science, but here are some tips. You should think about how the object you want to recognize is likely to appear when users are pointing their phones at it. For this example I'll be using a wine bottle, and I've made the choice that I want it to be good at recognizing an upright wine bottle from a couple of feet away, since that's the likely way they'll appear in the restaurant photos I happen to be interested in. That means I won't try to put the bottle on its side, take pictures from above, or do close-ups of the label when I'm collecting examples.

It is important to make sure that it's really the bottle that it's recognizing and not other objects in the background though. To help with this I am going to shoot from different angles so it's in front of different objects, and move the bottle around my desk a bit to vary the lighting.

Once you have a plan for collecting your positive images, press the 'Start Learning' button and the app will continuously capture example frames over the course of a minute or so, depending on the processing speed of your phone. You'll see a progress bar at the top; once it's completely blue, the message should change again.
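To give a sense of what the app is doing during this phase, here's a rough sketch in C of how frames might be fed into the learning code. It assumes the SDK's C API (jpcnn_classify_image(), jpcnn_train(), and a trainer handle from jpcnn_create_trainer()); the real loop in SquareCamViewController.m may use different flags or layer offsets, and captureFrame() is a hypothetical stand-in for the camera pipeline.

```c
#include "libjpcnn.h"

extern void* captureFrame(void);  /* hypothetical helper that returns a jpcnn image handle */

/* Rough sketch of the positive-capture phase (not the app's exact code). */
void learnPositiveExamples(void* network, void* trainer) {
  for (int i = 0; i < 100; i += 1) {  /* kPositivePredictionTotal in the example app */
    void* frame = captureFrame();
    float* predictions;
    int predictionsLength;
    char** labels;
    int labelsLength;
    /* Run the pre-trained network to get feature activations for this frame. */
    jpcnn_classify_image(network, frame, 0, 0,
                         &predictions, &predictionsLength, &labels, &labelsLength);
    /* A label of 1.0 marks the frame as containing the object we want to spot. */
    jpcnn_train(trainer, 1.0f, predictions, predictionsLength);
  }
}
```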

Learning from negative examples

Now that you've collected enough images containing the object you care about, you need to help the system understand how images that don't contain the object are different.

Again, this is an art rather than a science, but the basic idea is that you want to mimic the kinds of situations your application's users are likely to encounter. I normally hide the object out of sight (in this case I popped the wine bottle under my desk) and then scan around my workspace. I took care to use some of the same backgrounds I shot the wine bottle against in the positive phase, so that objects appearing there were less likely to be misidentified as the thing I actually care about just because they showed up in a lot of the positive shots too. I also swung around to get some shots of the ceiling, the floor, and out the window, so there's some variety in what's excluded from the wine bottle detection. Once you're ready, press the 'Continue Learning' button, and it will take around a minute to capture all the background images.
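The negative phase is essentially the same loop as the positive sketch above, with the label flipped to 0.0; again this is a sketch under the same assumptions, not the app's exact code.

```c
/* Sketch of the negative-capture phase: identical to the positive loop,
 * but a label of 0.0 means "does not contain the object". */
void learnNegativeExamples(void* network, void* trainer) {
  for (int i = 0; i < 100; i += 1) {  /* kNegativePredictionTotal in the example app */
    void* frame = captureFrame();
    float* predictions;
    int predictionsLength;
    char** labels;
    int labelsLength;
    jpcnn_classify_image(network, frame, 0, 0,
                         &predictions, &predictionsLength, &labels, &labelsLength);
    jpcnn_train(trainer, 0.0f, predictions, predictionsLength);
  }
}
```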

Testing the predictor

Once you've captured all the examples, the custom object recognizer code should start running automatically. As you pan the phone around the scene, you should see the top bar peak when the object is in shot, and stay low when it's not present.

See if the estimates seem reliable enough for your application, and don't worry too much if the first attempt isn't perfect. By running through the training again and fine-tuning exactly which positive and negative images you use, you can often improve the results a lot. For example, if there's a particular object that's consistently mistaken for the one you care about, try focusing on it for a few seconds during the negative phase. If recognition is poor from particular angles, devote more time to those during the positive phase. If you're still struggling, you can try increasing the constants at the top of SquareCamViewController.m that control how many images are captured: kPositivePredictionTotal and kNegativePredictionTotal. Training will take proportionally longer, but may give better results. You could also try capturing positive and negative images yourself offline and loading them from resources, but that's not supported by this example app.
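For reference, the live testing step boils down to turning the trainer into a predictor and scoring the network's output for each frame, roughly as sketched below. jpcnn_create_predictor_from_trainer() and jpcnn_predict() are assumed names from the SDK's C API, not necessarily the exact calls the example app makes.

```c
/* After both phases, the trainer is converted into a predictor (assumed call):
 *   void* predictor = jpcnn_create_predictor_from_trainer(trainer);
 * Each new frame is then scored; the score drives the bar at the top of the screen. */
float scoreFrame(void* network, void* predictor, void* frame) {
  float* predictions;
  int predictionsLength;
  char** labels;
  int labelsLength;
  jpcnn_classify_image(network, frame, 0, 0,
                       &predictions, &predictionsLength, &labels, &labelsLength);
  /* Close to 1.0 means the object looks present, close to 0.0 means it doesn't. */
  return jpcnn_predict(predictor, predictions, predictionsLength);
}
```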

Saving the predictor

Once you're happy with the prediction, you'll need to save the model you've created so you can load it in your own application. Since the app bundle on the device is read-only, the parameters that define your custom predictor are written out to the developer console instead. Once you've trained a predictor, open up the console in Xcode and look for the line '------------- SVM File output - copy lines below ------------'. Starting on the line below it, select the output all the way to the end, which may be several thousand lines.
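Those console lines are likely produced by simply printing the trained predictor; a minimal sketch, assuming the SDK exposes a jpcnn_print_predictor() call:

```c
#include <stdio.h>

/* Assumed: dump the trained SVM parameters to the console so they can be
 * copied out and saved as a text file. */
void dumpPredictor(void* predictor) {
  fprintf(stderr, "------------- SVM File output - copy lines below ------------\n");
  jpcnn_print_predictor(predictor);
}
```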

...

Once they're all selected, copy the text to the clipboard, open a new plain-text document in TextEdit.app, and paste in all the lines. Save that document as a .txt file into your app's Xcode project (making sure it's added to the Copy Bundle Resources build phase), and then you should be able to call jpcnn_load_predictor() in your application to load the prediction model. Run the pre-trained neural network on your incoming images, then your prediction model on its output, and you'll get custom object recognition in just a few hundred milliseconds!
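Putting that together in your own app, the load-and-run path might look like the sketch below. jpcnn_load_predictor() is the call mentioned above; the other calls, the network file name, and wine_bottle_predictor.txt are assumptions for illustration, and cleanup of the handles is omitted.

```c
#include "libjpcnn.h"

/* Hypothetical end-to-end path for your own app. The file names are placeholders
 * for the network file shipped with the SDK and the predictor text you saved. */
int containsCustomObject(const char* imagePath) {
  void* network   = jpcnn_create_network("jetpac.ntwk");
  void* predictor = jpcnn_load_predictor("wine_bottle_predictor.txt");
  void* image     = jpcnn_create_image_buffer_from_file(imagePath);

  float* predictions;
  int predictionsLength;
  char** labels;
  int labelsLength;
  jpcnn_classify_image(network, image, 0, 0,
                       &predictions, &predictionsLength, &labels, &labelsLength);
  float score = jpcnn_predict(predictor, predictions, predictionsLength);

  return score > 0.5f;  /* the threshold here is an arbitrary example */
}
```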

For a more detailed view of the code involved, check out the SavedModelExample sample project.