Computer vision is fun (and difficult), but what's even more fun (and also difficult) is integrating it with one of the most popular game development suites available: Unity 3D.
To do just that, you need an OpenCV C# wrapper for Unity, and this is where Emgu CV comes in.
Emgu CV is a dual-licensed, cross-platform .NET wrapper for the OpenCV image processing library. The dual license covers both open source and commercial development.
If you're aiming for the open source license, you can use Emgu CV for free under version 3 of the GNU General Public License. The downside is that any application you distribute that uses Emgu CV must also be released under the GPL, which means making its source code available to everyone you distribute it to.
Don't like the idea of giving your source code away? Then you can purchase the commercial license, but be prepared to wave goodbye to $399 per single developer or $799 per work group from your coffers. The commercial license is also available on the Unity Asset Store.
Unless you have a few hundred dollars lying around, the commercial license isn't a viable option either. Another alternative is OpenCvSharp. Unlike Emgu CV, though, Unity support for OpenCvSharp is almost non-existent, so you will need to find a workaround of sorts.
Anyway, I decided to play around with Emgu CV and honestly it works better than I expected.
In case you're wondering whether I bought the license or not, the answer is obviously NO. But I will definitely try and create my own OpenCV C# wrapper once I have enough time and diligence to do so.
With that being said, in the spirit of the open source community I will share some Emgu CV examples showing how to create a face tracker and a pedestrian detector from a video source, i.e. a webcam. For this tutorial, I am using Unity version 5.5.2f1 on Windows 10.
Emgu CV has been around for quite a long time, so some of the tutorials on the website are slightly outdated, but you can find the updated documentation using the link below:
INSTRUCTION STARTS HERE:
Setting up Emgu CV with Unity is quite straightforward as long as you copy the necessary DLLs to your Unity project's Assets folder.
At the time of writing, I'm using Emgu CV version 126.96.36.1994, so the required DLLs may vary for other versions:
You should also copy the 'System.Drawing.dll' from the Unity root folder.
We will be using a Haar cascade classifier, so copy the "haarcascade_frontalface_alt.xml" file into the Assets folder.
Next, we create a new environment variable in our system environment called 'Emgu_Dir' pointing to the Emgu CV directory:
Also, add the Emgu CV bin directory to the 'Path' environment variable. Make sure to set the directory according to your machine's architecture, in this case, mine was x64:
Once you are done with those steps, create a new C# script and name it faceDetect.cs. I'm using Visual Studio 2015, so make sure to restart Unity afterwards if your Visual Studio solution does not reload automatically.
After restarting Unity, open faceDetect.cs, as we are going to write the following script:
For this tutorial, we will be implementing a Haar-based cascade classifier for face detection. First, we need to create a new object from the VideoCapture class and pass the value '0' to select the default camera on your machine. If you want to use a secondary camera, change the value to the camera device number listed in the log (please check the Unity documentation on how to list the available camera devices).
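If you are not sure which cameras Unity can see, a small helper script like the one below can log them. This is only a sketch: the class name ListCamerasSketch is made up for illustration, and I'm assuming that the order Unity reports matches the device numbers that VideoCapture expects, which is not guaranteed on every machine.

```csharp
using UnityEngine;

// Hypothetical helper, not part of the original faceDetect.cs script.
// Attach it to any GameObject and check the Console after pressing Play.
public class ListCamerasSketch : MonoBehaviour
{
    void Start()
    {
        // WebCamTexture.devices lists the cameras Unity has detected.
        WebCamDevice[] devices = WebCamTexture.devices;

        if (devices.Length == 0)
        {
            Debug.Log("No camera devices found.");
            return;
        }

        for (int i = 0; i < devices.Length; i++)
        {
            // Assumption: index i lines up with the number passed to VideoCapture.
            Debug.Log("Camera " + i + ": " + devices[i].name);
        }
    }
}
```

If the number you pick opens the wrong camera, try the neighbouring indices, since the two libraries enumerate devices independently.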
Since we've already copied the classifier XML file into the data path, we only need to append the classifier's file name to that path in order to load the Haar cascade classifier. Next, we initialize the frame width and height of the video stream, and finally we start the video capture from our camera.
TIP: With a little bit of modification to the script, you can use different classifiers to detect eyes or even cats, but for now let's keep things simple.
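The setup steps described so far could look roughly like the sketch below. It is not the original script: the class name, the field names and the 640x480 frame size are my own assumptions, and SetCaptureProperty is the Emgu CV 3.x spelling (newer releases may name it differently).

```csharp
using System.IO;
using Emgu.CV;
using Emgu.CV.CvEnum;
using UnityEngine;

// Illustrative sketch of the initialisation only; names are hypothetical.
public class FaceDetectSetupSketch : MonoBehaviour
{
    private VideoCapture capture;
    private CascadeClassifier faceCascade;

    // Assumed frame size; pick whatever your webcam supports.
    private const int FrameWidth = 640;
    private const int FrameHeight = 480;

    void Start()
    {
        // 0 selects the default camera on the machine.
        capture = new VideoCapture(0);

        // In the editor, Application.dataPath points at the project's Assets folder,
        // so we only append the classifier's file name.
        string cascadePath = Path.Combine(Application.dataPath, "haarcascade_frontalface_alt.xml");
        faceCascade = new CascadeClassifier(cascadePath);

        // Request the frame size for the video stream (Emgu CV 3.x API).
        capture.SetCaptureProperty(CapProp.FrameWidth, FrameWidth);
        capture.SetCaptureProperty(CapProp.FrameHeight, FrameHeight);

        // Start grabbing frames from the camera.
        capture.Start();
    }
}
```

Swapping the file name for another cascade XML file is all it takes to load a different classifier, provided that file has also been copied into the Assets folder.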
In our Update function, we invoke a custom method called faceDetector. In faceDetector, we use QueryFrame to capture a BGR frame, which is returned as a Mat object, and convert it to a raw byte image. Then we convert that image to grayscale, since a Haar-based cascade classifier works just fine with grayscale images.
In computer vision, converting to grayscale is ideal in most cases because a grayscale image has a single channel with values from 0 to 255, whereas an RGB image has three channels (four if you're counting the alpha channel). So in this case the classifier only has to look at one value per pixel instead of three, which helps a Haar-based cascade classifier keep up with a live video stream.
Notice that OpenCV orders the colour channels as BGR rather than RGB. One of the reasons is that OpenCV has a very long history of building its tools around the BGR colour format, and it also has some correlation with a horse's ass, but you'll understand what I mean by following this link:
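To make the "three channels become one" idea concrete, here is a small stand-alone sketch in plain C# with no Emgu CV or Unity dependency. It applies the usual luma weighting (0.299 R + 0.587 G + 0.114 B), which is the formula OpenCV documents for its colour-to-gray conversion. The class and method names are made up for illustration; in the actual script you would let Emgu CV do this for you.

```csharp
using System;

// Hypothetical demo class, only meant to illustrate what a BGR-to-gray conversion does.
public static class GrayscaleDemo
{
    // Converts interleaved 8-bit BGR pixels (B, G, R, B, G, R, ...) into
    // one gray value per pixel using the weights 0.299 R + 0.587 G + 0.114 B.
    public static byte[] BgrToGray(byte[] bgr)
    {
        if (bgr == null)
        {
            throw new ArgumentNullException("bgr");
        }
        if (bgr.Length % 3 != 0)
        {
            throw new ArgumentException("Expected 3 bytes per pixel.", "bgr");
        }

        byte[] gray = new byte[bgr.Length / 3];
        for (int i = 0; i < gray.Length; i++)
        {
            // Note the BGR order: blue comes first, red last.
            byte b = bgr[3 * i];
            byte g = bgr[3 * i + 1];
            byte r = bgr[3 * i + 2];
            gray[i] = (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);
        }
        return gray;
    }
}
```

For example, a pure green BGR pixel (0, 255, 0) becomes the single value 150, so the output buffer is exactly one third the size of the input.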
Interesting trivia. Moving on.
The DetectMultiScale method is used to detect objects of different sizes in the input image and these detected objects are returned as a list of rectangles or bounding boxes. Each parameter for DetectMultiScale is briefly explained in the documentation:
Next, we create a foreach loop that draws a green bounding box for every object (face) that is detected. Then we convert the pixels of the current frame to a bitmap, store them in memory as a raw byte array and load them into the material's texture. Notice that we work with raw pixel data because we are processing a live video stream, so performance is of the utmost importance.
Finally, in the OnDestroy() function, we stop the video capture and release the data from memory.
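The detection and drawing steps can be sketched roughly as below. Treat it as an approximation rather than the original script: the helper class, the DetectMultiScale values (scale factor 1.1, 4 neighbours, 40x40 minimum size) and the JPEG round trip are my own assumptions, and Mat.Bitmap is only available in the Emgu CV 3.x line.

```csharp
using System.IO;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using UnityEngine;

// Hypothetical helper; names and parameter values are illustrative only.
public static class FaceOverlaySketch
{
    // Detects faces in a BGR frame and draws a green box around each one.
    public static System.Drawing.Rectangle[] DetectAndDraw(Mat bgrFrame, CascadeClassifier cascade)
    {
        using (Mat gray = new Mat())
        {
            // The cascade runs on the single-channel grayscale copy.
            CvInvoke.CvtColor(bgrFrame, gray, ColorConversion.Bgr2Gray);

            // Assumed tuning values; see the DetectMultiScale documentation.
            System.Drawing.Rectangle[] faces =
                cascade.DetectMultiScale(gray, 1.1, 4, new System.Drawing.Size(40, 40));

            foreach (System.Drawing.Rectangle face in faces)
            {
                // MCvScalar is in BGR order, so (0, 255, 0) is green.
                CvInvoke.Rectangle(bgrFrame, face, new MCvScalar(0, 255, 0), 2);
            }
            return faces;
        }
    }

    // Copies the frame into a Unity texture via an in-memory JPEG.
    public static void UploadToTexture(Mat bgrFrame, Texture2D texture)
    {
        using (System.Drawing.Bitmap bitmap = bgrFrame.Bitmap)
        using (MemoryStream stream = new MemoryStream())
        {
            bitmap.Save(stream, System.Drawing.Imaging.ImageFormat.Jpeg);
            // LoadImage decodes the bytes and resizes the texture to match.
            texture.LoadImage(stream.ToArray());
        }
    }
}
```

In Update you would grab a frame with QueryFrame, skip it if it is null, pass it through DetectAndDraw and UploadToTexture, and then assign the texture to the material's mainTexture.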
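A cleanup along these lines should do the job. Again, this is a sketch with hypothetical class and field names; the important part is that the capture is stopped before the native resources are disposed.

```csharp
using Emgu.CV;
using UnityEngine;

// Illustrative cleanup only; in a real script these fields are assigned in Start().
public class CaptureCleanupSketch : MonoBehaviour
{
    private VideoCapture capture;
    private CascadeClassifier faceCascade;

    void OnDestroy()
    {
        if (capture != null)
        {
            // Stop grabbing frames first, then free the native camera handle.
            capture.Stop();
            capture.Dispose();
            capture = null;
        }

        if (faceCascade != null)
        {
            // The classifier also wraps native memory, so dispose it too.
            faceCascade.Dispose();
            faceCascade = null;
        }
    }
}
```

Skipping this step tends to leave the webcam locked until the Unity editor is closed, which is why it is worth doing even in a small test scene.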
Setting up the Unity scene:
To "activate" the script, we first need a 3D object such as a plane or a cube to attach it to. For simplicity's sake, let's create a plane:
Next, we create a new empty material and let's just call it 'white':
Assign the 'white' material to the plane and turn off "Receive Shadows". In the Plane's inspector window, we see that the material variable was exposed from the script properties and this is where we attach the 'white' material in the 'Mt' input box. This allows the video stream to be rendered onto the Plane's texture:
Hit the play button to run the scene and you should see a green rectangle following your face around the screen. By the way, that's me!:
Pretty neat, right? One caveat is that the face tracking does not perform well under low lighting conditions, and you will notice a slight drop in frame rate. Speaking of performance, there is also a CUDA version of the Haar cascade face detection, but I'm sure you can implement it by yourself since the implementation is quite similar to this one.
In the second part of this blog post, I will show how to create a simple pedestrian detector script. Hope you guys enjoy reading this post and until next time, happy coding!