Face detection on Android with Kotlin

Posted by Peter Tokaji

Introduction

In this article, we will take a tour around the most widespread use case of machine learning: computer vision. The Face ID authentication feature of the iPhone X and the Google Lens object recognizer are real-life examples of different fields of image processing algorithms in action. Most of these technologies run in the cloud, or need a huge amount of computational power to train on input data and create a model. But nowadays, mobile phones and bigger IoT devices have the CPU and memory capacity to use these models on-device. This post demonstrates the steps and possibilities of implementing face detection on an image in an Android environment.

Where should we start?

OpenCV is one of the most famous open source libraries for computer vision, with wrappers for a wide variety of programming languages (C++, Python, Java, etc.), and it supports all of the major operating systems (Linux, Mac OS, Windows, Android, and iOS). I chose JavaCV, the Java wrapper of the library, which, of course, also means Kotlin support. To use it, just add these two lines to your build.gradle file and you're good to go!

implementation "org.bytedeco:javacv:1.4.1"
implementation "org.bytedeco.javacpp-presets:opencv:3.4.1-1.4.1:android-arm"

Side note: the x86 and 64-bit architectures are also supported, but they have separate libraries, which you need to add to your dependencies.

Setup

First of all, we will need a picture in which to find the face, or faces if more than one person appears in it. We'll use the default camera for this purpose. For an Android developer, this should be nothing special, so we'll focus on the interesting parts.
Steps:

  1. Ask for camera permission
    Straightforward: to take a picture, the app needs the user's permission to use the built-in camera.
  2. Camera intent with URI
    Before opening the camera, we need to create a file for the captured image and pass its URI with the IMAGE_CAPTURE Intent (see the sketch after this list).
  3. Do stuff with the resulting image
    After taking a photo, the app will receive and process it.
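
A minimal sketch of steps 1 and 2, assuming a FileProvider entry is declared in the manifest, and that photoUri, REQUEST_CAMERA_PERMISSION, and REQUEST_IMAGE_CAPTURE are placeholders defined on the Activity (none of these names come from the original project):

import android.Manifest
import android.content.Intent
import android.content.pm.PackageManager
import android.provider.MediaStore
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import androidx.core.content.FileProvider
import java.io.File

// Step 1: ask for the camera permission if we don't have it yet.
private fun takePictureWithPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(
            this, arrayOf(Manifest.permission.CAMERA), REQUEST_CAMERA_PERMISSION)
        return
    }
    takePicture()
}

// Step 2: create an empty file for the captured image and pass its URI
// with the IMAGE_CAPTURE Intent, so the camera app writes the photo there.
private fun takePicture() {
    val photoFile = File.createTempFile("capture", ".jpg", cacheDir)
    photoUri = FileProvider.getUriForFile(this, "$packageName.fileprovider", photoFile)
    val intent = Intent(MediaStore.ACTION_IMAGE_CAPTURE)
        .putExtra(MediaStore.EXTRA_OUTPUT, photoUri)
    startActivityForResult(intent, REQUEST_IMAGE_CAPTURE)
}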

The last step is a bit tricky: different manufacturers' camera apps return the image in different orientations. For example, on a Samsung Galaxy S8, a photo taken in portrait mode will arrive in landscape with a 90-degree rotation, but if you use a Nexus 5X with the same orientation, it is in portrait mode by default. Fortunately, all this information is encoded in the EXIF data of the picture, which can be extracted easily. Here is the related code snippet:

val bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)
// Read the EXIF orientation tag; openInputStream() may return null,
// and the stream should be closed after reading.
val orientation = contentResolver.openInputStream(uri)?.use { inputStream ->
    ExifInterface(inputStream)
        .getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_UNDEFINED)
} ?: ExifInterface.ORIENTATION_UNDEFINED

// Rotate the bitmap so that it ends up upright
fun rotate(degrees: Float): Bitmap = Bitmap.createBitmap(
    bitmap, 0, 0, bitmap.width, bitmap.height,
    Matrix().apply { postRotate(degrees) }, true
)

return when (orientation) {
    ExifInterface.ORIENTATION_ROTATE_90 -> rotate(90F)
    ExifInterface.ORIENTATION_ROTATE_180 -> rotate(180F)
    ExifInterface.ORIENTATION_ROTATE_270 -> rotate(270F)
    else -> bitmap
}

It's a necessary step, not just for showing the photo to the user in the right orientation, but for using JavaCV as well: face detection only works reliably on a properly oriented image.

The best part

After solving the orientation issue, we need a mathematical, matrix-based representation of the image. We'll use the AndroidFrameConverter class to convert the Bitmap to a Frame, which we'll then convert into a Mat in the next step. So let's see what the detection workflow itself looks like!
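
A minimal sketch of this conversion, assuming bitmap is the upright image from the previous step:

import org.bytedeco.javacv.AndroidFrameConverter
import org.bytedeco.javacv.OpenCVFrameConverter

// Bitmap -> Frame -> Mat; both converter classes ship with JavaCV
val frame = AndroidFrameConverter().convert(bitmap)
val mat = OpenCVFrameConverter.ToMat().convert(frame)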

Loading the model

The very first step is initializing the classifier with a model. In our case, the model is a mathematical representation of thousands of faces compressed as an XML file. Fortunately, its size is less than 1 MB, so neither storing nor opening it consumes a lot of memory or CPU.
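
As a sketch, loading the classifier could look like the following; the file name haarcascade_frontalface_alt.xml (one of the stock OpenCV cascades) and its location in the app's assets are assumptions, not details from the original project:

import android.content.Context
import org.bytedeco.javacpp.opencv_objdetect.CascadeClassifier
import java.io.File

// Copy the cascade XML out of the APK's assets so OpenCV can read it
// from a real file path, then initialize the classifier with it.
fun loadClassifier(context: Context): CascadeClassifier {
    val cascadeFile = File(context.cacheDir, "haarcascade_frontalface_alt.xml")
    context.assets.open("haarcascade_frontalface_alt.xml").use { input ->
        cascadeFile.outputStream().use { output -> input.copyTo(output) }
    }
    return CascadeClassifier(cascadeFile.absolutePath)
}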

Detecting faces

Important: to increase the accuracy of detection, it is worth converting the input image to grayscale - you can achieve this with a proper ColorMatrixColorFilter.
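
One way to achieve this (a sketch, not necessarily the original implementation) is to draw the bitmap through a Paint whose color filter has its saturation set to zero:

import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.ColorMatrix
import android.graphics.ColorMatrixColorFilter
import android.graphics.Paint

// Draw the source bitmap through a saturation-0 color matrix,
// producing a grayscale copy for the detector.
fun toGrayscale(src: Bitmap): Bitmap {
    val gray = Bitmap.createBitmap(src.width, src.height, Bitmap.Config.ARGB_8888)
    val paint = Paint().apply {
        colorFilter = ColorMatrixColorFilter(ColorMatrix().apply { setSaturation(0f) })
    }
    Canvas(gray).drawBitmap(src, 0f, 0f, paint)
    return gray
}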

We're all set now. The next step is to call the function doing the actual work:

val rectangles = RectVector() // will be filled with one rectangle per face
classifier.detectMultiScale(
    grayScaled,   // input image
    rectangles,   // output rectangles
    1.2,          // scale factor
    10,           // minimum neighbors
    0,            // flags
    Size(40, 40), // minimum size
    Size()        // maximum size (an empty Size means no upper limit)
)
  1. inputImage - the grayscaled input image mentioned above.
  2. outputRectangles - the output rectangle vector of the faces; we'll talk about this in detail later.
  3. scaleFactor (default=1.1) - sets the scaling step size. The algorithm scales the image multiple times and runs the search flow at every resolution. Setting it to a low value increases the probability of finding a face in the picture, but also increases the run time.
  4. minNeighbors (default=3) - controls the ratio of false positives to true positives among detected faces. A false positive is an incorrectly detected face, i.e. the library marks an object which is actually not a face. Increasing this number results in fewer detections, but with better quality and accuracy.
  5. flags - used by an older implementation, currently not in use
  6. minSize - the minimum size of a detected face area
  7. maxSize - the maximum size of a detected face area

The algorithm fills the rectangles vector with one rectangle for each face detected. These rectangles represent the physical location of the faces in the input image. You can crop them out, draw a rectangle around the head, or use the information as you wish - see the sketch below.
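
For instance, drawing an outline around each detected face could look like this (a sketch; it assumes rectangles was filled by detectMultiScale and that bitmap is mutable):

import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint

// Draw a red outline around every detected face directly on the bitmap.
val paint = Paint().apply {
    color = Color.RED
    style = Paint.Style.STROKE
    strokeWidth = 4f
}
val canvas = Canvas(bitmap)
for (i in 0 until rectangles.size()) {
    val r = rectangles.get(i)
    canvas.drawRect(
        r.x().toFloat(), r.y().toFloat(),
        (r.x() + r.width()).toFloat(), (r.y() + r.height()).toFloat(),
        paint
    )
}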
As you can see, detectMultiScale is a blocking method, and unfortunately there's no asynchronous version. Kotlin coroutines are an excellent choice for resolving this issue, for example:

// Run the blocking detection off the main thread, then publish the
// result back on the UI thread.
lifecycleScope.launch(Dispatchers.Default) {
    val numberOfFaces = FaceDetection.detectFaces(mat)
    withContext(Dispatchers.Main) {
        facesValue.text = numberOfFaces.toString()
    }
}

What else?

This example just scratches the surface of what's possible with JavaCV and computer vision in mobile apps. In the future, I'm considering adding gender and age detection to the app, and exploring some of the further features of the library. There are a couple of challenges that need to be tackled for age or gender detection in a mobile environment, but stay tuned for part 2.

Conclusion

I hope I was able to demonstrate the ease of integration and usage of the OpenCV computer vision library, and give a little insight into what is possible with this technology, even in a mobile environment. The sample code and the whole app are available in my GitHub repo.