Colour-Science object detection technology 

 

 

We have developed several applications that detect objects or segment images into their important parts. The object detection implementation inside i2e can also be trained to detect objects other than faces; for example, we have also trained it to recognise eyes. Colour-Science has a 40’000 image database to train object recognition. In this database the exact face, eye and mouth coordinates are known.

 

For image enhancement and image classification it is very important to detect the important objects or parts of an image. Good image enhancement is only possible when these are known.

 

See also the following documents:

 

colour-science face detection >>>

 

colour-science eye detection >>>

 

I2e Object and Face detection

 

The i2e object detection module consists mainly of two parts.

 

1) The training tools, which are needed to train the detection engine for specific objects (e.g. faces, eyes, logos and others).

 

2) The object recognition engine which will attempt to recognise the trained objects in bitmap images.

 

The training tools are mainly written in Microsoft C or Visual Basic. The imaging and object recognition functions they use come either from i2e routines or from optimized Intel routines. The Intel routines come from the Intel Open Source Computer Vision Library and use the speed-optimized Intel Performance Primitives when executed on Pentium 4 or later Intel processors.


Training tools

The training tools consist of a set of programs used to mark objects in an image database and then use those manually marked objects to generate a training file with the exact coordinates of the objects. In this example we show how the object recognition engine is trained with faces.

 

To train the object recognition we need about 7000 different faces. We have a digital image database of about 50’000 high-resolution images from different camera brands (Sony, Canon, Nikon, HP ..). The first step is to go through this database manually and mark all faces.

For this we have a software tool called “Object coordinate retrieval software”.

 

 

With this software we select positive sample images (the ones with faces in them) and manually mark the eyes and the centre of the mouth. We can also classify negative sample images, which are the ones that do not contain any face.

 

At the end we have a database which describes, for each of our images, whether it contains faces and the exact position of those faces.

 

In a second step we extract a selection of face images. For this we have a second software tool, called “Create sample training file software”. This tool shows the manually selected eye and mouth coordinates (red rectangle) and also creates a bigger box, shown in green, which covers the whole face. The green box floats around the red box to best cover the whole face region.

 

 

If you now want to train the object recognition engine, you have to export a selection of face images from the image database. With the software you can export statistics about the ratio eye_distance/eye_to_mouth_distance, the eye tilt angle, the face contrast and the face size.

By filtering on those values you can, for example, export only big frontal faces, or only tilted faces.

An interesting value is the ratio eye_distance/eye_to_mouth_distance. By sorting our face image database we have seen that this ratio can be used to sort faces by age. The mean ratio is 0.9 for the whole database. Children have a larger head with ratios up to 1.3, and adults normally have a slimmer, longer head with ratios down to 0.7. By extracting this ratio from images you could sort the images by the age of the people shown.
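
The ratio and the eye tilt angle can be sketched from the three manually marked points. This is a minimal illustration, not the i2e implementation: the function name face_geometry is hypothetical, and it assumes that the eye-to-mouth distance is measured from the midpoint between the eyes to the centre of the mouth, which the text does not state.

```python
import math

def face_geometry(left_eye, right_eye, mouth):
    """Return (ratio, tilt_degrees) from three marked (x, y) points.

    ratio is eye_distance / eye_to_mouth_distance.
    Assumption: eye_to_mouth_distance runs from the eye midpoint to the mouth.
    """
    # Distance between the two eye centres.
    eye_distance = math.dist(left_eye, right_eye)
    # Midpoint between the eyes, start of the eye-to-mouth line.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2.0,
               (left_eye[1] + right_eye[1]) / 2.0)
    eye_to_mouth = math.dist(eye_mid, mouth)
    if eye_to_mouth == 0:
        raise ValueError("mouth coincides with the eye midpoint")
    # Tilt of the eye line against the horizontal, in degrees.
    tilt = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))
    return eye_distance / eye_to_mouth, tilt

# Example: sort hypothetical annotations by their ratio.
faces = [((100, 100), (160, 100), (130, 150)),
         ((100, 100), (160, 100), (130, 180))]
faces_by_ratio = sorted(faces, key=lambda f: face_geometry(*f)[0])
```

Sorting or filtering the exported statistics on these two values is one way to select, for example, only frontal faces with a small tilt.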

 

You also have to extract negative sample images, which contain no faces at all.

 

For a good training set about 6000 positive face samples and 4000 negative samples are needed.

 

The positive samples can be exported in one database file. This file will then contain black and white subimages of the face regions of about 24x24 pixels.
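
How a face region might be reduced to such a small sample can be sketched as follows. This is an illustration only: the function name extract_sample, the nearest-neighbour sampling and the Rec. 601 grey conversion are assumptions, and the actual export tool may resample and convert differently.

```python
def extract_sample(image, box, size=24):
    """Crop box=(x, y, w, h) from an RGB image and return a size x size
    grey sample. The image is a list of rows of (r, g, b) tuples.
    Nearest-neighbour sampling is an assumption made for brevity.
    """
    x0, y0, w, h = box
    sample = []
    for j in range(size):
        # Map the sample row back to a source row inside the box.
        sy = y0 + (j * h) // size
        row = []
        for i in range(size):
            sx = x0 + (i * w) // size
            r, g, b = image[sy][sx]
            # Rec. 601 luma weights for the RGB-to-grey conversion.
            row.append(int(round(0.299 * r + 0.587 * g + 0.114 * b)))
        sample.append(row)
    return sample
```

A real export would probably average the source pixels rather than pick single ones, to avoid aliasing when large faces are reduced.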

 

 sample of the face region of the above image

 

 

Once you have built your positive and negative image data sets you can begin to train the classifier.

 

For this a program called “Train object recognition engine” is used.

 

Test software

 

To test the i2e image enhancement functions and object recognition functions we have a software called “i2e Static Library Test”.

 

With this software you can load a selection of images and perform the enhancement and object recognition functions.

 

 

 

The software also allows exporting all the intermediate maps which are used during the processing and validation cycles.

 


A sample result:

 

Original image:

 

enhanced image with detected face rectangles


Skin validation maps of the four face rectangles

      

 

edge map

skin color map

 

shadows map


vegetation map

 

 

 

Object recognition stages:

 

I2e Image enhancement

For optimal detection we need good image quality. Therefore the images are pre-corrected: contrast is stretched, underexposure and overexposure are corrected, and color casts are removed.
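
Of these corrections, the contrast stretch is the simplest to illustrate. The sketch below is a generic percentile-based linear stretch on 8-bit grey values, not the i2e algorithm; the function name and the percentile parameters are hypothetical.

```python
def stretch_contrast(pixels, low_pct=1.0, high_pct=99.0):
    """Linearly stretch 8-bit grey values so that the chosen low and high
    percentiles map to 0 and 255. Values outside that range are clipped.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100.0)]
    hi = ordered[int((n - 1) * high_pct / 100.0)]
    if hi == lo:
        return list(pixels)  # flat image, nothing to stretch
    scale = 255.0 / (hi - lo)
    return [max(0, min(255, int(round((p - lo) * scale)))) for p in pixels]
```

Using percentiles instead of the absolute minimum and maximum keeps a few isolated dark or bright pixels from limiting the stretch.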

Pre-rotation of the image

Because many images are rotated by ±90°, the image has to be presented to the object recognition algorithm in upright position. For this we need to pre-rotate the image. An interesting side effect of this function is that we can detect rotated images, so it would be possible to automatically rotate them into upright position for the customer.
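
The text does not say how the rotation is detected. One possible approach, shown here purely as an assumption, is to score each candidate orientation with the detector and keep the best one. The callback score_fn is hypothetical and stands in for whatever measure the detector provides.

```python
def rotate_90(image, turns=1):
    """Rotate a 2-D image (list of rows) clockwise by turns * 90 degrees."""
    for _ in range(turns % 4):
        # Reverse the row order, then transpose: one clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return image

def best_orientation(image, score_fn):
    """Return the clockwise angle (0, 90 or -90) whose rotated image
    receives the highest score from score_fn (a hypothetical callback).
    """
    best_angle, best_score = 0, float("-inf")
    for turns, angle in ((0, 0), (1, 90), (3, -90)):
        score = score_fn(rotate_90(image, turns))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The returned angle is the rotation that was applied to obtain the best-scoring image, so applying it to the original would present it upright under this assumption.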

Feature scaling and scanning of the image

We need to be able to detect faces of different sizes. To make this possible the features themselves are scaled.
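
Scaling the features rather than the image can be sketched like this. The base size of 24 follows the roughly 24x24 pixel training samples; the growth factor of 1.25 and both function names are assumptions, not values taken from i2e.

```python
def window_sizes(image_w, image_h, base=24, factor=1.25):
    """List the scan window sizes from the base size up to the largest
    square that still fits into the image."""
    sizes = []
    size = float(base)
    while int(round(size)) <= min(image_w, image_h):
        sizes.append(int(round(size)))
        size *= factor
    return sizes

def scale_feature(rect, window_size, base=24):
    """Scale a feature rectangle (x, y, w, h), defined in the base window,
    to a larger scan window."""
    s = window_size / base
    x, y, w, h = rect
    return (int(round(x * s)), int(round(y * s)),
            int(round(w * s)), int(round(h * s)))
```

Scaling the feature rectangles once per window size avoids resampling the whole image for every scale.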

Pre-processing of the scan window

In order to attain a high recognition rate, the scanned image regions must themselves be enhanced. By doing this it is, for example, possible to locally correct the density of faces in a dark background.
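
The kind of local correction is not specified. A common choice for detectors of this type is to normalise each scan window to zero mean and unit variance, which is what the sketch below does. It is an assumed stand-in, not the i2e enhancement.

```python
import math

def normalize_window(window):
    """Normalise a grey scan window (list of rows) to zero mean and unit
    variance, so that dark and bright faces yield similar values."""
    values = [p for row in window for p in row]
    mean = sum(values) / len(values)
    variance = sum((p - mean) ** 2 for p in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # Flat window: no structure to normalise.
        return [[0.0] * len(row) for row in window]
    return [[(p - mean) / std for p in row] for row in window]
```

With this normalisation, a face in shadow and the same face at ten times the brightness produce the same window values.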

Object detection

The scan window is slid over the image at different scales and the object detection probabilities are summed in a probability map.
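
The scanning loop can be sketched as follows. The classify callback is hypothetical and stands in for the trained detector; the step size and the choice to accumulate each score at the window centre are also assumptions.

```python
def probability_map(width, height, sizes, classify, step=4):
    """Slide square windows of the given sizes over a width x height image
    and sum the detector scores into a per-pixel map.

    classify(x, y, size) is a hypothetical callback that returns a score
    for the window with top-left corner (x, y).
    """
    prob = [[0.0] * width for _ in range(height)]
    for size in sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                score = classify(x, y, size)
                if score > 0.0:
                    # Accumulate the score at the window centre.
                    prob[y + size // 2][x + size // 2] += score
    return prob
```

Because neighbouring windows and neighbouring scales often respond to the same face, the map typically shows clusters of peaks rather than single points.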

Combination of overlapping detections

The probability map will normally show several nearby probability peaks which would result in several overlapping windows. To avoid this, overlapping windows are approximated with one big rectangle.
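
One simple reading of this step is to replace every group of overlapping windows by their common bounding box. The sketch below does that; the function names are hypothetical and i2e may combine the windows differently, for example by averaging.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def merge_overlapping(rects):
    """Merge overlapping rectangles into their bounding boxes."""
    merged = []
    for rect in rects:
        rect = tuple(rect)
        changed = True
        while changed:
            changed = False
            for other in merged:
                if overlaps(rect, other):
                    # Absorb the other rectangle and re-check the rest,
                    # since the grown box may now touch further ones.
                    merged.remove(other)
                    x0 = min(rect[0], other[0])
                    y0 = min(rect[1], other[1])
                    x1 = max(rect[0] + rect[2], other[0] + other[2])
                    y1 = max(rect[1] + rect[3], other[1] + other[3])
                    rect = (x0, y0, x1 - x0, y1 - y0)
                    changed = True
                    break
        merged.append(rect)
    return merged
```

For example, two windows at (0, 0, 10, 10) and (5, 5, 10, 10) collapse into a single rectangle, while a distant window is kept as it is.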

Validation of detections

As the last step it is important to remove as many “false positive” detections as possible. Because the detection algorithm only looks at density changes, one validation is, for example, to check that a minimum number of skin-colored pixels are present in the rectangle.
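
The skin check can be sketched as a threshold on the fraction of skin-colored pixels in the rectangle. Both the RGB rule in is_skin and the 30% threshold are assumptions chosen for illustration; the text does not describe the skin model or the minimum that i2e uses.

```python
def is_skin(r, g, b):
    """Simple RGB skin heuristic (an assumption, not the i2e skin model)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def validate_detection(image, rect, min_skin_fraction=0.3):
    """Accept rect=(x, y, w, h) only if enough of its pixels look like skin.
    The image is a list of rows of (r, g, b) tuples."""
    x0, y0, w, h = rect
    total = w * h
    if total <= 0:
        return False
    skin = sum(1
               for y in range(y0, y0 + h)
               for x in range(x0, x0 + w)
               if is_skin(*image[y][x]))
    return skin / total >= min_skin_fraction
```

A detection on a grey wall or on vegetation would fail this test even if its density pattern resembles a face.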