In a previous post I explained how to produce color or intensity histograms of different regions of an image. For example, Fig 1, Fig 2 and Fig 3 present the regions for three different region divisions: 1x1, 3x3 and 8x8, respectively.

Fig 1. 1x1 region divisions
Fig 2. 3x3 region divisions
Fig 3. 8x8 region divisions

The basic goal was to produce small subimages of approximately the same size and then calculate the histogram over each subimage. In this post I will use the existing function regionImHistogram to produce features for a set of images that are in the same directory. The images are numbered in the directory to simplify the process and to track them back to the data I have in other tables.

For each image I am going to produce a lot of features. The basic idea is to produce histograms of many regions. Because I am concerned about the size of the descriptor, I am going to stop using 256 bins for the histograms. Instead, I am going to use a different number of bins depending on how many regions I divide the image into: the more regions, the fewer bins. More regions also means fewer pixels per region, so maybe fewer bins will give each feature a little more generalization or statistical power. Here is the code for 6 different region divisions (the numbers of regions per side, 1, 2, 3, 5, 8 and 13, are Fibonacci numbers, hence the variable name fibs).

% Parameters:
% - dir: directory with the images, ending with a path separator
%   (the files are named 1, 2, ..., samples)
% - samples: number of images in the directory
% - filename: name of the file where the features are saved
function extract_cohs(dir, samples, filename)

% The number of regions per side into which the image
% is going to be divided (r x r regions)
fibs = [1,2,3,5,8,13];
% The bins per color channel and region
bins = [128, 64, 32, 16, 8, 4];

% A counter of the number of features added
total = 0;
% The ranges that indicate where the set of features
% of each region division is going to be saved
ranges = zeros(numel(fibs), 2);

% This loop calculates the ranges in the vector
for fib = 1:numel(fibs)
    ranges(fib,1) = total + 1;
    total += 3*fibs(fib)*fibs(fib)*bins(fib);
    ranges(fib,2) = total;
endfor

% create the matrix that is going to keep all the samples (one row per image)
histo = zeros(samples, total);

% open each image and process it
for ind = 1:samples
    im = imread(strcat(dir, int2str(ind)));
    for fib = 1:numel(fibs)
        histo(ind,ranges(fib,1):ranges(fib,2)) = regionImHistogram(im, fibs(fib), bins(fib));
    endfor
endfor

% save the features
save("-text", filename, "histo");

% save the values of the ranges
save("-text", "ranges.dat", "ranges");

endfunction

The previous code generates 6780 features per image and, depending on the number of images, it could take a while. It's quite straightforward to calculate intensity histograms with this code. Two changes are necessary:

1. Instead of

    total += 3*fibs(fib)*fibs(fib)*bins(fib);

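You have to take out the 3* (for this to work, regionImHistogram and linearHistogram also have to be adapted, because linearHistogram copies a grayscale histogram into all three color sections and would still return 3*bins values per region):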

    total += fibs(fib)*fibs(fib)*bins(fib);

2. After

    im = imread(strcat(dir, int2str(ind)));

You have to transform the image to grayscale:

    im = imread(strcat(dir, int2str(ind)));
    if size(im, 3) == 3   % only RGB images need converting
        im = rgb2gray(im);
    endif
I'll be posting some code to produce edge orientation histograms very soon.

In a previous post I explained how to produce color or intensity histograms of an image. In this post I will share some code to produce them for different regions. The idea remains the same. However, we are going to divide the image into different regions to obtain global and local histograms of the same image.

For example, Fig 1, Fig 2 and Fig 3 present the regions for three different region divisions: 1×1, 3×3 and 8×8, respectively.

Fig 1. 1×1 region divisions
Fig 2. 3×3 region divisions
Fig 3. 8×8 region divisions

The basic goal is to produce small subimages of approximately the same size and then calculate the histogram over each subimage. Here is the code.

function [data] = regionImHistogram(im, r, bins)

% 1. extract the x and y size of the image
ys = size(im,1);
xs = size(im,2);

% 2. calculate the number of pixels of each region
% (called npix so that it does not shadow the built-in size function)
npix = round(ys/r)*round(xs/r);

% 3. create a structure to keep all the histograms
coh = zeros(bins*3, r*r);

% 4. iterate over all the regions
for j = 1:r
    for i = 1:r
        % 5. extract the subimage
        % 14/12/2016: this doesn't work. With only two subscripts the crop keeps
        % just the first color channel (a third ':' subscript would keep all three)
        % clip = im(round((j-1)*ys/r+1):round(j*ys/r),round((i-1)*xs/r+1):round(i*xs/r));

        % 14/12/2016: Use this instead
        clip = imcrop(im,[round((i-1)*xs/r+1) round((j-1)*ys/r+1) round(xs/r)-1  round(ys/r)-1]);

        % 6. calculate the histogram and normalize it
        coh(:,(j-1)*r+i) = linearHistogram(clip, bins)/npix;
    end
end

% 7. put it all in just one vector
data = zeros(1,numel(coh));
data(:) = coh(:);

endfunction

Notice that instruction 3 creates a matrix of size bins*3 x r*r. Remember that a color histogram has one histogram per color, so it has three times the number of bins. We need one of these per region (r*r, e.g. 3*3 = 9 for a 3×3 division). Instruction 5 extracts the subimage. Instruction 6 uses the function explained in my previous post to build the color/intensity histogram and divides it by the number of pixels in the region. Instruction 7 is particularly important because I am generating descriptors for training my Adaboost classifier, which needs each image as a single row vector.
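The commented-out indexing line in step 5 splits each side at round(k*n/r). This tiles the image exactly, with no gaps or overlaps. Below is a C++ sketch of that partition plus the per-region histogram for a grayscale 8-bit image stored row by row. It rests on my own assumptions and is not a port of regionImHistogram: the names splitBoundaries and regionHistograms are hypothetical, it builds one histogram per region instead of three, it uses equal-width bins, and it normalizes by the actual number of pixels in each region rather than round(ys/r)*round(xs/r).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Boundaries of an r-way split of n pixels: region k covers [b[k], b[k+1]).
// Same rounding as round((k-1)*n/r+1):round(k*n/r) in the Octave comment,
// shifted to 0-based, half-open indices.
std::vector<int> splitBoundaries(int n, int r) {
    std::vector<int> b(r + 1);
    for (int k = 0; k <= r; ++k)
        b[k] = static_cast<int>(std::lround(static_cast<double>(k) * n / r));
    return b;
}

// One normalized histogram of `bins` bins per region. Regions are visited
// row block by row block, matching the (j-1)*r+i order of regionImHistogram.
std::vector<double> regionHistograms(const std::vector<std::uint8_t>& img,
                                     int width, int height, int r, int bins) {
    std::vector<int> ys = splitBoundaries(height, r);
    std::vector<int> xs = splitBoundaries(width, r);
    std::vector<double> data;
    data.reserve(static_cast<std::size_t>(r) * r * bins);
    for (int j = 0; j < r; ++j) {          // region row
        for (int i = 0; i < r; ++i) {      // region column
            std::vector<double> h(bins, 0.0);
            int count = 0;
            for (int y = ys[j]; y < ys[j + 1]; ++y)
                for (int x = xs[i]; x < xs[i + 1]; ++x) {
                    int v = img[static_cast<std::size_t>(y) * width + x];
                    h[v * bins / 256] += 1.0;  // merge the 256 levels into `bins`
                    ++count;
                }
            for (double c : h) data.push_back(count ? c / count : 0.0);
        }
    }
    return data;
}
```

For example, a side of 10 pixels split in 3 gives the boundaries 0, 3, 7 and 10, that is, regions of 3, 4 and 3 pixels.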

Then, this would be the code to produce the histograms of a 3×3 region division:

% open the image
im = imread('path/to/image');
% call the function
linear = regionImHistogram(im, 3, 256);

Note that the code to produce the intensity histograms of a 3×3 region division is almost the same:

% open the image
im = imread('path/to/image');
% transform to grayscale
im = rgb2gray(im);
% call the function
linear = regionImHistogram(im, 3, 256);

There is going to be a final post on how to use regionImHistogram to generate multiple histograms for different regions and different numbers of bins.

The bins of color or intensity histograms are common features used in many kinds of applications. In my case, I am interested in them because I want to use them as the input of an image classifier (Adaboost, which I explained in my previous post). For that reason I have a couple of extra requirements that I will explain later in this post.

There are different ways of representing the colors of an image. A very common and natural one is the combination of the three primary colors: red, green and blue. Each pixel in an image is basically a combination of values of these three colors, and the higher the value, the higher the intensity. In an 8-bit image these values are always in the range 0 to 255. It is then possible to think of a histogram of, for example, red for a particular image. We could have at most 256 bins (potentially fewer if we join several intensities into one bin). Each bin holds the total number of pixels that have a particular intensity of red. Then we repeat the process for green and blue.
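The counting described above fits in a few lines. This C++ sketch is not imHistogram. It assumes an interleaved 8-bit RGB buffer (R, G, B, R, G, B, ...) and merges intensities into bins of equal width, so bins = 256 keeps one bin per intensity and bins = 128 joins pairs of neighbouring intensities. The function name rgbHistograms is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns one histogram per channel (0 = red, 1 = green, 2 = blue).
// Intensity v falls into bin v * bins / 256, i.e. equal-width bins.
std::array<std::vector<int>, 3> rgbHistograms(const std::vector<std::uint8_t>& rgb,
                                              int bins) {
    std::array<std::vector<int>, 3> h;
    for (auto& channel : h) channel.assign(bins, 0);
    // Step through the buffer one pixel (three bytes) at a time.
    for (std::size_t p = 0; p + 2 < rgb.size(); p += 3)
        for (int c = 0; c < 3; ++c)
            ++h[c][rgb[p + c] * bins / 256];
    return h;
}
```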

I found this code, which builds three histograms, one for each color, with a simple call:

% 1. Read the image
im = imread('/path/to/image');
% 2. Calculate the histogram with 256 bins
hist = imHistogram(im,256);

You can also calculate intensity histograms by just transforming the color image into a grayscale image. The main difference is that, instead of three histograms for three colors, the method returns just one histogram of 256 values with the intensities.

% 1. Read the image
im = imread('/path/to/image');
% 2. verify that it is a color image (three channels)
if size(im, 3) == 3
    % 3. transform it
    im = rgb2gray(im);
endif
% 4. calculate the histogram with 256 bins
hist = imHistogram(im,256);

This worked perfectly in Octave (a free, open-source alternative to Matlab), but I had two small issues related to my particular problem:

1. I needed a linear representation of the histograms because I wanted to use the values as the input of a classifier (Adaboost, which I explained in my previous post). Instead of three separate histograms of 256 bins each, I needed one big vector of 768 values with the three histograms next to each other. That was not difficult to solve.

% 1. read the image
im = imread('path/to/image');
% 2. calculate the histogram with 256 bins
hist = imHistogram(im,256);
% 3. make the vector of the right size (768)
linear = zeros(1,numel(hist));
% 4. copy the values to the new vector
linear(:) = hist(:);

2. My second problem was a bit more particular. I needed to process a big collection of images, and I found out that some of them were in grayscale. This caused an inconsistency in the size of the vectors: as I said before, a grayscale image has just one value that defines the intensity instead of three values for the three colors. I solved it by copying the histogram three times whenever an image was gray (i.e. when imHistogram returned a vector of 256).

% 1. read the image
im = imread('/path/to/image');
% 2. check if it is gray (an RGB image has three channels)
if size(im, 3) != 3
    % 3. calculate the histogram with 256 bins
    hist = imHistogram(im, 256);
    % 4. create a new vector three times the size (768)
    linear = zeros(1, 3*numel(hist));
    % 5. copy the same histogram into each of the three sections
    linear(:) = [hist(:); hist(:); hist(:)];
endif

My final method looks like this:

%linearHistogram.m
function linear = linearHistogram(im, bins)
% check if it is a color image (three channels)
if size(im, 3) == 3
    % calculate the histogram with the given number of bins per color
    hist = imHistogram(im, bins);
    % create a new vector of the right size
    linear = zeros(1,numel(hist));
    % copy the values to the new vector
    linear(:) = hist(:);
else
    % calculate the histogram
    hist = imHistogram(im, bins);
    % create a new vector of three times the size
    linear = zeros(1,numel(hist)*3);
    % set the same histogram to the three sections (works for any number of bins)
    linear(:) = [hist(:); hist(:); hist(:)];
endif
endfunction

You can see that it receives the image as a parameter, so you need two instructions to call it:

% 1. Read the image
im = imread('/path/to/image');
% 2. Calculate the linear histogram with 256 bins
hist = linearHistogram(im,256);
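For anyone who wants the same fixed-length descriptor outside Octave, here is a C++ sketch of the linearHistogram idea. The output always has 3*bins values laid out as red, green, blue, and a grayscale histogram is copied into all three sections. The pixel layout (interleaved channels, 1 or 3 per pixel) and the equal-width binning are my assumptions. It does not call imHistogram, and the function name is reused only for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Always returns 3*bins counts laid out as [red | green | blue].
// For a grayscale image (channels == 1) the single histogram is copied
// into all three sections, so every image yields a vector of the same length.
std::vector<int> linearHistogram(const std::vector<std::uint8_t>& pixels,
                                 int channels, int bins) {
    assert(channels == 1 || channels == 3);
    std::vector<int> linear(3 * static_cast<std::size_t>(bins), 0);
    const std::size_t step = static_cast<std::size_t>(channels);
    for (std::size_t p = 0; p + step <= pixels.size(); p += step)
        for (int c = 0; c < channels; ++c)
            ++linear[c * bins + pixels[p + c] * bins / 256];
    if (channels == 1)  // replicate the gray histogram into the G and B sections
        for (int k = 0; k < bins; ++k)
            linear[bins + k] = linear[2 * bins + k] = linear[k];
    return linear;
}
```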

In my last course on computer vision and learning, I worked on a project to distinguish between two styles of painting. I decided to use the Adaboost algorithm [1]. I am going to describe the steps and the code needed to make the algorithm run.

Step 0. The binary classification

This is not really a step, but be aware that this algorithm only classifies between two classes. For example, ones from zeros, faces from non-faces or, in my case, baroque from renaissance paintings.

Step 1. Prepare the files.

There are several ways of feeding the samples to the algorithm. I found that the easiest was to use simple CSV files. Also, you DO NOT have to worry about dividing the samples into training and testing sets. Just put all of them in the same file, and OpenCV will divide the set, picking the training and testing samples automatically. It is still a good idea to put all the samples of the first class at the beginning and those of the second class at the end.

The format is very simple. The first column is the category (you can specify a different column if your file does not follow this format). The rest of the columns are the features of your problem. For example, I could have used three features, each representing the average red, blue and green value per pixel in the image. My CSV file would then look like this. Note that in the first column I am using a character. I recommend doing that so that OpenCV recognizes the column as a category (again, you could also specify explicitly that this column is a category and not a number).

B,124.34,45.4,12.4
B,64.14,45.23,3.23
B,42.32,125.41,23.8
R,224.4,35.34,163.87
R,14.55,12.423,89.67
...
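To make the expected layout concrete, here is a hypothetical parser for one row of such a file in plain C++. It is not how CvMLData::read_csv works internally. It only shows that the first field is the category and the remaining fields are numeric features. Quoting and malformed lines are not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One row of the samples file: a category followed by numeric features.
struct Sample {
    std::string label;
    std::vector<double> features;
};

// Splits a line such as "B,124.34,45.4,12.4" on commas.
Sample parseRow(const std::string& line) {
    Sample s;
    std::stringstream ss(line);
    std::string field;
    std::getline(ss, s.label, ',');           // first column: the category
    while (std::getline(ss, field, ','))      // remaining columns: features
        s.features.push_back(std::stod(field));
    return s;
}
```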

NOTE: For some unclear reason, the OpenCV implementation does not work with fewer than 11 samples, so this file should have at least 11 rows. Add some more to be safe, and because you will need to set aside a testing set as well.

Step 2. Opening the file

Let’s suppose that the file is called “samples.csv”. This would be the code:

//1. Declare a structure to keep the data
CvMLData cvml;
//2. Read the file
cvml.read_csv("samples.csv");
//3. Indicate which column is the response
cvml.set_response_idx(0);

Step 3. Splitting the samples

Let’s suppose that our file has 100 rows. This code would select 40 of them for training.

//1. Select 40 for the training
CvTrainTestSplit cvtts(40, true);
//2. Assign the division to the data
cvml.set_train_test_split(&cvtts);
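Conceptually, asking for 40 training samples with mixing enabled amounts to shuffling the row indices, taking the first 40 for training and keeping the remaining 60 for testing. The C++ sketch below shows that idea with std::shuffle. trainTestSplit is a hypothetical helper and does not reproduce OpenCV's own random choice. Shuffling matters here because the file lists all samples of one class first, so taking the first 40 rows without shuffling would give a training set containing a single class.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

struct Split {
    std::vector<int> train, test;  // 0-based row indices
};

// Shuffles rows 0..n-1 and takes the first trainCount as training rows.
Split trainTestSplit(int n, int trainCount, unsigned seed) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    trainCount = std::min(trainCount, n);  // never ask for more rows than exist
    Split s;
    s.train.assign(idx.begin(), idx.begin() + trainCount);
    s.test.assign(idx.begin() + trainCount, idx.end());
    return s;
}
```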

Step 4. The training process

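Let’s suppose that I have 1000 features (columns in the CSV after the response) and that I want the classifier to use at most 100 of them. The second parameter of CvBoostParams in the next code is the number of weak classifiers. Because the fourth parameter (the maximum tree depth) is 1, each weak classifier is a decision stump that looks at a single feature, so 100 weak classifiers use at most 100 features: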

//1. Declare the classifier
CvBoost boost;
//2. Train it with 100 weak classifiers (decision stumps, since max_depth is 1)
boost.train(&cvml, CvBoostParams(CvBoost::REAL, 100, 0, 1, false, 0), false);

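The arguments of CvBoostParams are, in order: the boosting type (CvBoost::REAL is Real AdaBoost), the number of weak classifiers, the weight trim rate (0 disables trimming), the maximum depth of each tree, whether to use surrogate splits, and the class priors (0 means none). The full description of each argument can be found in the OpenCV 2.x documentation of CvBoostParams.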
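CvBoost hides the boosting loop. To show what the weak classifiers and their weights are, here is a minimal sketch of discrete AdaBoost with decision stumps in plain C++. It is not OpenCV's code, and it differs from the CvBoost::REAL variant used above (Real AdaBoost uses confidence-rated weak predictions). The names Stump, trainAdaBoost and classify are hypothetical. Each round picks the single feature, threshold and direction with the lowest weighted error, which is why, with stumps, the number of weak classifiers bounds how many distinct features end up being used. The exhaustive threshold search is only meant for small examples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

// A decision stump: looks at one feature and predicts `polarity`
// when that feature is above `threshold`, and -polarity otherwise.
struct Stump {
    std::size_t feature = 0;
    double threshold = 0.0;
    int polarity = 1;
    double alpha = 0.0;  // weight of this stump in the final vote

    int predict(const std::vector<double>& x) const {
        return x[feature] > threshold ? polarity : -polarity;
    }
};

// Discrete AdaBoost. X holds one feature vector per sample, y the labels
// in {-1, +1}; `rounds` is the number of weak classifiers to train.
std::vector<Stump> trainAdaBoost(const std::vector<std::vector<double>>& X,
                                 const std::vector<int>& y, int rounds) {
    std::vector<Stump> model;
    const std::size_t n = X.size();
    if (n == 0) return model;
    std::vector<double> w(n, 1.0 / n);  // start with uniform sample weights

    for (int t = 0; t < rounds; ++t) {
        // Try every feature, every sample value as a threshold and both
        // polarities; keep the stump with the lowest weighted error.
        Stump best;
        double bestErr = 2.0;
        for (std::size_t f = 0; f < X[0].size(); ++f)
            for (std::size_t s = 0; s < n; ++s)
                for (int pol : {1, -1}) {
                    Stump c;
                    c.feature = f;
                    c.threshold = X[s][f];
                    c.polarity = pol;
                    double err = 0.0;
                    for (std::size_t i = 0; i < n; ++i)
                        if (c.predict(X[i]) != y[i]) err += w[i];
                    if (err < bestErr) { bestErr = err; best = c; }
                }

        // Stump weight: large when the weighted error is small.
        const double eps = 1e-10;  // avoids division by zero and log(0)
        best.alpha = 0.5 * std::log((1.0 - bestErr + eps) / (bestErr + eps));

        // Raise the weight of misclassified samples, lower the others,
        // then renormalize so the weights sum to 1.
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            w[i] *= std::exp(-best.alpha * y[i] * best.predict(X[i]));
            sum += w[i];
        }
        for (double& wi : w) wi /= sum;
        model.push_back(best);
    }
    return model;
}

// Final classifier: sign of the alpha-weighted vote of all stumps.
int classify(const std::vector<Stump>& model, const std::vector<double>& x) {
    double score = 0.0;
    for (const Stump& s : model) score += s.alpha * s.predict(x);
    return score >= 0.0 ? 1 : -1;
}
```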

Step 5. Calculating the testing and training error

The error is the percentage of misclassified samples. There are two errors to look at: the training error and the testing error.

// 1. Declare a couple of vectors to save the predictions of each sample
std::vector<float> train_responses, test_responses;
// 2. Calculate the training error
float fl1 = boost.calc_error(&cvml,CV_TRAIN_ERROR,&train_responses);
// 3. Calculate the test error
float fl2 = boost.calc_error(&cvml,CV_TEST_ERROR,&test_responses);

Note that the responses for each sample are saved in the train_responses and test_responses vectors. This is very useful for calculating the confusion matrix (false positives, false negatives, true positives and true negatives) and ROC curves. I'll be posting how to build them with R.
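Once you have the predicted and the true label of each test sample, the confusion matrix is a simple count. The C++ sketch below assumes both are available as float vectors and that one label value is chosen as the positive class. It does not cover how OpenCV encodes the 'B'/'R' categories in the responses vector, so check that before comparing. The names Confusion and confusionMatrix are hypothetical. The error percentage it reports, (FP + FN) / total, matches the definition of error used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Confusion matrix for a two-class problem.
struct Confusion {
    int tp = 0, fp = 0, tn = 0, fn = 0;

    // Percentage of misclassified samples.
    double errorPercent() const {
        int total = tp + fp + tn + fn;
        return total ? 100.0 * (fp + fn) / total : 0.0;
    }
};

// Counts each (predicted, actual) pair, treating `positive` as the positive class.
Confusion confusionMatrix(const std::vector<float>& predicted,
                          const std::vector<float>& actual, float positive) {
    Confusion m;
    for (std::size_t i = 0; i < predicted.size() && i < actual.size(); ++i) {
        bool predPos = predicted[i] == positive;
        bool truePos = actual[i] == positive;
        if (predPos && truePos) ++m.tp;
        else if (predPos) ++m.fp;
        else if (truePos) ++m.fn;
        else ++m.tn;
    }
    return m;
}
```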

Step 6. Save your classifier!!

You probably won’t mind at the beginning, when training takes a few seconds, but you definitely don’t want to lose a classifier after waiting a couple of hours or days for the results:

// Save the trained classifier
boost.save("./trained_boost.xml", "boost");

Step 7. Compiling the whole code

The whole code is pasted at the end. It uses the OpenCV 2.x C++ API (CvMLData and CvBoost were removed in OpenCV 3). To compile it, use:

g++ -ggdb `pkg-config --cflags opencv` -o main main.cpp `pkg-config --libs opencv`

Here is the file with my code.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.9855

// main.cpp
#include <cstdio>
#include <cstdlib>
#include <vector>
#include "opencv/cv.h"
#include "opencv/ml.h"

using namespace std;
using namespace cv;

int main(int argc, char** argv) {

    /* STEP 2. Opening the file */
    //1. Declare a structure to keep the data
    CvMLData cvml;
    //2. Read the file
    cvml.read_csv("samples.csv");
    //3. Indicate which column is the response
    cvml.set_response_idx(0);

    /* STEP 3. Splitting the samples */
    //1. Select 40 for the training
    CvTrainTestSplit cvtts(40, true);
    //2. Assign the division to the data
    cvml.set_train_test_split(&cvtts);
    printf("Training ... ");

    /* STEP 4. The training */
    //1. Declare the classifier
    CvBoost boost;
    //2. Train it with 100 weak classifiers (decision stumps, since max_depth is 1)
    boost.train(&cvml, CvBoostParams(CvBoost::REAL, 100, 0, 1, false, 0), false);

    /* STEP 5. Calculating the testing and training error */
    // 1. Declare a couple of vectors to save the predictions of each sample
    std::vector<float> train_responses, test_responses;
    // 2. Calculate the training error
    float fl1 = boost.calc_error(&cvml, CV_TRAIN_ERROR, &train_responses);
    // 3. Calculate the test error
    float fl2 = boost.calc_error(&cvml, CV_TEST_ERROR, &test_responses);
    printf("Error train %f\n", fl1);
    printf("Error test %f\n", fl2);

    /* STEP 6. Save your classifier */
    // Save the trained classifier
    boost.save("./trained_boost.xml", "boost");

    return EXIT_SUCCESS;
}