Character Segmentation from the Image [Continued]

Yeah, this time I will introduce about character segmentation in details[Code] from the previous entry.
First of all,you should load this tester image for workshop.

image.jpg (11.77 kb)


OK, this workshop works on Python language with Python Image Library. http://www.pythonware.com/products/pil/index.htm

There are 5 parts in this workshop.
1. Preprocessing -> prepare image before do the main process.
2. Find histograms in both directions -> to be used in step 3.
3. Find pair of concave up and down points -> to find segmentation points.
4. Segmentation.
5. Show Results.

Recall from the previous entry.
In Preprocessing, we need to load the image in grayscale image. Next, clean the image. Then, convert to a binary image.


# load and convert image to a grayscale image.
im = Image.open('image.jpg').convert("L") 
(width, height) = im.size
# use median filter size 11x11 to clean image
mim = im.filter(ImageFilter.MedianFilter(11))
# extract foreground from background by using threshold value.
# using threshold = 8, then use median filter to clean image again.
cim = mim.point(lambda i: i*8,"F").convert("1").filter(ImageFilter.MedianFilter(11))

We will get the result like this. [Use cim.show() to show an image]

Next, we have to find histograms in horizontal and vertical lines.

#main process part
# load image to 2d
pix = cim.load()

#find histograms in both directions
hhist = []
s = 0
for h in range(height):
    for  w in range(width):
        if(pix[w,h]==0):
            s+=1
    hhist.append(s)
    s = 0
    
vhist = []
s = 0
for w in range(width):
    for  h in range(height):
        if(pix[w,h]==0):
            s+=1
    vhist.append(s)
    s = 0

Then, we have to find cutting point from the image by seeing from histogram value that goes up from 0 to upper and goes down from more than 0 to 0.

#find pair of concave up and down points
hup = []        #keep concave up index [horizontal]
hdown = []      #keep concave down index [horizontal]
s = len(hhist)
for x in range(1,s):
    if(hhist[x-1]>0 and hhist[x]==0):
        hdown.append(x-1)
    if(hhist[x-1]==0 and hhist[x]>0):
        hup.append(x)

vup = []        #keep concave up index [vertical]
vdown = []      #keep concave down index [vertical]
s = len(vhist)
for x in range(1,s):
    if(vhist[x-1]>0 and vhist[x]==0):
        vdown.append(x-1)
    if(vhist[x-1]==0 and vhist[x]>0):
        vup.append(x)

Finally, we will segment an image and then show the result of all characters. We have to create a box for crop an image (left,upper,right,lower) that upper and lower will use values from hup and hdown respectively.

#segmentation
# we know that we have only 1 line so, hup and hdown are not change.
# if we want to show all segmented images, it will change in vup and vdown indexes.

for i in range(len(vup)):
    box = (vup[i],hup[0],vdown[i],hdown[0])
    seg = cim.crop(box)

#show result
    seg.show()

                            

Now, we will know about the concept of image segmentation that is not hard for implement. It may expand your thought to develop a more complicated segmentation ^^.

Categories:   Image Processing
Actions:   E-mail | del.icio.us | Permalink | Comments (0) | RSS


Character Segmentation from the Image [1 line]

Today, I will talk about how to segment character from the image by using MATLAB. At this time, I will show about the concept and the result first, next time I will show the code to implement this. In this entry, I will create automate segmentation step by step. 1. Read the grayscale image.

                               

 

2. Clean the image by using median filter: http://en.wikipedia.org/wiki/Median_filter There are a lot of method to reduce noise from the image but in this picture I found that median filter is the best way to reduce the noise from the image.

3. Now, we can extract foreground image from the background.

 

4. We will see that there are noise stay in the image so, we have to clean the image again by using median filter.

 

5. Find the projection in the vertical and horizontal to see the space between each character which we have to use this to separate each character.

 

Horizontal Histogram

 

Vertical Histogram

6. We have to match the boundary of each character by concave down and concave up of the histogram. [In horizontal, we have only 1 pair because we have only 1 line of characters.]

7. Finally, we cut the selected zone from the image to display a character.

 

8. Do these for all characters.

 

So, we will know the basic steps to segment the characters and the next time I will show and explain about the code to implement it. Hope this will guide you an idea.

 

 

Categories:   Image Processing
Actions:   E-mail | del.icio.us | Permalink | Comments (4) | RSS


Content-Aware Image Resizing using GDI+

After I've disappeared from this blog for a while, actually I went travelling at Phuket Island with my friends (may be next time I'll post some pictures, you'll love that place as I do). Today, my intention here is to show you some power of GDI+. Though GDI+ doesn't do well in image processing comparing to others like C++, DirectDraw but at least just make it in GDI+ with C#. You'll see that how can C# manipulate image data.

Content-Aware Image Resizing ,or in another word Seam Carving, concept is proposed by Avidan and Shamir presented at Siggraph 2007. You can see the original paper here : http://www.faculty.idc.ac.il/arik/SCWeb/imret/index.html.

I'm not an expert in image processing and I may make a mistake in this algorithm example here. Therefore, any mistakes you found here don't come from my intention and I appreciate to change and collect those : - )

Ok, let take a look some theories and we'll move ahead to implement the real code!

Firstly you have to understand fundamental GDI+, the way it processes and draws image. If we want to play with image and apply algorithm for it, we must know how to extract pixels and channels from the image. Before that there are some words are worth to look at : 

  • Scan0 - address of the first locked byte of array in memory. For a locked image, it's the first byte address of that image.
  • PixelFormat - pixel has different format that you can use. Format24bppRgb is popular one; 8bit for R(Red Channel) 8bit for G and 8 bit for B. There are many more format you can find here : http://msdn.microsoft.com/en-us/library/system.drawing.imaging.pixelformat.aspx. For this post, we'll use Format24bppRgb .
  • Stride - number of bytes width in one row of pixel data in a locked image array. Stride used multiple of 4 padding boundary that means if your image has 22 pixels in one row, that is,22 pixels * 3 bytes of RGB = 66 bytes for one row. But 66 is not a multiple of 4, we need 68 (68 is a multiple of 4 since 17*4 = 68). So, Stride is 68 not 66, leaving 2 bytes as unused space for efficiency reason.
  • Bitmap.LockBits(..) - locks a Bitmap into system memory. From this method, we'll get BitmapData that enables us to use futher. After we finish manipulating BitmapData, we have to unlock it by using Bitmap.UnlockBits(BitmapData) to release the resource in memory.
  • Therefore, we can do : Stride - (image.Width*3) = unused bytes or padding offset.
  • You can read more details about words here : http://www.bobpowell.net/lockingbits.htm

As noted from the image above, we can get padding = stride - no. of pixel in one row*3 RGB bytes and we have Scan0 as an address for our pointer *p. For playing with pinter in C#, you must set the property of a project to be able to compile unsafe block code.

We already know the detail how to prepare the image for manipulating. We are now able to adapt some techniques for our image. Before that, let have a look how seam carving algorithm can be used.

Calculate the energy : The idea is we have to perform edge detection. Edge detection technique will try to find the sharp change in image brightness to indicate the boundary of objects in the image. One picture is worth looking than 1000 words. So, let have a look.

For computing energy image we use the following formula : . We are using RGB, so we can translate the formula to be matched with our RGB format. Note that there are also other methodologies to find out energy image e.g. finding gradient magnitudes from gX and gY using Convolution Matrix.

The thing that you must know is Window will store BGR bytes in one pixel not RGB bytes. Remember that BGR not RGB. Thus, pointer at p[0] is B, p[1] is G, p[2] is R respectively. The code is look like here : 

public static Bitmap GetEneryMatrixs(Bitmap image)
{
    BitmapData bmData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    //address of the first line. IntPtr is a platform specific type that represents pointer in C#.
    IntPtr ptr = bmData.Scan0;
    //one line length of an image in rgb pixel including padding.
    int stride = bmData.Stride;

    //if you notice, here we use 2 image. One image from method parameter and one to be an output image.
    //You can guess that at first the output image (newBm) is empty. 
    Bitmap newBm = new Bitmap(image.Width, image.Height);
    BitmapData newBmData = newBm.LockBits(new Rectangle(0, 0, newBm.Width, newBm.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    IntPtr newPtr = newBmData.Scan0;
    int newStride = newBmData.Stride;

    unsafe
    {
        //convert pointer ptr into byte* explicitly.
        byte* p = (byte*)ptr;
        int rgbWidth = image.Width * 3;
        int paddingOffset = stride - rgbWidth;

        byte* newP = (byte*)newPtr;
        int newRgbWidth = newBm.Width * 3;
        int newPaddingOffset = newStride - newRgbWidth;

        int rLrR = 0, gLgR = 0, bLbR = 0, rTrB = 0, gTgB = 0, bTbB = 0;
        byte result;
        double dIdx = 0, dIdy = 0;
        for (int j = 0; j < image.Height; j++)
        {
            for (int i = 0; i < image.Width; i++)
            {
                //at the topmost row we don't have p[-3] => B ,p[-2] => G, p[-1] => R. 
                //So we use p[0],p[1],p[2] to replace that respectively.
                int offSetT = 0, offSetB = 0;
                if (j == 0) offSetT = stride;
                else if (j == image.Height - 1) offSetB = -stride;

                bLbR = p[-stride + offSetT] - p[stride + offSetB];
                gTgB = p[-stride + 1 + offSetT] - p[stride + 1 + offSetB];
                rLrR = p[-stride + 2 + offSetT] - p[stride + 2 + offSetB];

                //handle leftmost and rightmost column.
                int offSetL = 0, offSetR = 0;
                if (i == 0) offSetL = 3;
                else if (i == image.Width - 1) offSetR = -3;

                //rLrR is Bleft - Bright, the same for the others.
                bLbR = p[-3 + offSetL] - p[3 + offSetR];
                gLgR = p[-2 + offSetL] - p[4 + offSetR];
                rLrR = p[-1 + offSetL] - p[5 + offSetR];

                dIdx = PythagoreanAdd(bLbR, gLgR, rLrR);
                dIdy = PythagoreanAdd(bTbB, gTgB, rTrB);

                //get the energy result from computing R,G,B channel vectors.
                result = (byte)Math.Round(dIdx + dIdy);
                //save the energy value to our new locked image (newBm) via its pointer (newP).
                newP[0] = result;
                newP[1] = result;
                newP[2] = result;

                //advance the pointers by 3 (R,G,B).
                p += 3;
                newP += 3;
            }

            //add paddingOffset to skip to the next row.
            p += paddingOffset;
            newP += newPaddingOffset;
        }
    }
    //release the resources.
    image.UnlockBits(bmData);
    newBm.UnlockBits(newBmData);

    return newBm;
}
public static double PythagoreanAdd(params int[] varList)
{
    long result = 0;
    for (int i = 0; i < varList.Length; i++)
    {
        //power two for each
        result += varList[i] * varList[i];
    }
    return Math.Sqrt(result);
}

 

Find the best seam : Seam carving has two main algorithms that are widely used for image resizing nowadays, Backward and Forward Energy. In this scenario, I'll use Backward Energy algorithm to find the best seam. In seam carving, we also use cummulative minimum energy or M Matrix and seam route tracking or K Matrix those two matrices sizes are equal to image width and height (not multiply by 3).

By using formula M[i,j] = EnergyImg[i,j] + min[M[i-1,j-1],M[i-1,j],M[i-1,j+1]], we can compute M and K Matrix as the following..

public override int[] ApplyCarvingAlgorithm()
{
    int height = _EnergyImg.Height;
    int width = _EnergyImg.Width;

    int[,] m = new int[height, width];
    int[,] k = new int[height, width];

    BitmapData bmData = _EnergyImg.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    //address of the first line.
    IntPtr ptr = bmData.Scan0;
    //one line length of an image in rgb pixel including padding
    int stride = bmData.Stride;
    unsafe
    {
        //convert pointer ptr into byte* explicitly
        byte* p = (byte*)ptr;
        int rgbWidth = width * 3;
        int paddingOffset = stride - rgbWidth;

        //initialize the first row of m and k matrix.
        for (int i = 0; i < width; i++)
        {
            m[0, i] = p[0];
            k[0, i] = 0;
            p += 3;
        }

        //find matrix m and k values.
        p = (byte*)ptr; //energy image pointer must be reset its position.
        int minIndex = 0;
        for (int j = 1; j < height; j++)
        {
            for (int i = 0; i < width; i++)
            {
                //check left
                if (i <= 0) minIndex = FindMinIndex(999999, m[j - 1, i], m[j - 1, i + 1]);
                //check right
                else if (i + 1 >= width) minIndex = FindMinIndex(m[j - 1, i - 1], m[j - 1, i], 999999);
                else minIndex = FindMinIndex(m[j - 1, i - 1], m[j - 1, i], m[j - 1, i + 1]);

                //minIndex has only 0,1,2 and it doesn't consider the current index, so we need to use minIndex + i(current index)
                m[j, i] = p[0] + m[j - 1, i + minIndex - 1];    //chosen m (***i + minIndex - 1) plus current energy bit.
                k[j, i] = minIndex + 1;   //either 1,2,3 (top row values are all zero)

                p += 3;
            }
            p += paddingOffset;
        }
    }

    _EnergyImg.UnlockBits(bmData);

    return CalculatePositionMatrix(m, k, width, height);
}

After we get the best seam like below, remove those seam from an image. Cut off seam for both _EnergyImg and our inputted image.

 

Remove/Insert seam : For me, I choose to compute positionMatrix from K Matrix, my positionMatrix contains the positions of my seam regarding image width and height. You may come up with your own solution to cut off seam. It's not that hard.

public static Bitmap CutSeam(Bitmap image, int[] positionMatrix)
{
    BitmapData bmData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    //address of the first line.
    IntPtr ptr = bmData.Scan0;
    //one line length of an image in rgb pixel including padding
    int stride = bmData.Stride;

    Bitmap newBm = new Bitmap(image.Width - 1, image.Height);
    BitmapData newBmData = newBm.LockBits(new Rectangle(0, 0, newBm.Width, newBm.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    IntPtr newPtr = newBmData.Scan0;
    int newStride = newBmData.Stride;

    unsafe
    {
        //convert pointer ptr into byte* explicitly
        byte* p = (byte*)ptr;
        int rgbWidth = image.Width * 3;
        int paddingOffset = stride - rgbWidth;

        byte* newP = (byte*)newPtr;
        int newRgbWidth = newBm.Width * 3;
        int newPaddingOffset = newStride - newRgbWidth;

        for (int j = 0; j < image.Height; j++)
        {
            for (int i = 0; i < image.Width - 1; i++)
            {
                if (positionMatrix[j] <= i)
                {
                    newP[0] = p[3];
                    newP[1] = p[4];
                    newP[2] = p[5];
                }
                else
                {
                    newP[0] = p[0];
                    newP[1] = p[1];
                    newP[2] = p[2];
                }
                p += 3;
                newP += 3;
            }
            //advance p 1 pixel (account for r,g,b 3 pigments) because we newP has 1 more pixel than p.
            p += paddingOffset + 3;
            newP += newPaddingOffset;
        }
    }
    image.UnlockBits(bmData);
    newBm.UnlockBits(newBmData);
    return newBm;
}

 

public static Bitmap AddSeam(Bitmap image, int[] positionMatrix)
{
    BitmapData bmData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    //address of the first line.
    IntPtr ptr = bmData.Scan0;
    //one line length of an image in rgb pixel including padding
    int stride = bmData.Stride;

    Bitmap newBm = new Bitmap(image.Width + 1, image.Height);
    BitmapData newBmData = newBm.LockBits(new Rectangle(0, 0, newBm.Width, newBm.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    IntPtr newPtr = newBmData.Scan0;
    int newStride = newBmData.Stride;

    unsafe
    {
        //convert pointer ptr into byte* explicitly
        byte* p = (byte*)ptr;
        int rgbWidth = image.Width * 3;
        int paddingOffset = stride - rgbWidth;

        byte* newP = (byte*)newPtr;
        int newRgbWidth = newBm.Width * 3;
        int newPaddingOffset = newStride - newRgbWidth;

        for (int j = 0; j < image.Height; j++)
        {
            for (int i = 0; i < image.Width + 1; i++)
            {
                if (positionMatrix[j] == i && j != image.Height - 1 && j != 0 && i != 0 && i != image.Width - 1)
                {
                    newP[0] = (byte)((p[-3] + p[0- stride]+ p[3 + stride]) / 3.0);
                    newP[1] = (byte)((p[-2] + p[1 - stride] + p[4 + stride]) / 3.0);
                    newP[2] = (byte)((p[-1] + p[2 - stride] + p[5 + stride]) / 3.0);
                }
                else if (positionMatrix[j] < i)
                {
                    //copy this lowest enery seam
                    newP[0] = p[-3];
                    newP[1] = p[-2];
                    newP[2] = p[-1];
                }
                else
                {
                    newP[0] = p[0];
                    newP[1] = p[1];
                    newP[2] = p[2];
                }
                p += 3;
                newP += 3;

            }
            //we use image.Width + 1 for newP, but for p it must rely on image.Width 
            //not image.Width + 1. So, jump back one pixel (3, from the pointer's point of view).
            p += paddingOffset - 3;
            newP += newPaddingOffset;
        }
    }
    image.UnlockBits(bmData);
    newBm.UnlockBits(newBmData);
    return newBm;
}

 

protected static int[] CalculatePositionMatrix(int[,] m, int[,] k, int width, int height)
{
    //find the minimum value of the last row.
    int minIndex = 0;
    //use oldMinIndex as the initializer.
    int chosenIndex = 1;

    for (; chosenIndex < width; chosenIndex++)
        if (m[height - 1, chosenIndex] < m[height - 1, minIndex])
            minIndex = chosenIndex;

    //mark the position of image bits that we are going to cut off or copy.
    int[] positionMatrix = new int[height];
    //go upward and find the route that has lowest enery.
    for (int y = height - 1; y >= 0; y--)
    {
        positionMatrix[y] = minIndex;    //save current chosen index
        if (k[y, minIndex] == 0) break;  //we reach the top row
        else if (k[y, minIndex] == 1) minIndex--;  //go left in the next loop
        else if (k[y, minIndex] == 3) minIndex++; //go right in the next loop
        //else use the same chosenIndex value
    }
    return positionMatrix;
}

Backward Energy is actually slower than Forward Energy algorithm if we compare. In forward, you don't need to compute energy image, and the result is also better too. Backward may cut off the straight line, for example, but Forward tends to avoid that.

Thanks for reading! I will finish with some result images from our code.


kick it on DotNetKicks.com

Categories:   Image Processing
Actions:   E-mail | del.icio.us | Permalink | Comments (2) | RSS


 

Tag Cloud