Image Processing tips for Computer Vision and Deep Learning tasks

I just finished term 1 of the Udacity self-driving car course. Term 1 has five projects, and all of them required some form of image processing (reading, processing and displaying images) as a pre-processing step for computer vision and/or deep learning tasks. The key to getting better results on these tasks is to get the image processing right. Often this involves scaling correctly between the JPEG and PNG image formats, converting an image to grayscale and normalizing it, or simply using the image processing packages OpenCV and Matplotlib correctly for the task at hand. Below I address all of these topics and provide recommendations. The projects mentioned in this write-up are the Udacity term 1 projects, but the post does not require any knowledge of them.

When to use OpenCV and Matplotlib?

Both OpenCV and Matplotlib can be used to read and display images. OpenCV reads and plots an image in the BGR format, while Matplotlib reads and plots in the RGB format. I like to use Matplotlib to read and display images because 1) it is convenient to plot inside a Jupyter notebook and 2) by reading with Matplotlib I don’t have to convert to a different color space for display. However, if I am processing both PNG and JPEG image formats, I use OpenCV (for reasons discussed later). Here is some code to illustrate these points (check my GitHub repository for example code):

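import cv2
image = cv2.imread('images/test.jpg')  # OpenCV returns the image in BGR order
cv2.imshow('Display window', image)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
%matplotlib inline
image = mpimg.imread('images/test.jpg')  # Matplotlib returns the image in RGB order
plt.imshow(image)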
  • OpenCV reads and plots an image in the BGR format, while Matplotlib reads and plots in the RGB format. Hence, if we read an image with OpenCV and want to display it with Matplotlib, we need to convert it from BGR to RGB as follows:
imageBGR = cv2.imread('images/test5.jpg')
imageRGB = cv2.cvtColor(imageBGR, cv2.COLOR_BGR2RGB)
  • As a side note, grayscale images need no such conversion when read with OpenCV and displayed with Matplotlib, as there is only one color channel. Note that cv2.imread returns three channels by default, so pass cv2.IMREAD_GRAYSCALE to get a single-channel array:
image = cv2.imread('images/charlie_grayscale.jpg', cv2.IMREAD_GRAYSCALE)
plt.imshow(image, cmap='gray')
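The BGR to RGB conversion described above only reorders the channels, which can be checked without any image file. The following is a minimal sketch on a small synthetic NumPy array; the array and the `bgr_to_rgb` helper are made up for illustration. For a three-channel image, reversing the last axis should give the same result as `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np

def bgr_to_rgb(image_bgr):
    """Reverse the channel axis of an H x W x 3 array (BGR -> RGB)."""
    return image_bgr[..., ::-1]

# Hypothetical 1 x 2 image in BGR order: one pure-blue pixel, one pure-red pixel.
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
image_rgb = bgr_to_rgb(image_bgr)

print(image_rgb[0, 0])  # blue pixel: the 255 is now in the last position
print(image_rgb[0, 1])  # red pixel: the 255 is now in the first position
```

Applying the helper twice returns the original array, which is a quick sanity check that nothing but the channel order changes.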

Why convert to grayscale from color?

Many image processing and computer vision algorithms (Canny, Hough, Sobel) work on grayscale images rather than color images. These tasks involve edge detection, where color information is not useful, so grayscale is just fine. Grayscale processing is also faster than color processing, since a grayscale image has only one channel as opposed to three in a color image and therefore carries a third of the data. I show an example of the time savings in the example code.
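To make the channel argument concrete, here is a small sketch that converts a synthetic RGB array to grayscale with NumPy. The `rgb_to_gray` helper is hypothetical, and the 0.299/0.587/0.114 weights are the standard luma weights; I assume they match what OpenCV uses for `cv2.COLOR_RGB2GRAY`, so treat exact agreement with OpenCV's rounding as unverified.

```python
import numpy as np

def rgb_to_gray(image_rgb):
    """Weighted sum of the R, G and B channels -> one-channel uint8 image."""
    weights = np.array([0.299, 0.587, 0.114])  # assumed luma weights
    gray = image_rgb.astype(np.float64) @ weights
    return np.round(gray).astype(np.uint8)

# Synthetic 4 x 6 color image standing in for a real photo.
rng = np.random.default_rng(0)
image_rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
image_gray = rgb_to_gray(image_rgb)

print(image_rgb.shape, image_gray.shape)   # three channels vs. one
print(image_rgb.size // image_gray.size)   # ratio of values to process
```

The grayscale array holds one value per pixel instead of three, which is where the processing savings come from.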

Why normalize data before training?

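When we normalize data, we typically rescale it to have zero mean and unit variance with a formula such as x_normalized = (x - mean) / std, where mean and std are the mean and standard deviation of the data.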
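As a rough illustration of zero-mean, unit-variance normalization, the sketch below normalizes a synthetic batch of images with NumPy. The `normalize` helper, the batch shape, and the small `eps` guard against division by zero are my own assumptions for the example, not code from the projects.

```python
import numpy as np

def normalize(data, eps=1e-8):
    """Return (data - mean) / std, with statistics taken over the whole array."""
    data = data.astype(np.float32)
    return (data - data.mean()) / (data.std() + eps)

# Synthetic batch of eight 32 x 32 RGB images with 0 to 255 pixel values.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
normalized = normalize(batch)

print(normalized.mean(), normalized.std())  # close to 0 and 1
```

Here the statistics are computed over the whole batch; computing them per channel or per image is an equally plausible choice, depending on the task.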

How to deal with PNG and JPEG images?

We will run into situations where the training data is in PNG format and the test data is in JPEG format (Project 5: Vehicle Detection and Tracking), or the reverse. As mentioned before, in such situations I prefer reading the image data with OpenCV instead of Matplotlib, because OpenCV reads both PNG and JPEG in the 0 to 255 range, while Matplotlib reads JPEG in the 0 to 255 range and PNG in the 0 to 1 range! With Matplotlib, if we do not scale the image data appropriately, we will get strange results.

[Figure: Test image]
[Figure: Bounding boxes on the unscaled image]
image = image.astype(np.float32) / 255  # scale 0 to 255 pixel values to the 0 to 1 range
[Figure: Bounding boxes on the scaled image]
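One way to guard against this mismatch is to bring every image to the same range right after reading it. The helper below is a hypothetical sketch (`to_unit_range` is not a library function). It assumes that `uint8` arrays hold 0 to 255 values, as Matplotlib returns for JPEG, and that floating-point arrays are already in the 0 to 1 range, as Matplotlib returns for PNG.

```python
import numpy as np

def to_unit_range(image):
    """Return the image as float32 in the 0 to 1 range."""
    if image.dtype == np.uint8:
        # 8-bit data (e.g. a JPEG read with Matplotlib): scale 0..255 to 0..1.
        return image.astype(np.float32) / 255
    # Assumption: float images (e.g. a PNG read with Matplotlib) are already 0..1.
    return image.astype(np.float32)

# Synthetic stand-ins for a JPEG-style and a PNG-style array.
jpeg_like = np.array([[0, 128, 255]], dtype=np.uint8)
png_like = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)

print(to_unit_range(jpeg_like).max(), to_unit_range(png_like).max())
```

Running both training and test images through the same helper keeps their ranges consistent regardless of the file format they came from.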


The image processing tips in this post are straightforward and easy to follow, and they will save time and frustration when debugging computer vision and deep learning algorithms.
