An Introduction to Machine Vision: A Tutorial

Machine vision allows you to obtain useful information about physical objects by automating analysis of digital images of those objects. This is one of the most challenging applications of computer technology. There are two general reasons for this: almost all vision tasks require at least some judgment on the part of the machine, and the amount of time allotted for completing the task usually is severely limited. While computers are astonishingly good at elaborate, high-speed calculation, they still are very primitive when it comes to judgment.

Machine-Vision System Components

A machine-vision system has five key components.

Illumination

Just as a professional photographer uses lighting to control the appearance of subjects, so the user of machine vision must consider the color, direction, and shape of the illumination. For objects moving at high speed, a strobe often can be used to freeze the action.

Camera

For many years, the standard machine-vision camera has been monochromatic. It outputs many shades of gray but not color, provides about 640 × 480 pixels, produces 30 frames per second, uses CCD solid-state sensor technology, and generates an analog video signal defined by television standards.

Color cameras have long been available but are less frequently used due to cost and lack of compelling need. Higher-resolution cameras also are available; the cost usually has been prohibitive, but that is expected to change soon.

Frame rates of 60 frames/s are becoming common. CMOS sensor technology is challenging the long dominance of CCD, offering lower cost but not yet equaling its quality. Cameras producing digital video signals also are becoming more common and generally produce higher image quality.

Special features useful in machine vision include rapid reset, which allows the image to be taken at any desired instant, and an electronic shutter, which freezes objects moving at medium speeds.

Frame Grabber

A frame grabber interfaces the camera to the computer that is used to analyze the images. One common form for a frame grabber is a plug-in card for a PC.

Computer

Often an ordinary PC is used, but sometimes a device designed specifically for image analysis is preferred. The computer uses the frame grabber to capture images and specialized software to analyze them and is responsible for communicating results to automation equipment and interfacing with human operators for setup, monitoring, and control.

Software

The key to successful machine-vision performance is the software that runs on the computer and analyzes the images. Software is the only component that cannot be considered a commodity and often is a vendor’s most important intellectual property.

In recent years, it has become more common to deliver these five components in the form of a single, integrated package. These systems often are referred to as machine-vision sensors to distinguish them from more traditional systems where each of the components is a discrete module. While the traditional systems are somewhat more versatile, the sensors generally are less expensive and easier to use.

Machine-Vision Methods

The discussion of machine-vision methods divides naturally into image enhancement and image analysis. Image-enhancement methods produce modified images as output and seek to enhance certain features while attenuating others. Image analysis interprets images, producing information such as position, orientation, and identity of an object or perhaps just an accept/reject decision. While the goal of machine vision is image interpretation, and machine vision may be considered synonymous with image analysis, image-enhancement methods often are used as a first step.

Image-Enhancement Methods

Point Transforms

Point transforms produce output images where each pixel is some function of a corresponding input pixel. The function is the same for every pixel and often is derived from global statistics of the image, such as the mean, standard deviation, minimum, or maximum of the brightness values. Point transforms generally execute rapidly but are not very versatile. Common uses include gain, offset, and color adjustments.
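As a rough illustration, here is a minimal NumPy sketch of one common point transform, a contrast stretch whose gain and offset are derived from the image's global minimum and maximum. The function name and details are illustrative, not a standard library routine.

```python
import numpy as np

def stretch_contrast(image: np.ndarray) -> np.ndarray:
    """Point transform: linearly map the image's [min, max] range
    onto [0, 255]. The same gain and offset, derived from global
    image statistics, is applied to every pixel independently."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:                       # flat image: nothing to stretch
        return np.zeros_like(image, dtype=np.uint8)
    gain = 255.0 / (hi - lo)           # gain from global statistics
    return ((image.astype(np.float64) - lo) * gain).astype(np.uint8)
```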

Thresholding

Thresholding is a commonly used enhancement whose goal is to segment an image into object and background. A threshold value is computed, above (or below) which pixels are considered object and below (or above) which pixels are considered background. Sometimes two thresholds are used to specify a band of values that correspond to object pixels.

Thresholds can be fixed but are best computed from image statistics or neighborhood operations. In all cases, the result is a binary image: only black and white are represented, with no shades of gray.
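To make the idea concrete, here is a minimal sketch of a threshold computed from image statistics, using the classic isodata iteration; it is one of many possible choices, not a recommendation of any particular algorithm.

```python
import numpy as np

def threshold_isodata(image: np.ndarray) -> np.ndarray:
    """Binarize an image with a threshold computed from its own
    statistics: start at the global mean, then repeatedly move the
    threshold to the midpoint of the two class means until stable."""
    t = image.mean()
    while True:
        obj, bkg = image[image > t], image[image <= t]
        if obj.size == 0 or bkg.size == 0:
            break                      # degenerate (nearly flat) image
        t_new = 0.5 * (obj.mean() + bkg.mean())
        if abs(t_new - t) < 0.5:
            break
        t = t_new
    return (image > t).astype(np.uint8) * 255   # binary: 0 or 255 only
```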

When thresholding works, it eliminates unimportant shading variation. Unfortunately in most applications, scene shading is such that objects cannot be separated from background by any threshold. In addition, thresholding destroys useful shading information and applies essentially infinite gain to noise at the threshold value, resulting in a significant loss of robustness and accuracy. As a general rule, it is best to avoid image-analysis algorithms that depend on thresholding.

Time Averaging

Time averaging is the most effective method for handling very low-contrast images. The amplitude of uncorrelated noise is attenuated by the square root of the number of images averaged. When time averaging is combined with a gain-amplifying point transform, extremely low-contrast scenes can be processed. The principal disadvantages are the time needed to acquire multiple images and the requirement that the object be stationary.
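A sketch of the arithmetic, assuming N registered frames of a stationary scene are already in hand:

```python
import numpy as np

def average_frames(frames: list) -> np.ndarray:
    """Average N frames of a stationary scene. Uncorrelated noise
    amplitude drops by sqrt(N): averaging 16 frames attenuates the
    noise 4x while leaving the signal unchanged."""
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for f in frames:
        acc += f                 # accumulate in float to avoid overflow
    return (acc / len(frames)).astype(np.uint8)
```

The averaged result can then be passed through a gain-amplifying point transform, such as the contrast stretch sketched earlier, to bring a very low-contrast scene into a usable brightness range.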

Linear Filters

Linear filters amplify or attenuate selected spatial frequencies and achieve such effects as smoothing and sharpening. Figure 1a (see the September 2001 issue of Evaluation Engineering) shows a rather noisy image of a cross within a circle. A linear smoothing (low-pass) filter is applied, producing Figure 1b (see the September 2001 issue of Evaluation Engineering). Note how the high-frequency noise has been attenuated, but at a cost of some loss of edge sharpness.

Figure 1c (see the September 2001 issue of Evaluation Engineering) illustrates the effect of a bandpass linear filter. Both the high-frequency noise and the low-frequency uniform regions have been attenuated, leaving only the mid-frequency components of the edges.
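As an illustration of the low-pass case, here is a minimal box-filter sketch in NumPy; real systems typically use separable or Gaussian kernels, and this naive version is for clarity only.

```python
import numpy as np

def box_smooth(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Low-pass linear filter: convolve with a normalized size x size
    box kernel. Each output pixel is the mean of its neighborhood,
    attenuating high-frequency noise at some cost in edge sharpness."""
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(size):       # sum all shifted copies, then scale
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return (out / (size * size)).astype(np.uint8)
```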

Boundary Detection

Boundary detection refers to a class of methods whose purpose is to identify and locate boundaries between roughly uniform regions in an image. The methods range from simple edge detection to complex procedures that might more properly be considered under image analysis. Figure 1d (see the September 2001 issue of Evaluation Engineering) shows the result of a simple boundary detector applied to a noise-free version of Figure 1a.

The shading produced by an object in an image is among the least reliable of an object’s properties, since shading is a complex combination of illumination, surface properties, projection geometry, and sensor characteristics. Image boundaries, on the other hand, usually correspond directly to object surface discontinuities such as edges, since the other factors tend not to be discontinuous. Image boundaries generally are consistent in shape, even when not consistent in brightness (Figure 2, see the September 2001 issue of Evaluation Engineering). Accordingly, boundary detection is one of the most important image-enhancement methods used in machine vision.

Crude edge detectors simply mark image pixels corresponding to boundaries. Sophisticated boundary detectors produce organized chains of boundary points with subpixel position and boundary orientation, accurate to a few degrees, at each point. The best commercially available boundary detectors also are tunable in spatial frequency response over a wide range and operate at high speed.
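A crude edge detector of the kind described can be sketched with Sobel gradients; the subpixel chaining of a sophisticated boundary detector is well beyond this example.

```python
import numpy as np

def sobel_edges(image: np.ndarray, thresh: float = 50.0):
    """Crude edge detector: Sobel gradients give a magnitude (used to
    mark boundary pixels) and a direction at each interior pixel.
    Output arrays are 2 pixels smaller than the input on each axis."""
    img = image.astype(np.float64)
    # Horizontal gradient (Sobel Gx): right neighbors minus left.
    gx = (2 * (img[1:-1, 2:] - img[1:-1, :-2])
          + (img[:-2, 2:] - img[:-2, :-2])
          + (img[2:, 2:] - img[2:, :-2]))
    # Vertical gradient (Sobel Gy): bottom neighbors minus top.
    gy = (2 * (img[2:, 1:-1] - img[:-2, 1:-1])
          + (img[2:, :-2] - img[:-2, :-2])
          + (img[2:, 2:] - img[:-2, 2:]))
    mag = np.hypot(gx, gy)
    # Gradient direction; the boundary runs perpendicular to it.
    angle = np.degrees(np.arctan2(gy, gx))
    return mag > thresh, angle
```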

Nonlinear Filters

Nonlinear filters designed to pass or block desired shapes rather than spatial frequencies have been found useful for image enhancement. The first to consider is the median filter, whose effect, roughly speaking, is to attenuate image features smaller than a particular size and pass image features larger than that size.

Figure 1e (see the September 2001 issue of Evaluation Engineering) shows the effect of a median filter on the noisy image of Figure 1a. The noise, which generally results in small features, is strongly attenuated. Unlike the linear smoothing filter of Figure 1b, there is no significant loss in edge sharpness since all cross and circle features are much larger. A median filter often is superior to a linear filter for noise reduction; however, it takes more computational time than a linear filter.
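A direct NumPy sketch of the median filter follows; production implementations use much faster algorithms, but the behavior is the same.

```python
import numpy as np

def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Nonlinear filter: each output pixel is the median of its
    size x size neighborhood, removing features smaller than the
    window while preserving the sharpness of larger edges."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    # Stack every window offset, then take the median across the stack.
    stack = np.stack([
        padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
        for dy in range(size) for dx in range(size)
    ])
    return np.median(stack, axis=0).astype(image.dtype)
```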

Morphology

Morphology refers to a broad class of nonlinear shape filters, examples of which can be seen in Figure 3 (see the September 2001 issue of Evaluation Engineering). In the figure, the input image on the left is processed with the two filters shown in the center (called probes), resulting in the images shown on the right.

Imagine the probe as a paintbrush: the output is everything the brush can paint when placed wherever in the input it fits, for example, entirely on black with no white showing. Notice how the morphology operation with appropriate probes is able to pass certain shapes and block others. For simplicity, Figure 3 illustrates morphology as a binary (black/white) operation, but, in general, morphology operations can be used on gray-level images.
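For the binary case of Figure 3, this fit-and-paint behavior corresponds to a morphological opening (erosion followed by dilation). A minimal sketch, assuming a boolean image and a symmetric probe with odd dimensions:

```python
import numpy as np

def binary_open(image: np.ndarray, probe: np.ndarray) -> np.ndarray:
    """Morphological opening with a probe (structuring element).
    Erosion keeps the pixels where the probe fits entirely on the
    object; dilation then paints the probe at every surviving pixel,
    so shapes the probe cannot fit inside are blocked."""
    ph, pw = probe.shape               # odd dimensions assumed
    pad_y, pad_x = ph // 2, pw // 2
    H, W = image.shape

    def windows(img):
        # One shifted view per True probe cell, padded with background.
        p = np.pad(img, ((pad_y, pad_y), (pad_x, pad_x)),
                   constant_values=False)
        return np.stack([p[dy:dy + H, dx:dx + W]
                         for dy in range(ph) for dx in range(pw)
                         if probe[dy, dx]])

    eroded = windows(image).all(axis=0)     # probe fits entirely
    return windows(eroded).any(axis=0)      # paint it wherever it fit
```

With probe = np.ones((5, 5), dtype=bool), for example, features narrower than five pixels are blocked while larger ones pass.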

Digital Resampling

Digital resampling refers to a process of estimating the image that would have resulted had the continuous distribution of energy falling on the sensor been sampled differently. A different sampling, perhaps at a different resolution or orientation, often is useful.
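The workhorse of resampling is interpolation at non-integer coordinates. A bilinear sketch, where ys and xs hold the new sample positions (for example, a rotated or rescaled grid):

```python
import numpy as np

def resample(image: np.ndarray, ys: np.ndarray, xs: np.ndarray) -> np.ndarray:
    """Estimate image values at non-integer (ys, xs) coordinates by
    bilinear interpolation of the four surrounding pixels."""
    img = image.astype(np.float64)
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    fy, fx = ys - y0, xs - x0          # fractional position in each cell
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```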

Image Analysis

The fundamental problem of image analysis is pattern recognition, the purpose of which is to recognize image patterns corresponding to physical objects in the scene and determine their pose (position, orientation, and size). Often the results of pattern recognition are all that's needed; for example, a robot guidance system supplies an object's pose to a robot. In other cases, a pattern-recognition step is needed to find an object so that it can be inspected for defects or correct assembly.

Pattern recognition is hard because a specific object can give rise to a wide variety of images depending on illumination, viewpoint, camera characteristics, and manufacturing variation. In addition, similar-looking objects may be present in the scene that must be ignored, and the speed and cost targets may be severe.

Blob Analysis

Blob analysis is one of the earliest methods widely used for industrial pattern recognition. It classifies image pixels as object or background by some means, joins the classified pixels to make discrete objects using neighborhood connectivity rules, and computes various properties of the connected objects to determine position, size, and orientation.
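A minimal sketch of the connectivity and moment computations, using SciPy's connected-component labeling; the pose formulas are the standard moment-based ones.

```python
import numpy as np
from scipy import ndimage

def blob_analysis(binary: np.ndarray) -> list:
    """Join classified pixels into connected blobs, then compute
    position (centroid) and orientation (principal axis) from the
    moments of each blob's pixel coordinates."""
    labels, n = ndimage.label(binary)
    results = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        cy, cx = ys.mean(), xs.mean()      # centroid = first moments
        # Second central moments give the principal-axis orientation.
        mu20 = ((xs - cx) ** 2).mean()
        mu02 = ((ys - cy) ** 2).mean()
        mu11 = ((xs - cx) * (ys - cy)).mean()
        theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
        results.append({"area": xs.size,
                        "centroid": (cx, cy),
                        "orientation_deg": float(np.degrees(theta))})
    return results
```

Note that the orientation angle is undefined for fourfold-symmetric shapes such as squares, which is exactly the limitation mentioned below.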

The advantages of blob analysis are high speed, subpixel accuracy (in cases where the image is not subject to degradation), and the capability to tolerate and measure variations in orientation and size. Disadvantages include the inability to tolerate touching or overlapping objects, poor performance in the presence of various forms of image degradation, the inability to determine the orientation of certain shapes such as squares, and poor ability to discriminate among similar-looking objects.

Normalized Correlation

Normalized correlation (NC) has been the dominant method for pattern recognition in the industry since the late 1980s. It is a member of a class of algorithms known as template matching, which starts with a training step where a picture of an object to be located is stored. At run time, the template is compared to like-sized subsets of the image over a range of positions, with the position of greatest match taken to be the position of the object. The degree of match can be used as a measure of quality.

NC is a gray-scale match function that uses no thresholds and ignores variation in overall pattern brightness and contrast. It is ideal for use in template-matching algorithms.
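The NC match value is the normalized cross-correlation coefficient. A minimal sketch of the score and of exhaustive template matching follows; commercial implementations use image pyramids and other accelerations that are omitted here.

```python
import numpy as np

def ncc(window: np.ndarray, template: np.ndarray) -> float:
    """Normalized correlation of a like-sized window and template.
    Subtracting the means and dividing by the norms makes the score
    invariant to overall brightness (offset) and contrast (gain):
    1.0 is a perfect match, near 0.0 is no correlation."""
    w = window.astype(np.float64) - window.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return float((w * t).sum() / denom) if denom > 0 else 0.0

def match_template(image: np.ndarray, template: np.ndarray):
    """Score every position; the best-scoring one is the object's
    position, and the score itself is a measure of quality."""
    th, tw = template.shape
    best, best_pos = -1.0, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            s = ncc(image[y:y + th, x:x + tw], template)
            if s > best:
                best, best_pos = s, (x, y)
    return best_pos, best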

NC template matching overcomes many of the limitations of blob analysis. It tolerates touching or overlapping objects and performs well in the presence of various forms of image degradation. Also, the NC match value is useful in some inspection applications. Most significantly, perhaps, objects need not be separated from background by brightness, enabling a much wider range of applications.

Unfortunately, NC gives up some of the significant advantages of blob analysis, particularly the capability to tolerate and measure variations in orientation and size. NC will tolerate small variations, typically a few degrees and a few percent depending on the specific template. But even within this small range of orientation and size, the accuracy of the results falls off rapidly.

Hough Transform

The Hough transform is a method for recognizing parametrically defined curves such as lines and arcs as well as general patterns. It starts with an edge-detection step, which makes it more tolerant of local and nonlinear shading variations than NC. When used to find parameterized curves, the Hough transform is quite effective. For general patterns, NC may have speed and accuracy advantages as long as it can handle the shading variations.
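A minimal accumulator sketch for the line-finding case; edge points can come from a detector such as the Sobel sketch above (for example, via np.argwhere on its mask), and the bin counts are illustrative.

```python
import numpy as np

def hough_lines(edge_points, diag: float, n_theta: int = 180,
                n_rho: int = 200):
    """Hough transform for lines: each edge point (y, x) votes for
    every (theta, rho) line through it; peaks in the accumulator
    correspond to lines. diag is the image diagonal, bounding |rho|."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=np.int32)
    for (y, x) in edge_points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # signed distance
        bins = np.round((rho + diag) * (n_rho - 1) / (2 * diag)).astype(int)
        acc[np.arange(n_theta), bins] += 1             # one vote per theta
    t, r = np.unravel_index(acc.argmax(), acc.shape)   # strongest line
    return np.degrees(thetas[t]), r * 2 * diag / (n_rho - 1) - diag
```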

Geometric Pattern Matching

Geometric pattern matching (GPM) is replacing NC template matching as the method of choice for industrial pattern recognition. Template methods suffer from fundamental limitations imposed by the pixel grid nature of the template itself. Translating, rotating, and sizing grids by noninteger amounts require resampling, which is time-consuming and of limited accuracy. Pixel grids represent patterns using gray-scale shading, which often is not reliable.

GPM avoids the need to resample by representing an object as a geometric shape, independent of shading and not tied to a discrete grid. Sophisticated boundary detection is used to turn the pixel grid produced by a camera into a conceptually real-valued geometric description that can be translated, rotated, and sized quickly without loss of fidelity. When combined with advanced pattern training and high-speed, high-accuracy pattern-matching modules, the result is a truly general-purpose pattern-recognition and inspection method.

A well-designed GPM system should be as easy to train as NC template matching, yet offer rotation, size, and shading independence. It should be robust under conditions of low contrast, noise, poor focus, and missing and unexpected features.

GPM is capable of much higher pose accuracy than any template-based method, as much as an order of magnitude better when orientation and size vary. Table 1 (see the September 2001 issue of Evaluation Engineering) shows what can be achieved in practice when patterns are reasonably close to the training image in shape and not too degraded. Accuracy generally is higher for larger patterns; the example in Table 1 assumes a pattern in the 150 × 150 pixel range.

Putting It All Together

In the following example, the goal is to inspect objects by looking for differences in shading between an object and a pre-trained, defect-free example called a golden template.

Simply subtracting the template from an image and looking for differences does not work in practice, since the variation in gray-scale due to ordinary and acceptable conditions can be as great as that due to defects. This is particularly true along edges, where a slight misregistration of template and image can create a large variation in gray-scale. Variation in illumination and surface reflectance also can give rise to differences that are not defects, as can noise.

A practical method of template comparison for inspection uses a combination of enhancement and analysis steps to distinguish shading variation caused by defects from that due to ordinary conditions (a minimal sketch follows the list):

  • A pattern recognition step such as GPM determines the relative pose of the template and image.
  • A resampling step uses the pose to achieve precise alignment of template to image.
  • A point transform compensates for variations in illumination and surface reflectance.
  • The absolute difference of the template and image is computed.
  • A threshold is used to mark pixels that may correspond to defects. Each pixel has a separate threshold, with pixels near edges having a higher threshold because their gray-scale is more uncertain.
  • A blob analysis or morphology step is used to identify those clusters of marked pixels that correspond to true defects.
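Here is a minimal sketch of steps three through six, assuming the pattern-recognition and resampling steps have already aligned the image to the template; the per-pixel threshold map and function names are illustrative.

```python
import numpy as np
from scipy import ndimage

def golden_template_inspect(aligned: np.ndarray, template: np.ndarray,
                            thresh_map: np.ndarray, min_area: int = 10):
    """Golden-template comparison on an already-aligned image.
    thresh_map holds a separate threshold per pixel, higher near
    template edges where gray-scale is more uncertain."""
    img = aligned.astype(np.float64)
    tpl = template.astype(np.float64)
    # Step 3: point transform matches the image's mean and contrast
    # to the template's, compensating illumination/reflectance drift.
    img = (img - img.mean()) * (tpl.std() / (img.std() + 1e-9)) + tpl.mean()
    # Step 4: absolute difference of template and image.
    diff = np.abs(img - tpl)
    # Step 5: per-pixel threshold marks candidate defect pixels.
    marked = diff > thresh_map
    # Step 6: blob analysis keeps only clusters large enough to be
    # true defects rather than isolated noise pixels.
    labels, n = ndimage.label(marked)
    areas = ndimage.sum(marked, labels, index=np.arange(1, n + 1))
    return [i + 1 for i, a in enumerate(areas) if a >= min_area]
```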

Further Reading

Digital image processing is a broad field. This introduction could only summarize some of the more important methods in common use and may suffer from a bias toward industrial applications. We have entirely ignored 3-D reconstruction, motion, texture, and many other significant topics. The following are suggested for further reading:

  1. Ballard, D.H. and Brown, C.M., Computer Vision, Prentice-Hall, 1982.
  2. Horn, B.K.P., Robot Vision, MIT Press, 1986.
  3. Pratt, W.K., Digital Image Processing, Second Edition, John Wiley & Sons, 1991.
  4. Rosenfeld, A. and Kak, A.C., Digital Picture Processing, Volumes 1 and 2, Second Edition, Academic Press, 1982.

About the Author

Bill Silver, who co-founded Cognex in 1981, is the company’s chief technology officer. His achievements include the development of Optical Character Recognition technology and PatMax®, a pattern-finding software tool. Mr. Silver holds bachelor’s and master’s degrees from the Massachusetts Institute of Technology and completed course requirements for a Ph.D. from M.I.T. before leaving to help establish Cognex. Cognex, 1 Vision Dr., Natick, MA 01760, 508-650-3231. e-mail: [email protected]
