Rover Visual Obstacle Avoidance

Rover Visual Obstacle Avoidance


2024年4月1日发(作者:回收苹果手机哪个平台比较好)

Rover Visual Obstacle Avoidance

Hans P. Moravec

Robotics Institute

Carnegie-Mellon University

Abstract

The Stanford AI Lab cart is a remotely controlled TV equipped mobile

robot. A computer program has driven the cart through cluttered spaces,

gaining its knowledge of the world entirely from images broadcast by the

onboard TV system.

The cart uses several kinds of stereo to locate objects around it in 3D

and to deduce its own motion. It plans an obstacle avoiding path to a

desired destination on the basis of a model built with this information.

The plan changes as the cart perceives new obstacles on its journey.

The system is reliable for short runs, but slow. The cart moves one

meter every ten to fifteen minutes, in lurches. After rolling a meter it

stops, takes some pictures and thinks about them for a long time. Then it

plans a new path, executes a little of it. and pauses again.

It has successfully driven the cart through several 20 meter courses

(each taking about five hours) complex enough to necessitate three or

four avoiding swerves. Some weaknesses and possible improvements were

suggested by these and other, less successful, runs.

disaster terminates the program. Figure la and lb document uie cart's

internal world model at two points during a sample run.

CAMERA CALIBRATION

The camera's focal length an geometric distortion are determined by

parking the cart a precise distance in front of a wall of many spots and one

cross. A program digitizes an image of the spot array, locates the spots and

the cross, and constructs a a two dimensional polynomial that relates the

position of the spots in the image to their position in an ideal unity focal

length camera, and another polynomial that converts points from the

ideal camera to points in the image. These polynomials arc used to correct

the positions of perceived objects in later scenes.

The algorithm begins by determining the array's approximate spacing

and orientation. It trims the picture to make it square, reduces it by

averaging to 64 by 64, calculates the Fourier transform of the reduced

image and takes its power spectrum, arriving at a 21) transform symmetric

about the origin, and having strong peaks at frequencies corresponding to

the horizontal and vertical and half-diagonal spacings. with weaker peaks

at the harmonics. It multiplies each point [ij] in this transform by point [-

j.t] and points [j-ij+ i and [i+jj-il effectively folding the primary peaks

onto one another. The strongest peak in the 90 degree wedge around the y

axis gives the spacing and orientation information needed by the next part

of the process.

The interest operator described later is applied to roughly locate a spot

near the center of the image. A special operator examines a window

surrounding this position, generates a histogram of intensity values within

the window, decides a threshold for separating the black spot from the

white background, and calculates the centroid and first and second

moment of the spot. This operator is again applied at a displacement

from the first centroid indicated by the orientation and spacing of the

grid, and so on, the region of found spots growing outward from the seed.

A binary template for the expected appearance of the cross in the

middle of the array is constructed from the orientation/spacing data from

the Fourier transform. l"he area around each of the found spots is

thresholded on the basis of the expected cross area, and the resulting two

valued pattern is convolved with the cross template. The closest match in

the central portion of the picture is declared to be the origin.

Two least-squares polynomials (one for X and one for Y) of third (or

sometimes fourth) degree in two variables, relating the actual positions of

the spots to the ideal positions in a unity focal length camera, arc then

generated and written into a file. The polynomials arc used in the

obstacle avoider to correct for camera roll, tilt, focal length and long term

variations in the vidicon geometry.

INTRODUCTION

A run of the avoider system begins with a calibration of the cart's

camera. The cart is parked in a standard position in front of a wall of

spots. A calibration program notes the disparity in position of the spots in

the image seen by the camera with their position predicted from an

idealized model of the situation. It calculates a distortion correction

polynomial which relates these positions, and which is used in subsequent

ranging calculations.

The cart is then manually driven to its obstacle course (littered whith

large and small debris) and the obstacle avoiding program is started. It

begins by asking for the cart's destination, relative to its current position

and heading. After being told, say, 50 meters forward and 20 to the right,

it begins its maneuvers. It activates a mechanism which moves the TV

camera, and digitizes nine pictures as the camera slides in precise steps

from one side to the other along a 50 cm track.

A subroutine called the interest operator is applied to the one of these

pictures. It picks out 30 or so particularly distinctive regions (features) in

this picture. Another routine called the correlator looks for these same

regions in the other frames. A program called the camera solver

determines the three dimensional position of the features with respect to

the cart from their apparent movement image to image.

The navigator plans a path to the destination which avoids all the

perceived features by a large safety margin. The program then sends

steering and drive commands to the cart to move it about a meter along

the planned path. The cart's response to such commands is not very

precise. The camera is then operated as before, and nine new images are

acquired. The control program uses a version of the correlator to find as

many of the features from the previous location as,possiblc in the new

pictures, and applies the camera solver. The program then deduces the

cart's actual motion dunng the step from the apparent three dimensional

shift of these features. Some of the features are pruned during this

process, and the interest operator is invoked to add new ones.

This repeats until the cart arrives at its destination or until some

INTEREST OPERATOR

The cart vision code deals with localized image patches called features.

A feature is conceptually a point in the three dimensional world, but it is

found by examining localities larger than points in pictures. A feature is

good if it can be located unambiguously in different views of a scene. A

uniformly colored region or a simple edge is not good because its parts are

785

indistinguishable. Regions such as corners, with high contrast in

orthogonal directions are best

New features in images are picked by a subroutine called the interest

operator. It tries to select a relatively uniform scattering of good features,

to maximize the probability that a few features will be picked on every

visible object by returning regions that arc local maxima of a directional

variance measure. Featureless areas and simple edges, which have no

variance in the directior of the edge, are thus avoided.

Directional variance is measured over small square windows. Sums of

squares of differences of pixels adjacent in each of four directions

(horizontal, vertical and two diagonals) over each window are calculated,

and the window's interest measure is the minimum of these four sums.

Features are chosen where the interest measure has local maxima. The

chosen features are stored in an array, sorted in order of decreasing

interest measure.

Once a feature is chosen, its appearance is recorded as series of excerpts

from the reduced image sequence. A window (6 by 6 in the current

implementation) is excised around the feature's location from each of the

variously reduced pictures. Only a tiny fraction of the area of the original

(unreduced) image is extracted. Four times as much of the x2 reduced

image is stored, sixteen umes as much of the x4 reduction, and so on until

at some level we have the whole image. The final result is a scries of 6 by 6

pictures, beginning with a very blurry rendition of the whole picture,

gradually zooming in linear expansions of two to a sharp closcup of the

feature.

At each pause on its computer controlled itinerary the cart slides its

camera from left to right on the 52 cm track, taking 9 pictures at precise

6.5 cm intervals. Points are chosen in the fifth (middle) of these 9 images,

either by the correlator to match features from previous positions.-or by

the interest operator. The camera slides parallel to the horizontal axis of

the (distortion corrected) camera co-ordinate system, so the parallax-

induced displacement of features in the 9 pictures is purely horizontal.

The correlator looks for the points chosen in the central image in each

of the eight other pictures. The search is restricted to a narrow horizontal

band. This has little effect on the computation time, but it reduces the

probability of incorrect matches. In the case of correct matches, the

distance to the feature is inversely proportional to its displacement from

one image to another. The uncertainty in such a measurement is the

difference in distance a shift one pixel in the image would make. The

uncertainty varies inversely with the physical separation of the camera

positions where the pictures were taken (the stereo baseline). ong

baselines give more accurate distance measurements.

After the correlation step the program knows a feature's position in

nine images It considers each of the 36 (= 9 choose 2) possible image

pairings as a stereo baseline, and records the estimated (inverse) distance

of the feature in a histogram. Each measurement adds a little normal

curve to the histogram, with mean at the estimated distance, and standard

deviation inversely proportional to the baseline, reflecting the uncertainty.

The area under each curve is made proportional to the product of the

correlation coefficients of the matches in the two images (in central image

this coefficient is taken as unity), reflecting the confidence that the

correlations were correct. The distance to the feature is indicated by the

largest peak in the resulting histogram,

;

f this peak is above a certain

threshold. If below, the feature is forgotten about.

The correlator sometimes matches features incorrectly. The distance

measurements from incorrect matches in different pictures arc not

consistent When the normal curves from 36 pictures pairs arc added up.

the correct matches agree with each other, and build up a large peak in

the histogram, while incorrect matches spread themselves more thinly.

Two or three correct correlations out of the eight will usually build a peak

sufficient to offset a larger number of errors. In this way eight

applications of a mildly reliable operator interact to make a very reliable

L

distance measurement

Weaknesses

The measure is able to unambiguously reject edges only if they are

oriented along the four directions of summation. Edges with intermediate

direction give non-zero values for all four sums, and arc sometimes

incorrectly chosen as interesting. The operator especially favors

intersecting edges. These arc sometimes corners or cracks in objects, and

are very good. Sometimes they arc caused by a distant object peering

over the edge of a nearby one and then they are very bad. Such spurious

intersections do not have a definite distance, and must be rejected during

camera solving.

CORRELATION

Deducing the 3D location of features from their projections in 2D

images requires that we know their position in two or more such images.

The correlator is a subroutine that, given a feature description produced

by the interest operator from one image, finds the best match in a

different, but similar, image. Its search area can be the entire new picture,

or a rectangular sub-window.

The search uses a coarse to fine strategy that begins in reduced versions

of the pictures. Typically the first step takes place at the xl6 (linear)

reduction level. The 6 by 6 window at that level in the feature description,

that covers about one seventh of the total area of the original picture, is

convolved with the search area in the correspondingly reduced version of

the second picture. The 6 by 6 description patch is moved pixel by pixel

over the approximately 15 by 16 destination picture, and a correlation

coefficient is calculated for each trial position. The position with the best

match is recorded. The 6x6 area it occupies in the second picture is

mapped to the x8 reduction level, where the corresponding region is 12

pixels by 12. The 6 by 6 window in the x8 reduced level of the feature

description is then convolved with this 12 by 12 area, and the position of

best match is recorded and used as a search area for the x4 level. The

process continues, matching smaller and smaller, but more and more

detailed windows until a 6 by 6 area is selected in the unreduced picture.

The window sizes and other parameters are sometimes different from the

ones used in this example.

Motion Stereo

After having determined the 3D location of objects at one position, the

computer drives the cart about a meter forward. At the new position it

slides the camera and takes nine pictures. The correlator is applied in an

attempt to find all the features successfully located at the previous

position. Feature descriptions extracted from the central image at the last

position arc searched for in the central image at the new stopping place.

Slider stereo then determines the distance of the features so found from

the cart's new position. The program now knows the 3D position of the

features relative to its camera at the old and the new locations. Its own

movement is deduced from 3D co-ordinate transform that relates the two.

The program first eliminates mis-matches in the correlations between

the central images at two positions. Although it doesn't yet have the co-

ordinate transform between the old and new camera systems, the program

knows the distance between pairs of feature positions should be the same

in both. It makes a matrix in which element [i,j] is the aosolute value of

the difference in distances between points i and / in the first and second

co-ordinate systems divided by the expected error (based on the one pixel

uncertainty of the ranging). Fach row of this matrix is summed, giving an

indication of how much each point disagrees with the other points. The

idea is that while points in error disagree with virtually all points, correct

positions agree with all the other correct ones, and disagree only with the

bad ones. The worst point is deleted, and its effect is removed from the

remaining points in the row sums. This pruning is repeated until the worst

error is within the error expected from the ranging uncertainty.

After the pruning, the program has a number of points, typically 10 to

20. whose position error is small and pretty well known. The program

trusts these, and records them in its world model, unless it had already

STEREO

Slider Stereo

7

86


发布者:admin,转转请注明出处:http://www.yc00.com/num/1711913608a1976691.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信