2024年4月1日发(作者:回收苹果手机哪个平台比较好)
Rover Visual Obstacle Avoidance
Hans P. Moravec
Robotics Institute
Carnegie-Mellon University
Abstract
The Stanford AI Lab cart is a remotely controlled TV equipped mobile
robot. A computer program has driven the cart through cluttered spaces,
gaining its knowledge of the world entirely from images broadcast by the
onboard TV system.
The cart uses several kinds of stereo to locate objects around it in 3D
and to deduce its own motion. It plans an obstacle avoiding path to a
desired destination on the basis of a model built with this information.
The plan changes as the cart perceives new obstacles on its journey.
The system is reliable for short runs, but slow. The cart moves one
meter every ten to fifteen minutes, in lurches. After rolling a meter it
stops, takes some pictures and thinks about them for a long time. Then it
plans a new path, executes a little of it. and pauses again.
It has successfully driven the cart through several 20 meter courses
(each taking about five hours) complex enough to necessitate three or
four avoiding swerves. Some weaknesses and possible improvements were
suggested by these and other, less successful, runs.
disaster terminates the program. Figure la and lb document uie cart's
internal world model at two points during a sample run.
CAMERA CALIBRATION
The camera's focal length an geometric distortion are determined by
parking the cart a precise distance in front of a wall of many spots and one
cross. A program digitizes an image of the spot array, locates the spots and
the cross, and constructs a a two dimensional polynomial that relates the
position of the spots in the image to their position in an ideal unity focal
length camera, and another polynomial that converts points from the
ideal camera to points in the image. These polynomials arc used to correct
the positions of perceived objects in later scenes.
The algorithm begins by determining the array's approximate spacing
and orientation. It trims the picture to make it square, reduces it by
averaging to 64 by 64, calculates the Fourier transform of the reduced
image and takes its power spectrum, arriving at a 21) transform symmetric
about the origin, and having strong peaks at frequencies corresponding to
the horizontal and vertical and half-diagonal spacings. with weaker peaks
at the harmonics. It multiplies each point [ij] in this transform by point [-
j.t] and points [j-ij+ i and [i+jj-il effectively folding the primary peaks
onto one another. The strongest peak in the 90 degree wedge around the y
axis gives the spacing and orientation information needed by the next part
of the process.
The interest operator described later is applied to roughly locate a spot
near the center of the image. A special operator examines a window
surrounding this position, generates a histogram of intensity values within
the window, decides a threshold for separating the black spot from the
white background, and calculates the centroid and first and second
moment of the spot. This operator is again applied at a displacement
from the first centroid indicated by the orientation and spacing of the
grid, and so on, the region of found spots growing outward from the seed.
A binary template for the expected appearance of the cross in the
middle of the array is constructed from the orientation/spacing data from
the Fourier transform. l"he area around each of the found spots is
thresholded on the basis of the expected cross area, and the resulting two
valued pattern is convolved with the cross template. The closest match in
the central portion of the picture is declared to be the origin.
Two least-squares polynomials (one for X and one for Y) of third (or
sometimes fourth) degree in two variables, relating the actual positions of
the spots to the ideal positions in a unity focal length camera, arc then
generated and written into a file. The polynomials arc used in the
obstacle avoider to correct for camera roll, tilt, focal length and long term
variations in the vidicon geometry.
INTRODUCTION
A run of the avoider system begins with a calibration of the cart's
camera. The cart is parked in a standard position in front of a wall of
spots. A calibration program notes the disparity in position of the spots in
the image seen by the camera with their position predicted from an
idealized model of the situation. It calculates a distortion correction
polynomial which relates these positions, and which is used in subsequent
ranging calculations.
The cart is then manually driven to its obstacle course (littered whith
large and small debris) and the obstacle avoiding program is started. It
begins by asking for the cart's destination, relative to its current position
and heading. After being told, say, 50 meters forward and 20 to the right,
it begins its maneuvers. It activates a mechanism which moves the TV
camera, and digitizes nine pictures as the camera slides in precise steps
from one side to the other along a 50 cm track.
A subroutine called the interest operator is applied to the one of these
pictures. It picks out 30 or so particularly distinctive regions (features) in
this picture. Another routine called the correlator looks for these same
regions in the other frames. A program called the camera solver
determines the three dimensional position of the features with respect to
the cart from their apparent movement image to image.
The navigator plans a path to the destination which avoids all the
perceived features by a large safety margin. The program then sends
steering and drive commands to the cart to move it about a meter along
the planned path. The cart's response to such commands is not very
precise. The camera is then operated as before, and nine new images are
acquired. The control program uses a version of the correlator to find as
many of the features from the previous location as,possiblc in the new
pictures, and applies the camera solver. The program then deduces the
cart's actual motion dunng the step from the apparent three dimensional
shift of these features. Some of the features are pruned during this
process, and the interest operator is invoked to add new ones.
This repeats until the cart arrives at its destination or until some
INTEREST OPERATOR
The cart vision code deals with localized image patches called features.
A feature is conceptually a point in the three dimensional world, but it is
found by examining localities larger than points in pictures. A feature is
good if it can be located unambiguously in different views of a scene. A
uniformly colored region or a simple edge is not good because its parts are
785
indistinguishable. Regions such as corners, with high contrast in
orthogonal directions are best
New features in images are picked by a subroutine called the interest
operator. It tries to select a relatively uniform scattering of good features,
to maximize the probability that a few features will be picked on every
visible object by returning regions that arc local maxima of a directional
variance measure. Featureless areas and simple edges, which have no
variance in the directior of the edge, are thus avoided.
Directional variance is measured over small square windows. Sums of
squares of differences of pixels adjacent in each of four directions
(horizontal, vertical and two diagonals) over each window are calculated,
and the window's interest measure is the minimum of these four sums.
Features are chosen where the interest measure has local maxima. The
chosen features are stored in an array, sorted in order of decreasing
interest measure.
Once a feature is chosen, its appearance is recorded as series of excerpts
from the reduced image sequence. A window (6 by 6 in the current
implementation) is excised around the feature's location from each of the
variously reduced pictures. Only a tiny fraction of the area of the original
(unreduced) image is extracted. Four times as much of the x2 reduced
image is stored, sixteen umes as much of the x4 reduction, and so on until
at some level we have the whole image. The final result is a scries of 6 by 6
pictures, beginning with a very blurry rendition of the whole picture,
gradually zooming in linear expansions of two to a sharp closcup of the
feature.
At each pause on its computer controlled itinerary the cart slides its
camera from left to right on the 52 cm track, taking 9 pictures at precise
6.5 cm intervals. Points are chosen in the fifth (middle) of these 9 images,
either by the correlator to match features from previous positions.-or by
the interest operator. The camera slides parallel to the horizontal axis of
the (distortion corrected) camera co-ordinate system, so the parallax-
induced displacement of features in the 9 pictures is purely horizontal.
The correlator looks for the points chosen in the central image in each
of the eight other pictures. The search is restricted to a narrow horizontal
band. This has little effect on the computation time, but it reduces the
probability of incorrect matches. In the case of correct matches, the
distance to the feature is inversely proportional to its displacement from
one image to another. The uncertainty in such a measurement is the
difference in distance a shift one pixel in the image would make. The
uncertainty varies inversely with the physical separation of the camera
positions where the pictures were taken (the stereo baseline). ong
baselines give more accurate distance measurements.
After the correlation step the program knows a feature's position in
nine images It considers each of the 36 (= 9 choose 2) possible image
pairings as a stereo baseline, and records the estimated (inverse) distance
of the feature in a histogram. Each measurement adds a little normal
curve to the histogram, with mean at the estimated distance, and standard
deviation inversely proportional to the baseline, reflecting the uncertainty.
The area under each curve is made proportional to the product of the
correlation coefficients of the matches in the two images (in central image
this coefficient is taken as unity), reflecting the confidence that the
correlations were correct. The distance to the feature is indicated by the
largest peak in the resulting histogram,
;
f this peak is above a certain
threshold. If below, the feature is forgotten about.
The correlator sometimes matches features incorrectly. The distance
measurements from incorrect matches in different pictures arc not
consistent When the normal curves from 36 pictures pairs arc added up.
the correct matches agree with each other, and build up a large peak in
the histogram, while incorrect matches spread themselves more thinly.
Two or three correct correlations out of the eight will usually build a peak
sufficient to offset a larger number of errors. In this way eight
applications of a mildly reliable operator interact to make a very reliable
L
distance measurement
Weaknesses
The measure is able to unambiguously reject edges only if they are
oriented along the four directions of summation. Edges with intermediate
direction give non-zero values for all four sums, and arc sometimes
incorrectly chosen as interesting. The operator especially favors
intersecting edges. These arc sometimes corners or cracks in objects, and
are very good. Sometimes they arc caused by a distant object peering
over the edge of a nearby one and then they are very bad. Such spurious
intersections do not have a definite distance, and must be rejected during
camera solving.
CORRELATION
Deducing the 3D location of features from their projections in 2D
images requires that we know their position in two or more such images.
The correlator is a subroutine that, given a feature description produced
by the interest operator from one image, finds the best match in a
different, but similar, image. Its search area can be the entire new picture,
or a rectangular sub-window.
The search uses a coarse to fine strategy that begins in reduced versions
of the pictures. Typically the first step takes place at the xl6 (linear)
reduction level. The 6 by 6 window at that level in the feature description,
that covers about one seventh of the total area of the original picture, is
convolved with the search area in the correspondingly reduced version of
the second picture. The 6 by 6 description patch is moved pixel by pixel
over the approximately 15 by 16 destination picture, and a correlation
coefficient is calculated for each trial position. The position with the best
match is recorded. The 6x6 area it occupies in the second picture is
mapped to the x8 reduction level, where the corresponding region is 12
pixels by 12. The 6 by 6 window in the x8 reduced level of the feature
description is then convolved with this 12 by 12 area, and the position of
best match is recorded and used as a search area for the x4 level. The
process continues, matching smaller and smaller, but more and more
detailed windows until a 6 by 6 area is selected in the unreduced picture.
The window sizes and other parameters are sometimes different from the
ones used in this example.
Motion Stereo
After having determined the 3D location of objects at one position, the
computer drives the cart about a meter forward. At the new position it
slides the camera and takes nine pictures. The correlator is applied in an
attempt to find all the features successfully located at the previous
position. Feature descriptions extracted from the central image at the last
position arc searched for in the central image at the new stopping place.
Slider stereo then determines the distance of the features so found from
the cart's new position. The program now knows the 3D position of the
features relative to its camera at the old and the new locations. Its own
movement is deduced from 3D co-ordinate transform that relates the two.
The program first eliminates mis-matches in the correlations between
the central images at two positions. Although it doesn't yet have the co-
ordinate transform between the old and new camera systems, the program
knows the distance between pairs of feature positions should be the same
in both. It makes a matrix in which element [i,j] is the aosolute value of
the difference in distances between points i and / in the first and second
co-ordinate systems divided by the expected error (based on the one pixel
uncertainty of the ranging). Fach row of this matrix is summed, giving an
indication of how much each point disagrees with the other points. The
idea is that while points in error disagree with virtually all points, correct
positions agree with all the other correct ones, and disagree only with the
bad ones. The worst point is deleted, and its effect is removed from the
remaining points in the row sums. This pruning is repeated until the worst
error is within the error expected from the ranging uncertainty.
After the pruning, the program has a number of points, typically 10 to
20. whose position error is small and pretty well known. The program
trusts these, and records them in its world model, unless it had already
STEREO
Slider Stereo
7
86
发布者:admin,转转请注明出处:http://www.yc00.com/num/1711913608a1976691.html
评论列表(0条)