A Stochastic Algorithm for Spline Fitting
A Senior Honors Thesis
Presented in partial
fulfillment of the requirements for graduation with a distinction in Computer
Science in the undergraduate colleges of the Ohio State University
By Aleksandr Dubinskiy
Ohio State University
June 2001
Honors examination
committee
Dr. Song Chun Zhu,
Project Adviser, Computer Science
Dr. Balakrishnan
Chandrasekaran, Computer Science
Dr. In Jae Myung,
Quantitative Psychology
One of the biggest problems
in Computer Vision is feature detection. A subset of that is animate object
classification, for objects from flora and fauna. For recognition or
classification of many animate objects, it is sufficient for a human eye to
only see a certain projection of an image – without texture, color, or depth
information. Certain projections, for example profiles and top-down views are
especially conducive to object recognition and classification. Due to the low
complexity of such projections compared to actual 3-D objects, it is natural to
start animate object classification with silhouettes or contours of frequently
occurring animal shapes viewed at angles easily recognizable to the human
eye. As we consider recognition of
contours, we realize that every class of objects (e.g. an eagle or a butterfly)
has a distinguishing set of features that separates it from objects in other
classes. Human perception seems to be extra sensitive to those features, and we
can recognize or classify objects based on very few cues. Depending on what
object classes are available to us to choose from, the set of features most
useful for classification of an animal contour will be different. In order to
decide what features are most informative, we first need to decide on a set of
features to choose from for a particular class. Thus our short-term task is to
design a “feature dictionary” for all object classes. This task requires
analyzing every pixel of the image and promises to be very computationally
intensive. In order to reduce the computational complexity, we attempt to
approximate each contour by using B-splines. This paper concerns itself
primarily with such dimension reduction in order to prepare our data for
further analysis. We use a probabilistic model of Monte-Carlo Markov Chains (MCMC)
in combination with Metropolis-Hastings algorithm in order to produce results
that are statistically robust.
The long-term goal of this
research project is learning the optimal vocabulary of features from a set of
animal contours in an unsupervised fashion. Instead of analyzing contours pixel
by pixel, we use a set of bases and control points to reduce the amount of
information necessary to represent a particular shape. This allows us to encode
and analyze meaningful features using less information. The problem then
becomes one of minimizing the error introduced by the use of the code.
Simultaneously we want to minimize the number of points it takes to approximate
the observed curve. These two goals are somewhat contradictory, since
increasing the number of points that generate the calculated curve to the
number of pixels on the observed curve bring the two curves as close together
as possible, and yet such a result does not lead to any dimension reduction.
Similarly if we minimize the number of points without regard for error, the
calculated curve will bear no resemblance to the observed. Thus we need to
strike a balance between the kind of precision we desire and the kind of
dimension reduction we will be satisfied with. These issues call for applying
the minimal description length principle (MDL), which will be discussed in
Section 5.
To generate an approximation to the
observed curve we use cubic B-splines. A B-spline is determined by a set of N
knots
and N control points
. The formula for every point on the spline[1]
is as follows:
(1),
where
and
![]()
All the index additions here are
subject to a modulus N operation for a spline of N nodes. The standard notation
for 3rd degree B-spline will be simplified by setting ![]()
The intuitive idea behind splines
is every control point controls a specific region of the spline, and we need to
optimize the control point locations, their number, as well as the regions they
control, determined by the knot values. The region controlled by control point
is
. For discussion of representation issues see Section 8
The original number of points can
be arbitrary, or proportional to the observed curve’s circumference. In either
case the order of points along the curve and their position are quite
important. We map
onto [0,1) line
segment, and set the
original control
points are approximately equally spaced along the observed curve with
for better
correspondence. The regions they control need to be set up in such a way as to
have the control point roughly in the middle. In order for
to be in the middle
of
, we need to set
close to the middle
between
and
, where
is the solution to ![]()
We take error of our
approximation of
by
to be the area enclosed between the two curves. Minimizing
the error then becomes equivalent to minimizing the area between the two
curves.
There are several ways to calculate are between two curves. One way is by using scan line algorithm from graphics, another is a Monte-Carlo simulation. One thing we need to keep in mind however, is that it is easier to minimize area between two curves, one of which is flexible, if we are able to control the area in a piecewise fashion. Then we can insert new points that generate the calculated curve (see death-birth process) where they are most needed.
The most natural way to
calculate area in a piecewise fashion is setting up correspondence between the
calculated and observed curves. Let us denote that correspondence by
. We take these curves to be closed and parameterized with
some
. Calculating total area then becomes equivalent to computing
. Since we are dealing with a computer simulation, all
continuous variables are discretized. We partition the calculated curve into M
points equally spaced along the curve with distance
between each adjacent
pair.
(2)
Here
is a discrete version
of
and it maps indices
of points on the calculated curve to indices of points on the observed curve.
Since correspondence is not the main focus of this paper, we put the discussion
of how to design the mapping function
in the Appendix A.
If we denote the set of N
points that encode
by W,
![]()
and the error attributable to W is Err(W)=Err(Γ), then according to MDL [1], [7] the total coding length (or energy) becomes
(3)
since along with the N
points of the code we need to encode all of the pixels of the error to restore
the observed curve
from W. A
particular coding length is only relevant within the context of an ensemble of
objects. Thus the free parameters κ and r are determined by the
prior probability P(W).[2] At this point we assume P(W) to be
independent identically distributed (IID), and k and r are
constants. The only prior knowledge we have about the contours is the rough
area of the region that contains the contour (denoted by Area) and
circumference (denoted by circumf or
).[3]
Minimizing Eng(W) becomes the new goal of this paper, while learning the
prior probability P(W) is the long-term goal for further work.
The equivalent for
finding the minimum of the energy function is maximizing the posterior
distribution of
, because according to the maximal entropy principle, such a
posterior distribution is proportional to the negative logarithm of the energy
function [1]:
, (4)
where K is some constant. We can express the solution in the following formula:
(5)
Here W is
the solution space defined as
, where
denotes the set of all possible states W of size n.
Note that
(see Fig 1).
|
|
|
Figure 1: The solution space W and MCMC Diffusion corresponds to moving
within the smaller sub-ellipses of a particular |
We do that by applying a stochastic method
called Monte-Carlo Markov Chains (MCMC). The algorithm has two parts: diffusion
and jumping. Diffusion refers to adjusting values of existing knots
and control points
and corresponds to
moving within the sub-ellipses of on Fig. 1. Jumping refers to addition (birth)
and removal (death) of knots and control points from the spline. As you can see
from Fig 1, the solution space W is interconnected, and any state can be reached from
any other state.
We
move (“diffuse”) control points and knots separately. The control points
need to be moved in two dimensions, while knots only need to
be moved along the parameterized curve. We select one of N existing knots or
control points and use the Gibbs sampler to determine its next position [2].
Points among n proposed are selected according to the following
probability distribution:
,[4]
where Z is
the normalization constant defined as
,
is the error
attributable to moving the current point to location
, and T is the
current temperature (see Section 6). In the simulation we propose positions of
the diffused control point on a 3x3 grid centered on the current location. The
positions for knots are selected on a 1-D grid of 9 points (4 to the left, 4 to
the right and itself).
While diffusion helps significantly reduce the error, using diffusion alone on the spline cannot approximate the observed curve without either underfitting or overfitting it, except in some degenerate cases. Thus we introduce the jumping process, by which we can introduce new points and eliminate existing ones. In order to do this in a systematic way, we use the Metropolis-Hastings algorithm. [6]
Let W be the state of the
system with N control points and W’ be the state of the system
with N+1 control points. ![]()
In the case of birth W is the current state and W’ is the proposed. Then, according to Metropolis-Hastings the probability of acceptance of the birth for the new point will be:
, where
![]()
Here
is the proposal
probability for birth of the new point
, while
is the proposal probability of retracting the newly
generated point. Their values will be discussed later.
The
values for
and
represent posterior
probabilities of spline states W and W’ respectively, given the observed image.
These probabilities cannot be calculated explicitly, so we use the Bayes rule.
![]()
![]()
Here
and
are prior
probabilities of having states W and W’ respectively regardless
of the observed image.
and
are likelihoods
of generating the observed curve, given
the approximation by W. Then the acceptance probability term becomes:
![]()
![]()
For lack of prior knowledge of what our images could look like, we take the prior probability to be IID, so

We design a penalty function for
birth of new points as follows:
![]()
Then
![]()
![]()
![]()
![]()
where
![]()
We define the conditional
probability
for any state W,
where
from equation 2 in
section 4. Then we obtain the following
![]()
Combining this with the prior term we get:
(5a)
![]()
We will propose a new knot first, and then the control that corresponds to it, which turns the joint probability into a product of conditional probabilities:
![]()
The probability of retracting the newly generated point will be
![]()
That is any point of the N+1 in the new state W’ has an equal chance of being retracted.
Adding a new knot and calculating
: The proposal of a
new knot will be based on the piecewise
error vector (error discretized over M partitions of the curve)
, where![]()
I will also be based on the current
partition of the spline (total of N knots) ![]()
Let the new knot be inserted at
with index
. The contribution of the new knot at point
will be
. For simulation
purposes we discretize this newly created spline basis into a vector
. We need to pick the knot, which will maximize the
coincidence between the newly created portion of the spline and the error. Thus
the best position for the new knot is determined by
![]()
Our new knot determines the
approximate location of the new control point. In order maximize the effect of
the control point on the region
, we select the new control point according to the following
rule
![]()
We select
according to the
Gaussian around the center point
,
Where
is the variance along
both the x- and the y- axes and I is the identity matrix. For the simulation we
use the following computation algorithm, where all values operated on are
discrete:
1)
![]()
2)
Figure out the index
based on the value of ![]()
3)
Calculate
by figuring out ![]()
4)
Let ![]()
5)
Find ![]()
6)
From the way we generate the new knot and control point, we can calculate the probabilities


In the case of death W’
is the current state and W is proposed.
According to the Metropolis-Hastings algorithm, the probability of acceptance of death is as follows:
, where
![]()
Using reasoning similar to above we arrive at the following:
![]()
![]()
![]()
![]()
Now we need to find
, which is the probability of getting back to
after it has been removed.

where
is the bin from which we just removed our knot
![]()

Here σ is the same variance as we use for generating new control points.
|
Table
1 – Parameters and their meanings |
|||
|
Parameter |
Its meaning |
Effect of increase of the
parameter’s value on the stochastic process |
Range of acceptable values |
|
κ |
Energy minimization / coding length constant |
Puts more emphasis on dimension reduction as opposed to precision of error. |
Not explicitly determined |
|
λ |
A rough threshold for the number of pixels by which the error needs to be reduced for birth to be accepted, or by which the error is allowed to be increased for death to be accepted |
Makes it more difficult to accept proposed births or deaths |
Near |
|
μ |
Correspondence coefficient |
Puts more emphasis on alignment as opposed to proximity in correspondence calculations. |
10.0-40.0 |
|
Tinit |
Initial temperature |
Increases acceptance of birth/death at initial stages, makes diffusion less discriminatory |
5.0-30.0 |
|
Σ |
Noise tolerance or variance |
Allows for higher tolerance of noise. |
1.0-7.0 |
|
M |
Number of partitions of the curve |
Increases the precision for error calculation as well as processing time. |
100-500 |
The final solution
depends on the following parameters:
. The following table explains each parameter’s effect on the
final solution.
The whole diffusion-jumping process
occurs in sweeps. During each sweep we perform diffusion on all knots
and control points, and at the end of each sweep we propose death or birth with
equal probability (in this simulation 30% each). The total number of sweeps is
set to 750, which is the point at which there is typically no more improvement
in the energy function. The simulation results (initial and final stages) for
several contours are shown on Fig 2. The temperature is initialized at T=Tinit
(see Section 7) and reduced at every step according to
, with
. This consistent temperature reduction is introduced to
create simulated annealing. It allows us to overcome local optima at the
beginning, while settling on the global optimum toward the end. The effect of simulated annealing on the
birth process is such that at the beginning the birth of new points is not very
discriminative with respect to the error function, but as the temperature is
lowered, new points are selected to decrease the error more precisely.