A Stochastic Algorithm for Spline Fitting

 

A Senior Honors Thesis

Presented in partial fulfillment of the requirements for graduation with a distinction in Computer Science in the undergraduate colleges of the Ohio State University

By Aleksandr Dubinskiy

Ohio State University

June 2001

 

 

 

 

Honors examination committee

Dr. Song Chun Zhu, Project Adviser, Computer Science

Dr. Balakrishnan Chandrasekaran, Computer Science

Dr. In Jae Myung, Quantitative Psychology

 


 

 

ABSTRACT

 

One of the biggest problems in Computer Vision is feature detection. A subset of that is animate object classification, for objects from flora and fauna. For recognition or classification of many animate objects, it is sufficient for a human eye to only see a certain projection of an image – without texture, color, or depth information. Certain projections, for example profiles and top-down views are especially conducive to object recognition and classification. Due to the low complexity of such projections compared to actual 3-D objects, it is natural to start animate object classification with silhouettes or contours of frequently occurring animal shapes viewed at angles easily recognizable to the human eye.  As we consider recognition of contours, we realize that every class of objects (e.g. an eagle or a butterfly) has a distinguishing set of features that separates it from objects in other classes. Human perception seems to be extra sensitive to those features, and we can recognize or classify objects based on very few cues. Depending on what object classes are available to us to choose from, the set of features most useful for classification of an animal contour will be different. In order to decide what features are most informative, we first need to decide on a set of features to choose from for a particular class. Thus our short-term task is to design a “feature dictionary” for all object classes. This task requires analyzing every pixel of the image and promises to be very computationally intensive. In order to reduce the computational complexity, we attempt to approximate each contour by using B-splines. This paper concerns itself primarily with such dimension reduction in order to prepare our data for further analysis. We use a probabilistic model of Monte-Carlo Markov Chains (MCMC) in combination with Metropolis-Hastings algorithm in order to produce results that are statistically robust.

1           Background

The long-term goal of this research project is learning the optimal vocabulary of features from a set of animal contours in an unsupervised fashion. Instead of analyzing contours pixel by pixel, we use a set of bases and control points to reduce the amount of information necessary to represent a particular shape. This allows us to encode and analyze meaningful features using less information. The problem then becomes one of minimizing the error introduced by the use of the code. Simultaneously we want to minimize the number of points it takes to approximate the observed curve. These two goals are somewhat contradictory, since increasing the number of points that generate the calculated curve to the number of pixels on the observed curve bring the two curves as close together as possible, and yet such a result does not lead to any dimension reduction. Similarly if we minimize the number of points without regard for error, the calculated curve will bear no resemblance to the observed. Thus we need to strike a balance between the kind of precision we desire and the kind of dimension reduction we will be satisfied with. These issues call for applying the minimal description length principle (MDL), which will be discussed in Section 5.

2           Representation

To generate an approximation to the observed curve we use cubic B-splines. A B-spline is determined by a set of N knots and N control points . The formula for every point on the spline[1] is as follows:

        (1),

 where  and

All the index additions here are subject to a modulus N operation for a spline of N nodes. The standard notation for 3rd degree B-spline will be simplified by setting

The intuitive idea behind splines is every control point controls a specific region of the spline, and we need to optimize the control point locations, their number, as well as the regions they control, determined by the knot values. The region controlled by control point  is . For discussion of representation issues see Section 8

3           Order of points and initialization

The original number of points can be arbitrary, or proportional to the observed curve’s circumference. In either case the order of points along the curve and their position are quite important. We map  onto [0,1) line segment, and set the  original control points are approximately equally spaced along the observed curve with  for better correspondence. The regions they control need to be set up in such a way as to have the control point roughly in the middle. In order for  to be in the middle of  , we need to set  close to the middle between  and , where  is the solution to

4           Error calculation

We take error of our approximation of  by to be the area enclosed between the two curves. Minimizing the error then becomes equivalent to minimizing the area between the two curves.

There are several ways to calculate are between two curves. One way is by using scan line algorithm from graphics, another is a Monte-Carlo simulation. One thing we need to keep in mind however, is that it is easier to minimize area between two curves, one of which is flexible, if we are able to control the area in a piecewise fashion. Then we can insert new points that generate the calculated curve (see death-birth process) where they are most needed.

The most natural way to calculate area in a piecewise fashion is setting up correspondence between the calculated and observed curves. Let us denote that correspondence by . We take these curves to be closed and parameterized with some . Calculating total area then becomes equivalent to computing . Since we are dealing with a computer simulation, all continuous variables are discretized. We partition the calculated curve into M points equally spaced along the curve with distance  between each adjacent pair.

           (2)

Here  is a discrete version of  and it maps indices of points on the calculated curve to indices of points on the observed curve. Since correspondence is not the main focus of this paper, we put the discussion of how to design the mapping function  in the Appendix A.

5           Stochastic process

If we denote the set of N points that encode  by W, 

and the error attributable to W is Err(W)=Err(Γ), then according to MDL [1], [7] the total coding length (or energy) becomes

     (3)

since along with the N points of the code we need to encode all of the pixels of the error to restore the observed curve  from W. A particular coding length is only relevant within the context of an ensemble of objects. Thus the free parameters κ and r are determined by the prior probability P(W).[2]  At this point we assume P(W) to be independent identically distributed (IID), and k and r are constants. The only prior knowledge we have about the contours is the rough area of the region that contains the contour (denoted by Area) and circumference (denoted by circumf or ).[3] Minimizing Eng(W) becomes the new goal of this paper, while learning the prior probability P(W) is the long-term goal for further work.

The equivalent for finding the minimum of the energy function is maximizing the posterior distribution of , because according to the maximal entropy principle, such a posterior distribution is proportional to the negative logarithm of the energy function [1]:

,        (4)

where K is some constant. We can express the solution in the following formula:

            (5)

Here W is the solution space defined as, where denotes the set of all possible states W of size n. Note that  (see Fig 1).

Figure 1: The solution space W and MCMC

Diffusion corresponds to moving within the smaller sub-ellipses of a particular . Birth is represented by transitions from to , while death is the reverse. MCMC connects all of the possible solutions, starting with those that only have one point to those that have the number of points equal to the number of pixels on the circumference, making search for optima global.

 

 We do that by applying a stochastic method called Monte-Carlo Markov Chains (MCMC). The algorithm has two parts: diffusion and jumping. Diffusion refers to adjusting values of existing knots  and control points  and corresponds to moving within the sub-ellipses of on Fig. 1. Jumping refers to addition (birth) and removal (death) of knots and control points from the spline. As you can see from Fig 1, the solution space W is interconnected, and any state can be reached from any other state.

5.1          Diffusion

We move (“diffuse”) control points and knots separately. The control pointsneed to be moved in two dimensions, while knots only need to be moved along the parameterized curve. We select one of N existing knots or control points and use the Gibbs sampler to determine its next position [2]. Points among n proposed are selected according to the following probability distribution:

,[4]

where Z is the normalization constant defined as ,

 is the error attributable to moving the current point to location , and T  is the current temperature (see Section 6). In the simulation we propose positions of the diffused control point on a 3x3 grid centered on the current location. The positions for knots are selected on a 1-D grid of 9 points (4 to the left, 4 to the right and itself).

5.2         Jumping (Death-Birth)

While diffusion helps significantly reduce the error, using diffusion alone on the spline cannot approximate the observed curve without either underfitting or overfitting it, except in some degenerate cases. Thus we introduce the jumping process, by which we can introduce new points and eliminate existing ones. In order to do this in a systematic way, we use the Metropolis-Hastings algorithm. [6]

Let W be the state of the system with N control points and W’ be the state of the system with N+1 control points.

5.2.1      Birth process

In the case of birth W is the current state and W’ is the proposed. Then, according to Metropolis-Hastings the probability of acceptance of the birth for the new point will be:

, where

Here  is the proposal probability for birth of the new point , while is the proposal probability of retracting the newly generated point. Their values will be discussed later.

The values for and  represent posterior probabilities of spline states W and W’ respectively, given the observed image. These probabilities cannot be calculated explicitly, so we use the Bayes rule.

Here  and  are prior probabilities of having states W and W’ respectively regardless of the observed image.  and  are likelihoods of  generating the observed curve, given the approximation by W. Then the acceptance probability term becomes:

For lack of prior knowledge of what our images could look like, we take the prior probability to be IID, so

We design a penalty function for birth of new points as follows:

Then

                                                   where

[5]

We define the conditional probability  for any state W, where  from equation 2 in section 4. Then we obtain the following

Combining this with the prior term we get:

(5a)

5.2.1.1  Proposal of new points        

We will propose a new knot first, and then the control that corresponds to it, which turns the joint probability into a product of conditional probabilities:

The probability of retracting the newly generated point will be

That is any point of the N+1 in the new state W’ has an equal chance of being retracted.

Adding a new knot and calculating :  The proposal of a new knot will be based on the piecewise error vector (error discretized over M partitions of the curve)

, where

I will also be based on the current partition of the spline (total of N knots)

Let the new knot be inserted at  with index . The contribution of the new knot at point  will be. For simulation purposes we discretize this newly created spline basis into a vector . We need to pick the knot, which will maximize the coincidence between the newly created portion of the spline and the error. Thus the best position for the new knot is determined by

Our new knot determines the approximate location of the new control point. In order maximize the effect of the control point on the region , we select the new control point according to the following rule

We select  according to the Gaussian around the center point

,

Where  is the variance along both the x- and the y- axes and I is the identity matrix. For the simulation we use the following computation algorithm, where all values operated on are discrete:

1)    

2)     Figure out the index based on the value of

3)     Calculate by figuring out

4)     Let

5)     Find

6)      

From the way we generate the new knot and control point, we can calculate the probabilities

5.2.2      Death process

In the case of death W’ is the current state and W is proposed.

According to the Metropolis-Hastings algorithm, the probability of acceptance of death is as follows:

, where

Using reasoning similar to above we arrive at the following:

Now we need to find , which is the probability of getting back to after it has been removed.

where is the bin from which we just removed our knot

Here σ is the same variance as we use for generating new control points.

6           Parameters and their meanings

Table 1 – Parameters and their meanings

 

Parameter

Its meaning

Effect of increase of the parameter’s value on the stochastic process

Range of acceptable values

κ

Energy minimization / coding length constant

Puts more emphasis on dimension reduction as opposed to precision of error.

Not explicitly determined

λ

A rough threshold for the number of pixels by which the error needs to be reduced for birth to be accepted, or by which the error is allowed to be increased for death to be accepted

Makes it more difficult to accept proposed births or deaths

Near of the given contour

μ

Correspondence coefficient

Puts more emphasis on alignment as opposed to proximity in correspondence calculations.

10.0-40.0

Tinit

Initial temperature

Increases acceptance of birth/death at initial stages, makes diffusion less discriminatory

5.0-30.0

Σ

Noise tolerance or variance

Allows for higher tolerance of noise.

1.0-7.0

M

Number of partitions of the curve

Increases the precision for error calculation as well as processing time.

100-500

The final solution depends on the following parameters:. The following table explains each parameter’s effect on the final solution.

 

7           Simulation overview and results

The whole diffusion-jumping process occurs in sweeps. During each sweep we perform diffusion on all knots and control points, and at the end of each sweep we propose death or birth with equal probability (in this simulation 30% each). The total number of sweeps is set to 750, which is the point at which there is typically no more improvement in the energy function. The simulation results (initial and final stages) for several contours are shown on Fig 2. The temperature is initialized at T=Tinit (see Section 7) and reduced at every step according to , with . This consistent temperature reduction is introduced to create simulated annealing. It allows us to overcome local optima at the beginning, while settling on the global optimum toward the end.  The effect of simulated annealing on the birth process is such that at the beginning the birth of new points is not very discriminative with respect to the error function, but as the temperature is lowered, new points are selected to decrease the error more precisely.