Interest Points at different scales
Interest points are distinctive pieces of image information that help solve one of the central problems in computer vision: correspondence. Correspondence is, in essence, matching points, patches, edges, or regions across images (e.g., finding the same object from different points of view).
Interest points
Interest points = Keypoints = features
Definition: They are points that are distinctive (i.e., they can be easily distinguished from the surrounding texture) and repeatable (i.e., they can be identified even when the camera view changes, so under changes in position, rotation, and scale).
They have many applications:
- Tracking
- Recognition
- 3D reconstruction
1. Find a set of distinctive keypoints in both images (e.g., the eye of the cat).
2. Define a region around each keypoint (e.g., a square).
3. Extract and normalize the region content. We have to find a way to align each region to some canonical orientation; that way corresponding regions will look much more similar.
4. Compute a local descriptor from the normalized region.
5. Match the local descriptors between the two images (a minimal code sketch of the whole pipeline follows below).
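As a rough illustration of these five steps, here is a minimal sketch using OpenCV (assuming an opencv-python build that ships `cv2.SIFT_create`; the file names are placeholders):

```python
import cv2

# Load the two views we want to put into correspondence (placeholder file names).
img1 = cv2.imread("img1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.png", cv2.IMREAD_GRAYSCALE)

# Steps 1-4: detect keypoints and compute normalized local descriptors in one call.
sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# Step 5: match descriptors between the two images (brute force, L2 distance),
# keeping only matches that pass Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(desc1, desc2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```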
A classic example of such a detector is the Harris detector.
Harris Detector
Widely used for corner detection in computer vision. Corners, as distinctive image features, are particularly important because they provide stable points that can be tracked between images. See Locating and Describing Interest Points
To fully understand the Harris corner detection, we’ll break down the math behind it and touch on concepts like eigenvalues and eigenvectors.
1. Second Moment Matrix (Auto-correlation Matrix)
At the heart of the Harris detector is the second moment matrix, also called the structure tensor. This matrix contains information about local image gradients (changes in pixel intensities) and captures how the intensity changes in a small neighborhood around a pixel.
The matrix is formed from the partial derivatives of the image intensities with respect to $x$ and $y$, i.e., the image gradients $I_x$ and $I_y$.
The second moment matrix is denoted as:

$$M = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
To obtain this matrix over a neighborhood (not just at a single pixel), we smooth the gradient products by applying a Gaussian filter. In other words, the matrix becomes

$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$

where $g(\sigma_I)$ is a Gaussian filter with standard deviation $\sigma_I$ and the derivatives are computed at scale $\sigma_D$.
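A minimal sketch of how these smoothed entries could be computed with NumPy and SciPy (the function name `structure_tensor` and the default scale values are illustrative choices, not taken from the lecture):

```python
import numpy as np
from scipy import ndimage

def structure_tensor(img, sigma_d=1.0, sigma_i=2.0):
    """Return the three distinct entries of the smoothed second moment matrix."""
    img = img.astype(np.float64)
    # Derivatives of the Gaussian-smoothed image (differentiation scale sigma_d).
    Ix = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    # Smooth the gradient products over a neighborhood (integration scale sigma_i).
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma_i)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma_i)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma_i)
    return Ixx, Iyy, Ixy
```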
Why Two Different Scales: $\sigma_D$ and $\sigma_I$?
This distinction is important because image structure (edges, corners, textures) can appear at different scales. For instance, you might want to compute the derivatives at a fine scale (the differentiation scale $\sigma_D$, which controls how much the image is smoothed before taking derivatives), while averaging the resulting gradient products over a larger neighborhood (the integration scale $\sigma_I$, which defines the window over which the matrix is accumulated).
2. Eigenvalues and Eigenvectors
To understand why we use eigenvalues, let’s first explain what they are.
- Eigenvectors are directions in which a linear transformation acts by only scaling them (i.e., their direction doesn’t change).
- Eigenvalues are the scaling factors corresponding to those eigenvectors.
The eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $M$ describe how strongly the intensity changes along the two principal directions of the neighborhood:
- If both eigenvalues are large, this indicates a corner (significant intensity change in both directions).
- If one eigenvalue is large and the other is small, this indicates an edge (intensity changes mainly in one direction).
- If both eigenvalues are small, this indicates a flat region (little or no intensity change in any direction).
So, the eigenvalues give us a measure of how the image gradients behave in a neighborhood. For corner detection, we are interested in areas where the intensity changes significantly in both directions, which corresponds to both eigenvalues being large.
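To make this interpretation concrete, here is a tiny sketch that classifies a single $2\times2$ matrix $M$ by its eigenvalues (the threshold value and the example matrix are made up for illustration):

```python
import numpy as np

def classify(M, thresh=1e3):
    """Classify a 2x2 second moment matrix as flat, edge, or corner."""
    lam1, lam2 = np.linalg.eigvalsh(M)   # eigenvalues in ascending order
    if lam2 < thresh:
        return "flat"    # both eigenvalues small
    if lam1 < thresh:
        return "edge"    # only one eigenvalue large
    return "corner"      # both eigenvalues large

# Strong gradient energy in both directions -> corner.
M = np.array([[5e4, 1e3],
              [1e3, 4e4]])
print(classify(M))  # "corner"
```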
3. Cornerness Function
The Harris detector doesn’t explicitly compute the eigenvalues but instead uses an approximation based on the determinant and trace of the matrix $M$, which are much cheaper to compute.
Harris Detector: Mathematics
- We want both eigenvalues to be large and their ratio $\lambda_{\max}/\lambda_{\min}$ to be small (close to one), i.e., the intensity change has to be strong along both axes. We can capture that with a function that combines the determinant and the trace of $M$.
- We know:
$$\det M = \lambda_1 \lambda_2, \qquad \operatorname{trace} M = \lambda_1 + \lambda_2$$
- The Harris detector is defined via the cornerness response (a code sketch follows below):
$$R = \det M - k\,(\operatorname{trace} M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2, \qquad R > t$$
- $k$: empirical constant, $k \approx 0.04$–$0.06$; $t$: threshold.
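Combining the structure tensor with this cornerness function, a sketch of a full Harris response map might look as follows (the parameter defaults and the final threshold are illustrative, not canonical):

```python
import numpy as np
from scipy import ndimage

def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.05):
    """Harris cornerness R = det(M) - k * trace(M)^2 at every pixel."""
    img = img.astype(np.float64)
    # Image gradients at the differentiation scale.
    Ix = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    # Entries of M, smoothed over the integration scale.
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma_i)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma_i)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma_i)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2

# Keep points whose response exceeds a threshold t (value chosen ad hoc here):
# R = harris_response(img)
# corners = np.argwhere(R > 0.01 * R.max())
```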
Automatic Scale selection
In many computer vision tasks, such as feature detection, it’s important to make the method scale-invariant—meaning the features we detect in an image should be identifiable regardless of the scale (or zoom level) of the image. The question of scalability and scale selection comes up when discussing how to detect corresponding features (like corners, blobs, or other keypoints) in images taken from different distances or perspectives.
The Problem: Scale-Invariance in Feature Detection
When you take pictures of the same object from different distances, the features (like corners, edges, and blobs) may appear at different scales. For example, a corner of a building in a wide shot might appear small, but the same corner in a zoomed-in shot will appear larger. If we simply apply feature detectors like the Harris detector at a fixed scale (i.e., using a fixed window size), we might miss features in one image that are clearly visible in another.
To address this, we need a method to:
- Detect features at multiple scales (so that the features can be recognized whether they appear large or small).
- Automatically select the appropriate scale at which a feature is most prominent, allowing us to match corresponding features across images taken at different zoom levels.
This is where automatic scale selection comes into play.
Solution: Calculating Features at Multiple Scales
A simple but effective way to detect features at different scales is to compute features at several scales and then select the optimal one. To do this, we introduce a scale parameter $\sigma$: the image is smoothed with Gaussian kernels of increasing $\sigma$, giving a scale-space representation, and the feature response is evaluated at each of these scales.
Scale-Sensitive Function
The next step is to come up with a function $f(I(x, \sigma))$ of position and scale that takes a clear maximum at the scale which best matches the size of the local structure, so that this characteristic scale can be selected automatically.
What function $f$ should we use?
A commonly used function that fits this criterion is the Difference of Gaussians (DoG), which is an approximation of the Laplacian of Gaussian (LoG). The Difference of Gaussians is computed by subtracting two Gaussian-blurred versions of the image with slightly different values of $\sigma$:

$$DoG(x, \sigma) = \left(G(x, k\sigma) - G(x, \sigma)\right) * I(x)$$

where $G(x, \sigma)$ is a Gaussian kernel with standard deviation $\sigma$, $k$ is a constant factor between neighboring scales, and $*$ denotes convolution.
For each point in the image, we compute the DoG response over multiple scales and track how the value changes as we vary $\sigma$. The scale at which the response reaches a local maximum is taken as the characteristic scale of that point.
By identifying the local maxima of this function in both position and scale, we get a list of interest points together with their characteristic scales.
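A rough sketch of this idea, assuming SciPy is available (the scale sampling and threshold are illustrative and much simpler than what SIFT actually uses):

```python
import numpy as np
from scipy import ndimage

def dog_scale_space(img, sigmas):
    """Stack of Difference-of-Gaussian responses, one layer per scale interval."""
    img = img.astype(np.float64)
    blurred = [ndimage.gaussian_filter(img, s) for s in sigmas]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(len(sigmas) - 1)])

def find_keypoints(dog, thresh=0.02):
    """Local maxima of |DoG| over both position and scale."""
    mag = np.abs(dog)
    # A voxel is a keypoint candidate if it equals the maximum of its
    # 3x3x3 neighborhood in (scale, y, x) and exceeds the threshold.
    local_max = ndimage.maximum_filter(mag, size=3)
    return np.argwhere((mag == local_max) & (mag > thresh))

sigmas = [1.6 * (2 ** (i / 3)) for i in range(6)]   # geometric scale sampling
# dog = dog_scale_space(img, sigmas)
# keypoints = find_keypoints(dog)   # rows of (scale_index, y, x)
```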
Summary
- Automatic scale selection is essential for detecting features in images at different zoom levels or resolutions.
- We calculate features at multiple scales by constructing a scale-space representation of the image, using Gaussian smoothing with different scales $\sigma$.
- The Difference of Gaussians (DoG) is used as the function $f(I(x, \sigma))$, which detects features (blobs) and provides a scale-invariant description.
- By finding local maxima of the DoG function in position-scale space, we can identify features and their corresponding scales in both the original and zoomed-in images.
This approach allows us to match corresponding features across images, regardless of the scale at which they appear.
Orientation Normalization
In addition to scale, we also need to account for rotation to ensure that features are consistently detected regardless of how the object is oriented in the image. The idea is to normalize the orientation of the detected features.
Here’s the process:
Gradient Calculation: After detecting a feature and selecting the appropriate scale, we compute the image gradients (changes in intensity) within a window around the feature. This gives us information about how intensity changes in different directions.
Dominant Orientation: From the gradients, we create a histogram of gradient orientations within the window. The direction with the highest number of gradient responses (the dominant orientation) is chosen as the main orientation for the feature.
Alignment: Once the dominant orientation is identified, the feature is rotated so that this direction points upward (or is standardized). This ensures that the feature will have the same orientation across images, regardless of how the object is rotated.
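Here is a simplified sketch of the dominant-orientation step (the real SIFT implementation additionally weights the histogram with a Gaussian window centered on the keypoint and interpolates the peak; this version omits that):

```python
import numpy as np
from scipy import ndimage

def dominant_orientation(patch, n_bins=36):
    """Return the dominant gradient orientation (radians) of an image patch."""
    patch = patch.astype(np.float64)
    gx = ndimage.sobel(patch, axis=1)   # horizontal gradient
    gy = ndimage.sobel(patch, axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)          # orientations in [-pi, pi]
    # Histogram of orientations, each gradient weighted by its magnitude.
    hist, edges = np.histogram(angle, bins=n_bins,
                               range=(-np.pi, np.pi), weights=magnitude)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])   # center of the peak bin

# The patch is then rotated by minus this angle (e.g. with scipy.ndimage.rotate)
# before the descriptor is computed, so all patches share a canonical orientation.
```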
The SIFT descriptor uses this method to normalize the orientation, ensuring that features can be matched even if the object has been rotated between images. This rotation invariance, combined with scale invariance, makes the SIFT descriptor highly robust for feature matching across various transformations.
Self-Check Questions
- What is an interest point?
- Describe the interest point matching procedure
- How does the Harris point detector work?
- How can we “uniquely” identify the scale of a feature?
- How can we “uniquely” identify the rotation of a feature?