ECE4580: Formative Questions

The questions below are meant to highlight key aspects or principles related to the reading. Being able to answer them before that week's lectures should give you a nominal understanding of what will be covered, so that you can focus on understanding the material and listening rather than on copying the written material from the board. Some of the questions may be slight variations on each other. Many involve answering multiple questions, so it may be better to think of each question as a question set, where sometimes the set consists of a single element.

Topic 1: Image Formation

Question 1: Describing in English rather than in mathematical form, answer the following questions:

What differentiates the orthographic projection equations from the perspective projection equations?
When taking a digital picture, quantization of the sensed scene is necessary. Why is that? What does the quantization lead to as a measurement?
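
A minimal sketch of what quantization does to a measurement. It assumes an 8-bit sensor whose continuous irradiance has already been normalized to [0, 1]. The function names `quantize` and `pixel_index` and the 8-bit depth are illustrative choices, not part of the course notes.

```python
def quantize(value, bits=8):
    """Map a continuous intensity in [0, 1] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits
    # Clamp to the valid range (saturation), then scale and round to the nearest level.
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * (levels - 1))


def pixel_index(x, pixel_size):
    """Map a continuous sensor-plane coordinate to the index of the pixel that contains it."""
    return int(x // pixel_size)


print([quantize(v) for v in (0.0, 0.5, 1.0)])  # [0, 128, 255]
print(pixel_index(3.4, 1.0))                    # 3
```

Both intensity and position become integers, so any two inputs that fall in the same bin produce the same recorded value.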

Question 2: Describing in English rather than in mathematical form, answer the following questions:

What are the perspective projection equations about?
What kind of functional relationship do they have as a function of distance?
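
A short sketch that compares the two projection models as the depth of a point changes. It assumes camera-frame coordinates (X, Y, Z) and a pinhole camera with focal length f. The function names are hypothetical.

```python
def perspective(X, Y, Z, f):
    """Pinhole perspective projection: image coordinates scale with f / Z."""
    return (f * X / Z, f * Y / Z)


def orthographic(X, Y, Z):
    """Orthographic projection: Z is dropped, so depth has no effect."""
    return (X, Y)


# The same lateral offset X = 1 viewed at increasing depths (f = 1 for simplicity).
for Z in (2.0, 4.0, 8.0):
    print(Z, perspective(1.0, 0.0, Z, 1.0), orthographic(1.0, 0.0, Z))
# 2.0 (0.5, 0.0) (1.0, 0.0)
# 4.0 (0.25, 0.0) (1.0, 0.0)
# 8.0 (0.125, 0.0) (1.0, 0.0)
```

Under perspective projection, doubling the depth halves the image coordinate. Under orthographic projection, the image coordinate stays the same.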

Question 3: In English, or in more mathy terms if needed, answer the following to the best of our abilities:

What are homogeneous coordinates for rays?
What is homogeneous matrix representation?
Why is the homogeneous representation for rays so useful?

Question 4: What connection does the homogeneous form of rays have to the camera projection equations? (related to Question 4.3 part 3)
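
A minimal sketch of homogeneous ray coordinates. It assumes normalized image coordinates, meaning a unit focal length and no pixel scaling. The helper names are hypothetical. The sketch shows that every nonzero multiple of (x, y, 1) names the same ray through the camera center, and that dividing by the last entry recovers the image point.

```python
def to_homogeneous(x, y):
    """Lift an image point (x, y) to homogeneous ray coordinates (x, y, 1)."""
    return (x, y, 1.0)


def from_homogeneous(h):
    """Recover the image point by dividing by the last coordinate."""
    x, y, w = h
    if w == 0:
        raise ValueError("w = 0: point at infinity, no finite image point")
    return (x / w, y / w)


def scale(h, s):
    """Multiply every homogeneous coordinate by the same nonzero scalar s."""
    return tuple(s * c for c in h)


ray = to_homogeneous(0.5, -0.25)
# Any nonzero multiple describes the same ray through the camera center.
print(from_homogeneous(scale(ray, 4.0)))   # (0.5, -0.25)

# A camera-frame 3D point (X, Y, Z) lies on that ray; dividing by Z is the
# perspective projection with unit focal length.
print(from_homogeneous((2.0, -1.0, 4.0)))  # (0.5, -0.25)
```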

Topic 2: Camera Geometry

Question 1: In English, or in more mathy terms if needed, answer the following to the best of your abilities:

What are homogeneous coordinates for points?
What is homogeneous matrix representation?
Why is the homogeneous representation (for points/matrices) so useful?

Question 2: What connection does the homogeneous form of points/vectors have to the camera projection equations?
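
A small sketch of why homogeneous points help. Written as (X, Y, Z, 1), a rotation plus a translation becomes a single 4x4 matrix, and a chain of rigid motions becomes a matrix product. It assumes rotation about the z-axis only, to keep the example short. The helper names are hypothetical.

```python
import math


def rigid_transform(theta, t):
    """4x4 homogeneous matrix: rotate by theta about the z-axis, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0, t[0]],
        [s, c, 0.0, t[1]],
        [0.0, 0.0, 1.0, t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(T, p):
    """Apply a 4x4 transform to the 3D point p written as (X, Y, Z, 1)."""
    out = matmul(T, [[p[0]], [p[1]], [p[2]], [1.0]])
    return tuple(out[i][0] / out[3][0] for i in range(3))


T1 = rigid_transform(0.0, (1.0, 0.0, 0.0))          # pure translation
T2 = rigid_transform(math.pi / 2, (0.0, 0.0, 5.0))  # 90 degree turn, then lift
p = (1.0, 2.0, 3.0)
# Applying T1 and then T2 matches applying the single composed matrix T2 @ T1.
print(apply(T2, apply(T1, p)), apply(matmul(T2, T1), p))
```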

Question 3: I would consider the mathematical model for mapping a point in space onto an imaging sensor to consist of three steps. What do the three steps do?

Question 4: These three steps each involve parameters or constants that need to be known. One set of constants is called “intrinsic parameters” and the other is called “extrinsic parameters.” What are these two sets of parameters, and how do they relate to the three steps?
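
A sketch of one common way to split the camera model into three steps, written as separate functions. The assumptions: the world-to-camera convention is Pc = R Pw + t, the projection step uses unit focal length, and all pixel scaling lives in the intrinsic step. The course's own split may place the focal length in a different step. The function and parameter names are hypothetical.

```python
def world_to_camera(Pw, R, t):
    """Step 1 (extrinsic parameters R, t): express the world point in the camera frame."""
    return tuple(sum(R[i][j] * Pw[j] for j in range(3)) + t[i] for i in range(3))


def project(Pc):
    """Step 2 (no parameters here): perspective division onto the normalized image plane."""
    X, Y, Z = Pc
    return (X / Z, Y / Z)


def to_pixels(xn, fx, fy, cx, cy, skew=0.0):
    """Step 3 (intrinsic parameters): scale, skew and shift into pixel coordinates."""
    x, y = xn
    return (fx * x + skew * y + cx, fy * y + cy)


I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Pc = world_to_camera((1.0, -0.5, 0.0), I, (0.0, 0.0, 4.0))
print(to_pixels(project(Pc), 400.0, 400.0, 320.0, 240.0))  # (420.0, 190.0)
```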

Topic 3: Camera Calibration

Question 1: In class, I will make some efforts to explain a linear calibration strategy. How is it that the traditionally non-linear projection equations would lead to a linear set of equations? In particular, what are the unknown values and how do they appear in the system of equations?

Question 2: There is a nice document in the reading schedule that covers the Direct Linear Transform (DLT). The DLT uses SVD, but not exactly as has been done in the homework. Rather, the technique uses a trick to arrive at a system of equations equal to zero.

What trick is used to generate a zero on one side?
The trick means that the matrix is no longer linear in the known variables. What kind of dependence on the variables does the matrix now have?
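
A minimal DLT sketch with NumPy. It assumes noise-free correspondences between at least six non-coplanar world points and their pixel coordinates. Each correspondence contributes two rows of a homogeneous system A m = 0 in the 12 entries of the camera matrix. These rows match, up to sign and ordering, two rows of the cross-product constraint, though the reading may order or sign them differently. The function name is hypothetical. In practice, normalizing the coordinates first (as in Hartley's normalized DLT) improves conditioning; the sketch skips that step.

```python
import numpy as np


def dlt_camera_matrix(X_world, x_img):
    """Estimate a 3x4 camera matrix (up to scale) from n >= 6 world/pixel correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(X_world, x_img):
        Xh = [X, Y, Z, 1.0]
        # Two equations per correspondence; the zero lives on the right-hand side.
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    if len(rows) < 12:
        raise ValueError("need at least 6 correspondences")
    A = np.asarray(rows, dtype=float)
    # The unit-norm least-squares solution of A m = 0 is the right singular
    # vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)


# Synthetic check: project random points with a known matrix, then recover it.
M_true = np.array([[400.0, 0.0, 320.0, 10.0], [0.0, 400.0, 240.0, -20.0], [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
Xw = rng.uniform(-1.0, 1.0, size=(10, 3))
proj = np.hstack([Xw, np.ones((10, 1))]) @ M_true.T
uv = proj[:, :2] / proj[:, 2:3]
M_est = dlt_camera_matrix(Xw, uv)
M_est = M_est / M_est[2, 3] * M_true[2, 3]  # fix the arbitrary scale for comparison
print(np.allclose(M_est, M_true, atol=1e-6))
```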

The recommended reading covers only Sections 2 and 3 of the document. Feel free to skim the remainder, since it has an interesting application (3D reconstruction of a face from two structured light views). There is a 3D scanner in the Inventure studio that uses a similar strategy (i.e., structured light) to generate 3D models of small objects. The early version of the Kinect depth sensor, and several other depth sensors created since, use this form of 3D measurement.

Question 3: Read up on QR decomposition (also called QR factorization). You can use Wikipedia or the appendix of Szeliski.

Given a real-valued matrix A, explain what the QR decomposition of the matrix A is.
How does the decomposition relate to the camera projection matrix M?
What utility would knowing about the QR decomposition have?
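
A sketch of one standard use of QR in calibration. The left 3x3 block of the camera matrix factors as K R, with K upper triangular (intrinsics) and R orthogonal (rotation). That factorization is an RQ decomposition, which the sketch builds from NumPy's `np.linalg.qr` using a row-reversal trick. This is an illustrative approach rather than the method from the notes, and the function name `rq` is hypothetical.

```python
import numpy as np


def rq(A):
    """RQ decomposition A = K @ R (K upper triangular, R orthogonal) built from QR."""
    P = np.flipud(np.eye(A.shape[0]))   # exchange matrix: reverses row order
    Q_t, R_t = np.linalg.qr((P @ A).T)  # QR of the row-reversed, transposed matrix
    K = P @ R_t.T @ P                   # flipping a lower-triangular matrix both ways gives upper-triangular
    R = P @ Q_t.T                       # still orthogonal
    # Force a positive diagonal on K (focal lengths should be positive); D @ D = I.
    D = np.diag(np.sign(np.diag(K)))
    return K @ D, D @ R


theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
K_true = np.array([[500.0, 2.0, 320.0], [0.0, 480.0, 240.0], [0.0, 0.0, 1.0]])
K, R = rq(K_true @ R_true)
print(np.round(K, 6))
```

Because the camera matrix is known only up to scale and sign, you may need to flip the sign of M before decomposing so that det(R) = +1. The translation then follows as t = K^{-1} times the last column of M.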

In my notes, I use the symbol M; however, Szeliski (Sec. 2.1.5) and the DLT reading use the symbol P to denote the camera (projection) matrix.

Topic 4: Stereo and Multiview Geometry

Question 1: Computing depth from stereo, or 3D point recovery from stereo, is known as triangulation. Why is that? How is this triangle visualized?
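
A sketch of the simplest triangulation case. It assumes a rectified stereo pair: two parallel cameras with the same focal length f (in pixels), separated by a baseline B, so a point appears on the same image row in both views. Similar triangles then give Z = f B / d, where d is the disparity. General two-view triangulation (linear or midpoint methods) is not shown. The function name is hypothetical.

```python
def depth_from_disparity(x_left, x_right, f, baseline):
    """Depth of a point seen in a rectified stereo pair (same row in both images)."""
    d = x_left - x_right  # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    # The triangle (left center, right center, point) is similar to the triangle
    # formed by the disparity and the focal length, so Z / B = f / d.
    return f * baseline / d


print(depth_from_disparity(345.0, 320.0, 500.0, 0.1))  # 2.0 (metres, if the baseline is in metres)
```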

Question 2: Given stereo (or two-frame) views, the systems of equations for computing the essential and fundamental matrices use the cross-product trick again. What two rays are used, and what vector is used in the cross product? What property of stereo views is exploited to derive this cross product? What is the difference between the essential matrix and the fundamental matrix?
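
A sketch of the epipolar constraint in normalized coordinates. It assumes the convention X2 = R X1 + t between the two camera frames. Then E = [t]_x R, and a true correspondence satisfies x2^T E x1 = 0. With pixel coordinates and intrinsics K1 and K2, the same constraint uses F = K2^{-T} E K1^{-1}. The helper names are hypothetical.

```python
def skew(t):
    """Matrix [t]_x such that skew(t) times v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def essential(R, t):
    """E = [t]_x R, assuming camera-2 coordinates satisfy X2 = R X1 + t."""
    return matmul(skew(t), R)


def epipolar_residual(E, x1, x2):
    """x2^T E x1 for normalized homogeneous image points; zero for a true correspondence."""
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))


R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
X1 = (0.5, 0.2, 4.0)
X2 = [a + b for a, b in zip(matvec(R, X1), t)]
x1 = [c / X1[2] for c in X1]
x2 = [c / X2[2] for c in X2]
print(epipolar_residual(essential(R, t), x1, x2))  # 0.0
```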

Topic 5: Images as Functions

Question 1: What is a convolution kernel? What relationship is there with Fourier analysis? What purpose would a convolution kernel serve in image processing?
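
A plain-Python sketch of 2D convolution. It assumes zero padding outside the image and an output the same size as the input. The kernel is flipped, so this is true convolution rather than correlation. For symmetric kernels, such as a box blur, the two coincide. The function name is hypothetical.

```python
def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding; the kernel is centered on each pixel."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    # Index the image "backwards" relative to the kernel (the flip).
                    y, x = i - (m - oy), j - (n - ox)
                    if 0 <= y < H and 0 <= x < W:
                        acc += kernel[m][n] * image[y][x]
            out[i][j] = acc
    return out


box = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3x3 averaging (blur) kernel
img = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
print(convolve2d(img, box))
```

Convolving an impulse (a single 1 surrounded by 0s) returns the kernel itself, which is a quick way to check the flip convention.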

Topic 6: Optimization in Computer Vision

Question 1: The simplest and most brute-force method for performing classification is known as nearest neighbor search. Given a collection of data with labels, how does nearest neighbor determine the labels of a new set of data?
There is an extension called k-NN (k nearest neighbors). How does it differ from nearest neighbor?
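
A brute-force sketch of nearest neighbor and k-NN. It assumes Euclidean distance and simple majority voting. On a tie in the vote, `Counter.most_common` favors the label encountered first, which here is the closer one. With k = 1 this reduces to plain nearest neighbor. The function name and data are illustrative.

```python
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Label a query point by majority vote among its k nearest training points."""
    # Squared Euclidean distance to every training point (brute force).
    dists = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, query)), label)
         for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (0.1, 0.1), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b", "b"]
print(knn_predict(X, y, (0.2, 0.2), k=1))  # 'b' (a single mislabeled-looking point wins)
print(knn_predict(X, y, (0.2, 0.2), k=3))  # 'a' (the vote outweighs it)
```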

Question 2: What is a Voronoi diagram (or Voronoi partition)? How does it relate to Lloyd's algorithm? What is Lloyd's algorithm?
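
A sketch of Lloyd's algorithm in its common k-means form. It assumes Euclidean distance and unweighted centroids. The course may use a density-weighted variant, in which each center moves to the weighted centroid of its cell. The assignment step builds the Voronoi partition induced by the current centers. The function name is hypothetical.

```python
def lloyd(points, centers, iters=20):
    """Lloyd's algorithm (k-means) in 2D, starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # Assignment: each point joins the Voronoi cell of its nearest center.
        cells = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)),
                    key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)
            cells[k].append(p)
        # Update: move each center to the centroid of its cell (empty cells stay put).
        new = []
        for c, cell in zip(centers, cells):
            if cell:
                new.append((sum(q[0] for q in cell) / len(cell),
                            sum(q[1] for q in cell) / len(cell)))
            else:
                new.append(c)
        if new == centers:  # converged: the partition no longer changes
            break
        centers = new
    return centers


pts = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
print(lloyd(pts, [(0, 0), (12, 12)]))  # [(1.0, 1.0), (11.0, 11.0)]
```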

Topic 7: Bayesian Statistics in Computer Vision

Question 1: What is Bayes' rule? Why is it important to be aware of Bayes' rule when making decisions based on binary tests?
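
A short sketch of Bayes' rule applied to a binary test. The numbers are hypothetical: a 1% prior, 99% sensitivity P(+ | C), and 95% specificity P(- | not C). The function name is also hypothetical.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = sensitivity * prior                   # P(+ | C) P(C)
    false_pos = (1.0 - specificity) * (1.0 - prior)  # P(+ | not C) P(not C)
    return true_pos / (true_pos + false_pos)         # divide by P(+)


print(posterior_positive(0.01, 0.99, 0.95))  # about 0.167
```

Even with a 99% sensitive test, only about 1 in 6 positive results is real here, because the low prior makes false positives outnumber true ones.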

Topic 8: Optical Flow

Question 1: What is the optical flow constraint? Why is it ill-posed (i.e., degenerate)?

Question 2: What favorite trick do we apply to the optical flow constraint to obtain a well-posed optimization problem for dense optical flow?

Question 3 (optional): Sparse optical flow doesn't have this problem. Why not? (We covered it earlier in class.)
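
A sketch of a sparse, window-based flow estimate in the style of Lucas-Kanade. It assumes that this is the kind of sparse method meant here, and that the spatial and temporal gradients Ix, Iy, It for one window are already computed. Each pixel gives one constraint Ix u + Iy v + It = 0. Stacking the window gives an overdetermined system, solved here through the 2x2 normal equations. The function name is hypothetical.

```python
def lucas_kanade(Ix, Iy, It):
    """Least-squares flow (u, v) for one window from per-pixel gradients."""
    # Normal equations [a b; b c] [u; v] = [p; q].
    a = sum(x * x for x in Ix)
    b = sum(x * y for x, y in zip(Ix, Iy))
    c = sum(y * y for y in Iy)
    p = -sum(x * t for x, t in zip(Ix, It))
    q = -sum(y * t for y, t in zip(Iy, It))
    det = a * c - b * b
    if abs(det) < 1e-12:
        # All gradients point the same way: the aperture problem remains.
        raise ValueError("gradients in the window are (nearly) parallel")
    return (c * p - b * q) / det, (a * q - b * p) / det


# Gradients consistent with a true flow of (u, v) = (1.0, -0.5).
Ix = [1, 0, 1, 2]
Iy = [0, 1, 1, -1]
It = [-1.0, 0.5, -0.5, -2.5]
print(lucas_kanade(Ix, Iy, It))  # (1.0, -0.5)
```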


Patricio Vela: Course Wiki
