# Inner product spaces #1

A while ago I promised a series of posts about inner product spaces and visualization, and how, in particular, the right inner product space can help build a nice visualization of trajectory data. But before we get to that, I want to start with a quick refresher of orthogonality and least squares.

Let’s think of the simplest possible geometric problem in which this arises. I give you a point $(x_1, y_1)$ in $R^2$ and ask you to give me back the closest point to it, but constrained on being in a line given by $k . (x_2, y_2)$. You simply write out the distance as a function of k and minimize it:

$s(k)^2 = (k x_2 - x_1)^2 + (k y_2 - y_1)^2$

$= k^2 x_2^2 - 2k x_2 x_1 + x_1^2 + k^2 y_2^2 - 2k y_2 y_1 + y_1 ^ 2$

$ds/dk = 2 k (x_2^2 + y_2^2) - 2 (x_2 x_1 + y_2 y_1) = 0$

$k = x_2 x_1 + y_2 y_1 / (x_2 ^ 2 + y_2 ^ 2)$

This is trivial enough. However, it is in a sense the wrong way to go about solving this problem. I want to start showing a superficially harder way to solve it that show us a lot more of the geometry of euclidean spaces. Let’s say I give you a candidate point $c = (k_c x_2, k_c y_2)$. This point could be the minimizer of the distance or it could not. How can we know? Intuitively, we can jiggle k a little bit, and see if it changes things. If it does, we move in the direction it helps. If it doesn’t, we’re close to a minimum. There’s two important vectors here: the direction in which the candidate point can move (this is the tangent vector), and the vector that points to $(x_1, y_1)$, which is the direction where $c$ would like to go if it could (this is the error vector). However, it still is possible for c to improve a little, by moving as much in the direction of $(x_1, y_1)$ that it can. In particular, when the direction to $(x_1, y_1)$ is orthogonal to the directions that c can move, we will necessarily be at a minimum.

This is the better way of looking at this problem: instead of trying to minimize the (squared) distance, try to find the point whose error vector is orthogonal to the tangent vector. Denoting the error vector as $e(k) = (x_1, y_1) - k (x_2, y_2)$, we look for $k$ such that $\langle e(k), (x_2, y_2) \rangle = 0$ (the tangent space of a point in a line going through the origin is exactly the vector that defines the line). This gives us

$\langle e(k), (x_2, y_2) \rangle = 0$

$\langle (x_1, y_1) - k (x_2, y_2) , (x_2, y_2) \rangle = 0$

$k = \langle (x_1, y_1), (x_2, y_2) \rangle / \langle (x_2, y_2), (x_2, y_2) \rangle$

Notice how much simpler the calculation is: no mention of distances or derivatives. However, the most important point about the calculation is that we never had to write out those dot products: we just use the dot product’s linearity with respect to vectors to compute k. “But dot products are easy”, you say! Why go through this trouble to avoid writing out the simplest vector expressions we can think of?

It turns out that the common dot product everyone is familiar with is far from the only dot product we can define. In particular, (real) inner products exist as long as they satisfy the following properties:

$\langle x, y \rangle = \langle y, x \rangle$

$\langle x, x \rangle \ge = 0$, with equality iff $x = \vec{0}$

$\langle a x + b y, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle$

It is trivial to see that given vectors $x = ( x_i )$ and $y = ( y_i )$, $s(x, y) = \sum_i x_i y_i$ is an inner product. Here’s a slightly more complicated inner product: $m(x, y) = x^T M y$, where $M$ is a symmetric, positive-definite matrix. But why should we care about these weirder inner products? The main point is that different inner product correspond to different notions of distances between points in space, and some distances are more useful than others.

Let’s think of distances between (grayscale) images. We can define the standard inner product between two images as

$\langle i_1, i_2 \rangle = \int i_1(x, y) i_2(x, y) \; dx\; dy$

However, this distance is not particularly useful. Consider the following examples:

It should be easy to see that all of these images stand in the same distance to each another. However, this is really not what one would intuitively expect. It is also not particularly useful when one cares about the distances between different points in the same image. One would hope that portions of the images that are close together would be factored in the distance metric, so that the distance between the first and the last image is greater than the distance between the first and the second. In the next series of posts, I will show how the right inner product makes talking about these distances meaningful, and finally how to make an inner product that works for trajectory data.