visualization, etc.

The cost of a sample

January 23, 2010 · Leave a Comment

Over at “Please scoop me”, Jonathan Chang analyses the optimal number of samples one should use to estimate an expectation, when you assume that obtaining samples incurs cost.

Someone should write a paper looking at the implications of this type of analysis for distribution ray tracing and Metropolis light transport.

→ Leave a CommentCategories: vis
Tagged:

A simple explanation of the asymptotic decider

January 21, 2010 · Leave a Comment

The asymptotic decider is one of the standard ways of removing ambiguities in Marching Cubes. It’s a neat trick that uses the fact that bilinear interpolation across a face always creates hyperbolas, and so we can pick the value of the interpolation on the asymptote to help decide which of the two possible cases we have. The times I taught SciVis or have seen people teach it, students have always had a hard time understanding it.

It just occurred to me a simple way to explain it that I haven’t heard before. What the asymptotic decider is really doing is performing a single data-driven subdivision of the domain into smaller rectangles, such that the cases inside each subrectangle are unambiguous. The right place to subdivide the domain so that all the cases are necessarily unambiguous for a bilinear interpolant is exactly the asymptote.

→ Leave a CommentCategories: vis
Tagged:

Free beer!

January 11, 2010 · 7 Comments

Pop quiz: what’s the relation between the two images below? If you’re the first person to get it right (and do it before I get bored and tell you what it is), I’ll buy you a beer next time we chat.

EDIT: Erik the party pooper got it right, but I’ll not reveal it just yet.

→ 7 CommentsCategories: vis

Matching Shapes using the Current Distance

January 7, 2010 · Leave a Comment

This is the paper I intend to digest for the next few days: Matching Shapes using the Current Distance, by Sarang Joshi, Raj Varma Kommaraju, Jeff Phillips and Suresh Venkatasubramanian. The current distance is where what I wrote about comparing trajectory data is eventually going. Be sure to read this in case you were interested in what I have been blabbering about here (and want to see what is actually interesting about it :) .

→ Leave a CommentCategories: vis

Arduino hacking: differential drive for piezos and the Tone Library

December 26, 2009 · Leave a Comment

My Christmas present this year was a bunch of Arduino gear, so I have spent some time playing with it. I bought a piezo to see what beeping noises I could make. To get accurate tones, it turns out that you need to use the hardware timers to generate the square waves. The Tone library does exactly that, with a nice interface to boot.

To learn my way around larger pieces of arduino code, I decided to change the tone library so that it optionally uses differential drive. This is how you get twice the voltage, and twice the volume, of a regular pin-to-ground piezo setup. By connecting both ends to different pins and making sure their outputs alternate, we can get 10V edges out of the arduino’s 5V pins. While it is well short of my piezo’s 30V range, it’s loud enough to hear it across the apartment (ie sufficiently loud to annoy your SOs).

Anyway, here’s the patched Tone.cpp file. When creating a Tone object, you have the option of passing an extra parameter which is the second pin. If you ignore it, the code won’t tie up any extra pins (but it will use more memory).

→ Leave a CommentCategories: etc

Interlude: Distances between point sets

December 18, 2009 · Leave a Comment

Before I keep writing about the visualization of trajectory data, I want to show something cool about inner products that I came across quite by accident today.

Imagine you have two sets of points in R^d, a and b, and you want to measure the distance between the two sets of points globally. This comes up all the time when doing clustering. One way of doing this is to consider the distance between the two point sets to be the minimum distance between any point in a and any other point in b: d_M(a, b) = min_{i,j} d(a_i, b_j). This is fine, but it is not a metric: two point sets might be different and have distance zero. You can also look at the max-min distance: d_H(a,b) = max_j min_i d(a_i, b_j). This is not symmetric, but we can then use the symmetric version d_S(a, b) = max(d_H(a, b), d_H(b, a)). This is fine, but it’s not what I want to write about today.

Yet another idea of computing the distance between clusters is to somehow take the average distance between points in a and b. We’ll look at the sum for now, but all of this applies to the average up to a constant factor, so d(a, b) = \sum_i \sum_j d(a_i, b_j). This seems like a reasonable way of computing the distance between clusters, but it takes quadratic time to compute. Can we fix this? Pretty much, and this is what we will do now.

Let’s start thinking about a slightly different distance function, namely, d(a,b)^2 = \sum_i \sum_j d(a_i, b_j)^2. It turns out that this sum can be evaluated exactly in linear time, and it’s very simple to show that.

First, recall that ||x||^2 = \langle x, x \rangle, and so ||a - b||^2 = \langle a - b, a - b \rangle = \langle a, a \rangle - 2\langle a, b \rangle + \langle b, b \rangle. Then,

d(a,b)^2 = \sum_i \sum_j \langle a_i - b_j, a_i - b_j \rangle

d(a,b)^2 = \sum_i \sum_j \langle a_i, a_i \rangle + \sum_i \sum_j \langle b_j, b_j \rangle - 2 \sum_i \sum_j \langle a_i, b_j \rangle

The first two terms of the right hand side can be evaluated in linear time, and what’s interesting about the last sum is that it turns out to be exactly zero. To see that, we first write a_i and b_j in terms of cluster mean and their displacement: a_i = \mu_a + \delta_{ai}. The trick we will use is that \sum_i \delta_{ai} = 0, and, in particular, \sum_i \langle k, \delta_{ai} \rangle = \langle k, \sum_i \delta_{ai} \rangle = \langle k, 0 \rangle = 0. With this in mind everything else is trivial:

\sum_i \sum_j \langle a_i, b_j \rangle = \sum_i \sum_j \langle \mu_a + \delta_{ai}, \mu_b + \delta_{bj} \rangle
\sum_i \sum_j \langle a_i, b_j \rangle = \sum_i \sum_j \langle \mu_a , \mu_b \rangle + \langle \mu_a, \delta_{bj} \rangle + \langle \mu_b, \delta_{ai} \rangle + \langle \delta{ai}, \delta_{bj} \rangle

The last three terms are zero, and so

d(a,b)^2 = |b| \sum_i \langle a_i, a_i \rangle + |a| \sum_j \langle b_j, b_j \rangle - |a| |b| \langle \mu_a, \mu_b \rangle

With a little bit of rearranging, you can also write this as

d(a,b)^2 = |b| \sum_i ||\delta_{ai}||^2 + |a| \sum_j ||\delta_{bj}||^2 + |a| |b| d^2(\mu_a, \mu_b)

This is kind of neat: even though the original sum had a quadratic number of terms, most of them cancel out nicely. However, we can go further. This function we have defined is not really a metric, because d(a, a) \neq 0. We can, in fact, fix this, and it turns out that the natural fix to make it a metric will end up looking very familiar. Looking at that function above, it seems like what we want to do is to somehow get rid of the first couple of terms: they do not actually measure any distance from a to b at all.

We were trying to define the distance between two point sets by summing over the distance of the points. We know, however, how to compute distances from inner products. So let’s instead define an inner product between two point sets, and see what kind of distance function falls out of it. In particular, let’s just use an inner product in perfect analogy with the way we were trying to define distances: \langle a, b \rangle = \sum_i \sum_j \langle a_i, b_j \rangle. Now, we look at what happens with d(a, b)^2 = \langle a, a \rangle + \langle b, b \rangle - 2\langle a, b \rangle:

d(a, b)^2 = \sum_i \sum_j \langle a_i, a_j \rangle + \langle b_i, b_j \rangle - 2 \langle a_i, b_j \rangle

We do the same trick of writing the elements in terms of their means and displacements:

d(a, b)^2 = \sum_i \sum_j \langle \mu_a + \delta_{ai}, \mu_a + \delta_{aj} \rangle + \langle \mu_b + \delta_{bi}, \mu_b + \delta_{bj} \rangle - 2 \langle \mu_a + \delta_{ai}, \mu_b + \delta_{bj} \rangle

Now the biggest difference between this expression and the one before it is that all of the inner products have \delta terms in them, which all cancel out! So the whole thing, in the end, is:

d(a, b)^2 = \sum_i \sum_j \langle \mu_a, \mu_a \rangle + \langle \mu_b, \mu_b \rangle - 2 \langle \mu_a, \mu_b \rangle = \sum_i \sum_j d(\mu_a, \mu_b)^2

So this fancy-looking inner product between clusters we were trying to define turns out to define a distance that is exactly the distance between the two cluster centroids! This was a surprise to me. In a sense, this is a long-winded justification for using the distance between cluster centroids as a distance measure between the entire clusters.

Wait a minute, you say! This means that two clusters will have zero distance even though they are not the same! Yes, this is because that inner product defined about defines, in fact, a semi-Hilbert space, instead of a Hilbert space. This is an inner product space where some nonzero vectors are allowed to have length zero. This happens, essentially, because if you squint hard enough, you can think of the inner product that we wrote as a^T M b, where M is a matrix entirely composed of ones. This means we can also write that inner product as a^T M_1 M_2^T b = (M_1^T a)^T (M_2^T b), where M_1 and M_2 are vectors entirely composed of ones. Notice that the whole inner product becomes an inner product of the sums of the points, which is exactly where the centroids come in. If this matrix M were full rank, then our point-set inner product would, in fact, be a true Hilbert space. More about that in future posts.

→ Leave a CommentCategories: vis

Inner product spaces #1

November 30, 2009 · Leave a Comment

A while ago I promised a series of posts about inner product spaces and visualization, and how, in particular, the right inner product space can help build a nice visualization of trajectory data. But before we get to that, I want to start with a quick refresher of orthogonality and least squares.

Let’s think of the simplest possible geometric problem in which this arises. I give you a point (x_1, y_1) in R^2 and ask you to give me back the closest point to it, but constrained on being in a line given by k . (x_2, y_2). You simply write out the distance as a function of k and minimize it:

s(k)^2 = (k x_2 - x_1)^2 + (k y_2 - y_1)^2

= k^2 x_2^2 - 2k x_2 x_1 + x_1^2 + k^2 y_2^2 - 2k y_2 y_1 + y_1 ^ 2

ds/dk = 2 k (x_2^2 + y_2^2) - 2 (x_2 x_1 + y_2 y_1) = 0

k = x_2 x_1 + y_2 y_1 / (x_2 ^ 2 + y_2 ^ 2)

This is trivial enough. However, it is in a sense the wrong way to go about solving this problem. I want to start showing a superficially harder way to solve it that show us a lot more of the geometry of euclidean spaces. Let’s say I give you a candidate point c = (k_c x_2, k_c y_2). This point could be the minimizer of the distance or it could not. How can we know? Intuitively, we can jiggle k a little bit, and see if it changes things. If it does, we move in the direction it helps. If it doesn’t, we’re close to a minimum. There’s two important vectors here: the direction in which the candidate point can move (this is the tangent vector), and the vector that points to (x_1, y_1), which is the direction where $c$ would like to go if it could (this is the error vector). However, it still is possible for c to improve a little, by moving as much in the direction of (x_1, y_1) that it can. In particular, when the direction to (x_1, y_1) is orthogonal to the directions that c can move, we will necessarily be at a minimum.

This is the better way of looking at this problem: instead of trying to minimize the (squared) distance, try to find the point whose error vector is orthogonal to the tangent vector. Denoting the error vector as e(k) = (x_1, y_1) - k (x_2, y_2), we look for $k$ such that \langle e(k), (x_2, y_2) \rangle = 0 (the tangent space of a point in a line going through the origin is exactly the vector that defines the line). This gives us

\langle e(k), (x_2, y_2) \rangle = 0

\langle (x_1, y_1) - k (x_2, y_2) , (x_2, y_2) \rangle = 0

k = \langle (x_1, y_1), (x_2, y_2) \rangle / \langle (x_2, y_2), (x_2, y_2) \rangle

Notice how much simpler the calculation is: no mention of distances or derivatives. However, the most important point about the calculation is that we never had to write out those dot products: we just use the dot product’s linearity with respect to vectors to compute k. “But dot products are easy”, you say! Why go through this trouble to avoid writing out the simplest vector expressions we can think of?

It turns out that the common dot product everyone is familiar with is far from the only dot product we can define. In particular, (real) inner products exist as long as they satisfy the following properties:

\langle x, y \rangle = \langle y, x \rangle

\langle x, x \rangle \ge = 0, with equality iff x = \vec{0}

\langle a x + b y, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle

It is trivial to see that given vectors x = ( x_i ) and y = ( y_i ), s(x, y) = \sum_i x_i y_i is an inner product. Here’s a slightly more complicated inner product: m(x, y) = x^T M y, where M is a symmetric, positive-definite matrix. But why should we care about these weirder inner products? The main point is that different inner product correspond to different notions of distances between points in space, and some distances are more useful than others.

Let’s think of distances between (grayscale) images. We can define the standard inner product between two images as

\langle i_1, i_2 \rangle = \int i_1(x, y) i_2(x, y) \; dx\; dy

However, this distance is not particularly useful. Consider the following examples:

It should be easy to see that all of these images stand in the same distance to each another. However, this is really not what one would intuitively expect. It is also not particularly useful when one cares about the distances between different points in the same image. One would hope that portions of the images that are close together would be factored in the distance metric, so that the distance between the first and the last image is greater than the distance between the first and the second. In the next series of posts, I will show how the right inner product makes talking about these distances meaningful, and finally how to make an inner product that works for trajectory data.

→ Leave a CommentCategories: vis

Since the ACM envies the bad press the IEEE gets…

November 23, 2009 · Leave a Comment

I have ranted aplenty previously about the sheer stupidity of IEEE Xplore, and it now seems that ACM is trying to steal some of that spotlight. The ACM publications board has decided to shut down some of the most useful resources available on the web for indexing recent conference papers, such as Tim Rowley old famous SIGGRAPH paper webpages, and Ke-Sen Huang’s more recent incredible effort. ACM Publications Board, meet the Streisand effect. I’m sure you’ll get along very well.

Drs Boisvert, Rushmeier and Ozsu:

I have just finished grad school and have started receiving invitations to join the ACM as a member. As I have been a student member of ACM SIGGRAPH for the last couple of years, I have given serious thought to becoming a full member of ACM. I did so despite there not being a single ACM service I could think of that I routinely used day-to-day. Google has won, and the services it provides me are better than your indexing tools. Still, I understand and agree with the ACM goal, stated on your website, to “provide [...] its members and the computing profession with leading-edge publications, conferences, and career resources.”.  As I am sure you are aware and agree with me, computer science research has a huge impact in industry, which in turn changes people’s lives. This is meant to be, and is a good thing. All dissemination of computer science research is morally right. Stopping it is wrong. The goal of the ACM is to push computing forward. With your decision of disallowing listing of recent publications, you are holding it back. You are putting the ACM in a morally wrong position, one which will ultimately lose, and most importantly, will simply not achieve your goals. In fact, what are your goals, exactly?

With this recent decision of disallowing linking to webpages that contain the author’s original work and is clearly meant for dissemination, you put me in the uncomfortable situation of wishing you didn’t exist. If the ACM were not here right now, Google would still exist, and I would still find those papers (as a matter of fact, I still can!). So you turned ACM into a tool for making it harder to disseminate research. Please realize that the proverbial horse has left the barn, about ten years ago. ACM should be setting the example, and it seems to be utterly failing to do so.

Please make me want to be an ACM member again – make me want you to still exist. Reconsider this wrong, ineffective, poorly-conceived decision. People vote with their feet. I know I will.

Yours,

Carlos Scheidegger

Update: Holly Rushmeier (very quickly) replied to my letter saying “a resolution is in the works”, and that we should expect to see the results soon.

→ Leave a CommentCategories: academia · etc

CUDA 2.3 on Ubuntu 9.10

November 16, 2009 · 3 Comments

In case you’ve been pulling hairs over this like me, NVIDIA’s CUDA 2.3 doesn’t work on any platforms that ship with gcc 4.4, here’s how to make it work without having to uninstall gcc 4.4 (which is what most of the NVIDIA forum posts suggest).

NVIDIA seems to hardcode “/usr/bin/gcc” in their tools, in spite of providing hand-written makefiles with the option of changing CC. So even if one installs gcc-4.3 and says “CC=gcc-4.3″ in various different places over their own handwritten makefiles (because why would their own makefiles respect their conventions instead of simply overwriting the options? Much more fun to be had that way), one still gets this:

simpleGL.cpp: In function ‘void runAutoTest()’:
simpleGL.cpp:349: warning: deprecated conversion from string constant to ‘char*’
/usr/include/string.h:43: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/usr/include/string.h:64: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/usr/include/bits/string3.h:49: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/usr/include/bits/string3.h:78: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/usr/local/cuda/bin/../include/common_functions.h:59: error: inline function ‘void* memset(void*, int, size_t)’ cannot be declared weak
/usr/local/cuda/bin/../include/common_functions.h:62: error: inline function ‘void* memcpy(void*, const void*, size_t)’ cannot be declared weak
/usr/local/cuda/bin/../include/math_functions.h:412: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/local/cuda/bin/../include/math_functions.h:417: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathcalls.h:350: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/include/bits/mathinline.h:36: error: inline function ‘int __signbitf(float)’ cannot be declared weak
/usr/include/bits/mathinline.h:42: error: inline function ‘int __signbit(double)’ cannot be declared weak
/usr/include/bits/mathinline.h:48: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
/usr/local/cuda/bin/../include/math_functions.h:442: error: inline function ‘int __signbitl(long double)’ cannot be declared weak
make[1]: *** [obj/release/simpleGL_kernel.cu.o] Error 255

which are errors related to gcc 4.4 (sensibly) not liking inline functions declared as weak symbols. However, how is nvcc even seeing gcc 4.4, you ask? Ah, someone hardcoded /usr/bin/gcc elsewhere in nvcc. So you have to use the command-line option –compiler-bindir (incidentally, you have to specify the full binary path, and not the directory, despite the option being called “…-bindir”). So add “–compiler-bindir=/usr/bin/gcc-4.3″ to NVCCFLAGS in your makefiles,

$ sudo apt-get install g++-4.3

And then it all works. NVIDIA, your Linux driver team kicks ass (and is the primary reason I have been buying NVIDIA since the TNT2). Please tell them to have a chat with the CUDA people :)

→ 3 CommentsCategories: vis

A call for a visualization Twitter hashtag

October 19, 2009 · 4 Comments

The visualization community should have a de facto twitter hashtag for all things vis/infovis/vast/whateveryouwanttocallit. I find that I’m much more likely to post a link if I don’t have to blabber for a couple of paragraphs, and Twitter is perfect for that. And if we’re all posting about the same things on Twitter, we might as well decide on a hashtag. Robert Kosara’s #visweek suggestion was just about perfect, but I’m afraid #visualization is wasting too many characters. I’m fine with #vis, but it might offend the InfoVis folks (TJ, Robert: would you hashtag your posts #vis? I don’t want to have to search for #vis and #infovis, and I hate the distinction anyway)

Here are my suggestions: #visualization, #vis, #v13n.

→ 4 CommentsCategories: vis
Tagged: