And now for something slightly different: cscheid.net

I’ve decided to start my own domain and website, which will host all my content from now on. I hope to see all five of you at cscheid.net!

VisWeek 2010 Postmortem: First Thoughts

As I wait to board the plane back from Salt Lake City, let me try to collect the last shreds of coherent thought I have left from this great week. Jerome Cukier mentioned on Twitter that he slept less than 30 hours throughout the whole week. I counted 35-40 hours over 7 days; not as bad, but I still can't think about anything very hard.

One trend mentioned in the hallways, and at least unofficially confirmed by attendance data, is that SciVis seems to be having a harder time getting attention than InfoVis (I really don't know about VAST). This matches, at least in part, my experience throughout the week. For example, the paper by Sam Gerber and co-authors on high-dimensional visualization was presented at SciVis, but it seems broadly applicable to InfoVis as well. As a whole, however, we are doing very well, and had record attendance. While VisWeek is nowhere near as large as UIST or CHI, if the trend continues we will soon be flirting with 1000 attendees. That's really great.

To me, the most intriguing papers this year had to do with foundational issues. I already talked about the work of Wickham et al., which tries to bridge the visualization and statistical ways of attacking problems. But I was particularly surprised to learn about the many papers which, in a sense, are actively looking for principles of vis effectiveness. A comment by Pat Hanrahan on the VAST panel struck me as especially important; he said, roughly, that “the community has had enough failures that it is now worth studying them in detail”.

As visualization becomes an approachable tool for data analysts “out there”, we will have to fundamentally shift our way of doing research. This is an exciting future, but it's a different one from what I saw in maybe a third of the papers this year. The way in which the vis community realigns itself along the axis separating basic science from engineering practice is going to have a large impact on the field. We should all pay attention.

VisWeek papers 2: “Graphical Inference for InfoVis”

If I had to pick a single paper from all the VisWeek papers I've read so far, it would easily be “Graphical Inference for InfoVis”, by Wickham, Cook, Hofmann and Buja.

What I like about this paper is that it neatly shows the basic difference between the visualization mindset and the statistical mindset. Visualization folks have the following meta-strategy: “Is there any way that I can build a picture in which the data will tell me something?” Statistics folks, on the other hand, tend to think: “Is there any way that these results about the data are fooling me?” While visualization is immensely helpful in actually finding patterns in the data, it is just as important that we do not fool ourselves, and the fact that VisWeek is bringing back the “Vis Lies” session this year is a testament to the fact that we are all aware of this problem, even if only in the back of our minds.

This paper presents two neat experimental protocols that give you the best of both worlds: using visualizations to bring out patterns, while making sure that the patterns you do find are in the data itself and not an artifact of the way you decided to encode your variables. In passing, the paper also has a great explanation of statistical testing in general, and the single clearest, most accurate description of what p-values really are that I have seen (if you've never heard a good explanation of them, you're likely to get them wrong).
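(For reference, and in my own words rather than the paper's: the textbook definition of a p-value is the probability, computed assuming the null hypothesis is true, of seeing a test statistic at least as extreme as the one actually observed,

p = \Pr(T \ge t_{\mathrm{obs}} \mid H_0).

In particular, it is not the probability that the null hypothesis is true, which is the most common way to get it wrong.)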

I can’t recommend this paper strongly enough. If I had to decide on a single paper this year which I think people will easily remember in 10 years, this would be it.

VisWeek 2010 local recommendations

Since I've spent a little bit of time in Salt Lake, I figure some of you reading this and attending VisWeek might be looking for a few tips. Roughly in order of increasing distance from the Grand America, here are some places you should consider eating at:

  • The Bayou: 200+ types of beers, killer sweet potato fries and alligator cheesecake. Sometimes they have good live music. Gets crowded.
  • Acme Burger: Ostrich burgers! It is meant to be a gourmet burger place, and it's pretty good. (Update: they closed! Sad.)
  • Settebello: Great thin-crust pizza. Their diavolo is awesome; you should go if only to try that.
  • Biaggi's: Yeah, it's a chain, but it's affordable, nonthreatening, tasty Italian on the Gateway (the local “mall”). Their butternut squash ravioli is to die for, along with the crème brûlée, which is cheap but easily beats many tens-of-dollars crèmes brûlées we've had in fancier places.
  • The Counter: Another good burger place, with lots of customization possibilities (you essentially build your own). (Update: wow, closed too. No one in Salt Lake likes burgers…)
  • Red Iguana: Mexican place that specializes in moles. If you like spicy food, go for the mole amarillo made with habaneros. Otherwise, the mole poblano is where it's at. This is the best Mexican food I've ever had (and the place itself has a great hole-in-the-wall vibe), but be prepared for hour-long lines if you get there past 7:00 PM.

The two non-food places you should not miss are the Salt Lake Public Library (fantastic architecture, if you're at all into that type of thing) and Sam Weller's, a great used and rare bookstore close by. They had a copy of the first edition of Darwin's On the Origin of Species last time we went. Pretty great stuff!

VisWeek 2010 papers 1: continuous density plots

In this first post about the upcoming VisWeek 2010 papers, I want to talk about two papers in an area I know a little bit about, and which I think is a very nice recent development. Back in 2006, Hamish Carr and two of his students, Brian Duffy and Barry Denby, noted that one could compute histograms of scalar fields differently from the obvious way, and differently in an interesting way: instead of counting node or cell values, there is an equivalent integral expression in terms of a function defined on the isosurfaces. The original paper had a small bug in it, which got fixed in 2008.

In parallel, Bachthaler and Weiskopf generalized the technique to multiple dimensions, which turned it into a real visualization technique. The basic idea of their Continuous Scatterplots is to think of the visualization as mapping mass from the spatial domain to the data domain. Each point in space has a certain data value, and different values occur with different densities; for example, the density of data points at zero altitude is much larger than at an altitude of, say, ten thousand feet. The interesting derivation in their paper is the one for multidimensional datasets; it has since led to several followup papers, e.g. applying the same technique to parallel coordinates and making it more efficient.
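(A rough way to write down the one-dimensional version of this idea, in my own notation rather than any of the papers': by the coarea formula, the continuous density of a scalar field f at the data value h is an integral over the level set, weighted by the inverse gradient magnitude,

C(h) = \int_{f^{-1}(h)} \frac{1}{|\nabla f|} \, dA.

Flat regions, where |\nabla f| is small, therefore dump a lot of mass onto their data values, which is exactly what a histogram of sample counts would also show.)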

(Since these are all preprints, I will not post images of the papers; please refer to the linked files)

The first paper I want to talk about, “Discontinuities in Continuous Scatterplots” by Lehmann and Theisel, adds a new twist to the idea of Continuous Scatterplots. One of the issues that arises when drawing these scatterplots is that the spatial domain gets mapped fairly arbitrarily to the data domain. It is sensible to want some general idea of the relation between the two domains, and one way to get it is to indicate the boundary of the spatial domain. Another interesting feature of these plots is that the spatial domain sometimes folds over itself when the mapping “changes direction” (for an appropriate definition of the term, obviously). It turns out both of these are manifestations of the differential structure of the plot breaking down; they are discontinuities. So the authors develop the math to detect these points and show how to draw them in an NPR-like fashion. Personally, I think the folding patterns are more informative than the boundary ones (again, see the paper for the figures, but I'm thinking specifically of Figure 7c). All in all, a nice combination of theory and pretty pictures, which is what vis research is all about :). One question I'd like to see answered is whether these extra lines help the user; since something very similar to continuous scatterplots is used extensively in transfer function widgets for volume visualization, it would be interesting to see the relation between the boundary and fold lines and the transfer functions users actually create.
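(My hand-waving way of thinking about where the folds live, which is not necessarily the authors' exact criterion: a continuous scatterplot of two fields f_1 and f_2 is the image of the map x \mapsto (f_1(x), f_2(x)), and the differential of that map drops rank exactly where \nabla f_1 and \nabla f_2 become linearly dependent. Those are the places where the image can fold over itself or end at a boundary.)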

The second paper I want to talk about, “On the Fractal Dimension of Isosurfaces” by Khoury and Wenger, has a more theoretical bent. One of the reasons I originally became interested in the 2006 paper of Carr, Duffy and Denby was that their argument makes it possible to actually do math on the histogram computations; we can compute things like the derivative of the histogram. It might also help explain a few questions we don't quite know how to answer. In particular, Khoury and Wenger directly take on the original “mystery” of the 2006 paper: the observation that the measured rate of growth of isosurface statistics was much too large to be explained by what we thought was the behavior of isosurfaces inside volumes. In 2008, we noticed that scalar noise seemed to have a strong effect on these statistics, but we didn't quite know how to analyze it directly. That analysis, I am very happy to say, is exactly what Khoury and Wenger did: they found the right way to compute the characteristics of the particular noise model we used in 2008. With their analysis, they can directly explain and predict the impact of noise on a related statistic: the fractal dimension of a set. One interesting application of this is in noise reduction. One of the problems in denoising scalar fields is deciding when to stop: have we smoothed away all the good stuff, or have we only thrown away noise? Perhaps this framework can be used to inform the stopping time of these smoothing techniques.
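(For what it's worth, the flavor of fractal dimension I have in mind is the box-counting one; I'd have to check whether it is exactly the definition Khoury and Wenger use. Cover the set with boxes of side length \epsilon, count the number N(\epsilon) of boxes needed, and take

d = \lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log(1/\epsilon)}.

A smooth surface in 3D gives d = 2; the intuition is that noise makes isosurfaces crumpled and closer to space-filling, nudging d up toward 3.)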

One question I didn't see answered in Khoury and Wenger's work is a subtle point we raised in our 2008 paper, perhaps not very clearly. Specifically, there are many ways one can take “averages over all isosurfaces”, and unfortunately they give different results. Counting all isosurfaces by sampling the isovalues uniformly is one possible definition, but it has the disadvantage that if, for example, one changes the scalar field by mapping all the scalars through f(x) = x^3, all the isosurfaces are preserved but the number you arrive at changes. This means that the number you just computed has something to do with the isosurfaces, but it also has something to do with the scalar distribution, and perhaps that's not what was intended. I'd like to see the fractal dimension computation with the average taken in a way that is invariant to these mappings; would the results be different, and how so? The easiest way to enforce this invariance is to equalize the histogram of the scalar field before computing the statistics; our 2008 paper shows why this works. If one is worried about quantization artifacts, there's an alternative Monte Carlo integration technique as well: uniformly sample points in the scalar field and use their isovalues.
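(A minimal sketch of the rank-based equalization I have in mind, assuming the scalar field is just a flat array of at least two values; a real implementation would have to worry about ties and about doing this on the actual grid:)

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Replace each scalar by its normalized rank, so that the histogram of the
// result is (approximately) uniform. Any monotone remapping of the input,
// such as x -> x^3, leaves the ranks, and hence any statistic computed on
// the equalized field, unchanged.
std::vector<float> equalize(const std::vector<float>& field) {
  std::vector<std::size_t> order(field.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return field[a] < field[b]; });
  std::vector<float> result(field.size());
  for (std::size_t rank = 0; rank < order.size(); ++rank)
    result[order[rank]] = float(rank) / float(order.size() - 1);
  return result;
}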

This is very nice work, and it shows how much there is still left to understand about isosurfaces. One direct avenue for future work is to see whether we can use this to understand noise in vector fields. Perhaps vector field experts can answer a question I have: what are good decompositions of vector fields into coordinate-free scalar fields? I know about the Helmholtz decomposition, which works nicely in 2D, giving a potential field and a vorticity field. But in 3D it gives a scalar field and a vector potential, which does not solve the problem. Still, 2D might already be a starting point.
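(For concreteness, the decomposition I mean, in the usual notation: in 2D a vector field splits as

v = \nabla \phi + \nabla^{\perp} \psi,

a gradient of a scalar potential \phi plus the rotated gradient of a scalar stream function \psi, so everything is scalar. In 3D the analogous split is

v = \nabla \phi + \nabla \times A,

and the second piece A is a vector potential, not a scalar, which is exactly why this does not solve the problem.)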

Next up, we’ll turn to more traditional visualization topics, and in particular about evaluation of visualizations. This one paper I have in mind might be one of the best Vis papers I’ve seen in the last five years, so stay tuned!

Liveblogging and the iPad

While I wait to tie up a few loose ends before posting my first entry in the series on VisWeek 2010 papers, let me ask you all this question: have you tried liveblogging with your iPad? I've done it in the past with my laptop, and I am confident I can type fast enough to keep track of most of what is going on. However, I know my typing on the iPad is terrible. I was really not planning on taking my laptop with me this year, since I can do pretty much everything else on my iPad, but I think this might be a real problem.

Have you tried something like this with your iPad at another conference? Did you train yourself with a typing app before going? I know at least a handful of you have iPads, so I'd love to hear any ideas. 🙂

VisWeek papers

(Considering the bulk of my posting material, I might as well start calling this the ‘VisWeek’ blog or something like that.)

The program for VisWeek 2010 is now out, and there are more interesting papers this year than I ever remember seeing. In no particular order, here’s a list of papers I’ll try to find and read before VisWeek:

  • Marc Khoury, Rephael Wenger. On the Fractal Dimension of Isosurfaces
  • Min Chen, Heike Jänicke. An Information-theoretic Framework for Visualization
  • Lijie Xu, Teng-Yok Lee, Han-Wei Shen. An Information-Theoretic Framework for Flow Visualization
  • Paul Bendich, Herbert Edelsbrunner, Michael Kerber. Computing Robustness and Persistence for Images
  • Samuel Gerber, Peer-Timo Bremer, Valerio Pascucci, Ross Whitaker. Visual Exploration of High Dimensional Scalar Functions
  • Dirk J. Lehmann, Holger Theisel. Discontinuities in Continuous Scatterplots
  • Usman Alim, Torsten Möller, Laurent Condat. Gradient Estimation Revitalized
  • Marco Ament, Daniel Weiskopf, Hamish Carr. Direct Interval Volume Visualization
  • Ziyi Zheng, Wei Xu, Klaus Mueller. VDVR: Verifiable Visualization of Projection-Based Data
  • Waqas Javed, Bryan McDonnel, Niklas Elmqvist. Graphical Perception of Multiple Time Series
  • Stephan Diehl, Fabian Beck, Michael Burch. Uncovering Strengths and Weaknesses of Radial Visualizations-an Empirical Approach
  • Lars Grammel, Melanie Tory, Margaret-Anne Storey. How Information Visualization Novices Construct Visualizations
  • Hoi Ying Tsang, Melanie Tory, Colin Swindells. eSeeTrack-Visualizing Sequential Fixation Patterns
  • Rita Borgo, Karl Proctor, Min Chen, Heike Jänicke, Tavi Murray, Ian M. Thornton. Evaluating the Impact of Task Demands and Block Resolution on the Effectiveness of Pixel-based Visualization
  • Hadley Wickham, Dianne Cook, Heike Hofmann, Andreas Buja. Graphical Inference for Infovis
  • David Feng, Lester Kwock, Yueh Lee, Russell M. Taylor II. Matching Visual Saliency to Confidence in Plots of Uncertain Data
  • Nicholas Kong, Jeffrey Heer, Maneesh Agrawala. Perceptual Guidelines for Creating Rectangular Treemaps
  • Zhicheng Liu, John T. Stasko. Mental Models, Visual Reasoning and Interaction in Information Visualization: A Top-down Perspective
  • Caroline Ziemkiewicz, Robert Kosara. Laws of Attraction: From Perceived Forces to Conceptual Similarity
  • Aritra Dasgupta, Robert Kosara. Pargnostics: Screen-Space Metrics for Parallel Coordinates
  • Stefan Jänicke, Christian Heine, Marc Hellmuth, Peter F. Stadler, Gerik Scheuermann. Visualization of Graph Products
  • Stephen Ingram, Tamara Munzner, Veronika Irvine, Melanie Tory, Steven Bergner, Torsten Möller. DimStiller: Workflows for Dimensional Analysis and Reduction
  • Yu-Hsuan Chan, Carlos D. Correa, Kwan-Liu Ma. Flow-based Scatterplots for Sensitivity Analysis
  • Daniela Oelke, David Spretke, Andreas Stoffel, Daniel A. Keim. Visual Readability Analysis: How to Make Your Writings Easier to Read

I don't know if all of these papers are available online, but I'll put up a link to each one as I track it down and read it. Let me know if you think I missed anything tremendously exciting: I collected these from colleague suggestions and previous knowledge of the work, but mostly from the titles and authors. And, obviously, this list is biased. There might be many other lists like it, but this one is mine.

Your graphics puzzle for the day

Just the other day, I wrote a trivial geometry shader to tessellate cubic 2D Bézier curves on the fly, so we get nice curved edges for free. Remembering that the geometry shader sits between the vertex shader and the fragment shader, and that its purpose is to create new geometry on the fly, the code looked like this (my apologies to the readers who speak Direct3D-ese – I only speak OpenGL):

vec4 pp[4] = ...; // control points
for (int i = 0; i <= 10; ++i) {
  float u = float(i) / 10.0; // interpolation parameter in [0, 1]
  float omu = 1.0 - u;
  vec4 spline_p = pp[0] * omu * omu * omu +
                  pp[1] * omu * omu * u +
                  pp[2] * omu * u   * u +
                  pp[3] * u   * u   * u;
  gl_Position = spline_p; // pass the curve point along to the rasterizer
  EmitVertex();
}
EndPrimitive();

This worked nicely, generated really good-looking splines, and was fast. Then I wanted to backport it to the CPU for the case where the user doesn't have geometry shaders. I could have used OpenGL evaluators, but this seemed simple enough, right? Like this:

float pp[4][2] = ...; // control points
for (int i = 0; i <= 10; ++i) {
  float u = i / 10.0f; // interpolation parameter in [0, 1]
  float omu = 1.0f - u;
  float spline_p[2] = {
    pp[0][0] * omu * omu * omu +
    pp[1][0] * omu * omu * u +
    pp[2][0] * omu * u   * u +
    pp[3][0] * u   * u   * u,
    pp[0][1] * omu * omu * omu +
    pp[1][1] * omu * omu * u +
    pp[2][1] * omu * u   * u +
    pp[3][1] * u   * u   * u };
  glVertex2fv(spline_p);
}

No matter what I tried, this wouldn't work – the lines came out garbled. The endpoints are in the right places, but the interior vertices tend to bend towards a common point. The bug is fairly obvious: the second and third cubic Bernstein polynomials are wrong (I was missing a factor of 3 there).
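(For reference, the correct cubic Bernstein basis, which both snippets above get wrong in the same way:

B_0(u) = (1-u)^3, \quad B_1(u) = 3u(1-u)^2, \quad B_2(u) = 3u^2(1-u), \quad B_3(u) = u^3,

and the Bézier curve is C(u) = \sum_i B_i(u)\, p_i.)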

Now for the puzzle: why did the lines look right on the geometry shader?

The Euler characteristic of product spaces

The Euler characteristic is a nifty function that pops up all over the place in math and geometry, and has all kinds of neat properties. It gives any topological space (think triangle meshes and simplicial complexes, their n-dimensional friends) an integer, and, more importantly, it is a topological invariant: if two spaces can be bent into one another, then they will have the same Euler characteristic. The traditional formula for the Euler characteristic is an alternating sum of simplices, so for a triangle mesh M, \chi(M) = V - E + F. If you have tetrahedra, pentatopes, etc. in your mesh, they would just keep alternating signs.

The Euler characteristic kind of looks like the cardinality of a set. First, it obeys the straightforward property that \chi(A \cup B) = \chi(A) + \chi(B) - \chi(A \cap B) (keeping the intuition of triangle meshes: simply count up the simplices).

Amazingly, it also shares the following property of set cardinality: \chi(A \times B) = \chi(A) \chi(B). This formula makes sense for finite sets (using the Cartesian product), but it is sort of crazy for more general spaces, because it's easy to come up with spaces of negative Euler characteristic. If you dig up the proofs of this, however, they tend to be a lot more complicated and give very little intuition. So here's a simple proof that works for CW complexes. CW complexes are sort of like simplicial complexes, but we let things like “4-sided triangles” exist. We define the “d-simplices” (they're actually called d-cells) by saying that 1) points are 0-dimensional cells, just like 0-dimensional simplices, and 2) n-dimensional cells are bent “n-dimensional hypercubes” whose boundaries must be a union of (n-1)-dimensional cells.

For CW complexes, the Euler characteristic is computed by an alternating sum of d-cells, in the same way as for simplicial complexes. No big surprise, since simplicial complexes are CW complexes. In addition, products of CW complexes are much easier to think about than products of simplicial complexes. For example, consider two spaces A and B, each consisting of a single edge and two vertices. The product A \times B should be, intuitively, a “square” with 4 vertices, 4 edges, and an internal face. A simplicial complex can only have triangles as two-dimensional cells, so the square would have to be split up, and the calculation becomes a little more complicated.

Now let's raise the abstraction level just a little bit. Instead of thinking of the Euler characteristic directly as an alternating sum, we will think of it as the evaluation of a specially crafted polynomial computed from the space. This polynomial is really pretty idiotic: for the spaces A and B above, it is P(A) = P(B) = 2 + d. That is, the degree of each monomial represents a dimension, and its coefficient counts the cells of that dimension. The Euler characteristic, then, is found by simply evaluating this polynomial at d = -1.

Now for the cool part. Look at the polynomial P(A \times B) = 4 + 4d + d^2. If you think about how the product space is built, you will see that you get one (a+b)-dimensional cell for every a-dimensional cell in A and every b-dimensional cell in B. The simplest way to express this is as a product of the two polynomials! So P(A \times B) = P(A) P(B), and since I have hopefully convinced you that the Euler characteristic is just an evaluation of P, it is “obvious” that \chi(A \times B) = \chi(A)\chi(B). So the “magic” comes from the rule used to create the product space. Really neat.
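(In symbols, just restating the argument: if c_k(X) counts the k-cells of X, write P_X(d) = \sum_k c_k(X)\, d^k, so that \chi(X) = P_X(-1). The product rule for cells says c_n(A \times B) = \sum_{a+b=n} c_a(A)\, c_b(B), which is exactly the coefficient rule for multiplying polynomials; hence P_{A \times B} = P_A P_B, and evaluating at d = -1 gives \chi(A \times B) = \chi(A)\,\chi(B). For the square: P_A = P_B = 2 + d, so \chi(A) = \chi(B) = 1, and P_{A \times B} = (2+d)^2 = 4 + 4d + d^2, which evaluates to 4 - 4 + 1 = 1 = 1 \cdot 1.)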

(Props to Jacob for giving me the original idea for this)

2-adic integers are awesome

Over at A Neighborhood of Infinity, Dan Piponi has a post about a little bit of magic that happens when you ask gcc to do a division by 7.
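(I won't reproduce Dan's post, but here is the flavor of the trick as I remember it, in a sketch of my own rather than his code: the constant 0xB6DB6DB7 is the multiplicative inverse of 7 mod 2^32, so multiplying by it divides by 7 exactly, with no division instruction, as long as the input really is a multiple of 7.)

#include <cstdint>
#include <cstdio>

// 7 * 0xB6DB6DB7 == 1 (mod 2^32): the constant is the 2-adic inverse of 7,
// truncated to 32 bits. Unsigned multiplication wraps mod 2^32, so this
// computes x / 7 exactly -- but only when x is actually a multiple of 7.
std::uint32_t div7_exact(std::uint32_t x) {
  return x * 0xB6DB6DB7u;
}

int main() {
  std::printf("%u %u %u\n", div7_exact(7), div7_exact(49), div7_exact(7000));
  // prints: 1 7 1000
  return 0;
}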

This would have nothing to do with visualization, except for a realization I had in the shower just the other day: I believe the right way to pick good numbers for an axis scale is to think about the problem in a p-adic sense. “Small”, in this case, means pretty much exactly “close to a round number”. The usual descriptions of the algorithms for coming up with a good set of axis markings (at least as I remember them!) are pretty nasty; maybe p-adic numbers can help.