TDDE 131 Week 3: Data Visualization

[Critique of last week’s assignments]
[Data Visualization]
[Data Visualization Lab]

Pre-class reading:

6-7:15: Critique of Assignments

Adrian & Natasha:

  • continuous time record of one dog starting to move (binary data) and “deviation from synchronicity” (binary data), somewhat obscuring the mapping of the motion of the other dog
  • dataset was “limited” to match the 30 data point benchmark – keep in mind that these are benchmarks, not strict numbers
  • it took some time to define what synchronicity meant – a different action (e.g., no movement by both) was still synchronous

Duy & Eliott:

  • counted the number of “blips” (magnetic tears) in each 5 second period and the times at which one dog’s penis raised, as a visual cue as to when the dog was breathing
  • these are incompatible datasets – can’t simultaneously map without losing information – but in fact they are optimally suited to what was being measured
  • timing data made the regularity of breathing quite clear
  • the use of penis movement was an interesting choice in using a body motion to detect something not visible (breath)
  • bursts in tear rate – was this correlated with camera movement? turns out that white regions in film undergo magnetic tearing more often (Tara discussed all sorts of things about magnetic tape, heating for archival, etc.)

Melisa & Ryan:

  • determined times of eye blinks from either dog (L or R) and ear flops (L or R) – measure of attentiveness
  • in timing data, there were duplicate events in a given time period (grouped in 1-second steps)
    • this was difficult from the perspective of sequencing
    • in truth film is encoded hour:minute:second:frame where frame can run up to 60 for very high cadence filming
    • good discussion on imprecision of measurement
    • part of this is a limitation in how Google docs encodes time data – an example of the tool shaping the technique
  • some of the physicists were uncomfortable with “words” (L dog, R dog) as measurement; convert to 1 and 2 and there is no problem, even L and R; an interesting observation in the nature and symbolic perception of data
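The sequencing problem above (several events landing in the same 1-second step) goes away if times are kept in the hour:minute:second:frame form and converted to fractional seconds. A minimal sketch, assuming a frame rate of 30 frames per second; the function name and the event list are hypothetical and not taken from anyone's spreadsheet:

```python
def timecode_to_seconds(timecode: str, fps: int = 30) -> float:
    """Convert an 'HH:MM:SS:FF' timecode to seconds.

    fps is an assumed frame rate; use the rate the film was actually shot at.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number {frames} is out of range for {fps} fps")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


# Hypothetical events that all fall within the same 1-second step.
events = [
    ("blink", "L", "00:00:12:04"),
    ("ear flop", "R", "00:00:12:21"),
    ("blink", "R", "00:00:12:09"),
]

# Sorting on the frame-level time recovers an unambiguous sequence.
ordered = sorted(events, key=lambda event: timecode_to_seconds(event[2]))
for name, dog, timecode in ordered:
    print(f"{timecode_to_seconds(timecode):8.3f} s  {dog} dog  {name}")
```

Keeping L and R as labels works fine here, since the sort only ever looks at the time.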

Adriana & Aisha:

  • measured 3 variables: distance between noses (ruler on screen), height of right dog (ruler on screen) and # of head turns as encoded by the time at which a head turn occurred
  • descriptions of variables made clear distinction of perspective: “by the dog on our left”
  • measurement units brought up considerable discussion as #1 was measured in cm and #2 in inches, and both in decimal format
  • Michael had a problem with inches in decimal as it is always quoted with fractions – a different numerical encoding for two measures of length (base 10 vs base 16)
  • uncertainties also included – estimated from uncertainty as to where to measure nose, where top of dog was, etc
  • some discussion on how the size of screen would change these uncertainties
  • counting uncertainties harder to estimate (would we have missed this?)
  • spreadsheet was also color-coded – an aesthetic/clarifying choice, although there was some critique on how colors should be placed (e.g., adding white space between variables)
  • Adam felt the need to correct an error in the time encoding
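The unit discussion above can be made concrete. This is a rough sketch, not the team's actual method: it puts an inch measurement and its uncertainty into centimeters (1 inch is exactly 2.54 cm) and also rewrites a decimal-inch value in sixteenths, the fractional form Michael expected. The function names and sample values are made up for illustration.

```python
from fractions import Fraction

CM_PER_INCH = 2.54  # exact by definition


def inches_to_cm(value_in: float, uncertainty_in: float) -> tuple[float, float]:
    """Scale a measurement and its uncertainty from inches to centimeters."""
    return value_in * CM_PER_INCH, uncertainty_in * CM_PER_INCH


def decimal_inches_to_sixteenths(value_in: float) -> tuple[int, Fraction]:
    """Split a non-negative decimal-inch value into whole inches plus the
    nearest sixteenth of an inch."""
    whole = int(value_in)
    sixteenths = round((value_in - whole) * 16)
    if sixteenths == 16:  # rounded up to the next whole inch
        whole, sixteenths = whole + 1, 0
    return whole, Fraction(sixteenths, 16)


# Hypothetical reading: dog height of 3.75 in, give or take 0.25 in.
height_cm, height_err_cm = inches_to_cm(3.75, 0.25)
print(f"{height_cm:.2f} cm +/- {height_err_cm:.2f} cm")

whole, fraction = decimal_inches_to_sixteenths(3.75)
print(f"{whole} {fraction} in")
```

Note that the uncertainty scales by the same factor as the value, so the relative uncertainty is the same in either unit.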

7:30-8:15: Data Visualization (Tara & Adam)

Gestalt theory of visualization

our experience of the world is neither an unbroken continuum NOR a disorderly blot

in the act of perceiving we organize our experience by similarity, proximity, continuity, and closure; we look for patterns

Classic Gestalt image of duck / rabbit: we can shift our perception to see either duck or rabbit, or both, or neither!

NOTHING has changed, other than our internal direction



Why data visualization?

See notes by Noah Iliinsky:

Tabular data is useful, but visualized data is more intuitive, allows us to discern trends, outliers & gaps – part of our pattern recognition system (for finding food and dodging tigers)


Data visualization is different from infographics – the former is presenting data in its purest form, the latter generally adds additional graphical elements for aesthetic purposes (note: data visualization can still be highly aesthetic).

Data visualizations can be used to both inform and misinform (propagandize)


Cultural underpinnings

Acts of perception are culturally informed

  • we view from left to right
  • we look for information in certain places
  • we are triggered by certain colors (blue/pink in gender, red/blue in politics, orange/green if you’re Irish!)

Vietnam memorial is a good example of a data visualization project that has cultural consideration – how do we organize so many names?  Alphabetically, like a phonebook? There are 600 Smiths, 16 people named James Jones – this destroys the uniqueness of each loss.

Order chosen to be in time – when people died, what cohort they died with – time mapped in space (time series!)

EXPERIENCE of this data is very personal: each person has a location, a name that can be tactilely experienced; stepping back we gain the totality of the event (close-far dichotomy); choice of polished dark granite enforces gravity and encourages self reflection (literally and metaphorically)



Minard’s 1869 visualization of Napoleon’s failed invasion of Moscow 1812-1813:

Starting from left we see the size of army as Napoleon invaded Russia from Poland; brown indicates march in, black indicates retreat.  What is encoded?

  • Size of the army (width of band and numbers)
  • location in 2D space (map)
  • Direction of army movements (split groups, intersections)
  • Temperature during retreat (bottom axis, tagged to positions)
  • Locations of major geographical features and borders

The inclusion of temperature provides the visualization of causation, one of the hardest things to determine and measure;  each drop off in size of army is related to temperature.  Indeed, the attack on Moscow itself led to essentially no casualties!  We also see where major problems occurred (e.g. crossing Berezina, 44% of army is lost).  The use of size and color allows us to feel the full impact of this disaster – only 1 of every 40 soldiers returned.

This figure enhances the explanatory power of time series data by adding a spatial dimension to the design; data are moving over time and space. This is a rich, coherent story made from and with numerical data.


Obfuscating data with hierarchy 

PowerPoint is a common tool for communication, and its very nature can lead to problems of hierarchy. An example is the Columbia disaster, where a piece of foam damaged the tiles of the shuttle, ultimately leading to its disintegration upon reentry.

Boeing tested impact of foam hitting tile, but on a very limited scale: 640x smaller volume than actual impactor – test was invalid

This is obscured in report – last line on this slide, third-level indentation – lower level bullets actually contained the most important information, whereas the higher-level, highlighted bullets were optimistic

Visually, this information is lost

Language – use of “significant” minimizes danger; minimization of real events (“it” = damage to left wing)

Boeing recommended safe return, NASA went with it, and everyone died

Information architecture can mimic the hierarchical architecture of a large bureaucracy

PowerPoint in particular demands brevity and heavy use of acronyms, and forces hierarchical structure

bullet point format is a poor substitute for analysis and technical (narrative) reporting

This FORM is not equipped to present and discuss the complexities of this particular engineering problem


Playing with perception


Tara’s favorite game Closure:

You can only run, jump, and hold a light, and only the things that are lit up exist

This plays with the idea of object permanence, which is fundamental to the way we perceive and interact with the world

Like a child’s game of peek-a-boo

This is a fun way of rethinking, stimulating, reframing HOW we perceive information.


Practical choices for data visualization

Noah Iliinsky’s Visual Encodings:

Position is everything!

Position can encode spatial and temporal relationships, proximity, relations, trajectories, etc.

axes need not be directly mappable to space – tube map of London Underground does not map directly onto locations in the city, but approximate directionality, distance is preserved




Color is problematic!

What is higher, red or purple?  Are they opposite?

How warm exactly is green?

You can guide the eye with a color key, but in many cases the color steps are not linear (see last few shades of red at left)

Shading (light to dark) is more effective at indicating relative ordering (ordinal data) than color, but still not precise
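As a rough illustration of the shading idea, the sketch below assigns each level of an ordered variable a gray value that steps evenly from light to dark. The function name and the endpoint values (230 and 30 on a 0-255 scale) are arbitrary choices for this example. Equal steps in the gray value are only approximately equal steps to the eye, which is one reason shading is still not precise.

```python
def lightness_ramp(n_levels: int, lightest: int = 230, darkest: int = 30) -> list[str]:
    """Return n_levels gray hex colors running from light to dark.

    lightest and darkest are gray values on a 0-255 scale (assumed defaults).
    """
    if n_levels < 1:
        raise ValueError("need at least one level")
    if n_levels == 1:
        return [f"#{lightest:02x}{lightest:02x}{lightest:02x}"]
    colors = []
    for i in range(n_levels):
        # Step linearly from the lightest gray to the darkest gray.
        gray = lightest + round(i * (darkest - lightest) / (n_levels - 1))
        colors.append(f"#{gray:02x}{gray:02x}{gray:02x}")
    return colors


# Five ordered categories, e.g. "very low" through "very high".
for rank, color in enumerate(lightness_ramp(5), start=1):
    print(rank, color)
```

Because every color here differs only in lightness, the ordering reads without a key, which is not true of a red-to-purple hue scale.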

We have very subjective meanings to some color maps (politics, gender, etc)


SketchUp – free 3D visualization tool:

8:30-8:50:  Data Visualization Lab

Take data from assignment last week and create a visualization of the entire data set.


ASSIGNMENT for next week:

You are going to apply what you learned about data visualization and looking at relationships by taking actual astronomical data (of the spectral variety) and producing two pieces: (1) a graphical visualization of the data (include all three columns!) and (2) a non-graphical, non-pictorial representation of the data (can be a performance, a “song”, even cake. mmmm, cake).

The data can be obtained from this link:

Since we didn’t get to it in class, you can learn a little bit about what a stellar spectrum is from this link:  and from the video you were supposed to watch for this week (or better yet, ask your Physics partner).

You can learn more about this source (a T8-type brown dwarf Adam discovered when he was a graduate student) at or in this paper:

