Genetic Cage Scores

As we are preparing possible project ideas for our class next Spring, I had a chance to explore a simple Cage score (after John Cage) using as a driver not chance but genetic code.  Gene sequencing has exploded in the past few decades thanks to advancements in technologies and techniques, creating an entire industry (with occasionally dubious aims).  The pace of this advancement is remarkable:  the Human Genome Project took 13 years, from 1990 to 2003, to construct a full sequence of the human genetic code, at a cost of $3 billion; the same sequencing can now be done in about a day for roughly $1,000.

The genetic code is simply that – a code; a sequence of 4 different bases (A, C, G and T, corresponding to the molecules adenine, cytosine, guanine and thymine) matched in pairs, all strung together along the double helix of the DNA molecule.  Among many functions, these bases define what kinds of proteins cells should make, with “words” of three bases (a codon) matching to a specific amino acid (the building blocks of proteins).  A long string of codons (a “gene”) corresponds to a long string of amino acids, which when connected together in the cell make a protein critical for our body to function: hemoglobin, collagen, and insulin are examples of these kinds of proteins.  The human genetic code contains something like 3 billion base pairs (“letters”), or 1 billion codons (“words”), organized into 20,000 genes (“paragraphs”) on (usually) 23 chromosomes (“volumes”).  Stretched out, the DNA in one cell would be about 2 meters long; yet crumpled up it fits in a cell’s nucleus, 5 millionths of a meter in diameter.

Because genetic code is, at its heart, a sequence of instructions, it makes a particularly straightforward starting point for an artistic or performance score.  For this exercise, I decided to do a simple line drawing, using the 3-base codon as my directional “unit”.  Specifically, I made the following decisions based on the letters in a codon:

  • Letter 1: direction of the line: A = up and right, C = down and right, T = down and left, G = up and left (A-T and C-G were chosen to align as these are the base-matched pairs in the DNA molecule)
  • Letter 2: length of line: A = 1, C = 2, T = 3, G = 4
  • Letter 3: color of line: A = black, C = red, T = green, G = blue
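As a rough illustration, the three rules above can be written as lookup tables. This is only a sketch in Python, not the program I actually used: the names (`DIRECTION`, `codon_to_stroke`, and so on) are made up for the example, and it assumes that y increases upward and that the "length" is applied along each axis of the diagonal.

```python
# Rule tables for the three letters of a codon, as listed above.
# Assumption of this sketch: (dx, dy) unit directions with y increasing upward.
DIRECTION = {"A": (1, 1), "C": (1, -1), "T": (-1, -1), "G": (-1, 1)}
LENGTH = {"A": 1, "C": 2, "T": 3, "G": 4}
COLOR = {"A": "black", "C": "red", "T": "green", "G": "blue"}

# Base pairing in the DNA molecule: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def codon_to_stroke(codon):
    """Translate one 3-letter codon into (dx, dy, color) for a single line."""
    first, second, third = codon.upper()
    ux, uy = DIRECTION[first]
    step = LENGTH[second]
    return (ux * step, uy * step, COLOR[third])


def pairs_point_opposite():
    """Check that paired bases (A-T, C-G) were given opposite directions."""
    return all(
        DIRECTION[COMPLEMENT[base]] == (-DIRECTION[base][0], -DIRECTION[base][1])
        for base in DIRECTION
    )


print(codon_to_stroke("AGC"))   # (4, 4, 'red')
print(pairs_point_opposite())   # True
```

So the codon AGC reads as "up and right, length 4, red", and the check confirms that each base points in the opposite direction from its partner.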

Genetic sequences are pretty straightforward to find (just google “genetic sequence of caffeine” and see what comes up!), but my favorite repository is the Registry of Standard Biological Parts hosted at MIT.  The registry maintains a sizeable library of gene sequences (called “BioBricks”) created by various high school and college teams that participate in contests to create biological machines, organized by the International Genetically Engineered Machine (iGEM) Foundation (this is a really cool and slightly scary idea – check out one of the outcomes, the coliroid).  A snapshot of what a BioBrick looks like is shown below (the sequence is obtained on the website by clicking on “Get Selected Sequence”).

Screen shot 2013-02-14 at 11.34.04 PM

I wrote a short program that then read in a sequence, parsed out groups of three bases into steps, and applied the rules listed above.  Here are the results from three examples:
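(Before the pictures, for the curious: a program of this kind might look something like the sketch below. It is a reconstruction in Python under my own assumptions, not the original code. The function names are invented, the drawing starts at the origin, whitespace and letter case are ignored, leftover bases at the end are dropped, and any codon containing a letter other than A, C, G or T is skipped.)

```python
# Same three rules as in the list above; y is assumed to increase upward.
DIRECTION = {"A": (1, 1), "C": (1, -1), "T": (-1, -1), "G": (-1, 1)}
LENGTH = {"A": 1, "C": 2, "T": 3, "G": 4}
COLOR = {"A": "black", "C": "red", "T": "green", "G": "blue"}


def split_codons(raw):
    """Strip whitespace, uppercase, and cut the sequence into 3-base groups."""
    bases = "".join(raw.split()).upper()
    usable = len(bases) - len(bases) % 3  # drop an incomplete trailing codon
    return [bases[i:i + 3] for i in range(0, usable, 3)]


def sequence_to_segments(raw):
    """Return a list of ((x0, y0), (x1, y1), color) line segments."""
    x, y = 0, 0
    segments = []
    for codon in split_codons(raw):
        if any(base not in "ACGT" for base in codon):
            continue  # assumption: skip codons with unknown bases such as N
        ux, uy = DIRECTION[codon[0]]
        step = LENGTH[codon[1]]
        new_x, new_y = x + ux * step, y + uy * step
        segments.append(((x, y), (new_x, new_y), COLOR[codon[2]]))
        x, y = new_x, new_y
    return segments


for segment in sequence_to_segments("atg gcc\nTAA g"):
    print(segment)
# ((0, 0), (3, 3), 'blue')
# ((3, 3), (1, 5), 'red')
# ((1, 5), (0, 4), 'black')
```

Each segment starts where the previous one ended, so the list can be handed to whatever drawing tool is convenient, one colored line at a time.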

Alcohol Acetyltransferase I; aka, the stuff that makes that banana smell (BioBrick by Andre Green II): this is a chemistry favorite – my dad used to make this stuff in high school as a prank (those crazy chemistry students!).  A nice short sequence, only 1581 base pairs (527 codons), makes a fairly compact little swirl with a tail:

Screen shot 2013-02-14 at 9.59.43 PM

The enzymes that make caffeine (BioBrick by Dennis Hell): oh how we love our caffeine, and this group has figured out a sequence that can generate the three enzymes that make this stuff (put it into a yeast cell and voilà, Starbucks is out of business!).  This score makes a couple of sharp turns when it hits each of the enzymes, each of which has its own degree of “crinkliness”.

Screen shot 2013-02-14 at 9.55.04 PM

First 100,000 codons of Human Chromosome 22 (sequence from the Ensembl website; all of the human chromosomes are available at this ftp site): I chose this chromosome because it is one of the smaller ones; still, at 51,304,566 base pairs, I couldn’t process the whole thing in a reasonable amount of time.  With a longer sequence we start to lose some of the detail in the twists, turns and colors, but there are still some interesting variations.  Note for example the long, red, up-right sequence (corresponding to AGCAGCAGC…) about two-thirds in; this may be inter-gene “junk” DNA that doesn’t actually make a protein (I’m just guessing here, though).

Screen shot 2013-02-14 at 9.59.06 PM
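A straight stretch like that AGCAGCAGC… run can also be found without drawing anything, by looking for the longest run of identical consecutive codons. Here is a small sketch of that idea in Python. The function name `longest_codon_run` is my own invention, and it assumes the sequence is already a clean string of A, C, G and T read in the same 3-base frame as the drawing.

```python
from itertools import groupby


def longest_codon_run(sequence):
    """Return (codon, run_length, start_index) for the longest repeat run.

    start_index counts codons (not bases) from the start of the sequence.
    Ties go to the earliest run; an empty sequence gives ("", 0, 0).
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    best = ("", 0, 0)
    index = 0
    for codon, group in groupby(codons):
        run_length = len(list(group))
        if run_length > best[1]:
            best = (codon, run_length, index)
        index += run_length
    return best


print(longest_codon_run("ATGAGCAGCAGCAGCTAA"))  # ('AGC', 4, 1)
```

In the toy sequence above, the four AGC codons starting at codon 1 would draw as one long red line heading up and to the right.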

This is just one application of a genetic score; you can imagine a complementary set of rules for movement, such as:

  • Letter 1: direction to move (A = forward, C = left, T = backward, G = right)
  • Letter 2: size of movement (A = none, C = small, T = normal, G = large)
  • Letter 3: orientation after movement (A = same, C = turn left, T = turn around, G = turn right)
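To get a feel for where a performer would end up, the movement rules can be simulated on a grid. The sketch below is one possible reading, with several assumptions of mine: directions are taken relative to the way the performer is currently facing, "none / small / normal / large" become 0, 1, 2 and 3 grid steps, the performer starts at the origin facing "north", and the input contains only the letters A, C, G and T. All names are hypothetical.

```python
# Compass headings in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

# Quarter-turns clockwise relative to the current facing.
MOVE_OFFSET = {"A": 0, "G": 1, "T": 2, "C": 3}  # forward, right, backward, left
TURN = {"A": 0, "G": 1, "T": 2, "C": 3}         # same, right, around, left

# Assumption: sizes expressed as a number of grid steps.
STEP_SIZE = {"A": 0, "C": 1, "T": 2, "G": 3}    # none, small, normal, large


def perform(sequence, x=0, y=0, facing=0):
    """Return the list of (x, y, facing) states, one per codon plus the start."""
    sequence = sequence.upper()
    states = [(x, y, facing)]
    usable = len(sequence) - len(sequence) % 3
    for i in range(0, usable, 3):
        move, size, turn = sequence[i:i + 3]
        dx, dy = HEADINGS[(facing + MOVE_OFFSET[move]) % 4]
        x += dx * STEP_SIZE[size]
        y += dy * STEP_SIZE[size]
        facing = (facing + TURN[turn]) % 4
        states.append((x, y, facing))
    return states


print(perform("ACGATA"))  # [(0, 0, 0), (0, 1, 1), (2, 1, 1)]
```

Here ACG means "small step forward, then turn right" and ATA means "normal step forward, keep facing the same way", so the performer ends two steps east and one step north of the start, facing east.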

Or even a strange seated, vocalized “dance”:

  • Letter 1: raise a limb (A = left hand, C = left foot, T = right hand, G = right foot)
  • Letter 2: move the head (A = up, C = left, T = down, G = right)
  • Letter 3: make a sound (A = “ah”, C = “coo”, T = “tata”, G = “grr”)

Vocal scores, musical scores, percussion scores, facial expression scores, etc., are all directions this could be taken in.

And you might imagine the process could be reversed.  For example, one might analyze the choreography of Merce Cunningham’s 1943 “In the Name of the Holocaust” (with music by… John Cage – the two were romantic partners), infer a genetic sequence through a set of 3-letter rules, and see what kind of biological machine comes of it (on second thought, perhaps best to choose a more uplifting piece, otherwise we might create a plague).
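As a toy version of that reversal, the movement rules from earlier can simply be inverted, so that a notated phrase of movement becomes a string of bases. The sketch below is purely illustrative: the phrase is invented (it is not taken from any real choreography), the names are hypothetical, and it assumes every step has already been reduced to the vocabulary of the movement rules.

```python
# Movement rules from the list earlier in the post.
MOVE = {"A": "forward", "C": "left", "T": "backward", "G": "right"}
SIZE = {"A": "none", "C": "small", "T": "normal", "G": "large"}
TURN = {"A": "same", "C": "turn left", "T": "turn around", "G": "turn right"}


def invert(table):
    """Swap keys and values so that a movement word looks up its base."""
    return {word: base for base, word in table.items()}


MOVE_TO_BASE = invert(MOVE)
SIZE_TO_BASE = invert(SIZE)
TURN_TO_BASE = invert(TURN)


def encode(steps):
    """Turn a list of (move, size, turn) steps into a DNA string."""
    return "".join(
        MOVE_TO_BASE[move] + SIZE_TO_BASE[size] + TURN_TO_BASE[turn]
        for move, size, turn in steps
    )


def decode(sequence):
    """Turn a DNA string back into a list of (move, size, turn) steps."""
    usable = len(sequence) - len(sequence) % 3
    return [
        (MOVE[sequence[i]], SIZE[sequence[i + 1]], TURN[sequence[i + 2]])
        for i in range(0, usable, 3)
    ]


# An invented two-step phrase, for illustration only.
phrase = [("forward", "large", "turn left"), ("backward", "normal", "same")]
dna = encode(phrase)
print(dna)                    # AGCTTA
print(decode(dna) == phrase)  # True
```

Whether the resulting sequence would code for anything biologically meaningful is a separate question that this sketch does not attempt to answer.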

