Sunday, November 24, 2013

Lex's Evolutionary Biology Jargon Page

Like every scientific field, evolutionary biology has its own special thicket of jargon.  While reading papers I am occasionally stymied by this jargon.  It seemed the only way to improve the situation was to either commit the terms to memory or to have a handy look-up guide.  Given the state of my memory, I’ve gone with the latter.  Below is a list of some common jargon in evolutionary biology.  This list is a reflection of the sub-fields that interest me, and as such, it is incomplete.  It is also a work in progress.  I'll keep updating it as I get time.  I’ve also left out terms that I consider at least partially scrutable, for example “gene flow”.  Instead I’ve focused on terminology that I consider inaccessible, using myself as the standard by which this is judged.  (Well, not quite just myself: if you have any jargon suggestions, send them my way and I'll add them to the list at the bottom of this entry.  Likewise, if I messed something up, please point it out to me.)

Spandrels of evolution:
Artsy Spandrels
This term refers to an analogy put forward by Gould & Lewontin in their now classic 1979 Proceedings of the Royal Society of London B paper entitled The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme.  Despite the opaque title, people read it.  The message of the paper is conveyed by analogy to an architectural feature known as a spandrel.   In architecture, a spandrel is a fitting that allows a curved shape, such as a dome or arch, to be connected to a rectangular structure.  Imagine setting a round dome on top of a square building: there are voids that need to be filled in to connect them, or else things are going to be a little drafty.  Spandrels are the things that fill in these voids.  Over the millennia spandrels became highly ornate, often hosting paintings, statues, or carvings.  This, however, was secondary to their initial purpose, as first and foremost spandrels are a necessary work-around to a geometric constraint.  Much in the way spandrels did not originate as sites for art, Gould & Lewontin imagined that many adaptive traits started as a work-around to some earlier constraint.   Sometime later the work-around became available for some new utility, in the same way that spandrels were eventually seen as convenient places to install art.   The “spandrel” message is two-fold.  First, some adaptations may have originated as the by-product of an earlier constraint, and second, because of this some adaptive traits may have an origin that is completely unrelated to their current utility.  In the spandrels paper, Gould & Lewontin use these points as a pretty stern critique of the “adaptationist programme”, that is, researchers who assert that a trait arose more-or-less de novo as a solution to the problem it currently solves.

A good example of a spandrel is the “horns” found on some dung beetles.  Armin Moczek and colleagues have shown that the horns of adult dung beetles are developmentally linked to an outgrowth that occurs on the head of the pupal dung beetles.  In the pupae of both horned and hornless dung beetles this outgrowth helps crack through the hard pupal head-case, allowing the adult beetle to emerge.  In hornless beetles this outgrowth is later reabsorbed; in horned species, however, it continues to grow.  At maturity these horns become weapons used in the fight for mates.  Thus we could say that the horns of adult beetles are spandrels born out of the beetle’s preexisting need to emerge from its pupal case. (read more about this cool story)


Dobzhansky-Muller Model, or Bateson-Dobzhansky-Muller Model, or DM-, or BDM-hybrid incompatibility

Some species, even some very closely related species, cannot form viable offspring when crossed.  One way this can happen is through harmful interactions between the parental genes found in the offspring.  A more technical terminology for this situation would be to call it a negative epistatic interaction between these parental genomes.  That is, Mom’s gene A and Dad’s gene B don’t play well together and the poor offspring is unfit or sterile.  This, in a nutshell, is the BDM-model of hybrid incompatibility. 

Interest in this model stems from the facts that it is commonly observed in nature and describes a plausible genetic route to speciation.  If we were going to “design” a speciation event, one of the first things we would want to do is cease gene flow between the two nascent species.   One way to do this is to make all of the offspring from unwanted matings either sterile or too unhealthy to make it to reproductive age.  That is, we’d want negative epistatic interactions to consistently crop up in the hybrids.  Or using the jargon, we’d want BDM-hybrid incompatibilities to genetically reinforce our attempt to cause speciation. 
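To make the idea concrete, here is a minimal toy sketch of a BDM incompatibility.  Everything in it is an illustrative assumption rather than a model of any real species: two loci, derived alleles written "A" and "B" that are each harmless alone, and a made-up penalty applied only when both occur in the same individual.

```python
# Toy Bateson-Dobzhansky-Muller incompatibility at two loci.
# Assumption: derived alleles "A" and "B" are each harmless alone but
# reduce fitness when they occur together in one individual.

def fitness(genotype, penalty=0.9):
    """Relative fitness of a genotype string such as 'AAbb' (hypothetical scale)."""
    has_A = "A" in genotype
    has_B = "B" in genotype
    return 1.0 - penalty if (has_A and has_B) else 1.0

def f1_hybrid(parent1, parent2):
    """F1 of two homozygous parents: one allele per locus from each parent."""
    return parent1[0] + parent2[0] + parent1[2] + parent2[2]

ancestor = "aabb"  # ancestral genotype shared by both lineages
pop1 = "AAbb"      # lineage 1 fixed the derived allele A
pop2 = "aaBB"      # lineage 2 fixed the derived allele B
hybrid = f1_hybrid(pop1, pop2)

for name, g in [("ancestor", ancestor), ("pop1", pop1),
                ("pop2", pop2), ("hybrid", hybrid)]:
    print(name, g, round(fitness(g), 2))
```

The point of the sketch is that neither lineage ever passes through an unfit genotype on its own; the bad combination only shows up when the two derived alleles first meet in a hybrid.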
  
Selective Sweeps, Soft Sweeps, Genetic Hitchhiking, Background Selection and the Hill-Robertson Effect

Wow, that’s a lot of jargon for one entry.  Good news, all of these ideas are linked by fairly simple underlying concepts.  If you can grasp the concepts, then you’ll see that all of this jargon is just special situations or consequences that emerge from these concepts. 

So let's start with the concepts.  First let's assert some fairly general rules that apply to many species.  1) Species have finite population sizes.  2) Species have fewer chromosomes than genes, so many genes must be physically linked on the same chromosome.  3) Different alleles at the same gene can have different fitness outcomes.  And 4) recombination can swap the physical links between genes creating new linked combinations.

All right, that’s a fairly manageable situation.  Let's now apply those rules to a very simple diploid species with one chromosome that has two linked genes, A and B. Got it?  Ok, let's start with the simple case where initially every individual in our species has the genotype ab for genes A and B, respectively.  Next let's imagine that a mutation occurs on one chromosome in one individual, and it produces a mutant form of a which we’ll call a’.  So now we have two genotypes to worry about, ab and a’b.  We’ll say that this new a’ is no more or less favored by selection than the original a allele, so we could call it a neutral mutation with respect to selection.  These ab and a’b genotypes drift (see the entry on drift for more detail) around in frequency for a while, until suddenly, one individual with the a’b genotype gets a mutation in b, which we’ll call b’. This new mutant is really superior to b, and individuals that get it tend to produce more and healthier offspring.  So now we have three genotypes: ab, a’b, and a’b’.  Initially a’b’ is rare, but because b’ improves fitness it is steadily becoming more common.   Eventually, in the absence of recombination between genotypes, we’re likely to end up with only a’b’ chromosomes.  The moment b’ arose it started increasing in frequency, little by little replacing b, and in so doing it dragged the a’ allele with it, replacing neutral diversity (the a allele) in the linked gene A.  A side effect of the strong advantage of b’ is that genetic diversity is being swept away, directly at gene B, but also at the linked gene A.  Using the jargon, we’d say the b’ allele is causing a selective sweep, or more specifically some might call this a hard selective sweep or hard sweep because it is being caused by the brand new b’ mutant.  So what about the lucky a’ allele?  We’d call it a genetic hitchhiker.  Remember, it was no better or worse than a, but it just happened to hook up with b’ and get swept along.

Ok, now let's replay the scenario, but this time, when b’ arises it too is neutral.  So we initially have ab, a’b, and a’b’.  They hang around for a while, and eventually, in some individual, recombination happens between ab and a’b’ that creates the fourth possible genotype, ab’.  Now our 4 genotypes drift along again for some time, and then suddenly the environment changes dramatically.  Let’s say a meteor hits the earth.  As a consequence of this sudden change, b’, which had been neutral, suddenly becomes strongly favored.  This starts off another sweep, only this time b’ has managed to hook up with both a and a’, so after the sweep we are left with ab’ and a’b’ genotypes (note both alleles still exist for gene A, but not for gene B).  This is a soft selective sweep.  It differs from the hard selective sweep in that the mutant b’ allele wasn’t preferred right away, so it managed to associate with the neutral genetic diversity found at gene A, creating not a single hitchhiker but multiple hitchhikers, in effect preserving diversity in gene A.  Basically the difference between a hard and a soft sweep is the timing of when the selection kicks in and how varied and diverse the hitchhikers are (real hitchhikers are always varied and diverse).
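The hard and soft sweep scenarios above can be sketched in a few lines of code.  This is a deliberately stripped-down toy, not a real population-genetic simulator: it assumes an effectively infinite population (no drift), no recombination during the sweep, and made-up starting frequencies and a made-up selection coefficient s.  The function names are my own inventions.

```python
# Deterministic toy model of hard vs. soft sweeps on two-locus haplotypes.
# Assumptions: no drift, no recombination during the sweep, and every
# haplotype carrying b' has relative fitness 1 + s.

def select(freqs, s, generations):
    """Apply selection favoring b' for a number of generations."""
    for _ in range(generations):
        w = {h: (1.0 + s if h.endswith("b'") else 1.0) for h in freqs}
        mean_w = sum(freqs[h] * w[h] for h in freqs)
        freqs = {h: freqs[h] * w[h] / mean_w for h in freqs}
    return freqs

def freq_a_prime(freqs):
    """Frequency of the a' allele at gene A."""
    return sum(f for h, f in freqs.items() if h.startswith("a'"))

# Hard sweep: b' arose once, on an a' background, and is favored at once.
hard_start = {"ab": 0.700, "a'b": 0.299, "a'b'": 0.001}
hard_end = select(hard_start, s=0.1, generations=300)

# Soft sweep: b' sat around neutrally, recombined onto both a and a',
# and only then became favored.
soft_start = {"ab": 0.50, "a'b": 0.48, "ab'": 0.01, "a'b'": 0.01}
soft_end = select(soft_start, s=0.1, generations=300)

print("hard sweep, a' frequency:", round(freq_a_prime(hard_end), 4))
print("soft sweep, a' frequency:", round(freq_a_prime(soft_end), 4))
```

In the hard sweep the hitchhiking a’ allele ends up essentially fixed, so diversity at gene A is gone.  In the soft sweep b’ still takes over gene B, but both a and a’ survive at gene A because b’ was riding on both backgrounds when selection kicked in.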

So that’s sweeps and hitchhiking.  They involve beneficial mutations, but what about detrimental mutations?  It’s the same idea, just backward.  Let’s just change the sign of the selection coefficient on b’ and say that it is deleterious.  This will tend to cause selection to remove b’ from the population, and in so doing take any linked alleles with it unless they can escape by recombining onto a b allele background.   This type of erosion of genetic diversity is called background selection; it’s sort of the opposite of a selective sweep.   There’s a key distinction though.  In a selective sweep an allele goes from a low frequency to a high one, so it’s going to have a massive increase in its frequency, and as a consequence it can knock out a lot of diversity.  On the other hand, under background selection, we usually imagine that the deleterious allele never makes it to a very high frequency, so its trip is from a low frequency to a frequency of zero (it goes extinct).  This is a much shorter journey, so this little blip of a bad allele doesn’t tend to knock out much diversity, because it was never linked to much in the first place.  The only places where background selection can really make a dent are areas with really low recombination and fairly high rates of deleterious mutation.  Here you can imagine waves and waves of background selection constantly removing variation, leaving this region with fairly low genetic diversity.

So, neutral genetic diversity can be removed from the population by not being linked to the good allele, or by being linked to a bad allele.  In both cases alleles at the gene under selection (gene B) are rapidly changing in frequency, and the only way to save linked diversity is for recombination to create new linkages.  Now imagine that from our ancestral ab genotype, an a’b and an ab’ genotype both arise, and the a’ and b’ alleles are both superior to the ancestral a and b alleles.  The unfortunate thing is that they sit on different chromosomes, so they compete with each other, when what you really want is for the super-fit a’b’ genotype to emerge.  Without recombination you have to wait for a lucky mutation to either turn a’b into a’b’ or ab’ into a’b’, which may take a long time, or indeed never happen.  The key insight here is that recombination can play a role in speeding this up.  With recombination, you can have an exchange between a’b and ab’ to create ab and the sought-after a’b’. This interference between linked alleles under selection, which slows evolution when recombination is rare, is called the Hill-Robertson effect, and some believe that relieving it may be the reason that recombination evolved in the first place.
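The role of recombination in the a’b plus ab’ scenario can be illustrated with another small toy model.  As before, this is only a sketch under loud assumptions: an infinite population, no new mutation, selection on haplotypes followed by random mating, a hypothetical advantage s for each of a’ and b’ (multiplied together in a’b’), and a recombination fraction r between the two genes.

```python
# Toy two-locus model: does the double mutant a'b' ever appear?
# Assumptions: infinite population, no mutation, selection then
# recombination each generation, multiplicative fitness.

def step(x, s, r):
    """One generation of selection followed by recombination."""
    w = {"ab": 1.0, "a'b": 1.0 + s, "ab'": 1.0 + s, "a'b'": (1.0 + s) ** 2}
    mean_w = sum(x[h] * w[h] for h in x)
    x = {h: x[h] * w[h] / mean_w for h in x}
    # D measures the nonrandom association between a' and b'.
    d = x["ab"] * x["a'b'"] - x["a'b"] * x["ab'"]
    return {"ab": x["ab"] - r * d, "a'b": x["a'b"] + r * d,
            "ab'": x["ab'"] + r * d, "a'b'": x["a'b'"] - r * d}

def run(r, s=0.1, generations=200):
    """Start with a'b and ab' both rare and a'b' absent."""
    x = {"ab": 0.98, "a'b": 0.01, "ab'": 0.01, "a'b'": 0.0}
    for _ in range(generations):
        x = step(x, s, r)
    return x

no_recomb = run(r=0.0)
with_recomb = run(r=0.1)
print("a'b' without recombination:", no_recomb["a'b'"])
print("a'b' with recombination:   ", round(with_recomb["a'b'"], 4))
```

Without recombination (and without a lucky second mutation, which this toy leaves out) the a’b’ haplotype never exists, and a’b and ab’ simply compete.  With recombination a’b’ is generated from the two single mutants and then takes over because it is the fittest haplotype.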

Linkage Disequilibrium
(to my mind this piece of jargon “would be the first against the wall when the revolution comes”, to quote the great Douglas Adams):

Linkage disequilibrium or LD is a complicated sounding bit of jargon that describes a simple concept.  If you understand correlation, then you get LD.  Sadly because the term is so awkward, you would have never guessed it was something you were already familiar with.

The concept LD aims to capture is the amount of nonrandom association between alleles at different loci.  It is usually studied within or between populations of a single species, and as such is a fundamental component of population genetics.  In the simplest case, imagine two loci, where both loci have two alleles each; we’ll call them A and a at locus #1 and B and b at locus #2.  If we sampled a haplotype from a population with these four alleles, there are four combinations of the alleles we could find, namely AB, Ab, aB, and ab.  If these loci are in strong LD we’d expect to largely observe only 2 of the 4 combinations, perhaps AB and ab, though these symbols are arbitrary, so it could just as well be Ab and aB.  On the other hand, if LD is very low or absent, the latter being termed linkage equilibrium, we might expect to find all four allelic combinations in proportions corresponding to the products of their allele frequencies.  So what does that mean exactly?  Let's play a game.  I’ll give you a million dollars if you can guess the alleles at one gene by only knowing the alleles at another gene.  I get to pick one random individual, and we’ll sequence her genome.  I know all her alleles at all genes.  I’ll give you the alleles for any gene of your choice in the genome, and you have to guess the alleles at any other gene in the genome.  It’s up to you to choose those two genes.  How would you go about choosing the best pair of genes so that you can win the big prize?  The best way would be to choose two genes with very high LD between them, because knowing the alleles at one gene in strong LD with another gene basically tells you what that other gene will be without even looking.  Often genes with high LD are physically linked; however, there is no requirement that genes in high LD be linked, and in fact, there are a few cases of LD between genes residing on different chromosomes.

There are several statistics used to quantify the amount of LD.  Two of the most popular, r² and D’, yield values on a scale from 0 to 1, with 0 being complete linkage equilibrium, 1 being complete linkage disequilibrium, and intermediate values indicating partial LD.  As the notation suggests, r² is derived from the statistical R² (i.e. coefficient of determination), which is the square of the correlation coefficient.  So if you get correlation, then it’s simple to extend that understanding to LD.
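For the two-locus, two-allele case described above, these statistics are easy to compute by hand.  Here is a minimal sketch using the standard textbook definitions (D is the observed AB frequency minus the product of the A and B allele frequencies; D’ rescales |D| by its maximum possible value; r² divides D² by the product of all four allele frequencies).  The function name and the example haplotype frequencies are made up for illustration, and the code assumes both loci are actually polymorphic.

```python
# LD statistics from the four haplotype frequencies AB, Ab, aB, ab.
# Assumption: both loci are polymorphic (no allele frequency is 0 or 1).

def ld_stats(hap):
    """Return (D, D', r^2) for a dict of haplotype frequencies."""
    p_a = hap["AB"] + hap["Ab"]   # frequency of allele A
    p_b = hap["AB"] + hap["aB"]   # frequency of allele B
    d = hap["AB"] - p_a * p_b     # departure from the product expectation
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

equilibrium = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
partial = {"AB": 0.40, "Ab": 0.10, "aB": 0.10, "ab": 0.40}
complete = {"AB": 0.50, "Ab": 0.00, "aB": 0.00, "ab": 0.50}

for name, hap in [("equilibrium", equilibrium),
                  ("partial", partial), ("complete", complete)]:
    d, d_prime, r2 = ld_stats(hap)
    print(name, round(d, 3), round(d_prime, 3), round(r2, 3))
```

In the million dollar game, the "complete" population is the one you want: seeing A tells you the other gene carries B every time.  In the "equilibrium" population, knowing the first gene tells you nothing about the second.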

Maybe not jargon (or perhaps better titled: some of my pet peeves):

Genetic Map:
This is a characterization of a genome using genetic markers.  Various approaches are used to estimate the linear order of these markers along the chromosomes.  Often the markers are laid out in the order of their estimated genetic linkage, which is related to the fraction of meioses that yield a recombination between the markers.  For this reason the distances on the resulting map are a function of the recombination rate between various points along the chromosome, and may not accurately represent the real physical distances between markers in base pairs.   Also, even the most complex genetic maps may contain only a few thousand markers, whereas most genomes contain millions or billions of nucleotides, so every genetic map is an incomplete characterization of the genome.
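As a rough illustration of how a recombination fraction becomes a map distance, here is a sketch using Haldane's mapping function, which is one common choice (it assumes crossovers occur independently, with no interference).  Nothing above says which mapping function any particular map uses, so treat this purely as an example; the function name is my own.

```python
import math

# Haldane's mapping function: recombination fraction r -> map distance.
# Assumption: crossovers occur independently (no interference).

def haldane_cm(r):
    """Convert a recombination fraction (0 <= r < 0.5) to centimorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

for r in (0.01, 0.10, 0.40):
    print(f"r = {r:.2f} -> {haldane_cm(r):.2f} cM")
```

For tightly linked markers the map distance in centimorgans is nearly the same as the percent recombination, but as markers get farther apart the two diverge, because multiple crossovers between the markers can cancel each other out.  None of this says anything about how many base pairs separate the markers.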

Genome Sequence:
This is a representation of the genome created by identifying the DNA nucleotide sequence at single base pair resolution.  In a genome sequence the distance between loci is their physical distance, in base pairs.  To date, very few genome sequences are truly complete; they are often missing hard-to-deal-with regions, like areas where there are lots of repeats.  That said, even a sparsely completed genome sequence will contain far more loci than a genetic map (which isn’t a statement about the utility of genetic maps; they have myriad uses).

Lex’s Map vs. Sequence Rant:
Often in the popular press you’ll find the terminology of genetic mapping and genome sequencing jumbled together.  For example the headline might read, “Scientists have begun sequencing the genome of species X”.  Then in the article you’ll find a nugget of wisdom like, “scientists believe that one of the best ways to fully unlock the potential of species X is to map its DNA”.  To a geneticist this use of the term “map” is quite confusing because it conjures the idea of a genetic map.  Instead, I think it’s intended as a crude way of equating a genome sequence with a traditional map, like a road map (remember those things that used to live in your glove box before we got iPhones).  They do both tell you where things are in relation to one another, but unfortunately this use of the word “map” muddles important concepts, and moreover, the road map analogy is a stretch at best.  A genome sequence would be like a road map that contains the exact order of every molecule of the road!  

Stuff for a later date!
genomic conflict or intragenomic conflict
Punc. Eq. (punctuated equilibrium)
pre-adaptation and exaptation
genetic lesion / point mutation
Genetic accommodation 
Trivers-Willard
long-branch attraction
selfish gene
QTL
Wright's shifting balance theory
Hopeful monsters
genomic shock
neutral theory
inbreeding depression/heterosis/hybrid vigor
Mendelian trait
Quantitative trait
Hardy-Weinberg
Lamarckism
Lysenkoism
Dollo's law
Horizontal transfer
drift
balancing, purifying (negative), directional, disruptive, artificial, and stabilizing selection
mutation-selection balance
molecular clock
transition/transversion
handicap principle
effective population size
G X E
Batesian Mimicry
Mullerian Mimicry
Red Queen Hypothesis
Haldane's Rule
Bergmann's Rule
Baker's law
Fisher-Wright Model  
Lek Paradox
Haldane's Dilemma
Haldane’s Sieve
2-fold cost of sex
Price Equation

1 comment:

  1. Add 'quasi-linkage equilibrium' to the list! http://en.wikipedia.org/wiki/Quasi-linkage_equilibrium
