Like every scientific field, evolutionary biology has its own special thicket of jargon. While reading papers I am occasionally stymied by this jargon. It seemed the only way to improve the situation was to either commit the terms to memory or to have a handy look-up guide. Given the state of my memory, I’ve gone with the latter.
Below is a list of some common jargon in evolutionary biology. This list is a reflection of the sub-fields that
interest me, and as such, it is incomplete. It is also a work in progress. I'll keep updating it as I get time. I’ve also left out terms that I consider at least partially scrutable,
for example “gene flow”. Instead I’ve
focused on terminology that I consider inaccessible, using myself as the
standard by which this is judged. (Well, not quite just myself: if you have any jargon suggestions, send them my way and I'll add them to the list at the bottom of this entry. Likewise, if I messed something up, please point it out to me.)
Spandrels of evolution:
A good example of a spandrel is the “horns” found on some dung beetles. Armin Moczek and colleagues have shown that the horns of adult dung beetles are developmentally linked to an outgrowth that occurs on the head of the pupal dung beetles. In the pupae of both horned and hornless dung beetles this outgrowth helps crack through the hard pupal head-case, allowing the adult beetle to emerge. In hornless beetles this outgrowth is later reabsorbed; in horned species, however, it continues to grow. At maturity these horns become weapons used in the fight for mates. Thus we could say that the horns of adult beetles are spandrels born out of the beetles’ preexisting need to emerge from their pupal case. (read more about this cool story)
Dobzhansky-Muller Model, or Bateson-Dobzhansky-Muller Model, or DM-, or BDM-hybrid incompatibility
Some species, even
some very closely related species, cannot form viable offspring when crossed. One way this can happen is through harmful
interactions between the parental genes found in the offspring. A more technical terminology for this
situation would be to call it a negative
epistatic interaction between these parental genomes. That is, Mom’s gene A and Dad’s gene B don’t
play well together and the poor offspring is unfit or sterile. This, in a nutshell, is the BDM-model of
hybrid incompatibility.
Interest in this
model stems from the facts that it is commonly observed in nature and describes
a plausible genetic route to speciation.
If we were going to “design” a speciation event, one of the first things
we would want to do is cease gene flow between the two nascent species. One
way to do this is to make all of the offspring from unwanted matings either
sterile or too unhealthy to make it to reproductive age. That is, we’d want negative epistatic
interactions to consistently crop up in the hybrids. Or using the jargon, we’d want BDM-hybrid incompatibilities
to genetically reinforce our attempt to cause speciation.
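Here is a toy sketch of the Mom's-gene-A-meets-Dad's-gene-B idea, in Python. Everything in it is made up for illustration: I'm assuming two loci, that Mom's population fixed a new allele "A" while Dad's fixed a new allele "B", and that the only rule is "A and B together are bad". Real incompatibilities are rarely this tidy.

```python
def is_fit(alleles):
    """Toy BDM rule (an assumption for illustration): an individual is
    unfit or sterile only if it carries BOTH derived alleles, A and B."""
    return not ("A" in alleles and "B" in alleles)


# Alleles carried by each (homozygous) parent. Lowercase = ancestral allele.
mom = {"A", "b"}   # Mom's population fixed the new allele A
dad = {"a", "B"}   # Dad's population fixed the new allele B

# An F1 hybrid inherits one allele per locus from each parent,
# so it carries the union of the two allele sets.
hybrid = mom | dad

for name, alleles in [("mom", mom), ("dad", dad), ("hybrid", hybrid)]:
    print(name, sorted(alleles), "fit" if is_fit(alleles) else "unfit")
```

The point of the sketch is that neither parent ever has to pass through an unfit state: A was only ever tested alongside b, and B alongside a. The bad combination first shows up in the hybrid.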
Selective Sweeps, Soft Sweeps, Genetic Hitchhiking, Background Selection and the Hill-Robertson Effect
Wow, that’s a lot of jargon for one entry. Good news, all of these ideas are linked by
fairly simple underlying concepts. If
you can grasp the concepts, then you’ll see that all of this jargon is just
special situations or consequences that emerge from these concepts.
So let's start with the concepts. First let's assert some fairly general rules
that apply to many species. 1) Species
have finite population sizes. 2) Species
have fewer chromosomes than genes, so many genes must be physically linked on
the same chromosome. 3) Different
alleles at the same gene can have different fitness outcomes. And 4) recombination can swap the physical
links between genes creating new linked combinations.
All right, that’s a fairly manageable situation. Let's now apply those rules to a very simple diploid
species with one chromosome that has two linked genes, A and B. Got it? Ok, let's start with the simple case where
initially every individual in our species has the genotype ab for genes A and B, respectively.
Next let's imagine that a mutation occurs on one chromosome in one
individual, and it produces a mutant form of a which we’ll call a’. So now we have two genotypes to worry about, ab and a’b. We’ll say that this new
a’ is no more or less favored by selection than the original a allele, so we could call it a neutral
mutation with respect to selection. These
ab and a’b genotypes drift (see the entry on drift for more detail) around
in frequency for a while, until suddenly one individual with the a’b genotype gets a mutation in b, which we’ll call b’. This new mutant is
really superior to b and individuals
that get it tend to produce more and healthier offspring. So now we have three genotypes ab, a’b,
and a’b’. Initially a’b’
is rare, but, because b’ improves
fitness it is steadily becoming more common.
Eventually, in the absence of recombination between genotypes, we’re likely
to end up with only a’b’
chromosomes. The moment b’ arose it started increasing in
frequency, little by little replacing b,
and in so doing it dragged the a’ allele
with it, replacing neutral diversity (the a
allele) in the linked gene A. A side
effect of the strong advantage of b’,
is that genetic diversity is being swept away, directly at gene B, but also at
the linked gene A. Using the jargon,
we’d say the b’ allele is causing a selective sweep, or more specifically
some might call this a hard selective
sweep or hard sweep because it is
being caused by the brand new b’
mutant. So what about the lucky a’ allele? We’d call it a genetic hitchhiker. Remember, it was no better or worse than a, but it just happened to hook up with b’ and get swept along.
Ok, now let's replay the scenario, but this time, when b’ arises it too is neutral. So we initially have ab, a’b, and a’b’.
They hang around for a while, and eventually, in one individual, recombination between ab and a’b’ creates the fourth possible genotype, ab’. Now our four genotypes drift along again for some time, and then suddenly the environment changes dramatically. Let’s say a meteor hits the earth. As a consequence of this sudden change, b’, which had been neutral, suddenly becomes strongly favored. This starts off another sweep, only this time b’ has managed to hook up with both a and a’, so after the sweep we are left with ab’ and a’b’ genotypes (note both alleles still exist for gene A, but not for gene B). This is a soft selective sweep. It differs from the hard selective sweep in that the mutant b’ allele wasn’t favored right away, so it managed to associate with the neutral genetic diversity found at gene A, creating not a single hitchhiker but multiple hitchhikers, in effect preserving diversity in gene A. Basically the difference between a hard and soft sweep is the timing of when the selection kicks in and how varied and diverse the hitchhikers are (real hitchhikers are always varied and diverse).
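If you'd like to watch the two kinds of sweep happen, here is a minimal Python sketch. It is a deliberately stripped-down model, and the assumptions are mine: chromosomes are tracked as haplotypes, there is no recombination and no drift once selection starts, and b’ gets a 10% fitness advantage. The starting frequencies and the 10% figure are invented for illustration.

```python
def sweep(freqs, fitness, generations):
    """Deterministic selection on haplotypes, with no recombination and no drift.

    Each generation every haplotype's frequency is weighted by its fitness
    and the frequencies are renormalised to sum to 1."""
    for _ in range(generations):
        mean_w = sum(freqs[h] * fitness[h] for h in freqs)
        freqs = {h: freqs[h] * fitness[h] / mean_w for h in freqs}
    return freqs


def freq_of_a_prime(freqs):
    """Total frequency of the a' allele at gene A."""
    return sum(f for (a_allele, _), f in freqs.items() if a_allele == "a'")


# Haplotypes are (allele at A, allele at B). Only b' affects fitness;
# the 10% advantage is a made-up value.
fitness = {("a", "b"): 1.0, ("a'", "b"): 1.0, ("a", "b'"): 1.1, ("a'", "b'"): 1.1}

# Hard sweep: b' is favored from the moment it appears, on an a' chromosome only.
hard_start = {("a", "b"): 0.60, ("a'", "b"): 0.39, ("a'", "b'"): 0.01}
hard_end = sweep(hard_start, fitness, 300)

# Soft sweep: b' has already recombined onto both a and a' before selection begins.
soft_start = {("a", "b"): 0.50, ("a'", "b"): 0.30, ("a", "b'"): 0.10, ("a'", "b'"): 0.10}
soft_end = sweep(soft_start, fitness, 300)

print("hard sweep, final a' frequency:", round(freq_of_a_prime(hard_end), 3))
print("soft sweep, final a' frequency:", round(freq_of_a_prime(soft_end), 3))
```

In the hard sweep a’ rides b’ all the way to fixation and the a allele is lost. In the soft sweep b’ still takes over gene B, but a and a’ both survive at gene A because each was linked to a copy of b’ when selection kicked in.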
So that’s sweeps and hitchhiking. They involve beneficial mutations, but what
about detrimental mutations? It’s the
same idea, just backward. Let’s just
change the sign of the selection coefficient on b’ and say that it is deleterious.
This will tend to cause selection to remove b’ from the population, and in so doing take any linked alleles with it unless they can escape by recombining onto a b allele background. This type of erosion of genetic diversity is called background selection; it’s sort of the opposite of a selective sweep. There’s a key distinction, though. In a selective sweep an allele goes from a low frequency to a high one, so it’s going to have a massive increase in its frequency, and as a consequence it can knock out a lot of diversity. On the other hand, under background selection, we usually imagine that the deleterious allele never makes it to a very high frequency, so its trip is from a low frequency to a frequency of zero (it goes extinct).
This is a much shorter journey, so this little blip of a bad allele
doesn’t tend to knock out much diversity, because it was never linked to much
in the first place. The only places
where background selection can really make a dent are areas with really low
recombination and fairly high rates of deleterious mutation. Here you can imagine waves and waves of
background selection constantly removing variation, leaving this region with
fairly low genetic diversity.
So, neutral genetic diversity can be removed from the population
by not being linked to the good allele, or by being linked to a bad allele. In both cases alleles at the gene under
selection (gene B) are rapidly changing in frequency, and the only way to save
linked diversity is for recombination to create new linkages. Now imagine that from our ancestral ab genotype, an a’b and an ab’ genotype both arise, and the a’ and b’ alleles are both superior to the ancestral a and b alleles. The unfortunate thing is that they sit on different chromosomes, because what you really want is the super-fit a’b’ genotype to emerge. Without recombination you have to wait for a lucky mutation to turn either a’b or ab’ into a’b’, which may take a long time, or indeed never happen. In the meantime the two good alleles compete with each other, which makes selection less efficient; this interference between linked selected alleles is called the Hill-Robertson effect. The key insight here is that recombination can play a role in speeding things up. With recombination, you can have an exchange between a’b and ab’ to create ab and the sought-after a’b’. Some believe this benefit may be the reason that recombination evolved in the first place.
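To put a number on how recombination helps, here is a small Python sketch of one generation of random mating between the a’b and ab’ chromosomes. It uses the textbook two-locus recursion, in which each haplotype frequency shifts by r times the disequilibrium D. The starting frequencies and the recombination rate r = 0.1 are invented for illustration, and selection and drift are left out entirely.

```python
def next_generation(x, r):
    """One generation of random mating with recombination rate r between
    genes A and B (no selection, no drift). x maps haplotype -> frequency."""
    # D measures the excess of a'b' and ab chromosomes over a'b and ab' ones.
    D = x["a'b'"] * x["ab"] - x["a'b"] * x["ab'"]
    return {
        "a'b'": x["a'b'"] - r * D,
        "ab": x["ab"] - r * D,
        "a'b": x["a'b"] + r * D,
        "ab'": x["ab'"] + r * D,
    }


# Both beneficial alleles exist, but never on the same chromosome.
start = {"ab": 0.8, "a'b": 0.1, "ab'": 0.1, "a'b'": 0.0}

no_recomb = next_generation(start, r=0.0)
with_recomb = next_generation(start, r=0.1)

print("a'b' frequency without recombination:", no_recomb["a'b'"])
print("a'b' frequency with recombination:   ", with_recomb["a'b'"])
```

With r = 0 the a’b’ chromosome never appears, no matter how long you wait, unless a new mutation makes it. With any r above zero it shows up in the very next generation at a frequency of r times the product of the a’b and ab’ frequencies, and from there selection can take over.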
Linkage Disequilibrium
(to my mind this piece of jargon “would be the first against
the wall when the revolution comes”, to quote the great Douglas Adams):
Linkage disequilibrium or LD is a complicated sounding bit of
jargon that describes a simple concept. If you understand correlation, then you get
LD. Sadly because the term is so
awkward, you would have never guessed it was something you were already
familiar with.
The concept LD aims to capture is the amount of nonrandom
association between alleles at different loci.
It is usually studied within or between populations of a single species,
and as such is a fundamental component of population genetics. In the simplest case, imagine two loci, where
both loci have two alleles each, we’ll call them A and a at locus #1 and B and b at locus #2. If we sampled
a haplotype from a population with these four alleles there are four
combinations of the alleles we could find, namely AB, Ab, aB, and ab. If these loci are in
strong LD we’d expect to largely observe only 2 of the 4 combinations, perhaps AB and ab, though these symbols are arbitrary, so it could just as well be
Ab and aB. On the other hand, if LD
is very low or absent, the latter being termed linkage equilibrium, we might expect to find all four allelic combinations
in proportions corresponding to the products of their allele frequencies.
So what does that mean exactly? Let's play a game. I’ll give you a million dollars if you can
guess the alleles at one gene by only knowing the alleles at another gene. I get to pick one random individual, and
we’ll sequence her genome. I know all
her alleles at all genes. I’ll give you
the alleles for any gene of your choice in the genome, and you have to guess
the alleles at any other gene in the genome.
It’s up to you to choose those two genes. How would you go about choosing the best pair
of genes so that you can win the big prize?
The best way would be to choose two genes with very high LD between
them, because knowing the alleles at one gene in strong LD with another gene
basically tells you what that other gene will be without even looking. Often genes with high LD are physically linked; however, there is no requirement that genes in high LD be linked, and in fact, there are a few cases of LD between genes residing on different chromosomes.
There are several statistics used to quantify the amount of
LD. Two of the most popular, r² and D’, yield values on a scale from 0 to 1, with 0 being complete linkage equilibrium, 1 being complete linkage disequilibrium, and intermediate values indicating partial LD. As the notation suggests, r² is derived from the statistical R² (i.e. the coefficient of determination), which is the square of the correlation coefficient. So if you get correlation, then it’s simple to extend that understanding to LD.
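For the two-locus, two-allele case above, these statistics can be computed straight from the four haplotype frequencies. Below is a minimal Python sketch using the standard definitions: D is the observed frequency of AB minus what you'd expect from the allele frequencies, D’ rescales D by the largest value it could take given those allele frequencies, and r² is the squared correlation. The function name and the example frequencies are mine, and I report the absolute value of D’ so that it sits on the 0 to 1 scale.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci, given the frequencies
    of the four haplotypes AB, Ab, aB and ab (which should sum to 1)."""
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2

    # D: observed AB frequency minus the product of the allele frequencies.
    D = p_AB - p_A * p_B

    # Largest |D| possible given these allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


print(ld_stats(0.5, 0.0, 0.0, 0.5))      # only AB and ab present: complete LD
print(ld_stats(0.25, 0.25, 0.25, 0.25))  # all four at expected proportions: no LD
print(ld_stats(0.4, 0.1, 0.1, 0.4))      # somewhere in between: partial LD
```

The first case is the one where you win the million dollars every time; the second is the one where knowing the allele at the first gene tells you nothing about the second.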
Maybe not jargon (or perhaps better titled: some of my pet
peeves):
Genetic Map:
This is a characterization of a genome using genetic
markers. Various approaches are used to
estimate the linear order of these markers along the chromosomes. Often the markers are laid out in the order
of their estimated genetic linkage, which is related to the fraction of meioses
that yield a recombination between the markers. For this reason the distances on the resulting
map are a function of the recombination rate between various points along the chromosome,
and may not accurately represent the real physical distances between markers in
base pairs. Also, even the most complex
genetic maps may contain only a few thousand markers, whereas most genomes
contain millions or billions of nucleotides, so every genetic map is an
incomplete characterization of the genome.
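To make "a function of the recombination rate" a bit more concrete, here is a small Python sketch that converts a recombination fraction into a map distance in centimorgans. It uses Haldane's mapping function, which is just one common choice and assumes crossovers happen independently of one another (no interference); other mapping functions exist, and I'm not claiming any particular map was built this way. The function name and the example values are mine.

```python
import math


def haldane_cM(r):
    """Map distance in centimorgans for a recombination fraction r,
    using Haldane's mapping function (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


for r in (0.01, 0.10, 0.40):
    print(f"r = {r:.2f}  ->  {haldane_cM(r):.1f} cM")
```

For tightly linked markers the map distance is close to the recombination fraction expressed as a percentage, but as r approaches 0.5 the map distance grows without bound. None of this says anything about how many base pairs lie between the markers.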
Genome Sequence:
This is a representation of the genome created by identifying the DNA nucleotide sequence at single-base-pair resolution. In a genome sequence the distance between loci is their physical distance, in base pairs. To date, very few genome sequences are truly complete; they are often missing hard-to-deal-with regions, like areas where there are lots of repeats. That said, even a sparsely completed genome sequence will contain far more loci than a genetic map (which isn’t a statement about the utility of genetic maps; they have myriad uses).
Lex’s Map vs. Sequence Rant:
Often
in the popular press you’ll find the terminology of genetic mapping and genome
sequencing jumbled together. For example
the headline might read, “Scientists have begun sequencing the genome of
species X”. Then in the article you’ll
find a nugget of wisdom like, “scientists believe that one of the best ways to
fully unlock the potential of species X is to map its DNA”. To a geneticist this use of the term “map” is
quite confusing because it conjures the idea of a genetic map. Instead, I think it’s intended as a crude way
of equating a genome sequence with a traditional map, like a road map (remember
those things that used to live in your glove box before we got iPhones). They do both tell you where things are in
relation to one another, but unfortunately this use of the word “map” muddles
important concepts, and moreover, the road map analogy is a stretch at best. A genome sequence would be like a road map that
contains the exact order of every molecule of the road!
Stuff for a later date!
genomic conflict or intragenomic conflict
Punc. Eq. (punctuated equilibrium)
pre-adaptation and exaptation
genetic lesion / point mutation
Genetic accommodation
Trivers-Willard
long-branch attraction
selfish gene
QTL
Wright's shifting balance theory
Hopeful monsters
genomic shock
neutral theory
inbreeding depression/heterosis/hybrid vigor
Mendelian trait
Quantitative trait
Hardy-Weinberg
Lamarckism
Lysenkoism
Dollo's law
Horizontal transfer
drift
balancing, purifying (negative), directional, disruptive, artificial, and stabilizing selection
mutation-selection balance
molecular clock
transition/transversion
handicap principle
effective population size
G X E
Batesian Mimicry
Mullerian Mimicry
Red Queen Hypothesis
Haldane's Rule
Bergmann's Rule
Baker's law
Fisher-Wright Model
Lek Paradox
Haldane's Dilemma
Haldane’s Sieve
2-fold cost of sex
Price Equation
Add 'quasi-linkage equilibrium' to the list! http://en.wikipedia.org/wiki/Quasi-linkage_equilibrium