Molecular Evolution: Exam 3 Flashcards

Population Genetics

- The study of patterns and structure of genetic diversity within interbreeding populations.

- Study of patterns and diversity at the population level.

ALDH2 example

Protein that helps us metabolize alcohol (aldehyde dehydrogenase 2, which breaks down acetaldehyde, the toxic intermediate produced from ethanol). In humans, there are clear, distinct patterns based on geographic history (where your ancestors came from). People of European descent, such as Germans and Finns, are homozygous for the ALDH2*1 allele. However, there's another allele, ALDH2*2, that is much less efficient. People who have one or two copies of the ALDH2*2 allele have trouble metabolizing alcohol; they metabolize it more slowly. We can say clearly where this newer allele arose: East Asia. If we see a clear geographic correlation between certain alleles and where those alleles are found, it's a good sign that the allele originated in that area and that migration back and forth between other areas has been fairly limited. This is the study of demography.

Which of the following is the most likely explanation for the distribution of the ALDH2*2 allele in the human population that was discussed in class?

It is a fairly recent mutation without a strong natural selection component and, because of limited migration, has not yet spread widely from its original point of origin.

In class we discussed an allelic variant of aldehyde dehydrogenase (ALDH2) that is less effective at metabolizing alcohol than the standard human allele. What does the distribution of this allele in the human population tell us about this allele?

It is a relatively recent mutation with only small detrimental effects.

The vast majority of genetic diversity in the human population shows no correlation with geography.

True

What factors determine the average rate of coalescence?

- The higher the mutation rate, the slower the coalescence process.

- New mutations interrupt the coalescence process.

- The smaller the population the faster the coalescence process.
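The population-size effect can be sketched with a small simulation. This is a minimal sketch under stated assumptions, not a method from class: it assumes an idealized diploid Wright-Fisher population of constant size N (2N gene copies) in which, going back in time, two sampled lineages each pick a parent copy at random every generation. The function names are hypothetical. Under these assumptions the expected coalescence time is 2N generations, so smaller populations coalesce faster.

```python
import random
from statistics import mean


def time_to_coalescence(n_individuals: int, rng: random.Random) -> int:
    """Generations back until two sampled gene copies share a parent copy.

    Assumes an idealized diploid Wright-Fisher population with 2N gene copies,
    so each generation the two lineages coalesce with probability 1/(2N).
    """
    copies = 2 * n_individuals
    generations = 0
    while True:
        generations += 1
        # Each lineage picks a parent copy at random; same parent = coalescence.
        if rng.randrange(copies) == rng.randrange(copies):
            return generations


def mean_coalescence_time(n_individuals: int, replicates: int = 2000, seed: int = 1) -> float:
    """Average coalescence time over many independent pairs of lineages."""
    rng = random.Random(seed)
    return mean(time_to_coalescence(n_individuals, rng) for _ in range(replicates))


for n in (10, 100):
    print(f"N = {n}: mean time {mean_coalescence_time(n):.1f} generations (expected about {2 * n})")
```

Mutation is deliberately left out of this sketch; it only isolates the effect of population size on how quickly lineages merge.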

How can we measure the amount of genetic diversity in a population?

- By counting alleles. Look at polymorphisms across a chromosome.

Polymorphism

- Includes alleles, but also refers to genetic changes and mutations that we wouldn't count as alleles.

- Many polymorphisms don't create a physical difference or impact the phenotype. Some polymorphisms do impact the phenotype, and we call them alleles.

- To measure polymorphism, do genetic sequencing with adequate sampling of the population so that we capture most of its polymorphisms.

Haplotype

Combination of alleles (variants of a gene) that are inherited together on the same chromosome.

Allele

Different versions (variants) of a gene.

Homozygous

Having two copies of the same allele of a gene.

Heterozygous

Having two different alleles of a gene.

Dominant v. Recessive

- How traits (alleles) are expressed in individuals.

- Dominant traits are expressed/observed in an individual if they have at least one copy of the dominant allele.

- Recessive traits are only expressed/observed in an individual if they inherit two copies of the recessive allele.

Incomplete dominance

Heterozygote, different phenotype

There's some intermediate phenotype if you're a heterozygote

Codominance

Both alleles are expressed simultaneously in the heterozygote

Dr. Terry is listening to a nature show and the announcer says, "due to a recent bottleneck, most African cheetahs share the same genes." He immediately turns red and steam starts coming out of his ears. What is the incorrect part of this statement that triggered Dr. Terry?

Within every species, individuals share the same genes. The announcer would have been correct if he had said "share the same alleles."

Hardy-Weinberg equation

p² + 2pq + q² = 1

Not in HW equilibrium example

AA: 40 Aa: 80 aa: 120

- What is the genotype frequency of this population?

AA: 40/240 = 0.17, Aa: 80/240 = 0.33, aa: 120/240 = 0.50

- What is the allele frequency?

f(A) or p: 2(40) + 80 = 160; 160/480 = 0.33

f(a) or q: 2(120) + 80 = 320; 320/480 = 0.67

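- Is this population in HW equilibrium?

Expected genotype frequencies: p² = (0.33)² = 0.11, 2pq = 2(0.33)(0.67) = 0.44, q² = (0.67)² = 0.45

Observed genotype frequencies: 0.17, 0.33, 0.50

Not in HW equilibrium: the observed genotype frequencies differ from the expected ones (there are too few heterozygotes). Note that p² + 2pq + q² always sums to 1 when p + q = 1, so the sum alone cannot test for equilibrium.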

HW equilibrium example

AA: 40 Aa: 320 aa: 640

- What is the genotype frequency of this population?

40/1000= 0.04 320/1000= 0.32 640/1000= 0.64

- What is the allele frequency?

f(A) or p: 2(40) + 320 = 400 400/2000= 0.20

f(a) or q: 2(640) + 320= 1600 1600/2000= 0.80

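- Is this population in HW equilibrium?

Expected genotype frequencies: p² = (0.20)² = 0.04, 2pq = 2(0.20)(0.80) = 0.32, q² = (0.80)² = 0.64

Observed genotype frequencies: 0.04, 0.32, 0.64

Yes, this population is in HW equilibrium: the observed genotype frequencies match those expected from the allele frequencies.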
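The comparison of observed and expected genotype frequencies can be sketched in Python. This is a minimal sketch, not the method used in class: it adds a chi-square goodness-of-fit statistic (a standard way to judge whether the gap between observed and expected counts is larger than chance), and the function name `hardy_weinberg_check` is hypothetical.

```python
def hardy_weinberg_check(n_AA: int, n_Aa: int, n_aa: int):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns the allele frequency p, the expected counts (AA, Aa, aa), and a
    chi-square goodness-of-fit statistic (1 degree of freedom for two alleles).
    Assumes both alleles are present (0 < p < 1).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # AA carries two copies of A, Aa carries one
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi_sq


CRITICAL_1DF = 3.841  # chi-square critical value for 1 df at alpha = 0.05

# Genotype counts from the examples in these notes.
for counts in [(40, 80, 120), (40, 320, 640), (300, 200, 500)]:
    p, expected, chi_sq = hardy_weinberg_check(*counts)
    verdict = "consistent with HW" if chi_sq < CRITICAL_1DF else "not in HW equilibrium"
    print(counts, f"p = {p:.2f}", [round(e, 1) for e in expected], f"chi2 = {chi_sq:.2f}", verdict)
```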

Which of the following forces would cause an imbalance in the HW equilibrium for a trait of interest?

- If females selected males based on that trait

- If there was immigration from a distant population with different frequencies

- If natural selection was acting on that trait

Six conditions that need to hold for a population to be at HW equilibrium.

1. Diploid, sexually reproducing

2. Random mating

3. No mutation

4. Infinite population size

5. No natural selection

6. No migration

Why does a small population size create a population that is not at HW equilibrium?

It could lead to a deviation from HW equilibrium due to genetic drift. Genetic drift is a random change in allele frequencies.

I count genotypes in a population with the following results.

AA = 300 Aa = 200 aa = 500

This population is in HW equilibrium.

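False, it's not in HW equilibrium: p = (600 + 200)/2000 = 0.4 and q = 0.6, so the expected genotype frequencies are 0.16, 0.48, and 0.36, but the observed frequencies are 0.30, 0.20, and 0.50 (too few heterozygotes).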

Which of the following statements best explains the pattern observed in the gene in the population of chimpanzees for the second sample?

If natural selection has caused the imbalance it is most likely underdominance.

Stabilizing selection

- When natural selection favors an average phenotype by selecting against extreme variation (tall, narrow curve in the middle)

Directional selection

Selects for phenotypes at one end of the spectrum of existing variation (curve at one end of the graph)

Diversifying/Underdominance selection

- When natural selection favors two or more distinct phenotypes that each have their own advantages, typically the phenotypes at opposite ends of the distribution (more than one peak)

Why does a small population size create a population that is not at HW equilibrium?

Genetic drift.

Genetic drift is a change in allele frequency that results from chance deviation from expected genotypic frequencies. This is a problem in small populations, but its effect is minimal in moderate-sized or larger populations.
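Drift in a small versus a large population can be sketched with a Wright-Fisher simulation. This is a minimal sketch assuming an idealized diploid population with no selection, mutation, or migration, where each generation's 2N gene copies are drawn at random from the previous generation; the function names and the chosen sizes are hypothetical.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int, rng: random.Random) -> list[float]:
    """Allele-frequency trajectory under pure genetic drift."""
    copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each of the 2N gene copies in the next generation is drawn at random
        # from the current gene pool (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


def mean_drift(n_individuals: int, generations: int = 20, replicates: int = 50, seed: int = 7) -> float:
    """Average absolute change in allele frequency (starting at 0.5) after `generations`."""
    rng = random.Random(seed)
    changes = [abs(wright_fisher(0.5, n_individuals, generations, rng)[-1] - 0.5)
               for _ in range(replicates)]
    return sum(changes) / replicates


print("N = 10: ", round(mean_drift(10), 3))
print("N = 500:", round(mean_drift(500), 3))
```

Allele frequencies wander much further from their starting value when N is small, which is why drift pushes small populations away from HW expectations.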

What is genetic diversity like for most loci?

We see lots of variation.

Heterozygosity

What would be the best strategy to accurately estimate heterozygosity?

- An estimate of genetic diversity in a population: the proportion of individuals in a population that are heterozygous at a particular genetic locus (expected to be 2pq for two alleles under HW).

- Overall estimate for levels of diversity in a population; the more data we have, the more precise our estimation will be.

- Sample genomes at random from random places across the population to see how common each variant is; even with relatively few samples, we can estimate the overall heterozygosity of a population with only a small degree of doubt.
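Heterozygosity can be computed directly from allele frequencies or from a genotyped sample. A minimal sketch, assuming a single locus; the sample below is hypothetical. Expected heterozygosity uses 1 - Σp², which reduces to 2pq for two alleles.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Allele frequencies from diploid genotypes such as ("A", "a")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def expected_heterozygosity(freqs: list[float]) -> float:
    """H_e = 1 - sum(p_i^2); equals 2pq when there are only two alleles."""
    return 1 - sum(p * p for p in freqs)


def observed_heterozygosity(genotypes: list[tuple[str, str]]) -> float:
    """Fraction of sampled individuals that carry two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


# Hypothetical sample of 10 individuals at one locus.
sample = [("A", "A")] * 3 + [("A", "a")] * 4 + [("a", "a")] * 3
freqs = allele_frequencies(sample)
print("H_e =", round(expected_heterozygosity(list(freqs.values())), 3))
print("H_o =", round(observed_heterozygosity(sample), 3))
```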

What is meant by the effective population size (N_e)?

How is this different from census population size (N)?

- N_e counts only the individuals that contribute to the next generation (those that reproduce).

- N is the census population size: the total number of individuals.

In natural populations, the effective size N_e is always less than the census population size N.

True

Which of the following statements is the best estimate of the number of new mutations that occur as two separate populations diverge?

The product of the effective population size and the mutation rate

The substitution rate combines the number of new mutations that occur in two diverging populations and the number of those mutations that are expected to become fixed. This value can therefore be simply represented as ___.

μ (mu), the mutation rate

What loci in the human genome need to be considered differently?

Autosomal vs sex chromosomes?

Mitochondria vs nuclear?

- For autosomal genes, the calculation is straightforward no matter what sex the individual is. For the sex chromosomes it makes a difference: the Y is mostly a placeholder with very little genetic material on it, so males carry only one copy of most X-linked genes and genotype frequencies differ between males and females. Sex-linked genes are therefore a violation of HW assumptions in species such as humans that have chromosomal sex determination.

Review the term θ (theta)

What does it mean?

How is it estimated?

- Theta (θ) = an estimate of genetic diversity.

- If everything is neutral, then theta should be four times the effective population size times the mutation rate (θ = 4N_eμ).

- If the genetic diversity we observe does not line up with this, that could be a sign of some sort of selective force acting on that organism.

- Mutation rates are usually expressed per base pair per generation.
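The neutral expectation θ = 4N_eμ can be written as a one-line calculation and turned around to ask what N_e an observed level of diversity implies. A minimal sketch; the population size, mutation rate, and observed diversity below are hypothetical round numbers, not data.

```python
def expected_theta(effective_size: float, mutation_rate: float) -> float:
    """Neutral expectation for a diploid population: theta = 4 * N_e * mu (per site)."""
    return 4 * effective_size * mutation_rate


def implied_effective_size(observed_theta: float, mutation_rate: float) -> float:
    """Rearranged: N_e = theta / (4 * mu)."""
    return observed_theta / (4 * mutation_rate)


mu = 1e-8          # hypothetical mutation rate per base pair per generation
neutral = expected_theta(10_000, mu)
observed = 1.2e-3  # hypothetical observed diversity per site at some locus
print(f"expected theta {neutral:.1e}, observed {observed:.1e}")
print(f"observed diversity implies N_e of about {implied_effective_size(observed, mu):,.0f}")
```

A locus whose diversity departs strongly from the genome-wide value is the kind of mismatch the card above flags as a possible sign of selection.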

SNP (single nucleotide polymorphism)

What percentage of genetic diversity in the human species is spread evenly across the entire population?

- SNPs are our largest sources of genetic data.

- SNPs are a location in the chromosome where there is variation amongst individuals.

- Caused by a single mistake during DNA replication.

- 85% are spread equally across the human population.

CNVs

- Copy number variants

- Result from gene duplications; individuals can vary in the number of copies they carry, sometimes a large number.

Microsatellites

- Short tandem repeats

- Evolve very quickly because replication mistakes (slippage) are easy to make in repeated sequence.

- Among the fastest-evolving molecular markers. SNPs, microsatellites, and CNVs are all used as genetic markers of loci.

Review Tajima's D statistic

- A tool for finding selection on a region of DNA.

- A way to get two independent estimates of genetic diversity.

- Check whether the two estimates of genetic diversity are statistically different from one another; if they are, some force (often natural selection) has been acting on the locus/site on the chromosome.

- Way to estimate whether or not there's been natural selection working on a certain site.
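The two independent estimates can be compared in code. This is a minimal sketch of the standard Tajima's D formula, assuming aligned sequences of equal length with no gaps or missing data; the toy sequences are hypothetical. One estimate is π (the mean number of pairwise differences), the other is Watterson's θ_W = S/a1, based on the number of segregating sites S; D is their difference scaled by its expected standard deviation. A negative D means an excess of rare variants, a positive D an excess of intermediate-frequency variants.

```python
import math


def tajimas_d(sequences: list[str]) -> float:
    """Tajima's D for aligned sequences (equal length, no gaps)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance terms are zero below that)")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites, so D is undefined")
    # pi: mean number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(1 for x, y in zip(a, b) if x != y)
        for i, a in enumerate(sequences)
        for b in sequences[i + 1:]
    ) / n_pairs
    # Constants from Tajima's formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1  # Watterson's estimate
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


rare_variant = ["AAAA", "AAAA", "AAAA", "TAAA"]    # one singleton
common_variant = ["AAAA", "AAAA", "TAAA", "TAAA"]  # variant at 50%
print(round(tajimas_d(rare_variant), 2), round(tajimas_d(common_variant), 2))
```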

Review fixation index (F_ST)

How is it calculated and what does it tell you?

- A measure of population structure.

- Compares the heterozygosity of the total population with that of its subpopulations.

- Estimate the heterozygosity in a given subpopulation and compare it to the overall genetic diversity in the entire population.

- If there's a discrepancy, the subpopulations are genetically differentiated. This can reflect limited migration and drift, or some evolutionary pressure such as selection acting on that subpopulation.
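The comparison of subpopulation and total heterozygosity can be sketched for one locus with two alleles. A minimal sketch, assuming subpopulations of equal size (so each is weighted equally); the allele frequencies are hypothetical.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2pq at a locus with two alleles."""
    return 2 * p * (1 - p)


def fst(subpop_freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T, weighting every subpopulation equally."""
    k = len(subpop_freqs)
    h_s = sum(heterozygosity(p) for p in subpop_freqs) / k  # mean within-subpopulation heterozygosity
    p_total = sum(subpop_freqs) / k                          # allele frequency of the pooled population
    h_t = heterozygosity(p_total)                            # heterozygosity of the total population
    if h_t == 0:
        return 0.0  # allele fixed everywhere: no diversity to partition
    return (h_t - h_s) / h_t


print(fst([0.5, 0.5]), fst([0.2, 0.8]), fst([1.0, 0.0]))
```

F_ST is 0 when subpopulations have identical allele frequencies and 1 when they are fixed for different alleles.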

The inbreeding coefficient (F) is an estimate of ___

The reduction of the observed heterozygosity relative to the expected (HW) heterozygosity, expressed as a proportion: F = (H_e - H_o)/H_e.
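F can be computed from genotype counts at one locus with two alleles. A minimal sketch; the function name is hypothetical, and the counts are the two Hardy-Weinberg examples from earlier in these notes.

```python
def inbreeding_coefficient(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """F = (H_e - H_o) / H_e for one locus with two alleles."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    h_expected = 2 * p * (1 - p)  # heterozygosity expected under HW
    h_observed = n_Aa / n         # heterozygosity actually observed
    return (h_expected - h_observed) / h_expected


print(round(inbreeding_coefficient(40, 80, 120), 3))   # heterozygote deficit: F > 0
print(round(inbreeding_coefficient(40, 320, 640), 3))  # matches HW: F about 0
```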

When formulating his theory of natural selection, what part of genetic diversity was Darwin evaluating?

Does this part of genetic diversity represent a majority or minority of all genetic diversity?

- Morphological features.

- This part of genetic diversity is not the biggest part of genetic diversity.

- Natural selection is a small part of evolution overall.

Make sure you understand the effects of mutation, natural selection, and genetic drift on a population (gum wall analogy).

- Wall is genome of one individual.

- Each piece of gum is a mutation (a mistake made during DNA replication) that changes the population of organisms; we assume mutations land fairly randomly. Each time the wall gets a new piece of gum, we are watching it over hundreds of generations.

- Two other forces to account for:

1. Steam cleaner- hoses off the wall, but gum still accumulates if he's the only one cleaning. This is genetic drift: it randomly removes some mutations and keeps others, with no reason behind which ones it keeps or takes away.

2. Second cleaner- natural selection. He is also lazy and doesn't clean the entire wall, but he has a specific goal in mind. Depending on the environment, he keeps certain mutations that are beneficial (conducive to survival in that environment) and works at removing the harmful ones. He doesn't care about gum in far-away places on the wall that doesn't interfere with his art; these are neutral mutations.

Evaluating only new mutations, what are the expected relative percentages (ball park) for positive, negative, and neutral mutations?

- Positive= < 0.1%

- Negative= < 5%

- Neutral= > 95%

Natural selection works to remove the negative mutations.

Neutral mutation

If these mutations have no impact at all on the phenotype of an organism, why should we care about them?

- Mutations that are neither beneficial nor detrimental to the ability of an organism to survive and reproduce. Natural selection has no impact on them, but genetic drift acts the same on negative, neutral, and positive mutations, so neutral mutations still change in frequency and shape the genetic diversity we observe.

Why might we expect most mutations to be neutral?

1. Non-coding DNA- most mutations fall in non-coding DNA, where they usually don't affect the fitness of the organism carrying them, positively or negatively.

2. Degenerate nature of the genetic code (synonymous mutations)- many mutations in coding regions are silent because several codons code for the same amino acid.

3. Mutations that change the AA sequence of a protein, but don't change the function of a protein

4. Nonsynonymous mutations with measurable but insignificant fitness impact

After mutation, genetic drift, and natural selection get done acting on a population what would we expect the distribution of advantageous, deleterious, and neutral mutations to be?

- Neutral > 95%

- Positive < 0.1%

- Negative < 5%

If natural selection works to remove deleterious mutations, why do we still see them in populations?

- Natural selection is not perfect; an individual with a "not so bad" mutation can still be viable enough to pass the mutation down.

- Some negative mutations stay in the population because their effects only appear later in life, after reproduction is done; these are a very small percentage.

Review the five predictions of neutral theory.

1. Parts of the genome that are under selective pressure will have different patterns than most of the genome.

- Ex: how are genes different than non-coding parts of the genome? Non-coding parts don't produce proteins

- Genes are different than non-coding parts of the genome, and the differences can be seen.

- Transcriptome- only the part of the genome that is being expressed, i.e., transcribed (and, for protein-coding genes, translated into proteins).

2. We can identify selective forces acting on entire genomes.

- Ex: GC bias, codon usage bias, isochore and codon correlation (an isochore is a part of a chromosome whose GC content deviates from the rest of the genome).

3. Intraspecific polymorphism is correlated with interspecific divergence (correlated = neutral).

- Violation indicates some sort of selective pressure

4. Recombination and levels of polymorphism

- Isochores- regions with different base composition (GC content)

- Genetic hitchhiking- results in haplotype blocks which are genetically conserved

5. Molecular clock

- Estimate for divergence time that can predict how many mutations will stay in the population and how many will get lost. Calibrate molecular clock based on fossil data. Natural processes over a short period of time aren't really useful.

- Based on mutation rates, the best guess as to the time since the species diverged

What is the nature of the process that generates mutations?

- It is effectively random. There are some exceptions to that rule. Even though the mutations themselves may be random, perhaps proofreading mechanisms are not random so that might throw them off.

What is the chance of fixing a new mutation in a population?

If it's completely neutral, it is 1/2N_e. As the effective population size gets larger, the chance of fixing any single new mutation gets smaller.

Review the statement regarding the chance of fixing a neutral mutation in a population.

Especially recognize that it is inversely proportional to the effective population size.

...

What is the chance of fixing a mutation that is subject to natural selection?

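Not simply 1/2N_e: it depends on the selection coefficient (s) as well as N_e. A beneficial mutation is much more likely to fix than a neutral one (roughly 2s when selection is strong relative to drift), while a deleterious mutation is much less likely to fix.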
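The contrast between neutral and selected mutations can be sketched with Kimura's diffusion approximation for the fixation probability. This is a minimal sketch under stated assumptions (additive selection with coefficient s, census size equal to N_e, a single new copy at frequency 1/2N_e); the function name is hypothetical. With s = 0 it reduces to 1/2N_e.

```python
import math


def fixation_probability(n_e: int, s: float) -> float:
    """Probability that a single new mutation eventually fixes.

    Kimura's approximation u = (1 - e^(-2s)) / (1 - e^(-4 N_e s)), assuming additive
    selection with coefficient s and census size equal to N_e.
    """
    if s == 0:
        return 1 / (2 * n_e)  # neutral case
    x = 4 * n_e * s
    if x < -700:
        return 0.0  # strongly deleterious; also avoids overflow in exp()
    return -math.expm1(-2 * s) / -math.expm1(-x)


for s in (0.0, 0.01, -0.001):
    print(f"s = {s}: u = {fixation_probability(1000, s):.3g}")
```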

Review the statement regarding the chance of new mutations in a new population.

- 2N_eμ new mutations per generation (diploid population)

- Combining the mutation rate with the probability of fixation.

- Substitution rate
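The cancellation behind the substitution rate can be checked numerically. A minimal sketch for neutral mutations only, assuming the census size equals N_e; the values are hypothetical. Multiplying the 2N_eμ new mutations per generation by their 1/2N_e chance of fixation leaves μ, whatever N_e is.

```python
def neutral_substitution_rate(n_e: float, mu: float) -> float:
    """Substitutions per generation = (new mutations) x (chance each one fixes)."""
    new_mutations = 2 * n_e * mu          # 2N_e gene copies, each mutating at rate mu
    fixation_prob = 1 / (2 * n_e)         # neutral fixation probability
    return new_mutations * fixation_prob  # N_e cancels, leaving mu


for n_e in (100, 10_000, 1_000_000):
    print(n_e, neutral_substitution_rate(n_e, 1e-8))
```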

Over long periods of time is the effective population size an important value for estimating the overall substitution rate between divergent populations? Why or why not?

...

Do all genomic regions reflect a molecular clock?

Be able to briefly outline how the molecular clock is calibrated.

- Not all genomic regions are going to reflect the molecular clock particularly genes that are under selection.

- Different evolutionary forces acting on a protein where we can't use the molecular clock.

- Calibrate the molecular clock to estimate the mutation rate: use fossil records that date common ancestors, or known levels of divergence over short/long periods of time, and then estimate the rate.
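The calibration step can be sketched as two small formulas. A minimal sketch, assuming a constant clock and that substitutions accumulate on both lineages after the split (so distance = 2 × rate × time); the distances and the fossil date are hypothetical.

```python
def calibrate_rate(distance: float, split_time_my: float) -> float:
    """Substitutions per site per million years along EACH lineage.

    Divergence accumulates on both branches since the split, so distance = 2 * r * T.
    """
    return distance / (2 * split_time_my)


def divergence_time(distance: float, rate: float) -> float:
    """Invert the same relationship to date a split: T = distance / (2 * r)."""
    return distance / (2 * rate)


# Hypothetical calibration: two taxa whose common ancestor is dated by fossils.
rate = calibrate_rate(distance=0.10, split_time_my=50)  # 0.001 substitutions/site/My
print(divergence_time(distance=0.04, rate=rate))        # about 20 million years
```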

Review the relative rate test and make sure that you understand how its measured.

- How we measure the molecular clock

- All of the lineages trace back to the same common ancestor

- Becomes a test for whether or not we can assume a molecular clock, by checking whether data sets, or certain parts of a data set, were evolving neutrally (at the same rate)
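The comparison can be sketched with simple p-distances. A minimal sketch, assuming C is a true outgroup and ignoring multiple hits; the sequences are hypothetical, and a real relative rate test would also need a standard error to decide whether the difference is significant.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ between two sequences."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def relative_rate_difference(seq_a: str, seq_b: str, outgroup: str) -> float:
    """K_AC - K_BC: close to 0 if lineages A and B have evolved at the same rate."""
    return p_distance(seq_a, outgroup) - p_distance(seq_b, outgroup)


# Hypothetical aligned sequences; C is an outgroup that split off before A and B.
seq_a = "ACGTACGTAA"
seq_b = "ACGTACGTTC"
outgroup = "ACGTACGTTT"
print(relative_rate_difference(seq_a, seq_b, outgroup))
```

A clearly nonzero difference (here lineage A has accumulated more changes) argues against assuming a single clock for both lineages.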

What is the impact of generation time, metabolic rate, and DNA repair mechanisms on molecular clock estimates?

1. Generation time- k = μ/g. The molecular clock was the same for organisms with many generations (short generation times) as for organisms with slower reproduction and fewer generations: it followed absolute time. The rate of nondeleterious mutations is roughly constant per calendar year.

2. Metabolic rate- organisms with higher metabolic rates were expected to have a faster-running molecular clock because they would accumulate more mutations, but this was disproven, so metabolic rate has very little impact on the molecular clock.

3. Efficiency of DNA repair- might play a role, but when we measure mutations we are usually already filtering out the effect of DNA repair: we are rarely gathering data in real time and watching mistakes being made. We are looking after the repair mechanisms have had a chance to catch mutations, so repair plays a very small role.

Nearly neutral theory

- The vast majority of mutations that accumulate in a population are neutral or nearly neutral; some are positive, which natural selection works to expand, and a very small section are negative, because natural selection removes them.

- Used to infer molecular clocks.

Review the pie chart of mutations after genetic drift and natural selection act on them.

Compare to the pie chart of raw mutations and be able to explain why they are different.

- What we see after genetic drift and natural selection have acted on the wide range of new mutations.

Continuous trait

...

Directional Selection

- Shifts to the side that does well

- Two-allele counterpart: directional selection reduces one allele's frequency in the population and increases the other's.

Stabilizing Selection

- Loss of the extremes and more individuals in the middle

- Two-allele counterpart: overdominance

- Ex: malaria and sickle cell anemia where heterozygotes do better when malaria is present than either of the homozygotes.

- Key is that you can't make heterozygotes without also producing some of both homozygotes in the population, so we reach a balance point.

- Intermediate type heterozygote

Diversifying Selection

- Individuals in the middle don't do well.

- Usually represents a change in the environment.

- Eventually going to subdivide the population into two different subpopulations.

- Two-allele counterpart: underdominance, loss of the heterozygote where both of the homozygotes become more common. Can be equal or unequal.

- Can be beginning of speciation event.

What direction does directional selection go?

How might genetics and the strength of selection affect this?

- Shifts to the right or left, depending on which side of the distribution does well.

- We would reduce and probably eventually eliminate the allele that is being selected against and we would get fixation of the other allele.

What is the null hypothesis upon which all estimates of selection are based?

Neutrality: we assume that the ratio (e.g., d_N/d_S) will be approximately one.

Ka/Ks vs d_N/d_S

What is the difference?

- Ka/Ks is the unadjusted ratio where we measure all of the nonsynonymous mutations that change the amino acid vs all the synonymous mutations and then make a ratio.

- d_N/d_S is the corrected version of Ka/Ks: it corrects for the number of possible nonsynonymous and the number of possible synonymous mutations (sites).

What does it mean when the d_N/d_S is equal to one?

Less than one?

Greater than one?

- If it's one we assume that there is no natural selection, not significantly different

- Less than one = negative (purifying) selection: amino acid changes are being removed relative to synonymous mutations, which aren't really affected by natural selection. This only works in coding sequences.

- More than one = positive selection, gene is probably important in one or more lineages and it has evolved and mutated and taken on a brand new function
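The site correction can be sketched in Python. This is a simplified sketch in the spirit of the Nei-Gojobori counting method with a Jukes-Cantor correction, not a full implementation: it assumes the standard genetic code, uppercase in-frame sequences of equal length with no gaps, counts changes to stop codons as nonsynonymous, skips codons that differ at more than one position, and needs at least one synonymous difference. The sequences are hypothetical.

```python
import math

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon: 1/3 for every single-base change that keeps the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def count_sites_and_differences(seq1: str, seq2: str) -> tuple[float, float, int, int]:
    """Return (synonymous sites, nonsynonymous sites, synonymous diffs, nonsynonymous diffs)."""
    syn_sites = non_sites = 0.0
    syn_diffs = non_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2  # average over both sequences
        syn_sites += s
        non_sites += 3 - s
        if sum(1 for x, y in zip(c1, c2) if x != y) == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                non_diffs += 1
        # Simplification: codons differing at 2-3 positions are skipped here; the
        # full Nei-Gojobori method averages over the possible mutational pathways.
    return syn_sites, non_sites, syn_diffs, non_diffs


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> float:
    """d_N/d_S; assumes at least one synonymous difference (d_S > 0)."""
    syn_sites, non_sites, syn_diffs, non_diffs = count_sites_and_differences(seq1, seq2)
    d_n = jukes_cantor(non_diffs / non_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n / d_s


# Hypothetical in-frame coding sequences (12 codons each).
seq1 = "TTTCTGGGG" * 4
seq2 = "TTCCTAGAG" + "TTTCTGGGG" * 3
print(round(dn_ds(seq1, seq2), 3))
```

Dividing by the number of possible sites is what distinguishes this corrected ratio from a raw count ratio: most sites in a gene are nonsynonymous, so raw counts would overstate nonsynonymous change.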

How might we estimate different selection pressures on genes in a wide range of species?

1. Selective constraint changes over time

2. Map changes onto a phylogeny

3. Calculate Ka/Ks ratios for each branch

Could you do this by comparing both orthologous and paralogous copies?

...

Review the HIV example. Why were d_N/d_S values highest after a recent zoonotic transmission?

How might this relate to Covid19 infection in humans?

There is evidence of strong positive selection after the virus jumps into humans. This is very common and is probably occurring in the coronavirus as well.

Make sure you can explain how the McDonald-Kreitman test is performed.

Tests the amount of variation within species (P_N/P_S) and compares it to the amount of divergence between species (d_N/d_S).

Calculate the d_N/d_S ratio between the species, and then calculate the same ratio for the polymorphisms within the species.

If diversity between populations is not equal to diversity within a population, what type of selection is indicated?

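- The type of selection may vary: an excess of fixed nonsynonymous differences between species suggests positive selection, while an excess of nonsynonymous polymorphism within species suggests purifying (negative) selection against slightly deleterious variants.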
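The McDonald-Kreitman comparison can be summarized with two standard quantities, the neutrality index (NI) and α, the estimated fraction of amino acid substitutions driven by positive selection. A minimal sketch; the counts are hypothetical, and in practice significance is judged with a test on the 2 × 2 table (for example Fisher's exact test), which is not shown.

```python
def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """Neutrality index and alpha from a McDonald-Kreitman table.

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within species
    """
    neutrality_index = (pn / ps) / (dn / ds)  # expected to be about 1 under neutrality
    alpha = 1 - (ds * pn) / (dn * ps)         # estimated fraction of adaptive substitutions
    return neutrality_index, alpha


ni, alpha = mcdonald_kreitman(dn=10, ds=10, pn=5, ps=20)  # hypothetical counts
print(f"NI = {ni:.2f}, alpha = {alpha:.2f}")
```

NI below 1 (as here) points toward positive selection; NI above 1 points toward slightly deleterious polymorphisms held down by purifying selection.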

Microevolution vs. Macroevolution

Is there a difference?

- Microevolution = variation within a species

- Macroevolution = evolutionary changes that occur between separate lineages

- No real difference

What is meant by genetic convergence?

Review the lysozyme example and be aware there are many other examples (opsins and venoms).

- Clear evidence of very similar selective pressures.

- The same abilities evolve independently in different lineages.

- Digestion of cellulose: we can see similarities in the lysozyme genes of foregut fermenters, animals that digest cellulose with the help of foregut bacteria, where lysozyme is used to break down those bacteria. All mammals have lysozyme, but it only has this digestive function in the ruminants and in a small subset of primates (the colobine monkeys).

Selective sweep

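- When a new mutation occurs that is highly advantageous, it is selected for, sometimes very strongly and sometimes more weakly, and the linked variation near it is carried along.

- No genome-wide selective sweeps in sexually reproducing eukaryotes, because crossing over breaks up linkage; the sweep is limited to the region around the selected site.

- Sweeps can be genome-wide in viruses and bacteria (e.g., flu).

- Ex: persistent lactase expression in humans. Kids rely on lactose, but lactose intolerance is common in adults of populations that did not traditionally use milk. The region around the lactase-persistence allele shows less genetic diversity.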

Linkage disequilibrium (LD)

- The non-random association of alleles or polymorphisms that are close to each other on the chromosome: they occur together more often than we would expect by chance.

- Especially pronounced after a selective sweep
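The association can be quantified with the standard measures D and r². A minimal sketch for two loci with two alleles each (A/a and B/b), assuming haplotypes are already phased and both loci are polymorphic; the haplotype samples are hypothetical.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes: list[str]) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    Each haplotype is a two-character string: first locus A/a, second locus B/b.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    freq_A = sum(1 for h in haplotypes if h[0] == "A") / n
    freq_B = sum(1 for h in haplotypes if h[1] == "B") / n
    freq_AB = counts["AB"] / n
    d = freq_AB - freq_A * freq_B  # 0 when alleles combine at random
    r_squared = d * d / (freq_A * (1 - freq_A) * freq_B * (1 - freq_B))
    return d, r_squared


after_sweep = ["AB"] * 50 + ["ab"] * 50     # only two haplotypes survive
random_mix = ["AB", "Ab", "aB", "ab"] * 25  # all combinations equally common
print(linkage_disequilibrium(after_sweep), linkage_disequilibrium(random_mix))
```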

Quantitative trait loci (QTL) analysis

Be able to describe how a QTL analysis is performed.

What biological process does it rely on? (If you have very high levels of crossing over, QTL analyses are less powerful).

- A region of DNA associated with a specific phenotype or trait that varies within a population.

- Cross-breeding.

- Can use it for any genetically determined trait.

- Needs lots of data and identified genetic markers.

Isochore

- Identifiable area that has a higher or lower GC content than the rest of the genome.

- Continuous sections of DNA with uniform GC values.

- Have specific genomic characteristics.

GC content bias

What are patterns of GC content bias like in prokaryotes and eukaryotes?

- Patterns of base composition.

- Bacteria range from 25% to 75% GC but have little intragenomic variation.

- Vertebrates have much more intragenomic variation.
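GC content and a sliding-window scan (the kind of scan that reveals isochores) can be sketched as follows. A minimal sketch; the toy sequence and window sizes are hypothetical, and real isochore analyses use much longer windows.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> list[float]:
    """GC content in sliding windows; long runs of similar values suggest isochore-like blocks."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


toy = "GC" * 10 + "AT" * 10  # hypothetical: a GC-rich block followed by an AT-rich block
print(gc_content(toy), gc_windows(toy, window=10, step=10))
```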

If we cannot identify a selectionist explanation for an isochore, what might an alternative explanation be?

Neutralist: mutation bias, DNA repair bias.

- Mechanism of the way DNA is proofread that has led to the bias.

- More stability in GC pairs because of their three hydrogen bonds (a selectionist explanation)

Codon usage bias

What is the explanation for codon usage bias? How is this related to the wobble hypothesis?

- Neutral expectation: for an amino acid with 4 codons, expect 25% usage of each

- Seen in highly expressed genes

- Selection for a more efficient translation process, positive selection skewed one way

- Wobble hypothesis- base pairing at the third codon position is not completely strict, so a single tRNA can recognize more than one codon; the pairing doesn't have complete fidelity

- Increases translation efficiency
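Codon usage bias is often summarized as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all codons for that amino acid were used equally (the 25% expectation above for a four-codon family). A minimal sketch; the sequence is hypothetical and only the glycine family (GGT, GGC, GGA, GGG) is scored.

```python
from collections import Counter


def rscu(sequence: str, synonymous_codons: list[str]) -> dict[str, float]:
    """Relative synonymous codon usage for one amino acid's codon family.

    RSCU = observed count / count expected under equal usage;
    1.0 means no bias, >1 means the codon is preferred. Assumes the family occurs at least once.
    """
    codons = Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    total = sum(codons[c] for c in synonymous_codons)
    expected = total / len(synonymous_codons)
    return {c: codons[c] / expected for c in synonymous_codons}


GLYCINE = ["GGT", "GGC", "GGA", "GGG"]
seq = "GGC" * 6 + "GGT" * 2 + "ATG"  # hypothetical in-frame coding sequence
print(rscu(seq, GLYCINE))
```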

Deep homology

- Deep ancestral connection between the three

- Same family of genes transducing the function

What are the three major types of fully functioning eyes found in the metazoan?

What groups are they found in?

1. Arthropods- characterized by complex multifaceted compound eye, complex picture.

2. Vertebrates- complex structures integrating nervous tissue, muscle tissue, and connective tissue. Lens shape is changed.

3. Mollusks- have a refractive lens that does not move and cannot change shape

Are metazoan eyes homologous?

Yes, they are homologous across all animals: there is a single origin of the photoreceptor at the base of the Metazoa.

Be able to briefly describe morphological diversity in mollusk eyes. Why are they a good group to study the evolution of eyes?

- Various groups have maintained these independent steps along the way to developing a fully formed eye.

- Range from a simple photoreceptor to many other, more complex types

- Similar to the vertebrate eye in that it has a refractive lens but it does not change shape or move.

G-protein coupled receptors (GPCRs)

Opsin gene family, gene that is most closely associated with the eye and is responsible for light sensing in the ancestral form and image formation in derived forms.

Review the four subgroups in the Opsin gene family and know what species they are found in and what their function is.

1. Group 4 opsins

2. C-type opsins- major visual opsins of vertebrates, human vision

3. R-type opsins- major visual opsins for arthropods

4. Cnidops- very divergent in function, mostly expressed in the brain, evolved early on

Do all opsins have a light sensory function?

Why is estimating phylogenetic relationships of opsins tricky?

What is a feature of genes that is more conserved?

- There's a lot of convergence in function and in the location where they are expressed; it changes evolutionarily quite easily.

- Because of gene duplications, convergence, saturation, and difficulties with rooting, and because some opsins have other signal transduction functions.

Review the take home message for the evolution of opsins and how some arthropod groups are a good example of this pattern.

There is no clear pattern, we don't have a group of genes that are only eyes and then a group that are only in the brain. There's lots of flexibility and have been multiple gene duplications.

Venom

Is it homologous or analogous?

- A toxic substance that is injected into another organism, either offensively or defensively.

- Homologous because they come from a single common ancestor.

Proteome

Resulting protein product of transcriptome. All of the functional proteins that are present in a tissue sample or an organism.

Venome

The subset of the proteome that is used in the toxic injection.

How is the evolution of the venome different than the evolution of opsins?

...

Review the phylogenetic relationships of snakes and lizards and how this informed our study of venom distribution and evolution.

The venom system is homologous or evolved once in an ancestor and then was elaborated on in the serpents and in the venomous lizards. Re-evaluate new phenotypes that have kind of just gone under the radar for a long time.

What was the ancestral venom delivery system? Review the more advanced forms and the groups they are found in.

- Vipers- hinged front fangs, hollow delivery system

- Elapids- smaller fangs, hollow fang delivery system but not hinged

What type of proteins might be preadapted as venoms?

- Preadapted- we have some features of a structure that are already in the ballpark of what their eventual form will evolve into.

- Digestive enzymes, salivary proteins

Gene recruitment, what does pleiotropy have to do with this?

- Gene recruitment (co-option)- where a gene gains a second function. It's used in one sort of tissue but then some mutation causes it to also be expressed in a second tissue.

- Pleiotropy- where one gene has two (or more) jobs.

Gene recruitment is what creates pleiotropy.

How might the constraint placed on a pleiotropic gene be reduced?

- They are constrained because as long as that gene is pleiotropic, it still has two distinct jobs and so it may not be able to evolve too much additional toxicity when it's been recruited as a venom.

- Remove the constraint through a gene duplication event where one copy maintains its old ancestral function and the other one can be expressed in the venom.

Does a protein acting as a venom need to take on a new function?

No, many of them were recruited because they were already preadapted for a digestive function or something else.

What type of ecological factors might increase the likelihood of convergence in the evolution of venoms in different species?

- See convergence when there is a common function whether it's been recruited for that or it was its original function.

C-value

The constant value of haploid DNA content per nucleus.

C value paradox

- Lack of correlation between biological complexity and the expected protein-coding genetic information or DNA content.

- Cow has 3Gb but amoeba has 670 Gb.

Non-coding DNA

We have so much of it because of

1. Structural effects on cells

2. Balance between adding and removing forces (artifact of how mutations work, some add DNA to the genome and some take DNA away, transposable elements)

Shotgun sequencing of genomes

Get lots of copies, break them into small pieces, sequence them, and assemble.

The pieces overlap, which lets them be assembled

What is the main explanation for the c-value paradox?

Non-coding DNA

G-value paradox, what three things explain the g-value paradox?

- The apparent disconnect between the number of genes in a species and its biological complexity. Resolution of the g-value paradox appears to rest on differences in genome productivity.

1. Regulatory networks- complexity, fewer genes can work in networks

2. Non-coding regulatory control- non-coding regions provide sites for regulatory proteins to find and turn genes on and off

3. One gene does not equal one protein- one gene can produce a lot of proteins (e.g., antibody genes produce a lot of different antibodies)

How big is the human genome?

- Representation of all the DNA found in each of our cells. Other than a few cell types that lack a nucleus, and gametes (which carry half of our overall genome), all of our cells have two copies of our genome plus mitochondrial DNA.

- Represented by the haploid size of the DNA in a cell.

- Our genome size (C-value) is the DNA content of one haploid set of chromosomes (half of our chromosomes): about 3.2 billion nucleotides.

How are the X and Y chromosomes and mitochondrial genome different than the other chromosomes?

- Only males carry Y chromosomes, Y chromosomes do not recombine or swap DNA with another chromosome.

The mitochondrial genome is more similar to prokaryotic genomes than it is to the nuclear genome. Why is this?

They are small, circular molecules of DNA, like those in prokaryotes, and are similar in organization and mode of replication.

Be able to briefly describe the differences and similarities expected in the genomes of two randomly chosen people.

You would see differences: allelic variation, polymorphisms, and major mutations.

What is the process for sequencing and assembling a genome?

Shotgun sequencing: it's cheaper and faster to sequence random little chunks of the genome, do a whole bunch of them, and then put them all together using a computational algorithm that relies on a lot of overlapping pieces.
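The overlap-and-merge idea can be sketched with a greedy assembler. This is a toy sketch, not how production assemblers work (they typically build overlap or de Bruijn graphs and must cope with sequencing errors, repeats, and reverse complements); it assumes error-free reads from one strand, and the reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_length)."""
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATGCGTAC", "CGTACGTT", "ACGTTAGC"]  # hypothetical error-free reads
print(greedy_assemble(reads))
```

When pieces stop overlapping, the function returns several sequences instead of one, which corresponds to ending up with separate contigs.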

Contig

Assembling pieces into larger areas, some with lots of overlap some with little overlap.

- Fairly large piece of genome but not near an entire chromosome.

Scaffold

- Put Contigs together into a Scaffold.

- Very large segment of a chromosome, maybe an entire chromosome, and there might be gaps where we might not know if the gap represents a few missing DNA base pairs or thousands of missing DNA base pairs.

Mapped

- Map our scaffold onto a reference genome.

- Identifying loci genes or other areas of interest on our chromosome (more commonly referred to as annotated).

Annotated

- Identifying any polymorphisms and genes, maybe even some information about what those genes are and how many different allelic variants there are.

- Filling in all the details of a genome.

Why do we need 50X coverage of a genome when sequencing?

- To make sure you have 99.9% of coverage of the genome.

- The total amount of sequence should be 50X (50 times) the haploid genome size.
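The arithmetic behind a coverage target can be sketched as follows. A minimal sketch; the read length is hypothetical. The second function is the idealized Poisson (Lander-Waterman) model, in which the chance a base is covered at least once is 1 - e^(-depth); real coverage is far less even than this model assumes (for example around repeats or regions of extreme GC content), and reads contain errors, which is part of why much deeper coverage is used in practice.

```python
import math


def reads_needed(genome_size: int, read_length: int, depth: int) -> int:
    """Reads required so that total sequenced bases = depth x haploid genome size."""
    return (genome_size * depth + read_length - 1) // read_length  # ceiling division


def fraction_covered(depth: float) -> float:
    """Idealized Poisson model: probability a base is covered by at least one read."""
    return 1 - math.exp(-depth)


# Human haploid genome (~3.2 billion bases) at 50X with hypothetical 150-base reads.
print(f"{reads_needed(3_200_000_000, 150, 50):,} reads")
for depth in (1, 5, 10, 50):
    print(depth, fraction_covered(depth))
```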

How is a transcriptome different than a genome?

Review the three different types of RNA, which one are we targeting when doing transcriptome sequencing?

- Starting material for genome is DNA, starting material for transcriptome is RNA.

1. rRNA (ribosomal RNA)

2. tRNA (transfer RNA)

3. mRNA (messenger RNA)- main target when doing transcriptome sequencing.

4. cDNA- RNA that has been reverse transcribed into DNA (complementary DNA).

Compare and contrast genome and transcriptome assembly.

- Algorithmically, transcriptome sequence assembly is very similar to genome sequence assembly, but the outcome is different.

- In genome assembly we're trying to reconstruct the haploid copy of the entire genome: 23 different pieces that would represent the entire (human) genome.

- In a transcriptome we are sequencing mRNA transcripts, which represent only the expressed, mostly coding regions. These may be represented many times over (in the genome, only twice).

How might a transcriptome help identify genes in a new genome assembly?

Transcripts allow gene annotation and identification even if we only have part of a gene. There are pipelines and programs to identify all of them in our transcriptome.

Do we need more or less coverage for transcriptome assembly when compared to genome assembly? Why?

Not less. We can't get away with lower coverage even though the transcriptome is only about 1.5% of the overall genome, because some transcripts are present at very high concentrations and others at very low concentrations. Sequencing is random sampling, so without adequate coverage we would sequence the high-frequency transcripts over and over and might miss important transcripts that are present at low frequency.

What makes it difficult to come up with a single set value for coverage when doing transcriptome sequencing?

Transcriptomes reflect the cell: what it's doing, what type of cell it is, and what is going on metabolically, so which transcripts are present, and at what levels, varies from sample to sample.

Be able to briefly describe the differences and similarities expected in the transcriptomes of two different samples (ex: infected vs. non-infected, two different tissue types, different stages of development)

- A liver cell after 36 hours of fasting will have a very different transcriptome than one right after eating.

Differences between tissues, same tissue types but under different metabolic activities.

Differential gene expression studies

An idea of what genes are being used in one cell versus another and at what levels.
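A basic comparison of expression levels between two samples can be sketched by normalizing read counts to counts per million (CPM) and taking a log2 fold change. A minimal sketch; the gene names and counts are hypothetical, both samples are assumed to list the same genes, and dedicated tools (e.g., DESeq2 or edgeR) add more careful normalization and statistical testing across biological replicates.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so samples sequenced to different depths are comparable."""
    total = sum(counts.values())
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}


def log2_fold_changes(sample_a: dict[str, int], sample_b: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(B / A) per gene; the pseudocount avoids dividing by or taking the log of zero."""
    cpm_a, cpm_b = counts_per_million(sample_a), counts_per_million(sample_b)
    return {gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
            for gene in cpm_a}


fed = {"geneX": 100, "geneY": 900}     # hypothetical liver sample after eating
fasted = {"geneX": 400, "geneY": 600}  # hypothetical liver sample after fasting
print(log2_fold_changes(fed, fasted))
```

A value near +2 means roughly four times higher expression in the second sample; negative values mean lower expression.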