front 1 What was the main impetus behind the development of Bayesian analysis? | back 1 - Maximum likelihood is very computationally intense; Bayesian analysis is a faster way to use models of evolution and statistical approaches. - A response to the increasing intractability of maximum likelihood analyses. - Not an optimality method. |
front 2 How is the statistical calculation for the tree score in a Bayesian analysis different from ML analyses? | back 2 - ML calculates L = Prob(Data|Hypothesis). - Bayesian analysis calculates Prob(Hypothesis|Data), the posterior probability. - The hypothesis is the tree itself. |
front 3 Accuracy | back 3 How close an estimate is to the truth (the right answer) |
front 4 Precision | back 4 - How certain you are, or how small your level of uncertainty is. - Do repeated estimates (or people) agree? - Precision can be low, meaning a wide range of estimates. - How narrow the range of estimates is, whether or not that range contains the true answer. |
front 5 How do we know if our phylogenies are accurate? | back 5 - We can never know with 100% certainty. - We use simulated data sets that hopefully represent real genetic data accurately. |
front 6 If we can never know for sure, what approaches are used to see if the methods we use are good at accurately determining historical phylogenetic relationships? | back 6 - Known phylogenies, such as bacterial data sets - Simulated data sets - Congruence of methods |
front 7 Prior probability | back 7 - A beginning guess at the parameters of our model of evolution; the starting point - If I go into an analysis with some information, I have more power to inform my subsequent analysis |
front 8 Posterior probability | back 8 - The updated probability obtained by combining the prior probability with the likelihood of the data - Becomes the prior probability for the next step (the next tree) of the analysis |
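Bayes' theorem connects the three quantities above: the posterior probability of a tree is proportional to its likelihood, Prob(Data|Tree), times its prior, Prob(Tree). A minimal Python sketch, assuming we already have (made-up) log-likelihoods for three candidate trees and a flat prior; the function name is hypothetical, and real analyses cannot enumerate every tree like this, which is why MCMC is used.

```python
import math

def posterior_probabilities(log_likelihoods, priors):
    """Return P(tree | data) for each tree via Bayes' theorem.

    posterior_i = P(data | tree_i) * P(tree_i) / sum_j P(data | tree_j) * P(tree_j)
    Log-likelihoods are used because real tree likelihoods are tiny numbers.
    """
    # Unnormalized log posterior = log likelihood + log prior
    log_post = {t: log_likelihoods[t] + math.log(priors[t]) for t in log_likelihoods}
    # Subtract the maximum before exponentiating to avoid underflow
    top = max(log_post.values())
    weights = {t: math.exp(v - top) for t, v in log_post.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical log-likelihoods for three rooted topologies of taxa A, B, C
log_l = {"((A,B),C)": -100.0, "((A,C),B)": -102.0, "((B,C),A)": -103.0}
flat_prior = {t: 1 / 3 for t in log_l}  # no prior preference among the trees
post = posterior_probabilities(log_l, flat_prior)
for tree, p in post.items():
    print(tree, round(p, 3))
```

With a flat prior the posterior simply rescales the likelihoods so they sum to 1 (roughly 0.84, 0.11 and 0.04 here).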
front 9 Burn-in period | back 9 - The initial phase of the run, before the scores of the trees reach a plateau (on the plateau, no matter how many more times we run it, the scores are maxed out) - Throw the bad burn-in trees away but save the good trees from the plateau - The early trees serve as stepping stones toward better estimates |
front 10 MCMC | back 10 - Markov chain Monte Carlo - A way to rapidly estimate the statistical scores of trees - A shortcut to speed up estimation when there are too many (complex) possibilities to calculate - A quick way to sample a complex search space |
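A minimal sketch of the MCMC idea (the Metropolis algorithm), assuming three hypothetical trees with made-up log-likelihoods, a flat prior, and a proposal that jumps to any other tree with equal probability. Real programs propose small rearrangements of the current tree and also update branch lengths and model parameters; the function name here is illustrative.

```python
import math
import random

def metropolis_trees(log_likelihoods, n_steps, seed=1):
    """Minimal Metropolis sampler over a fixed set of candidate trees.

    Assumes a flat prior and a symmetric proposal (jump to any other tree with
    equal probability), so a move is accepted with probability
    min(1, L(new) / L(current)).
    """
    rng = random.Random(seed)
    trees = list(log_likelihoods)
    current = rng.choice(trees)
    chain = []
    for _ in range(n_steps):
        proposal = rng.choice([t for t in trees if t != current])
        log_ratio = log_likelihoods[proposal] - log_likelihoods[current]
        # Always accept a better tree; accept a worse one with probability L_new / L_old
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)
    return chain

log_l = {"((A,B),C)": -100.0, "((A,C),B)": -102.0, "((B,C),A)": -103.0}
chain = metropolis_trees(log_l, n_steps=100_000)
freqs = {t: chain.count(t) / len(chain) for t in log_l}
print({t: round(f, 3) for t, f in freqs.items()})
```

Because worse trees are sometimes accepted, the chain visits each tree roughly in proportion to its posterior probability (about 0.84, 0.11 and 0.04 here) without ever computing the normalizing sum over all trees.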
front 11 What is the final outcome of a Bayesian analysis and how is this represented in a phylogeny? | back 11 - Hundreds of thousands of trees from the plateau phase, combined into one big consensus tree - Does the estimation faster than maximum likelihood, and the branching shows how often each relationship appeared - A consensus tree created from a very large set of phylogenies saved after the "burn-in" period, with numbers above each node representing the posterior probability of each relationship 1. Hundreds of thousands of trees 2. Consensus tree for all trees 3. Support values |
front 12 Bayesian analysis | back 12 Primary method of figuring out relationships. Advantages: - Faster than maximum likelihood - Gives both a result and an estimate of support for it (posterior probability support values) - Bayesian analyses incorporate models of evolution in a statistical framework, but are more efficient than maximum likelihood methods |
front 13 Posterior probability values | back 13 - How confident we are about the relationship at each node - Measures precision - A value of 100 (1.0) means the relationship appeared in essentially all sampled trees, i.e., very high confidence |
front 14 Congruence | back 14 - Simply whether two phylogenies match - Precision across our results |
front 15 What is the best method to find the phylogenetic relationships of a group and what does congruence have to do with this? | back 15 - Bayesian analysis - Congruence among methods suggests that the relationships of many organisms are represented accurately |
front 16 Support measures | back 16 Estimate precision: how confident we are in a relationship and how narrow our range of estimates is |
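Summarizing the saved trees can be sketched as follows, assuming each sampled tree is stored as a set of clades (frozensets of taxon names) and that the number of burn-in samples to discard is chosen by the user. The posterior probability of a clade is estimated as the fraction of post-burn-in trees that contain it; the sample data are made up and the function name is hypothetical.

```python
from collections import Counter

def clade_posteriors(sampled_trees, burn_in):
    """Estimate the posterior probability of each clade from an MCMC sample.

    sampled_trees: list of trees in sampling order, each a set of clades.
    burn_in: number of early samples to throw away.
    """
    kept = sampled_trees[burn_in:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}

ab, ac, abc, cd = (frozenset(s) for s in ("AB", "AC", "ABC", "CD"))
# Two poor early trees, then eight trees sampled from the plateau
samples = [{ac}, {ac}] + [{ab, abc}] * 6 + [{ab, cd}] * 2
post = clade_posteriors(samples, burn_in=2)
print(sorted(("".join(sorted(c)), p) for c, p in post.items()))
# [('AB', 1.0), ('ABC', 0.75), ('CD', 0.25)]
```

The clade AC from the burn-in trees gets no support at all, which is exactly why those trees are discarded.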
front 17 What are branch support measures testing? | back 17 Precision more than accuracy |
front 18 Bremer support value | back 18 - Only for parsimony analysis - For every node on the tree, find the very best (shortest) tree that does not contain that node's relationship - The number is the difference in length (steps) between the best tree and the best tree that doesn't support the relationship - A small number means we are not confident in the relationship - Can't be compared from one analysis to another |
front 19 Jackknife support value | back 19 - Can be used with any analysis (originally used in parsimony) - Take out one taxon, redo the analysis, and see if everything looks the same - If it all looks the same, removing that species had no effect and we are very certain of the relationships - Measures sensitivity to the taxa that are included - In parsimony: run the analysis, get the best tree, then redo it with species taken out |
front 20 Taxon sampling: what is the effect of poor taxon sampling on a phylogeny? | back 20 - Taxon sampling is the process of selecting representative taxa for a phylogenetic analysis - Poor sampling means a lack of information: the tree may not be accurate because we may be missing the taxa that show the connections between species - We want to include every species, but we can't do that for diverse groups |
front 21 Bootstrap support value | back 21 - Generally applicable and widely used - Pseudoreplicate: a replicate data set constructed from the original data set - Recreate the data set by sampling characters, so some are sampled multiple times and some not at all (sampling with replacement) - Some pseudoreplicates will not contain the characters that support a given relationship - The best trees from thousands of pseudoreplicates are summarized in a consensus tree - Values range from 0 to 100; usually only values of 50 or more are shown - Can compare one analysis to another because it's on a 0-100% scale - Tells how consistently the data support each relationship across the entire data set (precision, not accuracy) |
front 22 Bootstrap | back 22 1. Randomly resample characters with replacement to make a new data set the same size as the original (homology = data, a point of evidence) 2. Find the best topology (phylogeny) using the new data set (pseudoreplicate) 3. Repeat (replicates), then take all the trees and make a consensus tree; the support value for each node is the percentage of replicate trees that contain it |
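Once the tree lengths are known, the Bremer (decay) index is simple arithmetic. A sketch with hypothetical lengths; finding the shortest tree that lacks each clade requires a separate constrained parsimony search, which is assumed to have been done already, and the function name is illustrative.

```python
def bremer_support(best_length, best_length_without_clade):
    """Bremer (decay) index for each clade: extra steps needed to lose it.

    best_length: length (steps) of the most parsimonious tree.
    best_length_without_clade: for each clade, the length of the shortest tree
        that does NOT contain that clade (from a constrained search).
    """
    return {clade: length - best_length
            for clade, length in best_length_without_clade.items()}

# Hypothetical tree lengths from a parsimony search
support = bremer_support(
    best_length=120,
    best_length_without_clade={"(A,B)": 127, "(C,D)": 121},
)
print(support)  # {'(A,B)': 7, '(C,D)': 1}
```

Here (C,D) is weakly supported: a tree only one step longer already breaks it up.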
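Step 1 can be sketched in Python, assuming the data are an aligned character matrix stored as a dict of taxon name to sequence (the alignment and function name are made up). Steps 2 and 3 need a tree-search program and are only indicated in comments.

```python
import random

def bootstrap_pseudoreplicate(alignment, rng):
    """Make one bootstrap pseudoreplicate of a character matrix.

    alignment: dict of taxon -> sequence string (all the same length).
    Columns (characters) are drawn at random WITH replacement until the new
    matrix has as many characters as the original.
    """
    n_chars = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    # The same column indices are used for every taxon, so homology is kept
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

alignment = {
    "A": "ACGTTA",
    "B": "ACGTTG",
    "C": "ACCTAG",
    "D": "TCCTAG",
}
rng = random.Random(42)
replicates = [bootstrap_pseudoreplicate(alignment, rng) for _ in range(100)]
# Step 2 (not shown): find the best tree for each replicate.
# Step 3 (not shown): the bootstrap value of a clade is the percentage of
# replicate trees that contain it.
```

Resampling whole columns rather than individual bases keeps each character's states together across taxa.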
front 23 Posterior probability | back 23 The support measure from Bayesian analysis |
front 24 Why is the bootstrap support value most widely used? | back 24 - Widely applicable to all methodologies - Relatively easy - Needs no new data |
front 25 What are the bootstrap drawbacks? | back 25 1. Estimates precision, not accuracy 2. Tends to overestimate confidence 3. Assumes characters are independent 4. Computationally intensive |
front 26 Supertrees | back 26 - Don't gather any new data - Take analyses that have already been done and use a method to put them together - A topology composed of the results of different formal analyses, put together with or without a further formal analysis - Allow us to combine results from incompatible data sets - Find areas of consensus, even when studies don't agree on how to represent all species - Supertree methods ensure that we will find the true set of relationships as long as the underlying assumptions are not violated |
front 27 Review what is meant by a consensus approach and a total evidence approach in making phylogenies and how this relates to supertree methods. | back 27 - Consensus approach: analyze data sets separately and find the agreement between the resulting trees; a supertree is this kind of agreement between analyses - Total evidence approach: combine all the data into one data set and make a single tree |
front 28 Why did people start creating supertrees? | back 28 1. The unwieldiness of analysis: it gets harder to work with bigger data sets 2. To summarize what has already been done in a more formal way - They were originally created as a way to combine results from separate analyses where the underlying data were not congruent enough to assemble into a single data set |
front 29 What is an informal supertree? | back 29 - No objective way to put the trees together; no analysis occurs - Cut and paste: we know from other studies how some species are related, and paste that together with what is known about other species/groups - Not foolproof; gives an overall picture but has no second analysis |
front 30 Be able to outline the two processes used to make formal supertrees | back 30 - Agreement (does not do well when there's conflict) - Optimization via matrix representation |
front 31 Agreement | back 31 - Making a consensus tree - Here's phylogeny 1 and phylogeny 2; stick them together where they agree Drawback: conflicting relationships are removed from the analysis, and errors in the source trees get piled together |
front 32 Optimization via matrix representation | back 32 - A second round of analysis that synthesizes the original ones: make a big matrix based on the source trees and analyze it (a more objective way to help decide when there is conflict) |
front 33 What are the major criticisms of supertree methods? | back 33 Meta-analysis (analysis of previous analyses without the original data), so there is a lot of imprecision: 1. No primary data 2. No "signal enhancement" 3. Novel clades not supported in the source data (ending up with new relationships) 4. Inadvertent replication of source data (the opposite of signal enhancement) |
front 34 What is the more recent reason why people have proposed a supertree approach to building phylogenies? | back 34 There is so much data available that it can be impossible to analyze it all in a single phylogeny |
front 35 Disk covering method | back 35 - Estimate relationships first, then create a supertree - Need to have a vague idea of the relationships - Gather new data for smaller groups, then use supertree methods to put them all together - The groups overlap, which gives a better fit with less sampling |
front 36 Biclique method | back 36 - Find large data sets among what is currently available and put them together into sequence analyses based on the data - Reanalyze data that are already available and put them together - Put together a matrix that represents different groups - Helps identify good data sets |
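The matrix-building step of the most common version of this approach (matrix representation with parsimony, MRP) can be sketched as follows, assuming each source tree is given as its set of taxa plus its clades. Each clade becomes one binary character: 1 for taxa in the clade, 0 for other taxa in that source tree, and ? for taxa the source tree did not include. The trees and function name are hypothetical, and many implementations also add an all-zero outgroup, which is omitted here.

```python
def mrp_matrix(source_trees, all_taxa):
    """Matrix representation of source trees, one character per clade.

    source_trees: list of (taxa_in_tree, clades), where clades are sets of taxa.
    Taxa inside a clade get '1', other taxa in that source tree get '0',
    and taxa missing from that source tree get '?'.
    """
    matrix = {taxon: [] for taxon in all_taxa}
    for taxa_in_tree, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon in clade:
                    matrix[taxon].append("1")
                elif taxon in taxa_in_tree:
                    matrix[taxon].append("0")
                else:
                    matrix[taxon].append("?")
    return {taxon: "".join(states) for taxon, states in matrix.items()}

taxa = ["A", "B", "C", "D", "E"]
source_trees = [
    ({"A", "B", "C", "D"}, [{"A", "B"}, {"C", "D"}]),  # ((A,B),(C,D))
    ({"A", "B", "E"}, [{"A", "B"}]),                  # ((A,B),E)
]
matrix = mrp_matrix(source_trees, taxa)
for taxon, row in matrix.items():
    print(taxon, row)  # A 101, B 101, C 01?, D 01?, E ??0
```

The resulting matrix is then analyzed like any other character matrix (usually with parsimony) to get the supertree.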
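front 37 Reconstructing ancestral states | back 37 Inferring the character states of ancestors at the internal nodes of a tree, e.g., the DNA base at a certain position in a sequence |
front 38 Synapomorphy | back 38 - A shared derived character: a change that arose in a common ancestor and has been passed on to all of its descendants - Ex: feathers for birds - Ex: making milk in mammals |
front 39 Symplesiomorphy | back 39 - A shared ancestral character: inherited from an older ancestor, so it does not define the group, and it can be lost in some descendants - Ex: legs in tetrapods, which whales have lost |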
front 40 Convergence | back 40 - Similar characters derived independently - Ex: bat, bird, and insect wings all derived independently but same function |
front 41 Autapomorphy | back 41 A new (derived) characteristic found in only one species; not helpful for grouping |
front 42 Which of the above character patterns provides direct evidence for classification of species into higher taxa? | back 42 Synapomorphies. All the other ones are noise/problems. |
front 43 Fitch optimization | back 43 - Method that guarantees the most parsimonious mapping of complex characters onto a tree - Step-by-step mapping that tells us where mutations happened |
front 44 Dollo parsimony | back 44 - Once a trait is lost, it is not re-evolved - Dollo parsimony is best applied to the origin and evolution of complex features such as wings - Ex: the ancestors of stick insects are part of the winged insect group. The common ancestor of stick insects lost its wings, but some current stick insects have wings: they "re-evolved" them, basically because they still carried the wing genes and turned them back on - Ex: loss of teeth in vertebrates: teeth evolved only once, at the origin of vertebrates, and were then lost multiple times (in turtles, birds, seahorses) |
front 45 Make sure you understand how mapping character traits onto a phylogeny allows us to reconstruct ancestral states | back 45 Once we map a trait onto a phylogeny, we can see where it varies and changes along the tree, which lets us infer the states of the ancestors at the nodes |
front 46 How is inference of ancestral states different under a maximum likelihood assumption? | back 46 It takes into account branch lengths and models of evolution (what types of mutation are more likely) |
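The bottom-up (first) pass of Fitch's algorithm can be sketched for one character on a rooted binary tree written as nested tuples (an assumed representation; the function name is illustrative). At each internal node, take the intersection of the children's state sets if it is non-empty; otherwise take the union and count one change. A second, top-down pass (not shown) picks the actual ancestral state at each node.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree: nested 2-tuples of tip names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping each tip name to its observed state.
    Returns (state set at this node, minimum number of changes in this subtree).
    """
    if isinstance(tree, str):  # a tip: its set is just the observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # the children can agree: no change needed at this node
        return shared, left_cost + right_cost
    # No overlap: keep both possibilities and count one change
    return left_set | right_set, left_cost + right_cost + 1

tree = (("A", "B"), ("C", "D"))
root_states, changes = fitch(tree, {"A": "G", "B": "G", "C": "T", "D": "T"})
print(sorted(root_states), changes)  # ['G', 'T'] 1
```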
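A minimal sketch of counting changes under Dollo parsimony for a presence/absence trait, assuming a rooted tree written as nested tuples: the trait is gained exactly once, at the most recent common ancestor of all taxa that have it, and every other difference must be a loss. The tree, taxon names and function names are hypothetical.

```python
def tips(tree):
    """All tip names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))

def count_losses(node, present):
    """One loss on the branch above each maximal subtree lacking the trait."""
    if not (tips(node) & present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)

def dollo_changes(tree, present):
    """Minimum changes (1 gain + losses) under Dollo parsimony.

    Assumes at least one tip has the trait.
    """
    # Walk down to the most recent common ancestor of the tips with the trait
    node = tree
    while not isinstance(node, str):
        inside = [child for child in node if present <= tips(child)]
        if not inside:
            break
        node = inside[0]
    return 1 + count_losses(node, present)

tree = ((("A", "B"), "C"), ("D", "E"))
print(dollo_changes(tree, {"A", "C", "D"}))  # 3
```

Here the trait is gained once at the root and lost twice (in B and in E); it is never allowed to re-evolve.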
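A sketch of why branch lengths matter, assuming the Jukes-Cantor model (all substitutions equally likely, equal base frequencies) and the simplest possible tree: an ancestor with two descendant tips. The tip on the short branch is much more informative about the ancestor, so the ancestral state is no longer a tie as it would be under Fitch parsimony. The branch lengths are made up and the function names are illustrative.

```python
import math

def jc_prob(from_base, to_base, v):
    """Jukes-Cantor probability of ending in to_base after branch length v
    (expected substitutions per site)."""
    decay = math.exp(-4.0 * v / 3.0)
    if from_base == to_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

def root_state_probs(tip_states, branch_lengths):
    """Marginal probability of each ancestral base for a root with tip children.

    Assumes the Jukes-Cantor model, so each base has prior frequency 1/4.
    """
    scores = {}
    for x in "ACGT":
        p = 0.25
        for tip, v in zip(tip_states, branch_lengths):
            p *= jc_prob(x, tip, v)
        scores[x] = p
    total = sum(scores.values())
    return {x: p / total for x, p in scores.items()}

# Tip 'A' on a short branch, tip 'G' on a long branch
probs = root_state_probs(["A", "G"], [0.01, 0.5])
print(max(probs, key=probs.get), round(probs["A"], 3))
```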
front 47 Convergence | back 47 - Bird, bat, and insect wings evolved independently but all used to fly - Complex eyes of vertebrates, cephalopods, jellyfish, and arthropods evolved separately but are associated with vision |
front 48 Be able to briefly outline Shimodaira-Hasegawa (SH) test | back 48 1. Uses a bootstrap procedure - Creates a spread of possibilities - Gives a range of trees to accept or reject 2. Tests whether an alternative hypothesis (tree) is significantly different from the best phylogeny |
front 49 SH test drawbacks | back 49 - Has to create a distribution - Any weakness of the bootstrap will also be a weakness of the SH test - Tends to overestimate - Not the best way to select which model of evolution to use for a genetic data set |
front 50 Likelihood ratio test | back 50 - Is very flexible - Uses a likelihood score - Is widely applicable |
front 51 What are the four primary uses of LRT? | back 51 1. Different phylogenies: is one phylogeny better than (significantly different from) the other? 2. Molecular clock: can we use these data to estimate divergence times? Only in limited situations, in data that are evolving neutrally; we can't predict in the short term but can in the long term 3. Models of evolution: is this model better than that one? 4. Looking for signs of natural selection in protein-coding genes: is natural selection acting on this gene or part of this gene? (purple = purifying selection, yellow = weak positive selection, red = strong positive selection) |
front 52 How does one select a model of evolution? | back 52 - Do a stepwise comparison of the different models, see which are statistically different, and determine which one to choose - Multiple tests to find the best one - The more complex the model of evolution, the more parameters must be estimated, which can make the estimates less accurate |
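The test itself can be sketched in Python, assuming two nested hypotheses (the simpler one is a special case of the more complex one) whose maximized log-likelihoods are already known. The statistic 2(lnL_complex - lnL_simple) is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters; the small chi-square helper is written out to avoid extra libraries, and the function names and log-likelihood values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses the closed forms for df = 1 and 2 plus the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """LRT for nested hypotheses: 2 * (lnL_complex - lnL_simple) ~ chi-square."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical log-likelihoods: a model without vs. with one extra parameter
stat, p = likelihood_ratio_test(lnl_simple=-1000.0, lnl_complex=-995.0, extra_params=1)
print(stat, p < 0.05)  # 10.0 True
```

A small p-value means the extra parameter improves the fit more than expected by chance, so the more complex hypothesis is preferred.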
front 53 Orthology | back 53 - Two genes that can trace their common history back to a speciation event - Two homologous genes, their divergence can be traced back to an ancient speciation event that split the most recent common ancestor of the two species with these genes into separate branches |
front 54 Paralogy | back 54 - Two genes whose common ancestor underwent a gene duplication instead of a speciation event - Two homologous genes whose divergence can be traced back to a gene duplication event that predates the most recent common ancestor of the two species in which we find the genes |
front 55 Xenology | back 55 - A horizontal gene transfer event can make a gene history not match the species history - Huge problem for bacterial phylogenies - Occurs, but is very rare, in eukaryotic genes - Two homologous genes, one of which went through a horizontal gene transfer event and is now part of the genome of an organism very distantly related to the organism that has the other gene |
front 56 Which of these subclasses are of use when trying to infer phylogenetic history? | back 56 Orthologous genes |
front 57 Lineage sorting | back 57 - If coalescence is fast compared with the time between speciation events, there is no problem - The history of the alleles doesn't match the species history - If there is more than one allele in a population, one will eventually be lost because of genetic drift |
front 58 What are two processes that would make lineage sorting more likely? | back 58 Rapid speciation and long coalescence time |
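The effect of both processes can be sketched with the standard coalescent result for three species ((A,B),C): if the internal branch lasts t generations in an ancestral population of diploid effective size N, the chance that a gene tree disagrees with the species tree is (2/3)e^(-t/2N). A short internal branch (rapid speciation) or a large N (long coalescence time) pushes this toward its maximum of 2/3. The numbers and function name are illustrative.

```python
import math

def discordance_probability(generations, effective_size):
    """Chance that a gene tree disagrees with the species tree ((A,B),C)
    because of incomplete lineage sorting.

    generations: time between the two speciation events.
    effective_size: diploid effective population size N of the ancestral
        population on that internal branch (branch length T = t / (2N)).
    """
    t_coal = generations / (2.0 * effective_size)
    # No coalescence on the internal branch, then 2 of 3 possible pairs mismatch
    return (2.0 / 3.0) * math.exp(-t_coal)

for t, n in [(100_000, 10_000), (10_000, 10_000), (10_000, 100_000)]:
    print(t, n, round(discordance_probability(t, n), 3))
```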
front 59 What are the three things that can cause a gene tree to conflict with a species tree (even when both trees are reconstructed accurately)? | back 59 - Gene duplication - Horizontal gene transfer - Coalescence (lineage sorting) |
front 60 Gene duplication | back 60 Connected with paralogy |
front 61 Horizontal gene transfer (plasmid/transformation, vectors with virus, and pilus) | back 61 Connected with xenology |
front 62 Strict consensus tree | back 62 - Shows only the relationships that all of the trees (two, three, or more) agree on - Very little resolution |
front 63 Majority consensus tree | back 63 - Shows the relationships found in more than half of the trees (how many trees show each relationship) - Better resolution - Ex: D is more closely related to ABC than to E |
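The difference between the two rules can be sketched as follows, assuming each tree is stored as a set of clades (frozensets of taxa). The three made-up trees reproduce the situation above: only the majority-rule consensus keeps the clade ABCD that places D closer to ABC than E is. The function name is illustrative.

```python
from collections import Counter

def consensus_clades(trees, rule="majority"):
    """Clades kept in a strict or majority-rule consensus tree.

    trees: list of trees, each a set of clades (frozensets of taxa).
    strict: keep clades found in every tree.
    majority: keep clades found in more than half of the trees.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    if rule == "strict":
        return {c for c, k in counts.items() if k == n}
    if rule == "majority":
        return {c for c, k in counts.items() if k > n / 2}
    raise ValueError("rule must be 'strict' or 'majority'")

ab, abc, abcd, abce = (frozenset(s) for s in ("AB", "ABC", "ABCD", "ABCE"))
trees = [{ab, abc, abcd}, {ab, abc, abcd}, {ab, abc, abce}]
strict = consensus_clades(trees, rule="strict")
majority = consensus_clades(trees, rule="majority")
print(sorted("".join(sorted(c)) for c in strict))    # ['AB', 'ABC']
print(sorted("".join(sorted(c)) for c in majority))  # ['AB', 'ABC', 'ABCD']
```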
front 64 Pseudogene | back 64 Duplicated gene that no longer functions (still in the genome but is part of noncoding DNA) |
front 65 Neofunctionalization | back 65 Duplicated gene now has a different function from what it did ancestrally (related to anagenesis) |
front 66 Anagenesis | back 66 - Process of generating new potential and diversity within a species over time - Change in function over time but without any genes being created |
front 67 Cladogenesis | back 67 - Process of speciation events that generate new clades from a single ancestral population, carrying characteristics into new lineages with new functions - Speciation process creating new clades or new groups |
front 68 Is there a single species definition that can define all species? | back 68 No, because speciation is a process, not an event, and different species may have different processes that help establish the separation of a population into different species |
front 69 Does this mean that the concept of a species is a human idea and not a biological reality? How can we reconcile this discrepancy? | back 69 Recognize that speciation is a process and that it differs slightly in some groups compared to others |
front 70 Morphological species concept | back 70 - Most widely used - Ex: we can tell cats from dogs by looking at them Strengths: - Simple and easy - No special equipment needed, just observation skills Weaknesses: - Need education in the terminology - Need to be careful when there's a wide range of characteristics - Doesn't work when there is not enough morphological diversity to tell species apart |
front 71 Biological species concept | back 71 - Groups of actually or potentially interbreeding populations that are reproductively isolated from other such groups - Applied by measuring rates of gene flow: can they exchange genetic material, and at what level? - Only applies to sexually reproducing species Strengths: - A little more scientific and objective Weaknesses: - Gene flow isn't 0 or 100%; it is often somewhere in the middle - Takes a lot of time and resources to get the data set - Can't be used for asexual species |
front 72 Phylogenetic species concept | back 72 - Smallest monophyletic group distinguished by a shared derived character - Used only when the other two don't work Strengths: - Very objective methodology Weaknesses: - Difficulties with asexual species - Takes time and effort, although it relies on only one kind of evidence rather than multiple |
front 73 Review the Wheeler paper and his arguments for the Phylogenetic Species Concept (PSC) being the single, unifying species concept. | back 73 Even the PSC has its own weaknesses: it's a complex method, when the simpler morphological species concept can often be applied to species |
front 74 What biological process does the PSC have a particular problem with? | back 74 - Asexual reproducers, ex: E. coli - Hybrids and horizontal gene transfer: interbreeding would mess things up because trees would turn into networks |