Guiding Principles Quiz Flashcards


created 9 months ago by ChastainReagan
Biology Lab II Dr. McLoud & Dr. Baliraine
updated 9 months ago by ChastainReagan

1.

In any segment of DNA, typically only one frame
in one strand is used for a protein-coding gene. That
is, each double-stranded segment of DNA is
generally part of only one gene.

2.

Genes do not often overlap by more than a few
bp, although up to about 30 bp is legitimate.

3.

The gene density in phage genomes is very high, so
genes tend to be tightly packed. Thus, there are typically
not large non-coding gaps between genes.
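Principles 2 and 3 can be checked mechanically once gene calls are exported as coordinates. The Python sketch below assumes a hypothetical `Gene` record with 1-based, inclusive `left`/`right` coordinates. It reports the signed spacing between neighbouring genes (negative values are overlaps) and flags overlaps above 30 bp and gaps above a `max_gap` cutoff. The 200 bp default for `max_gap` is an illustrative assumption, since the principle only says "large".

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene call: 1-based, inclusive coordinates with left <= right."""
    name: str
    left: int
    right: int
    strand: str  # "+" forward, "-" reverse


def spacing(upstream: Gene, downstream: Gene) -> int:
    """Signed spacing between two neighbouring genes ordered by left coordinate.

    Positive values count intergenic bp; negative values count overlapping bp.
    """
    return downstream.left - upstream.right - 1


def flag_spacing(genes, max_overlap: int = 30, max_gap: int = 200):
    """Return (upstream, downstream, spacing, reason) for neighbours worth re-checking.

    max_gap is an illustrative cutoff (assumption); the principles give no number.
    """
    ordered = sorted(genes, key=lambda g: g.left)
    flags = []
    for up, down in zip(ordered, ordered[1:]):
        s = spacing(up, down)
        if s < -max_overlap:
            flags.append((up.name, down.name, s, f"overlap > {max_overlap} bp"))
        elif s > max_gap:
            flags.append((up.name, down.name, s, f"gap > {max_gap} bp"))
    return flags


genes = [
    Gene("g1", 1, 300, "+"),
    Gene("g2", 297, 900, "+"),    # 4 bp overlap with g1: not flagged
    Gene("g3", 850, 1200, "+"),   # 51 bp overlap with g2: flagged
    Gene("g4", 1500, 1800, "+"),  # 299 bp gap after g3: flagged
]
for flag in flag_spacing(genes):
    print(flag)
```

A flagged gap is also a natural place to look for missed genes.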

4.

Most protein-coding genes will have coding potential predicted by
Glimmer, GeneMarkS (self), or GeneMarkHost (version 2.5). Start sites are
chosen to include all coding potential. These are, by far, the strongest pieces
of data for predicting genes.

5.

Many phage genes are unique and will not have
homologues in any database. This is OK; a lack of similar
sequences in the databases should not be the sole reason for
removing a Glimmer or GeneMark gene prediction from an
annotation.

6.

Some protein-coding genes may not be predicted by
Glimmer or GeneMark. Therefore, all ORFs over 120 bp
that fall into gaps between predicted genes in the annotation
should be carefully evaluated for similarity to genes in the
databases. For such ORFs, evidence such as strong
sequence similarity to previously annotated genes in
GenBank or phagesdb.org, or a likely functional prediction
with HHPred, is sufficient for inclusion in the annotation. If
you have no data to support filling a gap, do not
fill the gap.
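A minimal sketch of the ORF scan this principle calls for, assuming the genome is a plain `ACGT` string and the gap is given in 1-based, inclusive coordinates. Both strands of the gap are searched for a start codon (ATG, GTG, TTG) followed by an in-frame stop. For each stop only the longest ORF is reported, and length is counted from the start codon through the stop codon (an assumption about how "over 120bp" is measured). The function names are hypothetical, and every hit is only a candidate that still needs database or HHPred support before it goes into the annotation.

```python
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orfs(seq: str, min_len: int = 120):
    """Return 0-based, half-open (start, end) spans of ORFs on this strand.

    For each in-frame stop codon, the ORF runs from the most upstream start codon
    seen since the previous in-frame stop through the stop codon itself.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        first_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                if first_start is not None and i + 3 - first_start >= min_len:
                    found.append((first_start, i + 3))
                first_start = None
            elif codon in STARTS and first_start is None:
                first_start = i
    return sorted(found)


def orfs_in_gap(genome: str, gap_left: int, gap_right: int, min_len: int = 120):
    """List (strand, left, right) ORFs lying entirely inside a gap, in 1-based genome coordinates."""
    region = genome[gap_left - 1:gap_right]
    n = len(region)
    hits = [("+", gap_left + s, gap_left + e - 1) for s, e in longest_orfs(region, min_len)]
    for s, e in longest_orfs(revcomp(region), min_len):
        # Map reverse-complement indices back onto forward-strand coordinates.
        hits.append(("-", gap_left + n - e, gap_left + n - s - 1))
    return hits
```

ORFs that extend past either edge of the gap are not reported; widen the region if genes flanking the gap might be miscalled.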

7.

If two genes are transcribed in opposite directions and their
start sites are near one another, there typically has to be enough space
between them for transcription promoters in both directions. This
usually requires a gap of ≥ 50 bp.
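A sketch of this check, assuming a hypothetical `Gene` record with 1-based, inclusive coordinates and a `strand` of "+" or "-". In left-to-right order, a divergent pair is a reverse-strand gene followed directly by a forward-strand gene, so both start codons face the shared gap; pairs with fewer than 50 bp between them are flagged for review.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene call: 1-based, inclusive coordinates with left <= right."""
    name: str
    left: int
    right: int
    strand: str  # "+" forward, "-" reverse


def divergent_pairs_too_close(genes, min_gap: int = 50):
    """Return (left_gene, right_gene, gap_bp) for head-to-head neighbours closer than min_gap."""
    ordered = sorted(genes, key=lambda g: g.left)
    flagged = []
    for a, b in zip(ordered, ordered[1:]):
        # Head-to-head: a is read leftward (start at a.right), b rightward (start at b.left).
        if a.strand == "-" and b.strand == "+":
            gap = b.left - a.right - 1
            if gap < min_gap:
                flagged.append((a.name, b.name, gap))
    return flagged


genes = [
    Gene("g1", 1, 300, "-"),
    Gene("g2", 331, 900, "+"),     # divergent from g1 with a 30 bp gap: flagged
    Gene("g3", 1000, 1500, "-"),   # convergent with g2: not checked
    Gene("g4", 1600, 2000, "+"),   # divergent from g3 with a 99 bp gap: fine
]
print(divergent_pairs_too_close(genes))
```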

8.

Protein-coding genes are generally at
least 120 bp (40 codons) long. There are a
small number of exceptions. Genes below
about 200 bp require careful examination.
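A small helper that sorts gene calls into the length categories above, assuming 1-based, inclusive coordinates that include the stop codon. The function name and labels are illustrative only.

```python
def classify_length(left: int, right: int) -> str:
    """Classify a gene call by length (1-based, inclusive, stop codon included)."""
    length = right - left + 1
    if length % 3 != 0:
        # A coding region must be a whole number of codons.
        return "not a whole number of codons"
    if length < 120:
        return "under 120 bp (rare exception)"
    if length < 200:
        return "under 200 bp (examine carefully)"
    return "ok"


for coords in [(1, 90), (1, 150), (1, 600)]:
    print(coords, classify_length(*coords))
```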

9.

Switches in gene orientation (from forward to
reverse, or vice versa) are relatively rare. In other
words, it is common to find groups of genes
transcribed in the same direction.
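Counting orientation switches along the genome gives a quick overview of how genes group into same-direction blocks. This sketch assumes the strands are listed in genome order as "+" and "-" characters; the function names are hypothetical.

```python
from itertools import groupby


def orientation_runs(strands):
    """Collapse strands in genome order into (strand, run_length) blocks."""
    return [(strand, len(list(group))) for strand, group in groupby(strands)]


def orientation_switches(strands) -> int:
    """Number of forward/reverse changes between neighbouring genes."""
    return max(len(orientation_runs(strands)) - 1, 0)


strands = "++++--+"
print(orientation_runs(strands), orientation_switches(strands))
```

A genome whose gene list produces many short runs is worth a second look, since frequent switches are unusual.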

10.

Each protein-coding gene ends with a stop codon (TAG,
TGA, or TAA).

11.

Each protein-coding gene starts with an initiation
codon: ATG, GTG, or TTG. Note that ATG accounts for 68% of the
starts called in the Actinobacteriophage database of phage
genes, GTG for 26%, and TTG for 7%.
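A sketch that applies principles 10 and 11 to a single gene sequence given 5'→3' on its coding strand (a reverse-strand gene would have to be reverse-complemented first). It also flags a length that is not a whole number of codons and any premature in-frame stop, two common symptoms of a wrong coordinate. The function name is hypothetical.

```python
START_CODONS = ("ATG", "GTG", "TTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def check_gene(seq: str) -> list:
    """Return a list of problems found in a coding sequence (empty list = looks fine)."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3:
        problems.append("length not a multiple of 3")
    if seq[:3] not in START_CODONS:
        problems.append(f"unexpected start codon {seq[:3]}")
    if seq[-3:] not in STOP_CODONS:
        problems.append(f"unexpected stop codon {seq[-3:]}")
    # Every codon between the first and the last must be a sense codon.
    internal = (seq[i:i + 3] for i in range(3, len(seq) - 3, 3))
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("internal in-frame stop codon")
    return problems


print(check_gene("ATG" + "GCT" * 40 + "TAA"))
print(check_gene("CTG" + "GCT" * 40 + "TAA"))
```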

12.

An important task is choosing between different
possible translation initiation (i.e., start) codons. The best
choice of start site is gene-specific, and gene function and
synteny must be carefully considered. As phage genes are
frequently co-transcribed and co-translated, less weight may
be given to optimal ribosome binding site sequences in start
site selection. Identifying the correct start site is not always
easy and is predicated on the following sub-principles:

12a.

The relationship to the closest upstream gene is important.
Usually, there is neither a large gap nor a large overlap (i.e., more than
about 7 bp). If the genes are part of an operon, a 1 bp or 4 bp overlap
(e.g., ATGA), where the start codon overlaps the stop codon of the upstream
gene, is preferred by the ribosome. Therefore, RBS scores may have
little bearing in this type of gene arrangement. (The 4 bp overlap is
common among the genes of the genomes in the Actinobacteriophage
database. This is reflected in the stop codon data, since an ATGA overlap
requires a TGA stop: TGA is the most commonly used stop codon at 65%,
followed by TAA at 18% and TAG at 17%.)
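The overlap in 12a can be computed directly from coordinates. This sketch assumes two consecutive forward-strand genes with 1-based, inclusive coordinates; the helper names are hypothetical. It returns the overlap in bp (negative for a gap) together with the shared bases, so a 4 bp overlap shows up as, for example, `ATGA`.

```python
def stop_start_junction(genome: str, up_right: int, down_left: int):
    """Describe where a downstream start codon meets the upstream stop codon.

    up_right is the last base of the upstream stop codon and down_left the first
    base of the downstream start codon. Returns (overlap_bp, shared_bases).
    """
    overlap = up_right - down_left + 1
    shared = genome[down_left - 1:up_right].upper() if overlap > 0 else ""
    return overlap, shared


def junction_label(overlap: int) -> str:
    """Rough label following 12a; thresholds taken from the text above."""
    if overlap in (1, 4):
        return "operon-style overlap"
    if overlap > 7:
        return "large overlap"
    return "other spacing"


# Upstream gene 1-18 ends ...GCA TGA; downstream ATG starts at base 15.
genome = "ATGGCTGCTGCTGCATGA" + "CCGCTTAA"
overlap, shared = stop_start_junction(genome, 18, 15)
print(overlap, shared, junction_label(overlap))
```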

12b.

The position of the start site is often conserved among homologues of genes.
Therefore, the start site of a gene in your phage is likely to be in the same position as those in related genes in other genomes. Be aware, however, that one or more previously annotated and published genes could have a suboptimal start, and you may have the opportunity to help correct it. Homologues in more distantly related genomes (those of a different cluster) may prove more informative because alternate, incorrect start sites are less likely to be conserved. Use Starterator!
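Starterator performs the cross-genome comparison itself; the sketch below is not its code and only illustrates a first step, listing every candidate start codon for one forward-strand gene. It assumes the 1-based position of the last base of the gene's stop codon is known and walks upstream in frame until the previous in-frame stop codon. The length reported for each candidate (in bp, stop codon included) is one simple way to line candidates up against a homologue's annotated length. The function name is hypothetical.

```python
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def candidate_starts(genome: str, stop_right: int):
    """Return (position, codon, length_bp) for each possible start, most upstream first.

    stop_right is the 1-based last base of the gene's stop codon (forward strand).
    """
    genome = genome.upper()
    candidates = []
    i = stop_right - 6  # 0-based index of the codon just upstream of the stop codon
    while i >= 0:
        codon = genome[i:i + 3]
        if codon in STOPS:
            break  # an in-frame stop bounds the open reading frame
        if codon in STARTS:
            candidates.append((i + 1, codon, stop_right - i))
        i -= 3
    return candidates[::-1]


print(candidate_starts("TAAATGGCTGTGGCTTGA", 18))
```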

12c.

The preferred start site usually has a favorable RBS score relative to
the other potential start codons, but not necessarily the best one. A notable
exception is the integrase in many genomes, which has a very low RBS
score. Our experimental data suggest that some genes do not have a
Shine-Dalgarno (SD) sequence.

12d.

Manual inspection can be helpful to distinguish between
possible start sites. The Shine-Dalgarno consensus is AAGGAGG,
followed by a 3-12 bp spacer and then the start codon.
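Manual inspection can be sketched as a scan of the region upstream of a candidate start for the best ungapped match to AAGGAGG with a 3-12 bp spacer. Counting matching bases is a simplifying assumption made here for illustration; it is not the RBS scoring used by the annotation software, and the function name is hypothetical.

```python
CONSENSUS = "AAGGAGG"


def best_sd_match(genome: str, start_left: int, min_spacer: int = 3, max_spacer: int = 12):
    """Best (matches, spacer_bp, site) upstream of a forward-strand start codon.

    start_left is the 1-based first base of the start codon. Returns None when no
    window fits before the start of the sequence.
    """
    genome = genome.upper()
    start0 = start_left - 1
    best = None
    for spacer in range(min_spacer, max_spacer + 1):
        site_end = start0 - spacer  # 0-based, exclusive end of the candidate site
        site_begin = site_end - len(CONSENSUS)
        if site_begin < 0:
            break
        site = genome[site_begin:site_end]
        matches = sum(a == b for a, b in zip(site, CONSENSUS))
        if best is None or matches > best[0]:  # ties keep the shorter spacer
            best = (matches, spacer, site)
    return best


upstream = "CCCC" + "AAGGAGG" + "CCCCCCC"  # perfect consensus, 7 bp spacer
print(best_sd_match(upstream + "ATGGCT", len(upstream) + 1))
```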

17

12e.

Your final start-site selection will likely represent a compromise among these sub-principles. A corollary to the start-selection guidelines: sometimes the best start leads to a choice between two tandem start codons (i.e., one immediately follows the other). Based on a small amount of mass spectrometry data and some basic biological principles, always choose the second start codon, as in the Met-Met “ATGATG” or Met-Leu “TGATGTTGA” tandem start codons.
• It is important to check the six-frame translation!
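Two helpers for this step, written as a sketch: a six-frame translation using the standard genetic code (frames +1 to +3 on the given strand, -1 to -3 on its reverse complement), and a hypothetical `resolve_tandem` function that applies the tandem-start rule by moving a forward-strand start to the next codon when that codon is also ATG, GTG, or TTG. The translation shows GTG and TTG as V and L because the table does not special-case initiation codons.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' any codon with a non-ACGT base."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def resolve_tandem(genome: str, start_left: int) -> int:
    """Hypothetical helper for the tandem-start rule on the forward strand.

    start_left is the 1-based first base of the chosen start codon. If the next
    codon is also ATG, GTG or TTG, the second codon's position is returned instead.
    """
    next_codon = genome.upper()[start_left + 2:start_left + 5]
    return start_left + 3 if next_codon in START_CODONS else start_left


for frame, protein in six_frames("ATGGCTTAA").items():
    print(frame, protein)
print(resolve_tandem("ATGATGGCT", 1))
```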

13.

tRNA genes are not called precisely by the program
embedded in DNA Master and require extra attention.

14.

Protein assignments require rigorous review of the ever-
increasing available data. At a minimum, each gene should be
evaluated using HHPred and BLASTP, as well as examined in
the context of the functions of the flanking genes (synteny).

15.

Iteration is key. Annotation is like writing a paper; after
you've made a rough draft, you will need to refine, revise, and
polish all your gene calls to produce a cohesive whole.