Amino acid sequences are determined by nucleotide sequences in DNA, through their
transcription into mRNA and its subsequent translation, codon after codon, according to
what is called the Genetic Code. In the examples that follow, "similar" will be
understood as having a certain degree of colinearity. To understand this, compare:
PRINCECHARLES
PRINCESSDIANA
You will promptly recognise colinearity of the first 6 characters. Comparisons between
sequences (whether of amino acids or nucleotides) can only be made if they are aligned
with each other. Identity can be continuous or not. In this case, you might also ask
whether the first A in each sequence is "conserved" or a coincidence. Any biologist would
tend to argue for coincidence, but this is not always the case. It's a tough decision
(see the final part of this answer).
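As a rough illustration of what "aligned" means here, the following Python sketch compares the two names position by position, without allowing gaps. The function names are made up for this example and do not come from any library.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def identity_mask(a: str, b: str) -> str:
    """Keep a character where both sequences agree, print '.' elsewhere.

    Assumes an ungapped, position-by-position comparison.
    """
    return "".join(x if x == y else "." for x, y in zip(a, b))


print(common_prefix_length("PRINCECHARLES", "PRINCESSDIANA"))  # 6
print(identity_mask("PRINCECHARLES", "PRINCESSDIANA"))         # PRINCE.......
```

Because no gaps are allowed, the A of CHARLES and the A's of DIANA do not line up in this comparison. Deciding whether to shift one sequence to pair them is exactly the judgment call mentioned above.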
So, at least the first 6 "amino acids" are "similar" because they can be aligned. Let us
now use the one-letter code for each amino acid (clarifying that PRINCE =
Pro-Arg-Ile-Asn-Cys-Glu). Reverse-translating this, using the standard genetic code,
will give you
CCNCGNAUHAAYUGYGAR (or CCNAGRAUHAAYUGYGAR)
(in RNA sequences, N = A/G/C/U, H = A/C/U, Y = C/U, R = A/G)
You thus realize that, to produce PRINCE, some positions in the mRNA
(CC..G.AU.AA.UG.GA.) are invariable, and others variable. The code is said to be
degenerate, in the sense that most amino acids are specified by more than one codon
(only two exceptions in the standard code: methionine and tryptophan). But for a given
amino acid all codons are related to some extent, typically sharing their first two
positions. This is why the DNA would have to conserve at least those positions
unchanged in order to specify the conserved tract.
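The invariable positions can be checked mechanically. The sketch below is only an illustration: it builds the standard genetic code as a lookup table (bases in the conventional U, C, A, G order), then marks a codon position as invariable when every codon for that amino acid carries the same base there. The names `CODON_TABLE`, `codons_for` and `invariant_mask` are hypothetical helpers, not part of any library.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> list[str]:
    """All codons that specify the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def invariant_mask(protein: str) -> str:
    """Show a base where every synonymous codon agrees, '.' where they differ."""
    out = []
    for aa in protein:
        codons = codons_for(aa)
        for pos in range(3):
            seen = {codon[pos] for codon in codons}
            out.append(seen.pop() if len(seen) == 1 else ".")
    return "".join(out)


print(invariant_mask("PRINCE"))  # CC..G.AU.AA.UG.GA.
print(invariant_mask("MW"))      # AUGUGG (the two single-codon amino acids)
```

Note that arginine contributes only its middle G, because its six codons (CGN and AGR) differ at the first position as well as the third.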
Now consider another sequence
TRENCHCHARTLESS
(= Thr-Arg-Glu-Asn-Cys-His-Cys-His-Ala-Arg-Thr-Leu-Glu-Ser-Ser)
Quite easily you will recognise that some degree of colinearity remains with
PRINCECHARLES, and note that .R.NC.CHAR-LES- are shared between the two. It would be a
good exercise to verify, considering only CGN for Arg, CUN for Leu and UCN for Ser, that
the mRNA sequences are, respectively,
CCNCGNAUHAAYUGYGARUGYCAYGCNCGNCUNGARUCN
and
ACNCGNGARAAYUGYCAYUGYCAYGCNCGNACNCUNGARUCNUCN
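The nucleotides that must be identical between both DNAs are the first two positions of
every codon that specifies an amino acid shared by the two sequences (the letters of
.R.NC.CHAR-LES-), because they code for an identical amino acid. Other positions that
happen to coincide are not singled out, because they are identical for other reasons.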
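The exercise can be sketched in Python as follows. This is a minimal illustration under the simplification stated in the text (only CGN for Arg, CUN for Leu and UCN for Ser), written with U throughout as in mRNA. The gapped alignment is supplied by hand, and `DEGENERATE`, `reverse_translate` and `shared_codon_mask` are hypothetical names introduced for the example.

```python
# Degenerate (IUPAC) codons for the amino acids used in the examples.
# Assumption taken from the text: Arg, Leu and Ser are restricted to
# CGN, CUN and UCN, ignoring their other codons (AGR, UUR, AGY).
DEGENERATE = {
    "P": "CCN", "R": "CGN", "I": "AUH", "N": "AAY", "C": "UGY", "E": "GAR",
    "H": "CAY", "A": "GCN", "L": "CUN", "S": "UCN", "T": "ACN",
}
UNAMBIGUOUS = set("ACGU")


def reverse_translate(protein: str) -> str:
    """Write a protein as a degenerate mRNA sequence."""
    return "".join(DEGENERATE[aa] for aa in protein)


def shared_codon_mask(aligned_a: str, aligned_b: str) -> str:
    """For each aligned column, show the fixed bases of a shared amino acid.

    Columns with a gap ('-') or with two different amino acids give '...'.
    """
    parts = []
    for x, y in zip(aligned_a, aligned_b):
        if x == y and x != "-":
            codon = DEGENERATE[x]
            parts.append("".join(b if b in UNAMBIGUOUS else "." for b in codon))
        else:
            parts.append("...")
    return "".join(parts)


print(reverse_translate("PRINCECHARLES"))
# CCNCGNAUHAAYUGYGARUGYCAYGCNCGNCUNGARUCN
print(reverse_translate("TRENCHCHARTLESS"))
# ACNCGNGARAAYUGYCAYUGYCAYGCNCGNACNCUNGARUCNUCN
print(shared_codon_mask("PRINCECHAR-LES-", "TRENCHCHARTLESS"))
# ...CG....AA.UG....UG.CA.GC.CG....CU.GA.UC....
```

The mask has one triplet per alignment column, so the ten shared amino acids account for twenty constrained nucleotides.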
These examples should answer the question.
Now for just a final point: why did I choose "readable" amino acid sequences for this
example? It can be a fun game to play, but the real reason is that, in all this
comparison work, sequences are information. Bioinformatics is a discipline that
struggles to "make sense" of amino acid sequences by defining what are called protein
domains, that is, segments of the polypeptides to which a function can be assigned. So,
in the first pair of examples, PRINCE and PRINCESS are obviously related, not only in
structure but also semantically. Indeed, two different proteins can share a related
domain and be otherwise unrelated. If one has reason to decide that the common A between
CHARLES and DIANA is a remnant of shared function, then its DNA sequence counterpart is
said to be conserved phylogenetically (at least the constant GC in the codons that
specify alanine). Of course this decision is not arbitrary and relies on many
comparisons and on consensus evolutionary interpretations.