Wednesday, August 12, 2009

Python simulation of Needleman-Wunsch 5

To finish up with alignments, the code I posted previously can be called with a command-line argument that is the name of the target file containing two FASTA-formatted sequences.
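As a reminder of what that input step involves, here is a minimal sketch of reading two FASTA records from a file named on the command line. It is not the code from the earlier posts: the function names `read_fasta` and `load_pair` are made up for illustration, and the sketch assumes every record starts with a `>` title line.

```python
import sys

def read_fasta(path):
    """Return a list of (title, sequence) pairs from a FASTA file."""
    records = []
    title, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            if line.startswith(">"):
                # a new title line closes the previous record, if any
                if title is not None:
                    records.append((title, "".join(chunks)))
                title, chunks = line[1:], []
            elif title is not None:
                chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records

def load_pair(path):
    """Return the two sequences in the file (hypothetical helper)."""
    records = read_fasta(path)
    if len(records) != 2:
        raise ValueError("expected 2 sequences, found %d" % len(records))
    return records[0][1], records[1][1]

# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    seq1, seq2 = load_pair(sys.argv[1])
    print(len(seq1), len(seq2))
```

The two strings returned by `load_pair` are what would then be handed to the alignment routine.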

Here is one of my favorite comparisons, one I use often in class. We compare the protein product of the yeast CDC28 gene with that of the human CDK2 gene. I always mention that the human gene complements the yeast cell cycle defect, and comment that "the cell cycle was obviously invented a long time ago."

Here is the output:


MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG
M  E  N++++EK+GEGTYGVVYKA + +   G+ VVALKKIRL++E EG
M--E--NFQKVEKIGEGTYGVVYKARN-K-LTGE-VVALKKIRLDTETEG

VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME
VPSTAIREISLLKEL   NIV+L D++H++ +KLYLVFEFL  DLK++M+
VPSTAIREISLLKELNHPNIVKLLDVIHTE-NKLYLVFEFLHQDLKKFMD

GIP-KDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGN
       PL   ++K ++ QL +G+A+CHSHR+LHRDLKPQNLLIN +G
ASALTGIPL--PLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGA

LKLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIG
+KL DFGLARAFGVP+R YTHE+VTLWYRAPE+LLG K YST VD WS+G
IKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYYSTAVDIWSLG

CIFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSF
CIFAEM  R+ +F GDSEIDQ+F+IFR LGTP+E +WP +  +PD+KPSF
CIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF

PQWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQE-S
P+W R+D+S+VVP LD  G  LL ++L YDP  RISA+ A  HP+FQ+ +
PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVT
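
The output above is laid out in blocks of 50 columns, with a middle line between the two gapped sequences. A rough sketch of that layout follows, under a few assumptions: the function names are hypothetical, the two inputs are already-aligned strings of equal length, and the middle line shows identities only. The `+` marks for similar residues are left out because they depend on the scoring matrix, which is not reproduced here.

```python
def match_line(top, bottom):
    """Middle line: the residue letter where both rows agree, else a space."""
    return "".join(a if a == b and a != "-" else " "
                   for a, b in zip(top, bottom))

def format_blocks(top, bottom, width=50):
    """Lay out two aligned strings in three-line blocks of `width` columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    blocks = []
    for i in range(0, len(top), width):
        t, b = top[i:i + width], bottom[i:i + width]
        blocks.append("\n".join([t, match_line(t, b), b]))
    return "\n\n".join(blocks)

def percent_identity(top, bottom):
    """Identical columns as a percentage of all alignment columns."""
    if not top or len(top) != len(bottom):
        raise ValueError("need two non-empty aligned strings of equal length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    return 100.0 * identical / len(top)
```

Calling `print(format_blocks(top, bottom))` on the two gapped strings would give the same three-line layout, and `percent_identity(top, bottom)` puts a single number on how well the two kinases match.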