Saturday, May 10, 2008

Standalone BLAST

From the top of the main BLAST page click Help. This is the link. Download the tarred files labeled macosx-universal. It's a big download (44.7 MB, 98.1 MB after it's unpacked). I put the directory in /bin in my home directory. There is documentation for all the programs, and more information on the NCBI web site. For example, here is info on bl2seq, to compare two sequences using BLAST.

To show how this works, I wrote a script that loads my favorite DNA sequence from my desktop and mutagenizes it. This is the function mutagenize():
import os, random, subprocess
R = random.Random()

def mutagenize(seq, percent=10):
    L = list(seq)
    D = { 'A':'CGT', 'C':'AGT', 'G':'ACT', 'T':'ACG' }
    N = int(percent / 100.0 * len(seq))
    X = len(seq)
    for i in range(N):
        j = R.choice(range(X))
        nt = L[j]
        L[j] = R.choice(D[nt])
    return ''.join(L)

This is the code to call the bl2seq binary:
def blast2(infile1,infile2):
    program_directory = '~/bin/blast/programs/'
    exe = program_directory + 'blast-2.2.18/bin/bl2seq'
    cmd = exe
    cmd += ' -i ' + infile1
    cmd += ' -j ' + infile2
    cmd += ' -p ' + 'blastn'
    cmd += ' -o ' + '~/Desktop/blastresults.txt'

    p = subprocess.Popen(cmd, shell=True)
    sts = os.waitpid(p.pid, 0)
    return sts[1] # exit status

Of course, one could use the Biopython module to help with this. This is part of the file:
 Score = 1572 bits (793), Expect = 0.0
Identities = 1138/1253 (90%)
Strand = Plus / Plus


Query: 1 atgacccttttagcgctcggtattaaccataaaacggcacctgtatcgctgcgagaacgc 60
||||||||||||||||| ||| ||||||||||||||||||||| ||||||||||||||||
Sbjct: 1 atgacccttttagcgctaggtgttaaccataaaacggcacctgaatcgctgcgagaacgc 60


Query: 61 gtaacgttttcgccggacacgcttgatcaggcgctggacagcctgcttgcgcagccaatg 120
|||| |||||||| ||||| |||| | || | ||||||| |||||||||||||||||||
Sbjct: 61 gtaatgttttcgcgggacaggcttaaacaagtgctggacggcctgcttgcgcagccaatt 120

The results are saved to a file on my desktop. It is text, apparently there is no option as yet to save the results as XML.