ARS1 BLAT Search
 

BLAT Search Genome

Genome (Search ALL) | Assembly | Query type | Sort output | Output type
Paste in a query sequence to find its location in the genome. Multiple sequences may be searched if separated by lines starting with '>' followed by the sequence name.

File Upload: Rather than pasting a sequence, you can choose to upload a text file containing the sequence.

Only DNA sequences of 25,000 or fewer bases and protein or translated sequences of 10,000 or fewer letters will be processed. Up to 25 sequences can be submitted at the same time. The total limit for multiple-sequence submissions is 50,000 bases or 25,000 letters.
A valid example is GTCCTCGGAACCAGGACCTCGGCGTGGCCTAGCG (human SOD1).
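
The format and size limits above can be checked before submitting a query. The following is a minimal sketch, not the server's own validation code: the function names parse_query and check_limits and the is_protein flag are hypothetical, and only the numeric limits come from the text above.

```python
# Limits quoted on this page; everything else in this sketch is illustrative.
DNA_MAX_PER_SEQ = 25_000
PROTEIN_MAX_PER_SEQ = 10_000
MAX_SEQUENCES = 25
DNA_MAX_TOTAL = 50_000
PROTEIN_MAX_TOTAL = 25_000


def parse_query(text):
    """Split pasted text into (name, sequence) pairs at lines starting with '>'."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None or chunks:
                records.append((name or "query", "".join(chunks)))
            name = line[1:].strip() or "unnamed"
            chunks = []
        else:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "query", "".join(chunks)))
    return records


def check_limits(records, is_protein=False):
    """Return a list of human-readable problems; an empty list means OK."""
    per_seq = PROTEIN_MAX_PER_SEQ if is_protein else DNA_MAX_PER_SEQ
    total = PROTEIN_MAX_TOTAL if is_protein else DNA_MAX_TOTAL
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    for name, seq in records:
        if len(seq) > per_seq:
            problems.append(f"{name}: {len(seq)} letters exceeds the limit of {per_seq}")
    # The total limit applies to multiple-sequence submissions.
    if len(records) > 1 and sum(len(seq) for _, seq in records) > total:
        problems.append(f"combined length exceeds the limit of {total}")
    return problems


if __name__ == "__main__":
    example = ">human_SOD1\nGTCCTCGGAACCAGGACCTCGGCGTGGCCTAGCG\n"
    print(check_limits(parse_query(example)))
```

Set is_protein=True for protein or translated queries so that the 10,000 and 25,000 letter limits apply.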

The Search ALL checkbox above the Genome drop-down list lets you search the default assemblies of all of our organisms, along with the BLAT servers of any attached hubs. The results show which organisms have the highest homology with your query sequence. They are ordered so that the organism whose best alignment has the most hits appears at the top, along with the best region found. Search ALL makes quick, approximate alignments based only on raw hits, which are perfectly matching short subsequences of a fixed size: 11 for DNA and 4 for protein. The entire alignment, including mismatches and gaps, must score 20 or higher in order to appear in the BLAT output. A query with too few hits will often yield no BLAT results. Click the Assembly column link on the results page to see the full BLAT output for that organism.
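
To make the notion of a raw hit concrete, here is a toy sketch that counts perfectly matching fixed-size subsequences and orders targets by that count. It is an illustration only, not the server's implementation: count_raw_hits and rank_by_hits are hypothetical names, and the sketch counts hits over a whole target string rather than within the best alignment region.

```python
def count_raw_hits(query, target, k=11):
    """Count query k-mers that occur exactly somewhere in the target.

    Use k=11 for DNA and k=4 for protein, the sizes quoted above.
    """
    query, target = query.upper(), target.upper()
    target_kmers = {target[i:i + k] for i in range(len(target) - k + 1)}
    return sum(
        1
        for i in range(len(query) - k + 1)
        if query[i:i + k] in target_kmers
    )


def rank_by_hits(query, targets, k=11):
    """Order (name, hit_count) pairs so the target with the most hits comes first."""
    counts = [(name, count_raw_hits(query, seq, k)) for name, seq in targets.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = "ACGTACGTACGTA"
    targets = {"org_a": "T" * 30, "org_b": query + "GG"}
    print(rank_by_hits(query, targets))
```

A query shorter than k cannot produce any hit in this sketch, which is one way a search can come back empty.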

For locating PCR primers, use In-Silico PCR for best results instead of BLAT.


About BLAT
 

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 20 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein BLAT on land vertebrates.

BLAT is not BLAST. DNA BLAT works by keeping an index of the entire genome in memory. The index consists of all overlapping 11-mers stepping by 5 except for those heavily involved in repeats. The index takes up about 2 gigabytes of RAM. RAM can be further reduced to less than 1 GB by increasing step size to 11. The genome itself is not kept in memory, allowing BLAT to deliver high performance on a reasonably priced Linux box. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes.
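
The indexing scheme described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not BLAT's actual code: build_index and find_hits are hypothetical names, the max_occurrences cutoff is a stand-in for the exclusion of repeat-heavy 11-mers, and the toy keeps the whole genome string in memory, which the real index avoids.

```python
from collections import defaultdict


def build_index(genome, k=11, step=5, max_occurrences=None):
    """Map each k-mer starting at positions 0, step, 2*step, ... to its positions.

    If max_occurrences is set, k-mers seen more often than that are dropped,
    a crude stand-in for skipping k-mers heavily involved in repeats.
    """
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, step):
        index[genome[pos:pos + k]].append(pos)
    if max_occurrences is not None:
        return {kmer: p for kmer, p in index.items() if len(p) <= max_occurrences}
    return dict(index)


def find_hits(index, query, k=11):
    """Look up every overlapping query k-mer; return (query_pos, genome_pos) pairs."""
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            hits.append((q, t))
    return hits


if __name__ == "__main__":
    genome = "TTGACCATGCAAGTCCGATAGGCTTACGGATCCAGTTAGCA"
    index = build_index(genome)
    # Hits that share the same (genome_pos - query_pos) offset point to one
    # candidate region, which would then be loaded for detailed alignment.
    for q, t in find_hits(index, genome[8:28]):
        print(q, t, t - q)
```

In this sketch the number of indexed positions is roughly the genome length divided by the step, which is why a larger step gives a smaller index.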

BLAT was written by Jim Kent. Like most of Jim's software, interactive use on this web server is free to all. Sources and executables to run batch jobs on your own server are available free for academic, personal, and non-profit purposes. Non-exclusive commercial licenses are also available. See the Kent Informatics website for details.

For more information on the graphical version of BLAT, click the Help button on the top menu bar or see the Genome Browser FAQ.