McClean:Searching a Sequence for All Database Primers
Because there are many different database and sequence file formats, as well as proprietary sequence editors, searching a sequence against our entire primer collection is not straightforward. However, it can be done and is well worth the effort. A better copy of this write-up is in Michael Patel's Z: drive folder, in the subfolder called 'Oligo Search Files.'
Convert List of Oligos in Database to FASTA:
- Make sure Ruby is installed on your computer. On Windows, the installer from http://rubyinstaller.org/ is the easiest way. When installing, make sure all options are checked.
- Convert the oMM sheet from Access into an Excel spreadsheet by using the export tool within Access.
- Open this newly created file in Excel and ‘save as’ a .csv file. (This really shouldn’t be multiple steps, so if you figure out a quicker way, change it.) In the .csv file, delete the superfluous columns (every column other than the oMM title and the oligo sequence). Only two columns should be left. I call the resulting file oMM.csv.
- Open Command Prompt and change directory (the ‘cd’ command) to the folder where you saved oMM.csv. Alternatively, create a new working folder somewhere and copy oMM.csv into it; I find the latter to be better. Just make sure oMM.csv is in the folder where you choose to work.
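- Enter the following command and press Enter: ruby -ne 'puts ">" + $_.split(",").first(2).join("\n")' oMM.csv > oMM.fasta
  Type the quotation marks inside the command as plain straight quotes. The command writes each oligo as a header line (">" followed by the oMM title) with its sequence on the next line. If you chose different file names, adjust accordingly.
- oMM.fasta should now be in the same folder as the .csv file. Open the new file and check the formatting: each oligo should occupy two lines, a header line beginning with ">" followed by its sequence. Correct any mistakes that may appear.
- I have an already prepared oMM.fasta in my Z:\ drive folder (Z:\Michael Patel\Oligo Search Files\). It covers oMM001-oMM653.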
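The conversion and the formatting check above can also be scripted. The following is a minimal sketch, not part of the original workflow: it assumes oMM.csv has exactly two comma-separated columns (oMM title, then sequence) with no quoted fields or embedded commas, and the method name csv_to_fasta is made up for illustration. Rows that look wrong (for example a leftover header row) are reported instead of being written to the FASTA file.

```ruby
# Convert two-column CSV text (oligo name, sequence) into FASTA text.
# Returns [fasta_text, problems]; problems describes the rows that were skipped.
# Assumption: fields contain no embedded commas or quotes.
def csv_to_fasta(csv_text)
  records = []
  problems = []
  csv_text.each_line.with_index(1) do |line, number|
    name, seq = line.chomp.split(",").first(2)
    name = name.to_s.strip
    seq = seq.to_s.gsub(/\s+/, "").upcase   # drop stray spaces, normalise case
    next if name.empty? && seq.empty?        # ignore blank lines
    if name.empty? || seq.empty?
      problems << "line #{number}: missing name or sequence"
    elsif seq !~ /\A[ACGTRYSWKMBDHVN]+\z/    # IUPAC nucleotide codes only
      problems << "line #{number}: unexpected characters in sequence for #{name}"
    else
      records << ">#{name}\n#{seq}"
    end
  end
  [records.join("\n") + "\n", problems]
end

# Run as a script from the folder that holds oMM.csv.
if __FILE__ == $PROGRAM_NAME && File.exist?("oMM.csv")
  fasta, problems = csv_to_fasta(File.read("oMM.csv"))
  File.write("oMM.fasta", fasta)
  problems.each { |problem| warn problem }
end
```

Saved under a name of your choice and run with ruby from the working folder, this writes oMM.fasta and prints any skipped rows, so you know which entries to fix by hand.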
Perform Search with UGENE:
- Install UGENE if you haven’t already: http://ugene.unipro.ru/download.html. If you’re familiar with VectorNTI, the layout is somewhat similar.
- On the main toolbar, go to ‘Tools > Workflow Designer.’ Once there, on the main toolbar go to ‘Actions > Load Schema.’ Choose the file ‘complete_oligo_search.uwl’, which I have included in my above-mentioned Z:\ drive folder. You can also double-click the file if it opens with UGENE by default on your computer. Both methods bring you to the same screen.
- In the ‘Read Sequence’ box, click on the blue lettering that says ‘oMM159.gb’. The panel on the right will highlight ‘Input File(s).’ Change this to your desired reference file (the GenBank, or .gb, file format is preferred but not absolutely necessary).
- In the ‘Smith-Waterman Search’ box, make sure that the path to oMM.fasta is correct (the first blue highlighted words), that the percent similarity is 100%, and that you are searching both strands. You can rename the ‘output the regions found…’ part to whatever you want.
- Click on the ‘Write Sequence’ box, click on the last set of blue lettering, and set the ‘output file’ field to whatever path and filename you choose. Here I have it set as ‘C:/Scripts/query.gb’. Make sure the output file format remains GenBank. Right below these drop-down boxes you should see another drop-down box called ‘Set of Annotations.’ Make sure all boxes are checked.
- Click on the orange Play button in the toolbar (or CTRL + R) to run the search. The search will take a few minutes or so depending on your computer.
- When finished, open up the output file. You should see the positive search results highlighted. If you hover your mouse over them, the oMM title will be in the dialog box that comes up.
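As a quick cross-check of the UGENE results, an exact-match search on both strands can be sketched in Ruby. This is not UGENE's Smith-Waterman implementation, only a plain substring search that roughly corresponds to the 100% similarity setting. The method names are hypothetical, and the reference is assumed to be supplied as a bare sequence string (pulling the sequence out of a .gb file is not shown).

```ruby
# Reverse complement of a DNA string (assumes only A, C, G, T and N).
def reverse_complement(seq)
  seq.upcase.reverse.tr("ACGTN", "TGCAN")
end

# Parse FASTA text into a { name => sequence } hash.
def parse_fasta(text)
  oligos = {}
  name = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      name = line[1..-1]
      oligos[name] = String.new
    elsif name
      oligos[name] << line.upcase
    end
  end
  oligos
end

# Find every exact occurrence of each oligo on either strand of the reference.
# Returns [name, strand, start] entries; start is 1-based on the top strand.
def find_oligos(reference, oligos)
  reference = reference.upcase
  hits = []
  oligos.each do |name, seq|
    next if seq.empty?
    { "+" => seq, "-" => reverse_complement(seq) }.each do |strand, query|
      pos = reference.index(query)
      while pos
        hits << [name, strand, pos + 1]
        pos = reference.index(query, pos + 1)   # continue after this hit
      end
    end
  end
  hits
end
```

For example, find_oligos(reference_sequence, parse_fasta(File.read("oMM.fasta"))) returns one entry per match, which can be compared against the annotations UGENE writes to the output file. Note that a palindromic oligo is reported once per strand.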
Please feel free to post comments, questions, or improvements to this protocol. Happy to have your input!
Please sign your name to your note by adding '''*~~~~''': to the beginning of your tip.