I was wondering how I could set up a workflow that iterates through a list of FASTA files alongside a list of pairs (PE Illumina reads) and runs a tool (bowtie2 for alignment) so that the first FASTA file is used with the first pair of reads, the second FASTA file with the second pair of reads, and so on. I’ve attached some screenshots below with my current history structure for reference.
I couldn’t find any way to loop through collections (ideally using a “for”-style loop). The closest I could find was the “Extract Dataset” tool, which requires you to enter an index, so this would become a pretty manual process if I were to do it for all 30+ items spread over 2 lists.
I just want to make sure there isn't already a way to do this within Galaxy before I go and start programming a custom tool for it.
Do you really need to map each PE pair to a specific reference? Are you doing this to check for contamination?
Is there a problem if you merge your FASTAs and run bowtie2 on the merged reference?
What about running [all the PE pairs] x FASTAs [kept separate, as a collection]?
Thanks for your reply. The reason I’m doing this type of read mapping on a per-sample basis (and not combining all my FASTA files/contigs into one file) is that it is part of the bacterial genome binning process and MAG assembly pipeline we are following.
If I merge my FASTAs and create a Bowtie2 index/reference from that, I run the risk of reads aligning to contigs present from another sample, potentially altering coverage which is used in the binning steps.
Thanks @David, I forgot about the API. I’ll consider it, although I think having something on the graphical front-end for users would be ideal. Shouldn’t be too hard to write a simple script to take care of it and wrap it as a Galaxy tool.
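For anyone landing here later, here is a minimal sketch of what such a script might look like, assuming the two lists are already in matching order. The helper names (`pair_in_lockstep`, `build_commands`), the sample file names, and the output naming scheme are all hypothetical; only the basic `bowtie2-build` and `bowtie2 -x/-1/-2/-S` invocations come from the Bowtie2 command line. It only prints the commands (a dry run) rather than executing them.

```python
import shlex
from pathlib import Path


def pair_in_lockstep(fastas, read_pairs):
    """Pair the i-th FASTA with the i-th (forward, reverse) read pair."""
    if len(fastas) != len(read_pairs):
        raise ValueError(
            f"{len(fastas)} FASTA files but {len(read_pairs)} read pairs"
        )
    return [(fasta, r1, r2) for fasta, (r1, r2) in zip(fastas, read_pairs)]


def build_commands(fasta, r1, r2, out_dir="."):
    """Return the index-building and alignment commands for one sample.

    Output names are derived from the FASTA file name (an assumption made
    for this sketch, not a Galaxy or Bowtie2 convention).
    """
    sample = Path(fasta).stem
    index_prefix = str(Path(out_dir) / f"{sample}_index")
    sam_path = str(Path(out_dir) / f"{sample}.sam")
    return [
        # Build a per-sample index so reads only see their own contigs.
        ["bowtie2-build", str(fasta), index_prefix],
        # Align the paired-end reads against that sample's index.
        ["bowtie2", "-x", index_prefix, "-1", str(r1), "-2", str(r2),
         "-S", sam_path],
    ]


if __name__ == "__main__":
    # Hypothetical file names; both lists must be in the same sample order.
    fastas = ["sampleA.fasta", "sampleB.fasta"]
    read_pairs = [
        ("sampleA_R1.fastq", "sampleA_R2.fastq"),
        ("sampleB_R1.fastq", "sampleB_R2.fastq"),
    ]
    for fasta, r1, r2 in pair_in_lockstep(fastas, read_pairs):
        for cmd in build_commands(fasta, r1, r2):
            # Dry run: print instead of executing.
            print(shlex.join(cmd))
```

To actually run the alignments, the `print` call could be swapped for `subprocess.run(cmd, check=True)`. The length check is there because a silent mismatch between the two lists would map reads against the wrong sample's contigs, which is exactly what the per-sample setup is trying to avoid.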