Unclear Bowtie2 unaligned read filenames

Hi all,

When I use the "Write unaligned reads (in fastq format) to separate file(s)" option in the Bowtie2 tool (which triggers the --un-conc parameter) on a paired-end library, I get two unaligned read output files: unaligned reads L and unaligned reads R. I assume the two files distinguish the unaligned reads that came from the R1 and R2 input files, but can anyone clarify which file (L or R) corresponds to which input file (R1 or R2)?

Thanks!



Hi @Jroels

Unmapped L reads are from the R1 input (first input on the tool form)
Unmapped R reads are from the R2 input (second input on the tool form)

The read names can confirm this.
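For anyone who wants to check the read names outside the Galaxy peek view, here is a minimal Python sketch. It assumes the mate is marked in one of two common ways: an old-style `/1` or `/2` suffix on the read name, or an Illumina CASAVA 1.8+ style comment such as `2:N:0:ATCACG` after the first space. The function name `mate_number` is made up for illustration, and reads named in other ways will simply return `None`.

```python
import re

# Old-style names end in /1 or /2; Illumina CASAVA 1.8+ names carry the
# mate number as the first field after a space, e.g. "@id 2:N:0:ATCACG".
_SUFFIX = re.compile(r"/([12])$")
_CASAVA = re.compile(r"^([12]):[YN]:\d+")


def mate_number(header):
    """Return 1 or 2 for a FASTQ header line, or None if it is not marked."""
    name, _, comment = header.strip().lstrip("@").partition(" ")
    match = _SUFFIX.search(name) or _CASAVA.match(comment)
    return int(match.group(1)) if match else None
```

Applied to the first header line of each unmapped file, this should give 1 for the L file and 2 for the R file if the names carry one of these markers.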

Thanks!

Hi,

So if I want to remap the unmapped reads against a different reference genome, should I use the unmapped R reads as R2 in the new Bowtie2 run (in place of FASTQ file 2) and the unmapped L reads as R1 (in place of FASTQ file 1)?

Sorry for the basic question, but Galaxy gives me an error and I can’t work out why.

Thanks

Hi @bioiz

Yes, that sounds right. Which tool, and what kind of error? Have you tried a rerun? If not, do that first.

You can share any persistent error messages and the history back here for more help. See: Troubleshooting errors

Hi @jennaj

Thanks for the answer.
I was using Bowtie2 and the Galaxy error was “Remote job server indicated a problem running or monitoring this job”. Today I ran the analysis again and the job completed.
Checking the analysis parameters, I noticed that in the first alignment I had swapped R1 and R2: I selected the unmapped R reads as R1 and the L reads as R2. So today I reran the analysis with the correct files, and the output is the same, even though I selected --fr as the read orientation parameter. Could you help me understand why? Is the orientation parameter not that strict?

Hi @bioiz

Glad the job didn’t fail again!

How to organize read pairs: R1 == forward and goes first/left. R2 == reverse and goes second/right. This is true for all bioinformatics tools I can think of.
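Outside Galaxy, the same ordering maps onto the bowtie2 command line as `-1` (mate 1) and `-2` (mate 2). The Python sketch below only builds and prints such a command, it does not run it. The index and file names are placeholders, and the helper `bowtie2_paired_command` is made up for illustration.

```python
import shlex


def bowtie2_paired_command(index, r1, r2, sam_out, unaligned_path):
    """Build (but do not run) a paired-end bowtie2 command line."""
    return [
        "bowtie2",
        "-x", index,                  # basename of the reference index
        "-1", r1,                     # mate 1 / R1 / forward / "L"
        "-2", r2,                     # mate 2 / R2 / reverse / "R"
        "--fr",                       # expected mate orientation (bowtie2 default)
        "--un-conc", unaligned_path,  # pairs that did not align concordantly
        "-S", sam_out,                # SAM output file
    ]


# Placeholder names: remap the unmapped L/R files against a second index.
cmd = bowtie2_paired_command(
    "second_genome", "unaligned_L.fastq", "unaligned_R.fastq",
    "remap.sam", "still_unaligned.fastq",
)
print(shlex.join(cmd))
```

On the command line, bowtie2 adds `.1` and `.2` to the `--un-conc` file name to mark mate 1 and mate 2, which is what the Galaxy tool presents as the L and R outputs.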

Click on the files and look at the peek view. The sequence identifiers will usually have a 1 or a 2 in the names. Tools read that content, and if they don’t find what is expected, they will probably fail.

That’s good – a failure is a tool telling you the data couldn’t be understood and something needs to be fixed.
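If you would rather check a pair of files yourself before running a tool, a rough Python sketch follows. It assumes plain 4-line FASTQ records and only checks that both files list the same read names in the same order. It does not tell you which file is mate 1, since the 1/2 tag is stripped before comparing. The function names are made up for illustration.

```python
import io
from itertools import zip_longest


def fastq_names(handle):
    """Yield the read name of each 4-line FASTQ record, without the mate tag."""
    for i, line in enumerate(handle):
        if i % 4 != 0 or not line.strip():
            continue  # only header lines carry the read name
        name = line.strip().lstrip("@").split()[0]
        # Drop an old-style /1 or /2 suffix so both mates compare equal.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(r1_handle, r2_handle):
    """Return the first (index, name1, name2) that disagrees, or None."""
    pairs = zip_longest(fastq_names(r1_handle), fastq_names(r2_handle))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


# Tiny in-memory example; real use would pass open file handles instead.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@b/2\nTTTT\n+\nIIII\n")
print(first_mismatch(r1, r2))
```

A result of `None` means the two files are in sync. Anything else points at the first record where the pairing breaks, including the case where one file is shorter than the other.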

To avoid this kind of problem, pair up samples into a collection. A collection can hold one or many pairs of reads, and it means you only need to check these details once, right at the start. Less clicking 🙂

This specific tutorial covers the details: NGS data logistics