Batch concatenate reads

We have installed a custom Galaxy instance and would like to analyze 20 samples. However, each of these samples is spread over four lanes, and we would like to merge the lanes of each sample before further analysis.

I am aware of several similar questions on this topic, and they all point to the Text Manipulation: Concatenate datasets tail-to-head tool. But doing this for a large number of samples is quite tedious. Is there a better way?

I am quite proficient on the command line, but I would really like the wet-lab people to be able to do this themselves.
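For comparison, the per-sample merge that the Galaxy tools perform can be sketched on the command line side. This is a minimal Python sketch, assuming bcl2fastq-style file names such as `S01_S1_L001_R1_001.fastq.gz`; the naming pattern and the helpers `group_lanes` and `concatenate_lanes` are hypothetical, not part of Galaxy. Files are grouped by sample and read, sorted by lane, and concatenated tail-to-head. Because the gzip format allows several members in one file, compressed lanes can be joined byte-wise without decompressing, which most FASTQ tools read transparently.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) bcl2fastq-style names, e.g. "S01_S1_L001_R1_001.fastq.gz".
LANE_RE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_(?P<read>R[12])_\d{3}\.fastq(?:\.gz)?$"
)


def group_lanes(paths):
    """Map (sample, read) -> list of lane files sorted by lane number."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        m = LANE_RE.match(p.name)
        if m is None:
            continue  # skip files that do not follow the assumed naming scheme
        groups[(m["sample"], m["read"])].append((int(m["lane"]), p))
    return {key: [p for _, p in sorted(files)] for key, files in groups.items()}


def concatenate_lanes(paths, outdir):
    """Concatenate the lanes of each (sample, read) tail-to-head into outdir.

    Assumes all lanes of a group are either all gzipped or all plain text;
    gzip members can then simply be appended byte for byte.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for (sample, read), files in sorted(group_lanes(paths).items()):
        suffix = ".fastq.gz" if files[0].name.endswith(".gz") else ".fastq"
        out = outdir / f"{sample}_{read}{suffix}"
        with out.open("wb") as dst:
            for f in files:  # lane order L001, L002, ... for both R1 and R2
                with f.open("rb") as src:
                    shutil.copyfileobj(src, dst)
        written.append(out)
    return written
```

Sorting by lane number for both R1 and R2 is the important design choice here: mates are paired by position, so the two reads of a sample must be concatenated in the same lane order.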

Hi @BasH,
one approach is to build a dataset list using the Operation on Multiple Datasets option.

Then you can use the Collapse collections into a single dataset tool to combine all of them into a single file.

Regards.

Hi @gallardoalba,

Thanks for your answer. Based on it, this is what I came up with (each dataset contains the R1 reads from the 8 lanes).

I can then merge them with the Collapse tool.

After that, I can zip R1 and R2 back into paired-end datasets and do all the usual steps. Is that what you meant? This is still quite tedious with many samples, but it is already better than before.
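One thing worth checking before the Zip step: when R1 and R2 are collapsed separately, the mates only stay paired if both were collapsed in the same lane order. A minimal Python sketch of such a check, assuming plain 4-line FASTQ records (optionally gzipped) whose mate names differ at most by a `/1` or `/2` suffix or by the comment after the first space; `read_ids` and `first_mismatch` are hypothetical helper names, not Galaxy tools.

```python
import gzip
from itertools import zip_longest
from pathlib import Path


def _open(path):
    """Open a FASTQ file as text, decompressing it if it ends in .gz."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else path.open("rt")


def read_ids(path):
    """Yield the read name of each 4-line record, without /1, /2 or comment."""
    with _open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 != 0:
                continue  # only the header line of each record is needed
            fields = line[1:].split()  # drop the leading '@', split off the comment
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return the 0-based record index where the mates disagree, or None.

    A length difference also counts as a mismatch, because zip_longest pads
    the shorter file with None.
    """
    for idx, (a, b) in enumerate(zip_longest(read_ids(r1_path), read_ids(r2_path))):
        if a != b:
            return idx
    return None
```

A `None` result means every record in R1 has its mate at the same position in R2, so the two collapsed files can be zipped into a paired dataset.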

Alternatively, would it be better to skip merging at this step, map each lane first and then merge the BAM files per sample? Or would that cause different problems further down the line?

Ultimately, we want a VCF file of SNPs in which each column is a single sample. It should not matter where the merge step happens, but I would like to find the best way 🙂
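If the lanes are mapped separately, what usually decides the VCF columns is the sample (`SM`) field of each read group rather than the file the reads came from: callers such as FreeBayes and GATK HaplotypeCaller group reads by `SM`. A minimal Python sketch, again assuming hypothetical bcl2fastq-style names, that derives one read group per lane with a unique `ID` and a shared `SM`, written in the `@RG\tID:...\tSM:...` form that `bwa mem -R` accepts; `read_group` and `samples_to_read_groups` are hypothetical helpers.

```python
import re

# Assumed (hypothetical) bcl2fastq-style names, e.g. "S01_S1_L001_R1_001.fastq.gz".
LANE_RE = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d{3})_R[12]_\d{3}\.fastq(?:\.gz)?$")


def read_group(fastq_name: str, platform: str = "ILLUMINA") -> str:
    """Build a bwa-style @RG header line for one lane file.

    ID is unique per sample and lane; SM is shared by all lanes of a sample.
    Tabs are written as a literal backslash-t, the form `bwa mem -R` expects.
    """
    m = LANE_RE.match(fastq_name)
    if m is None:
        raise ValueError(f"unexpected file name: {fastq_name}")
    sample, lane = m["sample"], m["lane"]
    return f"@RG\\tID:{sample}.L{lane}\\tSM:{sample}\\tPL:{platform}"


def samples_to_read_groups(names):
    """Group read-group IDs by SM; each SM value becomes one VCF column."""
    groups = {}
    for name in names:
        rg = read_group(name)
        # Split on the literal backslash-t and parse the TAG:value fields.
        fields = dict(f.split(":", 1) for f in rg.split("\\t")[1:])
        groups.setdefault(fields["SM"], []).append(fields["ID"])
    return groups
```

With lane-level read groups like these, merging the per-lane BAM files of a sample keeps the lane information while still yielding a single sample column.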

Best regards,

Bas

Hi @BasH,
Yes. Both alternatives produce similar outputs; I created a test history to check.

Regards

Thanks a lot for thinking along!

Best regards,

Bas