renaming reads with barcode using cutadapt

Rickpatbrown · May 25, 2021, 9:09pm

Hi there,

I’m trying to cut barcodes out of my reads and append them to the read name. The cutadapt documentation clearly shows how to do this: use -u together with --rename and indicate barcode = {cut_prefix} in the new name (suffix, in my case). See the Cutadapt Documentation.
Unfortunately, this functionality seems to be missing from the Galaxy wrapper. It lets you add a prefix or suffix to the read name, but only as a single fixed text value. Does anyone know of a workaround, or am I doing something wrong?
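
For readers who have not used these options, here is a minimal Python sketch of what cutting a fixed-length suffix and moving it into the read name amounts to. It is not cutadapt's own code; the function name move_suffix_to_name and the "_" separator are made up for illustration, and reads shorter than the barcode are simply left untouched, which may differ from what cutadapt does.

```python
def move_suffix_to_name(header, seq, qual, n=9):
    """Cut the last n bases off a read and append them to the read ID.

    header is the FASTQ header without the leading '@'.
    Hypothetical helper for illustration only, not part of cutadapt.
    """
    if n <= 0 or len(seq) < n:
        # Assumption: leave reads that are too short unchanged.
        return header, seq, qual
    # The ID is everything before the first space; the rest is the comment.
    read_id, _, comment = header.partition(" ")
    barcode = seq[-n:]
    new_header = f"{read_id}_{barcode}"
    if comment:
        new_header += " " + comment
    # Trim the sequence and its quality string by the same amount.
    return new_header, seq[:-n], qual[:-n]


if __name__ == "__main__":
    print(move_suffix_to_name("read1 1:N:0", "ACGTACGTAAACCCGGG", "I" * 17))
```

This only works when every read ends with the barcode at a fixed position, which is the situation the -u option is meant for.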

Barcode splitter seems to separate every barcode into a different file. My barcodes are random 9 bp sequences, so there would be tens of thousands of files with one sequence in each.

Thanks,
Rick

David · May 26, 2021, 1:13am

Welcome!
Which Galaxy server are you using?
Maybe you can concatenate the data from the info file to reconstruct the reads and play with the adapter names:

The info file contains information about the found adapters. The output is a tab-separated text file. Each line corresponds to one read of the input file.

Columns contain the following data:

1st: Read name
2nd: Number of errors
3rd: 0-based start coordinate of the adapter match
4th: 0-based end coordinate of the adapter match
5th: Sequence of the read to the left of the adapter match (can be empty)
6th: Sequence of the read that was matched to the adapter
7th: Sequence of the read to the right of the adapter match (can be empty)
8th: Name of the found adapter
9th: Quality values corresponding to sequence left of the adapter match (can be empty)
10th: Quality values corresponding to sequence matched to the adapter (can be empty)
11th: Quality values corresponding to sequence to the right of the adapter (can be empty)
The concatenation of columns 5-7 yields the full read sequence. Column 8 identifies the found adapter. Adapters without a name are numbered starting from 1. Fields 9-11 are empty if quality values are not available. Concatenating them yields the full sequence of quality values.

If no adapter was found, the format is as follows:

Read name
The value -1
The read sequence
Quality values
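
As a rough illustration of the format quoted above, the sketch below parses one line of the info file and rebuilds the full read from columns 5-7 (and the qualities from columns 9-11). The function name parse_info_line and the dictionary keys are my own choices, not anything defined by cutadapt, and the code assumes the columns appear exactly as listed.

```python
def parse_info_line(line):
    """Parse one tab-separated line of a cutadapt info file (sketch).

    Returns a dict; the keys are illustrative names, not cutadapt's.
    """
    fields = line.rstrip("\r\n").split("\t")
    name = fields[0]
    if fields[1] == "-1":
        # No adapter found: read name, -1, read sequence, quality values.
        return {
            "name": name,
            "matched": False,
            "sequence": fields[2],
            "qualities": fields[3] if len(fields) > 3 else "",
        }
    left, match, right = fields[4], fields[5], fields[6]
    # Columns 9-11 may be missing or empty if there are no quality values.
    quals = fields[8:11]
    return {
        "name": name,
        "matched": True,
        "errors": int(fields[1]),
        "start": int(fields[2]),
        "end": int(fields[3]),
        "left": left,
        "match": match,
        "right": right,
        "adapter": fields[7],
        # Concatenating columns 5-7 yields the full read sequence.
        "sequence": left + match + right,
        "qualities": "".join(quals),
    }


if __name__ == "__main__":
    example = "read1\t0\t4\t8\tACGT\tTTTT\tGG\tadapter1\tIIII\tJJJJ\tKK\n"
    print(parse_info_line(example))
```

With the matched part and the right-hand part available separately, the barcode can be picked out and attached to the read name in a later step.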

Rickpatbrown · May 26, 2021, 2:23pm

I am using the EU and US servers (mostly EU during the daytime). They have versions 1.16.5 and 1.16.6 of the tool, respectively, but neither has --rename functionality.

Where do I find the info file? Can I feed it back into Galaxy? Or do I have to process it with another platform?

Sorry, I’m a chemist, so this is all very new to me!

Thanks,
Rick

David · May 26, 2021, 2:26pm

Rick, in the cutadapt tool on both Galaxy servers you’re using, there is a button to enable the info file as an output.
Cheers

Rickpatbrown · May 26, 2021, 3:56pm

Perfect! There is also a wildcard output file! I entered my 3’ adapter as ADAPTERNNNNNNNNN. The wildcard output lists the barcode sequence that matched the N’s, along with the read name. Now I just need to figure out how to merge this with the trimmed sequences.

Thank you!

David · May 26, 2021, 5:21pm

You’re welcome. Galaxy has tools for this kind of work, such as merge columns, cut columns, awk, etc.
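
Outside Galaxy, the merge step could look roughly like the Python sketch below. It assumes each line of the wildcard file holds the barcode followed by the read name, separated by whitespace, and that the trimmed FASTQ uses plain four-line records; check both against your own files first. The function names load_barcodes and tag_fastq and the "_" separator are hypothetical.

```python
def load_barcodes(wildcard_lines):
    """Map read ID -> barcode from wildcard-file lines.

    Assumption: each line is '<barcode> <read name>'.
    """
    barcodes = {}
    for line in wildcard_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        barcode, read_id = parts[0], parts[1]
        barcodes[read_id] = barcode
    return barcodes


def tag_fastq(fastq_lines, barcodes):
    """Append the barcode to the ID of each read that has one.

    Assumption: four lines per record, header on every fourth line.
    """
    out = []
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\r\n")
        if i % 4 == 0:
            read_id, _, comment = line[1:].partition(" ")
            barcode = barcodes.get(read_id)
            if barcode:
                line = "@" + read_id + "_" + barcode
                if comment:
                    line += " " + comment
        out.append(line)
    return out


if __name__ == "__main__":
    wild = ["ACGTACGTA read1 1:N:0", "TTTTTTTTT read2"]
    fastq = ["@read1 1:N:0", "ACGT", "+", "IIII", "@read3", "GG", "+", "II"]
    print("\n".join(tag_fastq(fastq, load_barcodes(wild))))
```

Reads without an entry in the wildcard file keep their original name, so nothing is dropped by the merge.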

gallardoalba · May 31, 2021, 9:43pm

Hi @Rickpatbrown,
I’m currently updating cutadapt in order to include the --rename option. It will be available in a few days.
Regards

gallardoalba · June 4, 2021, 4:07pm

Hi @Rickpatbrown, we have updated cutadapt recently. The latest version will be available on usegalaxy.eu in a few days.

Regards

Rickpatbrown · June 4, 2021, 5:40pm

@gallardoalba Fantastic! Thanks!