HISAT2 on Cutadapt output

Henk · April 12, 2021, 12:40pm

Hi All,

I trimmed adapters from my paired datasets using Cutadapt. However, the Cutadapt output is not visible when I then try to run HISAT2 on it, whereas RNA STAR detects the Cutadapt datasets without any problem. I can work around this by building a new list of dataset pairs. I assume there must be a faster way, though, because when building that new list I also have to edit each pair name to keep a clear overview further down the pipeline. Am I overlooking something?

Flow · April 12, 2021, 1:00pm

Hey Henk,
HISAT2 probably does not recognize your data as a valid input. Please check that the Cutadapt output is in FASTA/FASTQ format, and also check that the datasets are not marked as hidden in Galaxy.
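If you want to sanity-check the format of a downloaded Cutadapt output outside Galaxy, here is a minimal Python sketch. It is not a Galaxy tool or Galaxy's own datatype validation, and the function names are hypothetical. It reads a gzipped FASTQ file and checks the first records for the basic four-line structure and for Sanger (Phred+33) quality characters, which is roughly what the fastqsanger.gz datatype implies.

```python
import gzip
from itertools import islice


def fastq_records_ok(lines, max_records=1000):
    """Check up to max_records FASTQ records from an iterable of text lines.

    A record is 4 lines: '@header', sequence, '+' line, quality string.
    Quality characters are assumed to be Phred+33 ('!' to '~').
    """
    it = iter(lines)
    checked = 0
    while checked < max_records:
        record = list(islice(it, 4))
        if not record:
            break  # clean end of input
        if len(record) < 4:
            return False  # truncated final record
        header, seq, plus, qual = (line.rstrip("\n") for line in record)
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(seq) != len(qual):
            return False
        if any(not ("!" <= c <= "~") for c in qual):
            return False
        checked += 1
    return checked > 0  # an empty file is not treated as valid FASTQ


def fastq_gz_ok(path, max_records=1000):
    """Open a gzipped FASTQ file in text mode and check its first records."""
    with gzip.open(path, "rt") as handle:
        return fastq_records_ok(handle, max_records)
```

For example, `fastq_gz_ok("trimmed_R1.fastq.gz")` (hypothetical file name) returns True if the first records look like valid FASTQ. Checking only the first records keeps it fast on large files, at the cost of missing problems further down.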

Henk · April 13, 2021, 9:00am

Hi F,

Thanks for your quick reply. The files are of type fastqsanger.gz, and I was able to make the trimmed datasets visible for further analyses by unhiding them. Unfortunately, I still need to edit the attributes to keep track of the individual pairs for downstream applications. Perhaps I am still overlooking something, or is this indeed a small flaw in an otherwise pleasant platform to work on?

Henk

jennaj · April 15, 2021, 11:53pm

Update 4/22/21

Re-ran the prior test, and also created and ran a new extracted workflow. Everything is working.

I had forgotten that Cutadapt splits a paired-end collection into two collections: one contains the forward reads, and one contains the reverse reads. Maybe this is the problem you ran into? Hopefully this example will help you sort it out. The end result from HISAT2 is a single collection.

The data in the history is a bit large, but you don’t need to actually import it: use the “view” function, where all the details are available.

The workflow is very simple and doesn’t consume any extra quota space, so importing and perhaps using it or reviewing it in the editor will help even more. You’ll see that a single collection of paired-end reads was input to Cutadapt, then resulting collections of the forward and reverse reads were input to HISAT2, producing one final collection of mapped BAMs (one per pair).

History: https://usegalaxy.org/u/jen-galaxyproject/h/copy-of-test-cutadapt-listpaired--hisat2

Workflow: Galaxy
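As a conceptual illustration of that layout only, the sketch below models the forward and reverse collections as dicts from element identifier to file name and pairs them into one job per sample, so the outputs form a single collection with one BAM per pair. This is not how Galaxy implements it internally; matching elements by identifier is an assumption, and all names are hypothetical.

```python
def pair_collections(forward, reverse):
    """Match forward/reverse elements by identifier (assumption, for illustration).

    forward, reverse: dicts mapping element identifier -> file name.
    Returns a list of (identifier, forward_file, reverse_file) in forward's order.
    """
    unmatched = set(forward) ^ set(reverse)  # identifiers present on only one side
    if unmatched:
        raise ValueError(f"elements without a mate: {sorted(unmatched)}")
    return [(ident, forward[ident], reverse[ident]) for ident in forward]


def planned_outputs(pairs):
    """One hypothetical BAM name per pair, i.e. a single output collection."""
    return {ident: f"{ident}.bam" for ident, _fwd, _rev in pairs}
```

For example, `planned_outputs(pair_collections({"sampleA": "A_1.fq.gz"}, {"sampleA": "A_2.fq.gz"}))` gives `{'sampleA': 'sampleA.bam'}`. Raising an error on unmatched identifiers mirrors the idea that every forward element needs a reverse mate before mapping.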

Hi @Henk

The data is in a paired-end dataset collection, with the datasets inside it assigned the datatype fastqsanger.gz, correct? And that collection is the output from Cutadapt?

HISAT2 should be able to recognize that dataset collection and process it directly.

The collection should be in the active history, with this option set on the HISAT2 tool form. Note: the screenshot is from loading the tool with a new, empty history that does not contain inputs of the proper datatype. This is a good way of discovering which inputs are recognized. When you run this, the input should be populated with one or more paired collections assigned one of those specified datatypes. Compressed inputs will be recognized and uncompressed automatically as part of the pre-processing.

If you do not get a result like that, something is going wrong, and we can follow up on that. Rebuilding the collection or unhiding the collection elements should not be needed (although both are valid workarounds for this problem!). Let’s try to fix it at the source. Cutadapt had some issues with Workflows in the past, but I am pretty sure those are resolved now (most current Galaxy release + most current tool version).

Few questions:

Where are you using Galaxy? Please give the URL if it is a public server. And are you using the most current tool versions available there? We can follow up from there. An example might be good to review (directly, not publicly); if needed, we’ll explain how.
If you run your own Galaxy (local, cloud), have you upgraded to the latest Galaxy version? Are the tools updated from the ToolShed? Has this worked before, or is it a new problem? If you are using a Workflow, did this stop working after changes (not necessarily to these steps)? You might even want to check whether a sample of this works at one or more of the UseGalaxy.* servers, as a comparison.

Let’s start there. This was an issue before, so it seems odd for it to come up again. I’ll also review/rerun our prior test cases at UseGalaxy.org and will write back with the result.

Thanks!
