Paired-end Fastq-dump Manipulation - Fastq De-Interlacer

Sammy · May 6, 2019, 11:59pm

Hello,

I uploaded files to Galaxy with an SRA accession list. It created 2 files: Single-end (fastq-dump) and Paired-end (fastq-dump). My workflow accepted Single-end (fastq-dump); however, Galaxy does not recognize the paired-end fastq-dump as a collection. I assume that is because of the batch mode file. Now I am trying to separate the forward and the reverse reads and create two separate collections. I cannot find any function that would help me manipulate the fastq-dump. (Fastq splitter was a disappointment.) Any hints where they might hide?

jennaj · May 7, 2019, 1:03am

Hi,

Try the tool Fastq De-Interlacer. It is covered in the NCBI SRA link below, but the others might help, too.

FAQs: https://galaxyproject.org/support/

It is not clear if you are having issues with collections or not. The tutorials below cover those plus ways of doing batch data uploads. https://galaxyproject.org/learn/

Dataset collections - modern studies usually include many samples. Collections are designed to simplify complex, multi-sample analyses as shown in this tutorial.
More from the Galaxy Training Network https://galaxyproject.github.io/training-material/topics/galaxy-data-manipulation/

Thanks!

Sammy · May 7, 2019, 8:50am

I am having trouble with batch file mode. The result is a dataset within a dataset: a collection of paired-end fastq entries, each of which contains 2 separate files (one forward and one reverse). Apparently this is too much for Galaxy when I choose the dataset collection as the input for my RNA-seq workflow, so I have to separate the forward and the reverse files into two separate collections. For now, however, I am trying the Fastq interlacer. It might work in HISAT2, as it has an option for paired-end fastq files that are joined in one file.

Thanks for your answer.

jennaj · May 7, 2019, 4:07pm

The “type” of the dataset collection created will matter for the data to show up on an input field, as well as the “datatype” of the data inside the collection, and the “content” of that data inside each dataset (this part doesn’t have a distinct metadata attribute – could be forward fastq reads, reverse fastq reads, or interlaced forward/reverse fastq reads). Collections are built up from those to create four options.

Many options … but maybe the explanation below will help.

HISAT2

Four different input “content” variations are possible.
Three of them can be given as one of three different dataset types: an individual file, multiple individual files, or a dataset collection that fits the overall entry type.
One can only be given as a dataset collection (item 3 below).
That translates to 10 different choices, just for the fastq input.
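The count of 10 can be checked with a short enumeration. This is only an arithmetic sketch: the labels are taken from the list further down in this post, and the variable names are made up for illustration, not anything Galaxy itself exposes.

```python
# Content types that accept any of the three dataset types.
flexible_content = [
    "Single-end",
    "Paired-end",
    "Paired-end data from single interleaved dataset",
]
# Content type that is only accepted as a dataset collection.
collection_only_content = ["Paired-end Dataset Collection"]

dataset_types = ["individual dataset", "multiple datasets", "dataset collection"]

# Every (content, dataset type) combination offered for the fastq input.
choices = [(content, kind) for content in flexible_content for kind in dataset_types]
choices += [(content, "dataset collection") for content in collection_only_content]

print(len(choices))  # 3 * 3 + 1 = 10
```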

Screenshots, then I’ll explain what each can contain:

The four input “content” types for this tool (many will only have the first three but expect that to evolve over time as interlaced inputs are added to more tools):

The tool form changes dynamically based on the choice made.

The three dataset types available for the first two and the last from the list above are in the screenshots below. The declared “collection” will only accept a paired-end collection (so, you won’t see the options below for it - a collection is already expected).

[Screenshots: one individual dataset input; multiple individual datasets input; one collection input]

Where:

Single-end: Single end sequences in each original individual dataset. Those can be entered as individual datasets, or selected as multiple datasets, or be put in a “list” type of dataset collection.
Paired-end: Paired-end sequences, forward in an original individual dataset and reverse in an original individual dataset. Those can be entered as individual datasets, or selected as multiple datasets, or be put into two distinct “list” type dataset collections: one list for the forward reads, one list for the reverse reads.
Paired-end Dataset Collection: Paired-end sequences, forward and reverse added to a dataset collection that contains one pair of F/R reads, or a combined “list of pairs” if you have more than one pair. The collection is created from data that originally has the forward reads in one or more datasets, each with a matching reverse read dataset.
Paired-end data from single interleaved dataset: Paired-end sequences, forward and reverse interleaved (interlaced) in an individual dataset (this is what you had/have, before “de-interlacing”). Those can be entered as individual datasets, or selected as multiple datasets, or be put in a “list” type of dataset collection.
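To make "interleaved" versus "de-interlaced" concrete, here is a minimal Python sketch of what de-interlacing means. It is not the Fastq De-Interlacer tool's implementation. It assumes strictly alternating forward/reverse records of four lines each, and the function names and example reads are hypothetical.

```python
from itertools import islice


def read_fastq_records(lines):
    """Yield FASTQ records as lists of four lines: header, sequence, '+', quality."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterlace(lines):
    """Split interleaved records into (forward, reverse) lists.

    Assumption: records strictly alternate forward, reverse, forward, reverse.
    """
    forward, reverse = [], []
    for index, record in enumerate(read_fastq_records(lines)):
        (forward if index % 2 == 0 else reverse).append(record)
    if len(forward) != len(reverse):
        raise ValueError("odd number of records; input is not cleanly interleaved")
    return forward, reverse


# Made-up interleaved input with two read pairs.
example = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TGCA", "+", "IIII",
    "@read2/1", "GGCC", "+", "IIII",
    "@read2/2", "CCGG", "+", "IIII",
]

forward, reverse = deinterlace(example)
print([record[0] for record in forward])  # ['@read1/1', '@read2/1']
print([record[0] for record in reverse])  # ['@read1/2', '@read2/2']
```

The two resulting lists correspond to the separate forward and reverse datasets of the "Paired-end" option above, while the untouched input corresponds to the "single interleaved dataset" option.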

Hope that helps!
