Paired-end Fastq-dump Manipulation - Fastq De-Interlacer

Hello,

I uploaded files to Galaxy using an SRA accession list. It created two outputs: Single-end (fastq-dump) and Paired-end (fastq-dump). My workflow accepted Single-end (fastq-dump); however, Galaxy does not recognize the paired-end fastq-dump output as a collection. I assume that is because of the batch mode file. Now I am trying to separate the forward and reverse reads and create two separate collections. I cannot find any function that would help me manipulate the fastq-dump output. (Fastq splitter was a disappointment.) Any hints where they might hide?

Hi,

Try the tool Fastq De-Interlacer. It is covered in the NCBI SRA link below, but the others might help, too.

FAQs: https://galaxyproject.org/support/

It is not clear if you are having issues with collections or not. The tutorials below cover those plus ways of doing batch data uploads. https://galaxyproject.org/learn/

Thanks!

I have trouble with batch file mode. It is a dataset within a dataset: a collection of paired-end fastq entries, each of which contains two separate files (one forward and one reverse). Apparently this is too much for Galaxy when I choose a dataset collection as the input for my RNA-seq workflow. So I have to separate the forward and reverse files into two separate collections. However, I am now trying the Fastq interlacer. It might work in Hisat2, as it has an option for paired-end Fastq files that are joined in one file.

Thanks for your answer.

The “type” of the dataset collection created will matter for the data to show up on an input field, as well as the “datatype” of the data inside the collection, and the “content” of that data inside each dataset (this part doesn’t have a distinct metadata attribute – could be forward fastq reads, reverse fastq reads, or interlaced forward/reverse fastq reads). Collections are built up from those to create four options.

Many options … but maybe the explanation below will help.

HISAT2

  • Four different input content variations are possible.
  • Three of them can be given as one of three different dataset types: individual files, multiple individual files, or a dataset collection that fits the overall entry type.
  • One can only be given as a dataset collection (item 3 below).
  • That translates to 10 different choices, just for the fastq input.
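To make the count explicit, here is a small sketch that enumerates the combinations described above. The labels are paraphrased from this post and are not identifiers used by Galaxy or HISAT2.

```python
from itertools import product

# Content types that accept any of the three dataset types.
# Labels are paraphrased from the post, not Galaxy identifiers.
flexible_contents = [
    "single-end",
    "paired-end (separate forward/reverse datasets)",
    "paired-end interleaved",
]
dataset_types = ["individual dataset", "multiple datasets", "dataset collection"]

# 3 content types x 3 dataset types = 9 combinations ...
choices = list(product(flexible_contents, dataset_types))

# ... plus the paired-end collection, which only accepts a collection.
choices.append(("paired-end dataset collection", "dataset collection"))

for content, dataset_type in choices:
    print(f"{content}: {dataset_type}")

print(len(choices))  # 10
```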

Screenshots, then I’ll explain what each can contain:

The four input “content” types for this tool (many will only have the first three but expect that to evolve over time as interlaced inputs are added to more tools):

The tool form changes dynamically based on the choice made.

The three dataset types available for the first two and the last from the list above are in the screenshots below. The declared “collection” will only accept a paired-end collection (so, you won’t see the options below for it - a collection is already expected).

[Screenshots: one individual dataset input; multiple individual datasets input; one collection input]

Where:

  1. Single-end: Single-end sequences in each original individual dataset. Those can be entered as individual datasets, or selected as multiple datasets, or be put in a “list” type of dataset collection.

  2. Paired-end: Paired-end sequences, forward in an original individual dataset and reverse in an original individual dataset. Those can be entered as individual datasets, or selected as multiple datasets, or be put into two distinct “list” type dataset collections: one list for the forward reads, one list for the reverse reads.

  3. Paired-end Dataset Collection: Paired-end sequences, forward and reverse added to a dataset collection that contains one pair of F/R reads or a combined “list of pairs” if you have more than one pair. The collection is created from data that originally has the forward reads in one or more datasets, each with a matching reverse read dataset.

  4. Paired-end data from single interleaved dataset: Paired-end sequences, forward and reverse interleaved (interlaced) in an individual dataset (this is what you had/have, before “de-interlacing”). Those can be entered as individual datasets, or selected as multiple datasets, or be put in a “list” type of dataset collection.
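If it helps to see what "de-interlacing" means at the file level, here is a minimal Python sketch. It is not how the Galaxy Fastq De-Interlacer tool is implemented. It assumes plain four-line FASTQ records (no wrapped sequence lines) that strictly alternate forward, reverse, forward, reverse. The function names and the example reads are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, '+' separator line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records (assumes sequences are not wrapped)."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) != 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def deinterlace(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Split strictly alternating forward/reverse records into two lists."""
    forward: List[Record] = []
    reverse: List[Record] = []
    records = read_fastq(lines)
    for fwd in records:
        # The record right after each forward read is assumed to be its mate.
        rev = next(records, None)
        if rev is None:
            raise ValueError(f"unpaired read at end of input: {fwd[0]}")
        forward.append(fwd)
        reverse.append(rev)
    return forward, reverse


# Hypothetical interleaved input with two read pairs.
example = """@read1/1
ACGT
+
IIII
@read1/2
TGCA
+
JJJJ
@read2/1
GGCC
+
IIII
@read2/2
CCGG
+
JJJJ""".splitlines()

fwd, rev = deinterlace(example)
print([r[0] for r in fwd])
print([r[0] for r in rev])
```

The two resulting lists correspond to the forward and reverse datasets of option 2. A real tool would also need to check that the read names of each pair match and handle unpaired reads, which this sketch only rejects at the end of the input.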

Hope that helps!
