qiime2 tools import

Hi there,
I am trying to import a collection of paired-end fastq files (374 files in total) named like “XXXX_S1_L001_R1_001.fastq”, but I am not sure which “Type of data to import” to choose.

When I tried “EMP Paired-end sequences” or “FeatureData[PairedEndSequence]”, I got errors.

Thanks,

Raul

Welcome, @Raul_Carlos_Mainar_J

The QIIME 2 tools expect the fastq sequences to have a specific format in the @ sequence identifier line. The tool is reporting that your sequence identifiers are in a different format.

If you are not sure what “Casava” format means or looks like, this guide can help → FASTQ format - Wikipedia
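If it helps, here is a minimal sketch for checking whether an identifier line follows the Casava 1.8 layout described on that page (instrument:run:flowcell:lane:tile:x:y, a space, then read:filtered:control:index). The function name and the test headers are made up for illustration, and the pattern is only my reading of that layout, not the check QIIME 2 itself performs.

```python
import re

# Casava 1.8+ identifier layout:
# @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
CASAVA_18_HEADER = re.compile(
    r"^@[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+"  # instrument through y-coordinate
    r" [12]:[YN]:\d+:\S+$"                     # read number, filter flag, control bits, index
)


def looks_like_casava_18(header_line: str) -> bool:
    """Return True when the @ line matches the Casava 1.8 layout above."""
    return CASAVA_18_HEADER.match(header_line.strip()) is not None


# Illustrative headers (hypothetical data)
print(looks_like_casava_18("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
print(looks_like_casava_18("@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"))
```

Running it on the first line of a few of your files is usually enough to spot a mismatch.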

There isn’t an automatic way to change the sequence identifiers, but the sequencer would have output that format originally, so check whether you can still get access to the original data.

Or, you can try the tool “Manipulate FASTQ reads on various attributes”. Scroll to the bottom of the form to see example usage plus a tutorial.

Hope this helps! :slight_smile:

Dear Jenna,

thank you for your quick reply.

I think my files are OK and in the proper format. The identifier lines look like:

@M00725:48:000000000-K9MLY:1:1101:15565:1455 1:N:0:CTCGACTT+CTACTATA

and we have been able to run them in qiime2 outside of Galaxy with the following input:

Type of data to import: SampleData[PairedEndSequencesWithQuality]

QIIME 2 file format to import from: Casava One Eight Single Lane Per Sample Directory Format

The problem we have now is that we created in Galaxy a collection of 186 pairs and when trying to use the

Select a mechanism: Use collection to import

it does not find the collection.

Any idea why this is happening?

Thanks a lot again,

Raúl

Hi @Raul_Carlos_Mainar_J

As a guess, the datatype could be the problem. On the tool form, once the input type is chosen, examine the “expected formats” drop down menu, then make sure your collection has one of those formats assigned (click to expand the folder). When those are the same, the dataset is available.

The next thing to check is the “shape” of the collection.

  • a nested collection will have two levels – the first will list out the element identifiers which are usually sample names parsed out from the pairs (the common part of the original file names – although you can adjust this)
  • a flattened collection has just one level, and there are two collection folders. One contains the “forward: sample name” type of element identifier and the other contains the “reverse: sample name” type. These are created by you from an original nested collection using Collection Operations → Flatten collection
  • then there is a list collection. This is a simple list of files. Some Qiime inputs require that the forward + reverse reads are put into an interleaved fastq file. The content for each sample is: read1 forward, read1 reverse, read2 forward, read2 reverse, repeat. There are many tools to do this, one is Seqtk.
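To make the interleaved layout concrete, here is a minimal sketch of that ordering in plain Python. It assumes each FASTQ record is exactly four lines (no wrapped sequences), and the function names and toy reads are made up for illustration. In Galaxy you would use a tool such as Seqtk rather than a script.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> List[str]:
    """Return lines ordered as read1 fwd, read1 rev, read2 fwd, read2 rev, ..."""
    merged: List[str] = []
    for fwd, rev in zip_longest(fastq_records(forward), fastq_records(reverse)):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse inputs have different read counts")
        merged.extend(fwd)
        merged.extend(rev)
    return merged


# Toy data, made up for illustration
forward = ["@read1 1:N:0:1", "ACGT", "+", "IIII", "@read2 1:N:0:1", "TTGA", "+", "IIII"]
reverse = ["@read1 2:N:0:1", "ACGT", "+", "IIII", "@read2 2:N:0:1", "TCAA", "+", "IIII"]
interleaved = interleave(forward, reverse)
print(interleaved[0::4])  # every fourth line is a header
```

The headers come out alternating forward and reverse for each read, which is the order the interleaved inputs expect.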

Which to use depends on the input type selected, and this isn’t documented for all choices. So, Qiime is tricky since there isn’t a Galaxy example of all the different accepted formats from the tool authors … but if you want to share your history we can try.

Maybe try out the different collection shapes first, and make sure your datatype (format) is OK (fastqsanger or fastqsanger.gz is my first guess). Leave those tests in the history you share so that whoever reviews it (probably me!) doesn’t have to repeat them.

Whew! We can try to work this out and reach out to the developers at their forum for more help if needed. :hammer_and_wrench:

Hi again Jennifer,

I tried the seqtk_mergefa and seqtk_mergepe tools, but even so Galaxy does not identify my collection in qiime2 tools import. I think the files are correct (fastq.gz).

Here is the link to my history. There are two collections; one is a subsample of the other. I use the smaller one for quick runs.

Again, many, many thanks for your input and have a good weekend!

Raul

Thanks for sharing the data @Raul_Carlos_Mainar_J

I got this to work with a bit of fiddling with the organization and the element identifiers. You could extract those steps into a simple workflow to apply to all of your data quickly, all in Galaxy. Bookmark that workflow and it will show up in the bottom of the tool panel just like a regular tool.

Form options that seem to fit your data best

History shared back https://usegalaxy.eu/u/jenj/h/copy-of-qiime2-training-httpshelpgalaxyprojectorgtqiime2-tools-import12641

Items to notice

  1. How the element identifiers need to be labeled is described in the notes under the input area. You don’t need to do exactly what I did in the history, so just consider it an example of possible manipulations.

  2. My test run was only meant to get the tool to accept the collection. It worked as far as job submission, but then failed because the history didn’t have the two pieces of metadata still required: manifest and yaml. The Qiime2 docs are the best place to learn how to create those since the usage in Galaxy should be the same.

  3. When you create those files, just make sure the element identifiers in your collection are in the exact same format as in those other sample sheets. The tool wants to sort the files into samples, etc.
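As a rough way to check item 3 before submitting, the sketch below pulls the sample ID out of Casava-style names (sample ID, barcode identifier, lane, read direction, set number) and compares the result against the IDs in a sample sheet. It assumes your element identifiers still look like the original file names; the function names and the sample IDs are hypothetical.

```python
import re
from typing import Iterable, Optional, Set, Tuple

# Casava-style name: <sample>_<barcode>_L<lane>_R<1|2>_001.fastq[.gz]
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_[^_]+_L\d{3}_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)


def sample_id(filename: str) -> Optional[str]:
    """Return the sample ID part of a Casava-style name, or None if it does not match."""
    match = CASAVA_NAME.match(filename)
    return match.group("sample") if match else None


def compare_ids(filenames: Iterable[str], metadata_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the files, IDs only in the metadata)."""
    file_ids = {sid for sid in map(sample_id, filenames) if sid is not None}
    sheet_ids = set(metadata_ids)
    return file_ids - sheet_ids, sheet_ids - file_ids


# Hypothetical names and sample sheet IDs
names = [
    "sampleA_S1_L001_R1_001.fastq.gz",
    "sampleA_S1_L001_R2_001.fastq.gz",
    "sampleB_S2_L001_R1_001.fastq.gz",
    "sampleB_S2_L001_R2_001.fastq.gz",
]
only_in_files, only_in_metadata = compare_ids(names, ["sampleA", "sampleC"])
print(only_in_files, only_in_metadata)
```

Both sets should be empty when the identifiers and the sample sheet agree.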

Hope this helps and have a great weekend too!

Dear Jennifer,

I have the metadata file but not the manifest file. I have been researching how to create it, but it looks pretty difficult (at least for me).
However, I used:

Type of data to import: SampleData[PairedEndSequencesWithQuality]
File format to import from: CasavaOneEightSingleLanePerSampleDirFmt

after your data transformation of my collection and it worked!

I compared the results with those from the qiime2 analysis we already had, and they match.
So I think I can go forward with my qiime2 analysis in Galaxy.

Thank you very much!

Raul

Great news and thanks for letting me know :slight_smile: