the metadata and error of qiime2 import

Hi,
I uploaded some fastq.gz files to try qiime2 on a Galaxy server. The upload went fine, but when I try to import them using the Qiime2 tool "Import", I do not know how to set the file path, since Galaxy doesn’t work like a local machine. How should I fix the metadata txt file so that I can import the files?

Thanks,

Welcome, @mat3ani

You do not need to worry about the file paths when working through the Galaxy application webpages. So, if a tool gives you an error message about something technical in how the job was run (paths, warnings about dependencies, other technical side “issues”), that message is likely spurious output from the underlying tool. Something else is going wrong that the original tool could not capture and report correctly.

This is distinct from technical messages about the format of the data inside of the files. This can be a bit confusing, but you can always ask here if something is not clear. How to share your work for troubleshooting help is in the banner at this forum, also here → How to get faster help with your question.

When using tools, these are the most important things to pay attention to:

  1. Content inside the files. This is the data inside of your files.

    • However the original tool was expecting the data to be formatted, and whatever content it expected within that formatting, is exactly the same when running that tool in Galaxy.
    • This means that the original tool documentation is the definitive guide about what is expected: inputs, parameters, and what is output at the end, and what that means. Publications are generally linked on the form, as well as these author resources.
    • You will also find “tips” on the form itself, and down in the Help section at the end. If the tool happened to be included in a Galaxy tutorial example, you’ll also find that linked here. And, if there were any questions here, you might also see those linked (or, just search here directly!).
  2. Metadata assigned to the files. Attributes such as the datatype format, and the database (dbkey) are assigned then interpreted by tools.

    • Datatype (file format)

      • The assignment must accurately describe the content inside the file (item 1).
      • Galaxy uses the “datatype” to tell tools what format the input file is in.
      • On each input area on the tool form, there is a toggle for “accepted formats”. This is a filter – when an accepted format matches a dataset file in the active history, that dataset file becomes “available” as a potential input.
      • You can adjust the datatype assignment but it is usually best to allow Galaxy to “guess” at the start, when using the Upload tool. (Other Get Data tools will also do this).
      • If the guess is wrong, that can be an important clue about that data: truncated, malformed, and related situations.
    • Database (fasta index, commonly named a “dbkey”)

      • Mapping tools will assign a database, most other tools will just chain together files after that to preserve the key.
      • Some tools interpret this, so be careful if you assign this directly. If it is incorrect, you’ll be using the wrong “fasta index”, the coordinates will be wrong, and that will affect the scientific results.
      • Reads are not associated with a specific genome assembly version until mapped. Why? The fasta index is about the data coordinates, not the bases. Avoid assigning this until the reads are tied to a particular assembly.
      • Want to track species, or sample, or other annotation? Use Tags instead of metadata. Metadata is interpreted, tags are not (unless they are a specific type of tag: a group tag). See tags
  3. Shape (individual dataset files, structured collections of files in a “folder”)

    • You can organize the datasets in your history into structured groups.
    • One type of group is a paired-end collection: forward and reverse reads from one or multiple samples.
    • The tool form input areas have options that perform another type of filter against the files in the history: the shape filter. One file, multiple files, or a collection of files.
    • More about collections → FAQ: Datasets versus collections
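
To make item 1 a bit more concrete: if Galaxy guesses an unexpected datatype for a fastq.gz upload, a quick check of the file content on your own computer can show whether it is truncated or malformed. This is only a minimal sketch, not something Galaxy or Qiime2 runs. The function name `check_fastq_gz` is made up for the example, it assumes simple four-line FASTQ records, and it reads the whole file into memory, so it suits small test files only.

```python
import gzip
import zlib


def check_fastq_gz(data: bytes) -> tuple[bool, str]:
    """Return (ok, message) for the bytes of a gzip-compressed FASTQ file.

    Hypothetical helper for illustration; assumes 4-line FASTQ records.
    """
    try:
        text = gzip.decompress(data).decode("ascii")
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # Not gzip at all, cut off mid-stream, or not plain text inside.
        return False, f"not readable as gzip-compressed text: {exc}"

    lines = text.splitlines()
    if not lines:
        return False, "file is empty"
    if len(lines) % 4 != 0:
        return False, "line count is not a multiple of 4 (truncated?)"

    for start in range(0, len(lines), 4):
        header, seq, plus, qual = lines[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            return False, f"record {record}: header does not start with '@'"
        if not plus.startswith("+"):
            return False, f"record {record}: separator does not start with '+'"
        if len(seq) != len(qual):
            return False, f"record {record}: sequence and quality lengths differ"

    return True, f"{len(lines) // 4} records look well-formed"
```

For a file on disk you could call it as `check_fastq_gz(Path("reads.fastq.gz").read_bytes())` after `from pathlib import Path`, where `reads.fastq.gz` is a placeholder name.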


So, with that context:

The Qiime2 tools use data inputs that are structured into qza artifacts. The first step is to load your data into an artifact using the import tool.

How to use it is on the tool form: the tool is expecting a specific name format on the files in order to build up the data and metadata that the initial artifact will contain. Then for the Galaxy part: make sure the datatype and shape match what the form is expecting.
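
As an illustration of the name-format point, here is a small sketch that checks file names against a Casava 1.8 style pattern (sample, barcode, lane, read direction, set number, for example `sampleA_S1_L001_R1_001.fastq.gz`) and groups them into forward/reverse pairs. Which naming format applies depends on the import type you pick, so treat the pattern as an assumption and confirm it against the tool form. The regular expression and the `group_reads` helper are written for this example and are not taken from Qiime2 itself.

```python
import re
from collections import defaultdict

# Assumed Casava 1.8 style name: <sample>_<barcode>_L<lane>_R<1|2>_<set>.fastq.gz
CASAVA_LIKE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<set>\d{3})\.fastq\.gz$"
)


def group_reads(filenames):
    """Split names into {sample: {"forward": ..., "reverse": ...}} and a
    list of names that do not match the assumed pattern."""
    pairs = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = CASAVA_LIKE.match(name)
        if match is None:
            unmatched.append(name)
            continue
        direction = "forward" if match["read"] == "1" else "reverse"
        pairs[match["sample"]][direction] = name
    return dict(pairs), unmatched


if __name__ == "__main__":
    names = [
        "sampleA_S1_L001_R1_001.fastq.gz",
        "sampleA_S1_L001_R2_001.fastq.gz",
        "sampleB.fastq.gz",  # would need renaming before import
    ]
    pairs, unmatched = group_reads(names)
    print(pairs)
    print(unmatched)
```

Any name that lands in the unmatched list is one you would rename before building the collection for the import tool.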

You can alter the file names and shape all in Galaxy. Some tutorials are here, or you can search the tool panel with common command-line utility names to find the operations.

For an example, this topic has more details about everything above, with actual data. Maybe it helps? qiime2 tools import

Hope this helps! :slight_smile:

Hi there,
I think this message got the wrong recipient.
Cheers,
Raul

Hi @Raul_Carlos_Mainar_J If you want to adjust what messages you get from this forum, you can do that under your profile. Let us know if you need help. Thanks! :slight_smile: