Splitting interleaved/interlaced fastq data and extracting fastq data from an SRA archive

For your Trinity questions:

  1. Trinity requires paired-end inputs to be “matched pairs”. That is, both ends of every read pair must be present in the inputs.
  2. If one end of a pair fails QA/QC (Trimmomatic, Fastp, others), then the other end of that pair cannot be used, even if it happens to pass QA.
  3. When extracting paired-end data from an SRA run (SRR accession), the original reads will be paired.
  4. Running the data through QA tools can leave some reads unpaired. Trimmomatic, for example, sorts its paired-end output into four datasets:
    1. Paired forward + Paired reverse = use these as assembly inputs
    2. Single forward + Single reverse = do not use these as assembly inputs. One end of each original pair did not pass QA, and the assembly will fail if these are input.
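The pairing rules above can be sketched in Python. This is a minimal, hypothetical illustration, not Trimmomatic's actual logic: it assumes unwrapped 4-line FASTQ records, mate names that differ only by a trailing `/1` or `/2`, and a simple mean-Phred threshold (`mean_phred`, a stand-in for a real tool's trimming rules) to decide pass/fail. It routes each pair into the four datasets described in item 4 and stops if the forward and reverse inputs are out of sync.

```python
from typing import Callable, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str  # header line without the leading "@"
    seq: str
    qual: str


def parse_fastq(text: str) -> List[Read]:
    """Parse unwrapped 4-line FASTQ records from a string."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text is not a whole number of 4-line records")
    return [Read(lines[i][1:], lines[i + 1], lines[i + 3])
            for i in range(0, len(lines), 4)]


def base_id(name: str) -> str:
    """Read ID with any description and a trailing /1 or /2 mate suffix removed."""
    rid = name.split()[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def mean_phred(read: Read, offset: int = 33) -> float:
    """Mean Phred quality of a read (Sanger/Illumina 1.8+ encoding assumed)."""
    return sum(ord(c) - offset for c in read.qual) / len(read.qual)


Buckets = Tuple[List[Read], List[Read], List[Read], List[Read]]


def sort_pairs(fwd: List[Read], rev: List[Read],
               passes: Callable[[Read], bool]) -> Buckets:
    """Return (paired_fwd, paired_rev, single_fwd, single_rev)."""
    if len(fwd) != len(rev):
        raise ValueError("forward and reverse inputs have different read counts")
    paired_f, paired_r, single_f, single_r = [], [], [], []
    for f, r in zip(fwd, rev):
        if base_id(f.name) != base_id(r.name):
            raise ValueError(f"mates out of sync: {f.name} vs {r.name}")
        ok_f, ok_r = passes(f), passes(r)
        if ok_f and ok_r:   # both ends passed: usable for assembly
            paired_f.append(f)
            paired_r.append(r)
        elif ok_f:          # reverse mate failed: forward end becomes a single
            single_f.append(f)
        elif ok_r:          # forward mate failed: reverse end becomes a single
            single_r.append(r)
        # if both ends fail, the pair is discarded entirely
    return paired_f, paired_r, single_f, single_r
```

Only the first two lists (paired forward and paired reverse) would be used as Trinity inputs; the two single lists are the datasets to leave out.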

Please be aware of a few current factors that can affect assembly success or failure on the public Galaxy Main server (https://usegalaxy.org) right now. A banner on the server explains this. More details:

  1. Trinity and Unicycler are running with reduced memory allocation at this time.
  2. Make sure to use the most current version of each tool, or unexpected problems can occur. Opening a tool from the Tool Panel loads the form for its most current version.
  3. If your job fails, confirm that you are using the most current tool version.
  • If not, rerun using the updated version.
  • If yes, then the failure may be due to the reduced memory resources. Try one rerun. If that fails again, there may be some other problem with your inputs. How to check for common input problems is discussed in the topic below.
  • My inputs are OK, so how can I work around the reduced memory allocation? Either a) consider using an alternative public Galaxy server, or b) decide whether down-sampling/sub-sampling your inputs will still meet your goals (see the Seqtk tools).
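For option b), the key point when down-sampling paired-end data is that both mates of a pair must be kept or dropped together; otherwise the sub-sampled files are no longer matched pairs. The Seqtk README recommends running `seqtk sample` on each file with the same random seed (for example `-s100`) to keep pairing. The sketch below uses a hypothetical helper, `subsample_pairs` (not part of Seqtk, and not Seqtk's exact selection method), to show the same idea in Python: a single random draw per pair decides the fate of both mates.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_pairs(fwd: Sequence[T], rev: Sequence[T], fraction: float,
                    seed: int = 100) -> Tuple[List[T], List[T]]:
    """Keep roughly `fraction` of read pairs, always keeping or dropping both mates."""
    if len(fwd) != len(rev):
        raise ValueError("forward and reverse inputs have different read counts")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed: same selection on every run
    keep_f: List[T] = []
    keep_r: List[T] = []
    for f, r in zip(fwd, rev):
        # One draw per pair, so both mates share the same keep/drop decision.
        if rng.random() < fraction:
            keep_f.append(f)
            keep_r.append(r)
    return keep_f, keep_r
```

Any record type works as input, and because the seed is fixed, rerunning with the same inputs and settings reproduces the same subset.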

Hope that helps!