What do these FastQC results mean?

Dear Community,

I need a bit of advice.

I got RNAseq data from a colleague from a few years ago that they never used. It is interesting for my project, so I wanted to have a look.

These are 7 samples from healthy humans. They collected saliva samples. The library was prepared with the QuantSeq 3' mRNA-Seq Library Prep Kit FWD with UDI 12 nt Set B1 (Lexogen). They used an mRNA-specific kit.
I had a look at the raw data with FastQC and it looked weird to me. I hoped that trimming with Trimmomatic might fix it, but afterwards it looked the same. Only the N content was fixed.
Can someone please explain to me what is wrong with the data and how this happened? :sweat_smile:
How do I best clean this?
Here are some pictures:





Thank you very much for your advice! :smiling_cat_with_heart_eyes:

Hi @Sabsida
You can find detailed descriptions of the FastQC report on the web, for example:

Hope that helps.

Kind regards,
Igor

Not really, I can interpret normal FastQC results with no issues, and I have analyzed several RNA-Seq datasets with Galaxy. However, I have never seen this before and have not found an adequate example on the internet.
I don’t understand the weird TA sequence in the Per Base Sequence Content plot, and I don’t understand why I can’t remove this unusually high adapter contamination.
I was hoping for some specific help with what I am seeing!

Hi @Sabsida

You could try a different trimming tool? You might need to input the custom adaptors used.

We have workflows that will enable you to review the FastQC and trimming tool reports all together.

More details are in this prior topic


So far, you know that this kit was used.

You also know that the reads ended up short, with adaptor detected at the end. This suggests that the automatic adaptors that Trimmomatic uses were not a match, and the trimming failed. You haven’t tried using Cutadapt or fastp yet. Both of those have an optional report that you can send to MultiQC. The example workflow above has an example that you can use as a template.
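
Before rerunning a trimmer, it can help to check which sequences are actually present in the reads. Below is a minimal sketch in Python, not a replacement for Cutadapt or fastp. It assumes the adapter is the common Illumina TruSeq prefix `AGATCGGAAGAGC` and that a run of 10 A's marks a poly(A) tail; both values are assumptions and should be swapped for whatever the kit documentation lists. The function names are made up for illustration.

```python
import gzip
from itertools import islice

# Assumption: the widely used Illumina TruSeq adapter prefix. Replace it with
# the sequence given in the kit documentation if that differs.
ADAPTER = "AGATCGGAAGAGC"
# Assumption: a run of 10 A's is treated as evidence of a poly(A) tail.
POLY_A = "A" * 10


def read_sequences(path, limit=None):
    """Yield the sequence line of each FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line.
        seqs = islice(handle, 1, None, 4)
        for seq in islice(seqs, limit):
            yield seq.strip()


def motif_fractions(sequences, motifs):
    """Return the fraction of reads that contain each motif at least once."""
    counts = {m: 0 for m in motifs}
    total = 0
    for seq in sequences:
        total += 1
        for m in motifs:
            if m in seq:
                counts[m] += 1
    return {m: (counts[m] / total if total else 0.0) for m in motifs}
```

For example, `motif_fractions(read_sequences("sample.fastq.gz", limit=100000), [ADAPTER, POLY_A])` (the file name is a placeholder) returns one fraction per motif. If the adapter fraction stays high after trimming, the trimmer was probably not given a matching sequence.
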

We can’t offer too much scientific advice at this forum (as @igor was clarifying) but we can help you to use the different tools in order to get all the results you might need for your own scientific review.

What to do:

  1. Start with the raw reads
  2. Determine what the original sequencing protocol was
  3. Apply the correct trimming
  4. Run FastQC again on the result
  5. Review the reports all together in MultiQC, make inferences, rerun until the data seems correct, then try the downstream steps.
  6. NOTE: When mapping to the human genome, you might be able to detect additional issues with the reads, in particular by reviewing the BAM inside of a genome browser like UCSC.
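
To compare the raw and trimmed reads numerically alongside the FastQC plots, the per-position base composition can be tallied directly. This is a rough sketch, assuming the read sequences have already been extracted as plain strings (for example, every fourth line of a FASTQ file starting at the second line). The function name and the default of 20 positions are choices made for illustration.

```python
from collections import Counter


def base_composition(sequences, n_positions=20):
    """Return, for each of the first n_positions, the fraction of each base."""
    counters = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_positions]):
            counters[i][base] += 1
    table = []
    for counter in counters:
        total = sum(counter.values())
        # Positions that no read reaches are reported as all zeros.
        table.append({b: (counter[b] / total if total else 0.0) for b in "ACGTN"})
    return table


if __name__ == "__main__":
    # Toy reads only, to show the shape of the output.
    demo = ["TACGGA", "TAGGCA", "TACCTT"]
    for pos, fractions in enumerate(base_composition(demo, n_positions=3), start=1):
        print(pos, fractions)
```

Running this on the same sample before and after trimming shows whether a fixed signal at particular read positions, such as a T followed by an A, survives the trimming step.
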

Hope this helps!

Hi @Sabsida,
As @jennaj suggested, check the reads in the BAM file and maybe start with the raw data. For example, check the CIGAR strings and see whether the 5’ parts are soft-clipped or not. I am not familiar with the kit used for the library, but I am curious whether the TA spike can be linked to the UDI. If that is indeed the case, the 5’ ends will be soft-clipped. If the UDI is still present, demultiplex the data.
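
The CIGAR check can be sketched in Python as below. It assumes the alignments are available as SAM text (for example, the output of `samtools view -h`) and that the aligner reports soft clips at all, which depends on its alignment mode. The function names are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_soft_clip(flag, cigar):
    """Return the number of soft-clipped bases at the read's 5' end."""
    if cigar == "*":
        return 0
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # SAM stores the CIGAR in reference orientation, so for reverse-strand
    # alignments (flag bit 0x10) the read's 5' end is the last operation.
    if flag & 0x10:
        ops.reverse()
    for length, op in ops:
        if op == "H":  # hard clips may precede a soft clip
            continue
        return length if op == "S" else 0
    return 0


def summarize_sam(lines):
    """Return (mapped reads, mapped reads soft-clipped at the 5' end)."""
    mapped = clipped = 0
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4:  # unmapped read
            continue
        mapped += 1
        if five_prime_soft_clip(flag, cigar) > 0:
            clipped += 1
    return mapped, clipped
```

Passing the lines of a SAM file (or `sys.stdin` when piping from `samtools view -h`) to `summarize_sam` gives the two counts. A large share of reads with a 5’ soft clip of similar length would support the idea that extra, non-genomic bases sit at the start of the reads.
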
Kind regards,
Igor
