What do these FastQC results mean?

Sabsida · March 30, 2025, 6:41am

Dear Community,

I need a bit of advice.

I got RNA-seq data from a colleague from a few years ago that they never used. It is interesting for my project, so I wanted to have a look.

These are 7 saliva samples collected from healthy humans. The library was prepared with the QuantSeq 3' mRNA-Seq Library Prep Kit FWD with UDI 12 nt Set B1 (Lexogen), which is an mRNA-specific kit.
I had a look at the raw data with FastQC and it looked weird to me. I hoped that trimming with Trimmomatic might fix it, but afterwards it looked the same. Only the N content was fixed.
Can someone please explain to me what is wrong with the data and how this happened?
How do I best clean this?
Here are some pictures:

Thank you very much for your advice!

igor · March 31, 2025, 12:41am

Hi @Sabsida
You can find a detailed description of the FastQC report on the web, for example:

Hope that helps.

Kind regards,
Igor

Sabsida · March 31, 2025, 4:34am

Not really. I can interpret normal FastQC results with no issues, and I have analyzed several RNA-Seq datasets with Galaxy. However, I have never seen this before and have not found an adequate example on the internet.
I don’t understand the weird TA sequence in the per base sequence content, and I don’t understand why I can’t remove this unusually high adapter contamination.
I was hoping for some specific help with what I am seeing!

jennaj · March 31, 2025, 6:03pm

Hi @Sabsida

You could try a different trimming tool? You might need to input the custom adaptors used.

We have workflows that will enable you to review the FastQC and trimming tool reports all together.

More details are in this prior topic

multQC issue and guidance?

How to get from the SRA “paired list” output collection shape to a simple “list” collection shape?

Use the Faster Download and Extract Reads in FASTQ format from NCBI SRA tool

Then, when running FastQC on a paired end collection, it is useful to first apply the Collection Operations -> Flatten Collection tool. This assigns a unique sample identifier to each end of the pair – giving the sequences a distinct collection identifier. MultiQC will then provide the results for each sequence end, and summarize correctly.

Single dataset QA example

Hands-on: Quality Control / Quality Control / Sequence analysis

Multiple paired end dataset QA example

Hands-on: Reference-based RNA-Seq data analysis / Reference-based RNA-Seq data analysis / Transcriptomics (see the initial steps)

More examples → Galaxy Training!

Trimming with different parameters/tools gives vastly different end results. A demonstration is in the Quality Control Q20-L20 public workflow.

How to find it → search at GalaxyProject.org → GTN Pan-Galactic Workflow Search with the term Quality Control Q20-L20

So far, you know that this kit was used.

QuantSeq 3’ mRNA-Seq V2 Library Prep Kit FWD with UDI | Lexogen

and that the reads ended up short, with adaptor detected at the end. This suggests that the default adaptors that Trimmomatic uses were not a match for this library, so the trimming failed. You haven’t tried using Cutadapt or fastp yet. Both of those have an optional report that you can send to MultiQC. The workflow above includes an example that you can use as a template.
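To make the custom-adaptor point a bit more concrete, here is a minimal Python sketch of what 3' adaptor trimming does to a single read. It is not a replacement for Cutadapt or fastp: it only handles exact matches, while the real tools tolerate sequencing errors. The adaptor string below is a made-up placeholder (not the sequence for the Lexogen kit, which you would need to look up in the kit documentation), and the names `trim_3prime` and `min_overlap` are only illustrative.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Remove an exactly matching 3' adapter from a read.

    Handles two cases: the full adapter occurs somewhere in the read,
    or the read ends with a truncated copy (a prefix) of the adapter.
    """
    # Case 1: the full adapter is present; keep everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends with the first k bases of the adapter.
    # Try the longest possible overlap first, stop at min_overlap.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]

    # No adapter found: return the read unchanged.
    return read


# PLACEHOLDER adapter, invented for this example only.
EXAMPLE_ADAPTER = "ACGTTGCAAC"

if __name__ == "__main__":
    for read in ("GGGGCCCC" + EXAMPLE_ADAPTER + "TTT",  # full adapter inside the read
                 "GGGGCCCCACGTT",                       # read ends in a partial adapter
                 "GGGGCCCC"):                           # no adapter
        print(read, "->", trim_3prime(read, EXAMPLE_ADAPTER))
```

If the adaptor given to the trimming tool does not match what is actually in the reads, neither case fires and the reads pass through untouched, which would look like the "nothing changed after trimming" result described above.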

We can’t offer too much scientific advice at this forum (as @igor was clarifying) but we can help you to use the different tools in order to get all the results you might need for your own scientific review.

What to do:

Start with the raw reads
Determine what the original sequencing protocol was
Apply the correct trimming
Run FastQC again on the result
Review the reports all together in MultiQC, make inferences, rerun until the data seems correct, then try the downstream steps.
NOTE: When mapping to the human genome, you might be able to detect additional issues with the reads, in particular by reviewing the BAM inside of a genome browser like UCSC.
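If you want to check outside of FastQC whether a position-specific signal (such as the TA pattern) is still there after trimming, the per base sequence content can be recomputed with a few lines of Python. This is only a rough sketch: it assumes an uncompressed FASTQ file with exactly four lines per record, and the function name `per_base_content` is made up for this example.

```python
import io
from collections import Counter


def per_base_content(fastq_handle, max_len=None):
    """Return, for each read position, the fraction of A, C, G, T and N.

    Assumes plain 4-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = []  # one Counter per read position
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # only the second line of each record is sequence
            continue
        seq = line.strip().upper()
        if max_len is not None:
            seq = seq[:max_len]
        # Grow the list when a longer read is seen.
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGTN"})
    return fractions


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("reads.fastq") for real data.
    demo = io.StringIO("@r1\nTACG\n+\nIIII\n@r2\nTAGG\n+\nIIII\n")
    for pos, frac in enumerate(per_base_content(demo), start=1):
        print(pos, frac)
```

Running this on the raw and on the trimmed reads and comparing the first positions should show whether the trimming step touched the start of the reads at all.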

Hope this helps!

igor · April 1, 2025, 12:11am

Hi @Sabsida,
As @jennaj suggested, check the reads in the BAM file, and maybe start again from the raw data. For example, check the CIGAR strings and see whether the 5’ parts are soft-clipped or not. I am not familiar with the kit used for the library, but I am curious whether the TA spike can be linked to the UDI. If that is indeed the case, the 5’ ends will be soft-clipped. If the UDI is still present, demultiplex the data.
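A rough sketch of that CIGAR check in Python, assuming the BAM has been converted to SAM text first (for example with samtools view). The function names are made up for this example. The one thing to be careful about is strand: the CIGAR string is written in reference orientation, so for reads mapped to the reverse strand (flag bit 0x10) the 5’ end of the read is at the end of the CIGAR.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_soft_clip(cigar, is_reverse=False):
    """Return the number of soft-clipped bases at the read's 5' end."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if is_reverse:
        # Reverse-strand reads: the 5' end is the last CIGAR operation.
        ops.reverse()
    for length, op in ops:
        if op == "H":  # hard clips sit outside soft clips; skip them
            continue
        return length if op == "S" else 0
    return 0  # unmapped read ('*') or empty CIGAR


def soft_clip_from_sam_line(line):
    """Read FLAG (column 2) and CIGAR (column 6) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    return five_prime_soft_clip(cigar, is_reverse=bool(flag & 0x10))


if __name__ == "__main__":
    for cigar, rev in (("12S88M", False), ("90M10S", True), ("100M", False)):
        print(cigar, "reverse" if rev else "forward", five_prime_soft_clip(cigar, rev))
```

If most reads show a 5’ soft clip of similar length, that would point to extra non-genomic sequence at the start of the reads.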
Kind regards,
Igor
