Mapping with STAR Error

Hi,
I received an error when mapping my RNA-seq reads with STAR. I used paired-end reads (as a collection) and ran Cutadapt on the collection.

The adapters are the Illumina TruSeq adapters for Read 1 and Read 2.

Please, how can I get past this problem?

See the error message below.

An error occurred with this dataset:
format bedgraph database hg19

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /cvmfs/data.galaxyproject.org/managed/rnastar/2.7.4a/hg19/hg19/dataset_d34482b9-5123-4ac6-b22d-89a2c5588299_files --sjdbOverhang 149 --sjdbGTFfile /jetstream2/scratch/main/jobs/55067750/inputs/


Hi @paulineikpa

Do the job logs have any more information? The banner at this website explains where to find those, or if you already cleared that away, see this → How to get faster help with your question

We also have tutorials for using trimming tools. Maybe something went wrong? Please see → https://training.galaxyproject.org/training-material/search2?query=quality

And, simple mapping examples are here → https://training.galaxyproject.org/training-material/search2?query=mapping

The final item you can try is mapping against the current human genome → hg38.

Let’s start there :slight_smile:

Thanks for your response.
I am still having trouble with the mapping and am trying to figure out where I went wrong with Cutadapt.
So, I still need help solving my problem.
I have attached additional information about my Cutadapt and mapping runs to this message. Hopefully it shows something.

Hi @paulineikpa

Read QA
It looks like Cutadapt didn’t fail but I’m curious about what the tool changed scientifically in the reads. You could run FastQC both before and after the trimming step to review what changes were made. An example of that is here: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#quality-control

Mapping
The Gencode annotation is the right choice for hg38 … but did you try removing the header lines from the GTF file? Not always needed but sometimes is! See https://training.galaxyproject.org/training-material/faqs/galaxy/analysis_differential_expression_help.html
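For anyone who wants to see what "removing the header lines" means outside Galaxy, here is a minimal Python sketch. It assumes the header lines are the comment lines starting with `#` at the top of the GTF; the function name and the sample lines are made up for illustration. Inside Galaxy you would use a text-manipulation tool rather than a script.

```python
def strip_gtf_header(lines):
    """Yield GTF lines, skipping comment/header lines that start with '#'."""
    for line in lines:
        if not line.startswith("#"):
            yield line


# Made-up sample: two header lines followed by one annotation line.
sample = [
    "##description: example header line\n",
    "##provider: EXAMPLE\n",
    'chr1\tEXAMPLE\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

cleaned = list(strip_gtf_header(sample))
print(len(cleaned), "annotation line(s) kept")
```

The same function works on an open file handle, since it only iterates over lines.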

Please give those a try.

  • If the QA isn’t what you expected (end up with empty files), then you’ll need to adjust that tool’s parameters (trial and error, and tutorials have examples). You could also try Fastp instead.
  • If the STAR job fails again, either link back the job logs from that run, or post back screenshots of the logs. Please include the whole stderr and stdout logs, since where the message appears can differ depending on where the failure happened. Also post a screenshot of the annotation file (just the “peek view” of the first 4-5 lines is enough: click on the dataset on that job information view and it will expand a bit).
  • You are asking STAR to use “feature = exon” in the parameters – that is in the third column of the GTF, and should be in there if sourced directly from the public web links, so I don’t think that part is a problem offhand. This is why I thought the reads were the problem. But you could also run STAR without annotation to see what happens. It could be a clue if it either works or doesn’t work in your testing matrix.
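The “feature = exon” point above can also be checked directly. This is a rough Python sketch, assuming a tab-separated GTF where the feature type is in the third column; the function name and sample lines are hypothetical. It counts each feature type so you can confirm that `exon` is present.

```python
from collections import Counter


def count_feature_types(lines):
    """Count the values found in the third (feature) column of a GTF."""
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    "##example header\n",
    'chr1\tEXAMPLE\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tEXAMPLE\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tEXAMPLE\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

feature_counts = count_feature_types(sample)
print(dict(feature_counts))
```

If `exon` were missing from the counts for a real file, the annotation would not match the feature setting described above.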

To get help quicker next time, share the screenshots along with the history share link. That lets me review the other details without having to ask. You can always unshare once we are done. I’ll explain what I looked at. Your choice :slight_smile:

Dear Jennifer Hillman-Jackson,

Thank you for your response and all your help so far.

Here is the link to my Galaxy analysis: https://usegalaxy.org/u/pikpa/h/ecoli-infection-project-dec2023-part-1

Unfortunately, my mapping failed again, even though the MultiQC report looks good after Cutadapt.

This time, I used Cutadapt to trim just the last 20 bases instead of trimming my adapter sequences like I did in the last mapping.

The mapping still failed, but this time I see that only one pair of the 4 failed, causing the whole mapping to fail.

I have tried everything I know, but I seriously believe I am missing something.

Thanks,
Pauline.

Hi @paulineikpa

The second FastQC run didn’t have the input collection flattened before the run, so you only have a report for one end of each pair.

Here is the history I am working in, a copy of yours with a few tests still running → Galaxy

  1. Exact rerun of the failed pair
  2. Cleaned up GTF, then used that with the failed pair in a rerun
  3. FastQC/MultiQC plus Fastq info on the pair (pre-QA) just to show steps
  4. FastQC/MultiQC run on the post-Cutadapt reads (flattened)
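As an illustration of the kind of check a tool like Fastq info performs on a pair, here is a minimal Python sketch that compares read counts and read names between the two ends. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and that any `/1` or `/2` suffix can be ignored when comparing names; the function names and the sample reads are hypothetical, and this is not how the Galaxy tool itself is implemented.

```python
import io


def read_names(handle):
    """Return read names from a 4-line-per-record FASTQ handle."""
    names = []
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record holds the name
        if not line.startswith("@"):
            raise ValueError(f"line {i + 1} does not look like a FASTQ header")
        parts = line[1:].split()
        name = parts[0] if parts else ""
        # Ignore an optional /1 or /2 mate suffix.
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        names.append(name)
    return names


def check_pairing(r1_handle, r2_handle):
    """Compare read counts and name order between the two ends of a pair."""
    n1 = read_names(r1_handle)
    n2 = read_names(r2_handle)
    return {"r1_reads": len(n1), "r2_reads": len(n2), "names_match": n1 == n2}


# Made-up reads: the reverse file is missing the second read.
forward = "@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n"
reverse = "@read1/2\nTGCA\n+\nIIII\n"

report = check_pairing(io.StringIO(forward), io.StringIO(reverse))
print(report)
```

A mismatch in counts or names for one pair would be worth comparing against the pair that failed in STAR.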

One of those should let us know what is going on. I wait in the queue just like everyone else … so you might see the results before I do! Hopefully that helps!