RNA STAR error in the log output file

I mapped 6 samples using the STAR tool. Only 3 samples gave a good result, whereas the other three didn’t give anything. I tried changing "Length of the genomic sequence around annotated junctions" (--sjdbOverhang) to 577, since I have a read length of 578, but this is the output of the STAR tool.

Hi @maram_Nh
Have you done QC, e.g., FastQC, for three “unmapped” samples? What is the read length distribution?
Kind regards,
Igor

The read length is 301 bp.
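
Side note for readers without FastQC at hand: the read length distribution asked about above can also be tallied directly from a FASTQ file. The following is only a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function name `read_length_distribution` and the tiny example reads are made up for illustration.

```python
from collections import Counter
import io

def read_length_distribution(handle):
    """Count read lengths in an uncompressed FASTQ stream (4 lines per record)."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths[len(line.strip())] += 1
    return lengths

# Tiny made-up FASTQ, for illustration only (not data from this thread)
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\nACGTACGT\n+\nIIIIIIII\n"
)
dist = read_length_distribution(example)
for length, count in sorted(dist.items()):
    print(length, count)
```

For a real file, the same function can be given `open("sample.fastq")` (a hypothetical file name) instead of the in-memory example.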

I already set "Length of the genomic sequence around annotated junctions" to 300 bp and the same problem persists. I also tried the default value of 100 bp, but the problem is the same. I don’t know what to do.
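
For reference, the STAR manual describes the ideal --sjdbOverhang as the maximum read length minus 1, where for paired-end data the read length is that of a single mate (so 2x100 reads give 99). A small sketch of that arithmetic follows; the helper name `suggested_sjdb_overhang` is hypothetical, and the 301 bp value is the read length reported in this thread.

```python
def suggested_sjdb_overhang(mate_lengths):
    """Return max(read length) - 1, the value the STAR manual describes as ideal.

    mate_lengths holds the length of individual mates, not the sum of a pair.
    """
    if not mate_lengths:
        raise ValueError("at least one read length is required")
    return max(mate_lengths) - 1

# 301 bp is the read length reported in this thread
overhang = suggested_sjdb_overhang([301, 301])
print(overhang)  # prints 300
```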

Can you share the history so I can check what is going on? It is confusing: the STAR summary says the reads are too short, while you say the reads are 301 nt. Also, the STAR summary shows the read length as 578; I guess this is based on genomic positions, so the reads were mapped through introns. Maybe it is the job’s settings.

Have you tried other tools, for example, HISAT2?

Yes, I tried it and it didn’t give me any relevant results.

I checked the trimmed files of these three samples, and it turns out that neither the adapters nor the low-quality reads had been removed. So I redid the trimming using the Trimmomatic tool on Galaxy and redid the alignment using RNA STAR, and the problem is still the same.

I notice that the average mapped length is 18 bp. Are you sure the reads and the genome assembly are from the same species?
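
The symptom described above (an average mapped length far below the input read length) can be spotted automatically in STAR's Log.final.out, which lists metrics as "name | value" lines. This is a rough sketch only: the helper names and the 0.5 cutoff are arbitrary assumptions, not anything STAR defines, and the sample text contains just the two values mentioned in this thread.

```python
def parse_star_log(text):
    """Collect 'name | value' pairs from the text of a STAR Log.final.out file."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics

def mapped_length_suspicious(metrics, min_fraction=0.5):
    """Heuristic (assumed cutoff): mapped length is well below the input read length."""
    input_len = float(metrics["Average input read length"])
    mapped_len = float(metrics["Average mapped length"])
    return mapped_len < min_fraction * input_len

# Only the two values quoted in this thread; a real log has many more lines
sample_log = (
    "                      Average input read length |\t578\n"
    "                          Average mapped length |\t18.00\n"
)
metrics = parse_star_log(sample_log)
suspicious = mapped_length_suspicious(metrics)
print(metrics, suspicious)
```

A flag like this only says that something is off; it cannot tell a wrong reference genome from a wrong sample or untrimmed reads.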

Hi @maram_Nh
Maybe share the history via URL so I can check what is going on: History options (three-bars icon at the top right corner of the history panel) > Share or publish > in the middle window, Make history accessible > copy the URL and paste it into a reply. If the history is big, copy the datasets for a single sample into a new history and share that.
Kind regards,
Igor

Thank you so much for the help. I had actually used the wrong accession number for the sample, which is why the mapping step gave such results.
But now I have another question. I mapped the samples, and the 3 infected samples have 57% to 62% mapped reads, while the 3 remaining (non-infected) samples have 86% to 89% mapped reads.
My question is as follows: can the lower mapping percentage of the 3 infected samples cause problems later on in the analysis?

Hi @maram_Nh
I am glad you solved the mapping issue.
As for the difference in % of mapped reads: the question is not directly related to Galaxy. Generally, you cannot draw a serious conclusion from a single metric. For example, infected samples may contain “foreign” RNA, so a smaller % of mapped reads in the infected samples might be a good sign; plus, you got consistency within both conditions for this metric, which is also a good sign. You may need to dig deeper and look at other metrics, for example, check PCA plots. Maybe check the unmapped reads, e.g., run FastQC on the unmapped reads and check the abundant unmapped reads, for instance with Blastn against the “infection agent” genome. Generally, this type of question requires deep knowledge of the project, but, as I said, this is somewhat outside of the issues discussed on this forum. I am sorry I am of no help here.
Kind regards,
Igor
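
As a rough illustration of the suggestion above to look at the abundant unmapped reads, the sketch below counts the most frequent sequences in a FASTQ of unmapped reads and writes them as FASTA, ready to paste into a Blastn search. It assumes an uncompressed FASTQ with four lines per record; the function names and the example reads are invented for illustration.

```python
from collections import Counter
import io

def most_common_sequences(handle, top_n=5):
    """Return the top_n most frequent read sequences as (sequence, count) pairs."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each FASTQ record
            counts[line.strip()] += 1
    return counts.most_common(top_n)

def to_fasta(ranked):
    """Format (sequence, count) pairs as FASTA records for a Blastn query."""
    return "".join(
        f">seq{rank}_count{count}\n{seq}\n"
        for rank, (seq, count) in enumerate(ranked, start=1)
    )

# Made-up unmapped reads, for illustration only
unmapped = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nTTTT\n+\nIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
ranked = most_common_sequences(unmapped, top_n=2)
fasta = to_fasta(ranked)
print(fasta)
```

Exact-sequence counting is crude (reads that differ by one base are counted separately), so this is a quick first look rather than a replacement for FastQC's overrepresented-sequences report.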
