I am analyzing 3′ Tag RNA-seq data using UMIs. I used UMI-tools extract or fastp to extract the UMIs from the read sequence and append them to the read name. However, during the mapping step with STAR, the extracted UMI is removed from the read name, causing the deduplication tool to fail.
I have tried multiple solutions, but none have resolved the issue.
Would you like to share your history? See "How to get faster help with your question". With an example, we can suggest modifications to the protocol that avoid the duplicated read naming.
Let’s start there! It is hard to guess more for this one.
I see the problem now. Yes, RNA STAR is removing everything after the / delimiter in the read names (the FASTQ identifier lines)! The STAR tool itself has an option to control this (a "keep everything" option), but it is not available in the Galaxy wrapper (yet!). I'll make an enhancement request, but you don't need to wait for that cycle to complete.
Instead, I'd suggest building the identifier reformatting into the very start of your workflow, when the accessions are downloaded from SRA and written into the history.
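To make the naming problem concrete, here is a rough sketch in Python of what happens to a read name, and how replacing the / keeps the UMI. It is an illustration only: in Galaxy you would do the renaming with a text-manipulation step, not this script. The function names are hypothetical, the read ID `SRR19543607.1/1_ACGTACGT` is a made-up example, and the trimming rule (drop the leading @, cut at the first space, then cut at the first / separator) is an approximation of STAR's default behaviour, not its actual code.

```python
def star_output_name(header: str, separator: str = "/") -> str:
    """Approximate the read name STAR writes to the BAM (assumed rule).

    Assumption: STAR drops the leading '@', always cuts at the first space,
    and also cuts at the first read-name separator (default '/').
    """
    name = header.lstrip("@").split(" ", 1)[0]
    return name.split(separator, 1)[0]


def sanitize_header(header: str, replacement: str = "_") -> str:
    """Replace '/' in the read ID (the text before the first space).

    Hypothetical helper: everything after the first space is kept as-is.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines must start with '@'")
    read_id, sep, rest = header[1:].partition(" ")
    return "@" + read_id.replace("/", replacement) + sep + rest


def sanitize_fastq(lines):
    """Yield FASTQ lines, rewriting only the header (line 1) of each 4-line record."""
    for i, line in enumerate(lines):
        yield sanitize_header(line) if i % 4 == 0 else line


# Hypothetical read ID with a '/1' suffix and a UMI appended after it
before = "@SRR19543607.1/1_ACGTACGT length=8"
print(star_output_name(before))                   # SRR19543607.1 (UMI lost)
print(star_output_name(sanitize_header(before)))  # SRR19543607.1_1_ACGTACGT
```

The replacement only touches the / characters, so in this sketch it gives the same result whether it runs before or after the UMI is appended. Doing it at the start of the workflow, as suggested above, means every later tool sees the same names.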
I created a small test history here with my suggestions below. Technical details are sometimes clearer with an example! This already worked, but a clean example seemed good too!
No special options! The UMI is not being interpreted or added as a tag to the BAM output.
For UMI tags added to the BAM, consider using RNA StarSolo instead.
UMI-tools → UMI extract method
Any tool in the suite can be configured the same way to interpret a UMI in the sequence names. Your sequence names have an _ (underscore) between the label and the UMI string (from fastp).
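For reference, on the command line this setting corresponds to UMI-tools' `--extract-umi-method=read_id` together with `--umi-separator=_`. The sketch below only illustrates the idea of reading the UMI back out of a read name. It assumes the UMI is the text after the last underscore, and the helper name and example read name are hypothetical.

```python
def umi_from_read_name(read_name: str, separator: str = "_") -> str:
    """Return the UMI appended to a read name (text after the last separator).

    Assumption: the UMI is always the final separator-delimited field.
    """
    if separator not in read_name:
        raise ValueError(f"no {separator!r} in read name {read_name!r}")
    return read_name.rsplit(separator, 1)[1]


print(umi_from_read_name("SRR19543607.1_1_ACGTACGT"))  # ACGTACGT
```

Taking the last field (rather than the second) means underscores already present in the label do not confuse the parsing.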
Click the "i" (dataset details) icon for one of the red (failed) datasets.
Scroll down to the detailed Tool Standard Output (stdout) log. This is where technical/processing errors discovered by the Galaxy wrapper are reported.
Also see the Tool Standard Error (stderr). This is where to find reports about processing details discovered by the underlying tool, such as content and parameter issues.
These sections expand if you click on them!
If the stderr has content, go into the Error tab and see if the Galaxy Wizard can describe what is happening.
The Wizard did answer this one correctly (there wasn’t a fastq file to sort into a collection) but the message could be clearer about why and what to do, so I’m glad you asked! We’ll get that tuned up!
Here is an example of what to look for. Whenever you see this, the problem is either with the accessions (they do not exist) or with the SRA service itself.
stdout
Downloading accession: SRR19543607…
Failed to call external services.
Prefetch attempt 1 of 3 exited with code 1
Failed to call external services.
Prefetch attempt 2 of 3 exited with code 1
Failed to call external services.
Prefetch attempt 3 of 3 exited with code 1
Extra whitespace (tabs, extra lines) will be stripped by our wrapper, but you could also clean it up yourself with a tool like Convert delimiters to TAB followed by Cut to isolate a single column.
If all of this is correct, or this same query worked previously, you can simply try again! Waiting 10-15 minutes is usually enough.
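The Convert + Cut cleanup amounts to "keep the first column of every non-blank line". Here is a minimal Python sketch of that logic, assuming the accession is always the first whitespace-separated field. The helper name is hypothetical, and the second accession in the example is made up for illustration.

```python
def clean_accessions(text: str) -> list[str]:
    """Keep the first whitespace-separated field of each non-blank line."""
    accessions = []
    for line in text.splitlines():
        fields = line.split()  # splits on tabs and runs of spaces
        if fields:             # blank lines produce no fields and are skipped
            accessions.append(fields[0])
    return accessions


raw = "SRR19543607\textra column\n\n  SRR19543608  \n"  # second ID is made up
print(clean_accessions(raw))  # ['SRR19543607', 'SRR19543608']
```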
Please give this a try and see how it works now!
Note: I do see a problem with the final tool in my testing history above. Now that the tool is finding UMIs, it needs to know how to group them. How to group them is a scientific decision for the protocol. I used exactly the same parameter you were using, and the log message says a different parameter combination is needed. I would try the suggestion! Once it works, you can modify a workflow to suit your goals (using my template or by extracting your own!).
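To illustrate what "grouping" means, here is a rough Python sketch of the simplest choice: reads with an identical UMI at the same mapping position count as one molecule. This is roughly UMI-tools' `unique` method. The other methods (`percentile`, `cluster`, `adjacency`, `directional`) also merge near-identical UMIs to tolerate sequencing errors, and they are not shown here. The tuple layout and the reads are hypothetical.

```python
from collections import defaultdict


def group_unique(reads):
    """Group (read_name, position, umi) tuples by exact (position, UMI) match."""
    groups = defaultdict(list)
    for name, position, umi in reads:
        groups[(position, umi)].append(name)
    return dict(groups)


reads = [
    ("r1", 1000, "ACGT"),
    ("r2", 1000, "ACGT"),  # same position and UMI as r1: treated as a duplicate
    ("r3", 1000, "ACGA"),  # one base different: separate group under exact matching
    ("r4", 2000, "ACGT"),  # same UMI, different position: separate group
]
groups = group_unique(reads)
print(len(groups))  # 3
```

The r3 case is exactly where the methods differ: an error-tolerant method might merge it with r1/r2. That is why the choice depends on the protocol.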