Usage help -- single end versus paired end BAM options

I’m a very new user, so I’m following this video exactly (I have samples from two mice, so it even uses the same built-in genome as the video, mm10):

When I get to the featureCounts section (15:01 in the video), it gives me the following error:

An error occurred with this dataset:

format tabular, database mm10

(the pasted error output continues with the tool’s ASCII-art banner, truncated in the paste)

But the database is from the built-in featureCounts, so as far as I can tell there’s really nothing I can do about this, because I can’t change the mm10 database.

Could anyone tell me what the real problem here is?

Hi @Heinrich_Gompf

Where are you working? I’m wondering if the built-in indexes are not available at that server or similar. They are active at the usegalaxy.* servers if you want to move over to one and try it there.

If that is not enough to help, what is the full content of the job information view for your failed job? The tool settings, inputs expanded, and logs expanded, please. Copy/paste or a shared link is good.

Also – updated tutorials can be found here: Galaxy Training! transcriptomics

Thanks for the reply. I’m so new to this, I should have mentioned that I was on the use site, and mm10 worked fine for the HISAT2 step right before featureCounts (I’m going through the video step-by-step to familiarize myself with everything). This is the URL for the history of the project:

The original files and the trimmomatic files have been purged because the system was telling me I was running out of memory, but the more recent steps are there. Does that help you find things, or is there more I should post?


Hi @Heinrich_Gompf

Two items:

  1. When you state that jobs are failing with an “out of memory” error, here is how to interpret that.

This job was terminated because it used more memory than it was allocated.

The error means that there is an input or parameter problem to fix, or that the job is actually too large to process at the public service. The first is more common.

This and similar errors are described here.

The solution is to double-check your inputs first. If the data is actually too “large” computationally, pick a different method or consider using a scaled-up private Galaxy server. The public computational resources are considerable but not appropriate for all use cases. In any case, the amount of data storage space available in your account is unrelated to memory errors.

If you are ever not sure how to use a tool, help for tools is directly on the tool form. Scroll down on any tool form to review those resources. If a tool is included in a tutorial, you’ll find these links down in that section too.

  2. What I noticed on the job details view (the “i” icon within any dataset):

Screenshot 1 (inputs, parameters)

Screenshot 2 (logs)

  • The input BAM is on the larger side, so some tools may not be able to process it.

  • Filtering the BAM can be one solution. Remove any unmapped reads, and consider keeping only proper pairs at or above some minimum mapQ (20 is common). This acts as a pre-filter – removing content that the downstream tool wouldn’t consider anyway – to make the input “smaller” and less complicated for the tool to process.

  • Your inputs for this data were from a paired-end mapping, but the option on the form was set to single-end. This confused the tool, and the error message in the logs aligns with that guess.
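To make the two points above concrete, here is a small sketch in Python using only the FLAG bit definitions from the SAM specification. It illustrates what the suggested pre-filter does (drop unmapped reads and non-proper pairs, apply a mapQ cutoff) and how the “paired” bit distinguishes paired-end from single-end reads. The read records are hypothetical; in Galaxy you would do the actual filtering with a tool such as Samtools view rather than your own code.

```python
# SAM FLAG bits (from the SAM format specification)
FLAG_PAIRED = 0x1       # read is part of a pair (paired-end data)
FLAG_PROPER_PAIR = 0x2  # both mates aligned as expected
FLAG_UNMAPPED = 0x4     # the read itself is unmapped

def keep_read(flag: int, mapq: int, min_mapq: int = 20) -> bool:
    """Keep mapped, properly paired reads at or above a minimum mapQ."""
    if flag & FLAG_UNMAPPED:         # drop unmapped reads
        return False
    if not flag & FLAG_PROPER_PAIR:  # drop reads not in a proper pair
        return False
    return mapq >= min_mapq

def looks_paired_end(flags) -> bool:
    """True if every read carries the 'paired' bit, i.e. paired-end data."""
    return all(f & FLAG_PAIRED for f in flags)

# Hypothetical (flag, mapq) records for illustration only:
reads = [(99, 60), (147, 60), (4, 0), (83, 10)]
kept = [r for r in reads if keep_read(*r)]
print(kept)  # [(99, 60), (147, 60)] -- unmapped and low-mapQ reads dropped
print(looks_paired_end([99, 147]))  # True -- these reads are paired-end
```

If the “paired” bit is set on the reads in your BAM, the featureCounts form should be set to paired-end, which matches the fix suggested above.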

I would suggest correcting that setting first, then considering filtering if needed.

Hope that helps!

Thank you so much! That helped, and it works now. I was following the video too closely, and this prompted me to look more closely into exactly what I’m doing! Thanks!
