I’m having an issue mapping paired-end data when the data resides in a collection. I can upload 6 pairs of fastq files, make the collection, and perform FastQC and trimming. However, when I run minimap2, only 1 of the 6 pairs is successfully mapped; the other 5 pairs fail. They all have the same error:
Could not display BAM file, error was: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
I can map the 6 pairs individually, but I would much rather use collections. Why is this error specific to collected datasets?
Apart from that, it looks as if Samtools was not installed OR the resulting BAMs had no hits (are empty). So – this is a configuration or resource issue.
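To confirm whether a given BAM really has no reference sequences in its header (which is what "file has no sequences defined" reports), the header can be read directly. The following is a minimal sketch using only the Python standard library. The function name `count_bam_references` is made up for illustration, and the sketch assumes the file is a complete, untruncated BAM downloaded from the history.

```python
import gzip
import struct


def count_bam_references(path):
    """Return the number of reference sequences declared in a BAM header."""
    # BAM is BGZF-compressed, which the standard gzip module can read.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            # A zero-byte or non-BAM file ends up here.
            raise ValueError(f"{path} does not look like a BAM file")
        # l_text: length of the plain-text SAM header (little-endian int32).
        (l_text,) = struct.unpack("<i", handle.read(4))
        handle.read(l_text)  # skip over the header text
        # n_ref: number of reference sequences (little-endian int32).
        (n_ref,) = struct.unpack("<i", handle.read(4))
    return n_ref
```

A result of 0, or a `ValueError` for a zero-byte file, would suggest the mapping job produced no usable output rather than that Samtools is missing.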
That same docs site has help about how to manage dependencies, custom configurations, scaling. Stable releases will now install Samtools when the instance is first started up when the defaults are used.
My apologies for the Galaxy version typo (20.01). You are right that the resulting BAM files are empty, but not for all of the paired-end sets in the collection. One of the paired sets within the collection always maps correctly, while the remaining paired-end sets generate empty BAM files. When the paired sets are processed individually and not put into a collection, they map just fine. I will check out the links you sent. Thank you!
I asked our dev team for some advice. This may not be a technical server issue. Note: the options below assume that your server is not hosted publicly, but if it is, you could generate a share link to the history with the odd results (inputs + outputs) and post that back (here, or in a direct message).
Things to check:
Are you using the most current version of the tool? If not, upgrade and see if that resolves the problem:
Map with minimap2 A fast pairwise aligner for genomic and spliced nucleotide sequences (Galaxy Version 2.17+galaxy2)
Is the paired-end dataset collection structured correctly? Does that match the collection input options set on the tool form? There are three different ways to run paired-end data through this tool using a collection, and two different collection types (interleaved paired fastq or distinct forward/reverse paired fastq).
The database logs, and the stderr or stdout report (on the Job Details form – the “i” info icon) also may give some clues. If you want to post back that content we may be able to help that way.
If you want to post some screenshots back that show the tool settings (form plus Job Details) and the dataset collection itself plus the included datasets in the history (will be hidden by default) we might be able to help more from that info, too.
The final option is to see if you can reproduce this behavior at a public Galaxy server (usegalaxy.org, usegalaxy.eu, usegalaxy.org.au). A collection with at least two paired-end elements would be enough. That would allow you to generate a share link to the history and allow for a quicker review – plus if the result is different (BAMs are not empty) – is another way to narrow down what might be going wrong.
I’ve tried reproducing your issue with three different versions of minimap2 (going back as far as version 2.12), but all of them worked just fine for me with a collection of 2 PE sequenced samples.
So I doubt this is a problem with the tool, but rather a problem with the collection structure as @jennaj guessed already.
Are you sure each PE dataset in the collection contains the fw and rv reads of one sample?
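One way to check this outside Galaxy is to compare the read names in the forward and reverse files of each pair. The sketch below is only an illustration: the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequences), optionally gzipped, with read names that may end in `/1` and `/2`.

```python
import gzip
import itertools


def read_names(path):
    """Yield the read name from every FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Every fourth line, starting with the first, is a header line.
        for header in itertools.islice(handle, 0, None, 4):
            fields = header[1:].split()
            if not fields:
                continue  # skip a stray blank line
            name = fields[0]
            # Drop the mate suffix so forward and reverse names compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(forward_path, reverse_path):
    """Return (record index, forward name, reverse name) for the first
    record whose names differ, or None if the two files pair up cleanly."""
    pairs = itertools.zip_longest(read_names(forward_path), read_names(reverse_path))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None
```

If one file is shorter than the other, the missing name is reported as `None`, so a truncated upload shows up as a mismatch as well.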
Thank you both for the feedback. I am trying on the public server today to see if the problem arises there. I suspect the problem is specific to my setup.
I agree, the problem is not specific to a version of minimap2, or to minimap2 itself (BWA has the same issue for me). The issue arises when more than one alignment is executed at the same time.
As I described above, when fastq files (e.g. SampleA_F1 and R1, SampleB_F1 and R1) were uploaded as a “list-of-pairs”, only SampleB generated alignment data, while SampleA was empty. Not only are there no alignments for SampleA, but the stdout and stderr are empty as well.
To troubleshoot, I instead selected “paired” for the collection type, and uploaded each set individually (so SampleA is one collection and SampleB is another collection). If I wait for the minimap2 job on SampleA to finish prior to starting another minimap2 job, then both alignments work. But if I start a minimap2 job for SampleB while minimap2 for SampleA is running, then SampleB generates an empty BAM file and SampleA works.
Could it have something to do with the reference sequence? Can 2 jobs utilize the same reference sequence file at the same time?
This should work fine, in general. With minimap2, specifically, and also with BWA-MEM (if you are using a fasta genome from your history), both jobs will build separate index structures for the reference, which can require lots of memory.
So one potential issue could be that your setup tries to run both jobs on the same machine and that machine does not have enough memory for both. Maybe one of the two gets killed without writing anything to stdout/stderr. Pure speculation though since I don’t know anything about your setup.
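A quick way to test the memory theory is to look at how much memory the machine has available just before the second job is started. This is a rough sketch that assumes Galaxy runs on Linux, where `/proc/meminfo` exists; the function name is made up for illustration.

```python
def mem_available_gib(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo contents, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("no MemAvailable line found")


# Usage on the machine that runs the jobs:
#     with open("/proc/meminfo") as handle:
#         print(f"{mem_available_gib(handle.read()):.1f} GiB available")
```

If the available figure drops close to zero while the first job is indexing the reference, a second concurrent job is unlikely to fit.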
Anyway, the place to look for more information would be at the level of Galaxy then, not at the tool level. You may want to look in Galaxy’s log, or in the terminal from which you run Galaxy if you’re not redirecting to a log file.
Mapping collections worked correctly on public galaxy…
I am using a built-in reference sequence. Do you know how much memory I need? (my virtualbox uses 20 of the 32 GB on my PC). Is there anything I should look for in the terminal? Where is the log file stored?
I think it is a memory issue. I dropped the allocated memory to my virtualbox from 20 to 10 GB and, instead of 1 alignment working, both failed. If anyone knows a way to decrease minimap memory usage, please let me know.
-I NUM: Load at most NUM target bases into RAM for indexing [4G]. If there are more than NUM bases in target.fa, minimap2 needs to read query.fa multiple times to map it against each batch of target sequences. NUM may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a multi-part index.
Sounds like it is what you’re looking for with the caveat mentioned in that last sentence.
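For anyone who wants to try this on the command line outside Galaxy, here is a small sketch that builds such a call. The file names, the `2G` batch size, and the helper name are placeholders, not recommendations, and I have not checked whether the Galaxy tool form exposes `-I` in every wrapper version.

```python
import shlex


def minimap2_command(reference, forward, reverse, index_batch="2G", threads=2):
    """Build a paired-end short-read minimap2 call with a capped index batch."""
    return [
        "minimap2",
        "-a",                # write SAM output
        "-x", "sr",          # short-read preset, used for paired-end reads
        "-I", index_batch,   # load at most this many target bases per batch
        "-t", str(threads),  # number of threads
        reference,
        forward,
        reverse,
    ]


# Hypothetical file names, for illustration only.
cmd = minimap2_command("reference.fa", "SampleA_R1.fastq", "SampleA_R2.fastq")
print(shlex.join(cmd))
# To actually run it (requires minimap2 on PATH):
#     import subprocess
#     with open("SampleA.sam", "w") as out:
#         subprocess.run(cmd, stdout=out, check=True)
```

A smaller `-I` value only changes anything when the reference has more bases than that value, and in that case the mapping-quality caveat quoted above applies.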