Convert from BAM to fastq

I’m using the Convert from BAM to FastQ tool, but I can’t run HISAT2 on the output. Why?

Hi @amir – Would you explain your steps with more detail? The problem is not clear.

After converting my data to FASTQ, I want to use it with STAR, but STAR does not read the output of Convert from BAM to FastQ.

As you can see, after I used Convert from BAM to FastQ, HISAT2 does not recognize the resulting data. Please help me.

The data needs to be in fastqsanger format. It looks like this tool either isn’t detecting the proper format (datatype) or the data is not in fastqsanger format to start with.

Please see these FAQs for how to check the quality score scaling and adjust it if needed: https://galaxyproject.org/support/#getting-inputs-right

This involves running FastQC, then FASTQ Groomer (if needed). If FastQC reports that the data is from an Illumina 1.8+ sequencing protocol, then you can just assign fastqsanger directly. But don’t just assign that if the data is < 1.8. In that case, run it through the groomer tool with the proper settings, or you won’t get correct results from downstream tools.
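For anyone who wants to sanity-check the quality score scaling outside Galaxy, the idea behind this check can be sketched in a few lines of Python. This is a rough heuristic under stated assumptions, not what FastQC or FASTQ Groomer actually run: Sanger / Illumina 1.8+ data uses an ASCII offset of 33 and older Illumina data uses an offset of 64, so any quality character below ASCII 59 can only come from offset-33 data. The function name `guess_phred_offset` and the `max_records` limit are made up for illustration, and the input is assumed to be a simple four-lines-per-record FASTQ.

```python
import io
from itertools import islice


def guess_phred_offset(handle, max_records=10000):
    """Guess the ASCII quality offset (33 or 64) of a 4-line-per-record FASTQ.

    Returns 33, 64, or None when the sampled scores are ambiguous.
    This is a heuristic sketch, not a validator.
    """
    lowest, highest = 255, 0
    # In a simple FASTQ, lines 3, 7, 11, ... (0-based) hold the quality strings.
    for line in islice(handle, 3, max_records * 4, 4):
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if codes:
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    if lowest < 59:
        # Characters below ';' (ASCII 59) do not occur in offset-64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' (ASCII 74) are unusual for offset-33 data.
        return 64
    return None  # no reads, or scores that fit both encodings


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\n!!II\n")
    print(guess_phred_offset(demo))
```

For a compressed file, the same function should work on a handle opened with `gzip.open(path, "rt")`. A result of 64 or None is a cue to look at the FastQC report before assigning fastqsanger.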

But in one of the Galaxy tutorials, named “Differential abundance testing of small RNAs”, it also does not work, even when I used the tutorial’s own data. It fails exactly at this tool (Convert from BAM to FastQ). So is the tutorial data the wrong datatype too, or is the tool the problem? I checked my data beforehand: it is from an Illumina 1.8+ sequencer and it is fastqsanger.gz.

I’m testing the tool and see the problem. Not quite finished yet, more feedback once done. Thanks for following up on this!!!

Meanwhile, try assigning the fastqsanger datatype to your data. Do this by clicking on the pencil icon per-fastq-dataset to reach the Edit-Attributes > Datatypes page. Directly assign the correct datatype. Re-detecting will not work and that is another part of what I am reviewing.

This might be server-specific to Galaxy Main https://usegalaxy.org. We have the Galaxy 19.05 pre-release loaded and are flushing out issues. Part of what I’ll also be looking at is whether other public servers still running Galaxy 19.01 have the same problem or not.

The tutorial should be fine otherwise. It is noted as containing all the necessary tools for the hands-on portion and the checks are very specific. That said, this is a newer tutorial, and depending on what exactly the problem is and what the best solution is, I will follow up on that as well if it does need to be updated.

Finished testing.

If the input is a single-end mapped BAM, the output will be given the datatype fastq. I’ll see if we can get that changed to assign fastqsanger directly if the data actually matches that datatype (not something that can be assumed). But you can use the Edit-Attributes > Datatypes > Redetect datatype function to get this reassigned to the fastqsanger datatype. Then downstream tools will recognize it.

If the input is a paired-end mapped BAM in the default format (coordinate sorted), the tool will behave oddly. Some reads will not have quality scores, making the file type invalid. The tool form does state to input a queryname-sorted BAM (or SAM). If running this tool on this type of data, it needs to be queryname sorted first, or expect odd and unusable data to be output. Use Samtools sort. If SortSam (the Picard version) is used, the output will be a SAM dataset, which takes up more quota space but will also work. This tool could also be improved to “implicitly convert” coordinate-sorted BAMs to queryname-sorted BAMs. That would be a different change, but I’ll also make the request.
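Before running the conversion on paired-end data, it can help to check how the BAM is sorted. Here is a minimal sketch, assuming the header text has already been extracted (for example with `samtools view -H`): the `@HD` line of a SAM/BAM header carries an `SO:` tag whose value is `coordinate`, `queryname`, `unsorted`, or `unknown`. The helper name `sort_order` and the example header are hypothetical.

```python
def sort_order(header_text):
    """Return the SO: value from the @HD line of a SAM header.

    Falls back to 'unknown' when there is no @HD line or no SO: tag.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            break
    return "unknown"


if __name__ == "__main__":
    # Made-up header for illustration only.
    header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
    if sort_order(header) != "queryname":
        print("Sort by read name before converting to FASTQ")
```

On the command line, `samtools sort -n -o out.bam in.bam` writes a queryname-sorted BAM (the file names here are placeholders).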

I’ll also let the tutorial authors know that this extra step is needed – at least until/if the bam-to-fastq tool is modified to output fastqsanger directly (again, when the data actually matches that datatype, it isn’t a sure thing that can be assumed).

Link to where I am asking the GTN about their thoughts on the best way to address this (tool change vs tutorial change): https://gitter.im/Galaxy-Training-Network/Lobby?at=5ce320aa509b1035c78268b5

Please feel free to comment there, or send in your thoughts/link to this discussion at the feedback form at the bottom of the tutorial. You can report more than one thing (not just this) – we like to know how using these goes :wink:

So you’re saying I should change the fastq to fastqsanger.gz until Galaxy fixes the problem? Sometimes users have 100 samples, and it is impossible to do this one by one!

Agree. I’ve asked for some feedback here: https://gitter.im/Galaxy-Training-Network/Lobby?at=5ce4307cb313d7231404f1b4

@bjoern.gruening’s lab hosts the Galaxy EU server and is part of the GTN (training) and IUC group (developers that wrote this tool wrapper) … and wears many other hats :cowboy_hat_face: He’ll help to sort out the issues. You’ll also see my comments there about potential solutions (scroll up a bit in the Gitter chat). He is based in Germany (different time zone) so some patience please if the reply doesn’t come until tomorrow.

Appreciate the feedback!! We’ll want this to work without having to “click on hundreds of datasets”, too. A more practical solution is needed.

As an aside, the output will be in uncompressed fastqsanger from the bamtools BAM to Fastq tool (not fastqsanger.gz). At least for now.

Update: Ticket opened for a possible solution: https://github.com/galaxyproject/galaxy/issues/8013

Includes this discussion and the chat, plus I linked in prior enhancement requests for the same and related functionality.

thanks @amir!!

thank you for your help @jennaj

@amir as a workaround until we have enhanced the tool or Galaxy, I would like to point you to the workflow editor. There you can add a “post job action” (PJA) after every single step. One of those PJAs is change filetype. You could create a workflow with just this single tool and add a PJA to change fastq to fastqsanger. You can then run this on multiple input files.

Hope that helps,
Bjoern

@bjoern thanks a lot. It’s going to be a big help until Galaxy solves that problem.

Just one problem: when I run the workflow, everything works well, but I can’t select multiple datasets to run in the edited workflow. So this leaves me with the same problem.

You can select multiple input files in the run-workflow form.

If you mean here, I don’t see any “multiple datasets” option. And when I click on browse, I can still select only one of them.

@amir

Add an “inputs” to your workflow. Then you can pick one or multiple datasets. This is a good idea anyway. Running a workflow without distinct “inputs” can lead to other problems.

The first choice is probably the one you want to use. The other is for collections. And the third (not shown here, but you’ll see it in the editor) is a new function that allows tool parameters to be defined in an input dataset.

[Screenshot: workflow-parameter-inputs-choices]