Convert from BAM to fastq

amir · May 18, 2019, 7:16am

I’m using the Convert from BAM to FastQ tool, but I can’t run the output through HISAT2. Why?

jennaj · May 18, 2019, 9:22pm

Hi @amir – Would you explain your steps with more detail? The problem is not clear.

amir · May 19, 2019, 4:02am

After converting my data to fastq, I want to use it with STAR, but STAR does not read the output of Convert from BAM to FastQ.

amir · May 20, 2019, 11:38am

As you can see, after I used Convert from BAM to FastQ, HISAT2 does not recognize the resulting data. Please help me.

jennaj · May 20, 2019, 6:37pm

The data needs to be in fastqsanger format. It looks like this tool either isn’t detecting the proper format (datatype) or the data is not in fastqsanger format to start with.

Please see this FAQ for how to check the quality score scaling and adjust it if needed: https://galaxyproject.org/support/#getting-inputs-right

How to format fastq data for tools that require .fastqsanger format?

This involves running FastQC, then Fastq Groomer (if needed). If FastQC reports that the data is from an Illumina 1.8+ sequencing protocol, you can assign fastqsanger directly. But don’t assign it directly if the data is from a protocol older than 1.8. Instead, run the data through the groomer tool with the proper settings, or downstream tools will not produce correct results.
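
As a rough illustration of what the quality-score check is looking for, here is a minimal Python sketch that guesses the quality offset from the range of quality characters. It is not what FastQC itself runs; the function name `guess_phred_offset` and the cutoffs are assumptions chosen for illustration, and it assumes plain four-line FASTQ records with no line wrapping. FastQC remains the tool to rely on.

```python
import io


def guess_phred_offset(handle, max_records=10000):
    """Guess the quality offset (33 or 64) from a 4-line-per-record FASTQ stream.

    Returns 33, 64, or None when the sampled characters do not decide it.
    """
    lowest = None
    highest = None
    for line_number, line in enumerate(handle):
        if line_number // 4 >= max_records:
            break
        if line_number % 4 != 3:  # the 4th line of each record holds the qualities
            continue
        codes = [ord(c) for c in line.rstrip("\n")]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None
    if lowest < 59:   # characters below ';' only occur with offset 33 (fastqsanger)
        return 33
    if highest > 74:  # characters above 'J' point to the older offset-64 scaling
        return 64
    return None       # ambiguous range: check with FastQC instead


# Usage with a tiny in-memory record (hypothetical data):
sample = io.StringIO("@read1\nACGT\n+\n!5IJ\n")
print(guess_phred_offset(sample))  # 33
```

A result of 33 is consistent with fastqsanger; a result of 64 suggests the data needs Fastq Groomer before downstream tools will interpret it correctly.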

amir · May 20, 2019, 7:09pm

But in one of the Galaxy tutorials, named “Differential abundance testing of small RNAs”, it also does not work, even when I use the tutorial’s own data. It fails exactly at this tool (Convert from BAM to FastQ). So is the tutorial data in the wrong datatype too, or is the tool the problem? I checked my data beforehand: it is from an Illumina 1.8+ sequencer and it is fastqsanger.gz.

jennaj · May 20, 2019, 8:39pm

I’m testing the tool and see the problem. Not quite finished yet, more feedback once done. Thanks for following up on this!!!

Meanwhile, try assigning the fastqsanger datatype to your data. Do this by clicking on the pencil icon per-fastq-dataset to reach the Edit-Attributes > Datatypes page. Directly assign the correct datatype. Re-detecting will not work and that is another part of what I am reviewing.

This might be server-specific to Galaxy Main https://usegalaxy.org. We have the Galaxy 19.05 pre-release loaded and are flushing out issues. Part of what I’ll also be looking at is whether other public servers still running Galaxy 19.01 have the same problem or not.

The tutorial should be fine otherwise. It is notated as containing all the necessary tools for the hands-on portion and the checks are very specific. That said, this is a newer tutorial, and depending on what the exact problem and the best solution are, I will follow up on updating it if needed.

jennaj · May 20, 2019, 9:40pm

Finished testing.

If the input is a single-end mapped BAM, the output will be given the datatype fastq. I’ll see if we can get that changed to assign fastqsanger directly if the data actually matches that datatype (not something that can be assumed). But you can use the Edit-Attributes > Datatypes > Redetect datatype function to get this reassigned to the fastqsanger datatype. Then downstream tools will recognize it.

If the input is paired-end mapped BAM, in the default format (coordinate sorted), the tool will behave oddly. Some reads will not have quality scores, making the file type invalid. The tool form does state to input queryname sorted BAM (or SAM). If running this tool on this type of data, it needs to be queryname sorted first, or expect odd and non-useable data to be output. Use Samtools sort. If SortSam (the picard version) is used, the output will be a SAM dataset, which takes up more quota space but will also work. This tool could also be improved to “implicitly convert” coordinate sorted BAMs to queryname sorted BAMs – that would be a different change, but I’ll also make the request.
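
For anyone doing the queryname sort described above outside Galaxy, here is a minimal sketch that wraps `samtools sort -n` from Python. It assumes `samtools` is installed and on the PATH; the function names and file names are placeholders for illustration and are not part of the Galaxy tool.

```python
import subprocess


def queryname_sort_command(input_bam, output_bam):
    """Build a `samtools sort -n` command (sort by read name, not coordinate)."""
    return ["samtools", "sort", "-n", "-o", output_bam, input_bam]


def queryname_sort(input_bam, output_bam):
    """Run the sort; raises CalledProcessError if samtools exits with an error."""
    subprocess.run(queryname_sort_command(input_bam, output_bam), check=True)


# Example with placeholder file names (requires samtools to be installed):
# queryname_sort("mapped.coordinate_sorted.bam", "mapped.queryname_sorted.bam")
```

The queryname-sorted BAM is then the input to give to the BAM-to-fastq conversion for paired-end data.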

I’ll also let the tutorial authors know that this extra step is needed – at least until/if the bam-to-fastq tool is modified to output fastqsanger directly (again, when the data actually matches that datatype, it isn’t a sure thing that can be assumed).

jennaj · May 20, 2019, 9:50pm

Link to where I am asking the GTN about their thoughts on the best way to address this (tool change vs tutorial change): https://gitter.im/Galaxy-Training-Network/Lobby?at=5ce320aa509b1035c78268b5

Please feel free to comment there, or send your thoughts or a link to this discussion through the feedback form at the bottom of the tutorial. You can report more than one thing (not just this) – we like to know how using these goes.

amir · May 21, 2019, 5:31am

So you’re saying I should change the fastq to fastqsanger.gz until Galaxy fixes the problem? Sometimes users have 100 samples, and it is impossible to do this one by one!

jennaj · May 21, 2019, 5:19pm

Agree. I’ve asked for some feedback here: https://gitter.im/Galaxy-Training-Network/Lobby?at=5ce4307cb313d7231404f1b4

@bjoern.gruening’s lab hosts the Galaxy EU server and is part of the GTN (training) and the IUC group (the developers who wrote this tool wrapper) … and wears many other hats. He’ll help to sort out the issues. You’ll also see my comments there about potential solutions (scroll up a bit in the Gitter chat). He is based in Germany (different time zone), so some patience please if the reply doesn’t come until tomorrow.

Appreciate the feedback!! We’ll want this to work without having to “click on hundreds of datasets”, too. A more practical solution is needed.

As an aside, the output will be in uncompressed fastqsanger from the bamtools BAM to Fastq tool (not fastqsanger.gz). At least for now.

jennaj · May 21, 2019, 5:48pm

Update: Ticket opened for a possible solution: https://github.com/galaxyproject/galaxy/issues/8013

Includes this discussion and the chat, plus I linked in prior enhancement requests for the same and related functionality.

thanks @amir!!

amir · May 22, 2019, 5:20am

thank you for your help @jennaj

bjoern.gruening · May 22, 2019, 3:39pm

@amir as a workaround until we have enhanced the tool or Galaxy, I would like to point you to the workflow editor. There you can add a “post job action” (PJA) after every single step. One of those PJAs is to change the file type. You could create a workflow with just this single tool and add a PJA that changes fastq to fastqsanger. You can then run it on multiple input files.

Hope that helps,
Bjoern

amir · May 23, 2019, 5:37am

@bjoern thanks a lot. It’s going to be a big help until Galaxy solves that problem.

amir · May 29, 2019, 4:43pm

Just one problem: when I run the workflow, everything works well, but I can’t select multiple datasets to run in the edited workflow. So I am left with the same problem.

bjoern.gruening · May 29, 2019, 4:54pm

You can select multiple input files in the run-workflow form.

amir · May 29, 2019, 5:12pm

If you mean here, I don’t see any “multiple datasets” option, and when I click on browse, I can still select only one of them.

jennaj · May 29, 2019, 6:51pm

@amir

Add an “input” to your workflow. Then you can pick one or multiple datasets. This is a good idea anyway: running a workflow without distinct “inputs” can lead to other problems.

The first choice is probably the one you want to use. The other is for collections. And the third (not shown here, but you’ll see it in the editor) is a new function that allows tool parameters to be defined in an input dataset.

[Screenshot: workflow-parameter-inputs-choices]
