fasta to fastq; fastqsanger.gz to fastq; SRA to fastq

Hi,
Is it possible to convert fasta to fastq, and fastqsanger.gz to fastq? If yes, please suggest how. If not, why not? (I am curious to know.)
I have also been searching for a way to convert SRA files to fastq format. I found software (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software), as mentioned here https://toolshed.g2.bx.psu.edu/, to download for Windows. I have downloaded it but am not able to install it.
Can anyone please help with how to install and use it?

Thank you
YKV

Hello @YKV

In Galaxy:

SRA fastq data can be extracted directly with a tool like: Faster Download and Extract Reads in FASTQ format from NCBI SRA (Galaxy Version 2.10.4). The same tool works either from NCBI or from an SRA archive already in your history. This version of the tool sorts data into collections.

If you don’t want to work with the data in collections, or want to create collections after, then use the original version of the tool: Download and Extract Reads in FASTA/Q format from NCBI SRA (Galaxy Version 2.10.4).

Yes, fasta can be converted to fastq. Try this tool: Combine FASTA and QUAL into FASTQ (Galaxy Version 1.1.1). If you do not supply quality scores, default scores will be created.
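If you want to see what "default scores" means outside of Galaxy, here is a minimal Python sketch. It is only an illustration, not the code of the Galaxy tool. The function names and the choice of "I" (Phred 40 in Sanger encoding) as the placeholder quality are my own assumptions.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence lines may be wrapped; collect them all.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta, fastq, default_quality="I"):
    """Write FASTQ records with one placeholder quality per base.

    The quality character is an assumption, not a measured value.
    Returns the number of records written.
    """
    count = 0
    for name, seq in read_fasta(fasta):
        fastq.write(f"@{name}\n{seq}\n+\n{default_quality * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    source = io.StringIO(">read1\nACGT\nAC\n")
    target = io.StringIO()
    fasta_to_fastq(source, target)
    print(target.getvalue(), end="")
```

Keep in mind that the placeholder scores carry no real information, so any downstream quality filtering on such a file is meaningless.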

Yes, fastqsanger.gz can be converted as well. Click on the pencil icon for the dataset, go into the Edit Attributes “Convert” tab, and uncompress the file. The resulting datatype will be fastqsanger (if the data actually has that quality score encoding). In most cases, Galaxy will require fastqsanger or fastqsanger.gz inputs. But if you really want to just assign the datatype fastq for some reason, go into the Edit Attributes “Datatype” tab and directly change it.
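If you are not sure whether a downloaded file really has the Sanger (Phred+33) quality encoding, a rough check in Python could look like the sketch below. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names are hypothetical. It is not how Galaxy itself decides on the datatype.

```python
import gzip


def quality_range(path, max_reads=10000):
    """Return (lowest, highest) ASCII code found on quality lines.

    Assumes four-line FASTQ records; reads .gz files transparently.
    """
    opener = gzip.open if path.endswith(".gz") else open
    low, high = 255, 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # fourth line of each record holds the qualities
                codes = [ord(c) for c in line.rstrip("\n")]
                if codes:
                    low = min(low, min(codes))
                    high = max(high, max(codes))
    return low, high


def has_sanger_evidence(low, high):
    """True if some quality character can only occur with offset 33.

    Characters below ';' (ASCII 59) do not exist in the older
    offset-64 encodings. A False result is inconclusive, since a
    high-quality Sanger file may contain no such characters.
    """
    return 33 <= low < 59 and high <= 126
```

For example, `has_sanger_evidence(*quality_range("reads.fastq.gz"))` on a hypothetical file name returns True when low-valued characters such as "!" or "5" appear in the quality lines.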

Please contact NCBI if you are having trouble with software they release when used locally, including the SRA Toolkit. However, it shouldn’t be needed if you plan to work in Galaxy. Most of these tools are incorporated already. Or, you can obtain/manipulate the data in Galaxy, then download those results for other uses.
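For reference only, if the SRA Toolkit does get installed locally, extraction is usually done with its fasterq-dump program. The Python wrapper below is a sketch under stated assumptions: the toolkit is installed and on your PATH, the function names and the output directory are hypothetical, and "SRR000001" is just a placeholder accession.

```python
import shutil
import subprocess


def build_fasterq_dump_command(accession, outdir="fastq_out", split_files=True):
    """Build the argument list for fasterq-dump (SRA Toolkit)."""
    cmd = ["fasterq-dump", accession, "--outdir", outdir]
    if split_files:
        # Write paired reads to separate files.
        cmd.append("--split-files")
    return cmd


def extract_fastq(accession, outdir="fastq_out"):
    """Run fasterq-dump; assumes the SRA Toolkit is installed locally."""
    if shutil.which("fasterq-dump") is None:
        raise RuntimeError("fasterq-dump not found on PATH")
    subprocess.run(build_fasterq_dump_command(accession, outdir), check=True)


if __name__ == "__main__":
    # Placeholder accession; this only prints the command, it does not run it.
    print(" ".join(build_fasterq_dump_command("SRR000001")))
```

Installation problems themselves are still best directed to NCBI.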

Thanks!

Hi @jennaj
Thank you for the point-by-point explanation. I had already tried all of the options mentioned above.
I am able to change datatypes and to download by NCBI SRA accession number (tool: Download NCBI_SRA). But when I try to use the files, I run into trouble.
For example, when I used a tool like MAKER, the input sections offer two ways to choose a file: the scroll bar, or the folder option next to it. If I choose the file with the scroll bar, the program executes without a problem. If I choose it with the folder option, the input is marked “unavailable”, as shown below, and when I executed anyway there was an error in the result.
image
As far as I noticed, the only difference in what those two selections make accessible is the format. So I changed the datatype. The file then appeared in the scroll bar with no indication of unavailability, but there was still an error in the result.
This is the reason I wanted to change the datatype.
As for the tool FASTER DOWNLOAD NCBI_SRA, it takes far longer than usual to extract the files (this is not a complaint; I would just like to know why it is slower than the other ways). Even after hours, the output is shown as a list with nothing to display, just blank in every output. I also tried downloading the data to my PC in SRA format and uploading it to Galaxy to see the sequences, but that didn’t work. So I had to use the Download NCBI_SRA tool, and that led to this format issue.
NOTE: If I am not wrong, this applies not only to MAKER. Wherever I have to use more than one file, both files should be either the same format or fastq only. Otherwise, either the tool won’t accept them or there is an error in the result.

Thank you

Hi @YKV

Let’s try to break it down, maybe it will help.

It isn’t clear why you would need to change/modify the datatype at this point.

If the tool is installed in your own Galaxy, this is the version that you want to use (and is what is installed at public Galaxy servers): Galaxy | Tool Shed

What kind of error?

Changing only the datatype assignment is unlikely to resolve errors – unless the original datatype was incorrectly assigned (somewhat difficult to do – and would be a direct action).

Changing/correcting the datatype will help tool forms to recognize data, but if the format doesn’t match the actual type assigned, expect all sorts of problems.

Allowing Galaxy to detect and assign datatypes will avoid this kind of issue in most cases.
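To illustrate what "the format doesn’t match the assigned type" means, here is a small Python sketch that looks at the actual content of a downloaded file. It is a first-character heuristic only, not Galaxy’s own datatype detection, and the function name and returned labels are my own assumptions.

```python
import gzip


def sniff_sequence_format(path):
    """Guess 'fasta', 'fastq' (with '.gz' if compressed) or 'unknown'.

    Heuristic only: other formats can also start with '>' or '@'.
    """
    # gzip files start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open

    first = ""
    with opener(path, "rt") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                first = stripped[0]  # first character of first non-blank line
                break

    if first == ">":
        kind = "fasta"
    elif first == "@":
        kind = "fastq"
    else:
        return "unknown"
    return kind + (".gz" if is_gzip else "")
```

If this reports fasta for a dataset that has been assigned a fastq datatype (or the reverse), tools that trust the assigned datatype will most likely fail on it.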

The screenshot looks as if you clicked on the “rerun” icon for a prior tool run result (that originally used an input “dataset 10”), but “dataset 10” is not an active dataset in the current history. There are a few ways to reach this kind of view/data organization.

To execute a tool it should be started up (tool form refreshed) while the active history contains the data you want to input to that tool (or, the data is contained in a Data Library that you have access to).

If you are switching between histories, or more likely – doing work in multiple tabs, what is displayed in the far right history panel could be browser cached data, not the current active history. Try using the history refresh (double circle) icon at the top of the history panel – this will always bring up the actual currently active history.

Every tool form, and each data selection option on it, has an independent requirement for the expected input type. However, for any specific data selection on a tool form, yes, the inputs (one or more) would all be of the same datatype/format in the vast majority of cases because that is the required (and expected) input.

If you are ever not sure of what datatype(s) a particular select/input field is filtering on, one simple way to check is to create a new empty history (the + icon at the top of a history panel) then relaunch the tool. Since the history is empty, no data will be automatically selected (that matches the expected type(s)) and what kind of data the tool form accepts for input is revealed.

Example screenshot: