Difficulty getting SRA data into Galaxy

Hi,

I’m trying to upload the following data set to Galaxy: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89509

I open the SRA Run Selector, make a note of the SRR number, and go to Get Data > Download and Extract Reads in BAM, but I get the following notice once the job completes: “could not display BAM file error: file was not sequenced defined (mod=‘rb’) is it a SAM/BAM format?”

I’m not sure what the problem is.

Thanks.

Mohammed

Welcome, @mohammed!

Try downloading the data in fastq format using one of these tools. The first is simpler to use, but the second has more built-in grouping options. If the differences are not clear, you could run both on a small subset of SRR accessions, review the results, and then decide which to use for the entire data series (or for individual accessions).

  • Download and Extract Reads in FASTA/Q format from NCBI SRA (Galaxy Version 2.10.4)
  • Faster Download and Extract Reads in FASTQ format from NCBI SRA (Galaxy Version 2.10.4)

In most cases, there is no reason to extract reads in BAM format. The data would then need to be converted back to fastq before it could be used with tools in Galaxy, which adds steps to the analysis and consumes more quota space.

Hope that helps!

Thank you. I’m currently using the “Download and Extract Reads in FASTA/Q format from NCBI SRA” (Galaxy Version 2.10.4) tool, so hopefully it works. Thanks again.

Hi,

I’ve downloaded a small set of this data via “Download and Extract Reads in FASTA/Q…”, but how do I now convert this file into SAM/BAM? And do I use Bowtie2 to align these BAM files once generated? I’m not sure what the step-wise plan should be when getting data from GEO: Download and Extract Reads in FASTA/Q > conversion to BAM or SAM? > Bowtie > featureCounts, etc.

Thanks,

Mohammed

Hi,

I forgot to add that the original sequencing platform listed in GEO is: AB 5500 Genetic Analyzer (Mus musculus).

Thanks,

Mohammed

@mohammed

The data represents single-end RNA-seq fastq reads. This tool will extract the data in fastqsanger format (quality scores scaled as Sanger Phred+33). These reads need to be mapped to produce a BAM.
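If it helps to see what "Sanger Phred+33" means in practice, here is a minimal Python sketch for use outside Galaxy. Each quality character encodes a Phred score as its ASCII code minus 33, and a score Q corresponds to an error probability of 10^(-Q/10). The function names and the example quality string are made up for illustration and do not come from any Galaxy tool.

```python
def phred33_to_scores(quality: str) -> list[int]:
    """Decode a Sanger (Phred+33) quality string into integer Phred scores."""
    # Each character stores score + 33 as its ASCII code.
    return [ord(ch) - 33 for ch in quality]


def error_probability(score: int) -> float:
    """Convert a Phred score Q to the base-call error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    # Hypothetical quality line from a fastq record.
    scores = phred33_to_scores("II5+!")
    print(scores)
    print([error_probability(q) for q in scores])
```

A score of 20 means roughly a 1 in 100 chance that the base call is wrong, which is why trimming tools often use it as a cutoff.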

Upstream QA/QC should be done first (FastQC and Trimmomatic); then either HISAT2 or RNA STAR can be used for the spliced mapping step.

Please review the “Transcriptomics” tutorials here for example workflows:

Thanks! I’ve tried but had no luck. It seems that the original fastq file is not in the standard fastq format (letters); I see numbers when I view it via the “eye symbol”. Is there any way of identifying what file type this is (e.g. @2_8_524_F3/1
T…223…012…000…2002.0230.3011.1233…102…113.2001.0221.122)? It seems the original/host file has not been uploaded correctly. Will I need to request fastq files from the authors? Does it matter if it’s SOLiD sequencing?

Thanks,

Mohammed
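
On the question of identifying the file type: a read that starts with a single base followed by the digits 0-3 and dots is what SOLiD color-space output typically looks like, and the AB 5500 is a SOLiD-family instrument, so the numbers do not by themselves mean the upload failed. Below is a rough Python sketch for checking a sequence line outside Galaxy. The function name, the two patterns, and the example reads are assumptions made for illustration; this is a heuristic, not a format validator.

```python
import re

# Assumed color-space layout: one primer base, then color calls 0-3,
# with '.' standing in for a missing call.
COLOR_SPACE = re.compile(r"^[ACGT][0-3.]+$")
# Ordinary base-space reads: nucleotide letters only (N for unknown).
BASE_SPACE = re.compile(r"^[ACGTN]+$", re.IGNORECASE)


def classify_read(sequence: str) -> str:
    """Guess whether a fastq sequence line is color-space or base-space."""
    seq = sequence.strip()
    if COLOR_SPACE.match(seq):
        return "color-space"
    if BASE_SPACE.match(seq):
        return "base-space"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical reads, not taken from the GEO series discussed here.
    for read in ["T0120.3123", "ACGTNACGT", "12345"]:
        print(read, "->", classify_read(read))
```

If the reads classify as color-space, tools that expect base-space fastq will not handle them as-is, so it is worth checking whether a given mapper supports color-space input before running it.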