I downloaded some SRA accessions using fasterq-dump (they were downloaded as a collection, and the subsequent Trimmomatic outputs are also in collections), as well as a reference genome using NCBI Datasets Genomes (which was downloaded as a collection containing one fasta dataset). I want to use the same reference genome for all three library inputs. When I try to run HISAT2 to map the reads to the genome, I get an error: “Received 3 inputs for ‘library|input_1’ and 1 inputs for ‘reference_genome|history_item’, these should be of equal length.” How can I tell HISAT2 to use the same genome for all three library inputs? Do I need to just download the genome to my computer and re-upload it as a single dataset? Thanks in advance for any advice!
Welcome, @brittb
To get the single file out of the collection folder, use this tool. It merges everything in the collection into one dataset, so don’t run it on a collection with multiple files unless that is actually what you want!
- Collapse Collection into single dataset in order of the collection
- After this finishes, you may need to uncompress your fasta file. Use the pencil icon for that step.
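Why this helps: the error in the question suggests that when both inputs are collections, the tool form tries to pair them element by element, so their lengths have to match. A single dataset, by contrast, gets reused for every job. Here is a rough Python sketch of that pairing idea. It is only an illustration and not Galaxy's actual code; the function name `plan_jobs` and the file names are made up for the example.

```python
def plan_jobs(reads, reference):
    """Return one (read, reference) pair per mapping job.

    Hypothetical model of the behavior described above, not Galaxy internals.
    """
    if isinstance(reference, list):
        # Both inputs are collections: pair them element by element,
        # which only works when they have the same number of elements.
        if len(reads) != len(reference):
            raise ValueError(
                f"Received {len(reads)} inputs for reads and "
                f"{len(reference)} inputs for reference, "
                "these should be of equal length."
            )
        return list(zip(reads, reference))
    # The reference is a single dataset: reuse it for every read.
    return [(read, reference) for read in reads]


# Made-up file names standing in for the three libraries and the genome.
reads = ["sample_1.fastq", "sample_2.fastq", "sample_3.fastq"]

try:
    plan_jobs(reads, ["genome.fasta"])  # genome still inside a collection
except ValueError as err:
    print(err)

jobs = plan_jobs(reads, "genome.fasta")  # genome as a single dataset
for read, genome in jobs:
    print(read, "->", genome)
```

Once the genome is a single dataset in the history, each of the three libraries is mapped against that same file.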
There is another NCBI data retrieval tool that you can use next time for a genome. It doesn’t organize the output into a collection from the start. This is not a good choice for fastq data but is fine for fasta.
- Download and Extract Reads in FASTQ format from NCBI SRA
- Or, you can copy and paste the URL from NCBI into the Upload tool (use all defaults). This is likely how most people get this data. Go to https://www.ncbi.nlm.nih.gov/, paste in the accession, navigate to the genome, and get the fasta and the annotation at the same time (you will probably need the annotation in later steps, yes?).
This guide explains how to set up your reference data for transcriptomics analysis. → FAQ: Extended Help for Differential Expression Analysis Tools
For how to set options for HISAT2 and how to use it, scroll down to the bottom of the tool form to find the tutorials. You will need to set some options to get the right type of BAM file. If you don’t, you will get errors later and will need to map again. There is a lot of discussion about that at this forum under the tag hisat2 or under the name of the target downstream tool, and a search will find those topics.
Hope this helps!
Thank you! I re-downloaded the genome as a single file, and HISAT2 is working now. Simple fix, thanks for the tips on other ways to download a genome!