I want to create “Human RNA-seq” and “Mouse RNA-seq” workflows and share them with our group members.
How can I fix the input files? For example, I want to use the human fasta/gtf files for Human RNA-seq and the mouse fasta/gtf files for Mouse RNA-seq.
When I set the input up as a dataset, the user can choose the fasta/gtf files, but I want the files to be fixed.
Since STAR on my local Galaxy doesn’t have any references to choose from, I want to upload the fasta/gtf files.
Workflows do not include any data themselves. A workflow can reference certain data that is indexed on the server – in most cases that is just the reference genome using the database or “dbkey” for a fasta index – but the reference annotation is always supplied at runtime.
Now, RNA-STAR is a bit special. You could create a hybrid reference genome index that incorporates a reference annotation (just the gene bounds). But please be aware that if you do this, a separate reference annotation would still need to be supplied by the user for other tools later on (like downstream counting tools). It is usually best to just let the user get the reference annotation once, then use it for all steps, since this keeps the analysis internally consistent.
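To make the "hybrid index" idea concrete, here is a minimal sketch of the kind of STAR command that builds an index with an annotation baked in. It only assembles the command line; it does not run STAR. The function name, file names, and defaults are made up for illustration, and on a Galaxy server this step would normally be handled by the tool or a data manager rather than by hand.

```python
def star_index_command(fasta, gtf, out_dir, read_length=101, threads=4):
    """Build the argument list for a STAR index that includes annotated splice junctions.

    All paths are placeholders. --sjdbOverhang is conventionally set to
    (read length - 1).
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", out_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,  # this is what makes the index "hybrid"
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


# Hypothetical file names for a human index.
cmd = star_index_command("hg38.fa", "hg38.gtf", "star_hg38_with_gtf")
print(" ".join(cmd))
```

Note that the GTF is consumed while the index is built, which is why counting tools later in the workflow still need the annotation as a separate input.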
Why? A reference genome assembly (fasta) tends to be stable for a very long time, so it is worth pre-indexing since it will get a lot of use. A reference annotation file (GTF, GFF3) could be sourced from many other data providers and tends to change frequently but is still usually based on that original reference genome. There are some other technical reasons but that’s the current logic! It may change later but for now that is how it works.
If you are hosting your own server, you could create a Data Library and stage the reference data in there. It could be both files: the reference genome and the reference annotation, or it could be just the annotation and the genome is pre-indexed (this mitigates large/repetitive computational loads on the server). Maybe label the data in a way that makes it easy to match up. You could include notes in your workflow about this, or prepare a README that you host or link on the server.
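If you would rather script the Data Library setup than click through the admin interface, something like the following could work with BioBlend (the Python client for the Galaxy API). Treat it as a rough sketch under a few assumptions: the file paths, library name, and dbkeys are placeholders, the API key must belong to an admin account, and the dbkeys should already be known to your server.

```python
# Hypothetical mapping: genome build (dbkey) -> local reference files to stage.
REFERENCES = {
    "hg38": {"fasta": "refs/hg38.fa", "gtf": "refs/hg38.gtf"},
    "mm10": {"fasta": "refs/mm10.fa", "gtf": "refs/mm10.gtf"},
}


def plan_uploads(references):
    """Return one (path, file_type, dbkey) tuple per file to stage."""
    plan = []
    for dbkey, files in sorted(references.items()):
        plan.append((files["fasta"], "fasta", dbkey))
        plan.append((files["gtf"], "gtf", dbkey))
    return plan


def stage_library(galaxy_url, api_key, references, name="RNA-seq references"):
    """Create a Data Library and upload every planned file into it."""
    # Imported here so plan_uploads() can be used without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    library = gi.libraries.create_library(
        name, description="Reference genomes and annotations for shared workflows"
    )
    for path, file_type, dbkey in plan_uploads(references):
        # Tagging each file with its dbkey makes genome and annotation easy to match up.
        gi.libraries.upload_file_from_local_path(
            library["id"], path, file_type=file_type, dbkey=dbkey
        )
    return library["id"]


for path, file_type, dbkey in plan_uploads(REFERENCES):
    print(dbkey, file_type, path)
```

Calling stage_library("https://your.galaxy.server", "YOUR_ADMIN_API_KEY", REFERENCES) would then do the actual upload. Users can import the files from the library into their history and select them when the workflow asks for inputs.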
If you want to add genome indexes to your server, we have some guides about this. You can incorporate pre-computed indexes and/or add your own custom indexes. Or you can let everything process at runtime.