How to create a workflow that uses specific files.

amufaamo · November 28, 2024, 12:54am

I want to make “Human RNA-seq” and “Mouse RNA-seq” workflows and share them with our group members.
How can I fix the input files? For example, I want to use the Human fasta/gtf files for Human RNA-seq, and the Mouse fasta/gtf files for Mouse RNA-seq.

When I set them up as dataset inputs, users can choose any fasta/gtf files, but I want the files to be fixed.

Since STAR on my local Galaxy doesn’t offer any built-in references to choose from, I want to upload the fasta/gtf files myself.

jennaj · November 28, 2024, 3:28am

Hi @amufaamo

Workflows do not include any data themselves. A workflow can reference certain data that is indexed on the server – in most cases that is just the reference genome using the database or “dbkey” for a fasta index – but the reference annotation is always supplied at runtime.

Now, RNA-STAR is a bit special. You could create a hybrid reference genome index that incorporates a reference annotation (just the gene bounds). But please be aware that if you do this, a separate reference annotation would still need to be supplied by the user when using other tools later (like downstream counting tools). It is usually best to just let the user get the reference annotation once, then use it for all steps, since this keeps the analysis internally consistent.

Why? A reference genome assembly (fasta) tends to be stable for a very long time, so it is worth pre-indexing since it will get a lot of use. A reference annotation file (GTF, GFF3) could be sourced from many other data providers and tends to change frequently but is still usually based on that original reference genome. There are some other technical reasons but that’s the current logic! It may change later but for now that is how it works.
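Because the annotation has to match the genome it was built on, a quick sanity check before staging a FASTA/GTF pair can catch a common mismatch: UCSC-style sequence names (`chr1`) in the genome versus Ensembl-style names (`1`) in the annotation. Below is a minimal stdlib sketch, assuming uncompressed FASTA and GTF files; the function names are hypothetical and not part of Galaxy.

```python
import io


def fasta_seq_names(handle):
    """Return sequence IDs from FASTA header lines (text up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seq_names(handle):
    """Return the seqname (column 1) of every non-empty, non-comment GTF line."""
    names = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_handle, gtf_handle):
    """Sequence names used by the annotation that the genome does not contain."""
    return sorted(gtf_seq_names(gtf_handle) - fasta_seq_names(fasta_handle))


if __name__ == "__main__":
    # Tiny in-memory example: the GTF uses "1" while the genome uses "chr1".
    fasta = io.StringIO(">chr1 some description\nACGT\n>chr2\nGGCC\n")
    gtf = io.StringIO(
        '1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "G1";\n'
        'chr2\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "G2";\n'
    )
    print(missing_from_genome(fasta, gtf))  # ['1']
```

For real files, pass open handles instead, e.g. `with open("genome.fa") as fa, open("genes.gtf") as gtf:`. An empty result means every annotated sequence exists in the genome; it does not prove the two come from the same assembly version.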

If you are hosting your own server, you could create a Data Library and stage the reference data in there. It could be both files: the reference genome and the reference annotation, or it could be just the annotation and the genome is pre-indexed (this mitigates large/repetitive computational loads on the server). Maybe label the data in a way that makes it easy to match up. You could include notes in your workflow about this, or prepare a README that you host or link on the server.

Hands-on: Data Libraries / Galaxy Server administration
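Once the references are staged in a Data Library, one way to approximate "fixed" inputs is to launch the shared workflow through the API, always pointing its reference inputs at the library copies. The sketch below uses BioBlend (`show_library(..., contents=True)` and `invoke_workflow(..., inputs_by="name")`). The input labels, library file paths, and environment variable names are hypothetical, and it assumes library content names are paths such as `/Human_GRCh38.fa`. Only the reads come from the user's history.

```python
"""Sketch: invoke a shared "Human RNA-seq" workflow with its reference inputs
fixed to datasets staged in a Galaxy Data Library (assumed setup)."""
import os
import sys


def build_fixed_inputs(library_contents, wanted):
    """Map workflow input labels to library datasets.

    library_contents: list of dicts shaped like the output of
        gi.libraries.show_library(library_id, contents=True); each item
        has "name" (a path like "/Human_GRCh38.fa"), "id" and "type".
    wanted: {workflow input label: library path}.
    """
    by_path = {
        item["name"]: item["id"]
        for item in library_contents
        if item.get("type") == "file"
    }
    inputs = {}
    for label, path in wanted.items():
        if path not in by_path:
            raise KeyError(f"{path!r} is not in the Data Library")
        # "ld" = library dataset: the workflow reads the staged copy directly.
        inputs[label] = {"src": "ld", "id": by_path[path]}
    return inputs


# Hypothetical input labels and file names; match them to your own workflow/library.
HUMAN_REFERENCES = {
    "Reference genome": "/Human_GRCh38.fa",
    "Reference annotation": "/Human_GRCh38.gtf",
}


def main(reads_dataset_id):
    from bioblend.galaxy import GalaxyInstance  # requires: pip install bioblend

    gi = GalaxyInstance(url=os.environ["GALAXY_URL"], key=os.environ["GALAXY_API_KEY"])
    contents = gi.libraries.show_library(os.environ["HUMAN_REF_LIBRARY_ID"], contents=True)
    inputs = build_fixed_inputs(contents, HUMAN_REFERENCES)
    # The reads are the only input the user supplies (a dataset in their history).
    inputs["FASTQ reads"] = {"src": "hda", "id": reads_dataset_id}
    invocation = gi.workflows.invoke_workflow(
        os.environ["HUMAN_RNASEQ_WORKFLOW_ID"],
        inputs=inputs,
        inputs_by="name",  # key inputs by their workflow input labels
        history_name="Human RNA-seq",
    )
    print("Invocation id:", invocation["id"])


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A matching "Mouse RNA-seq" launcher would only need its own reference mapping and library ID. This keeps users from picking the wrong FASTA/GTF, but it is a convenience wrapper, not a Galaxy feature that locks workflow inputs.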

If you want to add genome indexes to your server, we have some guides about this. You can incorporate pre-computed indexes and/or add your own custom indexes. Or you can let everything process at runtime.

More about server administration → Private Galaxy Servers

Hope this helps!

amufaamo · December 3, 2024, 5:30am

OK, I really appreciate it!!!
