Dear Support, I am teaching a class with 15 students, and we are getting errors and delays when we run Funannotate and eggNOG. Is there a way to allocate a server for us to finish the course? We have two more months left.
Hi @Shaadi_Mehr
We do offer smaller training clusters at all of the UseGalaxy servers but these are for short workshops (a day or so). The data should be very small for these. Most GTN tutorials would be appropriate but not all. You can ask about any of this. Details → Teaching and Hosting Galaxy training / Tutorial List
So, “delays” from job queues are expected since everyone works with the same priority. Errors, however, are something we might be able to solve. Would you like to share what is going wrong? Maybe we can help to solve the issues.
Let’s start there, thanks!
Thank you. What is the maximum size we can upload from our laptops for RNA-seq data?
Hi @Shaadi_Mehr
There isn’t a hard limit (I am double checking this for UseGalaxy.org) and the Upload tool has a resume function.
Guide → Getting Data into Galaxy
The guide covers advanced loading from the command-line, too. Useful for batch work, very large files and automation from an API script.
That said, you might run into practical limits if 15-20 people are all attempting to load larger data from their desktop up to any remote site, all using the same internet connection. This isn’t a Galaxy constraint and is more about the throughput in the connection itself.
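To put rough numbers on that, here is a minimal sketch. It assumes everyone uploads at the same time and splits one uplink evenly, which real networks only approximate, and the 20 Mbit/s uplink in the example is a made-up value, not a measurement of your connection.

```python
def upload_minutes(file_gb: float, uploaders: int, uplink_mbps: float) -> float:
    """Rough minutes for each person to upload one file of `file_gb` gigabytes
    when `uploaders` people split a single uplink evenly."""
    if uploaders < 1 or uplink_mbps <= 0 or file_gb < 0:
        raise ValueError("need at least one uploader, a positive uplink, and a non-negative size")
    per_person_mbps = uplink_mbps / uploaders   # even split is an assumption
    megabits = file_gb * 8 * 1000               # 1 GB = 8000 megabits (decimal units)
    return megabits / per_person_mbps / 60


if __name__ == "__main__":
    # Hypothetical classroom: 2 GB per person over a shared 20 Mbit/s uplink.
    for people in (1, 3, 15):
        print(f"{people:>2} uploader(s): about {upload_minutes(2, people, 20):.0f} min each")
```

With these assumed values, one person alone needs roughly a quarter of an hour, while 15 people uploading together need over three hours each.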
As a workaround for connections with slower “up” speeds, you could consider having just a few people load the data, then they could share the history, and others could import it? Or, you could put the data up into the cloud somewhere and allow people to load it by URL? That would remove your local connection speed from being a factor.
I’m mentioning the share ability since it is a big part of why Galaxy can be so helpful, and it is a topic we always cover in training. It is useful for collaboration, when getting help, and later on when publishing work.
If you share what kind of data this is and how large, I might be able to share more specifics.
Thanks Jen. If I download a bacterial genome or fungal RNA-seq data as a zipped fastq file from NCBI, can I just upload it to Galaxy?
I can put it on a WeTransfer server. Would that work?
Hi Jen,
Do I need to format it or can I just download from a local server and upload the raw fastq files into Galaxy if they are around 2GB?
Hi @Shaadi_Mehr
You should be able to load a genome fasta and sequence fastq files directly into Galaxy from NCBI. (same for the reference annotation gtf)
This is a recent post where I loaded up reference data →
Another example with a different species →
- Inquiry about lettuce genome plus any posts with a custom-genome tag.
Then, for fastq data there are dedicated tools. These retrieve using accession IDs. The tutorial covers how to source the accession IDs but if you already have them you can skip down to the tool where just the list of accessions is given.
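If you are preparing the accession list by hand, a small cleanup script can catch typos before you paste it. This is only a sketch: it assumes the list should contain run accessions (SRR, ERR or DRR followed by digits), one per line, and the accessions in the example are placeholders, not real datasets.

```python
import re

# Run accessions from SRA, ENA and DDBJ: SRR, ERR or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def clean_accessions(text: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (valid, rejected) run accessions.

    Tokens may be separated by whitespace, commas or semicolons.
    Order is kept, case is normalized, and duplicates are dropped.
    """
    valid, rejected, seen = [], [], set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        accession = token.upper()
        if accession in seen:
            continue
        seen.add(accession)
        if RUN_ACCESSION.match(accession):
            valid.append(accession)
        else:
            rejected.append(accession)
    return valid, rejected


if __name__ == "__main__":
    # Placeholder accessions for illustration only.
    pasted = "SRR000001, err000002\nPRJNA12345 SRR000001"
    ok, bad = clean_accessions(pasted)
    print("\n".join(ok))            # one accession per line, ready to paste
    print("Check these:", bad)      # project IDs and typos end up here
```

Project or sample IDs (for example anything starting with PRJ) land in the rejected list, which is a hint to look up the run accessions behind them first.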
I’m not sure about the WeTransfer service. The Galaxy requirement is that a link is “public” and anyone can access the file directly without credentials or passwords in the URL string.
If the link is instead just a pointer to a file, this is usually where the problems come up with trying to resolve it through too many redirects/cookies on the hosting site. I suspect this site applies a lot of tracking and might be a problem, but you could try it to see what happens.
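A quick way to screen a link before trying it is to look at the URL string itself. The sketch below is a heuristic only: the accepted schemes and the list of suspicious parameter names are my assumptions, and it cannot detect redirects or cookies, so actually loading the link is still the real test.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical list of query parameter fragments that often carry secrets.
SECRET_HINTS = ("token", "key", "password", "passwd", "signature", "auth")


def url_warnings(url: str) -> list[str]:
    """Return reasons a URL string may not be a plain public file link."""
    parts = urlsplit(url)
    warnings = []
    if parts.scheme not in ("http", "https", "ftp"):
        warnings.append(f"unexpected or missing scheme: {parts.scheme!r}")
    if parts.username or parts.password:
        warnings.append("credentials embedded before the host name")
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        if any(hint in name.lower() for hint in SECRET_HINTS):
            warnings.append(f"query parameter looks like a credential: {name}")
    return warnings


if __name__ == "__main__":
    # example.org URLs are placeholders.
    for link in (
        "https://example.org/data/reads_1.fastq.gz",
        "https://user:secret@example.org/reads_1.fastq.gz",
        "https://example.org/download?file=reads_1.fastq.gz&token=abc123",
    ):
        print(link, "->", url_warnings(link) or "no obvious problems")
```

An empty list only means nothing looked wrong in the string; a sharing page that wraps the file behind a download button would still pass this check.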
You should be able to load 2 GB of fastq data without any problems.
If you are having a problem – try loading using all defaults. You will want Galaxy to “guess” the datatype format. I can help with this if you share an example.
In some extreme cases, uncompressing before loading is needed (a much older version of gzip or similar). But this should be really rare.
So, in short:
- Load public data directly from the data provider whenever possible
- Upload local data using all defaults
- Run some sanity checks once the content is in Galaxy. This is usually just a few cleanup steps to simplify the formats.
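
If you would like a quick look at a local fastq file before uploading it, something like the sketch below can help. It is not what Galaxy runs internally. It assumes the common four-line record layout (multi-line fastq is not handled) and only checks the first records of the file.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip (detected by magic bytes)."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fastq_head(path, records=1000):
    """Check up to `records` four-line fastq records.

    Returns the number of records checked and raises ValueError on the
    first malformed one. A blank line or end of file stops the check.
    """
    checked = 0
    with open_maybe_gzip(path) as handle:
        while checked < records:
            header = handle.readline()
            if not header.strip():
                break
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            number = checked + 1
            if not header.startswith("@"):
                raise ValueError(f"record {number}: header does not start with '@'")
            if not plus.startswith("+"):
                raise ValueError(f"record {number}: separator line does not start with '+'")
            if len(seq) != len(qual):
                raise ValueError(f"record {number}: sequence and quality lengths differ")
            checked += 1
    return checked
```

Calling `check_fastq_head("reads_1.fastq.gz")` (a hypothetical file name) on a healthy file returns the number of records it looked at; an error on record 1 usually means the file is not fastq at all, for example an HTML error page saved under a fastq name.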
Hope this helps and we can follow up more!
Hi Jen
I actually need help finding new datasets that are already indexed and masked to give to the students to run fungal assembly and annotation as an activity.
I really appreciate it.
Best
Can you please send a link to a new compatible genome and RNA-seq data to use to run Funannotate?
Hi @Shaadi_Mehr
If the goal is to use Funannotate, what about using the training data that they host? Or that the GTN hosts?
Find all of this linked on the tool form. The link out to the tool is where their tutorials are located, too. All of that should work in Galaxy fine.
Help
Funannotate annotate
Funannotate is a pipeline for genome annotation (built specifically for fungi, but will also work with higher eukaryotes).
This script functionally annotates the results from funannotate predict. It pulls annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.
Tutorials
There is 1 tutorial available which uses this tool. View all tutorials referencing this tool.
Hi Jenn,
We have successfully run the pipeline. I am trying to find a new dataset to rerun it and show functional adaptation in one condition compared to another.
1- How can I ensure the genomes I am downloading are compatible before pasting into the upload section?
2- What types of RNA-seq data do we need so that Funannotate works with STAR?
3- Can we use the full databases such as Swiss-Prot for STAR and Funannotate?
4- Also, I am wondering if I can discuss and pay for extra CPU for the students via my university account to upload large datasets.
Best,