Silva reference database

Hi, I am trying to align my sequences against SILVA version 138.1, but the downloaded file is empty (same for version 132).
Can anyone advise, please?
I am looking at the bacterial and archaeal communities in my samples, amplified with 341F and 806R for the 16S V3-V4 region.
Thank you

Hi @Funnyme186

Try this data provider as a source: https://www.arb-silva.de/

You’ll want the fasta version of the data. Example link: Archive

Tutorials: Galaxy Training!

hi @jennaj Thank you for your reply!
I did download the uncompressed version of 132 (9.9 GB) and used it to align my sequences, because I didn’t really know how to extract the fasta from ARB.

The problem now is that when I get to cluster.split, it takes too long and ends with an error.

Do you have an idea of how to resolve this problem?

If yes, could you please advise?

Regards

Stephanie

Hmm, is the datatype fasta or fasta.gz? Expand the dataset to check. If it is fasta.gz, you might just need to uncompress it. Click on the pencil icon for the dataset to bring up the Edit Attributes forms. The Convert tab has a drop-down menu with an “uncompress” function. Some tools do not interpret compressed fasta well. If that works, please let me know.
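If you keep a copy of the reference on your own computer, the same check can be sketched outside Galaxy. This is only a minimal sketch under my own assumptions: the function names `is_gzipped` and `ensure_plain_fasta` are hypothetical helpers, not Galaxy or mothur functions, and the check relies on the two magic bytes that start every gzip file rather than on the file extension.

```python
import gzip
import shutil
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_fasta(path):
    """Hypothetical helper: return a path to an uncompressed copy of the file.

    If the file is already plain text, it is returned unchanged.
    """
    path = Path(path)
    if not is_gzipped(path):
        return path
    # Drop a trailing ".gz"; otherwise append ".fasta" to avoid overwriting the input.
    if path.suffix == ".gz":
        out_path = path.with_suffix("")
    else:
        out_path = path.with_name(path.name + ".fasta")
    # Stream the data so that a multi-gigabyte reference is never held in memory.
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling `ensure_plain_fasta` on the downloaded reference would then give a path that is safe to upload as datatype fasta.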

Or, maybe the full fasta is too large for the server to process. There are several posts at Biostars discussing how others are sub-setting the fasta to just the regions of interest.

EMBOSS Fuzznuc is a tool in Galaxy already (referenced in the second post above).
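To illustrate the idea of sub-setting a reference to the amplified region, here is a rough sketch in plain Python. It is not what Fuzznuc or mothur do internally, only the same pattern-matching idea: expand the degenerate (IUPAC) primer bases into a regular expression and keep what lies between the forward primer and the reverse complement of the reverse primer. The function names are hypothetical, exact matching is assumed (no mismatches allowed), and the primers in the usage lines are toy sequences. The real 341F and 806R sequences differ between protocols, so take them from your own protocol. As far as I know, SILVA fasta exports use the RNA alphabet and the aligned exports contain gap characters, so the sketch converts U to T and strips `.` and `-` first.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a degenerate primer into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_amplicons(records, forward, reverse, keep_primers=False):
    """Yield (header, region) for each record in which both primers match."""
    pattern = re.compile(
        "(" + primer_regex(forward) + ")(.*?)("
        + primer_regex(reverse_complement(reverse)) + ")"
    )
    for header, seq in records:
        # Normalise: upper case, RNA to DNA, remove alignment gap characters.
        dna = seq.upper().replace("U", "T").replace(".", "").replace("-", "")
        match = pattern.search(dna)
        if match:
            yield header, match.group(0) if keep_primers else match.group(2)


# Toy usage with made-up primers and sequences (not the real 341F/806R).
toy = [">s1 Bacteria", "GGACGUCC", "CCGUAAGG", ">s2", "AAAAAAAA"]
for name, region in extract_amplicons(read_fasta(toy), "ACGN", "TTWC"):
    print(name, region)
```

Because `read_fasta` and `extract_amplicons` are generators, an open file handle can be passed instead of the toy list and the reference is processed one record at a time, which matters for a multi-gigabyte file.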

The https://www.arb-silva.de/ site has a search/filter function. I haven’t used it much, but it might be another way to subset and output fasta. The site has documentation and examples. Many functions involve using the ARB tool package for data manipulation. That said, most functions can probably be translated to alternative tools in Galaxy.

If that is not enough, where are you working now (server URL)?

  • If usegalaxy.org, would you please send in a bug report for the new error? Include a link to this topic in the comments so I can find it. I would like to review it.
  • If working at a different public Galaxy server, you can share your history with me in a direct message here. You won’t be able to do that yet, so write back if needed and I’ll start one up. That keeps your shared history link private. If you don’t care about privacy, the history share link can just be posted back in this thread. How to: Galaxy Training! Also post back the dataset number with the failure to make sure we are looking at the same thing.

Let’s start there. We can bring in some domain specialists if needed, but it is better to rule out technical issues first.

Hi @jennaj
Thank you again for your prompt reply.

My datatype is fastqsanger.

Yes, I tried to create a custom reference file with my regions, but I didn’t use it.

Yes, the full SILVA 132 fasta is too large (9.9 GB), but the readme instructions for compressing it were too complicated. I will try to extract the fasta again.

I am working on usegalaxy.org.

I shared my history with your email address.

I sent in a bug report today for the new cluster.split error.

Thank you very much for your help with this issue, I really appreciate it.

Best regards

Stephanie

Hi @Funnyme186

Well, there were some issues at the server last week. Those may have impacted your work. It looks like you have had a successful rerun by now with the same inputs that originally timed out and failed.

Apologies for the delayed reply. I also needed to wait for things to get back to (mostly) normal before reviewing and replying. All should be OK now. Everyone should expect slightly longer job queue times until the banner on the server clears, but everything else is fine.