When starting the RStudio interactive tool and selecting BAM files as inputs, the BAM index files are not loaded into RStudio
Welcome back @droslj !!
Are you using the gx_get() command in RStudio at a public Galaxy server? Which one? I can double check. I’m pretty sure the data.bam.bai index should be pulled with the data.bam file, but I also faintly recall there being something special about this. We can confirm, but I’ll need to reproduce it. Can you help me do this by explaining exactly what you did, and where? Thanks!
Thanks,
I never left.
I listed the files as inputs when starting the RStudio interactive tool, and I also tried importing them using gx_get(). Neither method works, i.e. the BAM index files are not loaded and I need to create them in RStudio.
Here is the link to the history:
What are the limitations of the RStudio session in terms of RAM/Cores?
Regards
Andreas
Hi @droslj
Thanks for sharing your history, very helpful. I can confirm. You can get the BAM then create the index.
Let’s get you going! This is a bit of R that loads a BAM dataset into RStudio from a Galaxy history, checks whether it already has an index, creates one only if it is missing (so no duplicate is made), and names it so that it will be found later when the BAM object is used by functions.
Load the library, get the path to the BAM from the Galaxy history, copy the file, and assign it to an object in the environment. Then check for an index and create one if needed.
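library(Rsamtools)
# gx_get(5) returns the path of dataset 5; copy it to a named file
file.copy(gx_get(5), "special.bam")
# Reference the BAM file as an object
special <- BamFile("special.bam")
# Create the index only if it does not already exist
if (!file.exists("special.bam.bai")) {
    indexBam("special.bam")
}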
Where “5” is the dataset number and “special” is how I am naming this bam file.
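If you have several BAM datasets, the same steps can be wrapped in a small function. This is only a rough sketch: fetch_bam_with_index, fetch, and make_index are names I made up for illustration and are not part of Galaxy or Rsamtools. It assumes gx_get() returns the path of the primary file, as described in this thread, and that indexBam() writes the index next to the BAM as name.bam.bai.

```r
# Hypothetical helper (not part of Galaxy or Rsamtools).
# fetch:      function returning the path of a history dataset
#             (gx_get inside Galaxy's RStudio).
# make_index: function that builds the index for a BAM file
#             (Rsamtools::indexBam by default).
fetch_bam_with_index <- function(dataset_id, name,
                                 fetch = gx_get,
                                 make_index = Rsamtools::indexBam) {
  bam <- paste0(name, ".bam")
  bai <- paste0(bam, ".bai")

  # Copy the primary file only if it is not already in the working directory
  if (!file.exists(bam)) {
    copied <- file.copy(fetch(dataset_id), bam)
    if (!copied) stop("Could not copy dataset ", dataset_id, " to ", bam)
  }

  # Build the index only if it is missing, so no duplicate is created
  if (!file.exists(bai)) {
    make_index(bam)
  }

  c(bam = bam, index = bai)
}

# Example use inside Galaxy's RStudio (dataset number 5, named "special"):
# paths   <- fetch_bam_with_index(5, "special")
# special <- Rsamtools::BamFile(paths[["bam"]])
```

The defaults are only looked up when the function actually needs them, so you can pass your own fetch or make_index if your setup differs.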
The tutorials do not cover this, since only the count files were pushed into the R environment in the examples. But you can certainly work with these files if you want to! Most R libraries and functions will work in RStudio.
Please give this a try and let us know how it works!
Thanks,
I already got this covered. I just thought that you guys might want to fix this.
I have a lot of issues with the stability of RStudio server sessions launched from Galaxy.
Is the RStudio server in Galaxy supposed to be (1) used just like any commercial platform that provides RStudio services, or (2) used only for minor tasks that cannot be performed with Galaxy wrappers?
br
Andreas
Well, the public servers are not commercial platforms, so the environments still run on the same resources as the rest of the public Galaxy server you are working on. There are commercial versions of Galaxy that you could explore, and some come with administrative support where you could specify expected uptime constraints. You could also run your own server and administer the site yourself, but that can be a lot of work!
For the BAM file: datasets inside Galaxy work a bit like an object in R. They point to a file, and certain types of files can have an index associated. But when moving data between Galaxy and R, the gx_get() command only captures a file path. Singular. The top-level primary file path. We’ve talked about how to expand this to cover sub-files (there can be other types, not just indexes), but there isn’t an easy solution yet. This is what I was double checking yesterday. But I’m glad you asked. Maybe someone in the community will see this and come up with an updated method: capture the primary file, or an index, or a sub-type in a different format.
Also, I can let you know that UseGalaxy.org had some problems launching RStudio yesterday. I noticed that while testing (before you posted the link). We kick-started it. You can ask here if that happens again or if you notice anything else odd.