CollectRnaSeqMetrics with an rRNA interval list

Hello,
I’m trying to determine the percentage of RNA-seq reads that map to rRNA, but I keep running into trouble with the CollectRnaSeqMetrics tool. I downloaded an rRNA interval list from: http://sourceforge.net/mailarchive/message.php?msg_id=27560147
I’ve tried to use the BedToIntervalList tool to convert the file into a Picard interval list, but I get the following error:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/024/667/24667223/_job_tmp -Xmx7g -Xms256m
07:55:52.779 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/picard-2.18.

Can anyone share their experience with this?


Welcome @LoransM

The downloaded “rRNA interval list” dataset is either not actually in BED format or does not match the reference genome of your other inputs.

You may need to manipulate the data to create a valid BED dataset (the tools in the GENERAL TEXT TOOLS groups can handle just about any manipulation). Also check that the data matches your other inputs.
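Before converting anything, it can help to check the BED lines directly. Below is a minimal Python sketch, assuming tab-separated columns; `check_bed_line` and `check_bed` are hypothetical helpers for illustration, not Galaxy or Picard tools. It flags lines with fewer than three columns, non-integer coordinates, a start greater than the end, or an invalid strand.

```python
def check_bed_line(line):
    """Return a list of problems found in one BED data line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return [f"expected at least 3 tab-separated columns, found {len(fields)}"]
    problems = []
    chrom, start, end = fields[0], fields[1], fields[2]
    if not chrom:
        problems.append("empty chromosome name")
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end must be non-negative integers")
    elif int(start) > int(end):
        problems.append("start must not exceed end (BED is 0-based, half-open)")
    if len(fields) >= 6 and fields[5] not in {"+", "-", "."}:
        problems.append("strand (column 6) must be '+', '-' or '.'")
    return problems


def check_bed(lines):
    """Yield (line_number, problems) for every data line that fails the checks."""
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and UCSC-style header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        problems = check_bed_line(line)
        if problems:
            yield number, problems
```

A space-separated line shows up as a single column here, which is one common way a file that "looks like" BED still fails to parse.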

FAQs for format/mismatch help are at https://galaxyproject.org/support/. Start with these:

Once your data is in the correct format, rerun the tool. If an error comes up again, we can follow up from there.

Hi, thanks for your message. So the rRNA interval list can be in BED format?


Yes, it should be arranged in BED format. The Cut tool is sometimes enough; if more rearrangement is needed, use the other data manipulation tools I mentioned.
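What a Cut step does can be sketched in Python as selecting and reordering columns by 1-based index, loosely mirroring the `c1,c2,c3` style of column lists. The function `cut_columns` and the example row layout (name first, then chromosome, start, end) are hypothetical; in this sketch a line with fewer columns than requested raises `IndexError`.

```python
def cut_columns(lines, columns, sep="\t"):
    """Keep only the requested 1-based columns, in the order given."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Column numbers are 1-based, so shift by one for Python indexing.
        out.append(sep.join(fields[c - 1] for c in columns))
    return out


# Hypothetical input with the name first: reorder to chrom, start, end, name.
rows = ["5S\tchr1\t7876433\t7876473"]
print(cut_columns(rows, [2, 3, 4, 1]))
```

Putting chromosome, start and end first is the part that matters for BED; the remaining columns are optional.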

If you cannot figure out how to do this, paste the first few data lines of your starting file, including any headers, as preformatted text in a reply, and we can help with exact instructions and tools to get it into the right format. Also confirm the datatype that Galaxy assigned: either let Galaxy autodetect it at upload, or click the pencil icon > Edit Attributes > Datatype tab and re-autodetect the datatype. All of this will help.

Thank you for your answer.
I downloaded the rRNA intervals as BED from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and used the option to send them to Galaxy. Galaxy seems to have assigned the BED datatype. When I reran the CollectRnaSeqMetrics tool, I got this error:
Word count less than 8 Bad line 1 of /galaxy-repl/main/files/033/612/dataset_33612699.dat:

Here are the first few lines of the BED file:
chr1 1815107 1815204 LSU-rRNA_Hsa 495 +
chr1 4417098 4417211 LSU-rRNA_Hsa 234 -
chr1 7876433 7876473 5S 282 +
chr1 9497766 9497837 5S 467 -
chr1 13923133 13923172 5S 256 -
chr1 13949705 13949779 5S 432 -
chr1 15976864 15976906 5S 266 -
chr1 25483508 25483621 LSU-rRNA_Hsa 896 -
chr1 28242609 28242680 LSU-rRNA_Hsa 410 -
chr1 30802098 30802177 LSU-rRNA_Hsa 270 -

I hope you can help. Thanks!


Run this BED dataset through BedToIntervalList first, then use the result as one of the inputs to CollectRnaSeqMetrics.
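To show roughly what the conversion involves, here is a simplified Python sketch. It is an illustration only, not Picard's implementation; in Galaxy you would simply run the BedToIntervalList tool. A Picard interval list starts with a SAM-style header (`@HD`, `@SQ` lines) describing the reference, followed by tab-separated lines of contig, start, end, strand and name, using 1-based inclusive coordinates, whereas BED starts are 0-based. The function name, the `@HD` version string, the default name and the default `+` strand for `.` are assumptions made for this sketch, and the `chr1` length passed in the usage example is just an example value.

```python
def bed_to_interval_list(bed_lines, sequence_dictionary):
    """Convert BED lines to interval-list text (hypothetical, simplified sketch).

    sequence_dictionary: list of (contig_name, contig_length) pairs, standing in
    for the reference sequence dictionary the real tool needs.
    """
    lengths = dict(sequence_dictionary)
    out = ["@HD\tVN:1.6"]
    out += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in sequence_dictionary]
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # A contig missing from the dictionary usually means a genome mismatch.
        if chrom not in lengths:
            raise ValueError(f"line {number}: contig {chrom!r} not in sequence dictionary")
        if end > lengths[chrom]:
            raise ValueError(f"line {number}: end {end} exceeds length of {chrom}")
        name = fields[3] if len(fields) > 3 else f"interval_{number}"
        # Sketch's choice: treat a missing or '.' strand as '+'.
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
        # BED is 0-based half-open; interval lists are 1-based inclusive,
        # so only the start shifts by one.
        out.append(f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}")
    return out


for row in bed_to_interval_list(["chr1\t7876433\t7876473\t5S\t282\t+"],
                                [("chr1", 249250621)]):
    print(row)
```

The header is why the real tool needs a sequence dictionary that matches your BAM: the intervals are tied to that specific reference.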

Make sure that all input data is based on exactly the same reference genome, for these two tools and any others you use.
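One quick sanity check is to compare the chromosome names in the BED file with the `@SQ` lines of the reference's SAM/BAM header or sequence dictionary. The Python sketch below is hypothetical (`sequence_names` and `missing_contigs` are illustrative helpers) and assumes you have the header as plain text. A typical mismatch it catches is UCSC-style `chr1` versus Ensembl-style `1` naming.

```python
def sequence_names(header_text):
    """Collect SN: values from @SQ lines of a SAM/BAM header or .dict file."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_contigs(bed_lines, header_text):
    """Return BED contig names that do not appear in the reference's @SQ lines."""
    known = sequence_names(header_text)
    bed_contigs = {
        line.split("\t", 1)[0]
        for line in bed_lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    }
    return sorted(bed_contigs - known)


header = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373"
print(missing_contigs(["chr1\t1815107\t1815204\tLSU-rRNA_Hsa"], header))
```

Matching names are necessary but not sufficient: two builds (for example hg19 and hg38) share the same chromosome names, so comparing the `LN:` lengths as well gives a stronger check.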