Hi, I am new to Galaxy.
Are the BAM files generated after mapping with RNA STAR (version 2.7.8a+galaxy1 or newer) sorted, whether by coordinate, name, or tag? When I click the ‘i’ (dataset details) icon for the ‘*.mapped.bam’ dataset, the Dataset Information section says the format is BAM. The ‘Tool Standard Output’ is as follows:
Apr 28 20:52:35 … started STAR run
Apr 28 20:52:35 … starting to generate Genome files
Apr 28 20:53:23 … processing annotations GTF
Apr 28 20:53:54 … starting to sort Suffix Array. This may take a long time…
Apr 28 20:54:14 … sorting Suffix Array chunks and saving them to disk…
Apr 28 21:13:08 … loading chunks from disk, packing SA…
Apr 28 21:14:38 … finished generating suffix array
Apr 28 21:14:38 … generating Suffix Array index
Apr 28 21:19:00 … completed Suffix Array index
Apr 28 21:19:01 … inserting junctions into the genome indices
Apr 28 21:21:04 … writing Genome to disk …
Apr 28 21:21:07 … writing Suffix Array to disk …
Apr 28 21:21:30 … writing SAindex to disk
Apr 28 21:21:33 … finished successfully
Apr 28 21:21:33 … started STAR run
Apr 28 21:21:33 … loading genome
Apr 28 21:21:55 … started mapping
Apr 28 21:30:30 … finished mapping
Apr 28 21:30:54 … started sorting BAM
Apr 28 21:31:45 … finished successfully
However, although this output is essentially the same for my ‘*.transcriptome-mapped.bam’ file, its Dataset Information says the format is unsorted.bam. See the attached screenshots.
The first line of the ‘*.mapped.bam’ file is ‘@HD VN:1.4 SO:coordinate’. So am I correct in assuming that it is probably sorted by coordinate? The first line of ‘*.transcriptome-mapped.bam’ is different: it is ‘@SQ SN:NR_046018.2 LN:1652’.
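For reference, the `SO` tag on the `@HD` line is only a declaration of sort order, and the SAM specification requires `@HD` to be the first header line when it is present. A file whose header starts with `@SQ`, as the transcriptome BAM does, therefore declares no sort order at all. The sketch below (a hypothetical helper, not part of Galaxy or samtools) reads the declared `SO` value and also checks whether the records really are in coordinate order. It assumes the BAM has already been converted to tab-separated SAM text lines, for example with `samtools view -h`.

```python
"""Sketch: compare the declared vs. actual sort order of SAM text lines.

Assumption: input is SAM text (header + records), e.g. the output of
`samtools view -h file.bam`. This helper does not read binary BAM itself.
"""


def declared_sort_order(sam_lines):
    """Return the SO value from the @HD line, or None if none is declared."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header is over; no @HD was found
        if line.startswith("@HD"):
            # SAM header fields are TAG:VALUE pairs separated by tabs
            for field in line.rstrip("\n").split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
            return None  # @HD present but without an SO tag
    return None


def is_coordinate_sorted(sam_lines):
    """Check records are in (reference, position) order, unmapped ('*') last.

    Reference order follows the @SQ lines, as in a samtools coordinate sort.
    """
    ref_index = {}
    last = None
    seen_unmapped = False
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    tag, _, value = field.partition(":")
                    if tag == "SN":
                        ref_index[value] = len(ref_index)
            continue
        cols = line.rstrip("\n").split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            seen_unmapped = True  # unplaced reads belong at the end
            continue
        if seen_unmapped:
            return False  # a placed read after unplaced ones
        key = (ref_index[rname], pos)
        if last is not None and key < last:
            return False
        last = key
    return True
```

Run on a header like the one from ‘*.mapped.bam’, `declared_sort_order` would return `"coordinate"`; on one that starts with `@SQ`, it returns `None`.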
Do I still need to run Samtools sort on either or both of these files? I intend to use them as inputs for featureCounts.
Thanks a lot in advance for the help.