Sort on BED file

stealsh · March 25, 2022, 7:02pm

Hello,

I’m currently trying to run a sort on some bed files. The instructions for the sort are:

sort -k1,1 -k2,2n -k3,3n

However, I’m having difficulty translating this to the Sort tool in Galaxy. Do k1, k2, and k3 refer to the columns? If so, what do 1, 2n, and 3n refer to? Any help would be appreciated.

jennaj · March 25, 2022, 7:56pm

Hi @stealsh

You have this right:

sort

Use the Sort tool in Galaxy

-k1,1

Means to start the sort key at the first column and end it at the first column (otherwise the key would run from the “start” column to the end of the line). Limiting the key to the first column is important because the “sorting type” to apply differs by column. In this case, the sorting is “alphabetical”. But see the tool form – alphabetical order for any letters in the first column (a-z) will be what humans expect, but the order of any numbers after them may not be. Example: “11” comes before “2” because the first character of “11” (1) is smaller than the first character of “2” (2).
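A quick command-line check of that alphabetical behavior, as a minimal sketch. The chromosome names are made up for illustration, and `LC_ALL=C` is an added assumption (not part of the original instructions) so the comparison is plain byte order regardless of system locale:

```shell
# Made-up chromosome names, one per line (illustrative data only).
# LC_ALL=C is an added assumption: it forces plain byte-order comparison
# so the result does not vary with the system locale.
alpha_sorted=$(printf 'chr2\nchr11\nchr1\n' | LC_ALL=C sort -k1,1)

# Print the sorted names, one per line.
printf '%s\n' "$alpha_sorted"
```

With this input, chr11 is listed before chr2, because the comparison goes character by character rather than by numeric value.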

-k2,2n

Sort the second column with a “numerical” sort (“n”) – that means smallest to largest number. In this case, “2” is smaller than “11” and is listed first.
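The difference the “n” makes can be seen with a small sketch on the command line. The two tab-separated rows are made-up data, and `LC_ALL=C` is an added assumption to keep the non-numeric comparison locale-independent:

```shell
# Two made-up rows: chromosome name, then a start position (tab-separated).
rows='chr1\t11\nchr1\t2\n'

# Numeric sort on column 2: values are compared as numbers.
num_sorted=$(printf "$rows" | LC_ALL=C sort -k2,2n)

# Same key without "n": values are compared character by character.
text_sorted=$(printf "$rows" | LC_ALL=C sort -k2,2)

printf 'numeric:\n%s\n' "$num_sorted"
printf 'alphabetical:\n%s\n' "$text_sorted"
```

The numeric version lists the row with 2 first, while the version without “n” lists the row with 11 first.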

-k3,3n

Sort the third column, same rules as the second column (numerical aka “n”).

In practical terms, these three keys together will “coordinate sort” a BED file by its first three columns. Most bioinformatics tools are designed to interpret chromosome positions this way for BED datasets. The rest of each line isn’t used as a sort key; whatever was originally included after the “chrom-start-stop” data is simply carried along with its line.
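For reference, here is a minimal sketch of the full command run on a tiny BED-like example. The five rows and the feature names in the fourth column are invented for illustration, and `LC_ALL=C` is an added assumption to make the first-column ordering independent of the system locale:

```shell
# Made-up BED-like rows: chrom, start, end, name (tab-separated).
bed_rows='chr2\t100\t200\tfeatA\n'
bed_rows="${bed_rows}"'chr1\t1000\t2000\tfeatB\n'
bed_rows="${bed_rows}"'chr1\t200\t300\tfeatC\n'
bed_rows="${bed_rows}"'chr11\t50\t60\tfeatD\n'
bed_rows="${bed_rows}"'chr1\t200\t250\tfeatE\n'

# Column 1 alphabetical, then column 2 numeric, then column 3 numeric.
coord_sorted=$(printf "$bed_rows" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n)

printf '%s\n' "$coord_sorted"
```

The chr1 rows come out first, ordered by start and then by end (featE, featC, featB), followed by chr11 and then chr2. The fourth column travels with its row and does not act as a sort key.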

Try a few different sort conditions on your data and you’ll notice the difference. The tool form also has a few examples that should help to clear up why the command line was written that way.

stealsh · March 25, 2022, 8:28pm

Hi @jennaj

Thanks for the detailed reply. That makes it a lot clearer for me. I’ve tried running it and it looks like it’s working. I had been contemplating downloading the files, running them through cut & sort in my conda build, and re-uploading them to Galaxy. Glad I won’t have to.

jennaj · March 25, 2022, 8:56pm

Agreed. The text manipulation tools included in Galaxy are one of my favorite parts of it. Super handy.
