How to get gene list from bed file containing chromosome number, start and end positions in 1st, 2nd, and 3rd columns?

Hi @pallabi

This tutorial includes all of the manipulations you need. It focuses on a different specific analysis, but the steps you will follow are about the same.

Hope that helps! :slight_smile:

Hi, thanks for the reply. I tried following the tutorial, but the chromosome coordinates changed after the intersection. My input file has these columns, in this order: chromosome number, start, end, sequence, score. After the intersection, the sequence and score columns are gone, and the start and end coordinates have also changed, so I can't find any correspondence between my input and output files.

Hi @pallabi

This tutorial covers more manipulations, and the tool panel has even more tools. Most are wrapped versions of command-line utilities (either the actual utility or a tool that duplicates its function). You can find related tutorials with GTN Materials Search.

But let’s clarify your use case a bit more and then work out a solution.

These are the coordinates that you want to associate with a “gene”, correct?

I put that in quotes because what you will actually be mapping to is a transcript footprint, and that transcript is then associated with a gene's bounds. More than one transcript might overlap a given interval, and those transcripts might or might not all belong to the same gene. It depends on the coordinates: how much of the genome each interval covers, and related factors.
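To illustrate the transcript-versus-gene point, here is a minimal Python sketch (not a Galaxy tool). The annotation table, transcript IDs, gene names and coordinates are all hypothetical. It shows one interval overlapping several transcripts that collapse to fewer distinct genes.

```python
# Hypothetical transcript annotation: (chrom, start, end, transcript_id, gene_name).
# Coordinates follow BED conventions: 0-based start, end not included.
TRANSCRIPTS = [
    ("chr1", 1000, 5000, "TX1", "GENE_A"),
    ("chr1", 1200, 4800, "TX2", "GENE_A"),
    ("chr1", 4500, 9000, "TX3", "GENE_B"),
    ("chr2", 100, 900, "TX4", "GENE_C"),
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_for_interval(chrom, start, end, transcripts=TRANSCRIPTS):
    """Return {transcript_id: gene_name} for every transcript overlapping the interval."""
    hits = {}
    for t_chrom, t_start, t_end, tx_id, gene in transcripts:
        if t_chrom == chrom and overlaps(start, end, t_start, t_end):
            hits[tx_id] = gene
    return hits


hits = genes_for_interval("chr1", 4600, 4700)
print(hits)                        # {'TX1': 'GENE_A', 'TX2': 'GENE_A', 'TX3': 'GENE_B'}
print(sorted(set(hits.values())))  # ['GENE_A', 'GENE_B']: three transcripts, two genes
```

Whether you report the transcripts or the deduplicated gene names is a choice you make at the end. Both are valid answers to "which gene is here?".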

So you are using the sequence as the name column (column 4) and you have a score (column 5), but you do not have the strand (column 6). Missing strand information might matter, depending on your scientific question.
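A minimal Python sketch of why the strand matters. Without a 6th column, only an unstranded overlap can be computed. The `Interval` type and `overlaps` function here are hypothetical illustrations, not part of any Galaxy tool.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int                    # 0-based, inclusive (BED chromStart)
    end: int                      # exclusive (BED chromEnd)
    strand: Optional[str] = None  # "+" or "-"; None when the BED has no 6th column


def overlaps(a: Interval, b: Interval, stranded: bool = False) -> bool:
    """Genomic overlap; when stranded=True, both intervals must also share a strand."""
    same_place = a.chrom == b.chrom and a.start < b.end and b.start < a.end
    if not stranded:
        return same_place
    if a.strand not in ("+", "-") or b.strand not in ("+", "-"):
        # A 5-column BED (like yours) carries no strand, so this check is impossible.
        raise ValueError("stranded comparison needs '+' or '-' in column 6 of both intervals")
    return same_place and a.strand == b.strand


peak = Interval("chr1", 4600, 4700)             # 5-column BED row: no strand
gene_minus = Interval("chr1", 4500, 9000, "-")
print(overlaps(peak, gene_minus))                # True (unstranded overlap only)
```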

Also, some tools preserve only the first three columns, while others can keep all of them.
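This may also explain the shifted coordinates. Some intersect tools report only the overlapping portion of each interval. `bedtools intersect`, for example, does this by default, while its `-wa` option writes the original entry from the first file instead. A minimal Python sketch of the "keep everything" behaviour, assuming a hypothetical gene table of `(chrom, start, end, gene_name)` tuples: each input row keeps all of its columns and its original start/end, and the overlapping gene names are appended as one extra column.

```python
def annotate_bed(bed_lines, genes):
    """Append overlapping gene names to each BED row, keeping every original column.

    bed_lines: iterable of tab-separated BED lines (3 or more columns).
    genes: list of (chrom, start, end, gene_name) tuples (hypothetical annotation).
    Yields the original fields plus one extra column: comma-separated gene names, or ".".
    """
    for line in bed_lines:
        # Skip blank lines and header/track lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Half-open overlap test; the row's own coordinates are never modified.
        names = sorted({g for c, s, e, g in genes if c == chrom and start < e and s < end})
        yield fields + [",".join(names) if names else "."]


genes = [("chr1", 1000, 5000, "GENE_A"), ("chr1", 4500, 9000, "GENE_B")]
rows = list(annotate_bed(["chr1\t4600\t4700\tACGTACGT\t12\n", "chr2\t10\t20\tTTTT\t3\n"], genes))
for row in rows:
    print("\t".join(row))
```

The sequence and score columns survive, and the start/end values are exactly what went in.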

As a reference, this is the BED datatype specification: Genome Browser FAQ
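Per that specification, BED columns are positional (chrom, chromStart, chromEnd, name, score, strand, ...), chromStart is 0-based, and chromEnd is not included. A minimal Python sketch with a hypothetical `parse_bed_line` helper, showing how a 5-column line like yours maps onto those names:

```python
BED_FIELDS = ["chrom", "chromStart", "chromEnd", "name", "score", "strand"]


def parse_bed_line(line):
    """Map a tab-separated BED line onto the standard column names (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED needs at least chrom, chromStart and chromEnd")
    # zip stops at the shorter list, so a 5-column line simply has no "strand" key.
    record = dict(zip(BED_FIELDS, fields))
    record["chromStart"] = int(record["chromStart"])
    record["chromEnd"] = int(record["chromEnd"])
    if record["chromStart"] > record["chromEnd"]:
        raise ValueError("chromStart must not exceed chromEnd")
    record["extra"] = fields[len(BED_FIELDS):]  # columns 7+ (e.g. thickStart), if any
    return record


rec = parse_bed_line("chr1\t4600\t4700\tACGTACGT\t12")
print(rec["name"], rec["score"], "strand" in rec)  # ACGTACGT 12 False
# 1-based display form, as the Genome Browser shows the same interval:
print(f'{rec["chrom"]}:{rec["chromStart"] + 1}-{rec["chromEnd"]}')  # chr1:4601-4700
```

The 0-based vs 1-based difference is another place where coordinates can appear to change by one between tools.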

I’m curious now about exactly what you did. What does the file with the genes contain? Is it the file you extracted from UCSC?

And, if you care about stranded results, your starting BED should have the 6th column. Do you have that information? Or are you just looking for genomic overlaps that are not stranded?

You can post back screenshots or copy/paste. Thanks!