Hi all!
I have a list of ~5000 sequences of length 80, and I want to obtain gene symbols for them. Initially I used R and retrieved gene symbols for more than half of the sequences. As that process was very time-consuming, I thought I could use methods from RNA-seq, which I am not familiar with. I tried Galaxy, and the results I got are correct (I checked them against my partial R results, parts of which I had in turn verified manually using the UCSC Genome Browser).
What I do is as follows:
1- Upload the sequences in FASTA format.
2- Run HISAT2 with the following options:
Source for the reference genome: use a genome from history
Select the reference genome: hg38 ncRNA+CDS
Is this a single or paired library: single end
Specify strand information: F
3- Pass the results to htseq-count with the following options:
GFF file: hg38.gtf
Stranded: No
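To make the goal concrete: what I am after for each sequence is roughly the lookup sketched below, given the position where it aligned. This is only a minimal illustration, not what Galaxy or htseq-count does internally. It assumes the GTF contains "gene" records with a gene_name attribute (as GENCODE/Ensembl files do; other hg38 GTFs may not), and load_genes, symbols_for, and the two example records are made-up names and data.

```python
import re
from collections import defaultdict


def load_genes(gtf_lines):
    """Collect (start, end, symbol) per chromosome from GTF 'gene' records."""
    genes = defaultdict(list)
    for line in gtf_lines:
        # Skip header comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only whole-gene records are used in this sketch.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Assumption: the symbol is stored as gene_name "..." in column 9.
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match:
            genes[fields[0]].append(
                (int(fields[3]), int(fields[4]), match.group(1))
            )
    return genes


def symbols_for(genes, chrom, start, end):
    """Return sorted symbols of genes overlapping a 1-based inclusive interval."""
    return sorted(
        {
            name
            for g_start, g_end, name in genes.get(chrom, [])
            if g_start <= end and start <= g_end
        }
    )


# Hypothetical two-gene annotation, for illustration only.
example_gtf = [
    'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";',
    'chr1\tdemo\tgene\t450\t900\t.\t-\t.\tgene_id "G2"; gene_name "GENEB";',
]
genes = load_genes(example_gtf)

# An 80-base alignment at chr1:480-559 overlaps both example genes.
print(symbols_for(genes, "chr1", 480, 559))
```

A linear scan like this is fine for ~5000 sequences against one annotation, but it ignores strand and exon structure, so it can report a gene when the sequence falls in an intron.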
I also tried StringTie instead of htseq-count, and Salmon instead of HISAT2, but without success.