Unassigned_NoFeatures featureCounts results

jk2181 · December 17, 2019, 2:35pm

I am having a problem with featureCounts while analyzing small RNA-seq data.
I downloaded the FASTA and annotation files from the Ensembl database (ftp://ftp.ensembl.org/pub/release-98/):
Homo_sapiens.GRCh38.ncrna.fa.gz
Homo_sapiens.GRCh38.98.gff3.gz

After trimming the adapter sequence, I mapped the reads to Homo_sapiens.GRCh38.ncrna.fa.gz and got the results below:
[Image: Capture0]
37% of the reads were aligned. After that, I counted the reads using featureCounts.
However…
[Image: Capture1]
Apart from the low-quality mappings, all of the other reads were in Unassigned_NoFeatures…
0 reads were assigned…

I have no idea how to solve this.

I tried removing all of the ‘#’ lines from the GTF file and changing the datatype from GFF to GTF, but it did not work.

Do you have any suggestions?
I have also attached all of the parameters I used when running featureCounts.

jennaj · December 19, 2019, 6:07am

Hello @jk2181

If you review the “fasta” you chose and the gff3 dataset, you’ll notice that the base sequence identifier names are not the same. The fasta for “ncrna” is a subset of non-coding RNA sequences – not genome chromosomes. The annotation is based on genome chromosomes.
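One quick way to see this kind of mismatch is to compare the identifiers on the FASTA header lines with the first column of the annotation. Below is a minimal Python sketch of that check. The function names and the toy records are made up for illustration and are not part of any Galaxy tool. An empty result means no annotation line can match any reference sequence, which is the situation where every read ends up without a feature.

```python
def fasta_ids(lines):
    """Collect sequence identifiers: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def annotation_seqids(lines):
    """Collect the values of column 1 (sequence name) from GTF/GFF3 lines."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def shared_ids(fasta_lines, annotation_lines):
    """Identifiers present in both the reference FASTA and the annotation."""
    return fasta_ids(fasta_lines) & annotation_seqids(annotation_lines)


# Toy data (hypothetical IDs): transcript-style names in the FASTA,
# chromosome-style names in the annotation.
toy_fasta = [">EXAMPLE_TRANSCRIPT_1 ncrna", "ACGT", ">EXAMPLE_TRANSCRIPT_2 ncrna", "GGCC"]
toy_gff = ["##gff-version 3", "1\texample\tgene\t100\t200\t.\t+\t.\tID=gene:example"]

print(shared_ids(toy_fasta, toy_gff))
```

For real files, the same functions can be fed open file handles (for example from `gzip.open(path, "rt")`), since they only iterate over lines.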

Using the human genome as a custom reference genome will probably run out of memory during the mapping step at any public Galaxy server. GRCh38 is the Ensembl version of the human genome. It is also released from UCSC as hg38. These two releases have different chromosome identifiers.

Try mapping against the natively indexed hg38 genome and using the built-in hg38 genome annotation available in Featurecounts. The Gene IDs will be in Entrez format, but the tool annotateMyIds can be used to convert those to Ensembl format.

If you really would prefer to use the Ensembl-sourced annotation, choose the gtf version of the data instead of the gff3: ftp://ftp.ensembl.org/pub/release-98/gtf/homo_sapiens/Homo_sapiens.GRCh38.98.chr.gtf.gz

This will load the data with the datatype gff (autodetected by the Upload tool) because of the header lines, so remove those first. The gtf format is quite different from gff3. Converting one to the other is not easy, and most tools work better with annotation in gtf format. Featurecounts does not work with gff3 data.
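If you would rather do the header removal outside of Galaxy before uploading, a small Python sketch like the one below would work. It assumes the header lines are the ones that begin with "#". The function name and the toy lines are hypothetical.

```python
def strip_header_lines(lines):
    """Return the annotation lines with '#'-prefixed header/comment lines removed."""
    return [line for line in lines if not line.startswith("#")]


# Toy input: the header text and the feature line are made up for illustration.
example = [
    "#!example-header value\n",
    "1\texample_source\tgene\t100\t200\t.\t+\t.\tgene_id \"EXAMPLE0001\";\n",
]
cleaned = strip_header_lines(example)
print(len(cleaned))
```

Only whole lines are dropped, so the nine tab-separated columns of each feature line are left untouched.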

Next, convert the chromosome names from Ensembl format to be in UCSC’s format. Use the tool Replace column by values which are defined in a converted file (Galaxy Version 0.2). See the tool form help for where to source a “convert” mapping file for the IDs.
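To make the idea concrete, here is a rough Python sketch of what the renaming step does conceptually. It is not the Galaxy tool's implementation. It assumes a two-column, tab-separated mapping (Ensembl name, then UCSC name), and the function names and toy lines are hypothetical. In this sketch, feature lines whose chromosome has no entry in the mapping are left out and reported, so you can see what did not convert.

```python
def load_mapping(lines):
    """Parse 'ensembl_name<TAB>ucsc_name' lines into a dict, skipping incomplete rows."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_chromosomes(gtf_lines, mapping):
    """Rewrite column 1 of each feature line using the mapping.

    Returns (renamed_lines, unmapped_names). Comment and blank lines are
    passed through unchanged; feature lines with an unmapped name are dropped.
    """
    renamed, unmapped = [], set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            renamed.append(line)
            continue
        seqid, _, rest = line.partition("\t")
        if seqid in mapping:
            renamed.append(mapping[seqid] + "\t" + rest)
        else:
            unmapped.add(seqid)
    return renamed, unmapped


# Toy mapping and annotation lines for illustration only.
toy_mapping = load_mapping(["1\tchr1\n", "MT\tchrM\n"])
toy_gtf = [
    "1\texample\tgene\t100\t200\t.\t+\t.\tgene_id \"EXAMPLE0001\";\n",
    "MT\texample\tgene\t10\t90\t.\t-\t.\tgene_id \"EXAMPLE0002\";\n",
    "EXAMPLE_SCAFFOLD\texample\tgene\t5\t50\t.\t+\t.\tgene_id \"EXAMPLE0003\";\n",
]
converted, missing = rename_chromosomes(toy_gtf, toy_mapping)
print(len(converted), sorted(missing))
```

Whether unmapped lines should be dropped or kept is a design choice. Dropping them keeps the annotation limited to names that exist in the reference genome you map against.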

Once both are done, “redetect” the datatype (pencil icon > Edit attributes forms > “Datatypes” tab). The result should be gtf if all was done correctly. Avoid directly assigning the “datatype” to datasets whenever possible. If Galaxy cannot detect the expected datatype, then there is almost always some formatting problem that needs to be addressed.

I added some tags to your post that cover very similar Q&A. Click on any to review. This FAQ is also a useful resource:

https://galaxyproject.org/support/ >> Extended Help for Differential Expression Analysis Tools

Thanks!
