Following up on my request to add the dog genome, I ran into a problem when aligning reads and then running featureCounts. The table below shows our read counts for CanFam4 (first column) and CanFam3 (second column).
We aligned against CanFam3 as available within Galaxy, and against CanFam4, which we downloaded from NCBI in fna.gz format. Alignment with HISAT2 went well, with the majority of reads aligned.
However, when running featureCounts with annotations from UCSC, as you can see for CanFam3, some reads are assigned, but fewer than 20%, with many reads in Unassigned_NoFeatures and also in Unassigned_MultiMapping.
With CanFam4, which has many more transcripts, we expected the assigned category to increase. Instead, nothing is assigned at all, the multimapping and no-feature counts go up, and the ambiguity category disappears.
Dog is less well annotated than human or mouse, but with CanFam4 we should get better assignment of reads to transcripts. Why do we now have 0 assigned reads, and so many multimapping and no-feature reads? Is it possible that our annotation and genome don't match?
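One quick way to test the mismatch idea is to compare the sequence names in the genome FASTA with the names in the first column of the annotation file. featureCounts matches reads to features by these names, so if none are shared, nothing can be assigned. NCBI FASTA headers typically use accession-style names, while UCSC annotations use names like `chr1`. This is a minimal sketch, not a definitive diagnostic; the helper names and the file names in the usage comment are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain-text or .gz file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].split()[0])
    return names


def gtf_names(path):
    """Collect sequence names from column 1 of a GTF/GFF file."""
    names = set()
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report which sequence names are shared and which appear in one file only."""
    in_fasta = fasta_names(fasta_path)
    in_gtf = gtf_names(gtf_path)
    return {
        "shared": sorted(in_fasta & in_gtf),
        "fasta_only": sorted(in_fasta - in_gtf),
        "gtf_only": sorted(in_gtf - in_fasta),
    }


# Hypothetical usage (file names are placeholders, not the real ones):
# report = compare_names("canFam4_genome.fna.gz", "canFam4_annotation.gtf")
# print(len(report["shared"]), report["fasta_only"][:5], report["gtf_only"][:5])
```

If `shared` comes back empty, the genome and annotation use different naming schemes, and either the FASTA or the annotation would need to be replaced or renamed so that they agree.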
FYI, our RNA-seq data are paired-end and stranded. Do you need any other information?
We are now trying to download the CanFam4 genome from GenBank instead of NCBI. Maybe we should do the same with the annotation, but I'm not sure which one to use there (with UCSC I know which one to pick).
| Category | CanFam4 | CanFam3 |
|---|---|---|
| Assigned | 0 | 10 824 955 |
| Unassigned_Unmapped | 1 462 466 | 2 285 520 |
| Unassigned_MultiMapping | 17 176 987 | 11 328 584 |
| Unassigned_NoFeatures | 51 033 117 | 40 504 328 |
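
For reference, the fractions behind these numbers can be worked out with a short sketch like the one below. It uses only the four categories listed in the table, so the ambiguity category is not included in the totals; the function name is hypothetical.

```python
# Counts copied from the table; the ambiguity category is not listed there
# and is therefore left out of the totals (an assumption of this sketch).
counts = {
    "CanFam4": {
        "Assigned": 0,
        "Unassigned_Unmapped": 1_462_466,
        "Unassigned_MultiMapping": 17_176_987,
        "Unassigned_NoFeatures": 51_033_117,
    },
    "CanFam3": {
        "Assigned": 10_824_955,
        "Unassigned_Unmapped": 2_285_520,
        "Unassigned_MultiMapping": 11_328_584,
        "Unassigned_NoFeatures": 40_504_328,
    },
}


def category_fractions(category_counts):
    """Return each category's share of the summed counts."""
    total = sum(category_counts.values())
    return {name: value / total for name, value in category_counts.items()}


for genome, category_counts in counts.items():
    print(genome)
    for name, fraction in category_fractions(category_counts).items():
        print(f"  {name}: {fraction:.1%}")
```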
Thanks for any help.