I am working with RNA-seq data, from which I obtain counts and other outputs keyed by Ensembl (ENS) gene IDs.
Now I need to calculate the transcript integrity number (TIN) for each of these genes, but it turns out that the TIN calculation tool reports its output by ENS transcript ID, which does not match my gene-level data.
Is there a way, or another tool, to calculate transcript integrity in Galaxy so that I obtain values per gene instead of per transcript?
I also tried converting the transcript IDs into gene IDs, but since several transcripts can belong to the same gene, I end up with several values for the same gene. Therefore, I believe converting the TIN output to gene IDs does not work for me.
I’m most curious about the identifier formats, how the files are structured, and your exact tool settings. A shared history will provide all of that. Thanks!
In this case, the problem is not that the tool does not work. It works, but the results are reported per transcript, whereas I need one overall TIN value per gene that covers all of its transcripts. As it stands, I get multiple values and rows for each gene, so I cannot join these results to the rest of my data, which has only one row per gene.
I don’t know if I made myself clear this time. Sorry for the inconvenience.
My question, however, is: is there any tool in Galaxy that can generate TIN values per gene instead of per transcript?
So … what you want as a result is a single value per gene, not transcript, correct?
Technically, if you just want one TIN value per gene, you can control this either by filtering your reference BED12 input before running the tool, or by filtering your results afterwards.
However, this is not how the metric was intended to be applied. I would suggest that you review the publication associated with the tool before deciding. The authors explain this in much more detail than we can provide here. In particular, review the sections where they compare TIN with common criteria for choosing a “representative” transcript, such as length.
In short: summarizing a per-transcript metric into a per-gene metric is not straightforward, especially for samples with variable expression levels, and this specific tool doesn’t attempt to provide that summary.
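If you go with the first option (filtering the BED12 reference before running the tool), a minimal Python sketch of the "longest transcript per gene" rule could look like the one below. This is only one possible choice of representative, not something the tool or its authors prescribe. The transcript-to-gene lookup `tx2gene`, the function names, and the IDs in the example are all hypothetical; it assumes the BED12 name column (column 4) holds the transcript ID and that "length" means the summed exon sizes from column 11.

```python
def exonic_length(bed12_fields):
    """Sum the exon (block) sizes listed in BED12 column 11."""
    block_sizes = bed12_fields[10].rstrip(",").split(",")
    return sum(int(size) for size in block_sizes if size)


def longest_transcript_per_gene(bed12_lines, tx2gene):
    """Keep one BED12 line per gene: the transcript with the largest exonic length.

    tx2gene is a hypothetical dict mapping transcript ID -> gene ID.
    Transcripts missing from tx2gene are dropped; ties keep the first one seen.
    """
    best = {}  # gene ID -> (exonic length, original BED12 line)
    for line in bed12_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header-style lines
        fields = line.split("\t")
        gene = tx2gene.get(fields[3])  # column 4 = transcript ID (assumed)
        if gene is None:
            continue
        length = exonic_length(fields)
        if gene not in best or length > best[gene][0]:
            best[gene] = (length, line)
    return [line for _, line in best.values()]


# Made-up example: two transcripts for one gene, one transcript for another.
bed12 = [
    "chr1\t100\t1000\tENST0001\t0\t+\t100\t1000\t0\t2\t100,200,\t0,700,",
    "chr1\t100\t1000\tENST0002\t0\t+\t100\t1000\t0\t3\t100,100,200,\t0,300,700,",
    "chr2\t50\t500\tENST0003\t0\t-\t50\t500\t0\t1\t450,\t0,",
]
tx2gene = {"ENST0001": "ENSG0001", "ENST0002": "ENSG0001", "ENST0003": "ENSG0002"}

kept = longest_transcript_per_gene(bed12, tx2gene)
kept_ids = [line.split("\t")[3] for line in kept]
print(kept_ids)
```

The filtered lines can then be written back out as a BED12 file and used as the reference input. If your transcript IDs carry version suffixes (for example `.1`), they would need to match the keys in the lookup table exactly.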
If you do decide to do something custom, you can do data manipulations in Galaxy. This is the tutorial with examples using web tools, but you can also find similar GTN tutorials for R, SQL, and various notebooks.
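For the second option (working on the results afterwards), here is a minimal Python sketch of collapsing per-transcript TIN values to one row per gene. Treat it as an illustration under assumptions, not a validated method: the column names `transcript_id` and `TIN`, the `tx2gene` lookup, and the choice of median and maximum as summaries are all placeholders that you would adjust to your own table and to whatever you decide after reading the publication.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def summarize_tin_per_gene(tin_table, tx2gene, tx_col="transcript_id",
                           tin_col="TIN", drop_zero=False):
    """Collapse per-transcript TIN values into one summary per gene.

    tin_table: iterable of tab-separated lines with a header row.
    tx2gene:   hypothetical dict mapping transcript ID -> gene ID.
    drop_zero: if True, ignore transcripts whose TIN is exactly 0
               (an assumption about how you want to treat unscored transcripts).
    """
    per_gene = defaultdict(list)
    for row in csv.DictReader(tin_table, delimiter="\t"):
        gene = tx2gene.get(row[tx_col])
        if gene is None:
            continue  # transcript not present in the lookup table
        tin = float(row[tin_col])
        if drop_zero and tin == 0:
            continue
        per_gene[gene].append(tin)
    return {
        gene: {
            "n_transcripts": len(values),
            "median_TIN": median(values),
            "max_TIN": max(values),
        }
        for gene, values in per_gene.items()
    }


# Made-up example table and lookup.
table = io.StringIO(
    "transcript_id\tTIN\n"
    "ENST0001\t80.0\n"
    "ENST0002\t60.0\n"
    "ENST0003\t0.0\n"
    "ENST0004\t70.0\n"
)
tx2gene = {
    "ENST0001": "ENSG0001",
    "ENST0002": "ENSG0001",
    "ENST0003": "ENSG0001",
    "ENST0004": "ENSG0002",
}

summary = summarize_tin_per_gene(table, tx2gene, drop_zero=True)
for gene, stats in summary.items():
    print(gene, stats)
```

Whether zero values are kept, and whether a median, a maximum, or the value of a single representative transcript is used, changes the per-gene number considerably, which is exactly why this summary is a judgment call rather than a standard output.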