TEtranscripts input datasets

I want to run TEtranscripts to compare treated and control groups. Each group has 3 technical replicates in a data collection. However, when I set two collections (one for treated and one for control) as inputs for TEtranscripts, the tool requires me to add another input dataset.

I tried separating them into 3 different datasets, one for each technical replicate. The job ran, but that setup did not represent how the technical replicates work.

What should I do?

Welcome @ducbiologygroup

Hopefully I can help to explain! :slight_smile: It sounds like you have one condition and this tool is expecting at least two conditions. It is comparing the relative differences between conditions.

To be clear: Each condition would have the treated replicates normalized by their controls to produce the total quantification for that condition. Then the same would be done for the other. At the end, the two conditions would be compared and the relative differences reported (the “DE” results).

The sub-tool TEcount isn’t available as a stand-alone tool in Galaxy. However, there are several identification tools – please see the RNA Analysis section of the tool panel or browse our training site.



:white_question_mark: Help from the tool form (scroll down in Galaxy to find more)

TEtranscripts annotates reads to genes and transposable elements
Output

TEtranscripts quantifies both gene and transposable element (TE) transcript abundances from RNA-Seq experiments, utilizing both uniquely and ambiguously mapped short read sequences. It processes the short read alignments (BAM files) and proportionally assigns read counts to the corresponding gene or TE based on the user-provided annotation files (GTF files). In addition, TEtranscripts combines multiple libraries and performs differential analysis using DESeq2.
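For orientation only, here is a rough Python sketch of how the inputs described above map onto the stand-alone TEtranscripts command line (outside Galaxy). The flags `-t`, `-c`, `--GTF`, `--TE`, `--format`, `--mode` and `--project` are taken from the authors' command-line usage as I understand it. The file names and the helper `build_tetranscripts_command` are hypothetical, and the Galaxy tool form may expose these options differently.

```python
from typing import List, Sequence


def build_tetranscripts_command(
    treated_bams: Sequence[str],
    control_bams: Sequence[str],
    gene_gtf: str,
    te_gtf: str,
    project: str = "treated_vs_control",  # hypothetical output prefix
) -> List[str]:
    """Assemble an argument list for a stand-alone TEtranscripts run.

    This only builds the command; it does not execute anything.
    """
    if not treated_bams or not control_bams:
        raise ValueError("Both treated and control BAM files are needed.")
    return [
        "TEtranscripts",
        "--format", "BAM",
        "--mode", "multi",          # also use ambiguously mapped reads
        "-t", *treated_bams,        # all treated replicates together
        "-c", *control_bams,        # all control replicates together
        "--GTF", gene_gtf,          # gene annotation
        "--TE", te_gtf,             # transposable element annotation
        "--project", project,
    ]


# Hypothetical file names, three replicates per group.
cmd = build_tetranscripts_command(
    ["treated_rep1.bam", "treated_rep2.bam", "treated_rep3.bam"],
    ["control_rep1.bam", "control_rep2.bam", "control_rep3.bam"],
    gene_gtf="genes.gtf",
    te_gtf="te_annotation.gtf",
)
print(" ".join(cmd))
```

The point of the sketch is only that the replicates of one group are passed together after a single flag, rather than as separate runs.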



Hope this helps and you can ask follow up questions about any of this! :rocket:

Thank you @jennaj ,

I’m not sure if I understand correctly. So basically, I have two conditions (treated and control) and each of them has 3 technical replicates.

Does that mean I can’t use TEtranscripts to identify which TE is up/down-regulated upon treatment?

If I can, then how do I select my files for the input?

Thank you very much.

Maybe I misunderstood.

Do you have two sets of conditions, each with their own control?

If yes, you will want four collections in total: two per condition.

If instead you only have one condition with a control, then there isn’t anything to compare it to, and you can explore what the sample contains itself: the presence/absence of features, and what those features represent, and maybe the relative differences of feature expressions within that condition. But that is not what this tool is doing. It is comparing two different conditions.

I might be explaining this poorly. This is the link to the author’s discussion about what the tool does → https://hammelllab.labsites.cshl.edu/software/#TEtranscripts with more on the TEtranscripts page at PyPI.

There is a routine inside the top-level tool that can perform the counting independently and might be what you are interested in, but it isn’t available by itself in Galaxy (not yet, though we could make a request).

In short, TEtranscripts requires two conditions, each with their own control. This is what is currently hosted in Galaxy. TEcount, in contrast, processes each BAM file independently to generate counts. It could be run on the treated and control samples to produce counts that you could explore in other ways. This sub-tool is not currently wrapped for Galaxy, but it is something we could request.
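To illustrate the per-BAM counting idea, here is a small Python sketch that builds one TEcount command per BAM file for a stand-alone installation (again, not something Galaxy offers right now). The flags `-b`, `--GTF`, `--TE`, `--format`, `--mode` and `--project` follow the authors' usage as I understand it. The helper `build_tecount_commands` and all file names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def build_tecount_commands(
    bams: Sequence[str], gene_gtf: str, te_gtf: str
) -> List[List[str]]:
    """Build one TEcount argument list per BAM file (nothing is executed)."""
    commands = []
    for bam in bams:
        commands.append([
            "TEcount",
            "--format", "BAM",
            "--mode", "multi",
            "-b", bam,                       # a single alignment file per run
            "--GTF", gene_gtf,
            "--TE", te_gtf,
            "--project", Path(bam).stem,     # output prefix named after the BAM
        ])
    return commands


# Hypothetical file names: each sample is counted on its own.
tecount_cmds = build_tecount_commands(
    ["treated_rep1.bam", "control_rep1.bam"],
    gene_gtf="genes.gtf",
    te_gtf="te_annotation.gtf",
)
for c in tecount_cmds:
    print(" ".join(c))
```

Each run would produce its own count output per sample, which could then be combined and explored with other tools.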

Does this help?