Dear all, I’m trying to get TPM values from FASTQ files (mouse, Illumina, paired-end, 150 bp, 20-30M). I tried StringTie, Salmon, Sailfish, and kallisto quant, and I did get results, but I have a couple of questions about the data:
- StringTie reports one gene with multiple TPM values. My understanding is that these come from reads mapped to different sites of the same gene (the output includes the chromosome position of each). Should I just add all the TPM values together to get the final TPM for each gene?
|Rb1cc1| cov 143.169418| FPKM 8.402312| TPM 17.528645|
|Rb1cc1| cov 0.007920| FPKM 0.000465| TPM 0.000970|
|Rb1cc1| cov 0.353770| FPKM 0.020762| TPM 0.043313|
|Rb1cc1| cov 0.006247| FPKM 0.000367| TPM 0.000765|
|Rb1cc1| cov 0.608617| FPKM 0.035718| TPM 0.074515|
|Rb1cc1| cov 1.899041| FPKM 0.111451| TPM 0.232505|
|Rb1cc1| cov 0.131482| FPKM 0.007716| TPM 0.016098|
|Rb1cc1| cov 0.025351| FPKM 0.001488| TPM 0.003104|
|Rb1cc1| cov 0.081178| FPKM 0.004764| TPM 0.009939|
|Rb1cc1| cov 0.661685| FPKM 0.038833| TPM 0.081012|
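To make the question concrete, here is a minimal Python sketch of what "adding the TPM values together" would look like for the rows above. It assumes each row has the pipe-delimited layout shown here (gene name first, then `cov`, `FPKM`, `TPM` fields); the function names are hypothetical and not part of StringTie. Whether summing is the right thing to do for these rows is exactly what I am asking.

```python
from collections import defaultdict

# Rows copied verbatim from the StringTie output shown above.
ROWS = """\
|Rb1cc1| cov 143.169418| FPKM 8.402312| TPM 17.528645|
|Rb1cc1| cov 0.007920| FPKM 0.000465| TPM 0.000970|
|Rb1cc1| cov 0.353770| FPKM 0.020762| TPM 0.043313|
|Rb1cc1| cov 0.006247| FPKM 0.000367| TPM 0.000765|
|Rb1cc1| cov 0.608617| FPKM 0.035718| TPM 0.074515|
|Rb1cc1| cov 1.899041| FPKM 0.111451| TPM 0.232505|
|Rb1cc1| cov 0.131482| FPKM 0.007716| TPM 0.016098|
|Rb1cc1| cov 0.025351| FPKM 0.001488| TPM 0.003104|
|Rb1cc1| cov 0.081178| FPKM 0.004764| TPM 0.009939|
|Rb1cc1| cov 0.661685| FPKM 0.038833| TPM 0.081012|
"""


def parse_row(line):
    """Return (gene_name, {"cov": ..., "FPKM": ..., "TPM": ...}) for one row.

    Assumes the layout '|gene| cov X| FPKM Y| TPM Z|' used in this post.
    """
    fields = line.strip().strip("|").split("|")
    gene = fields[0].strip()
    values = {}
    for field in fields[1:]:
        key, value = field.split()
        values[key] = float(value)
    return gene, values


def sum_tpm_by_gene(text):
    """Add up the TPM column for every row that shares a gene name."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.strip():
            gene, values = parse_row(line)
            totals[gene] += values["TPM"]
    return dict(totals)


totals = sum_tpm_by_gene(ROWS)
print(totals)
```

For these ten rows the first entry contributes almost all of the total, so the sum differs only slightly from the largest single value.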
- The output from Salmon, Sailfish, and kallisto quant is more confusing. The target IDs look like the examples below. Which one should I use to calculate TPM? Or is this an issue with the reference I used (GRCm38)?
ENSMUST00000130201.7|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072660.1|Mrpl15-203|Mrpl15|1894|protein_coding|
ENSMUST00000156816.6|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072659.1|Mrpl15-206|Mrpl15|4203|protein_coding|
ENSMUST00000045689.13|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072661.1|Mrpl15-201|Mrpl15|497|nonsense_mediated_decay|
ENSMUST00000115538.4|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072664.1|Mrpl15-202|Mrpl15|910|processed_transcript|
ENSMUST00000192286.1|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000127355.1|Mrpl15-207|Mrpl15|4600|retained_intron|
ENSMUST00000146665.2|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072662.2|Mrpl15-205|Mrpl15|1569|protein_coding|
ENSMUST00000132625.1|ENSMUSG00000033845.13|OTTMUSG00000029329.3|OTTMUST00000072663.1|Mrpl15-204|Mrpl15|654|retained_intron|
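As a sketch of how these IDs could be handled, the Python below splits one target ID on `|` and sums transcript-level TPM per gene name. The field names are my assumption based on what the values look like (transcript ID, gene ID, two OTT/Havana IDs, transcript name, gene name, length, biotype), the helper names are hypothetical, and the TPM numbers are placeholders for illustration only, not results from my data.

```python
from collections import defaultdict


def parse_target_id(target_id):
    """Split a pipe-delimited target ID into named fields.

    The field order is assumed from the examples in this post.
    """
    fields = target_id.rstrip("|").split("|")
    return {
        "transcript_id": fields[0],
        "gene_id": fields[1],
        "transcript_name": fields[4],
        "gene_name": fields[5],
        "length": int(fields[6]),
        "biotype": fields[7],
    }


def gene_level_tpm(records):
    """Sum transcript-level TPM values that share a gene name."""
    totals = defaultdict(float)
    for target_id, tpm in records:
        totals[parse_target_id(target_id)["gene_name"]] += tpm
    return dict(totals)


# Target IDs copied from the list above; the TPM values are placeholders.
records = [
    ("ENSMUST00000130201.7|ENSMUSG00000033845.13|OTTMUSG00000029329.3|"
     "OTTMUST00000072660.1|Mrpl15-203|Mrpl15|1894|protein_coding|", 1.5),
    ("ENSMUST00000156816.6|ENSMUSG00000033845.13|OTTMUSG00000029329.3|"
     "OTTMUST00000072659.1|Mrpl15-206|Mrpl15|4203|protein_coding|", 2.5),
]

first = parse_target_id(records[0][0])
print(first["transcript_id"], first["gene_id"], first["gene_name"])
print(gene_level_tpm(records))
```

Grouping on `gene_id` instead of `gene_name` would work the same way and avoids merging distinct genes that happen to share a name.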
- Honestly, this is the first time I have worked with RNA-seq data, so I used the default settings for all the programs. Given the sequencing parameters (mouse, Illumina, paired-end, 150 bp, 20-30M), can anyone give me some advice about which settings to use?
Thank you so much, everyone!