Based on the tutorial [Hands-on: Whole transcriptome analysis of Arabidopsis thaliana / Transcriptomics], Salmon shouldn’t cause any problems, but it does! Looking at the first DESeq2 analysis results, I got errors whether I used a GTF file or a TranscriptID-to-GeneID file. [I deleted all the analyses and reran them so you can easily see what is going on in the history.]
DESeq2 outputs 677 and 678 were created from the Salmon gene quantification outputs (TPM values) for w040drought and w040control, together with a GTF file.
The error message:
Import genomic features from the file as a GRanges object … OK
Prepare the ‘metadata’ data frame … OK
Make the TxDb object … OK
‘select()’ returned 1:1 mapping between keys and columns
reading in files with read.delim (install ‘readr’ package for speed up)
1 2 3 4 5 6 7 8 9 10 11 12
reading in files with read.delim (install ‘readr’ package for speed up)
1 2 3 4 5 6 7 8 9 10 11 12
Error in .local(object, …) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [SORBI_3K010100, SORBI_3K025800, SORBI_3K044406, …]
Example IDs (tx2gene): [EER90453, OQU90574, EER90454, …]
This can sometimes (not always) be fixed using ‘ignoreTxVersion’ or ‘ignoreAfterBar’.
Calls: get_deseq_dataset … tximport → summarizeToGene → summarizeToGene → .local
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The “phase” metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
What I understand from this error message is that the transcript IDs in the quantification files are not present in the first column of the tx2gene file. In other words, the annotation file I am using does not match the transcript IDs in my quantification files. This is odd, since I used the same GTF file. The file is attached below; as you previously instructed someone else on the forum, it has no header and is tab-delimited.
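To double-check the mismatch outside Galaxy, here is a minimal Python sketch (the Arabidopsis-style IDs are invented stand-ins for the real quant.sf and tx2gene files) that compares the transcript IDs on both sides. It also shows how a version suffix alone can break the join, which is the situation the error's `ignoreTxVersion` hint refers to:

```python
import csv
import io

# Toy stand-ins for the real files (IDs here are invented): Salmon's quant.sf
# has a header row with the transcript ID in the "Name" column; the tx2gene
# mapping is headerless with the transcript ID in the first column.
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "AT1G01010.1\t1688\t1500.0\t1.2\t10\n"
    "AT1G01020.1\t1200\t1000.0\t0.5\t4\n"
)
tx2gene = "AT1G01010\tAT1G01010\nAT1G01020\tAT1G01020\n"

def first_column(text, skip_header=False):
    """Collect the first tab-separated field of every non-empty row."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    if skip_header:
        next(rows, None)
    return {row[0] for row in rows if row}

quant_ids = first_column(quant_sf, skip_header=True)
map_ids = first_column(tx2gene)

print("overlap:", len(quant_ids & map_ids))  # 0: versioned vs unversioned IDs

# Stripping the ".1" version suffix restores the match -- the case
# ignoreTxVersion handles. Completely different namespaces, as in the
# SORBI_... vs EER... log above, instead mean the two files come from
# different annotations and the mapping must be regenerated.
stripped = {tid.split(".")[0] for tid in quant_ids}
print("after stripping versions:", len(stripped & map_ids))
```

Run against the real files, an overlap of zero with unrelated-looking example IDs on each side means the GTF and the transcriptome Salmon was indexed on do not match.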
Later on, I thought I might get around the error by using a TranscriptID-to-GeneID file instead. I prepared it with the Gffread and Cut tools and fed it to DESeq2 after switching the option to “Gene mapping format: Transcript-ID to Gene-ID mapping file.” Then I ran the analysis:
DESeq2 outputs 680 and 679 were created from the Salmon gene quantification outputs (TPM values) for w040drought and w040control, together with the TranscriptID-to-GeneID file.
Error message:
reading in files with read.delim (install ‘readr’ package for speed up)
1 2 3 4 5 6 7 8 9 10 11 12
Error in `$<-.data.frame`(`*tmp*`, “TXNAME”, value = character(0)) :
replacement has 0 rows, data has 48558
Calls: get_deseq_dataset → $<- → $<-.data.frame
I searched for this error message and found that it indicates a row-count mismatch when adding a new column to a data.frame in R: the new TXNAME column being added has 0 rows, while the existing data frame has 48,558 rows, and that mismatch triggers the error.
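On the chance that the mapping file itself is the problem: a stray header row, a non-tab delimiter, or a gene-ID-first column order could each leave the importer with zero usable TXNAME values, which would match “replacement has 0 rows”. A quick sanity-check sketch on toy content (invented IDs; substitute the actual Gffread + Cut output):

```python
import csv
import io

# Toy tx2gene content (IDs invented); replace with the Gffread + Cut output.
tx2gene = "AT1G01010.1\tAT1G01010\nAT1G01020.1\tAT1G01020\n"

rows = [r for r in csv.reader(io.StringIO(tx2gene), delimiter="\t") if r]
widths = {len(r) for r in rows}
print("column counts seen:", widths)   # want exactly {2}: transcript, gene
print("first row:", rows[0])           # the transcript ID must come first
has_header = rows[0][0].lower() in {"txname", "transcript", "transcript_id"}
print("unexpected header row:", has_header)
```

If the column count is not 2, the first field is a gene ID, or a header row shows up, that alone can explain the empty TXNAME column.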
Then I thought it would be better to create count files from the Salmon transcript quantification files. To do this, I used the tximport tool. After obtaining the combined count file, I split it into two files for my DESeq2 analysis and, as you suggested in your previous answers, converted them to tabular format.
DESeq2 outputs 682 and 681 were created using 666 Drought_counts_w040, 665 Drought_counts_w040, and 667 sample info.
Error message:
Error in DESeqDataSetFromMatrix(countData = tbl, colData = subset(sample_table, :
ncol(countData) == nrow(colData) is not TRUE
Calls: get_deseq_dataset → DESeqDataSetFromMatrix → stopifnot
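This stopifnot failure just means the number of count columns does not equal the number of rows in the sample info file. A toy check (sample names here are invented) that pinpoints which side is off:

```python
# Toy versions of the two inputs (sample names invented): the header of the
# tabular counts file and the sample column of the sample info file.
count_header = ["GeneID", "drought_1", "drought_2", "control_1", "control_2"]
sample_names = ["drought_1", "drought_2", "control_1"]  # one row lost in the split

count_samples = count_header[1:]  # the gene-ID column is not a sample
print(len(count_samples), "count columns vs", len(sample_names), "sample rows")
missing = [s for s in count_samples if s not in sample_names]
print("in counts but missing from sample info:", missing)
# DESeq2's stopifnot(ncol(countData) == nrow(colData)) fails exactly when
# these two lengths differ -- for example if the gene-ID column is counted
# as a sample, or a sample sheet row was dropped while splitting the
# tximport output into two files.
```

Comparing the counts header against the sample info rows for each of the two split files should show where the dimensions diverge.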
Now, I am open to suggestions to resolve this problem. Here is the link to my history:
Note: You might suggest doing this analysis with other tools. I know I can, but I like Salmon; skipping the trimming step and avoiding reproducibility issues are nice…