Proteogenomics help needed: How to get genomic coordinates of identified proteins?

nathanieltay · May 11, 2022, 2:25pm

Hi,

I am trying to fetch the genomic coordinates of mass spec identified proteins. The process of identifying these proteins is as follows:

1. Align RNA-Seq paired-end read files to the human reference genome (input of 43 x 2 paired-end read files → output of 43 x BAM files).
2. Assemble transcripts via StringTie with reference to the Ensembl GTF (output = 43 x GTF files).
3. Merge all 43 GTFs into a single GTF file (output = 1 x merged.gtf file).
4. Extract the transcript sequences of merged.gtf using the human reference genome (output = 1 x merged.fasta file).
5. Translate the sequences in 3 frames (output = 1 x 3-frame-translated.fasta).
6. Split the translated sequences at every stop codon (output = 1 x protein_db.fasta).
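
To make steps 5 and 6 concrete, here is a minimal Python sketch of a 3-frame translation that splits at stop codons while recording where each piece came from in the transcript. It is not the tool I actually used. The function names and the header fields (frame, tx_start, tx_end) are made up for illustration, the standard genetic code is assumed, and offsets are 0-based, half-open, relative to the transcript sequence.

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(seq, frame):
    """Translate seq starting at offset `frame` (0, 1 or 2); unknown codons become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def split_orfs(transcript_id, seq, min_len=1):
    """Yield (header, peptide) for every stop-to-stop piece in all 3 frames.

    The header carries hypothetical fields (frame, tx_start, tx_end) giving the
    0-based, half-open nucleotide interval of the piece within the transcript.
    """
    for frame in range(3):
        protein = translate_frame(seq, frame)
        aa_start = 0
        for piece in protein.split("*"):
            if len(piece) >= min_len:
                nt_start = frame + 3 * aa_start
                nt_end = nt_start + 3 * len(piece)
                header = f"{transcript_id}|frame={frame}|tx_start={nt_start}|tx_end={nt_end}"
                yield header, piece
            # Skip past this piece and the stop codon that ended it.
            aa_start += len(piece) + 1


if __name__ == "__main__":
    for header, peptide in split_orfs("t1", "ATGGCCTAAGGG"):
        print(f">{header}\n{peptide}")
```

Keeping the transcript ID and the transcript-relative offsets in each FASTA header is the part that matters here: without them, a peptide in protein_db.fasta cannot be traced back to a position in merged.gtf.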

I would now like to map the proteins from step 6 back to their genomic coordinates. I tried to follow this Proteogenomics tutorial, starting from the “Transcript Assembly” section. At the “Evaluate the assembly with annotated transcripts” section, I used the merged.gtf from step 3 above as input. I then followed the “Translate transcripts” section, which takes merged.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.2bit as input. Next, I tried to follow the “Creating FASTA Databases” section, but I could not, because I do not have a genomic_mapping.sqlite database to use as input.
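
To illustrate the kind of mapping I am after, here is a rough Python sketch, not something taken from the tutorial. It assumes that, for each peptide, I know the transcript it came from and its nucleotide start and end within that transcript (0-based, half-open), and that the exon coordinates and strand of the transcript are read from merged.gtf (1-based, inclusive, as in GTF). The function name and argument layout are hypothetical.

```python
def transcript_to_genome(exons, strand, tx_start, tx_end):
    """Map a transcript-relative interval to genomic blocks.

    exons    : list of (start, end) tuples, 1-based inclusive (GTF convention)
    strand   : "+" or "-"
    tx_start : 0-based start within the spliced transcript
    tx_end   : 0-based exclusive end within the spliced transcript
    Returns a list of (start, end) genomic blocks, 1-based inclusive,
    sorted by genomic position. A peptide spanning a splice junction
    yields more than one block.
    """
    # Walk the exons in transcript order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    blocks = []
    offset = 0  # transcript offset of the first base of the current exon
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start + 1
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + ex_len)
        if lo < hi:  # the interval overlaps this exon
            if strand == "-":
                g_end = ex_end - (lo - offset)
                g_start = ex_end - (hi - offset) + 1
            else:
                g_start = ex_start + (lo - offset)
                g_end = ex_start + (hi - offset) - 1
            blocks.append((g_start, g_end))
        offset += ex_len
    return sorted(blocks)


if __name__ == "__main__":
    exons = [(101, 110), (201, 210)]
    # A 10-nt interval that crosses the splice junction on the forward strand.
    print(transcript_to_genome(exons, "+", 5, 15))
    # The first codon of a reverse-strand transcript.
    print(transcript_to_genome(exons, "-", 0, 3))
```

The resulting blocks could then be written out as BED or GFF features, but this only works if the transcript-relative offsets of each peptide are recorded somewhere.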

Can anyone advise how I might go about solving this problem? I am open to using non-Galaxy tools as well, although I have had much less success there.
