RNA seq analysis on transgenic animal

Freek_vE · March 29, 2023, 11:40am

We are doing RNAseq on a transgenic animal and need to see expression from the transgene.
As the transgene is not present in the “standard” genome, I think the relevant sequences cannot be mapped and will not be represented after HISAT2 analysis. Does anyone know a smart workaround? The transgenic construct is 11 kb, but I don’t see an option to provide a “custom genome”, where I either create a genome with the extra 11 kb sequence added or simply use the 11 kb construct as a “mini genome”.

There must be a way to do this, but I am just a beginner bioinformatics-wise. Any pointers?

wm75 · March 29, 2023, 11:57am

In HISAT2, specifically, there is “Source for the reference genome” right at the top of the tool interface, which lets you switch between “Use a built-in genome” and “Use a genome from history”, the latter being what you are looking for.

Analogous options exist for alternative mappers like STAR.

Michael_Thon · March 29, 2023, 3:23pm

I suggest you add your 11 kb sequence to the genome. You might have to provide a GFF3/GTF file of the annotations, and if that’s the case you would have to add a few lines to the reference genome’s GFF3 file to indicate where the gene and exons are in the 11 kb segment.
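
One way to add the sequence without opening the genome in an editor is to stream the file and write the construct as an extra FASTA record at the end. Below is a minimal Python sketch, assuming an uncompressed FASTA genome. The function name `append_construct`, the record name `my_construct` and the file names in the usage comment are hypothetical.

```python
def append_construct(genome_in, genome_out, name, sequence, width=80):
    """Copy a FASTA genome line by line, then add one extra record.

    genome_in and genome_out are open text file objects, so the genome
    is streamed and never held in memory as a whole.
    Returns the number of bases written for the new record.
    """
    last = "\n"
    for line in genome_in:
        genome_out.write(line)
        last = line
    # Make sure the new header starts on its own line.
    if not last.endswith("\n"):
        genome_out.write("\n")

    # Drop whitespace and line breaks from the pasted construct sequence.
    cleaned = "".join(sequence.split()).upper()
    genome_out.write(f">{name}\n")
    for start in range(0, len(cleaned), width):
        genome_out.write(cleaned[start:start + width] + "\n")
    return len(cleaned)


# Usage (hypothetical file names):
# with open("genome.fna") as src, open("genome_plus_construct.fna", "w") as dst:
#     n_bases = append_construct(src, dst, "my_construct", construct_sequence)
```

The returned base count is the construct length, which is the end coordinate needed for the matching annotation lines.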

Freek_vE · March 29, 2023, 3:35pm

Thank you! I will give that a try.
F

George_Aranjuez · March 29, 2023, 8:12pm

Which approach did you end up using? I have the exact same situation.

Freek_vE · March 31, 2023, 7:51am

I am still working on this!
As I am a complete novice I need to google everything. I just managed to download the right version of my zebrafish genome (I hope) and am now starting to look at how to edit it. I will keep a log of exactly what I did and keep you posted.

Freek_vE · March 31, 2023, 9:29am

Dear Michael, thanks for your great pointers.
I got hold of the GFF file and genome sequence but was wondering how to add lines.

As you can see in the snapshot, the GFF file has lots of references to GenBank sequences or other accession numbers. As we use a homemade construct, we don’t have such references. I could simply add “homemade” or “custom” instead, but I was wondering: will this confuse HISAT2? Or does it only need the added genome sequence and the coordinates, and interpret the other information simply as a label?

Michael_Thon · April 1, 2023, 2:58pm

Hi,
The last column will probably need the ID= tag. The first column, of course, is the name of your sequence. I can only suggest some trial and error to get it working. You could do some test alignments using only your sequence and your custom GFF to make sure that HISAT2 is recognizing your annotations. Once that works, you just combine everything and analyze the full data set.
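
For such a test, the annotation lines for the construct can be generated rather than typed by hand. The sketch below assumes standard GFF3 (nine tab-separated columns, 1-based inclusive coordinates, `ID=` and `Parent=` in the last column). The sequence name `my_construct`, the source label `custom`, the gene name `GFP` and the coordinates are all made up for illustration, and a single-exon transcript is assumed.

```python
def gff3_line(seqid, ftype, start, end, strand, attributes, source="custom"):
    """Build one GFF3 line: nine tab-separated columns."""
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score and phase are "." because they do not apply to these features.
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr]
    return "\t".join(columns)


def construct_annotation(seqid, length, gene_name, start, end, strand="+"):
    """Return region, gene, mRNA and exon lines for a single-exon gene."""
    gene_id = f"gene-{gene_name}"
    rna_id = f"rna-{gene_name}"
    return [
        gff3_line(seqid, "region", 1, length, "+", {"ID": f"{seqid}:1..{length}"}),
        gff3_line(seqid, "gene", start, end, strand, {"ID": gene_id, "Name": gene_name}),
        gff3_line(seqid, "mRNA", start, end, strand, {"ID": rna_id, "Parent": gene_id}),
        gff3_line(seqid, "exon", start, end, strand,
                  {"ID": f"exon-{gene_name}-1", "Parent": rna_id}),
    ]


# Hypothetical example: an 11,000 bp construct with one gene at 101..820.
for line in construct_annotation("my_construct", 11000, "GFP", 101, 820):
    print(line)
```

The name in the first column has to match the FASTA header of the construct exactly (up to the first space), otherwise the annotation will not be attached to the sequence.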

Freek_vE · April 3, 2023, 6:30am

Thanks, that was what I was planning :-)

Freek_vE · April 25, 2023, 10:48am

After a few trials this has worked.
It is best to do this in small steps:

1. Download the zebrafish genome sequence from “NCBI genomes” by clicking on “genome” (image below).

[image: NCBI genomes download links]

2. Open the file in TextEdit, scroll to the end, copy the header of the mitochondrial sequence and paste it again at the end of the file. Make a “fake change” to the copy: I just added an extra 7 to the accession number and made it into a “pitochondrion”. Then paste the nucleotide sequence of your transgenic construct under that header (note how many bases your construct has). Save this genome on your hard drive. Below is how it looked in the end.

[image: end of the edited genome file]

3. Download IGV (Downloads | Integrative Genomics Viewer), choose genome>load from file, and load your genome to see whether IGV accepts it and has added the new bit of sequence at the end. Do this before you do anything else with that custom genome; I didn’t check and paid the price.
4. Download the GFF file (see the first image again; click on “GFF”).
5. Open the GFF file in TextEdit, scroll towards the end and find the header for the mitochondrial DNA. Copy the header, paste it below the mitochondrial annotation data but before the last “end of file” sign, and modify it as needed: the number of bases, the accession number (I added “7” in my case) and some gene info. Here is how it looked in the end; note the change in the accession number and the length of the sequence in several places, and the new genes that I created in the construct.

[image: edited GFF file]

6. Save the modified GFF file and load it in IGV as well (where the fish genome is already loaded) using file>load from file.
7. If you can see your annotations in the new bit of DNA, you can do the next step: upload the modified genome file to Galaxy and use it as a local genome (i.e. from “history”) in HISAT2 to map your sequence files (file type: fq.gz) and create BAM files.
8. Download a resulting BAM file and try to add it to IGV (file>load), where the genome and the GFF file are already loaded.

No guarantees, but perhaps these are at least a few useful pointers.
You get a few prompts from IGV; I just did what they asked for.
If you are successful (or not), let me know.
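
Alongside the visual check in IGV, the edited files can be cross-checked with a short script before uploading to Galaxy. This is only a sketch under simple assumptions: plain-text FASTA and GFF3 input, sequence names taken as the header text up to the first space, and no features that wrap around a circular sequence. The function names are hypothetical.

```python
def fasta_lengths(lines):
    """Return {sequence name: number of bases} from FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # header text up to the first space
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_gff(gff_lines, lengths):
    """Return a list of problems found in the GFF lines (empty if none)."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {number}: expected 9 columns, found {len(cols)}")
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            problems.append(f"line {number}: sequence '{seqid}' is not in the genome")
        elif not (1 <= start <= end <= lengths[seqid]):
            problems.append(
                f"line {number}: {start}..{end} falls outside 1..{lengths[seqid]}"
            )
    return problems


# Usage (hypothetical file names):
# with open("genome_plus_construct.fna") as fa, open("annotation_plus_construct.gff") as gff:
#     for problem in check_gff(gff, fasta_lengths(fa)):
#         print(problem)
```

An empty result only means that the names and coordinates are consistent; it does not show that HISAT2 or IGV will interpret the gene information as intended.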

brebelo · April 26, 2023, 10:33am

Hello @Freek_vE, what type of text editor did you use? Was it on Galaxy?

Thank you.
Bárbara

Freek_vE · April 26, 2023, 10:55am

I have a Mac, which has “TextEdit” built in, a programme that can open and edit a lot of file types.
I have not tried to do this in Galaxy. TextEdit worked, so I didn’t bother looking elsewhere; there might be an option in Galaxy as well.

brebelo · April 26, 2023, 10:59am

I tried on Windows but the file is too heavy. I am searching for how to do it in Galaxy or Linux, but I was just wondering if you had the same problem. Thank you.

Freek_vE · April 26, 2023, 10:59am

Googling brings up Notepad2 or Sublime Text as PC equivalents of TextEdit.

My Mac is also sighing under the weight of these files!
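
If the genome file is too heavy for a text editor, the header lines can be listed by streaming the file instead of opening it whole, which is enough to find the mitochondrial header and see what the last record is called. A minimal Python sketch, assuming a FASTA file that is either plain text or gzip-compressed (detected here simply from a `.gz` file extension); the function names and the file name in the usage comment are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_headers(path):
    """Yield (line number, header line) for every record in a FASTA file.

    The file is read one line at a time, so memory use stays small
    even for a whole-genome file.
    """
    with open_text(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                yield number, line.rstrip("\n")


# Usage (hypothetical file name):
# for number, header in fasta_headers("genome.fna.gz"):
#     print(number, header)
```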
