RNA-seq analysis on a transgenic animal

We are doing RNA-seq on a transgenic animal and need to see expression from the transgene.
As the transgene is not present in the “standard” genome, I think the relevant sequences cannot be mapped and will not be represented after HISAT2 analysis. Does anyone know a smart workaround? The transgenic construct is 11 kb, but I don’t see options to provide a “custom genome”, where I either create a genome with the extra 11 kb sequence added or simply use the 11 kb construct as a “mini genome”.

There must be a way to do this, but I am just a beginner bioinformatics-wise. Any pointers?

In HISAT2, specifically, there is “Source for the reference genome” right at the top of the tool interface, which lets you switch between “Use a built-in genome” and “Use a genome from history”, the latter being what you are looking for.

Analogous options exist for alternative mappers like STAR.

I suggest you add your 11 kb sequence to the genome. You might have to provide a GFF3/GTF file of the annotations, and if that’s the case you would have to add a few lines to the reference genome’s GFF3 file to indicate where the gene and exons are in the 11 kb segment.
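
For illustration, the few extra GFF3 lines could look roughly like what the sketch below prints. This is only a minimal sketch: the sequence name `transgene_construct`, the gene name `myTransgene`, the source label `custom` and the exon coordinates are all made up, and which feature types and attributes your downstream tools actually need is something to test.

```python
# Sketch: build GFF3 lines for a transgene that was added to the genome
# as its own sequence. Every name and coordinate here is hypothetical.

def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Return one tab-separated GFF3 line (9 columns, 1-based inclusive coordinates)."""
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score and phase are left as "." (not applicable for these features).
    return "\t".join([seqid, source, ftype, str(start), str(end), ".", strand, ".", attr])


def transgene_gff3(seqid, length, gene_name, exons, strand="+"):
    """Build a sequence-region directive plus gene, mRNA and exon lines.

    exons is a list of (start, end) tuples within 1..length.
    """
    for exon_start, exon_end in exons:
        if not 1 <= exon_start <= exon_end <= length:
            raise ValueError(f"exon {exon_start}-{exon_end} is outside 1..{length}")
    gene_id = f"gene-{gene_name}"
    rna_id = f"rna-{gene_name}"
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    lines = [f"##sequence-region {seqid} 1 {length}"]
    lines.append(gff3_line(seqid, "custom", "gene", gene_start, gene_end, strand,
                           {"ID": gene_id, "Name": gene_name}))
    lines.append(gff3_line(seqid, "custom", "mRNA", gene_start, gene_end, strand,
                           {"ID": rna_id, "Parent": gene_id}))
    for number, (exon_start, exon_end) in enumerate(sorted(exons), start=1):
        lines.append(gff3_line(seqid, "custom", "exon", exon_start, exon_end, strand,
                               {"ID": f"exon-{gene_name}-{number}", "Parent": rna_id}))
    return lines


# Hypothetical 11 kb construct with two exons.
print("\n".join(transgene_gff3("transgene_construct", 11000, "myTransgene",
                               [(500, 1200), (3000, 4500)])))
```

The first column of every line has to be the same name that the construct has in the genome file.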

Thank you! I will give that a try.
F

Which approach did you end up using? I have the exact same situation.

I am still working on this!
As I am a complete novice I need to google everything… I just managed to download the right version of my zebrafish genome (I hope) and am now starting to look at how to edit it. I will keep a log of exactly what I did and keep you posted.

Dear Michael, thanks for your great pointers.
I got hold of the GFF file and genome sequence but was wondering how to add lines.


As you can see in the snapshot, the GFF file has lots of references to GenBank sequences or other accession numbers. As we use a homemade construct, we don’t have such references. I could simply add “homemade” or “custom” instead, but was wondering: will this confuse HISAT2? Or does it only need the added genome sequence and the coordinates, and interpret the other information simply as a label?

Hi,
The last column will probably need the ID= tag. The first column, of course, is the name of your sequence. I can only suggest some trial and error to get it working. You could do some test alignments using only your sequence and your custom GFF to make sure that HISAT2 is recognizing your annotations. Once that works, you just combine everything and analyze the full data set.
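
One quick check before the test alignments is whether every sequence name in the first column of the GFF also exists in the genome FASTA. The sketch below assumes that the sequence name is the first word after `>` in each FASTA header, which is how many tools read it; the function names are just for illustration.

```python
# Sketch: find GFF sequence names that have no matching FASTA record.
# Assumption: the sequence name is the first whitespace-delimited word
# after ">" in the FASTA header.

def fasta_names(fasta_lines):
    """Collect sequence names from an iterable of FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def gff_seqids(gff_lines):
    """Collect column-1 values from an iterable of GFF lines, skipping comments."""
    seqids = set()
    for line in gff_lines:
        if line.strip() and not line.startswith("#"):
            seqids.add(line.split("\t")[0])
    return seqids


def unmatched_seqids(fasta_lines, gff_lines):
    """Return the GFF sequence names that are missing from the FASTA, sorted."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))


# Tiny made-up example: the last GFF line uses a name that is not in the FASTA.
example_fasta = [">chr1 Danio rerio", "ACGT", ">transgene_construct custom", "ACGT"]
example_gff = [
    "##gff-version 3",
    "chr1\tcustom\tgene\t1\t4\t.\t+\t.\tID=gene-a",
    "transgene_construct\tcustom\tgene\t1\t4\t.\t+\t.\tID=gene-b",
    "typo_name\tcustom\tgene\t1\t4\t.\t+\t.\tID=gene-c",
]
print(unmatched_seqids(example_fasta, example_gff))
```

Both functions read line by line, so open file handles can be passed in instead of lists, which avoids loading a whole genome into memory.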

Thanks, that was what I was planning :-)

After a few trials this has worked.
It is best to do this in small steps…

  1. Download the zebrafish genome sequence from “NCBI genomes”: click on “genome” (image below).

  2. Open it in TextEdit, scroll to the end, copy the header of the mitochondrial sequence and paste it again at the end of the file. Make a “fake change” to it: I just added an extra 7 to the accession number and made it into a “pitochondrion”. Then paste your transgenic nucleotide sequence under that header (note how many bases your construct has). Save this genome on your hard drive. Below is how it looked in the end.

  3. Download IGV (Downloads | Integrative Genomics Viewer), choose genome > load from file, and load your genome to see if IGV accepts it and has added the new bit of sequence at the end. (Do this before you do anything else with that custom genome… I didn’t check and paid the price :-( )

  4. Download the GFF file: see the image above again and click on “GFF”.

  5. Open the GFF file in TextEdit, scroll towards the end and find the header for the mitochondrial DNA. Copy the header, paste it below the mitochondrial annotation data but before the final “end of file” sign, and modify it as needed: the number of bases, the accession number (I added “7” in my case) and some gene info.
    Here is how it looked in the end. Note the change in the accession number and the length of the sequence in several places, and the new genes that I created in the construct.

  6. Save the modified GFF file and load it in IGV as well (where the fish genome is already loaded) using file > load from file.
    If you can see your annotations in the new bit of DNA, you can do the next step: upload the modified genome file to Galaxy and use it as a local genome (i.e. from “history”) in HISAT2 to map your sequence files (file type: fq.gz) and create BAM files.

  7. Download a resulting BAM file and try to load it into IGV (file > load), where the genome and the GFF file are already loaded.
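
If the genome file is too big to open comfortably in a text editor, step 2 can also be done without opening it. A minimal sketch, assuming an uncompressed FASTA genome; the function name, the file names and the sequence name are placeholders, not anything provided by Galaxy or HISAT2.

```python
# Sketch: append a construct to a genome FASTA without opening the
# genome in a text editor. Assumes the genome file is uncompressed.
import shutil


def append_construct(genome_path, construct_seq, name, out_path, width=80):
    """Copy genome_path to out_path and append construct_seq as a new record.

    Returns the number of bases appended (useful later for the GFF file).
    """
    # Drop whitespace and line breaks that come along when pasting a sequence.
    seq = "".join(construct_seq.split()).upper()
    shutil.copyfile(genome_path, out_path)  # the original genome is left untouched
    with open(out_path, "rb+") as handle:
        handle.seek(0, 2)  # jump to the end of the file
        needs_newline = False
        if handle.tell() > 0:
            handle.seek(-1, 2)
            needs_newline = handle.read(1) != b"\n"
        handle.seek(0, 2)
        if needs_newline:
            # Make sure the new header starts on its own line.
            handle.write(b"\n")
        handle.write(f">{name}\n".encode("ascii"))
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width].encode("ascii") + b"\n")
    return len(seq)


# Hypothetical usage (file names are placeholders):
# n_bases = append_construct("zebrafish.fna", open("construct.txt").read(),
#                            "transgene_construct", "zebrafish_plus_construct.fna")
```

Loading the output file in IGV, as in step 3, is still a good check that the new sequence is really there.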

No guarantees, but perhaps these are at least a few useful pointers.
You get a few prompts from IGV… I just did what they asked for.
Let me know whether or not you are successful.

Hello @Freek_vE, what type of text editor did you use? Was it on Galaxy?

Thank you.
Bárbara

I have a Mac, which has “TextEdit” built in, a programme that can open and edit a lot of file types.
I have not tried to do this in Galaxy. TextEdit worked, so I didn’t bother looking elsewhere; there might be an option on Galaxy as well.

I tried on Windows, but the file is too heavy. I am looking into how to do it in Galaxy or Linux, but I was just wondering if you had the same problem. Thank you.

Googling brings up NotePad2 or SublimeText as PC equivalents of TextEdit.

My Mac is also sighing under the weight of these files!