Hello, I'm very new to the bioinformatics field!
I am working with nanopore sequencing data and trying to analyze it.
I've tried to create a new workflow, but I'm not 100% sure that what I am doing is correct. Could you please help me out?
I started with the following steps:
- single FASTQ files → Porechop (Galaxy Version 0.2.4+galaxy0) → fastp (Galaxy Version 0.24.0+galaxy3)
Afterwards I wasn't sure which of these 3 options would be the correct continuation:
Test 2.1) Minimap2 (Galaxy Version 2.28+galaxy0) with the reference genome Homo_sapiens.GRCh38.cdna.all.fa → featureCounts (Galaxy Version 2.0.6+galaxy0) with Homo_sapiens.GRCh38.111.gtf
Test 2.2A) Minimap2 (Galaxy Version 2.28+galaxy0) with the reference transcriptome RefTranscriptRealShitGalaxy370-ob__gencode.v47.transcripts.fa.gz__cb.fasta (uncompressed) → featureCounts (Galaxy Version 2.0.6+galaxy0) with Homo_sapiens.GRCh38.111.gtf
Test 2.2B) Minimap2 with the reference transcriptome RefTranscriptRealShitGalaxy370-ob__gencode.v47.transcripts.fa.gz__cb.fasta (uncompressed) → Sambamba sort (Galaxy Version 1.0.1+galaxy1) → Salmon quant (Galaxy Version 1.10.1+galaxy2) with the reference transcriptome RefTranscriptRealShitGalaxy370-ob__gencode.v47.transcripts.fa.gz__cb.fasta (uncompressed) and the BioMart gene list RefTranscriptRealShitGalaxy370-[gencode.v47.transcripts.fa.gz].fasta.gz
The following inputs were used, as shown in the attachment:
Reference genome FASTA: Homo_sapiens.GRCh38.cdna.all.fa
GTF file (genome): Homo_sapiens.GRCh38.111.gtf
Reference transcriptome FASTA: RefTranscriptRealShitGalaxy370-ob__gencode.v47.transcripts.fa.gz__cb.fasta (uncompressed)
BioMart gene list: RefTranscriptRealShitGalaxy370-[gencode.v47.transcripts.fa.gz].fasta.gz
One more question: Porechop is taking really long, more than 24 h for a single file. Is this normal? How can I make it run faster?
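In case the file size matters for judging the runtime, here is a minimal sketch for counting the reads and bases in one FASTQ file. The function name `fastq_stats` and the file name in the usage line are only illustrative, and the sketch assumes standard 4-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from pathlib import Path


def fastq_stats(path):
    """Return (number_of_reads, total_bases) for a plain or gzipped FASTQ file.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    """
    path = Path(path)
    # Pick the gzip reader only when the file name ends in .gz.
    opener = gzip.open if path.suffix == ".gz" else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each 4-line record.
            if line_number % 4 == 1:
                reads += 1
                bases += len(line.strip())
    return reads, bases
```

It could be called as `fastq_stats("my_sample.fastq.gz")`, where the file name is a placeholder; the two numbers can then be reported alongside the runtime.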
Many thanks for your help!