I want to use the HISAT2 tool for read alignment, but the reference genome that I need is not available under the built-in options provided.
The tool form says to contact the Galaxy team if the genome of interest is not listed.
Hi @FraCoppo
In Galaxy, HISAT2 and other mapping tools can handle a custom reference genome. Upload the genome of interest as a FASTA file, and in the job setup change the Source for the reference genome to "From history". HISAT2 will then index the genome and map the reads.
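For reference, the "From history" option corresponds to two command-line steps: building an index from the FASTA file with hisat2-build, then aligning the reads against that index with hisat2. The sketch below only assembles those two command lines for paired-end reads; the file names and the index name "custom_index" are hypothetical, and the Galaxy wrapper may pass additional options.

```python
import shlex
from typing import List


def hisat2_commands(reference_fasta: str, reads_1: str, reads_2: str,
                    index_base: str = "custom_index",
                    output_sam: str = "aligned.sam") -> List[List[str]]:
    """Return [index-building command, alignment command] for paired-end reads."""
    # Step 1: build a HISAT2 index from the custom reference genome.
    build = ["hisat2-build", reference_fasta, index_base]
    # Step 2: map the read pairs against that index and write SAM output.
    align = ["hisat2", "-x", index_base, "-1", reads_1, "-2", reads_2,
             "-S", output_sam]
    return [build, align]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in hisat2_commands("my_genome.fasta",
                               "sample_R1.fastq.gz", "sample_R2.fastq.gz"):
        print(shlex.join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine with HISAT2 installed; in Galaxy both steps happen inside a single job.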
Hope that helps.
Kind regards,
Igor
Thanks a lot, Igor!
Your suggestion was very helpful and allowed me to go ahead with the analysis.
Then I ran featureCounts on the resulting BAM files, supplying a custom .gff annotation file manually (as I did for the reference genome). Unfortunately, the featureCounts analysis does not work. I also made sure that the gene annotation file corresponds to the same reference genome.
Do you have any other suggestions? Is there another way to transform reads from BAM to gene expression?
Hi Francesco,
What do the standard output and/or standard error log files say? Click the Info (i) icon and check the log files in the middle panel.
Do you get summary files? If yes, what do the summaries say?
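If a featureCounts summary file was produced, it is a small tab-separated table: a "Status" header line followed by one row per category (Assigned, Unassigned_NoFeatures, Unassigned_Ambiguity, etc.) with a read count. The sketch below parses a single-sample summary and reports the assigned fraction and the largest unassigned category; the function names and the example numbers are made up for illustration.

```python
from typing import Dict


def parse_featurecounts_summary(text: str) -> Dict[str, int]:
    """Parse a featureCounts summary; only the first sample column is read."""
    counts: Dict[str, int] = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "Status":
            continue  # skip the header line and malformed lines
        counts[fields[0]] = int(fields[1])
    return counts


def assigned_fraction(counts: Dict[str, int]) -> float:
    """Fraction of all reported reads that were assigned to a feature."""
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


def top_unassigned(counts: Dict[str, int]) -> str:
    """Name of the Unassigned_* category with the most reads ('' if none)."""
    unassigned = {k: v for k, v in counts.items() if k.startswith("Unassigned_")}
    return max(unassigned, key=unassigned.get) if unassigned else ""


if __name__ == "__main__":
    # Made-up summary for illustration, not real results.
    example = ("Status\tsample.bam\n"
               "Assigned\t0\n"
               "Unassigned_Unmapped\t1200\n"
               "Unassigned_NoFeatures\t98000\n")
    c = parse_featurecounts_summary(example)
    print(assigned_fraction(c), top_unassigned(c))
```

A large Unassigned_NoFeatures count means the reads did not overlap any feature of the selected type, which is what you would typically see with mismatched contig names or a feature type that is absent from column 3.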
Two common causes of no reads being assigned to genes are different chromosome/contig names in the BAM file and in the annotation, and a mismatch between the annotation attributes and the job setup. For example, the gene annotation might not have a gene_id attribute.
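The first cause can be checked by comparing the sequence names in the reference FASTA with column 1 (seqid) of the annotation. A minimal sketch, assuming the BAM header uses the FASTA header text up to the first whitespace (as aligners commonly do); the helper names and the example records are hypothetical.

```python
from typing import Iterable, Set, Tuple


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def annotation_seqids(lines: Iterable[str]) -> Set[str]:
    """Column 1 (seqid) of every non-empty, non-comment GTF/GFF line."""
    ids: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_names(fasta_lines: Iterable[str],
                  annotation_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (seqids found in the reference, seqids missing from the reference)."""
    ref = fasta_names(fasta_lines)
    ann = annotation_seqids(annotation_lines)
    return ann & ref, ann - ref


if __name__ == "__main__":
    # Hypothetical records: the annotation says "chr1", the FASTA says "NZ_CP000001.1".
    fasta = [">NZ_CP000001.1 Some bacterium chromosome", "ACGTACGT"]
    gff = ["##gff-version 3", "chr1\tRefSeq\tCDS\t1\t90\t.\t+\t0\tID=cds1"]
    print(compare_names(fasta, gff))
```

If the second set is non-empty, features on those seqids can never overlap any read, so featureCounts will report those reads as having no features.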
If you share a history and post its URL here, I or someone else can check what is going on. If the history is big, maybe copy one sample into a new history and share that. It will help if the shared history contains the reference genome, the gene annotation, a BAM file and a failed counting job.
Kind regards,
Igor
Hi Igor, you are helping me step by step, thank you.
The error seems to be: "failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is ‘gene_id’".
So you are probably right that the annotation file has a problem. Is it possible to edit these files to correct them?
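Before editing anything, it helps to see what the file actually contains: which feature types appear in column 3, and whether column 9 carries a GTF-style gene_id "..." attribute (the attribute named in the error message). The sketch below tallies both; the regular expression targets GTF syntax only, so a GFF3 file using key=value attributes such as ID=... will report zero gene_id lines. The function name and example records are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# GTF attribute syntax: gene_id "VALUE"; at the start of column 9 or after a ';'.
GTF_GENE_ID = re.compile(r'(?:^|;)\s*gene_id\s+"[^"]*"')


def inspect_annotation(lines: Iterable[str]) -> Tuple[Counter, int, int]:
    """Return (column-3 feature type counts, lines with gene_id, total records)."""
    feature_types: Counter = Counter()
    with_gene_id = 0
    total = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column GTF/GFF record
        total += 1
        feature_types[cols[2]] += 1
        if GTF_GENE_ID.search(cols[8]):
            with_gene_id += 1
    return feature_types, with_gene_id, total


if __name__ == "__main__":
    # Hypothetical GFF3-style records: no GTF gene_id attribute, no "exon" features.
    gff3 = [
        "##gff-version 3",
        "NC_1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene-1",
        "NC_1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tID=cds-1;Parent=gene-1",
    ]
    print(inspect_annotation(gff3))
```

If "exon" is missing from the feature type counts, or the gene_id count is zero, the featureCounts job setup and the annotation do not agree, whatever the file extension says.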
Hi Francesco,
A bit of background on annotation files and the read counting procedure. Genomic annotations are made of features, such as exons, CDS, gene starts, etc. In formats like GTF and GFF/GFF3, each feature has two coordinates, start and end. A complex feature, such as a spliced gene, contains information about lower-level features, such as exons. The type of a feature is described in column 3 (CDS, gene, etc.), while the relationships between different features are described in column 9. Reads are counted per feature, and the counts are then aggregated per attribute value (for example, per gene_id).

Check the job setup for your featureCounts job. GFF feature type is specified as “exon” (info from column 3), and GFF gene identifier is set to “gene_id” (info from column 9). Preview the gene annotation file used for the featureCounts job (dataset #133): it does not contain “exon” in column 3, and it does not contain “gene_id” in column 9. Now check the GTF version of the annotation (dataset #134). It does not contain “exon” in column 3 either, but it does have the “gene_id” attribute in column 9, e.g. gene_id “BQ8897_RS00005”.

Use the GTF file instead of the GFF file and change GFF feature type to “CDS” or “gene”. I usually prefer “CDS”, as I am not interested in tRNAs etc.
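To make "counted per feature, aggregated per gene_id" concrete, here is a toy model of that idea. It is a simplified sketch, not featureCounts itself: reads are plain contiguous intervals (no splicing, strand or read pairing), a read overlapping several features of the same gene is counted once for that gene, and a read overlapping features of more than one gene is left uncounted, roughly mirroring featureCounts' default handling of ambiguous reads. All names and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Feature(NamedTuple):
    seqid: str    # column 1
    start: int    # column 4, 1-based inclusive
    end: int      # column 5, 1-based inclusive
    gene_id: str  # value of the gene_id attribute in column 9


def count_by_gene(features: List[Feature],
                  reads: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Assign each (seqid, start, end) read to the single gene whose features it overlaps."""
    counts: Dict[str, int] = defaultdict(int)
    for seqid, r_start, r_end in reads:
        # Collect gene_ids (not individual features) overlapped by this read.
        genes: Set[str] = {
            f.gene_id for f in features
            if f.seqid == seqid and f.start <= r_end and r_start <= f.end
        }
        if len(genes) == 1:
            counts[genes.pop()] += 1
        # 0 genes: no feature overlapped; >1 genes: ambiguous, not counted.
    return dict(counts)


if __name__ == "__main__":
    # Toy annotation: gene A has two features, gene B has one.
    feats = [Feature("NC_1", 1, 100, "A"), Feature("NC_1", 201, 300, "A"),
             Feature("NC_1", 401, 500, "B")]
    reads = [("NC_1", 90, 210),   # spans both features of A: counted once
             ("NC_1", 250, 260),  # inside A
             ("NC_1", 450, 460),  # inside B
             ("NC_1", 600, 650),  # no feature
             ("chr1", 50, 60),    # seqid not in the annotation
             ("NC_1", 290, 410)]  # overlaps A and B: ambiguous
    print(count_by_gene(feats, reads))
```

This also shows why both job settings matter: only features whose column-3 type matches the selected feature type would be loaded into the feature list, and without a gene_id value there is nothing to aggregate on.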
Hope this makes sense. Search the forum for additional information: this situation has been discussed on several occasions.
Kind regards,
Igor