I am trying to run Extract Genomic DNA on the Daphnia magna reference genome (FASTA, assembly ASM2063170v1.1) and the corresponding GTF file. I downloaded both files from NCBI and uploaded them to Galaxy. I have specified the comment lines for the GTF file.
Next, I tried to launch Extract Genomic DNA by selecting the genome FASTA and the GTF file from my history. The tool refuses to launch and shows a warning that I cannot resolve: "Unspecified genome build, click the pencil icon in the history item to set the genome build"
**Fetch sequences for intervals in**
Since I uploaded the files to Galaxy myself, there is of course no built-in genome build available for Daphnia magna. I have also noticed that Galaxy marks my genome FASTA as (unavailable). Why?
Can someone please help me figure out what is going on here? I think it might have to do with the custom genome build. By the way, I made sure the FASTA is correctly formatted by using Galaxy's built-in FASTA harmonizing tool, NormalizeFasta.
Thanks for your suggestions. The FASTA and GTF files are correctly formatted and are also interpreted as such by Galaxy. However, I noticed that the FASTA sequences contain a mix of lowercase and uppercase letters.
Could this be the reason why the fasta is not recognized as a custom genome?
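To see how much of the sequence is actually lowercase (commonly a sign of soft-masked repeats), something like the following could be run locally on the downloaded file. This is only a rough sketch: the function names `read_fasta`, `case_summary` and `uppercase_fasta` are made up for illustration, and whether letter case has anything to do with Galaxy not accepting the FASTA as a custom genome is an untested assumption on my side.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, i.e. the text before the first whitespace.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def case_summary(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each identifier to (lowercase count, all other characters)."""
    summary = {}
    for name, seq in read_fasta(lines):
        lower = sum(1 for base in seq if base.islower())
        summary[name] = (lower, len(seq) - lower)
    return summary


def uppercase_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield the same lines with sequence lines uppercased, headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle instead.
    example = [">chr1 demo\n", "ACGTacgt\n", "NNnn\n", ">chr2\n", "acgt\n"]
    print(case_summary(example))
    print("".join(uppercase_fasta(example)), end="")
```

If the counts show a large lowercase fraction, the uppercased copy could be uploaded as a separate dataset to test whether case makes any difference; the original file stays untouched either way.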
Thank you also for suggesting bedtools MaskFastaBed. However, I run into the same problem there: Galaxy does not recognize my FASTA as a usable reference genome sequence, although it is correctly identified as FASTA.
I will sit down and create a new, clean history of my work and share it here, hoping that someone can identify the problem. Thanks @jennaj for sharing this valuable information with me.