Hello,
I’m trying to get an organelle sequence using the GetOrganelle tool.
However, after uploading the files, I got a fasta file divided into “scaffolds” 1-4. When I upload this file to GeSeq to generate a plastome graph, it generates 4 graphs, one for each scaffold.
Is there a way of getting the full, undivided plastome sequence so I can generate a single graph?
Welcome @asteraceae
You are uploading the fasta file to GeSeq, correct?
From what I can tell, yes, each sequence is reported in a separate graph by default. The help at that site states that multiple scaffolds/contigs can be combined. I didn’t find any advice about how to format this but may have missed it, and I didn’t find a direct support contact but please double check me.
Suggestions
Your idea to combine the scaffolds into a single fasta sequence is what I would try too! This would involve some decisions.
- Order of scaffolds.
  - The default order of 1-4 would be a good place to start.
- How to designate the breakpoints between different scaffolds.
  - For nucleotide sequence, it is common practice to add a “gap” string such as NNN[N] of some standard length.
  - The length should probably be distinguishable from actual gaps (if any) and the same for all.
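To make the two decisions above concrete, here is a minimal Python sketch of what "joining with a gap" means: it reads the scaffolds in file order and concatenates them with a fixed run of N between each pair. This is only an illustration, not what gfastats does internally. The function names, the default gap length of 50, and the output name new_scaffold are placeholders chosen for the example, and every scaffold is assumed to be in forward orientation.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) tuples, in file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the identifier is the first whitespace-delimited
            # token of the title line.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def join_scaffolds(records, gap_length=50, new_name="new_scaffold"):
    """Concatenate sequences in the given order, separated by a fixed-length N gap."""
    gap = "N" * gap_length
    joined = gap.join(seq for _, seq in records)
    return f">{new_name}\n{joined}\n"


if __name__ == "__main__":
    example = ">scaffold1\nACGT\n>scaffold2\nGGCC\n"
    # A short gap is used here only to keep the printed result readable.
    print(join_scaffolds(read_fasta(example), gap_length=5))
```

Keeping the gap length the same everywhere, and longer than any run of N already in the scaffolds, makes the breakpoints easy to find again later.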
How to
Try a tool like gfastats.
Instructions are here → gfastats/instructions at main · vgl-hub/gfastats · GitHub
Then, create a SAK text file (txt datatype when uploading this to Galaxy) with your JOIN instructions. Here is an example line, modified to have four scaffolds and three gaps:
JOIN scaffold1+ scaffold2+ scaffold3+ scaffold4+ 50 gap1 50 gap2 50 gap3 new_scaffold
Where scaffold1+ is the identifier for the first scaffold in the fasta title line.
For example, if the fasta has >myscaffold, this would be myscaffold+ in the file.
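If you would rather not copy the identifiers by hand, a small Python sketch like the one below can list them with the + suffix already attached. It assumes the identifier is the first whitespace-delimited token after > in each title line, which is the usual fasta convention but is worth checking against what the tool reports. The function name sak_identifiers is made up for this example.

```python
def sak_identifiers(fasta_text, orientation="+"):
    """Return each sequence identifier from FASTA text with an orientation suffix."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: identifier = first token of the title line.
            parts = line[1:].split()
            if parts:
                ids.append(parts[0] + orientation)
    return ids


if __name__ == "__main__":
    example = ">myscaffold some description\nACGT\n>other\nGGCC\n"
    for identifier in sak_identifiers(example):
        print(identifier)
```

The printed identifiers can then be pasted into the JOIN line in the order you decided on.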
The tool should be able to handle special characters in the identifiers. But if it complains, you can rename them with a tool like Rename sequences, using the option Numerical counter.
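For reference, renaming with a numerical counter amounts to something like the sketch below: every title line is replaced by a simple prefix plus a running number, and the sequence lines are left alone. This is a plain Python illustration of the idea, not the Galaxy tool's code, and the prefix scaffold is just an example choice.

```python
def rename_sequences(fasta_text, prefix="scaffold"):
    """Replace each FASTA title line with '>' + prefix + running counter."""
    out = []
    counter = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            counter += 1
            out.append(f">{prefix}{counter}")
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">contig|1:len=4 extra\nACGT\n>contig|2:len=4\nGGCC\n"
    print(rename_sequences(example), end="")
```

The original titles are lost in the renamed file, so keep a copy of the original fasta if you need to trace which scaffold is which.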
Please give this a try! 