Fragmented plastome sequence

Hello,

I’m trying to get an organelle sequence using the GetOrganelle tool.

However, after uploading the files, I got a FASTA file divided into “scaffolds” 1-4. When I upload this file to GeSeq to generate a plastome graph, it generates 4 graphs, one for each scaffold.
Is there a way of getting the full, undivided plastome sequence so I can generate a single graph?

Welcome @asteraceae

You are uploading the FASTA file to GeSeq, correct?

From what I can tell, yes, each sequence is reported in a separate graph by default. The help at that site states that multiple scaffolds/contigs can be combined. I didn’t find any advice about how to format this but may have missed it, and I didn’t find a direct support contact but please double check me.

Suggestions

Your idea to combine the scaffolds into a single FASTA sequence is what I would try too! This would involve some decisions.

  1. Order of scaffolds.

    • The default order of 1-4 would be a good place to start.
  2. How to designate the breakpoints between different scaffolds.

    • For nucleotide sequence, it is common practice to add in a “gap” string such as NNN[N] of some standard length.
    • The gap length should probably be distinguishable from any actual gaps already in the scaffolds, and the same at every junction.
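To make those two decisions concrete, here is a minimal Python sketch of the idea: it keeps the scaffolds in file order and joins them with a fixed run of N characters. The function names, the 5 bp gap in the demo, and the new_scaffold name are all placeholders chosen for illustration. It also assumes every scaffold is used in its forward orientation, which may not be true for your assembly. Treat it as a way to see what the joined record looks like, not as a replacement for a dedicated tool.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs, in file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the identifier is the first whitespace-delimited token
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def join_scaffolds(records, gap_length=50, new_name="new_scaffold"):
    """Concatenate sequences in the given order, separated by a run of N."""
    gap = "N" * gap_length
    joined = gap.join(seq for _, seq in records)
    return f">{new_name}\n{joined}\n"


# Tiny made-up example; real scaffolds would be read from your FASTA file.
example = """>scaffold1
ACGT
>scaffold2
GGCC
>scaffold3
TTAA
>scaffold4
CATG
"""

joined_fasta = join_scaffolds(read_fasta(example), gap_length=5)
print(joined_fasta)
```

With real data you would likely use a longer gap (the 50 used further down is one option) so the junctions are easy to spot.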

How to

Try a tool like gfastats.

Instructions are here → gfastats/instructions at main · vgl-hub/gfastats · GitHub

The form will look something like this

Then, create a SAK text file (txt datatype when uploading this to Galaxy) with your JOIN instructions. Yours will look like the first line here:

Here is that line copied out and modified to have four scaffolds and three gaps:

JOIN    scaffold1+    scaffold2+   scaffold3+   scaffold4+    50  gap1  50  gap2  50 gap3 new_scaffold

Where scaffold1+ is the identifier for the first scaffold in the fasta title line.

Example:

If the FASTA title line is >myscaffold, this would be myscaffold+ in the file.
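If you are unsure what the identifiers in your file are, a few lines of Python can list them. This is only a sketch: it assumes the identifier is the first whitespace-delimited word after the > on each title line, and the helper name and example headers are made up. It then appends the + suffix shown in the example above.

```python
def fasta_identifiers(text):
    """Return the identifier from each FASTA title line, in file order."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: identifier = first word after ">", rest is description
            parts = line[1:].split()
            ids.append(parts[0] if parts else "")
    return ids


# Made-up headers for illustration only.
example = ">myscaffold length=1234\nACGT\n>other_scaffold\nGGCC\n"

names = [identifier + "+" for identifier in fasta_identifiers(example)]
print(names)  # prints ['myscaffold+', 'other_scaffold+']
```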


The tool should be able to handle special characters in the identifiers. If it complains, you can use a tool like this to rename them: Rename sequences with the option Numerical counter.

Please give this a try! :slight_smile: