Hi there,
I am trying to use FastTree to generate a tree that I can visualize in Phandango.
I have sequence data for 4 E. coli isolates. Three of them are from our own collection (natural strains), and the fourth is the NCBI reference genome of E. coli K-12 MG1655. (This strain was also part of the experiments, so I wanted to include it in the tree.)
Originally, I annotated the three strains myself with Prokka (on Linux). Using the resulting GFF3 files, I ran Roary followed by FastTree (nucleotide) on these three isolates, and it worked (I used the core alignment and the Newick file from Roary).
I wanted to expand the analysis and include more sequences (for a better understanding of the core genome), so I started simple by adding the 4th sequence (MG1655). Running Prokka on Linux for this sequence was no longer an option for me (I changed institutions), so I decided to rerun Prokka on all four isolates, including MG1655.
Then I ran Roary on those 4 strains, and it worked.
When running FastTree (nucleotide) on them, I got the following error: “This job was terminated because it used more memory than it was allocated”.
Since there are only 4 isolates, I wondered whether this is really a memory problem (I run the jobs on Galaxy as a guest, not on a cloud server).
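One way to judge whether the input is unexpectedly large is to look at the dimensions of the core alignment that is passed to FastTree. The following is only a minimal sketch in Python, under some assumptions: the alignment is an aligned FASTA file (Roary normally writes it as core_gene_alignment.aln), gaps are counted as a distinct character when looking for variable columns, and the helper names read_fasta and alignment_summary are made up for illustration. A tiny toy alignment stands in for the real file.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a dict of name -> sequence."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def alignment_summary(seqs):
    """Return sequence count, alignment length and number of variable columns."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length, so this is not an alignment")
    length = lengths.pop()
    # A column is variable if it holds more than one distinct character
    # (case-insensitive; a gap '-' counts as its own character here).
    variable = sum(
        1 for column in zip(*seqs.values())
        if len({c.upper() for c in column}) > 1
    )
    return {"n_sequences": len(seqs), "length": length, "variable_columns": variable}


# Toy alignment used only for illustration.
toy = """>a
ACGT-A
>b
ACGTTA
>c
ACCTTA
""".splitlines()

print(alignment_summary(read_fasta(toy)))

# For a real file (path is an assumption):
# with open("core_gene_alignment.aln") as handle:
#     print(alignment_summary(read_fasta(handle)))
```

If the real alignment turns out to be only a few megabases long with 4 sequences, the input size alone would be an unlikely explanation for the memory error.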
Maybe relevant… or not:
- The first three strains were annotated from scaffolds, while MG1655 was annotated from the chromosome sequence.
- The Newick files from Roary for the 3 strains and the 4 strains look like this (P.S. I shortened the names of the three sequences in the last one):
3 strains: (38.27_PROKKA_09102020.gff:1.843208323,39.62_PROKKA_09102020.gff:0.806511634,9.54_PROKKA_09082020.gff:0.000000005);
4 strains:
(39.62.gff:0.417292218,9.54.gff:0.138018002,(38.27.gff:0.253699967,MG1655.gff:0.218665394)1.000:5.990038221);
From the Newick files, I see a big change, but I don’t know whether it is caused by real differences between the samples or by the processing. Trying different possibilities is tricky, as Roary takes around 12 h to complete for every combination of strains that I am using.
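To compare the two trees more directly, the terminal branch lengths can be pulled out of the Newick strings and put side by side. The sketch below is a minimal Python example using only the standard library. It is a regular-expression shortcut that works for simple strings like the two above, not a general Newick parser, and the function names (short_name, leaf_lengths, internal_branches) are hypothetical.

```python
import re

# The two Newick strings from Roary, copied from the post.
THREE = ("(38.27_PROKKA_09102020.gff:1.843208323,"
         "39.62_PROKKA_09102020.gff:0.806511634,"
         "9.54_PROKKA_09082020.gff:0.000000005);")
FOUR = ("(39.62.gff:0.417292218,9.54.gff:0.138018002,"
        "(38.27.gff:0.253699967,MG1655.gff:0.218665394)1.000:5.990038221);")

# A leaf is a label directly after '(' or ',' followed by ':length'.
LEAF = re.compile(r"(?<=[(,])([^(),:;]+):([0-9.eE+-]+)")
# An internal branch is an optional label directly after ')' followed by ':length'.
INTERNAL = re.compile(r"\)([^(),:;]*):([0-9.eE+-]+)")


def short_name(label):
    """Reduce '38.27_PROKKA_09102020.gff' or '38.27.gff' to '38.27'."""
    label = label.split("_PROKKA")[0]
    if label.endswith(".gff"):
        label = label[:-4]
    return label


def leaf_lengths(newick):
    """Map each shortened leaf name to its terminal branch length."""
    return {short_name(name): float(length) for name, length in LEAF.findall(newick)}


def internal_branches(newick):
    """List (label, length) pairs for internal branches."""
    return [(label, float(length)) for label, length in INTERNAL.findall(newick)]


three = leaf_lengths(THREE)
four = leaf_lengths(FOUR)

for strain in sorted(four):
    before = three.get(strain)
    before_text = "absent" if before is None else f"{before:.9f}"
    print(f"{strain}: 3-strain tree = {before_text}, 4-strain tree = {four[strain]:.9f}")

print("internal branches in the 4-strain tree:", internal_branches(FOUR))
```

In the 4-strain string, the longest branch by far is the internal one (5.990038221) that separates 38.27 and MG1655 from the other two isolates.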
P.S. I am quite new to WGS analysis, so I apologize if I have done something stupid.
Thank you in advance.