BAM file to FASTA file - Genome assembly

Hello,
I need your help: I am looking for a way to convert my BAM file into a FASTA file.
Here is my project:
I sequenced the genome of my species and produced a de novo assembly with SPAdes. I then aligned the SPAdes FASTA file against my non-annotated reference genome (I only have contigs) with LASTZ; the output is a BAM file. Now I would like to pile up these data to obtain a more accurate FASTA genome sequence.
What can I do?

I tested a lot of tools such as concatenate, merge, and GetFastaBed, but none of them does what I need: I don’t want only the sequences that mapped to the reference genome, I want to create a new genome that combines these two files (the reference genome and the BAM file).

I hope this is clear.

Thank you

Marion


Is there a protocol you are following? Combining new reads (pre-assembled or not) with an existing assembly to close/fill gaps involves processing steps beyond simply mapping one to the other. Please start by reviewing these Galaxy Tutorials and linked references:

More about how/where to run tutorials:

Not all assembly-related tools are covered in the tutorials, but this should give you an idea of where you are now and of possible analysis paths. If your genome is small (prokaryotic), using a public Galaxy server can work. For larger (eukaryotic) genomes, more resources will likely be needed, which would involve setting up your own Galaxy server with sufficient resources. Any tool that you want to use could be wrapped for Galaxy if it hasn’t been already. Check the Galaxy Tool Shed to find out whether it is wrapped or not.

We can follow up from there as needed: e.g. help for how to wrap a tool or request that it be wrapped, appropriate domain-specific public Galaxy servers, how to set up your own Galaxy, and the like.

Hi,
Thank you for your answer. I followed the Galaxy Tutorials but I didn’t find a solution. Could you please tell me which tools to run, and in which order, to obtain a FASTA file from my BAM?

What tools can I use after SPAdes and LASTZ? I followed the “Making sense of a newly assembled genome” tutorial, but it only covers prokaryotes. I am working with a eukaryotic species, and I have my own private Galaxy server.


Hi Marion,

Sharing the tutorials was intended to help you understand the factors to consider. Bacterial genomes are covered because they are simpler to assemble during training. There isn’t a “one-size-fits-all” workflow for eukaryotic genome assembly, whether de novo, reference-guided (same or cross-species), or involving the extension/finishing of existing contigs/scaffolds/long reads.

You already used SPAdes as wrapped for Galaxy, but perhaps without supplying the existing contigs as an input? You could try that and evaluate the result.
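As a rough sketch of what that could look like on the command line, outside Galaxy, the snippet below only builds (and prints) a SPAdes command that supplies the existing contigs alongside the reads. The helper `build_spades_command`, all file names, and the assumption of paired-end reads are hypothetical and not from this thread. `--untrusted-contigs` and `--trusted-contigs` are SPAdes options for passing in existing contigs, but please check the manual for your installed version before relying on them.

```python
import shlex


def build_spades_command(reads_1, reads_2, contigs, out_dir, trusted=False):
    """Build (but do not run) a SPAdes command that also uses existing contigs.

    Hypothetical helper: every file name is a placeholder supplied by the
    caller, and paired-end reads are an assumption of this sketch.
    """
    # Choose the option that matches how much the existing contigs are trusted.
    contig_flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,         # forward reads
        "-2", reads_2,         # reverse reads
        contig_flag, contigs,  # existing contigs used as additional input
        "-o", out_dir,         # output directory
    ]


if __name__ == "__main__":
    cmd = build_spades_command(
        "reads_1.fastq",
        "reads_2.fastq",
        "reference_contigs.fasta",
        "spades_with_contigs",
    )
    # Print a shell-ready command line for inspection before running anything.
    print(shlex.join(cmd))
```

Contigs from a reference that may differ from your sample are probably safer to pass as untrusted, which is why that is the default in this sketch.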

A broader assembly overview that you might find helpful (new):

Dominguez Del Angel V, Hjerde E, Sterck L et al. Ten steps to get started in Genome Assembly and Annotation [version 1; referees: 2 approved]. F1000Research 2018, 7 (ELIXIR):148
(https://doi.org/10.12688/f1000research.13598.1)

In short, review publications and tool manuals to determine the best pathway for your analysis goals, then construct your own workflow in Galaxy.

Once you have an idea about your intended processing method, check to see if those tools are wrapped for Galaxy or not in the ToolShed’s “Assembly” group. Tools that are not yet wrapped could be.

To “test drive” many of the existing wrapped assembly tools, install them into your Galaxy or review/try the tools under the tool group “Assembly” at public Galaxy servers. Galaxy EU is a good resource as it incorporates more assembly tools than the Galaxy Main and Galaxy AU public servers. Be sure to use downsampled data – a full genome would probably exceed resources on any public server.
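If you need a quick way to produce downsampled data before uploading, here is a minimal sketch that keeps a random fraction of the reads from a FASTQ file. The function name `downsample_fastq` and the file names are made up for illustration, and the code assumes plain-text, uncompressed FASTQ with exactly four lines per read. Dedicated tools are likely a better choice for real work.

```python
import random


def downsample_fastq(in_path, out_path, fraction, seed=42):
    """Write roughly `fraction` of the reads in a FASTQ file to a new file.

    Assumes uncompressed FASTQ with 4 lines per record (no wrapped sequences).
    Returns a tuple (reads_kept, reads_seen).
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = seen = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            # Stop at end of file or on an incomplete trailing record.
            if not record[3]:
                break
            seen += 1
            # One random draw per read decides whether it is kept.
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, seen


if __name__ == "__main__":
    # Hypothetical file names: keep about 5% of the reads.
    kept, seen = downsample_fastq("reads_1.fastq", "reads_1.sub.fastq", 0.05)
    print(f"kept {kept} of {seen} reads")
```

For paired-end data, running the function on both files with the same `seed` and `fraction` should keep mates together, provided both files list the reads in the same order.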