I am currently trying to put together an avian reference genome using the Vertebrate Genomes Project (VGP) workflow on Galaxy. To provide some context, I have Hi-C and HiFi reads from the same individual (a female specimen), so I am following the Hi-C-phased assembly with hifiasm mode.
I have been able to follow the assembly tutorial successfully up until the scaffolding step. Just after mapping the Hi-C forward and reverse reads (using ‘BWA-MEM’), the tutorial instructs you to merge the alignments using the tool ‘filter and merge’. However, this tool does not run properly, and the error message says: “the input files do not have the same sequence names or lengths”.
So I went back and looked at my haplotype assemblies and found that Hap1 has 792 contigs while Hap2 has 382 contigs. I suspect that this difference in contig number might be the cause of the issue at the scaffolding step, but I am unsure. I also wonder whether this is because my specimen is a female bird (females are the heterogametic sex in birds) and the sex chromosomes are affecting this. However, because this is my first time assembling a genome, I am not sure what is happening or how to resolve the issue. I am happy to provide more details if needed.
Thank you for your help in advance. I really appreciate it!
Nirjana
The assembly workflow I am following:
Delphine Lariviere, Alex Ostrovsky, Cristóbal Gallardo, Anna Syme, Linelle Abueg, Brandon Pickett, Giulio Formenti, Marcella Sozzoni, Anton Nekrutenko, Vertebrate genome assembly using HiFi, Bionano and Hi-C data - Step by Step (Galaxy Training Materials). https://training.galaxyproject.org/training-
It includes this note, which I think is relevant for you. We should probably put this at the top of the other workflow to make the usage clearer (I’ll follow up on it!).
For your own data, the important part is to use the exact same reference target dataset for both the forward and reverse read BWA-MEM mapping steps. This way the BAM index is based on the same assembly version (specifically, the same string of reference bases), and you’ll be able to merge the BAMs together. The result is all of the mappings combined, using the same coordinate system. Then you can rerun the entire process on the other haplotype assembly and proceed to the downstream workflow.
How to check
Click into the Job Information Details view for both of the input BAMs and check the Inputs/Parameters table.
Was the same reference target dataset selected for both the F and R mappings? This would be your Hap1 assembly OR your Hap2 assembly, not both at the same time. You will be resolving each assembly independently first. Comparisons between assemblies are a downstream analysis process.
Were the forward and reverse Hi-C reads from the same sample, at the same processing stage? Mixed up samples will lead to other problems.
And, because of the way these data are merged, the sort order is important, so also double-check that both had the QNAME sorting toggled on, or you will run into yet another problem!
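If you want to double-check the sort order outside Galaxy, here is a minimal Python sketch. It assumes you have exported the read names (QNAMEs, the first column of `samtools view` output) from each BAM into a list, and that the merge step walks the two files in parallel, pairing records by read name, so both must list the same reads in the same order. The function names are hypothetical and not part of any Galaxy tool.

```python
from itertools import groupby
from typing import Iterable, List, Optional


def collapse_qnames(qnames: Iterable[str]) -> List[str]:
    """Collapse consecutive repeats of the same read name.

    A read can appear on several lines (e.g. primary plus supplementary
    alignments), so only the order of distinct reads is compared.
    """
    return [name for name, _ in groupby(qnames)]


def first_order_mismatch(fwd_qnames: Iterable[str], rev_qnames: Iterable[str]) -> Optional[int]:
    """Return the index of the first read where the two files disagree, or None if the order matches."""
    fwd = collapse_qnames(fwd_qnames)
    rev = collapse_qnames(rev_qnames)
    for i, (a, b) in enumerate(zip(fwd, rev)):
        if a != b:
            return i
    # All shared positions agree, but one file has extra reads at the end.
    if len(fwd) != len(rev):
        return min(len(fwd), len(rev))
    return None
```

A result of `None` means the two files list the same reads in the same order; an integer points at the first distinct read (counting from 0) where they diverge.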
You can also run a tool on the BAMs that pulls the reference target’s index out into a summary table. This is simply another way to check whether the two target choices differed. You should expect the first two columns (sequence name and length) to be identical for both BAMs; the last two count columns (mapped and unmapped reads) may differ, and that’s fine.
Samtools idxstats reports stats of the BAM index file
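To compare the two idxstats tables without eyeballing them, a small Python sketch like the one below could help. It assumes each table has been saved as plain text in the standard `samtools idxstats` layout (tab-separated: sequence name, sequence length, mapped reads, unmapped reads) and compares only the first two columns, as described above. The function names and the example contig names are hypothetical.

```python
from typing import List, Tuple


def parse_idxstats(text: str) -> List[Tuple[str, int]]:
    """Keep only (sequence name, sequence length) from each idxstats row.

    The mapped/unmapped count columns are ignored because they are
    expected to differ between the forward and reverse BAMs. The final
    '*' row that idxstats prints for unplaced reads parses like any other row.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        rows.append((fields[0], int(fields[1])))
    return rows


def compare_idxstats(fwd_text: str, rev_text: str) -> List[str]:
    """Return human-readable differences; an empty list means the references match."""
    fwd = parse_idxstats(fwd_text)
    rev = parse_idxstats(rev_text)
    problems = []
    if len(fwd) != len(rev):
        problems.append(f"different number of sequences: {len(fwd)} vs {len(rev)}")
    for i, (a, b) in enumerate(zip(fwd, rev), start=1):
        if a != b:
            problems.append(f"row {i}: {a[0]} ({a[1]} bp) vs {b[0]} ({b[1]} bp)")
    return problems


# Hypothetical example: same contigs and lengths, different read counts.
fwd = "h1tg000001l\t1000\t50\t0\nh1tg000002l\t500\t20\t0\n*\t0\t0\t3\n"
rev = "h1tg000001l\t1000\t48\t0\nh1tg000002l\t500\t22\t0\n*\t0\t0\t5\n"
print(compare_idxstats(fwd, rev))  # []
```

If the forward reads had been mapped to Hap1 and the reverse reads to Hap2, the list would almost certainly be non-empty, since the two haplotype assemblies have different contig sets (792 vs 382 contigs in this case).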
Please give that a review and let us know whether my guess was correct, and whether you were able to adjust your data to resolve the error!
Thank you for your quick response, and thank you so much for your help! I went through the ‘How to check’ checklist, and I think I made a silly mistake: I aligned the forward reads to the Hap1 assembly and the reverse reads to the Hap2 assembly and then tried to merge them. No wonder it didn’t work!
I think it was very clear in the tutorial that we are supposed to work with one haplotype assembly at a time. But because I do not have Bionano data, I had to use a different reference file, and perhaps that is what caused my confusion.
I have spent the past few days resolving this issue and have managed to complete the scaffolding stage! Thank you so much for your help!
Great! I’m really glad to learn you got this working and that the tutorials are useful! Sometimes just explaining an issue to someone else helps to “see” what might be going wrong.