Bismark Meth. Extractor not generating coverage files

Hello,
I am currently trying to analyse some methylation sequencing data using Bismark. When I extract the methylation status using the Bismark Meth. Extractor, I expect to also get a .cov coverage file. However, this file does not appear to be generated.

The tool output gives the following:

Writing bedGraph to file: dataset_e734b08e-b355-49c5-94cc-61ef7b831d1e.datbedGraph.gz.
Also writing out a coverage file including counts methylated and unmethylated residues to file: dataset_e734b08e-b355-49c5-94cc-61ef7b831d1e.datbismark.cov.gz

However, I do not see the cov.gz in the resulting archive. Am I using an incorrect setting (I had asked it to give me the genome-wide cytosine report), or is this something that it currently does not output?

Any help would be greatly appreciated!
Many thanks,
Lewis

Welcome @Lewis!

I think your cov file is supposed to be the

Methylation proportion report for each possible position in the read (Mbias Report)

This report shows the methylation proportion across each possible position in the read (described in further detail in: Hansen et al., Genome Biology, 2012, 13:R83). The data for the M-bias plot is also written into a text file in the following format:

read position | count methylated | count unmethylated | % methylation | total coverage

Can you check if this is in the output datasets?

Hi David,

Many thanks for your reply! The M-bias report is not quite what I am after. This report is to check whether there is any bias in methylation across the read. As I did 150 PE sequencing, this outputs a file for each one of the 150 positions. What I am after is a coverage file that has counts (both methylated and unmethylated) and their genomic position. The methylation extractor should give this as a .cov file, but this doesn’t appear to be saved. Here is a screenshot from the Bismark manual of what I am after:

Kind regards,
Lewis
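
For reference, the coverage file discussed here is, per the Bismark documentation, a tab-separated table with six columns: chromosome, start position, end position, methylation percentage, count methylated, count unmethylated. The following is a minimal Python sketch for reading such a file, assuming that column order holds for your Bismark version. The names `CovRecord`, `read_cov`, and `read_cov_gz`, as well as the two example rows, are hypothetical and only for illustration.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class CovRecord(NamedTuple):
    """One row of a Bismark-style coverage file (assumed column order)."""
    chrom: str
    start: int
    end: int
    pct_meth: float
    count_meth: int
    count_unmeth: int


def read_cov(handle: TextIO) -> Iterator[CovRecord]:
    """Yield one record per non-empty, tab-separated line."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        chrom, start, end, pct, meth, unmeth = line.split("\t")
        yield CovRecord(chrom, int(start), int(end), float(pct), int(meth), int(unmeth))


def read_cov_gz(path: str) -> Iterator[CovRecord]:
    """Same as read_cov, but for a gzip-compressed file such as *.cov.gz."""
    with gzip.open(path, "rt") as handle:
        yield from read_cov(handle)


# Made-up rows, only to show the expected shape of the data.
example = io.StringIO("chr1\t100\t100\t75.0\t3\t1\nchr1\t105\t105\t0.0\t0\t4\n")
records = list(read_cov(example))
```

Each record then carries the genomic position together with both counts, which is the information the M-bias report does not provide.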


Ok, @Lewis,

I have some guesses:

  1. The *cov.gz file announced in the output log is only created as a temporary file in order to build other outputs:

Writing bedGraph to file: dataset_f5b173bc-5683-4e3a-aae9-bbfab0223a5b.datbedGraph.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: dataset_f5b173bc-5683-4e3a-aae9-bbfab0223a5b.datbismark.cov.gz

(…)

Output will be written into the directory: /data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmp7xnxvb6j/
Summary of parameters for genome-wide cytosine report:

Coverage infile: dataset_f5b173bc-5683-4e3a-aae9-bbfab0223a5b.datbismark.cov.gz
Output directory: >/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmp7xnxvb6j/<
Parent directory: >/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmp7xnxvb6j/<
Genome directory: >/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmpeuc9ajoz/<
CX context: yes
Genome coordinates used: 1-based (default)
GZIP compression: no
Split by chromosome: no
.

  2. The coverage files are only output separately with the optional parameter --bedGraph (section (IV), Bismark methylation extractor):

Alternatively, the output of the methylation extractor can be transformed into a bedGraph and coverage file using the option --bedGraph (see also --counts).

  • which I can’t see in the command line of my tests:

Methylation extractor run with: 'bismark_methylation_extractor --no_header -o /data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmp7xnxvb6j --single-end --comprehensive --report --bedGraph --CX_context --cytosine_report --CX_context --genome_folder /data/dnb03/galaxy_db/job_working_directory/022/607/22607745/tmp/tmpeuc9ajoz /data/dnb03/galaxy_db/files/f/5/b/dataset_f5b173bc-5683-4e3a-aae9-bbfab0223a5b.dat'

  3. Previously, this error was solved and was related to another optional parameter, buffer size (--buffer), in standalone bismark_methylation_extractor v0.20.0, but I can’t test this now.

Hi David,

I totally agree with you that it seems the .cov file is not being saved, and instead is being used only as a temporary file to generate the genome-wide cytosine coverage report. The --bedGraph option is definitely there in the command line code you copied above, so I don’t think that this is the issue. The bedGraph file (.datbedGraph.gz) is present in the results archive generated when asking for the genome-wide cytosine report, which is why I found it odd that the .cov file is not there as well. It would definitely be interesting to see if the --buffer option is the cause of this file not being present. However, if it can’t be fixed, then I think I’ll most likely have to find another way of running the Bismark extractor! Many thanks for your help so far!
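
The check described above, comparing the files the log announces with the files that actually arrive in the results archive, can be sketched in Python. This is only an illustration: it assumes the log phrases every output as "... to file: <name>", as in the two log lines quoted in this thread. The function names and the archive listing are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: every announced output appears in the log as "... to file: <name>".
ANNOUNCED = re.compile(r"to file:\s*(\S+)")


def announced_files(log_text: str) -> List[str]:
    """Collect file names announced in the log, dropping a trailing full stop."""
    return [m.group(1).rstrip(".") for m in ANNOUNCED.finditer(log_text)]


def missing_outputs(log_text: str, archive_names: Iterable[str]) -> List[str]:
    """Return announced files that are absent from the archive listing."""
    present = set(archive_names)
    return [name for name in announced_files(log_text) if name not in present]


# The two log lines from the first post.
log = (
    "Writing bedGraph to file: dataset_e734b08e-b355-49c5-94cc-61ef7b831d1e.datbedGraph.gz.\n"
    "Also writing out a coverage file including counts methylated and unmethylated "
    "residues to file: dataset_e734b08e-b355-49c5-94cc-61ef7b831d1e.datbismark.cov.gz\n"
)
# Hypothetical archive listing: only the bedGraph made it through.
archive = ["dataset_e734b08e-b355-49c5-94cc-61ef7b831d1e.datbedGraph.gz"]
missing = missing_outputs(log, archive)
```

With this made-up archive listing, `missing` contains only the `.cov.gz` name, which matches the situation reported in this thread.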

Sorry! I showed the text from the log file, but I meant the text in the Command Line field, from the Job Information:

python '/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/bismark/ff6ee551b153/bismark/bismark_methylation_extractor.py' --multicore "${GALAXY_SLOTS:-4}" --infile '/data/dnb03/galaxy_db/files/f/5/b/dataset_f5b173bc-5683-4e3a-aae9-bbfab0223a5b.dat' --single-end --splitting_report '/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/outputs/galaxy_dataset_d5147783-6aac-4b1f-ab32-0634ac21f788.dat' --mbias_report '/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/outputs/galaxy_dataset_9855a442-64fa-4f61-9305-d4f581661ad8.dat' --cytosine_report '/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/outputs/galaxy_dataset_5a7de963-e281-40a2-af49-df099dde7ebb.dat' --genome_file '/data/db/data_managers/h7n7_360722/seq/h7n7_360722.fa' --cx_context --comprehensive --compress '/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/outputs/galaxy_dataset_d1e0feca-7641-4b02-84e6-7dff0081a634.dat' --log_report '/data/dnb03/galaxy_db/job_working_directory/022/607/22607745/outputs/galaxy_dataset_61ded233-440e-4bf5-81b8-9214fa10b5d8.dat'

Ah I see, that definitely could be it! Is there any way of testing this theory out? I have no idea who to contact to see if the methylation extractor could be changed or modified on Galaxy to be able to produce the .cov file, or even if this is possible!


It can be updated by someone from the Galaxy team but, unfortunately, they are very busy.

I can try to run Bismark standalone and test --buffer, but I can’t say when I’ll be able to do it.

Another option is to try the bismark2bedGraph script mentioned in the manual, which may also be quicker.
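
A further workaround that stays within what the Galaxy tool already returns is to derive coverage-style rows from the genome-wide cytosine report. The sketch below is only an illustration. It assumes the report columns are chromosome, position, strand, count methylated, count unmethylated, C-context, and trinucleotide context, as described in the Bismark documentation. It also assumes that the coverage file lists only positions with at least one read. The function name and the example rows are hypothetical, and the percentage formatting (four decimals) is an arbitrary choice that is not meant to match Bismark's own output byte for byte.

```python
import io
from typing import Iterable, Iterator


def cytosine_report_to_cov(lines: Iterable[str]) -> Iterator[str]:
    """Turn cytosine-report rows into coverage-style rows.

    Output columns: chrom, start, end, % methylation, count methylated,
    count unmethylated. Positions without any reads are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # blank or malformed line
        chrom, pos, _strand, meth, unmeth = fields[:5]
        n_meth, n_unmeth = int(meth), int(unmeth)
        total = n_meth + n_unmeth
        if total == 0:
            continue  # uncovered cytosine, not part of a coverage file
        pct = 100.0 * n_meth / total
        yield f"{chrom}\t{pos}\t{pos}\t{pct:.4f}\t{n_meth}\t{n_unmeth}"


# Made-up report rows: one covered and one uncovered cytosine.
report = io.StringIO("chr1\t10\t+\t3\t1\tCG\tCGA\nchr1\t11\t-\t0\t0\tCG\tCGT\n")
cov_lines = list(cytosine_report_to_cov(report))
```

Because the report in this thread was generated with the CX context option, rows derived this way would include cytosines in all contexts. The context column could be used to filter for CpG only if that is what is needed.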


Hi @Lewis,
I have updated the Bismark extractor so that it provides the coverage file: Bismark methylation extractor: include coverage output by gallardoalba · Pull Request #1131 · bgruening/galaxytools · GitHub. It will be available soon.

However, I recommend using MethylDackel instead of Bismark.

Regards


@David and @gallardoalba many thanks for all of your help and input, and for updating the extractor to provide the coverage file. I’ll make sure to look into MethylDackel if you recommend this as a better way of getting the methylation statistics.
