ATAC-Seq Analysis Using MACS2 Callpeak

Hello everyone! I am new to Galaxy and I am trying to work my way through the ATAC-seq tutorial (Hands-on: ATAC-Seq data analysis / ATAC-Seq data analysis / Epigenetics). I am at the MACS2 Callpeak step, but keep receiving an error. Any and all insight is extremely appreciated! Thank you.

What I Input:

Number 73
Name MACS2 callpeak on data 62 (Peaks in tabular format)
Created Tuesday Nov 12th 9:19:05 2024 GMT-5
Filesize -
Dbkey mm10
Format tabular
File contents contents
History Content API ID f9cad7b01a4721354ef60effa34ef599
History API ID bbd44e69cb8906b5392a7444d65eb960
UUID 858aa35b-c60e-49d3-bcf7-54024dba6a08
Full Path /corral4/main/objects/8/5/8/dataset_858aa35b-c60e-49d3-bcf7-54024dba6a08.dat

Tool Parameters

| Input Parameter | Value |
| --- | --- |
| Are you pooling Treatment Files? | 0 |
| ChIP-Seq Treatment File | 62: MarkDuplicates on data 57: BAM (as BED) #SRR891268_R1 #SRR891268_R2 |
| Do you have a Control File? | 0 |
| Format of Input Files | Single-end BAM |
| Effective genome size | 1870000000 |
| Build Model | nomodel |
| Set extension size | 200 |
| Set shift size | -100 |
| Peak detection based on | qvalue |
| Minimum FDR (q-value) cutoff for peak detection | 0.05 |
| Additional Outputs | Peaks as tabular file (compatible with MultiQC), Peak summits, Scores in bedGraph files (--bdg) |
| advanced_options | |
| When set, scale the small sample up to the bigger sample | 0 |
| Use fixed background lambda as local lambda for every peak region | 0 |
| Save signal per million reads for fragment pileup profiles | 0 |
| When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling | Not available. |
| The small nearby region in basepairs to calculate dynamic lambda | Not available. |
| The large nearby region in basepairs to calculate dynamic lambda | Not available. |
| Composite broad regions | nobroad |
| Use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region | 1 |
| How many duplicate tags at the exact same location are allowed? | all |
| Minimum fragment size in basepair | 20 |
| Buffer size | 100000 |

The Error I Received:

Galaxy Tool ID toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.9.1+galaxy0
Job State error
Command Line

```
export PYTHON_EGG_CACHE=`pwd` && (macs2 callpeak -t '/corral4/main/objects/e/8/c/dataset_e8caa316-80eb-411d-8233-08819f152174.dat' --name MarkDuplicates_on_data_57__BAM__as_BED_ --format BAM --gsize '1870000000' --call-summits --keep-dup 'all' --d-min 20 --buffer-size 100000 --bdg --qvalue '0.05' --nomodel --extsize '200' --shift '-100' 2>&1 > macs2_stderr) && cp MarkDuplicates_on_data_57__BAM__as_BED__peaks.xls '/corral4/main/jobs/062/436/62436865/outputs/dataset_858aa35b-c60e-49d3-bcf7-54024dba6a08.dat' && exit_code_for_galaxy=$? && cat macs2_stderr 2>&1 && (exit $exit_code_for_galaxy)
```
Tool Standard Output

```
INFO  @ Tue, 12 Nov 2024 14:19:09:
# Command line: callpeak -t /corral4/main/objects/e/8/c/dataset_e8caa316-80eb-411d-8233-08819f152174.dat --name MarkDuplicates_on_data_57__BAM__as_BED_ --format BAM --gsize 1870000000 --call-summits --keep-dup all --d-min 20 --buffer-size 100000 --bdg --qvalue 0.05 --nomodel --extsize 200 --shift -100
# ARGUMENTS LIST:
# name = MarkDuplicates_on_data_57__BAM__as_BED_
# format = BAM
# ChIP-seq file = ['/corral4/main/objects/e/8/c/dataset_e8caa316-80eb-411d-8233-08819f152174.dat']
# control file = None
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# Paired-End mode is off
# Searching for subpeak summits is on
INFO  @ Tue, 12 Nov 2024 14:19:09: #1 read tag files...
INFO  @ Tue, 12 Nov 2024 14:19:09: #1 read treatment tags...
struct.error: unpack requires a buffer of 4 bytes
Exception ignored in: 'MACS2.IO.Parser.BAMParser.tsize'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/MACS2/callpeak_cmd.py", line 389, in load_tag_files_options
    ttsize = tp.tsize()
struct.error: unpack requires a buffer of 4 bytes
Traceback (most recent call last):
  File "/usr/local/bin/macs2", line 653, in <module>
    main()
  File "/usr/local/bin/macs2", line 51, in main
    run( args )
  File "/usr/local/lib/python3.10/site-packages/MACS2/callpeak_cmd.py", line 65, in run
    else: (treat, control) = load_tag_files_options (options)
  File "/usr/local/lib/python3.10/site-packages/MACS2/callpeak_cmd.py", line 391, in load_tag_files_options
    treat = tp.build_fwtrack()
  File "MACS2/IO/Parser.pyx", line 1169, in MACS2.IO.Parser.BAMParser.build_fwtrack
  File "MACS2/IO/Parser.pyx", line 1181, in MACS2.IO.Parser.BAMParser.build_fwtrack
  File "MACS2/IO/Parser.pyx", line 1166, in MACS2.IO.Parser.BAMParser.get_references
struct.error: unpack requires a buffer of 4 bytes
```
Tool Standard Error empty
Tool Exit Code 1
Job Messages Job Message 1:
  • desc: Fatal error: Exit code 1 ()
  • error_level: 3
  • exit_code: 1
  • type: exit_code

Job Message 2:

  • desc: Fatal error: Matched on error:
  • error_level: 3
  • match: error:
  • stream: stdout
  • type: regex

Job API ID bbd44e69cb8906b5410c5d52606a7d81

Hi @shgalaxyuser !!

Double check your target genome to map against at this step in the tutorial → Hands-on: ATAC-Seq data analysis / ATAC-Seq data analysis / Epigenetics (mapping)

Notice how that step maps against the hg38 human genome, but your BAM dataset was mapped against the mouse mm10 genome. How can I tell? Because the assigned database ("dbkey") is for mouse. If you click on your input BAM dataset in that same job details view you are copy/pasting from, you can see it.
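If you have the BAM on your own machine, you can also confirm the genome build by listing the reference names in its header (this is what `samtools view -H` shows, too). Here is a minimal stdlib-only sketch that walks the BAM header layout from the SAM/BAM specification; the function name is mine, not part of the tutorial:

```python
import gzip
import struct

def bam_reference_names(path):
    """Return the reference (chromosome) names from a BAM header.

    BAM layout (SAMv1 spec): magic b'BAM\\x01', l_text (int32),
    SAM header text, n_ref (int32), then per reference:
    l_name (int32), NUL-terminated name, l_ref (int32).
    BGZF is gzip-compatible, so gzip.open can decompress it.
    """
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        (l_text,) = struct.unpack("<i", fh.read(4))
        fh.read(l_text)                       # skip the SAM header text
        (n_ref,) = struct.unpack("<i", fh.read(4))
        names = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            names.append(fh.read(l_name)[:-1].decode())  # drop the NUL
            fh.read(4)                        # skip l_ref (sequence length)
        return names
```

An mm10 BAM lists the mouse contigs (chr1 through chr19 plus chrX/chrY), while hg38 lists chr1 through chr22 plus chrX/chrY, so a quick look at this list settles which genome was used.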

Your form is set up correctly for human here:

That “mismatch” is what the tool is reporting with this part of the error message. It was expecting the human chromosome identifiers but found the mouse chromosome identifiers in the BAM.
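As an aside, the `struct.error: unpack requires a buffer of 4 bytes` is raised while MACS2's BAM parser unpacks 4-byte header fields, so the same message also appears whenever the input is not a valid, complete BAM at all, for example a BED or SAM renamed to `.bam`, or a truncated upload. A minimal stdlib-only sanity check you can run locally (the function name is mine, not part of MACS2):

```python
import gzip

def looks_like_bam(path):
    """Cheap sanity check: BAM files are BGZF (gzip) compressed, and the
    decompressed stream starts with the magic bytes b'BAM\\x01'.
    A renamed BED/SAM or an uncompressed file fails this check."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except OSError:
        return False  # not gzip/BGZF-compressed at all
```

If this returns `False` for the dataset you fed to MACS2, the problem is the file itself rather than any of the callpeak parameters.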


This means that you will need to back up and redo the mapping step. It looks like you have already deleted this dataset, so I'm guessing that you may have found the problem and are doing that now?

And, great that you posted all of the details! Very helpful. Next time you can go ahead and also include the history share link – that will allow even more people to help! Since you are following a tutorial, the data shouldn’t have any privacy concerns. Even if this was your own data, you can always share for help then unshare after.

Please let us know how this goes and if you need more help! :slight_smile: