ATAC-seq data analysis tutorial: troubleshooting

Hi there,

I’m following the ATAC-seq analysis tutorial (Hands-on: ATAC-Seq data analysis / Epigenetics) and I’m now studying the computeMatrix output for the ATAC-seq heatmap.
I noticed that the profile above the heatmap shows two peaks, one before and one after the TSS. In your opinion, is that possible?

Welcome, @bioiz

If you want to share your tutorial history, we can help correct the settings. My first guess is that you missed some of the Advanced Options in MACS2 when calling peaks.
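For context on why those options matter: with --nomodel, MACS2 moves each read’s 5' end by --shift base pairs in the read’s 5'→3' direction (a negative value moves it the other way) and then extends it to --extsize base pairs. The sketch below is a simplified model of that arithmetic, not MACS2 code. The function name `pileup_interval` is hypothetical, coordinates are treated as plain half-open integers, and the values -100 and 200 are assumed as a typical ATAC-seq cut-site setting.

```python
def pileup_interval(cut, strand, shift, extsize):
    """Return the half-open (start, end) window piled up for one cut site.

    Simplified model (not MACS2 code): plus-strand reads run 5'->3' toward
    higher coordinates, so a positive shift moves the cut site right;
    minus-strand reads run toward lower coordinates, so it moves left.
    """
    if strand == "+":
        start = cut + shift
        return start, start + extsize
    if strand == "-":
        end = cut - shift
        return end - extsize, end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    cut = 1000  # hypothetical cut site, e.g. right at a TSS
    for shift in (-100, 100):
        plus = pileup_interval(cut, "+", shift, 200)
        minus = pileup_interval(cut, "-", shift, 200)
        print(f"shift={shift:+d}: plus={plus} minus={minus}")
```

With a shift of -100 both strands pile up on the same 200 bp window centred on the cut site. With +100 (the minus sign dropped) they land 100 to 300 bp away on opposite sides, which is one plausible way a single accessible site could show up as two peaks flanking a TSS.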

Instructions for sharing are in this forum’s banner and also here: → How to get faster help with your question

Let’s start there, and let us know if you solve this yourself! :slight_smile:

Hi @jennaj,

Thank you for your support.
I can’t share my history publicly, but I can paste the parameters I used for MACS2 callpeak here.
I already compared them with the tutorial and can’t find any missing parameters. Do you see any?

Thanks again.

Irene



I noticed that I had missed a “-” in the “set shift size” parameter. I reran MACS2 and tried to convert the output to a bigWig file as described in the tutorial, but it gives me this error:

" bedGraph error line 14038305 of trackless: chromosome chr14_KI270723v1_random has size 38115 but item ends at 38178
bedGraph error line 28298034 of trackless: chromosome chr22_KI270731v1_random has size 150754 but item ends at 150772
bedGraph error line 28298035 of trackless: chromosome chr22_KI270731v1_random has size 150754 but item ends at 150843
bedGraph error line 46765819 of trackless: chromosome chr9_KI270718v1_random has size 38054 but item ends at 38062
bedGraph error line 46765820 of trackless: chromosome chr9_KI270718v1_random has size 38054 but item ends at 38081
bedGraph error line 46765821 of trackless: chromosome chr9_KI270718v1_random has size 38054 but item ends at 38110
bedGraph error line 46765822 of trackless: chromosome chr9_KI270718v1_random has size 38054 but item ends at 38121
bedGraph error line 46770621 of trackless: chromosome chrUn_GL000216v2 has size 176608 but item ends at 176657
bedGraph error line 46772306 of trackless: chromosome chrUn_KI270303v1 has size 1942 but item ends at 1985
bedGraph error line 46772307 of trackless: chromosome chrUn_KI270303v1 has size 1942 but item ends at 2038
bedGraph error line 46772347 of trackless: chromosome chrUn_KI270330v1 has size 1652 but item ends at 1665
bedGraph error line 46772407 of trackless: chromosome chrUn_KI270336v1 has size 1026 but item ends at 1045
bedGraph error line 46772408 of trackless: chromosome chrUn_KI270336v1 has size 1026 but item ends at 1089
bedGraph error line 46772425 of trackless: chromosome chrUn_KI270337v1 has size 1121 but item ends at 1161
bedGraph error line 46772426 of trackless: chromosome chrUn_KI270337v1 has size 1121 but item ends at 1202
bedGraph error line 46772680 of trackless: chromosome chrUn_KI270435v1 has size 92983 but item ends at 92998
bedGraph error line 46773430 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112518
bedGraph error line 46773431 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112520
bedGraph error line 46773432 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112523
bedGraph error line 46773433 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112528
bedGraph error line 46773434 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112530
bedGraph error line 46773435 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112533
bedGraph error line 46773436 of trackless: chromosome chrUn_KI270438v1 has size 112505 but item ends at 112539
bedGraph error line 46776764 of trackless: chromosome chrUn_KI270442v1 has size 392061 but item ends at 392090
bedGraph error line 46776874 of trackless: chromosome chrUn_KI270466v1 has size 1233 but item ends at 1241
bedGraph error line 46776875 of trackless: chromosome chrUn_KI270466v1 has size 1233 but item ends at 1276
bedGraph error line 46776876 of trackless: chromosome chrUn_KI270466v1 has size 1233 but item ends at 1325
bedGraph error line 46778371 of trackless: chromosome chrUn_KI270591v1 has size 5796 but item ends at 5868"

How can I fix it?

Thank you again.

Irene


Hi @bioiz

Which tool are you using to convert the MACS2 bedGraph to a bigWig file?

Remember that MACS2 can report “overhanging” coordinates, that is, intervals that end past the last base of a chromosome. That means you need a conversion tool with an extra option to trim them; otherwise certain downstream tools will reject the file.
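To make the trimming concrete: conceptually it means cutting each interval’s end back to the chromosome length and dropping anything that starts at or past that end. A minimal sketch of the idea, assuming a four-column bedGraph and a dict of chromosome sizes (`clip_bedgraph` is a hypothetical name, not the Galaxy tool’s implementation). The 1942 bp size of chrUn_KI270303v1 is taken from the error log earlier in this thread.

```python
def clip_bedgraph(lines, chrom_sizes):
    """Trim bedGraph intervals to chromosome ends (a sketch, not Galaxy's code).

    lines: iterable of "chrom<TAB>start<TAB>end<TAB>value" strings.
    chrom_sizes: dict mapping chromosome name -> length in bp.
    Intervals on chromosomes missing from chrom_sizes, or lying entirely past
    the chromosome end, are dropped; ends past the chromosome end are trimmed.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "#", "browser")):
            continue  # skip header and blank lines
        chrom, start, end, value = line.split("\t")[:4]
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # chromosome not in this assembly's sizes table
        start, end = int(start), min(int(end), size)
        if start >= end:
            continue  # nothing left after trimming
        yield f"{chrom}\t{start}\t{end}\t{value}"


if __name__ == "__main__":
    sizes = {"chrUn_KI270303v1": 1942}  # size reported in the error log
    demo = ["chrUn_KI270303v1\t1900\t1985\t2.0",  # overhangs: trimmed to 1942
            "chrUn_KI270303v1\t1942\t2038\t1.0"]  # starts at the end: dropped
    for out in clip_bedgraph(demo, sizes):
        print(out)
```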

The tutorial data works with the tool version used in the hands-on, and the workflow linked from the tutorial uses a version of the tool that trims correctly.

You can also find that tool here directly:

If you open up the full parameter list, you’ll see the default option toggled on for the trimming.

In general, if a tool ever complains about mismatched chromosome lengths, there are three main things to check:

  1. that the correct reference genome is assigned in the input’s metadata (database or FASTA index)
  2. that the reference option on the tool form is set correctly (the target reference database or FASTA index)
  3. then, more rarely, minor tool quirks like the one I am describing here.
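A quick way to tell these cases apart is to compare the intervals against the chromosome sizes table of the assembly you think you used. A sketch, assuming a UCSC-style two-column chrom.sizes file; the function names are hypothetical. As a rule of thumb (my assumption, not a Galaxy check), chromosome names missing from the table point to items 1 and 2, while a few small overhangs on contigs that do exist look more like the quirk in item 3.

```python
from collections import defaultdict


def read_chrom_sizes(text):
    """Parse a two-column 'chrom<TAB>length' table (UCSC chrom.sizes layout)."""
    sizes = {}
    for row in text.splitlines():
        if row.strip():
            chrom, length = row.split()[:2]
            sizes[chrom] = int(length)
    return sizes


def audit_intervals(intervals, chrom_sizes):
    """Compare (chrom, start, end) intervals with a chromosome sizes table.

    Returns (unknown, overhang): sorted chromosome names absent from the
    table, and for each known chromosome the largest number of bases any
    interval extends past its end.
    """
    unknown = set()
    overhang = defaultdict(int)
    for chrom, start, end in intervals:
        size = chrom_sizes.get(chrom)
        if size is None:
            unknown.add(chrom)
        elif end > size:
            overhang[chrom] = max(overhang[chrom], end - size)
    return sorted(unknown), dict(overhang)
```

For example, an interval on chr9_KI270718v1_random ending at 38121 (from the error log, size 38054) would be reported as a 67 bp overhang, while an Ensembl-style name such as "1" would show up as unknown against a UCSC hg38 table.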

MACS2 is the only tool I can think of right now that reports those “overhanging” coordinates in a BED-style file, but I am sure there are others! Please review this discussion at the MACS forum if you are curious.

Please give that a try and see how it works! You could also consider using that workflow as a template and running it to see what happens. You can adapt it to fit your target genome, reference data, input type, etc. Or, you can later extract a workflow from your own history once you get the process worked out.

Hope this helps! :slight_smile:

Dear @jennaj,

To convert the bedGraph to a bigWig file, I used the tool from the tutorial, which is the one you mentioned (wigtobigwig: bedGraph or Wig to bigWig converter, Galaxy Version 472+galaxy0).
I used the following parameters. Can you see anything wrong?
I have also checked the reference genome assigned in the input metadata at every step of my analysis, and it is always the same, hg38. The only dataset that does not report a database is the trimmed file (Trim Galore!). Could this be the origin of the problem?

Thank you very much for your kind support.

Best,

Irene

Hi @bioiz

It seems you are mentioning two different tools here, but maybe I am misunderstanding. The screenshot does seem to show that the clip parameter was applied. If you click on dataset 225, you’ll see a “database” assigned. Make sure that matches what you mapped against. It should be hg38 (and not some other variation).

There is never a database assigned to read files. Why? Reads are just sequences of DNA or RNA; they aren’t attached to a genome assembly (chromosome coordinates) yet. Reads come from a species, but that is a different concept: each species can have many “assemblies”. Once you have coordinates based on a specific assembly, you want to keep using that assembly in all downstream steps.

So, if the coordinates are being trimmed by the conversion tool and you are still getting an error like this:

bedGraph error line 28298034 of trackless: chromosome chr22_KI270731v1_random has size 150754 but item ends at 150772

Then look for mismatched assemblies. Did you really use hg38 for every step? Even if you used Galaxy’s built-in hg38 genome, you would still be supplying your own reference data (GTF or BED). Are you sure that file was based on the UCSC hg38 assembly?
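One concrete check for a GTF or BED from another source is whether its chromosome names even exist in the genome you mapped against: Ensembl/NCBI-style names such as "1" and "MT" will not match UCSC-style "chr1" and "chrM". A minimal sketch with hypothetical function names; note that matching names alone do not prove the files come from the same assembly.

```python
def gtf_chromosomes(gtf_lines):
    """Collect seqnames (column 1) from GTF lines, skipping '#' comment lines."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def naming_mismatch(annotation_chroms, genome_chroms):
    """Return annotation chromosome names that the genome does not contain."""
    return sorted(set(annotation_chroms) - set(genome_chroms))


if __name__ == "__main__":
    # Hypothetical inputs: an Ensembl-style GTF against UCSC-style names.
    gtf = ["#!genome-build GRCh38",
           '1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "g1";',
           'MT\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "g2";']
    print(naming_mismatch(gtf_chromosomes(gtf), {"chr1", "chrM"}))
```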

Mixing up data from different assemblies is pretty common for people new to bioinformatics analysis. We have a guide here that can help.

Hope that helps!

Hi @jennaj,

Thank you very much for your help in understanding what the problem with my analysis is.

I checked the reference genome at every step and yes, it is still hg38. However, I realized that for the samples in the screenshot I attached earlier, I aligned against hg38 instead of Canonical hg38, which is what the tutorial specifies.
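All of the contigs in the earlier error log (chr14_KI270723v1_random, chrUn_KI270303v1, and so on) are extra sequences beyond the primary chromosomes. Assuming “Canonical hg38” means chr1 to chr22, chrX, chrY, and chrM only, reads aligned to it cannot land on those contigs. A small sketch to flag or filter non-canonical lines in a bedGraph; the regular expression and function names are my own, hypothetical choices.

```python
import re

# Assumption: "canonical" = chr1..chr22, chrX, chrY, chrM (UCSC naming).
CANONICAL = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)")


def is_canonical(chrom):
    """True for primary chromosome names, False for _random/chrUn_/_alt contigs."""
    return CANONICAL.fullmatch(chrom) is not None


def keep_canonical(bedgraph_lines):
    """Yield only bedGraph lines whose first column is a canonical chromosome."""
    for line in bedgraph_lines:
        if is_canonical(line.split("\t", 1)[0]):
            yield line
```

Filtering like this only removes non-canonical contigs; it does not trim overhangs on the primary chromosomes.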

To rule that out, I relaunched the entire analysis with Bowtie2 aligning to Canonical hg38, but when I convert the MACS2 bedGraph to bigWig with the wigtobigwig tool, I get this error:

  slurmstepd: error: *** JOB 2631873 ON js2-mem-large9 CANCELLED AT 2024-12-09T18:54:07 DUE TO TIME LIMIT ***

Do you have any suggestion?

Thank you

Irene

Hi @bioiz

Your error message is described here. → FAQ: Understanding walltime error messages

However, since that job ran at UseGalaxy.org and the server was throttled yesterday, I would rerun it to see what happens before digging into any potential content issues.

Thanks! :slight_smile:

Hi @jennaj,

I was aware of the throttling on UseGalaxy.org, so I waited until service was restored and then reran the analysis several times, always getting the same error.
Since the reference genome is hg38 in every step, I checked the size of my file: it’s 1.4 GB. Is that too large? Could this be the reason the conversion fails? How can I fix it? Maybe I need to use the bamCoverage tool instead?

I also have another concern: the wigtobigwig tool gives this error only when the input is the MACS2 bedGraph produced as described in the tutorial. It does not fail with the “wrong” bedGraphs I got when I omitted the minus sign in the --shift parameter, or when I called peaks directly from the BAM file without first converting it to BED format (as the tutorial does).
How is that possible?

I hope to find a solution soon.

Thanks for your help.

Irene

Hi @bioiz

I’m not sure, but I can say that content matters more than size. If you share your history, I can take a look. As far as I know, the workflow works for this tutorial, but maybe there is a problem. We can investigate and fix it: the data, the workflow, or both. Thanks! :slight_smile:
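One way to look at content rather than size is to stream through the bedGraph once and count intervals and covered bases per chromosome; unexpected contigs, or a chromosome with far more intervals than the others, would stand out immediately. A minimal sketch; the function name and the command-line usage are hypothetical.

```python
import sys
from collections import Counter


def summarize_bedgraph(lines):
    """Count intervals and covered bases per chromosome in one streaming pass.

    Accepts any iterable of lines (e.g. an open file), so a multi-GB
    bedGraph is never loaded into memory all at once.
    """
    intervals, bases = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith(("track", "#", "browser")):
            continue  # skip header and blank lines
        chrom, start, end = line.split("\t", 3)[:3]
        intervals[chrom] += 1
        bases[chrom] += int(end) - int(start)
    return intervals, bases


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize.py treat_pileup.bdg
    with open(sys.argv[1]) as handle:
        counts, covered = summarize_bedgraph(handle)
    for chrom, n in counts.most_common():
        print(f"{chrom}\t{n} intervals\t{covered[chrom]} bp")
```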

Hi @jennaj,

I sent you the link to my history in a private message. Thanks!! :slight_smile:


Resolved in a private message: the parameters were adjusted to better match the data content.

As a supplement, this is a good place to look up what MACS2 messages mean. Galaxy hosts the original tool, so it behaves the same way. → https://groups.google.com/g/macs-announcement