CUT&RUN: MACS2 error

Hi, I would like to ask for help with MACS2. I am following the tutorial Epigenetics: CUT&RUN data analysis and I have a problem with the MACS2 tool.
I tried to check: the files are not empty, the file format should be fine (I chose single end), and the reference genome seems to be correctly defined?


The history:
epigenetic tutorial - history: cut-run-data-analysis

Have a nice day!

Hi @bezinka

Thanks for posting the error along with the history! Very helpful.

The job grew into a process that exceeded the expected memory allocation. The EU server has significant resources, so there is likely a problem with the data content.

From a quick look, it seems your data (Rep1) was paired end, but the option on this tool form was set to single end. Paired end is handled slightly differently. Correcting this setting is what I would suggest starting with.
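One way to check whether aligned reads are paired, independent of the tool form, is to look at the FLAG column of the alignments: in the SAM format, bit `0x1` is set when a read belongs to a pair. The following is a minimal sketch that assumes the BAM has already been converted to SAM text; the example records are invented for illustration and are not taken from the history in this thread.

```python
PAIRED_FLAG = 0x1  # SAM FLAG bit: the read is part of a pair


def paired_fraction(sam_lines):
    """Return the fraction of alignment records whose FLAG marks them as paired."""
    total = 0
    paired = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # the second column of a SAM record is FLAG
        total += 1
        if flag & PAIRED_FLAG:
            paired += 1
    return paired / total if total else 0.0


# Invented example records: two mates of one pair plus one unpaired read.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t42\t50M\t=\t200\t150\t*\t*",
    "read1\t147\tchr1\t200\t42\t50M\t=\t100\t-150\t*\t*",
    "read2\t0\tchr1\t500\t42\t50M\t*\t0\t0\t*\t*",
]
print(f"paired fraction: {paired_fraction(example):.2f}")
```

A fraction near 1 suggests paired-end data and a fraction of 0 suggests single-end data; anything in between deserves a closer look at how the file was produced.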

How to review settings: check the Job Details report

Hope this helps! If you get another error, it will probably be different, and we can follow up more. :slight_smile:

Dear Jennaj,

Thank you very much for your answer.

  1. I followed the tutorial, and it says to choose single end for MACS2 (that is also recommended for MACS2 in the ATAC-seq tutorial).

  2. I also tried to run it with paired-end, with the same result.

  3. With the single-end settings: I tried to run it at different times and on different days, and I deleted the files, but it behaves the same: grey, orange for a short time, then red.

  4. I will follow the ATAC-seq tutorial (there is a recording), and then I will start cut and tag from scratch and see.

  5. I will keep this history for some time; if you have any ideas, I will be happy to try and learn.

Thank you very much for your time. :slight_smile:

Best wishes :four_leaf_clover:


Ah, thank you for clarifying. I’m not sure why it is done that way. Maybe this is a specific choice to allow MACS2 to process this newer type of data. Let’s ask one of the scientists working on this. Hi @pavanvidem, would you be able to confirm the usage for this option: using single-end settings with paired-end data for this type of data?

I also ran through the tutorial’s workflow with the example data, and everything was successful on two of the UseGalaxy servers. The single-end choice with MACS2 was applied, so we know the pipeline is technically “working” and the issue is still centered around the data preparation. You will be able to access the shared histories and the workflow itself from the workflow invocation links, and these are fresh runs from today. Maybe you can notice what is different from these examples?

EU – same server as you are using

ORG – for comparison

Hi @bezinka, I took another look at your original history, from the very start, and I think I found the problem. It can be seen in the data tagging. It looks like the pairs were mixed up when building the collection, and that could certainly cause scientific algorithm problems later on, usually at the data reduction steps. Try correcting this first.

How the data is tagged now. Notice how the sample names are not consistent for both the forward and reverse reads. The tag is for the sampleID, not the R1/R2 notation that designates the read direction within a sample.

Both should be tagged with #rep1 in here

and #rep2 in here

I would suggest starting completely over. Load the data fresh from the tutorial, and first make sure the tags are assigned for each sample pair correctly based on the file name (not R1/R2 strand) and build the collection again. Using the Auto Build List function will be the easiest and should make a correct guess (it uses the file name, not tags).
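To illustrate the idea of pairing by file name rather than by tag, here is a small sketch that groups forward and reverse files under their shared sample name and complains when a sample is missing a mate. It is not Galaxy's actual Auto Build List logic; the `<sample>_R1.fastq` / `<sample>_R2.fastq` naming pattern and the file names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1.fastq / <sample>_R2.fastq (optionally .gz).
READ_NAME = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def group_pairs(filenames):
    """Group files by sample name and verify each sample has both R1 and R2."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_NAME.match(name)
        if match is None:
            raise ValueError(f"cannot parse sample and read direction from {name!r}")
        pairs[match["sample"]][f"R{match['read']}"] = name

    # A sample is only usable as a pair when both directions are present.
    incomplete = sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})
    if incomplete:
        raise ValueError(f"samples without a complete R1/R2 pair: {incomplete}")
    return dict(pairs)


# Hypothetical file names, for illustration only.
files = ["Rep1_R1.fastq", "Rep1_R2.fastq", "Rep2_R1.fastq", "Rep2_R2.fastq"]
for sample, reads in group_pairs(files).items():
    print(sample, "->", reads["R1"], "+", reads["R2"])
```

The point of the check is that the sample name (Rep1, Rep2) is the grouping key, while R1/R2 only distinguishes the two files inside one sample.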

If you have trouble, please capture a screenshot of those steps and post it back so I can clarify exactly what to do (I’ll see it tomorrow), or you can share back your new history with the errors and I’ll check it.

Hope this helps! This is a good example of how getting sample data prepared at the very start is so important. It can be an unpleasant lesson, but everyone doing bioinformatics has it happen, I promise! :slight_smile: And we can still get feedback about the single/paired option with MACS2 because I am curious about it too.

Hi,
thank you very much for your help.
Such a (stupid) mistake, but I learned a lot. :blush:
I finished, and anyone can find the histories below (I will keep them as long as possible).

ATAC-seq:
mistake: I downloaded gene v36, but ‘bedtools Intersect intervals’ doesn’t work with it; it has to be v38 (I put the tag on the wrong data)
history:
ATAC-seq history

Cut and run:
mistakes:

  1. The only problem I found was the labeling of replicates: MACS2 did not run when they were mislabeled.
  2. The first time, I misread and used BAM instead of BED; I left it in the history.
    history:
    Cut and run 2 history

Single-end BED:
From my understanding, that is because I converted a BAM file into BED, so I lost the paired-end information and then had to use single-end.
I went through the tutorials at the end of the MACS2 tool, and there paired-end is specified for the BAM file, or detected from the treatment file.
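As a rough sketch of how the input type maps onto the MACS2 command line: `macs2 callpeak` takes the input format through `-f/--format`, where `BAM` and `BED` are treated as single-end and `BAMPE` and `BEDPE` as paired-end. The helper names, the file name, and the `hs` (human) genome size below are assumptions for illustration, not the tutorial's exact settings, and whether a given BED file really qualifies as MACS2's BEDPE depends on how it was produced.

```python
# Mapping from (file type, paired?) to the value passed to `macs2 callpeak -f`.
MACS2_FORMATS = {
    ("BAM", False): "BAM",
    ("BAM", True): "BAMPE",
    ("BED", False): "BED",
    ("BED", True): "BEDPE",
}


def macs2_format(file_type, paired):
    """Pick the -f/--format value for a treatment file (hypothetical helper)."""
    key = (file_type.upper(), bool(paired))
    if key not in MACS2_FORMATS:
        raise ValueError(f"unsupported file type for this sketch: {file_type!r}")
    return MACS2_FORMATS[key]


def build_callpeak_args(treatment, file_type, paired, name, genome_size="hs"):
    """Assemble an argument list for `macs2 callpeak` (the command is not executed here)."""
    return [
        "macs2", "callpeak",
        "-t", treatment,                      # treatment file
        "-f", macs2_format(file_type, paired),
        "-g", genome_size,                    # assumed: human effective genome size shortcut
        "-n", name,                           # prefix for output files
    ]


# A BED of individual reads converted from BAM is passed as single-end BED.
print(" ".join(build_callpeak_args("rep1.bed", "BED", False, "rep1")))
```

The list form can be passed to `subprocess.run` on a machine where MACS2 is installed; on Galaxy the tool form sets the same option for you.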

Best wishes :four_leaf_clover:


Great! I’m glad this worked even if I am not exactly clear on why it does. BED can still be paired end. Maybe it doesn’t matter for the chemistry behind these reads.

And a HUGE THANK YOU for posting back your completed histories! This should be super helpful for anyone else with a similar error later on. I am going to mark your answer as the solution, and add this topic to the ones that show up at the top of searches. :rocket:
