Duplicates in MAGeCK test

jean-pierre_de_villa · March 11, 2025, 10:02am

How should I deal with duplicates in MAGeCK through Galaxy?

jennaj · March 11, 2025, 6:47pm

Would you be able to explain what is going on a bit more? Maybe with a shared history as an example, and screenshots? Thanks!

jean-pierre_de_villa · March 11, 2025, 7:36pm

Thanks.
I will have 2 FASTQ files corresponding to duplicates of the same experimental setting. How do I tell MAGeCK to treat these 2 files as duplicates?

jennaj · March 11, 2025, 8:33pm

Hi @jean-pierre_de_villa

Thanks for clarifying.

From the MAGeCK wiki (Q&A page): How to specify the technical replicates (“duplicates”).

To pool the reads in Galaxy, combine the fastq files before the counting step.

Quoted note from our tutorial, Hands-on: CRISPR screen analysis (Genome Annotation topic):

Comment: Replicates

If we have biological and/or technical replicates we can handle them in a similar way to that described on the MAGeCK website. For biological replicates, we input them in MAGeCK test Treated Sample Labels/Control Sample Labels fields separated by a comma. For technical replicates, we could combine the fastqs for each sample/biological replicate, for example with the Concatenate datasets tool, before running MAGeCK count.
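Galaxy's Concatenate datasets tool does this pooling on the server. For anyone preparing files outside Galaxy, the same step could be sketched in Python as below. This is a minimal sketch assuming all replicate files of one sample are either plain or gzipped FASTQ (not mixed); `concatenate_fastqs` and `count_reads` are hypothetical helper names, not part of MAGeCK or Galaxy.

```python
import gzip
import os
import shutil


def concatenate_fastqs(inputs, output):
    """Pool technical-replicate FASTQ files into one file before counting.

    Byte-level concatenation, the same idea as `cat rep1.fastq rep2.fastq`.
    Inputs must be all plain or all gzipped FASTQ. Concatenated gzip
    members form a valid gzip stream, so gzipped inputs need no
    decompression; each decompressed file is assumed to end with a newline.
    """
    gzipped = [str(path).endswith(".gz") for path in inputs]
    if any(gzipped) and not all(gzipped):
        raise ValueError("mix of plain and gzipped FASTQ inputs")
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # A plain FASTQ without a final newline would fuse its last
                # line with the first header line of the next file.
                if not str(path).endswith(".gz") and os.path.getsize(path) > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")


def count_reads(path):
    """Count FASTQ records (4 lines each); reads .gz files transparently."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

For example, `concatenate_fastqs(["rep1.fastq.gz", "rep2.fastq.gz"], "pooled.fastq.gz")` (hypothetical file names) yields one file to give MAGeCK count as a single sample. Comparing `count_reads` on the inputs and on the pooled output is a quick check that no reads were lost.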

Thanks for explaining more, and hope this helps!

jean-pierre_de_villa · March 12, 2025, 7:10am

Thanks Jenni,
Although it seems quite easy to concatenate fq files for technical replicates, things are not so evident for biological replicates:
For biological replicates, treat them as separate samples and use them together when doing the comparison; so MAGeCK can analyze the variance of these samples. For example in the test command, “-t sample1_bio_replicate1,sample1_bio_replicate2 -c sample2_bio_replicate1,sample2_bio_replicate2” compares 2 samples (with 2 biological replicates in each sample)
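For the command-line version, the comma-separated replicate syntax quoted above could be assembled as in this minimal Python sketch. The helper name `mageck_test_command` and the file and prefix names are hypothetical; `-k`, `-t`, `-c` and `-n` are the standard `mageck test` options for the count table, treatment IDs, control IDs and output prefix. On Galaxy the same comma-separated values go into the Treated Sample Labels/Control Sample Labels fields instead of a command line.

```python
import shlex


def mageck_test_command(count_table, treatment_reps, control_reps, prefix):
    """Build a `mageck test` command line for biological replicates.

    All replicates of one condition go into a single comma-separated value,
    so MAGeCK treats them as replicates and can estimate their variance.
    """
    if not treatment_reps or not control_reps:
        raise ValueError("need at least one treatment and one control sample")
    overlap = set(treatment_reps) & set(control_reps)
    if overlap:
        raise ValueError(f"samples listed in both groups: {sorted(overlap)}")
    return [
        "mageck", "test",
        "-k", count_table,               # count table produced by `mageck count`
        "-t", ",".join(treatment_reps),  # --treatment-id
        "-c", ",".join(control_reps),    # --control-id
        "-n", prefix,                    # --output-prefix
    ]


# Hypothetical names matching the wiki example: 2 samples x 2 biological replicates.
cmd = mageck_test_command(
    "sample1_vs_sample2.count.txt",
    ["sample1_bio_replicate1", "sample1_bio_replicate2"],
    ["sample2_bio_replicate1", "sample2_bio_replicate2"],
    "sample1_vs_sample2",
)
print(shlex.join(cmd))
# prints: mageck test -k sample1_vs_sample2.count.txt -t sample1_bio_replicate1,sample1_bio_replicate2 -c sample2_bio_replicate1,sample2_bio_replicate2 -n sample1_vs_sample2
```

Returning a list rather than a single string keeps each argument intact if it is passed to `subprocess.run`; `shlex.join` is only used to display it.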
I don’t see where the -t option is set in the Galaxy version of MAGeCK test on the main Galaxy server (https://usegalaxy.org/).

I do find the instruction on Galaxy Europe (https://usegalaxy.eu/), though:
If sample label is provided, the labels must match the labels in the first line of the count table, separated by comma (,); for example, HL60.final,KBM7.final. For sample index, 0,2 means the 1st and 3rd samples are treatment experiments. See Help below for a detailed description. (--treatment-id)
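As a sketch of what that help text implies, the snippet below reads the sample labels from the first line of a count table and resolves either labels or 0-based indices against them, so a mistyped label is caught before the test is run. It assumes the tab-separated layout written by `mageck count` (sgRNA, Gene, then one column per sample) and that indices count sample columns only, which is one reading of "0,2 means the 1st and 3rd samples". The function names and the demo labels are hypothetical.

```python
def sample_columns(count_table_path):
    """Return the sample labels from the first line of a MAGeCK count table.

    Assumes the tab-separated layout written by `mageck count`:
    sgRNA, Gene, then one column per sample.
    """
    with open(count_table_path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return header[2:]


def resolve_samples(spec, samples):
    """Resolve a comma-separated treatment/control value to sample labels.

    Accepts either labels ("HL60.final,KBM7.final") or 0-based sample
    indices ("0,2" = 1st and 3rd samples). A value made only of digits is
    treated as an index, so purely numeric labels are not supported here.
    """
    items = [item.strip() for item in spec.split(",") if item.strip()]
    if not items:
        raise ValueError("empty sample specification")
    if all(item.isdigit() for item in items):
        indices = [int(item) for item in items]
        bad = [i for i in indices if i >= len(samples)]
        if bad:
            raise ValueError(f"indices {bad} out of range for {len(samples)} samples")
        return [samples[i] for i in indices]
    missing = [item for item in items if item not in samples]
    if missing:
        raise ValueError(f"labels not found in the count table header: {missing}")
    return items
```

For example, with hypothetical samples `["HL60.initial", "KBM7.initial", "HL60.final", "KBM7.final"]`, `resolve_samples("0,2", samples)` returns `["HL60.initial", "HL60.final"]`, and a label such as `"HL60.day0"` raises `ValueError` because it is not in the header.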

I should be fine.
Perhaps you could add that sentence on main Galaxy?
Thanks++ again
Best

jennaj · March 12, 2025, 5:23pm

Hi @jean-pierre_de_villa

Thanks for clarifying more! I thought the confusion was about technical replicates. Whoops! Yes, biological replicates are input as separate fastq files.

The sample sheet (count file) uses sample names that can be automatically interpreted from the file names (or better, element identifiers from a collection of inputs) or you can rename these on the MAGeCK count tool form.
Then for the MAGeCK test runs, you can reference these sample identifiers by name or position.
The test can be set up a few ways, too. You can limit which samples are included, but you can also change how the counts are generated, output extra statistics files, limit to the first 1 million reads per sample, and so on (see the Output/Advanced Options).

This might be confusing since the servers have different versions of the tool forms loaded. These sync about once a week, and there are sometimes reasons to keep them out of sync. I think this is all fine right now.

But I’m wondering if you spotted something else we need to report. When I reviewed, it looks like the current forms on both servers include this option to clarify the treatment groups per run. What am I missing? The ORG server has a newer version of the form, and if a mistake was made by omitting something important, we can get that corrected! Maybe share a screenshot back to help me see it?

Xref: MAGeCK change log

This is what I was reviewing: [screenshots of the Count and Test tool forms]

Thanks and I’m glad you have this working! We appreciate the feedback from everyone using these!

Related topics

Integrating Day 0 Controls in a CRISPR Screen Analysis with Mageck: Seeking Advice (single-cell) · 1 reply · 25 views · March 21, 2025
Troubleshooting Mageck Count (genome-annotation) · 3 replies · 168 views · July 10, 2024
Error MAGeck count tool (usegalaxy.org support) · 0 replies · 384 views · December 17, 2020
How to remove duplicates in a concatenated paired dataset? (usegalaxy.org.au support: workflow, metagenomics, mothur) · 0 replies · 402 views · September 16, 2021
Paired sample comparison in the MAGeCKs test (usegalaxy.eu support: tool-dev) · 1 reply · 176 views · March 8, 2024