Duplicates in MAGeCK test

How do I deal with duplicates in MAGeCK through Galaxy?

Hi @jean-pierre_de_villa

Would you be able to explain what is going on a bit more? Maybe with a shared history as an example, and screenshots? Thanks! :slight_smile:

Thanks
I will have 2 fastq files that are duplicates of the same experimental setting. How do I tell MAGeCK to treat these 2 files as duplicates?

Hi @jean-pierre_de_villa

Thanks for clarifying.

From the MAGeCK wiki: How to specify the technical replicates (“duplicates”). → MAGeCK / Wiki / QA

To pool the reads in Galaxy, combine the fastq files before the counting step.

Quoted note from our tutorial here → Hands-on: CRISPR screen analysis / CRISPR screen analysis / Genome Annotation

Comment: Replicates

If we have biological and/or technical replicates we can handle them in a similar way to that described on the MAGeCK website. For biological replicates, we input them in MAGeCK test Treated Sample Labels/Control Sample Labels fields separated by a comma. For technical replicates, we could combine the fastqs for each sample/biological replicate, for example with the Concatenate datasets tool, before running MAGeCK count.
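For anyone who wants to pool technical replicates outside of Galaxy, here is a minimal Python sketch of the same idea as the Concatenate datasets tool. It assumes uncompressed fastq files that each end with a newline; the function name and file names are made up for illustration and are not part of MAGeCK or Galaxy.

```python
import shutil
from pathlib import Path


def concatenate_fastqs(inputs, output):
    """Append each input file to `output`, byte for byte, in the given order.

    Assumption: inputs are uncompressed fastq files ending with a newline,
    so the pooled file is still a valid fastq file.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


# Hypothetical usage: pool two technical replicates of one sample
# before running MAGeCK count on the pooled file.
# concatenate_fastqs(["sample1_tech1.fastq", "sample1_tech2.fastq"],
#                    "sample1_pooled.fastq")
```

The pooled file then goes into MAGeCK count as a single sample.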

Thanks for explaining more, and hope this helps! :slight_smile:

Thanks Jenni,
Although it seems quite easy to concatenate fastq files for technical replicates, things are not so obvious for biological replicates:
For biological replicates, treat them as separate samples and use them together when doing the comparison, so MAGeCK can analyze the variance of these samples. For example, in the test command, “-t sample1_bio_replicate1,sample1_bio_replicate2 -c sample2_bio_replicate1,sample2_bio_replicate2” compares 2 samples (with 2 biological replicates in each sample).
I don’t see where the -t option applies in the Galaxy version of MAGeCK test on the main Galaxy server (https://usegalaxy.org/).

I do find the instruction on Galaxy Europe (https://usegalaxy.eu/) though:
If sample label is provided, the labels must match the labels in the first line of the count table, separated by comma (,); for example, HL60.final,KBM7.final. For sample index, 0,2 means the 1st and 3rd samples are treatment experiments. See Help below for a detailed description. (--treatment-id)
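To make the label-versus-index rule concrete, here is a small Python sketch that resolves either form against the first line of a count table. It is only an illustration of the rule as described in the help text, not MAGeCK's own parsing code. It assumes a tab-separated count table whose first two columns are the sgRNA and gene columns; the helper name and the example header are hypothetical.

```python
def resolve_samples(spec, header):
    """Map a comma-separated spec of labels or 0-based indices to sample labels.

    Assumption: `header` is the tab-separated first line of the count table,
    and its first two columns are the sgRNA and gene columns.
    """
    samples = header.rstrip("\n").split("\t")[2:]
    tokens = [t.strip() for t in spec.split(",") if t.strip()]
    # All-numeric specs are treated as 0-based sample indices.
    if tokens and all(t.isdigit() for t in tokens):
        return [samples[int(t)] for t in tokens]
    # Otherwise every token must match a label in the header exactly.
    missing = [t for t in tokens if t not in samples]
    if missing:
        raise ValueError(f"labels not in count table header: {missing}")
    return tokens


# Hypothetical count table header, for illustration only.
header = "sgRNA\tGene\tHL60.initial\tKBM7.initial\tHL60.final\tKBM7.final"
print(resolve_samples("0,2", header))
# ['HL60.initial', 'HL60.final']
print(resolve_samples("HL60.final,KBM7.final", header))
# ['HL60.final', 'KBM7.final']
```

With this header, 0,2 picks the 1st and 3rd sample columns, and a misspelled label raises an error instead of silently selecting nothing.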

I should be fine.
Perhaps you could add that sentence on the main Galaxy server?
Thanks++ again
Best

Hi @jean-pierre_de_villa

Thanks for clarifying more! I thought the confusion was for technical replicates. Whoops! Yes, biological replicates are input as separate fastq files.

  • The sample sheet (count file) uses sample names that can be automatically interpreted from the file names (or better, element identifiers from a collection of inputs) or you can rename these on the MAGeCK count tool form.

  • Then for the MAGeCK test runs, you can reference these sample identifiers by name or position.

  • The test can be run a few ways, too. You can limit which samples are included, but you can also change how the counts are generated, output extra statistics files, limit to the first 1 million reads per sample, and so on (see the Output/Advanced Options).
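For reference, the Treated/Control Sample Labels fields correspond to the -t and -c options of the command-line mageck test. Below is a minimal Python sketch that assembles such a command for two biological replicates per group. The sample names come from the wiki example quoted earlier in this thread; the count table file name and output prefix are hypothetical, and the sketch only builds the argument list without running anything.

```python
def build_mageck_test_args(count_table, treatment, control, prefix):
    """Build an argument list for `mageck test` with replicates joined by commas."""
    return [
        "mageck", "test",
        "-k", count_table,          # count table produced by MAGeCK count
        "-t", ",".join(treatment),  # treatment sample labels
        "-c", ",".join(control),    # control sample labels
        "-n", prefix,               # output prefix
    ]


args = build_mageck_test_args(
    "counts.txt",  # hypothetical file name
    ["sample1_bio_replicate1", "sample1_bio_replicate2"],
    ["sample2_bio_replicate1", "sample2_bio_replicate2"],
    "replicate_demo",  # hypothetical output prefix
)
print(" ".join(args))
```

On the Galaxy form, the same comma-separated lists go into the sample label fields instead.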

This might be confusing since the servers have different versions of the tool forms loaded. These sync about once a week, and there are sometimes reasons to keep them out of sync. I think this is all fine right now.

But I’m wondering if you spotted something else we need to report. When I reviewed, it looked like both current forms include this option to specify the treatment groups per run. What am I missing? The ORG server has a newer version of the form, and if a mistake was made by omitting something important, we can get that corrected! Maybe share a screenshot back to help me see it?

Xref MAGeCK change log

This is what I was reviewing (screenshots of the Count and Test tool forms).

Thanks and I’m glad you have this working! We appreciate the feedback from everyone using these! :slight_smile: