How to deal with duplicates in MAGeCK through Galaxy?
Would you be able to explain what is going on a bit more? Maybe with a shared history as an example, and screenshots? Thanks!
Thanks
I will have 2 fastq files that are duplicates of the same experimental setting. How do I tell MAGeCK to treat these 2 files as duplicates?
Thanks for clarifying.
From the MAGeCK wiki: How to specify the technical replicates (“duplicates”). → MAGeCK / Wiki / QA
To pool the reads in Galaxy, combine the fastq files before the counting step.
Quoted note from our tutorial here → Hands-on: CRISPR screen analysis / CRISPR screen analysis / Genome Annotation
Comment: Replicates
If we have biological and/or technical replicates we can handle them in a similar way to that described on the MAGeCK website. For biological replicates, we input them in MAGeCK test Treated Sample Labels/Control Sample Labels fields separated by a comma. For technical replicates, we could combine the fastqs for each sample/biological replicate, for example with the Concatenate datasets tool, before running MAGeCK count.
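Outside of Galaxy, the same pooling of technical replicates can be sketched in a few lines of Python. This is only an illustration of what the Concatenate datasets step does conceptually, not the tool's actual implementation; the function name and file names are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_fastqs(parts, combined):
    """Append each technical-replicate FASTQ to one combined file, in order.

    Hypothetical helper: bytes are copied unchanged, so all inputs should
    use the same format (all plain text, or all gzip-compressed).
    """
    combined = Path(combined)
    with combined.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Stream the file in chunks so large FASTQs are not loaded into memory.
                shutil.copyfileobj(src, out)
    return combined
```

For example, `concatenate_fastqs(["sampleA_tech1.fastq", "sampleA_tech2.fastq"], "sampleA_pooled.fastq")` would give one pooled file per sample/biological replicate, which then goes into MAGeCK count as a single input.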
Thanks for explaining more, and hope this helps!
Thanks Jenni,
Although it seems quite easy to concatenate fastq files for technical replicates, things are not so obvious for biological replicates:
For biological replicates, treat them as separate samples and use them together when doing the comparison; so MAGeCK can analyze the variance of these samples. For example in the test command, “-t sample1_bio_replicate1,sample1_bio_replicate2 -c sample2_bio_replicate1,sample2_bio_replicate2” compares 2 samples (with 2 biological replicates in each sample)
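To make the quoted example concrete, here is a minimal sketch that assembles the equivalent command line. It only builds and prints the arguments and does not run MAGeCK; the count table name `counts.txt` and the output prefix are placeholders I made up, and the helper function is hypothetical.

```python
def build_mageck_test_command(count_table, treatment, control, output_prefix):
    """Build the argument list for a `mageck test` run (hypothetical helper).

    Biological replicates of one condition are joined with commas, so each
    of -t and -c receives a single comma-separated value.
    """
    return [
        "mageck", "test",
        "-k", count_table,          # count table produced by `mageck count`
        "-t", ",".join(treatment),  # treatment sample labels
        "-c", ",".join(control),    # control sample labels
        "-n", output_prefix,        # prefix for the output files
    ]


cmd = build_mageck_test_command(
    "counts.txt",  # placeholder file name
    ["sample1_bio_replicate1", "sample1_bio_replicate2"],
    ["sample2_bio_replicate1", "sample2_bio_replicate2"],
    "sample1_vs_sample2",  # placeholder output prefix
)
print(" ".join(cmd))
```

The labels passed to `-t` and `-c` have to be the sample names that appear in the count table header.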
I don’t see where the -t option applies in the Galaxy version of MAGeCK test on Galaxy Main (https://usegalaxy.org/).
I do find the instruction on Galaxy Europe (https://usegalaxy.eu/) though:
If sample label is provided, the labels must match the labels in the first line of the count table, separated by comma (,); for example, HL60.final,KBM7.final. For sample index, 0,2 means the 1st and 3rd samples are treatment experiments. See Help below for a detailed description. (--treatment-id)
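A small sketch of how I read that help text, checking a label or index list against the first line of a count table. It assumes the table is tab-separated with `sgRNA` and `Gene` as the first two columns and the samples after them; the header used here is a made-up example and the function is hypothetical, not part of MAGeCK or Galaxy.

```python
def resolve_samples(header_line, spec):
    """Turn a comma-separated label or index list into sample labels.

    Assumption: the first two header columns are sgRNA and Gene, so
    sample index 0 refers to the third column of the count table.
    """
    columns = header_line.rstrip("\n").split("\t")
    samples = columns[2:]
    tokens = [token.strip() for token in spec.split(",")]
    if all(token.isdigit() for token in tokens):
        # Index mode: 0,2 selects the 1st and 3rd samples.
        return [samples[int(token)] for token in tokens]
    # Label mode: every label must match the header exactly.
    missing = [token for token in tokens if token not in samples]
    if missing:
        raise ValueError(f"labels not found in count table header: {missing}")
    return tokens


# Hypothetical header line for illustration only.
header = "sgRNA\tGene\tHL60.initial\tKBM7.initial\tHL60.final\tKBM7.final"
by_index = resolve_samples(header, "0,2")
by_label = resolve_samples(header, "HL60.final,KBM7.final")
```

A check like this catches a mistyped label before the test step runs. Note that a sample whose label consists only of digits would be read as an index here.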
I should be fine
Perhaps you could add that sentence to the tool form on Galaxy Main?
Thanks++ again
Best
Thanks for clarifying more! I thought the confusion was for technical replicates. Whoops! Yes, biological replicates are input as separate fastq files.
- The sample sheet (count file) uses sample names that can be automatically interpreted from the file names (or better, element identifiers from a collection of inputs) or you can rename these on the MAGeCK count tool form.
- Then for the MAGeCK test runs, you can reference these sample identifiers by name or position.
- The test can happen a few ways, too. You can limit by the samples included, but you can also change how the counts are generated, output extra statistic files, limit by the first 1 million reads per sample, and such. (see the Output/Advanced Options)
This might be confusing since the servers can have different versions of the tool forms installed. These sync about once a week, and there are sometimes reasons to keep them out of sync. I think this is all fine right now.
But I’m wondering if you spotted something else we need to report. When I reviewed, it looked like the current forms on both servers include this option to specify the treatment groups per run. What am I missing? The ORG server has a newer version of the form, and if a mistake was made by omitting something important, we can get that corrected! Maybe share a screenshot to help me see it?
Xref MAGeCK change log
This is what I was reviewing.
Count
Test
Thanks and I’m glad you have this working! We appreciate the feedback from everyone using these!