Could you make KMC from the Tool Shed (available on Galaxy Australia) also available on usegalaxy.eu?

Hi @bejo
Are you after analysis of F and R reads in a single KMC Counter job? If so, consider concatenating the read files head to tail. This also works on gzipped files, at least with the latest versions of Concatenate.
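To illustrate why head-to-tail concatenation is safe for gzipped reads: a gzip file may contain several compressed members back to back, so appending the raw bytes of one `.gz` file to another yields a valid gzip stream. Below is a minimal Python sketch of that idea, not the Galaxy Concatenate tool itself; the function name `concatenate_gzip` and the file names in the usage note are made up for illustration.

```python
import shutil


def concatenate_gzip(inputs, output):
    """Append gzip files byte for byte, head to tail, into one output file.

    No decompression is needed: each input becomes one member of the
    resulting multi-member gzip stream.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `concatenate_gzip(["reads_1.fastq.gz", "reads_2.fastq.gz"], "reads_all.fastq.gz")` would produce a file that `gzip.open` reads as the forward records followed by the reverse records.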
Hope that helps.
Kind regards,
Igor

Hi @jennaj @igor
I know how to upload FASTA files and merge them into a collection.

However, I am using the download function in Galaxy to retrieve sequence libraries from NCBI with the “fasterq dump” tool. This gives you paired forward and reverse libraries, and it would be handy if these could be parsed in parallel. (I tested with two random FASTA files: I uploaded them as a “collection”, and they are indeed parsed together, resulting in a single output file.)
QUESTION: How can you ensure that the paired FWD and REV sequence libraries are handled as a “collection”, so that you avoid having to start two jobs manually?

Alternatively, I took two random FASTA files. In KMC you can select both FASTA files as individual datasets (I assumed KMC would handle both files separately, in one go, producing separate output files), but this runs into an error.

Finally, I would like to construct a workflow, but first I need to understand how to parse paired sequence libraries in one go in KMC. Could it be a minor issue in KMC?

Hi @bejo

You can change the shape of your collections, and merge, split, or concatenate the data inside them.

Would this tool help?

Concatenate multiple datasets tail-to-head while specifying how

Hi @bejo,

As @jennaj suggested, you can manipulate the data using tools from the Collection Operations section.

I noticed an issue with KMC Counter on multiple inputs and notified the owner of the wrapper. I hope the issue will be resolved in a reasonable time. Currently it works only on a single file, so, for now, Concatenate is the way to go if you need to process multiple files.

On Galaxy Australia, the dedicated data import tools are very slow, IMHO about ten times slower than import by URL. Because of this, I do not use fasterq_dump etc. I import reads using URLs from ENA. In this scenario I end up with individual files, not a collection.
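As a rough sketch of how the read URLs can be looked up: ENA's Portal API has a `filereport` endpoint that returns, per run, a `fastq_ftp` column with semicolon-separated file paths. The Python below only builds the query URL and parses a TSV response; it makes no network call. The helper names, the `ftp://` prefix, and any accession you pass in are assumptions for illustration, so please check the output against the ENA browser before pasting URLs into Galaxy's upload dialog.

```python
from urllib.parse import urlencode

# ENA Portal API endpoint for per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(run_accession):
    """Build the query URL for one run accession (hypothetical helper)."""
    params = {
        "accession": run_accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def fastq_urls(tsv_text):
    """Extract FASTQ URLs from the TSV text returned by the query above."""
    lines = tsv_text.strip().splitlines()
    column = lines[0].split("\t").index("fastq_ftp")
    urls = []
    for line in lines[1:]:
        fields = line.split("\t")
        # Paired runs list both files in one cell, separated by ";".
        for path in fields[column].split(";"):
            if path:
                urls.append("ftp://" + path)  # assumed scheme
    return urls
```

For a paired run this gives two URLs, one per mate, which arrive in the history as two separate datasets.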

Kind regards,
Igor

Dear all, great to hear your feedback :ok_hand:

@jennaj I was not aware of shaping collections. I appreciate you pointing this out to me. I am going to try it.

@igor indeed, importing data seems slow; remarkably, the tool is called “faster”q_dump :smile: I will try the URL option too.

KMC has been available for some time now, and I have already used it frequently. According to GitHub, KMC has another operation called “complex”. It would give the user other filtering options, beyond comparing two datasets with each other.

I would like to find shared k-mers in a set of samples. I could also perform a chain of intersections, but that is not very handy. According to GitHub, the KMC “complex” operation would do the trick as well:
"complex operation allows to define operations for more than 2 input kmer sets."

I am wondering if this option in KMC could be made available.
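To make clear what I mean by shared k-mers across a set of samples, here is a small Python sketch of the concept. It is not KMC and not how KMC implements it: it ignores k-mer counts and reverse complements, holds everything in memory, and the function names are made up for illustration.

```python
from functools import reduce


def kmers(sequence, k):
    """Return the set of all substrings of length k in one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(samples, k):
    """Intersect the k-mer sets of several samples in a single call.

    `samples` is a non-empty list of samples; each sample is a list of
    read sequences.
    """
    kmer_sets = [
        set().union(*(kmers(read, k) for read in reads))
        for reads in samples
    ]
    # Equivalent to the manual chain: (s1 & s2) & s3 & ...
    return reduce(set.intersection, kmer_sets)
```

With two samples this is a single pairwise intersection; with more, the `reduce` call is the chain of intersections that I would otherwise have to run as separate jobs.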

@igor have you seen my message about the “complex” function for KMC?