Not getting results in the tables (Runs, Samples, etc.) - cannot find the problem

Hi!

I am trying to upload both long and short reads from a project, but I do not get any results in the four different tables: Runs, Samples, Studies, Experiments. I cannot find anything incorrect in my metadata file. Would you be able to help out?

job ID:

11ac94870d0bb33aed89f696b892bd04

Welcome @janslet

Since you are working at UseGalaxy.eu, sharing just the job ID is not enough (under GDPR, only 2-3 people can view data that way, and they are mostly engineers). Plus, all of the upstream steps can influence how a job turns out, so reviewing the connected jobs is important!

Instead, you can share your working history back here for troubleshooting feedback. How to do this is in this topic. → How to get faster help with your question

You can post the history share link back as a reply, then unshare once we are done. Please also include links to any tutorials or other external references you might be following for the protocol for more context. Screenshots can also help, especially if the problems are happening in a Rule Builder or Apply Rules step. Details about your goals can also be helpful but I’ll probably see what is going on. :slight_smile:

XRef

Thank you very much for your quick feedback.

Please find enclosed the history link: Galaxy

:jannice

Hi @janslet

Thanks for sharing the history!

The messages from the tool, and the way the samples are organized in the history, both combine to make me suspect that the sample labels in the xlsx metadata file are missing the references to the individual sample files.

Try this:

  1. Examine your xlsx metadata file locally. This can’t be viewed inside Galaxy, but you could decide to output a copy in a tabular format – at a minimum, try to isolate the sample labels or file names, meaning the key that is “matching up” with the files supplied. I would suggest loading that up as a reference to help with later steps.

  2. Organize your samples files into a list collection. Then, extract the default element identifiers. What do those sample labels look like? Can you notice how they are different from the metadata file?

From there, you can use the tool Apply Rules to manipulate your collection and parse out the sample labels (element identifiers) to better match the xlsx metadata.

Or, you could decide to do those manipulations directly on the text – the output of Extract element identifiers can become the input for Relabel identifiers.
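To make the text route concrete, here is a minimal sketch of building a two-column old-to-new mapping (one of the input styles the Relabel identifiers tool accepts) from the extracted element identifiers. The file names and the regex are placeholders, not your actual data – adjust the pattern to whatever actually separates the sample alias from the rest of the name in your files:

```python
# Rough sketch: turn the output of "Extract element identifiers" (one
# identifier per line) into a two-column mapping for relabeling.
import re

# hypothetical input line: "sampleA_R1_nfilt.fastq.gz"
with open("element_identifiers.txt") as src, open("relabel_map.tsv", "w") as dst:
    for line in src:
        old = line.strip()
        if not old:
            continue
        # strip read-pair / filter suffixes and extensions (placeholder pattern)
        new = re.sub(r"(_R[12])?(_?nfilt)?(\.fastq(\.gz)?)?$", "", old)
        dst.write(f"{old}\t{new}\n")
```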

Some references for these manipulations

It is difficult to know how complicated this will be, but most manipulations should be possible, and we can help here if you get stuck. Your base files have two different formats for the file names, which is an extra wrinkle, but there are still ways to do this (maybe by handling each file format independently first, then merging after).

Getting the target “sample” names the metadata is referencing is what we’ll need next. So, try the “export to tabular” function from Excel, then load that up to Galaxy first. You can put that in the existing history; let me know once that is done, or if you are not sure what to try after that.
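As an aside, if the Excel export ever gets awkward, a small script can do the same sheet-to-tabular conversion. A minimal sketch, assuming pandas (with the openpyxl engine) is installed and using a placeholder file name:

```python
# Dump every sheet of the metadata workbook to its own TSV file so the
# labels can be inspected locally or loaded into Galaxy.
import pandas as pd

sheets = pd.read_excel("ena_metadata.xlsx", sheet_name=None, dtype=str)
for name, df in sheets.items():
    df.to_csv(f"{name}.tsv", sep="\t", index=False)
    print(name, df.columns.tolist())  # quick look at each sheet's headers
```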

That’s a lot of information! In short, getting the data organized so the ENA submission receipt tool can “see” it correctly is where to start. I don’t think that step worked correctly. OR, did I misunderstand? :slight_smile:

Dear jennaj,

I really appreciate your quick reply. I have worked with the metadata file and hoped that I got the correct naming of the files and the sample_alias, and that all four sheets in the metadata file are connecting. But when I started the ENA Upload tool, it still does not work.

Yes, it is a lot of information, but I hope I/we will manage to upload in the end :smiley:

Hi @janslet

I’d like to get this solved for you!

How to create a testing history for method development and troubleshooting!

Let’s start over in a new history in the shareable storage space for the troubleshooting. Then, you can move back into other storage choices for the real runs. You’ll have 2 TB of space at the UseGalaxy.eu servers, which should be plenty for this type of exploratory work.

You will be able to develop a reusable workflow that can be run in any history or any data storage space, and possibly any Galaxy server! If this is new to you, this is a nice Introduction to Workflows.

Steps

  1. Create a new history, give it a unique name, and set the History Preferred Storage location to the Short term storage

    Click on the storage icon to reach the directions

    Click on that Preferred Storage pop-up window to reach the per-History Preferred Storage choices, and choose Short term storage. This will only impact this history, where we will be troubleshooting your job.

  2. Upload some of your samples

    Do this as an upload step, not as a copy from another history. You don’t need to add all of your full data, just enough to represent the different types of input data.

    For your use case, this would mean including a representative example of R1+R2 pairs, and the “R-nfilt” and “nfilt” types. Or, you can go ahead and load all samples and we can help to subset later on as we do the testing and development work.

  3. Upload your metadata file in xlsx format

    Also do this one as an Upload step, not as a copy from another history. Don’t worry about filtering it down to match your sub-samples (if you chose to do that). Just load the original file.

    Load one copy using all default settings with the Upload tool.

    Then, please also load another copy of the same file, into the same history, and select the datatype xlsx in the Upload tool. We will use this as a comparison.

  4. Upload your metadata file in tsv or csv format

    In Excel on your own computer, export a tsv (tab separated) or csv (comma separated) version of your metadata file. Please do not adjust the file name or extensions. We want the data exactly as it is created by Excel.

    Then, Upload this exported file to your Galaxy history, using all default settings.

    Once in Galaxy, please do not make any adjustments; we want the plain text file in the original format as a baseline dataset for another comparison.

    If you want to export and then load both a tsv and csv version, that would be great! More data and more details are always better.

  5. Finally, try to run the target tool!

    It is okay if this fails. We will want to inspect how you are selecting the input datasets and the exact parameter settings, plus the tool version choice and the job logs. Your run will capture all of these details.

    If you need to organize the dataset samples into a collection, go ahead and do that to prepare the data for input to the tool. Make sure to do this step in the same testing history, and as a new manipulation. We don’t want any data or manipulations copied from any other history.

    If you want to try the run a few different ways, that would be great! With collections, without, possibly different parameter settings. Just be sure to do all of this work in our testing history.

  6. :scientist: Once done, and if the tool still fails, please generate a history share link and post that back here for review. Toggle just the first sharing option, Make History accessible. Leave the other options at default, or we won’t be able to review, test out changes, or help you to build up a workflow for reuse against your full final data run in permanent storage.


Hope this helps and I’ll watch for your replies! :slight_smile:

Hi @jennaj

I tried following the protocol as described above, but ended up having problems “connecting” the .tsv files, as the ENA upload tool wanted Tabular files it could not find.

history:

But, I went back to the first history, as I recalled being a bit unsure if I had the correct webin client. I had indeed entered the incorrect webin client, so I corrected it and started the ENA upload tool again, and now it seems that it ran, but I still did not get any accession numbers. Is this because I sent it to the test upload?

:slight_smile: janslet

Great, thanks for sharing the history @janslet

The messages from the tool can be found on the job’s Details tab (i-info icon on a dataset).

This is your view from one of the jobs in the history you linked above. It describes how the tool couldn’t “find the right data” for some keys. This indicates a metadata problem.


Now, that can be confusing - but backing up and examining the metadata template versus what you currently have is where to start. Thanks for loading up a few of the sheets in plain text! Very helpful!

The different template formats are available here. → GitHub - ELIXIR-Belgium/ENA-metadata-templates: TSV and XLSX templates for submitting ENA objects (study, experiment, sample and run) using the ENA-upload-tool

Your choice of template needs to match your choice on this form. For this run, you selected ERC000011, and from what I see of the two sheets that did upload, that is what you are actually using.

However, I see what is likely the problem. For the primary Run key, there are duplicated entries. This could produce the type of error you had – the scripts at ENA were looking for a 1-to-1 relationship but found multiple rows with slightly different content, so the tool got confused and died.

Example of some duplicates:

Then, here is the template with some annotations showing how there are multiple primary keys and how they link everything together on the Experiment sheet. This means that if you want to group R1 and R2 reads, that grouping would be on the Sample sheet instead.

There may be more issues, so correct this, then iterate until you fully solve the data organization. The instructions at the GitHub above also show how to construct this in JSON format, along with a detailed guide.
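To speed up that iteration, a small consistency check can flag duplicated keys and broken cross-sheet links before each re-run of the tool. This is just a sketch of the idea – the sheet and column names (run, experiment, alias, experiment_alias) are my assumptions, so swap in the names from your actual ERC000011 template:

```python
# Flag duplicated primary keys on the run sheet, plus run rows that
# reference an experiment alias the experiment sheet never defines.
import pandas as pd

sheets = pd.read_excel("ena_metadata.xlsx", sheet_name=None, dtype=str)
run, experiment = sheets["run"], sheets["experiment"]

dupes = run[run["alias"].duplicated(keep=False)]
if not dupes.empty:
    print("Duplicated run aliases:", dupes["alias"].tolist())

missing = set(run["experiment_alias"].dropna()) - set(experiment["alias"].dropna())
if missing:
    print("Run rows pointing at undefined experiments:", sorted(missing))
```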

Your experiment isn’t overly large, so you could also decide to use the direct entry method instead. But this is your choice! Filling out a Galaxy form would be tedious for so many samples, and you might as well do that work in the Excel sheet (or with tabular files) since you plan to reuse it for the actual submission.

Xref → Hands-on: Submitting sequence data to ENA / Using Galaxy and Managing your Data

Hope this helps! :slight_smile: