Welcome, @Grace_Ekalle
Can you explain what you think is going wrong and, if you are already following a tutorial, include that link? Thanks! We’d like to help, and some context along with the shared history link will make that much easier.
When I try to run my dataset (an SRR accession; it is a very large dataset) through trim QC to trim it, it always stops running at the Fastp job. An error appears saying that there isn’t enough memory for the job to run, so I realized that I have to split the dataset. My ultimate goal is to get it into a Krona pie chart to see the types of bacteria present in the dataset. https://science.c-moor.org/miniCURE-BioDIGS/taxonomy-profiling.html
Great, thanks for explaining.
An out-of-memory error at runtime can be caused by a format, content, or parameter problem, or the job may actually be too large to process. The first is much more common, but the second is possible!
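If you ever want to rule out a format problem outside of Galaxy, here is a minimal sketch of a FASTQ sanity check. It assumes plain four-line FASTQ records (no wrapped sequence lines), and `check_fastq` is a hypothetical helper name for illustration, not something Galaxy or fastp runs internally.

```python
import io


def check_fastq(handle):
    """Scan a FASTQ text stream and return (record_count, problems).

    Assumes each record is exactly four lines: header, sequence,
    '+' separator, and quality string.
    """
    problems = []
    count = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        count += 1
        if not header.startswith("@"):
            problems.append(f"record {count}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {count}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {count}: sequence and quality lengths differ")
    return count, problems


if __name__ == "__main__":
    # Tiny in-memory example; the second record has a truncated quality string.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nII\n")
    count, problems = check_fastq(demo)
    print(count, problems)
```

For a real compressed file you could pass `gzip.open(path, "rt")` as the handle instead of the in-memory example.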
Try this:
- Try getting that accession’s data using this tool. Use all defaults – just input the SRR identifier on the form. This will ensure that format is not a problem. You can do this in a new history if you want to.
- Faster Download and Extract Reads in FASTQ format from NCBI SRA (link to tool at ORG)
- Run Flatten Collection on the paired-end output from step 1 (use defaults). Then run FastQC on that flattened collection output. Review the report – this is where you will learn what you want to do when you do the trimming.
- You “flatten” to give each data file a unique name. Later, when you have multiple samples, this will be important if you decide to use MultiQC to summarize the FastQC reports. Exact copies of data do not consume more quota space, and the reports themselves are tiny.
- Run your trimming tool on the original output from step 1 addressing any content problems you learned about from FastQC.
- If this fails, you should re-review how to set up the trimming tool to process the data. Sometimes the logs will report what went wrong, sometimes the tool is overwhelmed and just spins out and dies.
- If the trimming tool works, then flatten that new trimmed output collection, rerun FastQC (and optionally MultiQC), and review. Did the trimming happen as you expected it to? Do you need to do more, or change anything?
- Repeat this as much as you need to. You want super-clean data for downstream steps. Clean input makes any errors from those later tools easier to interpret while you figure out the best scientific processing for your data.
- If the trimming tool keeps failing for memory reasons, that would be very unusual and I would be interested in reviewing. You can share back your history with all the work from this mini-guide, and I or another moderator will help to review it. We can’t help you with the scientific parts – just the technical parts – so please try to work out the trimming parameters by reviewing that tool’s documentation and tutorials first. Then leave all the QA reports and data intact when you share your history and we’ll try to help more.
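On the “flatten” step above, here is a rough conceptual sketch of why it gives each file a unique name. This is only an illustration and not Galaxy’s actual implementation: the function name, the sample identifier, the file names, and the `_` separator are all assumptions made for the example.

```python
def flatten_collection(nested, separator="_"):
    """Turn {sample: {direction: dataset}} into {sample_direction: dataset}.

    Illustrative only: mimics the idea of flattening a paired collection
    so that every dataset ends up with its own unique label.
    """
    flat = {}
    for sample, pair in nested.items():
        for direction, dataset in pair.items():
            flat[f"{sample}{separator}{direction}"] = dataset
    return flat


if __name__ == "__main__":
    # Hypothetical paired-end collection with one sample.
    paired = {
        "sampleA": {"forward": "sampleA_1.fastq", "reverse": "sampleA_2.fastq"},
    }
    for name, dataset in flatten_collection(paired).items():
        print(name, dataset)
```

Without the sample prefix, every sample would contribute datasets labeled only `forward` and `reverse`, which is why a summary report across multiple samples benefits from the flattened names.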
We have a tutorial here that can guide you for the different types of read data, and some other tutorials cover this too, but this is where to start. → Hands-on: Quality Control / Quality Control / Sequence analysis
Hope this helps and I’ll watch for your reply – let us know if you get this to work, too!