I cannot figure out what to do with Fastq files that are already paired

Hi there! I am a baby researcher, a post-bac on my second microbiome analysis project. I am on the second step and I am already stuck. On my previous project, I was working with Illumina data, so I received the raw data as unpaired sets and was therefore able to follow the tutorial exactly. It went very well, with only a few hiccups. In this project, my PIs sent the samples to a different lab, which used PacBio for the sequencing. I was able to use software to bin the files back into separate samples, so I now have 18 individual samples that are already paired. However, I am not sure how to progress from here. I don’t need to use the make.contigs command, right? So do I just group the files? Do I need to unpair the samples first? Is there a way to use make.contigs anyway?
Ugh, I admit, I am a better biologist than a computer scientist and I feel completely stupid right now.
Thanks in advance to anyone who can help!

Hello @n_griffith42

These are PacBio long reads, correct?

What are your goals?

  • Assemble the reads?

  • Call SNPs?

  • Assign taxonomy/function?

The samples are bayou water samples from different time points. I am one of the only people in my lab group who has done any sequencing analysis before. The PIs want me to do what I did with my previous samples, which is to compare changes in microbes over time and between locations. I was planning on using SILVA again to align the samples for taxonomy. Then I could do the comparisons. The problem for me is that I am a regular biologist learning bioinformatics on my own >_< I received a single fasta file and was able to bin it into the 18 separate samples. Now I am trying to figure out how to get started with the process without needing to do the make.contigs step.

Additionally, the binning generated fastq files, which seems to be upsetting Galaxy greatly.
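For anyone reading along in the same spot: the state after make.contigs is essentially one FASTA file of reads plus a table saying which sample each read came from. Below is a minimal Python sketch of getting there from per-sample FASTQ files, outside of Galaxy. It assumes plain four-line FASTQ records and that each file name (minus extension) is the sample name; the function names `read_fastq` and `pool_samples` and the output file names are made up for illustration, and this is not a substitute for mothur's own tools or for quality filtering.

```python
from pathlib import Path


def read_fastq(path):
    """Yield (read_name, sequence) from a plain 4-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a blank line) stops the loop
                break
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            handle.readline()  # quality line, ignored in this sketch
            if not header.startswith("@"):
                raise ValueError(f"unexpected FASTQ header: {header!r}")
            # Keep only the read ID (text before the first whitespace).
            yield header[1:].split()[0], sequence


def pool_samples(fastq_paths, fasta_out, groups_out):
    """Write one pooled FASTA plus a tab-separated read-to-sample table.

    Returns a dict of reads written per sample.
    """
    counts = {}
    with open(fasta_out, "w") as fasta, open(groups_out, "w") as groups:
        for path in fastq_paths:
            sample = Path(path).stem  # assumption: file name = sample name
            counts[sample] = 0
            for name, sequence in read_fastq(path):
                fasta.write(f">{name}\n{sequence}\n")
                groups.write(f"{name}\t{sample}\n")
                counts[sample] += 1
    return counts
```

Hypothetical usage: `pool_samples(["siteA.fastq", "siteB.fastq"], "pooled.fasta", "pooled.groups")` would write both output files and return the read count for each sample.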

Do you know what the original data represented?

PacBio long reads are very different from amplicon reads.

Is your original data uploaded to a public Galaxy server? If so, or if you can do that now, you could send me a direct message here with a share link to the history that contains both the original data and the data you manipulated. I will probably be able to figure out what the content represents, how to process it, and what the problem is with your “bin” data (I’m not exactly sure what that means from your explanation, and it sounds like Galaxy doesn’t recognize the format either). But we can try to sort that all out.

Send a direct message by clicking on the top right icon for your account. The expanded list will have a mail icon. Click on that and send the message to me (@jennaj).

Generate a shared history link by clicking into the History menu and picking “Share or Publish” from the list. On that form, the top option is to generate a link. Be sure to check the box to share objects or I won’t be able to look at them this way, and I don’t like to alter data in users’ accounts (to share it properly or for any other reason). I’ll help by inspecting first, then possibly making a temporary copy if needed to work out how to fix it up (any copied data won’t be retained, just used for troubleshooting with you directly, so it is completely private). I help people this way every day.

As an alternative to sharing a history link, if you are working at Galaxy Main https://usegalaxy.org, I am an admin there as well as here at Galaxy Help. If your registered email address here is the same as at Main, just let me know it is the same (I can look it up) and tell me the history name so I can find it and know I am reviewing the right data. If your email is different, you can also just let me know your registered account email address at Main plus the history name. I can find it that way (no need to actually share it).

You don’t need to post any of that to this public thread, whichever way you decide to do this.

Small note: Never share any account’s password with anyone. Administrators won’t need it. We just need to know where to look.

PS: I learned on my own too, but that was double-digit years ago :) Everyone needs a little help at the start.

Hi
Just do more tutorials and keep asking questions about every problem.
I was like you 8 months ago, and now I run every related project at my university.
@jennaj always answers and helps with your problems.

Cheers, and I wish you the best

Wow, I am having a heck of a time even figuring out how to send a direct message. It doesn’t even offer me the option. The data is supposed to represent all the ribosomal RNA sequences in each water sample, so there are 16S and 18S sequences. I am supposed to be identifying the sequences with taxonomy, possibly evaluating the diversity, and creating interpretative charts via Krona. I did all of these before on my senior thesis project via Galaxy/Mothur. The issue is that I am starting out with PacBio data instead of Illumina files, and I am not sure how to get past the whole “make.contigs” part from the point these files are at, as they don’t have to be paired. My email is the same on both and I am on usegalaxy.org. My history is named “Harvey Project 1st Test Run.” I understand that the PacBio files are supposedly more in depth than the Illumina files, so I understand it is going to take longer. I just need to get to the post-“make.contigs” point. From there I can understand how to go through the rest of the steps, I hope.

Hi there, @jennaj, were you able to find the history? I’m working a day job and doing this right now, so I am hopping back and forth. Sorry for the delay.

Hi @n_griffith42

Sorry for the delay, I missed your reply.

I didn’t find an account registered under the same email address as used here at Galaxy Help at the Galaxy Main https://usegalaxy.org public server.

I did, however, find an account that appears to be yours. I sent you a direct message – please confirm this is your account and that you are still having problems. If the name of the history is different now, please clarify and note the dataset numbers that are problematic. In particular, I am interested in reviewing the starting data you manipulated yourself.

For reference, this is where to find or send direct messages here at Galaxy Help (upper right corner):

[Screenshot: ghelp-direct-message]