Hi folks,
I am having a hard time mapping a large number of sequences with InterProScan, and it appears that the tool is not recognizing my sequences as valid. I can correctly map some sequences that look like this:
d1f1aa_ b.1.8.1 (A:) Cu,Zn superoxide dismutase, SOD {Baker’s yeast (Saccharomyces cerevisiae) [TaxId: 4932]}
vqavavlkgdagvsgvvkfeqasesepttvsyeiagnspnaergfhiqefgdatngcvsa
gphfnpfkkthgaptdevrhvgdmgnvktdengvakgsfkdslikligptsvvgrsvvih
agqddlgkgdteeslktgnagprpacgvigltn
They are in FASTA format, like this:
InterProScan works with them:
But when I try to input other sequences (from another source) that look like this:
COX_I_YP_001518912.1_Cu_user;Acaryochloris_marina_MBIC11017(Bacteria/Cyanobacteria)
MTEAQAPHLEEVEVTPWREYFSFSTDHKVIGIQYLVTSFVFYLIGGLLAELVRTELATPASDFVPRETYNELFTMHATIMIFLWIIPTLTGGFGNFLVPLMIGARDMAFPKLNAIAFWIIPPTSILLLCSFFVGPASAGWTSYPPLSLMTNKAGEAIWILGVILLGTSSIMAGLNFLVTILKMRIPSMTLNDMPLFCWAMLATSALQLVATPVLSGAMVLLGFDLLVGTNFFNPAGGGDPIVYQHMFWFYSHPAVYIMILPAFGLISEILPVHARKPIFGYQAIAYSSIAISFLGLIVWAHHMFTSGTPDWLRMFFMIATMVIAVPTGIKVFSWVATVWGGKLNLCSAMLFGMAFVSMFVVGGLSGVMVASVPFDIHVHDTYFVVAHLHYVLFGGSVFGIYAGLYHWFPKMTGRMLNEFWGKVHFAMTFVGFNICFLPMHVLGLQGMNRRIAEYDPKFAALNVVCTIGSYILATSTIPFVVNAVWSWLAGPRANSNPWKGLTLEWTVPSPPPVENFEEDPVLAIGPYDYGTPKALDFVAATLAPAHALAAESLE
which appear in the FASTA file like this:
The analysis fails. The bug report looks like this:
I tried simplifying the header format, but I still can't see why this is not being picked up as a valid sequence. Both files are imported correctly as FASTA files.
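Since the sequence itself looks like ordinary uppercase amino acids when pasted here, one way to narrow this down is to scan the failing file for things that don't survive copy-pasting into a post: Windows (CRLF) line endings, stray `*` stop symbols, or spaces and other characters inside sequence lines. Below is a minimal Python sketch of such a check. The function name `check_fasta` and the accepted alphabet `ALLOWED` (20 standard amino acids plus common ambiguity codes) are my own assumptions for illustration, not InterProScan's documented validation rules.

```python
import sys

# Hypothetical accepted alphabet: 20 standard amino acids plus the ambiguity
# codes B, Z, X, U, O. This is an assumption, not InterProScan's own rule set.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def check_fasta(text):
    """Return a list of (line_number, problem) tuples for suspicious lines."""
    problems = []
    seen_header = False
    for lineno, raw in enumerate(text.split("\n"), start=1):
        # A trailing \r means the file has Windows line endings.
        if raw.endswith("\r"):
            problems.append((lineno, "CRLF line ending"))
        line = raw.rstrip("\r")
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        if line.startswith(">"):
            seen_header = True
            if not line[1:].strip():
                problems.append((lineno, "empty header"))
            continue
        if not seen_header:
            problems.append((lineno, "sequence before first header"))
        # Lowercase residues are fine (the working file is lowercase).
        bad = sorted({c for c in line.upper() if c not in ALLOWED})
        if bad:
            problems.append(
                (lineno, "unexpected characters: " + ", ".join(repr(c) for c in bad))
            )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps any \r characters so CRLF endings can be detected.
    with open(sys.argv[1], encoding="utf-8", errors="replace", newline="") as fh:
        for lineno, problem in check_fasta(fh.read()):
            print(f"line {lineno}: {problem}")
```

Running it as `python check_fasta.py failing.fasta` prints one line per suspicious spot; an empty output would suggest the problem lies elsewhere (for example in how the file is passed to InterProScan) rather than in the sequence characters.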