Bowtie2 index build error for a large genome (15 GB total, one chromosome over 2 GB)

Please see the details below.

Building a SMALL index
Error: Reference sequence has more than 2^32-1 characters! Please build a large index by passing the --large-index option to bowtie2-build
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home/galaxy/galaxy-dev/database/dependencies/_conda/envs/__bowtie2@2.4.4/bin/bowtie2-build-s --wrapper basic-0 --threads 80 /home/galaxy/galaxy-dev/database/objects/b/1/1/dataset_b11d4bf4-6411-4bae-b0d0-039405fb0c3a_files/Garlic_genome.SPLIT.fa Galrlic_V1_split
Deleting "Galrlic_V1_split.3.bt2" file written during aborted indexing attempt.
Deleting "Galrlic_V1_split.4.bt2" file written during aborted indexing attempt.
Error building index.

However, the tool's interface has no '--large-index' option.
How can I add this option to data_manager_bowtie2_index_builder?

Does this relate to the configuration file data_manager_conf.xml of data_manager_bowtie2_index_builder?

It looks like I found the answer!

vim /database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bowtie2_index_builder/9dd107db92c2/data_manager_bowtie2_index_builder/data_manager/bowtie2_index_builder.py
--------------************
def build_bowtie2_index(data_manager_dict, fasta_filename, params, target_directory, dbkey, sequence_id, sequence_name, data_table_names=DEFAULT_DATA_TABLE_NAMES):
    # TODO: allow multiple FASTA input files
    fasta_base_name = os.path.split(fasta_filename)[-1]
    sym_linked_fasta_filename = os.path.join(target_directory, fasta_base_name)
    os.symlink(fasta_filename, sym_linked_fasta_filename)
    args = ['bowtie2-build', '--large-index', sym_linked_fasta_filename, sequence_id]
    # I inserted '--large-index' into the line above.
    threads = os.environ.get('GALAXY_SLOTS')
    if threads:
        args.extend(['--threads', threads])
    proc = subprocess.Popen(args=args, shell=False, cwd=target_directory)
    return_code = proc.wait()
    if return_code:
        print("Error building index.", file=sys.stderr)
        sys.exit(return_code)
    data_table_entry = dict(value=sequence_id, dbkey=dbkey, name=sequence_name, path=sequence_id)
    for data_table_name in data_table_names:
        _add_data_table_entry(data_manager_dict, data_table_name, data_table_entry)

Now the large genome index builds successfully!

So 'bowtie2_index_builder.py' needs to be modified to support large genomes.
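
For anyone who would rather not hard-code the flag, here is a rough sketch of how the argument list could be built so that '--large-index' is only added when the reference exceeds the 2^32-1 character limit quoted in the error message. The helper names (count_reference_chars, make_bowtie2_build_args) and the max_chars parameter are my own and hypothetical; they are not part of the data manager script, and I have not tested this inside Galaxy.

```python
# Limit quoted in the bowtie2-build error message: 2^32-1 characters.
SMALL_INDEX_MAX_CHARS = 2**32 - 1


def count_reference_chars(fasta_filename):
    """Count sequence characters in a FASTA file, skipping '>' header lines."""
    total = 0
    with open(fasta_filename) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def make_bowtie2_build_args(fasta_filename, sequence_id, threads=None,
                            max_chars=SMALL_INDEX_MAX_CHARS):
    """Build the bowtie2-build argument list (hypothetical helper).

    '--large-index' is added only when the reference is longer than
    max_chars; options are placed before the two positional arguments.
    """
    args = ["bowtie2-build"]
    if count_reference_chars(fasta_filename) > max_chars:
        args.append("--large-index")
    if threads:
        args.extend(["--threads", str(threads)])
    args.extend([fasta_filename, sequence_id])
    return args
```

In build_bowtie2_index, the returned list would take the place of the hand-written args line. Counting characters means one extra pass over the FASTA file, which takes a while on a 15 GB genome but is small next to the index build itself.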


Great that you found this! We need people to help with expanding the data manager’s scope in a few places, so any help with this one is welcome. Please consider posting an issue ticket to the development repository, and even suggesting changes in a PR if you want. The original authors, or the IUC, will work with you on this.

How to find where to propose changes (any tool, not just this one): tool form → Options → See in Toolshed → link to Dev repo.

Thanks for posting back the answer :slight_smile:
