Issues with FASTA fetch on a local Galaxy instance

I’m running a local Galaxy instance on my desktop and was trying to use the data managers to install the hg38 genome reference. I also tried the rsync data manager, which didn’t work: it was installed, but it wouldn’t open when I went to the data managers tab and clicked on it. So instead I’m trying the FASTA fetcher, Picard, etc. route, and when I run the FASTA fetcher I keep getting this error:

Fatal error: Exit code 1 ()
Traceback (most recent call last):
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/4d3eff1bc421/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 497, in <module>
    main()
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/4d3eff1bc421/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 478, in main
    tmp_dir=tmp_dir)
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/4d3eff1bc421/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 300, in download_from_ucsc
    url = _get_ucsc_download_address(params, dbkey)
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/4d3eff1bc421/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 260, in _get_ucsc_download_address
    path_contents = _get_files_in_ftp_path(ftp, ucsc_path)
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/4d3eff1bc421/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 65, in _get_files_in_ftp_path
    ftp.retrlines('MLSD %s' % (path), path_contents.append)
  File "/home/jgoodman/galaxy/database/dependencies/_conda/envs/__python@3.7/lib/python3.7/ftplib.py", line 475, in retrlines
    with self.transfercmd(cmd) as conn, \
  File "/home/jgoodman/galaxy/database/dependencies/_conda/envs/__python@3.7/lib/python3.7/ftplib.py", line 406, in transfercmd
    return self.ntransfercmd(cmd, rest)[0]
  File "/home/jgoodman/galaxy/database/dependencies/_conda/envs/__python@3.7/lib/python3.7/ftplib.py", line 368, in ntransfercmd
    source_address=self.source_address)
  File "/home/jgoodman/galaxy/database/dependencies/_conda/envs/__python@3.7/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/home/jgoodman/galaxy/database/dependencies/_conda/envs/__python@3.7/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
BlockingIOError: [Errno 11] Resource temporarily unavailable

Despite trying on several different days and at different times of day, I still get the “Resource temporarily unavailable” error. Any help would be greatly appreciated!
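To check whether this is specific to Galaxy, the listing call from the traceback (`ftp.retrlines('MLSD %s' % (path), ...)`) can be reproduced outside Galaxy with Python's standard `ftplib`. This is a minimal sketch, assuming anonymous login works on the UCSC server; `try_mlsd` is a hypothetical helper, not part of the data manager, and the `timeout` keeps it from hanging indefinitely.

```python
import ftplib


def try_mlsd(host, path, port=21, timeout=30.0, passive=True):
    """Reproduce the data manager's MLSD listing outside Galaxy.

    Hypothetical diagnostic helper. Returns (True, listing_lines) on
    success, or (False, error_text) on any FTP or socket error.
    """
    ftp = ftplib.FTP()
    try:
        # Control connection (port 21 by default); `timeout` also bounds
        # the data connection that ftplib opens later.
        ftp.connect(host, port, timeout=timeout)
        ftp.login()  # anonymous login (assumption)
        # Passive mode is ftplib's default; passive=False tries active mode.
        ftp.set_pasv(passive)
        lines = []
        # Same command the data manager sends; the listing itself travels
        # over a separate data connection.
        ftp.retrlines("MLSD %s" % path, lines.append)
        return True, lines
    except ftplib.all_errors as exc:
        # all_errors covers ftplib.Error, OSError (including BlockingIOError
        # and socket timeouts) and EOFError.
        return False, "%s: %s" % (type(exc).__name__, exc)
    finally:
        ftp.close()
```

For example, `try_mlsd("hgdownload.cse.ucsc.edu", "/goldenPath/hg38/bigZips")`. If it fails here as well, the problem most likely lies between this machine and the FTP server rather than in Galaxy itself.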

Not sure that’s a problem with your Galaxy instance, since I currently can’t get a directory listing of e.g. goldenPath/hg38/bigZips from the command line or with FileZilla either.

@Maximilian_Haeussler here’s what I’m getting with FileZilla:

Status:	Resolving address of hgdownload.cse.ucsc.edu
Status:	Connecting to 128.114.119.163:21...
Status:	Connection established, waiting for welcome message...
Status:	Insecure server, it does not support FTP over TLS.
Status:	Logged in
Status:	Retrieving directory listing of "/goldenPath/hg38/bigZips"...
Command:	CWD /goldenPath/hg38/bigZips
Response:	250 CWD command successful
Command:	PWD
Response:	257 "/apache/htdocs/goldenPath/hg38/bigZips" is the current directory
Command:	TYPE I
Response:	200 Type set to I
Command:	PASV
Response:	227 Entering Passive Mode (128,114,119,163,164,245).
Command:	MLSD
Error:	Connection timed out after 120 seconds of inactivity
Error:	Failed to retrieve directory listing
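The log shows where it stalls: the control connection on port 21 works (`CWD` and `PWD` succeed), but `MLSD` needs a second data connection to the address announced in the `227` reply, and that is what times out. The last two numbers encode the port as p1 * 256 + p2. Below is a small sketch of that decoding; Python's `ftplib` does the equivalent internally before the `create_connection` call seen in the traceback. `parse_pasv_reply` is a hypothetical name.

```python
import re

# The six comma-separated numbers in a 227 reply, e.g. "(128,114,119,163,164,245)".
_PASV_RE = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply):
    """Return (host, port) from an FTP '227 Entering Passive Mode' reply."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply: %r" % reply)
    match = _PASV_RE.search(reply)
    if match is None:
        raise ValueError("no address in reply: %r" % reply)
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte * 256 + low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port


host, port = parse_pasv_reply("227 Entering Passive Mode (128,114,119,163,164,245).")
print(host, port)  # 128.114.119.163 42229
```

So FileZilla waited 120 seconds for a connection to 128.114.119.163:42229. When such a high data port is unreachable (a firewall on either side is one possible cause), the listing fails even though login works.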

Is there a known issue with UCSC’s FTP server at the moment?

Hmm, I can access this directory fine:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

Do you get an error in your web browser?

Yes, it works with the https:// link in the browser 🙂
FileZilla still fails, though.
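Since the same directory is reachable over HTTPS, the file names can also be read from the web index page instead of via FTP `MLSD`. This is a minimal standard-library sketch, assuming the page is a plain HTML directory index whose entries are `<a href=...>` links; `links_in_index` and `list_https_dir` are hypothetical helpers, not part of the data manager.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_in_index(html):
    """Return entry names from a directory index page, skipping column-sort
    links ("?C=N;O=D"), the parent directory and absolute URLs."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        h for h in parser.hrefs
        if not h.startswith(("?", "/", "../")) and "://" not in h
    ]


def list_https_dir(url, timeout=30):
    """Download an HTTPS directory index and return its entry names."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return links_in_index(resp.read().decode(charset))
```

For example, `list_https_dir("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/")` should list the same files FileZilla could not.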

@Maximilian_Haeussler the Galaxy data manager tool tries to do this:

@Jack_Goodman as a workaround, try to use:

For “Choose the source for the reference genome”, select URL and use this link: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz

Basically, that bypasses the tool’s attempt to discover the file over FTP from just the hg38 name.
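The discovery step being bypassed amounts to “given a dbkey and a bigZips listing, pick a FASTA file”. The sketch below illustrates that idea only; the candidate names and their order are assumptions, not the data manager’s actual rules, and `pick_ucsc_fasta_url` is a hypothetical helper. The names can come from any directory listing, FTP or HTTPS.

```python
UCSC_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def pick_ucsc_fasta_url(dbkey, names, base=UCSC_BASE):
    """Return an HTTPS URL for the first candidate FASTA archive present in
    `names` (a bigZips directory listing), or None if none match.

    Candidate names and their order are illustrative assumptions.
    """
    candidates = [
        "%s.fa.gz" % dbkey,           # single gzipped FASTA
        "%s.chromFa.tar.gz" % dbkey,  # per-chromosome FASTA archive
        "chromFa.tar.gz",             # naming used by some older assemblies (assumption)
    ]
    available = set(names)
    for name in candidates:
        if name in available:
            return "%s/%s/bigZips/%s" % (base, dbkey, name)
    return None


print(pick_ucsc_fasta_url("hg38", ["md5sum.txt", "hg38.chromFa.tar.gz"]))
# https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz
```

Supplying the URL directly, as suggested above, skips both the listing and this guessing step.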

That job ran without issues, but when I then ran the SAM index builder I ran into an error, which is why I was trying to get the UCSC source to work, thinking it might get around this. The error the SAM indexer shows is:

Error building index:
Traceback (most recent call last):
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_sam_fasta_index_builder/2a1ac1abc3f7/data_manager_sam_fasta_index_builder/data_manager/data_manager_sam_fasta_index_builder.py", line 87, in <module>
    if __name__ == "__main__": main()
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_sam_fasta_index_builder/2a1ac1abc3f7/data_manager_sam_fasta_index_builder/data_manager/data_manager_sam_fasta_index_builder.py", line 82, in main
    build_sam_index( data_manager_dict, options.fasta_filename, target_directory, options.fasta_dbkey, sequence_id, sequence_name, data_table_name=options.data_table_name or DEFAULT_DATA_TABLE_NAME )
  File "/home/jgoodman/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_sam_fasta_index_builder/2a1ac1abc3f7/data_manager_sam_fasta_index_builder/data_manager/data_manager_sam_fasta_index_builder.py", line 48, in build_sam_index
    sys.stderr.write( chunk )
TypeError: write() argument must be str, not bytes

I should note that TwoBit, Picard index, and HISAT2 indexers work without issue.

toolshed.g2.bx.psu.edu/repos/devteam/data_manager_sam_fasta_index_builder/2a1ac1abc3f7

Now that is a very(!) outdated version of the data manager, from Aug 2015. Not sure why you have this one installed, but it’s likely still expecting Python 2, not Python 3.
Just go back to the Admin interface and install the latest version of the data manager and things should work.
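For background on the `TypeError`: in Python 3, `sys.stderr` is a text stream, so writing a raw bytes chunk (for example, one read from a file opened in binary mode) fails, whereas Python 2 accepted it. Updating the data manager is the real fix; this sketch only illustrates the Python 3 pattern, and `copy_chunks` is a hypothetical helper, not the data manager’s code.

```python
import codecs
import io


def copy_chunks(src, dst, chunk_size=65536, encoding="utf-8"):
    """Copy binary stream `src` into text stream `dst` (e.g. sys.stderr)."""
    # An incremental decoder keeps a multi-byte character that is split
    # across two chunks intact instead of garbling it.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # With a text stream such as sys.stderr, dst.write(chunk) would raise
        # the TypeError from the traceback; decode to str first.
        dst.write(decoder.decode(chunk))
    dst.write(decoder.decode(b"", final=True))


out = io.StringIO()
copy_chunks(io.BytesIO("héllo\n".encode("utf-8")), out, chunk_size=2)
print(out.getvalue(), end="")  # héllo
```

When the bytes should pass through unchanged, writing them to `sys.stderr.buffer` (available when `sys.stderr` is a regular text wrapper) is an alternative to decoding.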
