Set Up Project Import on Mac

Files and file structure

File structure and files needed for importing a new project

# Copy files from Google Drive:
#   tmp/PRJNA273161_taxid_overrides
# Create the /opt/bin/bio folder and make it writable by your user
sudo mkdir -p /opt/bin/bio
sudo chown "$USER" /opt/bin/bio
cd /opt/bin/bio
scp -r gg:/opt/bin/bio/SCG .

USearch

Get usearch from http://www.drive5.com/usearch/download.html. For development purposes, the free 32-bit version of usearch should be sufficient.

# Rename the downloaded binary; the commands below expect this path
mv [downloaded usearch binary] /opt/bin/bio/usearch64
chmod +x /opt/bin/bio/usearch64
# Regenerate the .udb files in the archaeal_scg and bacterial_scg folders
cd [archaeal_scg | bacterial_scg]   # run the command below in each folder
/opt/bin/bio/usearch64 -makeudb_ublast [file].faa -output [file].faa.udb
# Example: /opt/bin/bio/usearch64 -makeudb_ublast all.faa -output all.faa.udb
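Running the command by hand for every .faa file in both folders is easy to get wrong. A minimal bash sketch that loops over both folders, assuming (hypothetically) that they live under /opt/bin/bio/SCG as copied by the scp step above and that each .faa file gets its own .udb next to it. `SCG_DIR`, `USEARCH` and `regenerate_udbs` are illustrative names, not part of ggkbase:

```shell
#!/usr/bin/env bash
# Rebuild the .udb index for every .faa file in both SCG folders.
# Assumptions (hypothetical, adjust to your layout):
#   SCG_DIR - folder holding archaeal_scg/ and bacterial_scg/
#   USEARCH - path to the usearch binary installed above
SCG_DIR="${SCG_DIR:-/opt/bin/bio/SCG}"
USEARCH="${USEARCH:-/opt/bin/bio/usearch64}"

regenerate_udbs() {
  local folder faa
  for folder in archaeal_scg bacterial_scg; do
    for faa in "$SCG_DIR/$folder"/*.faa; do
      # Skip the unexpanded pattern when a folder has no .faa files
      [ -e "$faa" ] || continue
      echo "Indexing $faa"
      # Writes e.g. all.faa.udb next to all.faa; stop on the first failure
      "$USEARCH" -makeudb_ublast "$faa" -output "$faa.udb" || return 1
    done
  done
}
```

Source the file and call `regenerate_udbs`; set `SCG_DIR` first if the folders are somewhere else.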

pullseq

Set up pullseq from https://github.com/bcthomas/pullseq

git clone https://github.com/bcthomas/pullseq.git
cd pullseq
./bootstrap  # get set up for config/build after cloning
./configure  # configure the application based on your system
make         # will build the application
make install # will install in /usr/local by default
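Before moving on, it can help to confirm both tools are in place. A small bash sketch, assuming usearch sits at /opt/bin/bio/usearch64 as above and that `make install` put pullseq in a folder on your PATH (/usr/local/bin with the default prefix). `check_tools` and the `USEARCH` override are hypothetical names:

```shell
#!/usr/bin/env bash
# Report whether the tools installed so far are usable.
# USEARCH is a hypothetical override; it defaults to the path used above.
USEARCH="${USEARCH:-/opt/bin/bio/usearch64}"

check_tools() {
  local missing=0
  if [ -x "$USEARCH" ]; then
    echo "ok: usearch at $USEARCH"
  else
    echo "missing or not executable: $USEARCH" >&2
    missing=1
  fi
  if command -v pullseq > /dev/null 2>&1; then
    echo "ok: pullseq at $(command -v pullseq)"
  else
    echo "missing: pullseq is not on PATH" >&2
    missing=1
  fi
  # Non-zero if either tool is missing
  return "$missing"
}
```

Because `check_tools` returns non-zero when something is missing, it can gate the later steps in a script.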

Generating *.b6, *.b6+ files

Generating forward and reverse hits for *.b6 files in the input data

# cd into the ggkbase application folder (~/ggkbase_staging below)
# copy filter_m8_tofile.rb into the ggkbase application folder
# from Google Drive: [link]
gem install trollop
# cd to the folder that contains the data
for i in *.b6; do ruby ~/ggkbase_staging/filter_m8_tofile.rb -i "$i"; done
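The script name suggests the inputs are BLAST `-m 8` style tabular hits (the same layout usearch writes with `-blast6out`), which have 12 tab-separated columns. Assuming that holds for your data, a quick check before running the filter can catch truncated or misnamed files. `check_b6` and `check_all_b6` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sanity-check that a .b6 file looks like BLAST tabular (-m 8) output:
# 12 tab-separated columns on every line. The 12-column layout is an
# assumption based on the "m8" in filter_m8_tofile.rb's name.
check_b6() {
  awk -F'\t' 'NF != 12 { bad++ } END { exit (bad > 0) }' "$1"
}

# Check every .b6 file in the current folder; non-zero if any looks wrong
check_all_b6() {
  local f status=0
  for f in *.b6; do
    [ -e "$f" ] || continue
    if ! check_b6 "$f"; then
      echo "unexpected column count in $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_all_b6` from the data folder; it names any file whose lines do not all have exactly 12 columns.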

Converting *.b6 to *.b6+ files

# db_type is one of: KEGG, UNIREF, UNIPROT
annolookup.py in.b6 db_type > out.b6+
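To convert every .b6 file in a folder at once, a bash sketch of the loop follows. It assumes annolookup.py is executable and on your PATH, and that each sample.b6 should become sample.b6+. `convert_all_b6` and the `ANNOLOOKUP` override are hypothetical:

```shell
#!/usr/bin/env bash
# Convert every .b6 file in the current folder to a .b6+ file.
# ANNOLOOKUP is a hypothetical override pointing at annolookup.py.
ANNOLOOKUP="${ANNOLOOKUP:-annolookup.py}"

convert_all_b6() {
  local db_type="$1" f
  # Reject anything other than the three supported databases
  case "$db_type" in
    KEGG|UNIREF|UNIPROT) ;;
    *) echo "db_type must be KEGG, UNIREF or UNIPROT" >&2; return 2 ;;
  esac
  for f in *.b6; do
    [ -e "$f" ] || continue
    # sample.b6 -> sample.b6+
    "$ANNOLOOKUP" "$f" "$db_type" > "$f+" || return 1
  done
}
```

For example, `convert_all_b6 KEGG` run from the data folder.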

Thor task

Run the Thor task

thor importer:bin [dir of input data] -f [prefix] -p [project slug]
# Example:
thor importer:bin /Users/shufei/webdev/ggkbase/data/31_016/idba_ud_full/ -f 31_016_scaffold_min1000 -p shufei-import
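A pre-flight wrapper can catch a mistyped directory or prefix before the import starts. This bash sketch assumes the -f prefix is the common start of the input file names (in the example, files whose names start with 31_016_scaffold_min1000); if your data is named differently, drop that check. `run_import` and the `THOR` override are hypothetical:

```shell
#!/usr/bin/env bash
# Pre-flight checks, then the same thor command as above.
# Assumption: every input file name starts with the -f prefix.
# THOR is a hypothetical override for the thor executable.
THOR="${THOR:-thor}"

run_import() {
  if [ $# -ne 3 ]; then
    echo "usage: run_import <input dir> <prefix> <project slug>" >&2
    return 2
  fi
  local dir="$1" prefix="$2" slug="$3"
  if [ ! -d "$dir" ]; then
    echo "input directory not found: $dir" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one file
  if ! compgen -G "$dir/$prefix*" > /dev/null; then
    echo "no files starting with $prefix in $dir" >&2
    return 1
  fi
  "$THOR" importer:bin "$dir" -f "$prefix" -p "$slug"
}
```

`run_import /Users/shufei/webdev/ggkbase/data/31_016/idba_ud_full/ 31_016_scaffold_min1000 shufei-import` runs the same command as the example above once both checks pass.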