Files and file structure
File structure and files needed for importing a new project
    # Copy files from Google Drive: tmp/PRJNA273161_taxid_overrides
    # Create the "/opt/bin/bio" folder, then copy the SCG folder into it
    mkdir -p /opt/bin/bio
    cd /opt/bin/bio
    scp -r gg:/opt/bin/bio/SCG .
USEARCH
Get usearch from http://www.drive5.com/usearch/download.html. For development purposes, the free 32-bit version of usearch should be sufficient.
    # Move the downloaded binary into place (the file name varies by release)
    mv [downloaded usearch binary] /opt/bin/bio/usearch64
    chmod +x /opt/bin/bio/usearch64
    # Regenerate the .udb files in the archaeal_scg and bacterial_scg folders
    cd [archaeal_scg or bacterial_scg]
    /opt/bin/bio/usearch64 -makeudb_ublast file.with.faa -output file.with.faa.udb
    # Example:
    /opt/bin/bio/usearch64 -makeudb_ublast all.faa -output all.faa.udb
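To rebuild the index in both folders in one pass, the per-folder step can be wrapped in a loop. This is a minimal sketch. It assumes each folder holds its sequences in a file named all.faa, as in the example above, and that it is run from the folder containing both archaeal_scg and bacterial_scg. The function name rebuild_scg_udbs is hypothetical. Paste it into a bash session or source it from a script.

```shell
# Hypothetical helper: rebuild all.faa.udb in each SCG folder.
# Assumes every folder contains all.faa (as in the example above).
rebuild_scg_udbs() {
  local usearch="${1:-/opt/bin/bio/usearch64}"  # path to the usearch binary
  local base="${2:-.}"                           # folder holding archaeal_scg/ and bacterial_scg/
  local dir
  for dir in archaeal_scg bacterial_scg; do
    if [ ! -f "$base/$dir/all.faa" ]; then
      echo "skipping $dir: no all.faa found" >&2
      continue
    fi
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$base/$dir" && "$usearch" -makeudb_ublast all.faa -output all.faa.udb ) || return 1
  done
}
```

Usage, from the folder that contains both SCG directories: `rebuild_scg_udbs /opt/bin/bio/usearch64`. The loop stops at the first failing folder, so a broken index is not silently skipped.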
pullseq
Set up pullseq from https://github.com/bcthomas/pullseq:
    git clone https://github.com/bcthomas/pullseq.git
    cd pullseq
    ./bootstrap    # get set up for configure/build after cloning
    ./configure    # configure the application for your system
    make           # build the application
    make install   # install (to /usr/local by default)
Generating *.b6, *.b6+ files
Generating forward and reverse hits for *.b6 files in the input data
    # cd into the ggkbase application folder
    # Copy filter_m8_tofile.rb into the ggkbase application folder
    # (from Google Drive: [link])
    gem install trollop
    # cd to the folder that contains the data, then filter each .b6 file
    for i in *.b6; do ruby ~/ggkbase_staging/filter_m8_tofile.rb -i "$i"; done
Converting *.b6 to *.b6+ files
    # db_type = KEGG, UNIREF, or UNIPROT
    annolookup.py in.b6 db_type > out.b6+
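When a folder holds many .b6 files, the conversion can be run in a loop. This sketch assumes annolookup.py is executable and on PATH, and that it takes exactly the two arguments shown above: the input file and the database type. The helper name convert_b6_dir is hypothetical. Each x.b6 is written to x.b6+ in the same folder.

```shell
# Hypothetical helper: convert every *.b6 in the current folder to *.b6+.
# Assumes annolookup.py is on PATH and takes: <in.b6> <db_type>
convert_b6_dir() {
  local db_type="$1"
  case "$db_type" in
    KEGG|UNIREF|UNIPROT) ;;  # the database types listed above
    *) echo "usage: convert_b6_dir KEGG|UNIREF|UNIPROT" >&2; return 1 ;;
  esac
  local f
  for f in *.b6; do
    [ -e "$f" ] || continue  # no matches: the glob stays literal, so skip it
    # Write to a temp file first so a failed run leaves no partial .b6+
    if annolookup.py "$f" "$db_type" > "$f+.tmp"; then
      mv "$f+.tmp" "$f+"
    else
      rm -f "$f+.tmp"
      echo "annolookup.py failed on $f" >&2
      return 1
    fi
  done
}
```

Usage, from the data folder: `convert_b6_dir KEGG`. Validating db_type up front catches a typo before any file is processed.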
Thor task
Run the Thor task
    thor importer:bin [dir of input data] -f [prefix] -p [project slug]
    # Example:
    thor importer:bin /Users/shufei/webdev/ggkbase/data/31_016/idba_ud_full/ -f 31_016_scaffold_min1000 -p shufei-import