                                   dbxtax



Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Index NCBI taxonomy using b+tree indices

Description

   dbxflat indexes NCBI taxonomy files, and builds EMBOSS B+tree format
   index files.

   These indexes allow access of flat files larger than 2Gb.

Usage

   Here is a sample session with dbxtax


% dbxtax
Index NCBI taxonomy using b+tree indices
Basename for index files [taxon]:
Resource name [taxresource]:
Database directory [.]: taxonomy
        id : ID
       acc : Synonym
       tax : Scientific name
       rnk : Rank
        up : Parent
        gc : Genetics code
       mgc : Mitochondrial genetic code
Index fields [*]:
Compressed index files [Y]:
General log output file [outfile.dbxtax]:


   Go to the output files for this example

Command line arguments

Index NCBI taxonomy using b+tree indices
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-dbname]            string     [taxon] Basename for index files (Any string
                                  from 2 to 19 characters, matching regular
                                  expression /[A-z][A-z0-9_]+/)
  [-dbresource]        string     [taxresource] Resource name (Any string from
                                  2 to 19 characters, matching regular
                                  expression /[A-z][A-z0-9_]+/)
   -directory          directory  [.] Database directory
   -fields             menu       [*] Index fields (Values: id (ID); acc
                                  (Synonym); tax (Scientific name); rnk
                                  (Rank); up (Parent); gc (Genetics code); mgc
                                  (Mitochondrial genetic code))
   -[no]compressed     boolean    [Y] Compressed index files
   -outfile            outfile    [*.dbxtax] General log output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -release            string     [0.0] Release number (Any string up to 9
                                  characters)
   -date               string     [00/00/00] Index date (Date string dd/mm/yy)
   -indexoutdir        outdir     [.] Index file output directory

   Associated qualifiers:

   "-directory" associated qualifiers
   -extension          string     Default file extension

   "-indexoutdir" associated qualifiers
   -extension          string     Default file extension

   "-outfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit


Input file format

   dbxtax reads and indexes the NCBI taxonomy nodes.dmp, names.dmp,
   division.dmp, gencode.dmp,and merged.dmp files.

Output file format

   dbxtax creates one summary file for the database and two files for each
   field indexed.

     * dbalias.ent is the master file containing the names of the files
       that have been indexed. It is an ASCII file. This file also
       contains the database release and date information.
     * dbalias.xid is the B+tree index file for the ID names. It is a
       binary file.
     * dbalias.pxid is an ASCII file containing information regarding the
       structure of the ID name index.
       Output files for usage example
       File: outfile.dbxtax

Processing directory: /homes/user/test/data/taxonomy/
Processing file: nodes.dmp
entries: 41 (41) time: 0.0s (0.0s)
Total time: 0.0s
Entry idlen 7 OK. Maximum ID length was 6 for '117570'.
Field acc acclen 15 OK. Maximum acc term length was 5 for '40673'.
Field tax taxlen 120 OK. Maximum tax term length was 23 for 'Pongo abelii x pygm
aeus'.
Field rnk rnklen 16 OK. Maximum rnk term length was 12 for 'superkingdom'.
Field up uplen 7 OK. Maximum up term length was 6 for '131567'.
Field gc gclen 2 OK. Maximum gc term length was 1 for '1'.
Field mgc mgclen 2 OK. Maximum mgc term length was 1 for '1'.

       File: taxon.ent

# Number of files: 1
# Release: 0.0
# Date:    00/00/00
Reference 2 filename database
nodes.dmp names.dmp

       File: taxon.pxac

Type         Identifier
Compress     Yes
Pages        3
Secpages     0
Order        71
Fill         46
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        24
Secpagesize  512
Seccachesize 20000
Count        1
Fullcount    1
Kwlimit      15
Reffiles     1

       File: taxon.pxgc

Type         Secondary
Compress     Yes
Pages        3
Secpages     1
Order        132
Fill         65
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        41
Secpagesize  512
Seccachesize 20000
Count        1
Fullcount    41
Kwlimit      2
Idlimit      7

       File: taxon.pxid

Type         Identifier
Compress     Yes
Pages        3
Secpages     0
Order        99
Fill         56
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        24
Secpagesize  512
Seccachesize 20000
Count        41
Fullcount    41
Kwlimit      7
Reffiles     1

       File: taxon.pxmgc

Type         Secondary
Compress     Yes
Pages        3
Secpages     3
Order        132
Fill         65
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        41
Secpagesize  512
Seccachesize 20000
Count        3
Fullcount    39
Kwlimit      2
Idlimit      7

       File: taxon.pxtax

Type         Identifier
Compress     Yes
Pages        14
Secpages     6
Order        14
Fill         13
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        24
Secpagesize  512
Seccachesize 20000
Count        86
Fullcount    90
Kwlimit      120
Reffiles     1

       File: taxon.pxrnk

Type         Secondary
Compress     Yes
Pages        3
Secpages     17
Order        68
Fill         45
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        41
Secpagesize  512
Seccachesize 20000
Count        17
Fullcount    41
Kwlimit      16
Idlimit      7

       File: taxon.pxup

Type         Identifier
Compress     Yes
Pages        3
Secpages     12
Order        99
Fill         56
Level        0
Pagesize     2048
Cachesize    20000
Order2       22
Fill2        24
Secpagesize  512
Seccachesize 20000
Count        33
Fullcount    41
Kwlimit      7
Reffiles     1

       File: taxon.xac
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xgc
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xid
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xmgc
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xrnk
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xtax
       This file contains non-printing characters and so cannot be
       displayed here.
       File: taxon.xup
       This file contains non-printing characters and so cannot be
       displayed here.
       Data files None.
    Notes None.
    References None.
    Warnings None.
    Diagnostic Error Messages None.
    Exit status It always exits with status 0.
    Known bugs None.
    See also

       Program name                       Description
       dbiblast      Index a BLAST database
       dbifasta      Index a fasta file database
       dbiflat       Index a flat file database
       dbigcg        Index a GCG formatted database
       dbxcompress   Compress an uncompressed dbx index
       dbxedam       Index the EDAM ontology using b+tree indices
       dbxfasta      Index a fasta file database using b+tree indices
       dbxflat       Index a flat file database using b+tree indices
       dbxgcg        Index a GCG formatted database using b+tree indices
       dbxobo        Index an obo ontology using b+tree indices
       dbxreport     Validate index and report internals for dbx databases
       dbxresource   Index a data resource catalogue using b+tree indices
       dbxstat       Dump statistics for dbx databases
       dbxuncompress Uncompress a compressed dbx index
    Author(s) Peter Rice
       European Bioinformatics Institute, Wellcome Trust Genome Campus,
       Hinxton, Cambridge CB10 1SD, UK
       Please report all bugs to the EMBOSS bug team
       (emboss-bug (c) emboss.open-bio.org) not to the original author.
       History
    Target users This program is intended to be used by administrators
       responsible for software and database installation and maintenance.
    Comments None
