extasia.org: dmtools extasia.org/code/dmtools/

dmtools

dmtools source code

overview

GNU locate and updatedb

GNU's locate command helps Unix users find files on a system. It is invoked with a pattern and prints to stdout every path in its database that matches that pattern.

GNU locate ships with most Linux distributions. On my laptop, to find all paths that include the string "stdio", I would run locate:

  $ locate stdio
  /usr/include/bits/stdio-lock.h
  /usr/include/bits/stdio.h
  /usr/include/bits/stdio_lim.h
  /usr/include/g++/cstdio
  /usr/include/g++/stdiostream.h
  /usr/include/stdio.h
  /usr/include/stdio_ext.h
  /usr/lib/perl5/5.6.0/i586-linux/CORE/nostdio.h
  /usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio-lock.ph
  /usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio.ph
  /usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio_lim.ph
  /usr/lib/perl5/site_perl/5.6.0/i586-linux/stdio.ph
  /usr/lib/perl5/site_perl/5.6.0/i586-linux/stdio_ext.ph
  /usr/lib/YaST2/plugin/libpy2stdio.so.2
  /usr/lib/YaST2/plugin/libpy2stdio.so.2.0.0
  /usr/local/lib/perl5/5.6.1/i686-linux/CORE/nostdio.h
  /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio-lock.ph
  /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio.ph
  /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio_lim.ph
  /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/stdio.ph
  /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/stdio_ext.ph
  /usr/share/man/allman/man3/stdio.3.gz
  /usr/share/man/allman/man3/tstdio.3.gz
  /usr/share/man/man3/stdio.3.gz
  /usr/src/linux-2.4.16/arch/ppc/boot/include/nonstdio.h
  /usr/src/linux-2.4.16/arch/ppc/xmon/nonstdio.h

updatedb is the command that creates the locate database. It is typically run from root's crontab several times a week, during off-peak hours.

locate displays data that was current the last time updatedb ran. updatedb takes a "snapshot" of the paths on the system, and locate searches that snapshot (the locate database) rather than the live filesystem. As a result, locate's output drifts further from reality the longer it has been since the snapshot was taken. Running updatedb several times a week keeps the database current enough for most users.
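
The snapshot behavior can be modeled in a few lines. This is a toy sketch, not how GNU findutils is implemented: the hypothetical updatedb() keeps the snapshot as an in-memory list instead of writing a database file, and the hypothetical locate() does a plain substring match, which is how locate treats a pattern that contains no wildcards.

```python
import os
import tempfile


def updatedb(root):
    """Toy updatedb: snapshot every path under root into a sorted list."""
    snapshot = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            snapshot.append(os.path.join(dirpath, name))
    return sorted(snapshot)


def locate(snapshot, pattern):
    """Toy locate: return snapshot paths that contain pattern as a substring."""
    return [path for path in snapshot if pattern in path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "stdio.h"), "w").close()
        db = updatedb(root)  # the snapshot is taken here
        open(os.path.join(root, "stdio_ext.h"), "w").close()  # created afterwards
        print(locate(db, "stdio"))              # stale: lists only .../stdio.h
        print(locate(updatedb(root), "stdio"))  # fresh snapshot: both files
```

The file created after the snapshot stays invisible until updatedb runs again, which is exactly the staleness described above.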

dmlocate, dmupdatedb, and friends

When I worked at NASA's Ames Research Center, I told my manager that I could implement this functionality, with some enhancements, for filesystems configured for SGI's Data Migration Facility (DMF). I wrote dmlocate, dmupdatedb, and a few other Perl scripts and C programs that support dmupdatedb.

dmlocate can be used in much the same way as GNU's locate, but it also lets the user match on attributes other than a file's path.

Differences between locate / updatedb and dmlocate / dmupdatedb

GNU's locate database is "one-dimensional" in that it stores only full pathnames. dmlocate databases are multi-dimensional: for each file they store several attributes, including its DMF state, bfid, inode number, and full path.
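
To make the "one-dimensional" versus multi-dimensional distinction concrete, here is a sketch of an entry record. It assumes the four attributes dmlocate can match on (DMF state, bfid, inode number, full path); the DmEntry class, parse_entry(), and the tab-separated line format are all hypothetical, since the on-disk format of a dmlocate database is not described here, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmEntry:
    """One dmlocate-style database entry with the four matchable attributes."""
    state: str  # DMF state, e.g. "DUL", "OFL", "REG"
    bfid: str   # DMF bfid, kept as text because it is matched with a regexp
    inode: str  # inode number, also kept as text for regexp matching
    path: str   # full pathname


def parse_entry(line):
    """Parse one line of a HYPOTHETICAL tab-separated format: state, bfid, inode, path.

    maxsplit=3 keeps any tabs inside the path intact.
    """
    state, bfid, inode, path = line.rstrip("\n").split("\t", 3)
    return DmEntry(state, bfid, inode, path)


# A GNU locate entry is just a path; a dmlocate-style entry can be queried by any field.
locate_entry = "/prod_1/data/run1.dat"
dm_entry = parse_entry("DUL\t711B\t131072\t/prod_1/data/run1.dat\n")
```

With only locate_entry, a question like "which files are dual-state?" cannot be answered; with dm_entry, it is a comparison on the state field.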

GNU's locate databases are one to a machine: a single locate database is created on a machine, and that database is used to match the patterns given to the locate command. dmlocate databases are one to a DMF-configured filesystem. If a machine has five DMF-configured filesystems, dmupdatedb creates five dmlocate databases.

As far as I know, GNU's locate does not parallelize searches of its database. dmupdatedb and dmlocate are both parallelized. dmupdatedb determines which filesystems on a machine are DMF-configured and spawns a separate process to create the dmlocate database for each of them. dmlocate spawns one search process per existing dmlocate database, and each process applies the given patterns to its own database. Parallelization considerably speeds up both database creation and searches, which matters because DMF-configured filesystems, backed by offline storage, are typically much larger than non-DMF Unix filesystems.
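
The fan-out search pattern, one worker per database with the results merged afterwards, can be sketched as follows. This is an illustration, not dmlocate's code: dmlocate spawns processes, while this sketch uses threads from concurrent.futures to stay short and portable. For a CPU-bound scan in Python you would swap in ProcessPoolExecutor, which needs a picklable module-level function instead of the lambda. The names search_db and parallel_search, and the dict-of-path-lists layout, are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_db(paths, regex):
    """Scan one per-filesystem database (here: a list of paths) for matches."""
    return [path for path in paths if regex.search(path)]


def parallel_search(databases, pattern):
    """Search all databases concurrently, one worker per database.

    databases maps a filesystem name to its list of paths. Results are
    concatenated in sorted filesystem-name order, so output is deterministic.
    """
    if not databases:
        return []
    regex = re.compile(pattern)
    names = sorted(databases)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_db = pool.map(lambda name: search_db(databases[name], regex), names)
        return [path for hits in per_db for path in hits]
```

Compiling the pattern once and sharing it across workers mirrors the idea that every process applies the same patterns, each to its own database.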

GNU's locate, by default, takes file globs as patterns. (For more information on file globbing, see File Generation in the sh man page or File Substitution in the csh man page.) dmlocate takes regular expressions as patterns and supports all regular expressions supported by Perl 5; see the Perl 5 version of the perlre man page for details. Regular expressions are much more powerful and flexible than file globs. (Current GNU locate can also be given regular expressions as patterns, through its -r/--regex option.)
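
The difference between the two pattern styles is easy to see side by side. This sketch uses Python's fnmatch for globs and re for regular expressions. Python's re is close to Perl 5 syntax for patterns like these, but it is not Perl. The sample paths are made up. Note that fnmatch's "*" also matches "/", unlike sh pathname expansion.

```python
import fnmatch
import re

paths = [
    "/usr/include/stdio.h",
    "/usr/share/man/man3/stdio.3.gz",
    "/home/user/this/index.html",
    "/home/user/that/notes.txt",
]

# Glob: the pattern must match the whole path; "*" matches any run of characters.
glob_hits = [p for p in paths if fnmatch.fnmatchcase(p, "*stdio*")]

# Regexp: searched anywhere in the path. Alternation ("|") has no counterpart in
# plain sh globs, and "$" anchors the match to the end of the path.
regex_hits = [p for p in paths if re.search(r"this|that", p)]
html_hits = [p for p in paths if re.search(r"\.html$", p)]
```

A glob needs explicit leading and trailing "*" to express "contains", while a regexp search is unanchored unless "^" or "$" says otherwise.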

GNU locate expects one pattern per invocation, because a locate database stores only one attribute per file. dmlocate accepts up to four patterns per invocation, one per attribute type (actually more, but the "last one in" wins for each attribute type; see the dmlocate synopsis). dmlocate matches regular expressions against any combination of these attributes: DMF state, bfid, inode number, and full path.
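
Matching several attributes at once can be sketched as a predicate that requires every given field to match. The AND semantics is inferred from examples 5 and 8 below, and the case rules follow the dmlocate synopsis. make_matcher, the dict-based entries, and the sample values are hypothetical. The negated argument models the complement ("+s", "--state-not") options.

```python
import re

FIELDS = ("state", "bfid", "inode", "path")


def make_matcher(patterns, negated=(), ignore_path_case=False):
    """Build a predicate over entries (dicts keyed by FIELDS).

    patterns maps a field name to a regexp. An entry matches only if every
    given field matches, or, for fields listed in `negated`, does not match.
    """
    compiled = {}
    for field, pattern in patterns.items():
        # State and bfid always ignore case; path ignores case only on request.
        case_blind = field in ("state", "bfid") or (field == "path" and ignore_path_case)
        compiled[field] = re.compile(pattern, re.IGNORECASE if case_blind else 0)

    def matches(entry):
        for field, regex in compiled.items():
            found = regex.search(entry[field]) is not None
            if found == (field in negated):  # a plain miss, or a negated hit
                return False
        return True

    return matches
```

For example, make_matcher({"path": r"\.html$", "state": "unm"}) corresponds to "dmlocate -p '\.html$' -s unm". Failing fast on the first field that does not match keeps the common case cheap.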

dmlocate synopsis

  usage: dmlocate options
         dmlocate [ non_field_options ] Pattern
         dmlocate --help

  field options:

    -b,--bfid BFID     Print database entries to stdout whose bfid
                       fields match regexp BFID

    -I,--inode Inode   Print database entries to stdout whose inode
                       fields match regexp Inode.  (Note the difference
                       between this option and the lowercase "-i" option.)

    -p,--path Path     Print database entries to stdout whose path
                       fields match regexp Path

    -s,--state State   Print database entries to stdout whose state
                       fields match regexp State

  other options:

    -d,--debug         Show debug output

    -h,--help          List usage

    -i,--ignore-case   Ignore case when matching path fields.  (Note the
                       difference between this option and the uppercase
                       "-I" option.)

    -n,--dont-execute  Do not execute dmlocate database searches (useful
                       for debugging)

    -v,--dont-match    Print entries whose path fields do not match Pattern.
                       Incompatible with field options.

  When no field options are given, but a Pattern is, dmlocate will print
  database entries whose path fields match regexp Pattern.

  Field options and a freestanding Pattern may not both be given on
  the same command line, but at least one field option or a Pattern
  must be present.

  Matching state and bfid fields is always case-insensitive.  Matching
  path fields is case sensitive unless --ignore-case is specified.

  To match the complement of a field option, change the dash to a plus
  if the single-dash form of the option is used.  If the double-dash
  form is used, append "-not" to the option name.

    Examples: "-s dul" becomes "+s dul"; "--bfid 0" becomes "--bfid-not 0"

  Caveat:  matching by specifying complements is more time consuming
  than matching when no complements are specified.

  If multiple instances of field options are given, the "last one in"
  wins.  That is, if "--bfid 0 --bfid 3496 --bfid 711B" is given,
  neither "--bfid 0" nor "--bfid 3496" will be used for matching.

  Perl regular expressions are used.  Refer to the perlre(1) man page.

  The default dmlocate database directory is .  To use
  an alternative directory, set the environment variable DMLOCATEDBDIR
  accordingly.
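
The command-line rules in the synopsis (plus-prefixed complements, "-not" long options, "last one in" wins, the Pattern/field-option exclusivity, and --dont-match) map fairly directly onto Python's argparse. This is a hypothetical re-creation, not dmlocate's Perl code. One assumption to flag: here a plain and a complemented option for the same field share one slot, so "-s dul +s ofl" keeps only the complement. The synopsis does not say whether dmlocate behaves that way. FieldAction, build_parser, and parse_args are illustrative names.

```python
import argparse
import os

FIELDS = ("bfid", "inode", "path", "state")


class FieldAction(argparse.Action):
    """Store (pattern, negated); a later option for the same field replaces an earlier one."""

    def __call__(self, parser, namespace, values, option_string=None):
        negated = option_string.startswith("+") or option_string.endswith("-not")
        setattr(namespace, self.dest, (values, negated))


def build_parser():
    # prefix_chars="-+" lets argparse accept "+s" style options.
    parser = argparse.ArgumentParser(prog="dmlocate", prefix_chars="-+")
    for short, field in (("b", "bfid"), ("I", "inode"), ("p", "path"), ("s", "state")):
        parser.add_argument("-" + short, "--" + field, "+" + short, "--" + field + "-not",
                            dest=field, action=FieldAction, metavar=field.upper())
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-n", "--dont-execute", action="store_true")
    parser.add_argument("-v", "--dont-match", action="store_true")
    parser.add_argument("pattern", nargs="?")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    given = [field for field in FIELDS if getattr(args, field) is not None]
    if given and args.pattern is not None:
        parser.error("field options and a freestanding Pattern may not both be given")
    if not given and args.pattern is None:
        parser.error("give at least one field option or a Pattern")
    if args.dont_match and given:
        parser.error("--dont-match is incompatible with field options")
    if args.pattern is not None:
        # A bare Pattern matches path fields; -v inverts that match.
        args.path = (args.pattern, args.dont_match)
    # None means "use dmlocate's built-in default directory" (not reproduced here).
    args.db_dir = os.environ.get("DMLOCATEDBDIR")
    return args
```

argparse's default "store" semantics already give "last one in wins", so only the complement flag needs a custom action.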

dmupdatedb synopsis

  usage: dmupdatedb [ options ]

  options:
    -h,--help           List usage

    -i,--interactive    Run dmupdatedb interactively.  Each process
                        spawned to create a dmlocate database is niced,
                        and dmupdatedb waits for all child processes to
                        finish before exiting.

                        The default behavior is non-interactive.
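
The interactive behavior described above (one niced child per filesystem, with the parent waiting for all of them) can be sketched with os.fork, os.nice, and os.waitpid. This is Unix-only and illustrative. How dmupdatedb discovers DMF-configured filesystems and reads DMF state and bfid is not described here, so the hypothetical update_all() takes the filesystem list as input, and the hypothetical write_db() records only inode and path. The "dmlocate.N.db" file names are also made up.

```python
import os
import tempfile


def write_db(fs_root, db_path):
    """Walk one filesystem and write inode<TAB>path lines (DMF state/bfid omitted)."""
    with open(db_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(fs_root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                out.write(f"{os.lstat(path).st_ino}\t{path}\n")


def update_all(filesystems, db_dir, niceness=10):
    """Fork one niced child per filesystem, wait for all, return the ones that failed."""
    children = {}
    for index, fs_root in enumerate(filesystems):
        db_path = os.path.join(db_dir, f"dmlocate.{index}.db")
        pid = os.fork()
        if pid == 0:  # child: lower its priority, build one database
            status = 1  # exit code 1 on any error
            try:
                os.nice(niceness)
                write_db(fs_root, db_path)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's loop
        children[pid] = fs_root
    failed = []
    for pid, fs_root in children.items():
        _, wait_status = os.waitpid(pid, 0)
        if os.waitstatus_to_exitcode(wait_status) != 0:
            failed.append(fs_root)
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fs, tempfile.TemporaryDirectory() as db_dir:
        open(os.path.join(fs, "core"), "w").close()
        print("failed:", update_all([fs], db_dir))
```

Using os._exit in the child skips the parent's cleanup handlers and unflushed stdio buffers, so each child does its one job and leaves nothing duplicated behind.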

dmlocate example invocations

For each of the following examples, both the short and long form of the option names are given.

  1. List all dmf files with state DUL. (Matches on state are not case sensitive.)
          dmlocate -s dul
          dmlocate --state dul
    
  2. List all dmf files with bfid 0
          dmlocate -b '^0$'
          dmlocate --bfid '^0$'
    
  3. List all dmf files whose bfid *contains* 0
          dmlocate -b 0
          dmlocate --bfid 0
    
  4. List all dmf files whose path contains "this" or "that"
          dmlocate 'this|that'
          dmlocate -p 'this|that'
          dmlocate --path 'this|that'
    
  5. List all unmigrating dmf files ending in '.html'
          dmlocate -p '\.html$' -s unm
          dmlocate --path '\.html$' --state unm
    

    Note that like GNU's locate, dmlocate shows data from a snapshot of the system. A query for unmigrating files therefore lists the files that were unmigrating the last time dmupdatedb ran, not necessarily those unmigrating now.

  6. List all core files
          dmlocate '/core$'
          dmlocate -p '/core$'
          dmlocate --path '/core$'
    
  7. List all database entries representing dmf controlled files, i.e., ones that don't have states REG or INV:
          dmlocate +s 'reg|inv'
          dmlocate --state-not 'reg|inv'
    
  8. Complemented and non-complemented regexps may appear together on the command line. List each entry whose state is not offline, whose inode is in the 130000-140000 range, and whose path is not in the /prod_1 or /prod_2 filesystems:
          dmlocate +s ofl -I '^1(3\d{4}|40000)$' +p '^/(prod_1|prod_2)/'
          dmlocate --state-not ofl --inode '^1(3\d{4}|40000)$' \
                     --path-not '^/(prod_1|prod_2)/'
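
The regular expressions in these examples can be checked outside dmlocate. This sketch uses Python's re, whose syntax agrees with Perl 5 for these particular patterns. It shows why anchors separate example 2 ("bfid is exactly 0") from example 3 ("bfid contains 0"), and that the inode pattern from example 8 accepts exactly 130000 through 140000. The helper name in_range is hypothetical.

```python
import re

# Examples 2 and 3: anchors decide "is exactly 0" versus "contains a 0".
exact_zero = re.compile(r"^0$")
contains_zero = re.compile(r"0")

# Example 8: "13" followed by any four digits, or exactly "140000".
inode_range = re.compile(r"^1(3\d{4}|40000)$")


def in_range(inode):
    """True if the inode number, written in decimal, matches example 8's pattern."""
    return inode_range.search(str(inode)) is not None
```

Because the pattern is anchored at both ends, longer numbers such as 1300000 are rejected even though they start with "13".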
    


Last modified: 2004/02/21 19:53:44