GNU's locate is a very helpful command for Unix users who are trying to find files on a system. It is invoked with a pattern and prints to stdout all paths on the system that match that pattern.
GNU locate comes with most Linux distributions. On my laptop, to find all paths that include the string "stdio", I would run locate:
$ locate stdio
/usr/include/bits/stdio-lock.h
/usr/include/bits/stdio.h
/usr/include/bits/stdio_lim.h
/usr/include/g++/cstdio
/usr/include/g++/stdiostream.h
/usr/include/stdio.h
/usr/include/stdio_ext.h
/usr/lib/perl5/5.6.0/i586-linux/CORE/nostdio.h
/usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio-lock.ph
/usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio.ph
/usr/lib/perl5/site_perl/5.6.0/i586-linux/bits/stdio_lim.ph
/usr/lib/perl5/site_perl/5.6.0/i586-linux/stdio.ph
/usr/lib/perl5/site_perl/5.6.0/i586-linux/stdio_ext.ph
/usr/lib/YaST2/plugin/libpy2stdio.so.2
/usr/lib/YaST2/plugin/libpy2stdio.so.2.0.0
/usr/local/lib/perl5/5.6.1/i686-linux/CORE/nostdio.h
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio-lock.ph
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio.ph
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux/bits/stdio_lim.ph
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux/stdio.ph
/usr/local/lib/perl5/site_perl/5.6.1/i686-linux/stdio_ext.ph
/usr/share/man/allman/man3/stdio.3.gz
/usr/share/man/allman/man3/tstdio.3.gz
/usr/share/man/man3/stdio.3.gz
/usr/src/linux-2.4.16/arch/ppc/boot/include/nonstdio.h
/usr/src/linux-2.4.16/arch/ppc/xmon/nonstdio.h
updatedb is the command that creates the locate database. It is typically run from the root crontab several times per week during off-peak hours.
locate displays data that was current the last time updatedb was run. That is, updatedb takes a "snapshot" of the paths on the system when it runs. As a result, locate output reflects "reality" less accurately as more time passes between when the snapshot (the locate database) was made and when locate is run. Running updatedb several times a week keeps the database current enough for most users.
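To make the snapshot idea concrete, here is a minimal Python sketch, not GNU's implementation. The names build_snapshot, search_snapshot, and demo_staleness are hypothetical: build_snapshot plays the role of updatedb by walking a directory tree once, search_snapshot plays the role of locate by consulting only that recorded list, and demo_staleness shows a deleted file still being reported, which is the drift between the database and reality described above.

```python
import os
import tempfile


def build_snapshot(root):
    """Record every path under root once, as updatedb does for the whole system."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def search_snapshot(snapshot, substring):
    """Return recorded paths containing substring without touching the disk."""
    return [p for p in snapshot if substring in p]


def demo_staleness():
    """Snapshot a directory, delete a file, then search the old snapshot."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "stdio.h")
        open(target, "w").close()
        snapshot = build_snapshot(root)
        os.remove(target)
        hits = search_snapshot(snapshot, "stdio")
        # The deleted file is still reported: the snapshot predates its removal.
        return hits == [target], os.path.exists(target)
```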
When I worked at NASA's Ames Research Center, I made my manager aware that I could implement this functionality, with some enhancements, for filesystems configured for use by SGI's Data Migration Facility (DMF). I wrote dmlocate, dmupdatedb, and a few other perl scripts and C programs that support dmupdatedb.
dmlocate can be used in much the same way as GNU's locate, but dmlocate allows the user to specify that matching be done on more attributes than just a file's path.
GNU's locate database is "one-dimensional" in that it stores only full pathnames. dmlocate databases are multi-dimensional: for each file, they store several attributes, including its DMF state, bfid, inode number, and full path.
GNU's locate databases are one to a machine. That is, a single locate database is created on a machine, and that database is used to match patterns given to the locate command. dmlocate databases are one to a DMF-configured filesystem. If a machine has five DMF-configured filesystems, dmupdatedb creates five dmlocate databases.
As far as I know, GNU's locate does not parallelize searches of its database, but dmupdatedb and dmlocate are both parallelized. dmupdatedb determines which filesystems on a machine are DMF-configured and spawns a separate process to create the dmlocate database for each of them. dmlocate spawns one process per existing dmlocate database, and each process applies the given patterns to its own database. Parallelization speeds up both database production and searches considerably, which matters because DMF-configured filesystems, with their offline storage, are typically much larger than non-DMF Unix filesystems.
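The one-process-per-database search can be sketched in Python with the multiprocessing module. This is an illustration under assumptions, not dmlocate's actual Perl code: each database is assumed to be already loaded as a list of path strings keyed by its filesystem's mount point, and search_one and parallel_search are hypothetical names. The fork start method is requested explicitly, which limits the sketch to Unix systems.

```python
import multiprocessing
import re


def search_one(db_name, records, pattern):
    """Worker: apply one regex to the records of a single database."""
    regex = re.compile(pattern)
    return [(db_name, record) for record in records if regex.search(record)]


def parallel_search(databases, pattern):
    """Search every database at once, one worker process per database.

    databases maps a database name (here, a filesystem mount point) to a
    list of path strings. Results come back grouped by database name.
    """
    if not databases:
        return []
    jobs = [(name, records, pattern) for name, records in sorted(databases.items())]
    # "fork" lets workers inherit search_one without re-importing this module.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=len(jobs)) as pool:
        per_db = pool.starmap(search_one, jobs)
    return [hit for hits in per_db for hit in hits]
```

Each worker sees only its own database, so the merge step at the end is the only point where results from different filesystems meet.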
GNU's locate, by default, takes file globs as patterns. (For more information on file globbing, see File Generation in the sh man page or File Substitution in the csh man page.) dmlocate takes regular expressions as patterns and supports every regular expression that Perl 5 supports. (See the Perl 5 version of the perlre man page for details.) Regular expressions are much more powerful and flexible than file globs. (locate can be given regular expressions as patterns with the --regexp option.)
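The practical difference between the two pattern styles can be seen with Python's fnmatch (shell-style globs) and re (Perl-like regular expressions) modules. This is only an approximation: Python's re is close to, but not identical to, Perl 5's regex engine, and fnmatch only approximates locate's own glob matching. The three sample paths come from the locate output above; glob_matches and regex_matches are hypothetical helper names.

```python
import fnmatch
import re

# A few paths from the locate output above.
PATHS = [
    "/usr/include/stdio.h",
    "/usr/include/bits/stdio_lim.h",
    "/usr/include/g++/cstdio",
]


def glob_matches(pattern, paths):
    """Glob matching: the pattern must describe the whole path ('*' also matches '/')."""
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


def regex_matches(pattern, paths):
    """Regex matching: the pattern may match anywhere in the path unless anchored."""
    return [p for p in paths if re.search(pattern, p)]
```

Here glob_matches("*stdio*.h", PATHS) returns stdio.h and stdio_lim.h, while regex_matches(r"/c?stdio(\.h)?$", PATHS) returns stdio.h and cstdio. The regex expresses an optional prefix and an optional suffix, which globs have no syntax for.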
GNU locate expects one pattern per invocation, because only one attribute (the path) is stored per file in a locate database. dmlocate accepts up to four patterns per invocation, one per attribute type. (It actually accepts more, but for each attribute type the "last one in" wins; see the dmlocate usage.) dmlocate can match regular expressions against any combination of a file's DMF state, bfid, inode number, and full path.
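A multi-attribute record and an all-given-fields-must-match test might look like the following Python sketch. The Entry field names, the choice to store the inode number as text, and the sample values in the usage are assumptions for illustration; the actual dmlocate database layout is not described here. Case-sensitivity rules and complement matching are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """One record in a hypothetical multi-attribute database."""
    state: str  # DMF state, e.g. "dul", "ofl", "unm"
    bfid: str   # bitfile ID
    inode: str  # inode number, kept as text so it can be regex-matched
    path: str   # full pathname


def entry_matches(entry, patterns):
    """True if each regex in patterns matches its field of entry.

    patterns maps a field name ("state", "bfid", "inode", "path") to a
    regex; fields not mentioned are unconstrained, so {} matches everything.
    """
    return all(
        re.search(regex, getattr(entry, field)) is not None
        for field, regex in patterns.items()
    )
```

For example, entry_matches(e, {"path": r"\.html$", "state": "unm"}) asks for unmigrating HTML files, the same combination as one of the dmlocate examples.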
usage: dmlocate options
dmlocate [ non_field_options ] Pattern
dmlocate --help
field options:
-b,--bfid BFID Print database entries to stdout whose bfid
fields match regexp BFID
-I,--inode Inode Print database entries to stdout whose inode
fields match regexp Inode. (Note the difference
between this option and the lowercase "-i" option.)
-p,--path Path Print database entries to stdout whose path
fields match regexp Path
-s,--state State Print database entries to stdout whose state
fields match regexp State
other options:
-d,--debug Show debug output
-h,--help List usage
-i,--ignore-case Ignore case when matching path fields. (Note the
difference between this option and the uppercase
"-I" option.)
-n,--dont-execute Do not execute dmlocate database searches (useful
for debugging)
-v,--dont-match Print entries whose path fields do not match Pattern.
Incompatible with field options.
When no field options are given, but a Pattern is, dmlocate will print
database entries whose path fields match regexp Pattern.
Field options and a freestanding Pattern may not both be specified on
the same command line, but either one or more field options or a
Pattern must be present.
Matching state and bfid fields is always case-insensitive. Matching
path fields is case sensitive unless --ignore-case is specified.
To specify matching the complement of a field option, change the dash
to a plus if the single-dash form of the field option is used. If the
double-dash form is used, append "-not" to the option name.
Examples: "-s dul" becomes "+s dul"; "--bfid 0" becomes "--bfid-not 0"
Caveat: matching by specifying complements is more time consuming than
matching when no complements are specified.
If multiple instances of field options are given, the "last one in"
wins. That is, if "--bfid 0 --bfid 3496 --bfid 711B" is given,
neither "--bfid 0" nor "--bfid 3496" will be used for matching.
Perl regular expressions are used. Refer to the perlre(1) man page.
The default dmlocate database directory is . To use
an alternative directory, set the environment variable DMLOCATEDBDIR
accordingly.
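The option rules in the usage text (plus-sign and "-not" complements, "last one in" wins, case-insensitive state and bfid matching, and path matching that is case-sensitive unless --ignore-case is given) can be sketched as follows. This is a hypothetical Python reconstruction, not dmlocate's Perl source: compile_for, parse_field_options, and matches are invented names, records are assumed to be dicts of strings, and the non-field options (-d, -n, -v) are not handled.

```python
import re

# Single-letter field options from the usage text, mapped to field names.
SHORT_FIELDS = {"b": "bfid", "I": "inode", "p": "path", "s": "state"}


def compile_for(field, regex, ignore_case=False):
    """Compile regex with the case rules from the usage text."""
    # state and bfid: always case-insensitive; path: only with --ignore-case.
    insensitive = field in ("state", "bfid") or (field == "path" and ignore_case)
    return re.compile(regex, re.IGNORECASE if insensitive else 0)


def parse_field_options(argv, ignore_case=False):
    """Map each field to (compiled regex, negated) from a list of field options.

    "-s X" or "--state X" matches X; "+s X" or "--state-not X" matches its
    complement. A repeated field overwrites the earlier one ("last one in" wins).
    """
    specs = {}
    args = iter(argv)
    for opt in args:
        if len(opt) == 2 and opt[0] in "-+" and opt[1] in SHORT_FIELDS:
            field, negated = SHORT_FIELDS[opt[1]], opt[0] == "+"
        elif opt.startswith("--"):
            name = opt[2:]
            negated = name.endswith("-not")
            field = name[: -len("-not")] if negated else name
            if field not in SHORT_FIELDS.values():
                raise ValueError(f"unknown field option: {opt}")
        else:
            raise ValueError(f"unknown field option: {opt}")
        regex = next(args, None)
        if regex is None:
            raise ValueError(f"{opt} requires a pattern")
        specs[field] = (compile_for(field, regex, ignore_case), negated)
    return specs


def matches(record, specs):
    """True if every field spec holds for record (a dict of field -> string)."""
    return all(
        (rx.search(record[field]) is not None) != negated
        for field, (rx, negated) in specs.items()
    )
```

Storing one spec per field in a dict is what makes "last one in" fall out naturally: given "--bfid 0 --bfid 3496 --bfid 711B", only the 711B pattern survives.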
usage: dmupdatedb [ options ]
options:
-h,--help List usage
-i,--interactive Indicate that interactive behavior of dmupdatedb is
desired. Each process spawned to create a
dmlocate database is niced. dmupdatedb will wait for
all child processes to finish before exiting.
The default behavior is non-interactive.
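The dmupdatedb process model, one child process per DMF filesystem with the parent optionally waiting, can be sketched in Python with os.fork. This assumes a Unix system and reads the usage text as saying that both the nicing and the waiting apply in interactive mode. build_one is a hypothetical stand-in for whatever actually writes a dmlocate database, spawn_builders is an invented name, and the niceness of 10 is an arbitrary choice.

```python
import os


def spawn_builders(filesystems, build_one, interactive=False, niceness=10):
    """Fork one child per filesystem; each child runs build_one(fs).

    In interactive mode each child lowers its own priority with os.nice and
    the parent waits for every child. Returns (child_pids, exit_codes);
    exit_codes maps pid to exit code and is empty when not interactive.
    """
    pids = []
    for fs in filesystems:
        pid = os.fork()
        if pid == 0:  # in the child
            try:
                if interactive:
                    os.nice(niceness)
                build_one(fs)
            except BaseException:
                os._exit(1)  # report failure through the exit status
            os._exit(0)  # never fall back into the parent's loop
        pids.append(pid)
    exit_codes = {}
    if interactive:
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            exit_codes[pid] = os.waitstatus_to_exitcode(status)
    return pids, exit_codes
```

In non-interactive mode the sketch returns without reaping its children. That suits a short-lived program such as one run from cron: once it exits, the children are reparented and eventually reaped by init.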
For each of the following examples, both the short and long forms of the option names are given.
Print entries for dual-state files (DMF state "dul"):
dmlocate -s dul
dmlocate --state dul
Print entries whose bfid is exactly 0:
dmlocate -b '^0$'
dmlocate --bfid '^0$'
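Print entries whose bfid contains a 0 anywhere, since an unanchored pattern can match any part of the field:
dmlocate -b 0
dmlocate --bfid 0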
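Print entries whose path contains "this" or "that". A freestanding Pattern is matched against path fields, so all three commands are equivalent:
dmlocate 'this|that'
dmlocate -p 'this|that'
dmlocate --path 'this|that'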
Print entries for unmigrating files (DMF state "unm") whose paths end in ".html":
dmlocate -p '\.html$' -s unm
dmlocate --path '\.html$' --state unm
Note that, like GNU's locate, dmlocate shows data from a snapshot
of the system. Because of this, a query for unmigrating files
shows the files that were unmigrating the last time dmupdatedb
was run, which may not be the files that are unmigrating now.
Print entries for files named "core" (paths whose last component is exactly "core"):
dmlocate '/core$'
dmlocate -p '/core$'
dmlocate --path '/core$'
Print entries whose state matches neither "reg" nor "inv":
dmlocate +s 'reg|inv'
dmlocate --state-not 'reg|inv'
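Print entries whose state does not match "ofl", whose inode number is between 130000 and 140000 inclusive, and whose path does not begin with /prod_1/ or /prod_2/:
dmlocate +s ofl -I '^1(3\d{4}|40000)$' +p '^/(prod_1|prod_2)/'
dmlocate --state-not ofl --inode '^1(3\d{4}|40000)$' \
--path-not '^/(prod_1|prod_2)/'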
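The inode pattern in the last example selects inode numbers 130000 through 140000 inclusive: the first alternative, 3\d{4}, covers 130000 to 139999, and the second adds 140000. A quick Python check follows (Python's re treats this particular pattern the same way Perl 5 does); inodes_matched is a hypothetical helper name.

```python
import re

# The inode pattern from the last example.
INODE_PATTERN = re.compile(r"^1(3\d{4}|40000)$")


def inodes_matched(lo, hi):
    """Inode numbers in [lo, hi] whose decimal form matches INODE_PATTERN."""
    return [n for n in range(lo, hi + 1) if INODE_PATTERN.search(str(n))]
```

Because the pattern is anchored at both ends, it also rejects longer numbers such as 1300000 that merely contain a matching run of digits.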