File: coreutils.info, Node: Top, Next: Introduction, Up: (dir)
GNU Coreutils
*************
This manual documents version 5.2.1 of the GNU core utilities,
including the standard programs for text and file manipulation.
Copyright (C) 1994, 1995, 1996, 2000, 2001, 2002, 2003, 2004 Free
Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.1 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
* Menu:
* Introduction:: Caveats, overview, and authors.
* Common options:: Common options.
* Output of entire files:: cat tac nl od
* Formatting file contents:: fmt pr fold
* Output of parts of files:: head tail split csplit
* Summarizing files:: wc sum cksum md5sum
* Operating on sorted files:: sort uniq comm ptx tsort
* Operating on fields within a line:: cut paste join
* Operating on characters:: tr expand unexpand
* Directory listing:: ls dir vdir d v dircolors
* Basic operations:: cp dd install mv rm shred
* Special file types:: ln mkdir rmdir mkfifo mknod
* Changing file attributes:: chgrp chmod chown touch
* Disk usage:: df du stat sync
* Printing text:: echo printf yes
* Conditions:: false true test expr
* Redirection:: tee
* File name manipulation:: dirname basename pathchk
* Working context:: pwd stty printenv tty
* User information:: id logname whoami groups users who
* System context:: date uname hostname
* Modified command invocation:: chroot env nice nohup su
* Process control:: kill
* Delaying:: sleep
* Numeric operations:: factor seq
* File permissions:: Access modes.
* Date input formats:: Specifying date strings.
* Opening the software toolbox:: The software tools philosophy.
* GNU Free Documentation License:: The license for this documentation.
* Index:: General index.
--- The Detailed Node Listing ---
Common Options
* Exit status:: Indicating program success or failure.
* Backup options:: Backup options
* Block size:: Block size
* Target directory:: Target directory
* Trailing slashes:: Trailing slashes
* Traversing symlinks:: Traversing symlinks to directories
* Treating / specially:: Treating / specially
* Standards conformance:: Standards conformance
Output of entire files
* cat invocation:: Concatenate and write files.
* tac invocation:: Concatenate and write files in reverse.
* nl invocation:: Number lines and write files.
* od invocation:: Write files in octal or other formats.
Formatting file contents
* fmt invocation:: Reformat paragraph text.
* pr invocation:: Paginate or columnate files for printing.
* fold invocation:: Wrap input lines to fit in specified width.
Output of parts of files
* head invocation:: Output the first part of files.
* tail invocation:: Output the last part of files.
* split invocation:: Split a file into fixed-size pieces.
* csplit invocation:: Split a file into context-determined pieces.
Summarizing files
* wc invocation:: Print newline, word, and byte counts.
* sum invocation:: Print checksum and block counts.
* cksum invocation:: Print CRC checksum and byte counts.
* md5sum invocation:: Print or check message-digests.
Operating on sorted files
* sort invocation:: Sort text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
* tsort invocation:: Topological sort.
`ptx': Produce permuted indexes
* General options in ptx:: Options which affect general program behavior.
* Charset selection in ptx:: Underlying character set considerations.
* Input processing in ptx:: Input fields, contexts, and keyword selection.
* Output formatting in ptx:: Types of output format, and sizing the fields.
* Compatibility in ptx:: The GNU extensions to `ptx'
Operating on fields within a line
* cut invocation:: Print selected parts of lines.
* paste invocation:: Merge lines of files.
* join invocation:: Join lines on a common field.
Operating on characters
* tr invocation:: Translate, squeeze, and/or delete characters.
* expand invocation:: Convert tabs to spaces.
* unexpand invocation:: Convert spaces to tabs.
`tr': Translate, squeeze, and/or delete characters
* Character sets:: Specifying sets of characters.
* Translating:: Changing one set of characters to another.
* Squeezing:: Squeezing repeats and deleting.
* Warnings in tr:: Warning messages.
Directory listing
* ls invocation:: List directory contents
* dir invocation:: Briefly list directory contents
* vdir invocation:: Verbosely list directory contents
* dircolors invocation:: Color setup for `ls'
`ls': List directory contents
* Which files are listed:: Which files are listed
* What information is listed:: What information is listed
* Sorting the output:: Sorting the output
* More details about version sort:: More details about version sort
* General output formatting:: General output formatting
* Formatting the file names:: Formatting the file names
Basic operations
* cp invocation:: Copy files and directories
* dd invocation:: Convert and copy a file
* install invocation:: Copy files and set attributes
* mv invocation:: Move (rename) files
* rm invocation:: Remove files or directories
* shred invocation:: Remove files more securely
Special file types
* link invocation:: Make a hard link via the link syscall
* ln invocation:: Make links between files
* mkdir invocation:: Make directories
* mkfifo invocation:: Make FIFOs (named pipes)
* mknod invocation:: Make block or character special files
* readlink invocation:: Print the referent of a symbolic link
* rmdir invocation:: Remove empty directories
* unlink invocation:: Remove files via unlink syscall
Changing file attributes
* chown invocation:: Change file owner and group
* chgrp invocation:: Change group ownership
* chmod invocation:: Change access permissions
* touch invocation:: Change file timestamps
Disk usage
* df invocation:: Report filesystem disk space usage
* du invocation:: Estimate file space usage
* stat invocation:: Report file or filesystem status
* sync invocation:: Synchronize data on disk with memory
Printing text
* echo invocation:: Print a line of text
* printf invocation:: Format and print data
* yes invocation:: Print a string until interrupted
Conditions
* false invocation:: Do nothing, unsuccessfully
* true invocation:: Do nothing, successfully
* test invocation:: Check file types and compare values
* expr invocation:: Evaluate expressions
`test': Check file types and compare values
* File type tests:: File type tests
* Access permission tests:: Access permission tests
* File characteristic tests:: File characteristic tests
* String tests:: String tests
* Numeric tests:: Numeric tests
`expr': Evaluate expression
* String expressions:: + : match substr index length
* Numeric expressions:: + - * / %
* Relations for expr:: | & < <= = == != >= >
* Examples of expr:: Examples of using `expr'
Redirection
* tee invocation:: Redirect output to multiple files
File name manipulation
* basename invocation:: Strip directory and suffix from a file name
* dirname invocation:: Strip non-directory suffix from a file name
* pathchk invocation:: Check file name portability
Working context
* pwd invocation:: Print working directory
* stty invocation:: Print or change terminal characteristics
* printenv invocation:: Print all or some environment variables
* tty invocation:: Print file name of terminal on standard input
`stty': Print or change terminal characteristics
* Control:: Control settings
* Input:: Input settings
* Output:: Output settings
* Local:: Local settings
* Combination:: Combination settings
* Characters:: Special characters
* Special:: Special settings
User information
* id invocation:: Print real and effective uid and gid
* logname invocation:: Print current login name
* whoami invocation:: Print effective user id
* groups invocation:: Print group names a user is in
* users invocation:: Print login names of users currently logged in
* who invocation:: Print who is currently logged in
System context
* date invocation:: Print or set system date and time
* uname invocation:: Print system information
* hostname invocation:: Print or set system name
* hostid invocation:: Print numeric host identifier.
`date': Print or set system date and time
* Time directives:: Time directives
* Date directives:: Date directives
* Literal directives:: Literal directives
* Padding:: Padding
* Setting the time:: Setting the time
* Options for date:: Options for `date'
* Examples of date:: Examples of `date'
Modified command invocation
* chroot invocation:: Run a command with a different root directory
* env invocation:: Run a command in a modified environment
* nice invocation:: Run a command with modified scheduling priority
* nohup invocation:: Run a command immune to hangups
* su invocation:: Run a command with substitute user and group id
Process control
* kill invocation:: Sending a signal to processes.
Delaying
* sleep invocation:: Delay for a specified time
Numeric operations
* factor invocation:: Print prime factors
* seq invocation:: Print numeric sequences
File permissions
* Mode Structure:: Structure of File Permissions
* Symbolic Modes:: Mnemonic permissions representation
* Numeric Modes:: Permissions as octal numbers
Date input formats
* General date syntax: General date syntax
* Calendar date items: Calendar date items
* Time of day items: Time of day items
* Time zone items: Time zone items
* Day of week items: Day of week items
* Relative items in date strings: Relative items in date strings
* Pure numbers in date strings: Pure numbers in date strings
* Authors of getdate: Authors of getdate
Opening the software toolbox
* Toolbox introduction:: Toolbox introduction
* I/O redirection:: I/O redirection
* The who command:: The `who' command
* The cut command:: The `cut' command
* The sort command:: The `sort' command
* The uniq command:: The `uniq' command
* Putting the tools together:: Putting the tools together
GNU Free Documentation License
* How to use this License for your documents::
File: coreutils.info, Node: Introduction, Next: Common options, Prev: Top, Up: Top
Introduction
************
This manual is a work in progress: many sections make no attempt to
explain basic concepts in a way suitable for novices. Thus, if you are
interested, please get involved in improving this manual. The entire
GNU community will benefit.
The GNU utilities documented here are mostly compatible with the
POSIX standard. Please report bugs to <bug-coreutils@gnu.org>.
Remember to include the version number, machine architecture, input
files, and any other information needed to reproduce the bug: your
input, what you expected, what you got, and why it is wrong. Diffs are
welcome, but please include a description of the problem as well, since
this is sometimes difficult to infer. *Note Bugs: (gcc)Bugs.
This manual was originally derived from the Unix man pages in the
distributions, which were written by David MacKenzie and updated by Jim
Meyering. What you are reading now is the authoritative documentation
for these utilities; the man pages are no longer being maintained. The
original `fmt' man page was written by Ross Paterson. François Pinard
did the initial conversion to Texinfo format. Karl Berry did the
indexing, some reorganization, and editing of the results. Brian
Youmans of the Free Software Foundation office staff combined the
manuals for textutils, fileutils, and sh-utils to produce the present
omnibus manual. Richard Stallman contributed his usual invaluable
insights to the overall process.
File: coreutils.info, Node: Common options, Next: Output of entire files, Prev: Introduction, Up: Top
Common options
**************
Certain options are available in all of these programs. Rather than
repeating identical descriptions for each program, this manual
describes them here. (In fact, every GNU program accepts, or should
accept, these options.)
Normally options and operands can appear in any order, and programs
act as if all the options appear before any operands. For example,
`sort -r passwd -t :' acts like `sort -r -t : passwd', since `:' is an
option-argument of `-t'. However, if the `POSIXLY_CORRECT' environment
variable is set, options must appear before operands, unless otherwise
specified for a particular command.
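As a minimal sketch of the reordering described above, the following
compares the two `sort' invocations on a small colon-separated file.
The file contents and the variable names are made up for illustration,
and `POSIXLY_CORRECT' is unset first so that the default GNU behavior
applies.

```shell
# Make sure the GNU option permutation is in effect.
unset POSIXLY_CORRECT

# Hypothetical sample data standing in for `passwd'.
sample=$(mktemp)
printf 'b:2\na:1\nc:3\n' > "$sample"

# Option after the operand versus all options first.
mixed=$(sort -r "$sample" -t :)
ordered=$(sort -r -t : "$sample")

if [ "$mixed" = "$ordered" ]; then
    echo "both invocations produce the same output"
fi
rm -f "$sample"
```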
Some of these programs recognize the `--help' and `--version'
options only when one of them is the sole command line argument.
`--help'
Print a usage message listing all available options, then exit
successfully.
`--version'
Print the version number, then exit successfully.
`--'
Delimit the option list. Later arguments, if any, are treated as
operands even if they begin with `-'. For example, `sort -- -r'
reads from the file named `-r'.
A single `-' is not really an option, though it looks like one. It
stands for standard input, or for standard output if that is clear from
the context, and it can be used either as an operand or as an
option-argument. For example, `sort -o - -' outputs to standard output
and reads from standard input, and is equivalent to plain `sort'.
Unless otherwise specified, `-' can appear in any context that requires
a file name.
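The two conventions above can be tried out with a short sketch. It
assumes a scratch directory in which a file literally named `-r' is
created; the directory and variable names are illustrative only.

```shell
workdir=$(mktemp -d)

# Create a file whose name looks like an option.
printf 'b\na\n' > "$workdir/-r"

# After `--', `-r' is an operand (a file name), not the reverse option.
from_file=$(cd "$workdir" && sort -- -r)

# A lone `-' operand stands for standard input.
from_stdin=$(printf 'd\nc\n' | sort -)

printf '%s\n' "$from_file" "$from_stdin"
rm -rf "$workdir"
```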
* Menu:
* Exit status:: Indicating program success or failure.
* Backup options:: -b -S -V, in some programs.
* Block size:: BLOCK_SIZE and --block-size, in some programs.
* Target directory:: --target-directory, in some programs.
* Trailing slashes:: --strip-trailing-slashes, in some programs.
* Traversing symlinks:: -H, -L, or -P, in some programs.
* Treating / specially:: --preserve-root and --no-preserve-root.
* Standards conformance:: Conformance to the POSIX standard.
File: coreutils.info, Node: Exit status, Next: Backup options, Up: Common options
Exit status
===========
Nearly every command invocation yields an integral "exit status" that
can be used to change how other commands work. For the vast majority
of commands, an exit status of zero indicates success. Failure is
indicated by a nonzero value--typically `1', though it may differ on
unusual platforms as POSIX requires only that it be nonzero.
However, some of the programs documented here do produce other exit
status values and a few associate different meanings with the values
`0' and `1'. Here are some of the exceptions: `chroot', `env', `expr',
`nice', `nohup', `printenv', `sort', `su', `test', `tty'.
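A brief sketch of how a script might inspect exit statuses follows.
It uses `true' and `false' for the common convention and `expr' as one
of the exceptions: `expr' exits with status 1 when the value of the
expression is null or zero, even though nothing went wrong. The
variable names are illustrative.

```shell
# The common convention: zero for success, nonzero for failure.
true
status_true=$?
false || status_false=$?

# An exception: `expr' uses status 1 to report a null or zero result.
expr 1 - 1 > /dev/null || status_expr=$?

echo "true=$status_true false=$status_false expr=$status_expr"
```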
File: coreutils.info, Node: Backup options, Next: Block size, Prev: Exit status, Up: Common options
Backup options
==============
Some GNU programs (at least `cp', `install', `ln', and `mv') optionally
make backups of files before writing new versions. These options
control the details of these backups. The options are also briefly
mentioned in the descriptions of the particular programs.
`-b'
`--backup[=METHOD]'
Make a backup of each file that would otherwise be overwritten or
removed. Without this option, the original versions are destroyed.
Use METHOD to determine the type of backups to make. When this
option is used but METHOD is not specified, then the value of the
`VERSION_CONTROL' environment variable is used. And if
`VERSION_CONTROL' is not set, the default backup type is
`existing'.
Note that the short form of this option, `-b', does not accept any
argument. Using `-b' is equivalent to using `--backup=existing'.
This option corresponds to the Emacs variable `version-control';
the values for METHOD are the same as those used in Emacs. This
option also accepts more descriptive names. The valid METHODs are
(unique abbreviations are accepted):
`none'
`off'
Never make backups.
`numbered'
`t'
Always make numbered backups.
`existing'
`nil'
Make numbered backups of files that already have them, simple
backups of the others.
`simple'
`never'
Always make simple backups. Please note `never' is not to be
confused with `none'.
`-S SUFFIX'
`--suffix=SUFFIX'
Append SUFFIX to each backup file made with `-b'. If this option
is not specified, the value of the `SIMPLE_BACKUP_SUFFIX'
environment variable is used. And if `SIMPLE_BACKUP_SUFFIX' is not
set, the default is `~', just as in Emacs.
`--version-control=METHOD'
This option is obsolete and will be removed in a future release.
It has been replaced with `--backup'.
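The following sketch shows a numbered backup and a simple backup with
a custom suffix, assuming GNU `cp'. The file names and the `.orig'
suffix are arbitrary choices for illustration, and the two environment
variables are unset so that the defaults described above apply.

```shell
unset VERSION_CONTROL SIMPLE_BACKUP_SUFFIX
workdir=$(mktemp -d)
printf 'old\n' > "$workdir/a.txt"
printf 'old\n' > "$workdir/b.txt"
printf 'new\n' > "$workdir/new.txt"

# Numbered backup: the previous a.txt is kept as a.txt.~1~
cp --backup=numbered "$workdir/new.txt" "$workdir/a.txt"

# Simple backup with a chosen suffix: the previous b.txt is kept as b.txt.orig
cp -b -S .orig "$workdir/new.txt" "$workdir/b.txt"

listing=$(ls "$workdir")
saved=$(cat "$workdir/b.txt.orig")
printf '%s\n' "$listing"
rm -rf "$workdir"
```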
File: coreutils.info, Node: Block size, Next: Target directory, Prev: Backup options, Up: Common options
Block size
==========
Some GNU programs (at least `df', `du', and `ls') display sizes in
"blocks". You can adjust the block size and method of display to make
sizes easier to read. The block size used for display is independent
of any filesystem block size. Fractional block counts are rounded up
to the nearest integer.
The default block size is chosen by examining the following
environment variables in turn; the first one that is set determines the
block size.
`DF_BLOCK_SIZE'
This specifies the default block size for the `df' command.
Similarly, `DU_BLOCK_SIZE' specifies the default for `du' and
`LS_BLOCK_SIZE' for `ls'.
`BLOCK_SIZE'
This specifies the default block size for all three commands, if
the above command-specific environment variables are not set.
`POSIXLY_CORRECT'
If neither the `COMMAND_BLOCK_SIZE' nor the `BLOCK_SIZE' variables
are set, but this variable is set, the block size defaults to 512.
If none of the above environment variables are set, the block size
currently defaults to 1024 bytes in most contexts, but this number may
change in the future. For `ls' file sizes, the block size defaults to
1 byte.
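The lookup order above can be sketched as a small shell function for
the `df' case. This is only an illustration of the documented order,
not the code the utilities use, and the function name
`df_default_block_size' is hypothetical.

```shell
# Print the block size `df' would use by default, following the
# documented order: DF_BLOCK_SIZE, BLOCK_SIZE, POSIXLY_CORRECT, 1024.
df_default_block_size() {
    if [ "${DF_BLOCK_SIZE+set}" = set ]; then
        printf '%s\n' "$DF_BLOCK_SIZE"
    elif [ "${BLOCK_SIZE+set}" = set ]; then
        printf '%s\n' "$BLOCK_SIZE"
    elif [ "${POSIXLY_CORRECT+set}" = set ]; then
        echo 512
    else
        echo 1024
    fi
}

df_default_block_size
```

For `du' and `ls' the same order would apply with `DU_BLOCK_SIZE' and
`LS_BLOCK_SIZE' in place of `DF_BLOCK_SIZE'.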
A block size specification can be a positive integer specifying the
number of bytes per block, or it can be `human-readable' or `si' to
select a human-readable format. Integers may be followed by suffixes
that are upward compatible with the SI prefixes
(http://www.bipm.fr/enus/3_SI/si-prefixes.html) for decimal multiples
and with the IEC 60027-2 prefixes for binary multiples
(http://physics.nist.gov/cuu/Units/binary.html).
With human-readable formats, output sizes are followed by a size
letter such as `M' for megabytes. `BLOCK_SIZE=human-readable' uses
powers of 1024; `M' stands for 1,048,576 bytes. `BLOCK_SIZE=si' is
similar, but uses powers of 1000 and appends `B'; `MB' stands for
1,000,000 bytes.
A block size specification preceded by `'' causes output sizes to be
displayed with thousands separators. The `LC_NUMERIC' locale specifies
the thousands separator and grouping. For example, in an American
English locale, `--block-size="'1kB"' would cause a size of 1234000
bytes to be displayed as `1,234'. In the default C locale, there is no
thousands separator so a leading `'' has no effect.
An integer block size can be followed by a suffix to specify a
multiple of that size. A bare size letter, or one followed by `iB',
specifies a multiple using powers of 1024. A size letter followed by
`B' specifies powers of 1000 instead. For example, `1M' and `1MiB' are
equivalent to `1048576', whereas `1MB' is equivalent to `1000000'.
A plain suffix without a preceding integer acts as if `1' were
prepended, except that it causes a size indication to be appended to
the output. For example, `--block-size="kB"' displays 3000 as `3kB'.
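The arithmetic behind these suffixes, together with the rounding up of
fractional block counts mentioned earlier, can be checked with plain
shell arithmetic. The helper name `blocks' is made up for this sketch.

```shell
# Number of blocks needed for a byte count, rounded up to a whole block.
blocks() {
    bytes=$1
    block=$2
    echo $(( (bytes + block - 1) / block ))
}

MiB=$((1024 * 1024))   # `1M' or `1MiB': powers of 1024
MB=$((1000 * 1000))    # `1MB': powers of 1000

blocks 1234000 1000    # 1234000 bytes in 1kB blocks: prints 1234
blocks 1500 1024       # 1500 bytes in 1K blocks: prints 2
echo "$MiB $MB"
```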
The following suffixes are defined. Large sizes like `1Y' may be
rejected by your computer due to limitations of its arithmetic.
`kB'
kilobyte: 10^3 = 1000.
`k'
`K'
`KiB'
kibibyte: 2^10 = 1024. `K' is special: the SI prefix is `k' and
the IEC 60027-2 prefix is `Ki', but tradition and POSIX use `k' to
mean `KiB'.
`MB'
megabyte: 10^6 = 1,000,000.
`M'
`MiB'
mebibyte: 2^20 = 1,048,576.
`GB'
gigabyte: 10^9 = 1,000,000,000.
`G'
`GiB'
gibibyte: 2^30 = 1,073,741,824.
`TB'
terabyte: 10^12 = 1,000,000,000,000.
`T'
`TiB'
tebibyte: 2^40 = 1,099,511,627,776.
`PB'
petabyte: 10^15 = 1,000,000,000,000,000.
`P'
`PiB'
pebibyte: 2^50 = 1,125,899,906,842,624.
`EB'
exabyte: 10^18 = 1,000,000,000,000,000,000.
`E'
`EiB'
exbibyte: 2^60 = 1,152,921,504,606,846,976.
`ZB'
zettabyte: 10^21 = 1,000,000,000,000,000,000,000.
`Z'
`ZiB'
2^70 = 1,180,591,620,717,411,303,424. (`Zi' is a GNU extension to
IEC 60027-2.)
`YB'
yottabyte: 10^24 = 1,000,000,000,000,000,000,000,000.
`Y'
`YiB'
2^80 = 1,208,925,819,614,629,174,706,176. (`Yi' is a GNU
extension to IEC 60027-2.)
Block size defaults can be overridden by an explicit
`--block-size=SIZE' option. The `-k' option is equivalent to
`--block-size=1K', which is the default unless the `POSIXLY_CORRECT'
environment variable is set. The `-h' or `--human-readable' option is
equivalent to `--block-size=human-readable'. The `--si' option is
equivalent to `--block-size=si'.
File: coreutils.info, Node: Target directory, Next: Trailing slashes, Prev: Block size, Up: Common options
Target directory
================
Some GNU programs (at least `cp', `install', `ln', and `mv') allow you
to specify the target directory via this option:
`--target-directory=DIRECTORY'
Specify the destination DIRECTORY.
The interface for most programs is that after processing options
and a finite (possibly zero) number of fixed-position arguments,
the remaining argument list is either expected to be empty, or is
a list of items (usually files) that will all be handled
identically. The `xargs' program is designed to work well with
this convention.
The commands in the `mv'-family are unusual in that they take a
variable number of arguments with a special case at the _end_
(namely, the target directory). This makes it nontrivial to
perform some operations, e.g., "move all files from here to
../d/", because `mv * ../d/' might exhaust the argument space, and
`ls | xargs ...' doesn't have a clean way to specify an extra
final argument for each invocation of the subject command. (It
can be done by going through a shell command, but that requires
more human labor and brain power than it should.)
The `--target-directory' option allows the `cp', `install', `ln',
and `mv' programs to be used conveniently with `xargs'. For
example, you can move the files from the current directory to a
sibling directory, `d', like this (note that this does not move
files whose names begin with `.'):
ls | xargs mv --target-directory=../d
If you use the GNU `find' program, you can move _all_ files with
this command:
find . -mindepth 1 -maxdepth 1 \
| xargs mv --target-directory=../d
But that will fail if there are no files in the current directory
or if any file has a name containing a newline character. The
following example removes those limitations and requires both GNU
`find' and GNU `xargs':
find . -mindepth 1 -maxdepth 1 -print0 \
| xargs --null --no-run-if-empty \
mv --target-directory=../d
File: coreutils.info, Node: Trailing slashes, Next: Traversing symlinks, Prev: Target directory, Up: Common options
Trailing slashes
================
Some GNU programs (at least `cp' and `mv') allow you to remove any
trailing slashes from each SOURCE argument before operating on it. The
`--strip-trailing-slashes' option enables this behavior.
This is useful when a SOURCE argument may have a trailing slash and
specify a symbolic link to a directory. This scenario is in fact rather
common because some shells can automatically append a trailing slash
when performing file name completion on such symbolic links. Without
this option, `mv', for example, (via the system's rename function) must
interpret a trailing slash as a request to dereference the symbolic link
and so must rename the indirectly referenced _directory_ and not the
symbolic link. Although it may seem surprising that such behavior be
the default, it is required by POSIX and is consistent with other parts
of that standard.
File: coreutils.info, Node: Traversing symlinks, Next: Treating / specially, Prev: Trailing slashes, Up: Common options
Traversing symlinks
===================
The following options modify how `chown' and `chgrp' traverse a
hierarchy when the `--recursive' (`-R') option is also specified. If
more than one of the following options is specified, only the final one
takes effect. These options specify whether processing a symbolic link
to a directory entails operating on just the symbolic link or on all
files in the hierarchy rooted at that directory.
These options are independent of `--dereference' and
`--no-dereference' (`-h'), which control whether to modify a symlink or
its referent.
`-H'
If `--recursive' (`-R') is specified and a command line argument
is a symbolic link to a directory, traverse it.
`-L'
In a recursive traversal, traverse every symbolic link to a
directory that is encountered.
`-P'
Do not traverse any symbolic links. This is the default if none
of `-H', `-L', or `-P' is specified.
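The difference between `-P' and `-L' can be observed with a small
sketch, assuming GNU `chgrp' and its `-v' (verbose) option to report
each file processed. The directory layout is invented for the example,
and the group is set to the caller's own group so that no special
privileges are needed.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/tree" "$workdir/target"
: > "$workdir/target/inside"
ln -s ../target "$workdir/tree/link"
gid=$(id -g)

# -P: the symbolic link is not traversed, so `inside' is not visited.
physical=$(chgrp -R -P -v "$gid" "$workdir/tree")

# -L: the link to a directory is traversed, so `inside' is visited.
logical=$(chgrp -R -L -v "$gid" "$workdir/tree")

printf '%s\n' "with -P:" "$physical" "with -L:" "$logical"
rm -rf "$workdir"
```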
File: coreutils.info, Node: Treating / specially, Next: Standards conformance, Prev: Traversing symlinks, Up: Common options
Treating / specially
====================
Certain commands can operate destructively on entire hierarchies. For
example, if a user with appropriate privileges mistakenly runs `rm -rf
/ tmp/junk' or `cd /bin; rm -rf ../', that may remove all files on the
entire system. Since there are so few (1) legitimate uses for such a
command, GNU `rm' provides the `--preserve-root' option to make it so
`rm' declines to operate on any directory that resolves to `/'. The
default is still to allow `rm -rf /' to operate unimpeded. Another new
option, `--no-preserve-root', cancels the effect of any preceding
`--preserve-root' option. Note that the `--preserve-root' behavior may
become the default for `rm'.
The commands `chgrp', `chmod' and `chown' can also operate
destructively on entire hierarchies, so they too support these options.
Although, unlike `rm', they don't actually unlink files, these
commands are arguably more dangerous when operating recursively on `/',
since they often work much more quickly, and hence damage more files
before an alert user can interrupt them.
---------- Footnotes ----------
(1) If you know of one, please write to <bug-coreutils@gnu.org>.
File: coreutils.info, Node: Standards conformance, Prev: Treating / specially, Up: Common options
Standards conformance
=====================
In a few cases, the GNU utilities' default behavior is incompatible
with the POSIX standard. To suppress these incompatibilities, define
the `POSIXLY_CORRECT' environment variable. Unless you are checking
for POSIX conformance, you probably do not need to define
`POSIXLY_CORRECT'.
Newer versions of POSIX are occasionally incompatible with older
versions. For example, older versions of POSIX required the command
`sort +1' to sort based on the second and succeeding fields in each
input line, but starting with POSIX 1003.1-2001 the same command is
required to sort the file named `+1', and you must instead use the
command `sort -k 2' to get the field-based sort.
The GNU utilities normally conform to the version of POSIX that is
standard for your system. To cause them to conform to a different
version of POSIX, define the `_POSIX2_VERSION' environment variable to
a value of the form YYYYMM specifying the year and month the standard
was adopted. Two values are currently supported for `_POSIX2_VERSION':
`199209' stands for POSIX 1003.2-1992, and `200112' stands for POSIX
1003.1-2001. For example, if you are running older software that
assumes an older version of POSIX and uses `sort +1', `head -1', or
`tail +1', you can work around the compatibility problems by setting
`_POSIX2_VERSION=199209' in your environment.
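As an illustration of the newer form, this sketch sorts three made-up
lines on the second and succeeding fields with `sort -k 2'. The C
locale is selected so that the comparison is byte-wise and the result
does not depend on the user's locale.

```shell
# Sort on the second field onward; the first field is ignored
# unless the keys compare equal.
sorted=$(printf 'b 2\na 3\nc 1\n' | LC_ALL=C sort -k 2)
printf '%s\n' "$sorted"
```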
File: coreutils.info, Node: Output of entire files, Next: Formatting file contents, Prev: Common options, Up: Top
Output of entire files
**********************
These commands read and write entire files, possibly transforming them
in some way.
* Menu:
* cat invocation:: Concatenate and write files.
* tac invocation:: Concatenate and write files in reverse.
* nl invocation:: Number lines and write files.
* od invocation:: Write files in octal or other formats.
File: coreutils.info, Node: cat invocation, Next: tac invocation, Up: Output of entire files
`cat': Concatenate and write files
==================================
`cat' copies each FILE (`-' means standard input), or standard input if
none are given, to standard output. Synopsis:
cat [OPTION] [FILE]...
The program accepts the following options. Also see *Note Common
options::.
`-A'
`--show-all'
Equivalent to `-vET'.
`-B'
`--binary'
On MS-DOS and MS-Windows only, read and write the files in binary
mode. By default, `cat' on MS-DOS/MS-Windows uses binary mode
only when standard output is redirected to a file or a pipe; this
option overrides that. Binary file I/O is used so that the files
retain their format (Unix text as opposed to DOS text and binary),
because `cat' is frequently used as a file-copying program. Some
options (see below) cause `cat' to read and write files in text
mode because in those cases the original file contents aren't
important (e.g., when lines are numbered by `cat', or when line
endings should be marked). This is so these options work as
DOS/Windows users would expect; for example, DOS-style text files
have their lines end with the CR-LF pair of characters, which
won't be processed as an empty line by `-b' unless the file is
read in text mode.
`-b'
`--number-nonblank'
Number all nonblank output lines, starting with 1. On MS-DOS and
MS-Windows, this option causes `cat' to read and write files in
text mode.
`-e'
Equivalent to `-vE'.
`-E'
`--show-ends'
Display a `$' after the end of each line. On MS-DOS and
MS-Windows, this option causes `cat' to read and write files in
text mode.
`-n'
`--number'
Number all output lines, starting with 1. On MS-DOS and
MS-Windows, this option causes `cat' to read and write files in
text mode.
`-s'
`--squeeze-blank'
Replace multiple adjacent blank lines with a single blank line. On
MS-DOS and MS-Windows, this option causes `cat' to read and write
files in text mode.
`-t'
Equivalent to `-vT'.
`-T'
`--show-tabs'
Display TAB characters as `^I'.
`-u'
Ignored; for Unix compatibility.
`-v'
`--show-nonprinting'
Display control characters except for LFD and TAB using `^'
notation and precede characters that have the high bit set with
`M-'. On MS-DOS and MS-Windows, this option causes `cat' to read
files and standard input in DOS binary mode, so the CR characters
at the end of each line are also visible.
An exit status of zero indicates success, and a nonzero value
indicates failure.
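A short sketch of two of these options on made-up input: `-A' makes
tabs and line ends visible, and `-s' collapses runs of blank lines.

```shell
# TAB is shown as ^I and the end of the line as $.
visible=$(printf 'a\tb\n' | cat -A)

# Three adjacent blank lines are replaced by a single blank line.
squeezed=$(printf 'one\n\n\n\ntwo\n' | cat -s)

printf '%s\n' "$visible" "$squeezed"
```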
File: coreutils.info, Node: tac invocation, Next: nl invocation, Prev: cat invocation, Up: Output of entire files
`tac': Concatenate and write files in reverse
=============================================
`tac' copies each FILE (`-' means standard input), or standard input if
none are given, to standard output, reversing the records (lines by
default) in each separately. Synopsis:
tac [OPTION]... [FILE]...
"Records" are separated by instances of a string (newline by
default). By default, this separator string is attached to the end of
the record that it follows in the file.
The program accepts the following options. Also see *Note Common
options::.
`-b'
`--before'
The separator is attached to the beginning of the record that it
precedes in the file.
`-r'
`--regex'
Treat the separator string as a regular expression. Users of `tac'
on MS-DOS/MS-Windows should note that, since `tac' reads files in
binary mode, each line of a text file might end with a CR/LF pair
instead of the Unix-style LF.
`-s SEPARATOR'
`--separator=SEPARATOR'
Use SEPARATOR as the record separator, instead of newline.
An exit status of zero indicates success, and a nonzero value
indicates failure.
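The following sketch shows the default line-based reversal and a
reversal with a custom separator. The input strings are invented; note
that each separator stays attached to the end of the record it follows.

```shell
# Records are lines by default.
reversed=$(printf '1\n2\n3\n' | tac)

# With `-s ,' the records are `a,', `b,' and `c,'.
by_comma=$(printf 'a,b,c,' | tac -s ,)

printf '%s\n' "$reversed" "$by_comma"
```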
File: coreutils.info, Node: nl invocation, Next: od invocation, Prev: tac invocation, Up: Output of entire files
`nl': Number lines and write files
==================================
`nl' writes each FILE (`-' means standard input), or standard input if
none are given, to standard output, with line numbers added to some or
all of the lines. Synopsis:
nl [OPTION]... [FILE]...
`nl' decomposes its input into (logical) pages; by default, the line
number is reset to 1 at the top of each logical page. `nl' treats all
of the input files as a single document; it does not reset line numbers
or logical pages between files.
A logical page consists of three sections: header, body, and footer.
Any of the sections can be empty. Each can be numbered in a different
style from the others.
The beginnings of the sections of logical pages are indicated in the
input file by a line containing exactly one of these delimiter strings:
`\:\:\:'
start of header;
`\:\:'
start of body;
`\:'
start of footer.
The two characters from which these strings are made can be changed
from `\' and `:' via options (see below), but the pattern and length of
each string cannot be changed.
A section delimiter is replaced by an empty line on output. Any text
that comes before the first section delimiter string in the input file
is considered to be part of a body section, so `nl' treats a file that
contains no section delimiters as a single body section.
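The effect of section delimiters can be sketched as follows, assuming
GNU `nl' is available. The input text and the variable name are
made up for the example; `-w1' and `-s:' are used only to keep the
number field compact.

```shell
# Two body lines, then a header delimiter (\:\:\:) and a body
# delimiter (\:\:) that together open a new logical page, then one
# more body line.  %s prints the backslashes literally.
paged=$(printf '%s\n' 'first' 'second' '\:\:\:' '\:\:' 'third' | nl -w1 -s:)
printf '%s\n' "$paged"
```

The line `third' should be numbered 1 again, since numbering restarts
on the new logical page, and each delimiter line should appear as an
empty line in the output.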
The program accepts the following options. Also see *Note Common
options::.
`-b STYLE'
`--body-numbering=STYLE'
Select the numbering style for lines in the body section of each
logical page. When a line is not numbered, the current line number
is not incremented, but the line number separator character is
still prepended to the line. The styles are:
`a'
number all lines,
`t'
number only nonempty lines (default for body),
`n'
do not number lines (default for header and footer),
`pBRE'
number only lines that contain a match for the basic regular
expression BRE. *Note Regular Expressions: (grep)Regular
Expressions.
`-d CD'
`--section-delimiter=CD'
Set the section delimiter characters to CD; default is `\:'. If
only C is given, the second remains `:'. (Remember to protect `\'
or other metacharacters from shell expansion with quotes or extra
backslashes.)
`-f STYLE'
`--footer-numbering=STYLE'
Analogous to `--body-numbering'.
`-h STYLE'
`--header-numbering=STYLE'
Analogous to `--body-numbering'.
`-i NUMBER'
`--page-increment=NUMBER'
Increment line numbers by NUMBER (default 1).
`-l NUMBER'
`--join-blank-lines=NUMBER'
Consider NUMBER (default 1) consecutive empty lines to be one
logical line for numbering, and only number the last one. Where
fewer than NUMBER consecutive empty lines occur, do not number
them. An empty line is one that contains no characters, not even
spaces or tabs.
`-n FORMAT'
`--number-format=FORMAT'
Select the line numbering format (default is `rn'):
`ln'
left justified, no leading zeros;
`rn'
right justified, no leading zeros;
`rz'
right justified, leading zeros.
`-p'
`--no-renumber'
Do not reset the line number at the start of a logical page.
`-s STRING'
`--number-separator=STRING'
Separate the line number from the text line in the output with
STRING (default is the TAB character).
`-v NUMBER'
`--starting-line-number=NUMBER'
Set the initial line number on each logical page to NUMBER
(default 1).
`-w NUMBER'
`--number-width=NUMBER'
Use NUMBER characters for line numbers (default 6).
An exit status of zero indicates success, and a nonzero value
indicates failure.
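A brief sketch of how the numbering options combine, assuming GNU
`nl'; the two-line input and the variable names are illustrative
only.

```shell
# Right justified with leading zeros, three columns wide, with a
# single space instead of TAB between the number and the text.
zero_padded=$(printf 'a\nb\n' | nl -n rz -w 3 -s ' ')
printf '%s\n' "$zero_padded"

# Start numbering at 10 and step by 5.
stepped=$(printf 'a\nb\n' | nl -v 10 -i 5 -w 2 -s :)
printf '%s\n' "$stepped"
```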
File: coreutils.info, Node: od invocation, Prev: nl invocation, Up: Output of entire files
`od': Write files in octal or other formats
===========================================
`od' writes an unambiguous representation of each FILE (`-' means
standard input), or standard input if none are given. Synopses:
od [OPTION]... [FILE]...
od --traditional [FILE] [[+]OFFSET [[+]LABEL]]
Each line of output consists of the offset in the input, followed by
groups of data from the file. By default, `od' prints the offset in
octal, and each group of file data is two bytes of input printed as a
single octal number.
The program accepts the following options. Also see *Note Common
options::.
`-A RADIX'
`--address-radix=RADIX'
Select the base in which file offsets are printed. RADIX can be
one of the following:
`d'
decimal;
`o'
octal;
`x'
hexadecimal;
`n'
none (do not print offsets).
The default is octal.
`-j BYTES'
`--skip-bytes=BYTES'
Skip BYTES input bytes before formatting and writing. If BYTES
begins with `0x' or `0X', it is interpreted in hexadecimal;
otherwise, if it begins with `0', in octal; otherwise, in decimal.
Appending `b' multiplies BYTES by 512, `k' by 1024, and `m' by
1048576.
`-N BYTES'
`--read-bytes=BYTES'
Output at most BYTES bytes of the input. Prefixes and suffixes on
BYTES are interpreted as for the `-j' option.
`-s N'
`--strings[=N]'
Instead of the normal output, output only "string constants": at
least N consecutive ASCII graphic characters, followed by a null
(zero) byte.
If N is omitted with `--strings', the default is 3. On older
systems, GNU `od' instead supports an obsolete option `-s[N]',
where N also defaults to 3. POSIX 1003.1-2001 (*note Standards
conformance::) does not allow `-s' without an argument; use
`--strings' instead.
`-t TYPE'
`--format=TYPE'
Select the format in which to output the file data. TYPE is a
string of one or more of the below type indicator characters. If
you include more than one type indicator character in a single TYPE
string, or use this option more than once, `od' writes one copy of
each output line using each of the data types that you specified,
in the order that you specified.
Adding a trailing "z" to any type specification appends a display
of the ASCII character representation of the printable characters
to the output line generated by the type specification.
`a'
named character
`c'
ASCII character or backslash escape,
`d'
signed decimal
`f'
floating point
`o'
octal
`u'
unsigned decimal
`x'
hexadecimal
The type `a' outputs things like `sp' for space, `nl' for newline,
and `nul' for a null (zero) byte. Type `c' outputs ` ', `\n', and
`\0', respectively.
Except for types `a' and `c', you can specify the number of bytes
to use in interpreting each number in the given data type by
following the type indicator character with a decimal integer.
Alternately, you can specify the size of one of the C compiler's
built-in data types by following the type indicator character with
one of the following characters. For integers (`d', `o', `u',
`x'):
`C'
char
`S'
short
`I'
int
`L'
long
For floating point (`f'):
`F'
float
`D'
double
`L'
long double
`-v'
`--output-duplicates'
Output consecutive lines that are identical. By default, when two
or more consecutive output lines would be identical, `od' outputs
only the first line, and puts just an asterisk on the following
line to indicate the elision.
`-w N'
`--width[=N]'
Dump N input bytes per output line. This must be a multiple of
the least common multiple of the sizes associated with the
specified output types.
If this option is not given at all, the default is 16. If N is
omitted with `--width', the default is 32. On older systems, GNU
`od' instead supports an obsolete option `-w[N]', where N also
defaults to 32. POSIX 1003.1-2001 (*note Standards conformance::)
does not allow `-w' without an argument; use `--width' instead.
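The options above can be combined as in the following sketch, which
assumes GNU `od'. The six-byte input and the variable names are
invented for the example.

```shell
# Skip 2 bytes, read at most 3, print each byte in hexadecimal,
# and suppress the offset column.
hex_bytes=$(printf 'ABCDEF' | od -A n -t x1 -j 2 -N 3)
printf '%s\n' "$hex_bytes"

# The same three bytes as ASCII characters or backslash escapes.
char_bytes=$(printf 'ABCDEF' | od -A n -t c -j 2 -N 3)
printf '%s\n' "$char_bytes"
```

The exact spacing between fields depends on the output type, so
scripts that parse the dump should not rely on fixed column
positions.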
The next several options are shorthands for format specifications.
GNU `od' accepts any combination of shorthands and format specification
options. These options accumulate.
`-a'
Output as named characters. Equivalent to `-ta'.
`-b'
Output as octal bytes. Equivalent to `-toC'.
`-c'
Output as ASCII characters or backslash escapes. Equivalent to
`-tc'.
`-d'
Output as unsigned decimal shorts. Equivalent to `-tu2'.
`-f'
Output as floats. Equivalent to `-tfF'.
`-h'
Output as hexadecimal shorts. Equivalent to `-tx2'.
`-i'
Output as decimal shorts. Equivalent to `-td2'.
`-l'
Output as decimal longs. Equivalent to `-td4'.
`-o'
Output as octal shorts. Equivalent to `-to2'.
`-x'
Output as hexadecimal shorts. Equivalent to `-tx2'.
`--traditional'
Recognize the non-option arguments that traditional `od' accepted.
The following syntax:
od --traditional [FILE] [[+]OFFSET[.][b] [[+]LABEL[.][b]]]
can be used to specify at most one file and optional arguments
specifying an offset and a pseudo-start address, LABEL. By
default, OFFSET is interpreted as an octal number specifying how
many input bytes to skip before formatting and writing. The
optional trailing decimal point forces the interpretation of
OFFSET as a decimal number. If no decimal is specified and the
offset begins with `0x' or `0X' it is interpreted as a hexadecimal
number. If there is a trailing `b', the number of bytes skipped
will be OFFSET multiplied by 512. The LABEL argument is
interpreted just like OFFSET, but it specifies an initial
pseudo-address. The pseudo-addresses are displayed in parentheses
following any normal address.
An exit status of zero indicates success, and a nonzero value
indicates failure.
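As a small check of the shorthand options, assuming GNU `od', the
sketch below compares `-c' with `-t c' and then accumulates two
formats. The input and the variable names are illustrative.

```shell
# `-c' and `-t c' should give identical dumps of the same input.
short_form=$(printf 'hi\n' | od -c)
long_form=$(printf 'hi\n' | od -t c)
printf '%s\n' "$short_form"

# Format options accumulate: each requested format gets its own
# output line for the same group of input bytes.
both=$(printf 'hi\n' | od -A n -c -t x1)
printf '%s\n' "$both"
```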
File: coreutils.info, Node: Formatting file contents, Next: Output of parts of files, Prev: Output of entire files, Up: Top
Formatting file contents
************************
These commands reformat the contents of files.
* Menu:
* fmt invocation:: Reformat paragraph text.
* pr invocation:: Paginate or columnate files for printing.
* fold invocation:: Wrap input lines to fit in specified width.
File: coreutils.info, Node: fmt invocation, Next: pr invocation, Up: Formatting file contents
`fmt': Reformat paragraph text
==============================
`fmt' fills and joins lines to produce output lines of (at most) a
given number of characters (75 by default). Synopsis:
fmt [OPTION]... [FILE]...
`fmt' reads from the specified FILE arguments (or standard input if
none are given), and writes to standard output.
By default, blank lines, spaces between words, and indentation are
preserved in the output; successive input lines with different
indentation are not joined; tabs are expanded on input and introduced on
output.
`fmt' prefers breaking lines at the end of a sentence, and tries to
avoid line breaks after the first word of a sentence or before the last
word of a sentence. A "sentence break" is defined as either the end of
a paragraph or a word ending in any of `.?!', followed by two spaces or
end of line, ignoring any intervening parentheses or quotes. Like TeX,
`fmt' reads entire "paragraphs" before choosing line breaks; the
algorithm is a variant of that in "Breaking Paragraphs Into Lines"
(Donald E. Knuth and Michael F. Plass, `Software--Practice and
Experience', 11 (1981), 1119-1184).
The program accepts the following options. Also see *Note Common
options::.
`-c'
`--crown-margin'
"Crown margin" mode: preserve the indentation of the first two
lines within a paragraph, and align the left margin of each
subsequent line with that of the second line.
`-t'
`--tagged-paragraph'
"Tagged paragraph" mode: like crown margin mode, except that if
indentation of the first line of a paragraph is the same as the
indentation of the second, the first line is treated as a one-line
paragraph.
`-s'
`--split-only'
Split lines only. Do not join short lines to form longer ones.
This prevents sample lines of code, and other such "formatted"
text from being unduly combined.
`-u'
`--uniform-spacing'
Uniform spacing. Reduce spacing between words to one space, and
spacing between sentences to two spaces.
`-WIDTH'
`-w WIDTH'
`--width=WIDTH'
Fill output lines up to WIDTH characters (default 75). `fmt'
initially tries to make lines about 7% shorter than this, to give
it room to balance line lengths.
`-p PREFIX'
`--prefix=PREFIX'
Only lines beginning with PREFIX (possibly preceded by whitespace)
are subject to formatting. The prefix and any preceding whitespace
are stripped for the formatting and then re-attached to each
formatted output line. One use is to format certain kinds of
program comments, while leaving the code unchanged.
An exit status of zero indicates success, and a nonzero value
indicates failure.
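The following sketch, assuming GNU `fmt', shows filling to a narrow
width and the effect of `-s'. The sample sentence and the variable
names are made up; where exactly `fmt' places the line breaks is
left to its algorithm, so no particular layout is claimed here.

```shell
text='the quick brown fox jumps over the lazy dog'

# Refill to at most 20 characters per output line.
filled=$(printf '%s\n' "$text" | fmt -w 20)
printf '%s\n' "$filled"

# With -s, long lines may be split but short lines are never joined.
split_only=$(printf 'short\nlines\n' | fmt -s)
printf '%s\n' "$split_only"
```

The words come out in the same order as they went in; only the
positions of the line breaks change.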
File: coreutils.info, Node: pr invocation, Next: fold invocation, Prev: fmt invocation, Up: Formatting file contents
`pr': Paginate or columnate files for printing
==============================================
`pr' writes each FILE (`-' means standard input), or standard input if
none are given, to standard output, paginating and optionally
outputting in multicolumn format; optionally merges all FILEs, printing
all in parallel, one per column. Synopsis:
pr [OPTION]... [FILE]...
By default, a 5-line header is printed at each page: two blank lines;
a line with the date, the filename, and the page count; and two more
blank lines. A footer of five blank lines is also printed. With the
`-F' option, a 3-line header is printed: the leading two blank lines are
omitted; no footer is used. The default PAGE_LENGTH in both cases is 66
lines. The default number of text lines changes from 56 (without `-F')
to 63 (with `-F'). The text line of the header takes the form `DATE
STRING PAGE', with spaces inserted around STRING so that the line takes
up the full PAGE_WIDTH. Here, DATE is the date (see the `-D' or
`--date-format' option for details), STRING is the centered header
string, and PAGE identifies the page number. The `LC_MESSAGES' locale
category affects the spelling of PAGE; in the default C locale, it is
`Page NUMBER' where NUMBER is the decimal page number.
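The page arithmetic above can be checked with a short sketch,
assuming GNU `pr' and `wc'. The one-line input and the variable
names are illustrative; `-t' suppresses the header, the footer, and
the padding at the bottom of the page.

```shell
# One short input line still fills a whole default page:
# 5 header lines + 56 text lines + 5 footer lines = 66.
page_lines=$(printf 'hello\n' | pr | wc -l)
echo "$page_lines"

# With -t only the text line itself is written.
bare_lines=$(printf 'hello\n' | pr -t | wc -l)
echo "$bare_lines"
```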
Form feeds in the input cause page breaks in the output. Multiple
form feeds produce empty pages.
Columns are of equal width, separated by an optional string (default
is `space'). For multicolumn output, lines will always be truncated to
PAGE_WIDTH (default 72), unless you use the `-J' option. For single
column output no line truncation occurs by default. Use `-W' option to
truncate lines in that case.
The following changes were made in version 1.22i and apply to later
versions of `pr':
* Some small LETTER OPTIONS (`-s', `-w') have been redefined for
better POSIX compliance. The output of some further cases has
been adapted to other Unix systems. These changes are not
compatible with earlier versions of the program.
* Some NEW CAPITAL LETTER options (`-J', `-S', `-W') have been
introduced to turn off unexpected interferences of small letter
options. The `-N' option and the second argument LAST_PAGE of
`+FIRST_PAGE' offer more flexibility. The detailed handling of
form feeds set in the input files requires the `-T' option.
* Capital letter options override small letter ones.
* Some of the option-arguments (compare `-s', `-e', `-i', `-n')
cannot be specified as separate arguments from the preceding
option letter (already stated in the POSIX specification).
The program accepts the following options. Also see *Note Common
options::.
`+FIRST_PAGE[:LAST_PAGE]'
`--pages=FIRST_PAGE[:LAST_PAGE]'
Begin printing with page FIRST_PAGE and stop with LAST_PAGE.
Missing `:LAST_PAGE' implies end of file. While estimating the
number of skipped pages each form feed in the input file results
in a new page. Page counting with and without `+FIRST_PAGE' is
identical. By default, counting starts with the first page of
input file (not first page printed). Line numbering may be
altered by `-N' option.
`-COLUMN'
`--columns=COLUMN'
With each single FILE, produce COLUMN columns of output (default
is 1) and print columns down, unless `-a' is used. The column
width is automatically decreased as COLUMN increases, unless you
use the `-W/-w' option to increase PAGE_WIDTH as well. This
option might well cause some lines to be truncated. The number of
lines in the columns on each page is balanced. The options `-e'
and `-i' are on for multiple text-column output. Together with
the `-J' option, column alignment and line truncation are turned off.
Lines of full length are joined in a free field format and `-S'
option may set field separators. `-COLUMN' may not be used with
`-m' option.
`-a'
`--across'
With each single FILE, print columns across rather than down. The
`-COLUMN' option must be given with COLUMN greater than one. If a
line is too long to fit in a column, it is truncated.
`-c'
`--show-control-chars'
Print control characters using hat notation (e.g., `^G'); print
other nonprinting characters in octal backslash notation. By
default, nonprinting characters are not changed.
`-d'
`--double-space'
Double space the output.
`-D FORMAT'
`--date-format=FORMAT'
Format header dates using FORMAT, using the same conventions as
for the command `date +FORMAT'; *Note date invocation::.
Except for directives, which start with `%', characters in FORMAT
are printed unchanged. You can use this option to specify an
arbitrary string in place of the header date, e.g.,
`--date-format="Monday morning"'.
If the `POSIXLY_CORRECT' environment variable is not set, the date
format defaults to `%Y-%m-%d %H:%M' (for example, `2001-12-04
23:59'); otherwise, the format depends on the `LC_TIME' locale
category, with the default being `%b %e %H:%M %Y' (for example,
`Dec 4 23:59 2001').
`-e[IN-TABCHAR[IN-TABWIDTH]]'
`--expand-tabs[=IN-TABCHAR[IN-TABWIDTH]]'
Expand TABs to spaces on input. Optional argument IN-TABCHAR is
the input tab character (default is the TAB character). Second
optional argument IN-TABWIDTH is the input tab character's width
(default is 8).
`-f'
`-F'
`--form-feed'
Use a form feed instead of newlines to separate output pages. The
default page length of 66 lines is not altered. But the number of
lines of text per page changes from default 56 to 63 lines.
`-h HEADER'
`--header=HEADER'
Replace the filename in the header with the centered string HEADER.
When using the shell, HEADER should be quoted and should be
separated from `-h' by a space.
`-i[OUT-TABCHAR[OUT-TABWIDTH]]'
`--output-tabs[=OUT-TABCHAR[OUT-TABWIDTH]]'
Replace spaces with TABs on output. Optional argument OUT-TABCHAR
is the output tab character (default is the TAB character).
Second optional argument OUT-TABWIDTH is the output tab
character's width (default is 8).
`-J'
`--join-lines'
Merge lines of full length. Used together with the column options
`-COLUMN', `-a -COLUMN' or `-m'. Turns off `-W/-w' line
truncation; no column alignment used; may be used with
`--sep-string[=STRING]'. `-J' has been introduced (together with
`-W' and `--sep-string') to disentangle the old (POSIX-compliant)
options `-w' and `-s' along with the three column options.
`-l PAGE_LENGTH'
`--length=PAGE_LENGTH'
Set the page length to PAGE_LENGTH (default 66) lines, including
the lines of the header [and the footer]. If PAGE_LENGTH is less
than or equal to 10 (or <= 3 with `-F'), the header and footer are
omitted, and all form feeds set in input files are eliminated, as
if the `-T' option had been given.
`-m'
`--merge'
Merge and print all FILEs in parallel, one in each column. If a
line is too long to fit in a column, it is truncated, unless the
`-J' option is used. `--sep-string[=STRING]' may be used. Empty
pages in some FILEs (form feeds set) produce empty columns, still
marked by STRING. The result is a continuous line numbering and
column marking throughout the whole merged file. Completely empty
merged pages show no separators or line numbers. The default
header becomes `DATE PAGE' with spaces inserted in the middle; this
may be used with the `-h' or `--header' option to fill up the
middle blank part.
`-n[NUMBER-SEPARATOR[DIGITS]]'
`--number-lines[=NUMBER-SEPARATOR[DIGITS]]'
Provide DIGITS digit line numbering (default for DIGITS is 5).
With multicolumn output the number occupies the first DIGITS
column positions of each text column or only each line of `-m'
output. With single column output the number precedes each line
just as `-m' does. Default counting of the line numbers starts
with the first line of the input file (not the first line printed,
compare the `--pages' option and `-N' option). Optional argument
NUMBER-SEPARATOR is the character appended to the line number to
separate it from the text that follows. The default separator is the
TAB character. In a strict sense a TAB is always printed with
single column output only. The TAB-width varies with the
TAB-position, e.g. with the left MARGIN specified by `-o' option.
With multicolumn output priority is given to `equal width of
output columns' (a POSIX specification). The TAB-width is fixed
to the value of the first column and does not change with
different values of left MARGIN. That means a fixed number of
spaces is always printed in the place of the NUMBER-SEPARATOR TAB.
The tabification depends upon the output position.
`-N LINE_NUMBER'
`--first-line-number=LINE_NUMBER'
Start line counting with the number LINE_NUMBER at first line of
first page printed (in most cases not the first line of the input
file).
`-o MARGIN'
`--indent=MARGIN'
Indent each line with a margin MARGIN spaces wide (default is
zero). The total page width is the size of the margin plus the
PAGE_WIDTH set with the `-W/-w' option. A limited overflow may
occur with numbered single column output (compare `-n' option).
`-r'
`--no-file-warnings'
Do not print a warning message when an argument FILE cannot be
opened. (The exit status will still be nonzero, however.)
`-s[CHAR]'
`--separator[=CHAR]'
Separate columns by a single character CHAR. The default for CHAR
is the TAB character without `-w' and `no character' with `-w'.
Without `-s' the default separator `space' is set. `-s[char]'
turns off line truncation of all three column options
(`-COLUMN'|`-a -COLUMN'|`-m') unless `-w' is set. This is a
POSIX-compliant formulation.
`-S STRING'
`--sep-string[=STRING]'
Use STRING to separate output columns. The `-S' option doesn't
affect the `-W/-w' option, unlike the `-s' option which does. It
does not affect line truncation or column alignment. Without
`-S', and with `-J', `pr' uses the default output separator, TAB.
Without `-S' or `-J', `pr' uses a `space' (same as `-S" "'). With
`-SSTRING', STRING must be nonempty; `--sep-string' with no STRING
is equivalent to `--sep-string=""'.
On older systems, `pr' instead supports an obsolete option
`-S[STRING]', where STRING is optional. POSIX 1003.1-2001 (*note
Standards conformance::) does not allow this older usage. To
specify an empty STRING portably, use `--sep-string'.
`-t'
`--omit-header'
Do not print the usual header [and footer] on each page, and do
not fill out the bottom of pages (with blank lines or a form
feed). No page structure is produced, but form feeds set in the
input files are retained. The predefined pagination is not
changed. `-t' or `-T' may be useful together with other options;
e.g.: `-t -e4', expand TAB characters in the input file to 4
spaces but don't make any other changes. Use of `-t' overrides
`-h'.
`-T'
`--omit-pagination'
Do not print header [and footer]. In addition eliminate all form
feeds set in the input files.
`-v'
`--show-nonprinting'
Print nonprinting characters in octal backslash notation.
`-w PAGE_WIDTH'
`--width=PAGE_WIDTH'
Set page width to PAGE_WIDTH characters for multiple text-column
output only (default for PAGE_WIDTH is 72). `-s[CHAR]' turns off
the default page width and any line truncation and column
alignment. Lines of full length are merged, regardless of the
column options set. No PAGE_WIDTH setting is possible with single
column output. A POSIX-compliant formulation.
`-W PAGE_WIDTH'
`--page-width=PAGE_WIDTH'
Set the page width to PAGE_WIDTH characters. That's valid with and
without a column option. Text lines are truncated, unless `-J' is
used. Together with one of the three column options (`-COLUMN',
`-a -COLUMN' or `-m') column alignment is always used. The
separator options `-S' or `-s' don't affect the `-W' option.
Default is 72 characters. Without `-W PAGE_WIDTH' and without any
of the column options NO line truncation is used (defined to keep
backward compatibility and to meet most frequent tasks). That's
equivalent to `-W 72 -J'. The header line is never truncated.
An exit status of zero indicates success, and a nonzero value
indicates failure.
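The three column options can be compared with the sketch below,
assuming GNU `pr' and `mktemp'. The inputs, the temporary file
names, and the variable names are invented for the example; `-t' is
used so that only the column text is printed.

```shell
# Four input lines in two columns, filled down (the default).
down=$(printf '1\n2\n3\n4\n' | pr -2 -t)
printf '%s\n' "$down"

# The same input filled across with -a.
across=$(printf '1\n2\n3\n4\n' | pr -2 -a -t)
printf '%s\n' "$across"

# Merge two files side by side with -m, one file per column.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/left"
printf 'x\ny\n' > "$tmp/right"
merged=$(pr -m -t "$tmp/left" "$tmp/right")
printf '%s\n' "$merged"
rm -r "$tmp"
```

Filling down should pair 1 with 3 and 2 with 4, while filling across
should pair 1 with 2 and 3 with 4. The white space between columns
depends on PAGE_WIDTH and on the tab settings.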
File: coreutils.info, Node: fold invocation, Prev: pr invocation, Up: Formatting file contents
`fold': Wrap input lines to fit in specified width
==================================================
`fold' writes each FILE (`-' means standard input), or standard input
if none are given, to standard output, breaking long lines. Synopsis:
fold [OPTION]... [FILE]...
By default, `fold' breaks lines wider than 80 columns. The output
is split into as many lines as necessary.
`fold' counts screen columns by default; thus, a tab may count more
than one column, backspace decreases the column count, and carriage
return sets the column to zero.
The program accepts the following options. Also see *Note Common
options::.
`-b'
`--bytes'
Count bytes rather than columns, so that tabs, backspaces, and
carriage returns are each counted as taking up one column, just
like other characters.
`-s'
`--spaces'
Break at word boundaries: the line is broken after the last blank
before the maximum line length. If the line contains no such
blanks, the line is broken at the maximum line length as usual.
`-w WIDTH'
`--width=WIDTH'
Use a maximum line length of WIDTH columns instead of 80.
On older systems, `fold' supports an obsolete option `-WIDTH'.
POSIX 1003.1-2001 (*note Standards conformance::) does not allow
this; use `-w WIDTH' instead.
An exit status of zero indicates success, and a nonzero value
indicates failure.
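A short sketch of both breaking modes, assuming GNU `fold'; the
input strings and the variable names are made up for illustration.

```shell
# Hard break every 5 columns.
hard=$(printf 'aaaaabbbbbcc\n' | fold -w 5)
printf '%s\n' "$hard"

# With -s, break after the last blank that fits within the width.
# The blank stays at the end of the line that precedes the break.
soft=$(printf 'hello world foo\n' | fold -s -w 8)
printf '%s\n' "$soft"
```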
File: coreutils.info, Node: Output of parts of files, Next: Summarizing files, Prev: Formatting file contents, Up: Top
Output of parts of files
************************
These commands output pieces of the input.
* Menu:
* head invocation:: Output the first part of files.
* tail invocation:: Output the last part of files.
* split invocation:: Split a file into fixed-size pieces.
* csplit invocation:: Split a file into context-determined pieces.
File: coreutils.info, Node: head invocation, Next: tail invocation, Up: Output of parts of files
`head': Output the first part of files
======================================
`head' prints the first part (10 lines by default) of each FILE; it
reads from standard input if no files are given or when given a FILE of
`-'. Synopsis:
head [OPTION]... [FILE]...
If more than one FILE is specified, `head' prints a one-line header
consisting of
==> FILE NAME <==
before the output for each FILE.
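The header behavior can be seen with the following sketch, which
assumes GNU `head' and `mktemp'. The temporary files and the
variable names are illustrative; `-q' suppresses the file name
headers.

```shell
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/a"
printf 'b1\nb2\n' > "$tmp/b"

# Two FILE arguments: each block of output is preceded by a header.
with_headers=$(head -n 1 "$tmp/a" "$tmp/b")
printf '%s\n' "$with_headers"

# The same request with the headers suppressed.
without_headers=$(head -q -n 1 "$tmp/a" "$tmp/b")
printf '%s\n' "$without_headers"

rm -r "$tmp"
```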
The program accepts the following options. Also see *Note Common
options::.
`-c N'
`--bytes=N'
Print the first N bytes, instead of initial lines. Appending `b'
multiplies N by 512, `k' by 1024, and `m' by 1048576. However, if
N starts with a `-', print all but the last N bytes of each file.
`-n N'
`--lines=N'
Output the first N lines. However, if N starts with a `-', print
all but the last N lines of each file.
`-q'
`--quiet'
`--silent'
Never print file name headers.
`-v'
`--verbose'
Always print file name headers.
On older systems, `head' supports an obsolete option
`-COUNTOPTIONS', which is recognized only if it is specified first.
COUNT is a decimal number optionally followed by a size letter (`b',
`k', `m') as in `-c', or `l' to mean count by lines, or other option
letters (`cqv'). POSIX 1003.1-2001 (*note Standards conformance::)
does not allow this; use `-c COUNT' or `-n COUNT' instead.
An exit status of zero indicates success, and a nonzero value
indicates failure.
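As a minimal sketch of the count options, assuming GNU `head' and
`seq', the commands below select lines and bytes from small
generated inputs. The variable names are invented for the example.

```shell
# The first two lines of a five-line input.
first_two=$(seq 5 | head -n 2)
printf '%s\n' "$first_two"

# A leading `-' on the count means all but the last two lines.
all_but_last_two=$(seq 5 | head -n -2)
printf '%s\n' "$all_but_last_two"

# Count bytes instead of lines.
first_three_bytes=$(printf 'abcdef' | head -c 3)
printf '%s\n' "$first_three_bytes"
```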
File: coreutils.info, Node: tail invocation, Next: split invocation, Prev: head invocation, Up: Output of parts of files
`tail': Output the last part of files
=====================================
`tail' prints the last part (10 lines by default) of each FILE; it
reads from standard input if no files are given or when given a FILE of
`-'. Synopsis:
tail [OPTION]... [FILE]...
If more than one FILE is specified, `tail' prints a one-line header
consisting of
==> FILE NAME <==
before the output for each FILE.
GNU `tail' can output any amount of data (some other versions of
`tail' cannot). It also has no `-r' option (print in reverse), since
reversing a file is really a different job from printing the end of a
file; BSD `tail' (which is the one with `-r') can only reverse files
that are at most as large as its buffer, which is typically 32 KiB. A
more reliable and versatile way to reverse files is the GNU `tac'
command.
If any option-argument is a number N starting with a `+', `tail'
begins printing with the Nth item from the start of each file, instead
of from the end.
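The difference between counting from the end and counting from the
start can be sketched as follows, assuming GNU `tail' and `seq'.
The variable names are illustrative.

```shell
# The last two lines of a five-line input.
last_two=$(seq 5 | tail -n 2)
printf '%s\n' "$last_two"

# A leading `+' counts from the start: begin with the 4th line.
from_fourth=$(seq 5 | tail -n +4)
printf '%s\n' "$from_fourth"

# Count bytes instead of lines.
last_three_bytes=$(printf 'abcdef' | tail -c 3)
printf '%s\n' "$last_three_bytes"
```

For a five-line input, `-n 2' and `-n +4' select the same two
lines; the two forms diverge as soon as the input length changes.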
The program accepts the following options. Also see *Note Common
options::.
`-c BYTES'
`--bytes=BYTES'
Output the last BYTES bytes, instead of final lines. Appending
`b' multiplies BYTES by 512, `k' by 1024, and `m' by 1048576.
`-f'
`--follow[=HOW]'
Loop forever trying to read more characters at the end of the file,
presumably because the file is growing. This option is ignored
when reading from a pipe. If more than one file is given, `tail'
prints a header whenever it gets output from a different file, to
indicate which file that output is from.
There are two ways to specify how you'd like to track files with
this option, but that difference is noticeable only when a
followed file is removed or renamed. If you'd like to continue to
track the end of a growing file even after it has been unlinked,
use `--follow=descriptor'. This is the default behavior, but it
is not useful if you're tracking a log file that may be rotated
(removed or renamed, then reopened). In that case, use
`--follow=name' to track the named file by reopening it
periodically to see if it has been removed and recreated by some
other program.
No matter which method you use, if the tracked file is determined
to have shrunk, `tail' prints a message saying the file has been
truncated and resumes tracking the end of the file from the
newly-determined endpoint.
When a file is removed, `tail''s behavior depends on whether it is
following the name or the descriptor. When following by name,
tail can detect that a file has been removed and gives a message
to that effect, and if `--retry' has been specified it will
continue checking periodically to see if the file reappears. When
following a descriptor, tail does not detect that the file has
been unlinked or renamed and issues no message; even though the
file may no longer be accessible via its original name, it may
still be growing.
The option values `descriptor' and `name' may be specified only
with the long form of the option, not with `-f'.
`-F'
This option is the same as `--follow=name --retry'. That is, tail
will attempt to reopen a file when it is removed. Should this
fail, tail will keep trying until it becomes accessible again.
`--retry'
This option is meaningful only when following by name. Without
this option, when tail encounters a file that doesn't exist or is
otherwise inaccessible, it reports that fact and never checks it
again.
`--sleep-interval=NUMBER'
Change the number of seconds to wait between iterations (the
default is 1.0). During one iteration, every specified file is
checked to see if it has changed size. Historical implementations
of `tail' have required that NUMBER be an integer. However, GNU
`tail' accepts an arbitrary floating point number (using a period
before any fractional digits).
`--pid=PID'
When following by name or by descriptor, you may specify the
process ID, PID, of the sole writer of all FILE arguments. Then,
shortly after that process terminates, tail will also terminate.
This will work properly only if the writer and the tailing process
are running on the same machine. For example, to save the output
of a build in a file and to watch the file grow, if you invoke
`make' and `tail' like this then the tail process will stop when
your build completes. Without this option, you would have had to
kill the `tail -f' process yourself.
$ make >& makerr & tail --pid=$! -f makerr
If you specify a PID that is not in use or that does not correspond
to the process that is writing to the tailed files, then `tail'
may terminate long before any FILEs stop growing or it may not
terminate until long after the real writer has terminated. Note
that `--pid' cannot be supported on some systems; `tail' will
print a warning if this is the case.
`--max-unchanged-stats=N'
When tailing a file by name, if there have been N (default
5) consecutive iterations for which the size has remained the
same, then `open'/`fstat' the file to determine if that file name
is still associated with the same device/inode-number pair as
before. When following a log file that is rotated, this is
approximately the number of seconds between when tail prints the
last pre-rotation lines and when it prints the lines that have
accumulated in the new log file. This option is meaningful only
when following by name.
`-n N'
`--lines=N'
Output the last N lines.
`-q'
`--quiet'
`--silent'
Never print file name headers.
`-v'
`--verbose'
Always print file name headers.
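As a rough illustration of `--lines', `--verbose', and `--quiet', the following sketch builds two small files in a scratch directory and captures what `tail' prints. The file names `log' and `other' and the variable names are made up for this example.

```shell
# Scratch files; the names are hypothetical.
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$dir/log"
printf '%s\n' a b > "$dir/other"

# Last two lines of a single file.
last_two=$(tail -n 2 "$dir/log")

# -v forces the "==> NAME <==" header even when only one file is given.
with_header=$(tail -v -n 1 "$dir/log")

# -q suppresses the headers that two FILE arguments would otherwise get.
no_headers=$(tail -q -n 1 "$dir/log" "$dir/other")

printf '%s\n' "$last_two" "$with_header" "$no_headers"
```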
On older systems, `tail' supports an obsolete option
`-COUNTOPTIONS', which is recognized only if it is specified first.
COUNT is a decimal number optionally followed by a size letter (`b',
`k', `m') as in `-c', or `l' to mean count by lines, or other option
letters (`cfqv'). Some older `tail' implementations also support an
obsolete option `+COUNT' with the same meaning as `-+COUNT'. POSIX
1003.1-2001 (*note Standards conformance::) does not allow these
options; use `-c COUNT' or `-n COUNT' instead.
An exit status of zero indicates success, and a nonzero value
indicates failure.
File: coreutils.info, Node: split invocation, Next: csplit invocation, Prev: tail invocation, Up: Output of parts of files
`split': Split a file into fixed-size pieces
============================================
`split' creates output files containing consecutive sections of INPUT
(standard input if none is given or INPUT is `-'). Synopsis:
split [OPTION] [INPUT [PREFIX]]
By default, `split' puts 1000 lines of INPUT (or whatever is left
over for the last section), into each output file.
The output files' names consist of PREFIX (`x' by default) followed
by a group of characters (`aa', `ab', ... by default), such that
concatenating the output files in traditional sorted order by file name
produces the original input file. If the output file names are
exhausted, `split' reports an error without deleting the output files
that it did create.
The program accepts the following options. Also see *Note Common
options::.
`-a LENGTH'
`--suffix-length=LENGTH'
Use suffixes of length LENGTH. The default LENGTH is 2.
`-l LINES'
`--lines=LINES'
Put LINES lines of INPUT into each output file.
On older systems, `split' supports an obsolete option `-LINES'.
POSIX 1003.1-2001 (*note Standards conformance::) does not allow
this; use `-l LINES' instead.
`-b BYTES'
`--bytes=BYTES'
Put BYTES bytes of INPUT into each output file.
Appending `b' multiplies BYTES by 512, `k' by 1024, and `m' by
1048576.
`-C BYTES'
`--line-bytes=BYTES'
Put into each output file as many complete lines of INPUT as
possible without exceeding BYTES bytes. For lines longer than
BYTES bytes, put BYTES bytes into each output file until less than
BYTES bytes of the line are left, then continue normally. BYTES
has the same format as for the `--bytes' option.
`-d'
`--numeric-suffixes'
Use digits in suffixes rather than lower-case letters.
`--verbose'
Write a diagnostic to standard error just before each output file
is opened.
An exit status of zero indicates success, and a nonzero value
indicates failure.
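The naming scheme and the round-trip property described above can be tried with a sketch like the following. It assumes GNU `split' and a scratch directory; the prefixes `piece.' and `num.' are arbitrary choices for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$dir/input"

# Two lines per output file: piece.aa, piece.ab, piece.ac.
split -l 2 "$dir/input" "$dir/piece."

# Same split, but with numeric suffixes: num.00, num.01, num.02.
split -d -l 2 "$dir/input" "$dir/num."

# Concatenating the pieces in sorted name order restores the input.
rejoined=$(cat "$dir"/piece.*)
last_piece=$(cat "$dir/piece.ac")
last_numbered=$(cat "$dir/num.02")
```

The shell expands `piece.*' in sorted order, which is what makes the plain `cat' sufficient here.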
File: coreutils.info, Node: csplit invocation, Prev: split invocation, Up: Output of parts of files
`csplit': Split a file into context-determined pieces
=====================================================
`csplit' creates zero or more output files containing sections of INPUT
(standard input if INPUT is `-'). Synopsis:
csplit [OPTION]... INPUT PATTERN...
The contents of the output files are determined by the PATTERN
arguments, as detailed below. An error occurs if a PATTERN argument
refers to a nonexistent line of the input file (e.g., if no remaining
line matches a given regular expression). After every PATTERN has been
matched, any remaining input is copied into one last output file.
By default, `csplit' prints the number of bytes written to each
output file after it has been created.
The types of pattern arguments are:
`N'
Create an output file containing the input up to but not including
line N (a positive integer). If followed by a repeat count, also
create an output file containing the next N lines of the input
file once for each repeat.
`/REGEXP/[OFFSET]'
Create an output file containing the current line up to (but not
including) the next line of the input file that contains a match
for REGEXP. The optional OFFSET is an integer. If it is given,
the input up to (but not including) the matching line plus or
minus OFFSET is put into the output file, and the line after that
begins the next section of input.
`%REGEXP%[OFFSET]'
Like the previous type, except that it does not create an output
file, so that section of the input file is effectively ignored.
`{REPEAT-COUNT}'
Repeat the previous pattern REPEAT-COUNT additional times.
REPEAT-COUNT can either be a positive integer or an asterisk,
meaning repeat as many times as necessary until the input is
exhausted.
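A minimal sketch of a regular-expression pattern combined with a repeat count, assuming GNU `csplit'. The input file `notes.txt', the prefix `sec', and the `## ' heading convention are invented for the example; `-s' and `-f' are described under the options below.

```shell
dir=$(mktemp -d)
printf '%s\n' 'intro' '## one' 'a' '## two' 'b' > "$dir/notes.txt"

# Start a new output file before every line that begins with "## ".
# '{*}' repeats the pattern until the input is exhausted.
csplit -s -f "$dir/sec" "$dir/notes.txt" '/^## /' '{*}'

first=$(cat "$dir/sec00")    # text before the first match
second=$(cat "$dir/sec01")   # first heading and its body
third=$(cat "$dir/sec02")    # remaining input
```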
The output files' names consist of a prefix (`xx' by default)
followed by a suffix. By default, the suffix is an ascending sequence
of two-digit decimal numbers from `00' to `99'. In any case,
concatenating the output files in sorted order by filename produces the
original input file.
By default, if `csplit' encounters an error or receives a hangup,
interrupt, quit, or terminate signal, it removes any output files that
it has created so far before it exits.
The program accepts the following options. Also see *Note Common
options::.
`-f PREFIX'
`--prefix=PREFIX'
Use PREFIX as the output file name prefix.
`-b SUFFIX'
`--suffix=SUFFIX'
Use SUFFIX as the output file name suffix. When this option is
specified, the suffix string must include exactly one
`printf(3)'-style conversion specification, possibly including
format specification flags, a field width, a precision
specification, or all of these kinds of modifiers. The format
letter must convert a binary integer argument to readable form;
thus, only `d', `i', `u', `o', `x', and `X' conversions are
allowed. The entire SUFFIX is given (with the current output file
number) to `sprintf(3)' to form the file name suffixes for each of
the individual output files in turn. If this option is used, the
`--digits' option is ignored.
`-n DIGITS'
`--digits=DIGITS'
Use output file names containing numbers that are DIGITS digits
long instead of the default 2.
`-k'
`--keep-files'
Do not remove output files when errors are encountered.
`-z'
`--elide-empty-files'
Suppress the generation of zero-length output files. (In cases
where the section delimiters of the input file are supposed to
mark the first lines of each of the sections, the first output
file will generally be a zero-length file unless you use this
option.) The output file sequence numbers always run
consecutively starting from 0, even when this option is specified.
`-s'
`-q'
`--silent'
`--quiet'
Do not print counts of output file sizes.
An exit status of zero indicates success, and a nonzero value
indicates failure.
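The interaction of `--prefix', `--suffix', and `--elide-empty-files' might be sketched as follows, again assuming GNU `csplit'. The names `notes.txt' and `part-' and the format `%03d.txt' are illustrative only. Because the first input line matches the pattern, the first section is empty and `-z' drops it; the numbering still starts at 0.

```shell
dir=$(mktemp -d)
printf '%s\n' '## one' 'a' '## two' 'b' > "$dir/notes.txt"

# -b supplies one printf-style conversion; -z elides the empty first piece.
csplit -s -z -f "$dir/part-" -b '%03d.txt' "$dir/notes.txt" '/^## /' '{*}'

piece0=$(cat "$dir/part-000.txt")
piece1=$(cat "$dir/part-001.txt")
```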
File: coreutils.info, Node: Summarizing files, Next: Operating on sorted files, Prev: Output of parts of files, Up: Top
Summarizing files
*****************
These commands generate just a few numbers representing the entire contents
of files.
* Menu:
* wc invocation:: Print newline, word, and byte counts.
* sum invocation:: Print checksum and block counts.
* cksum invocation:: Print CRC checksum and byte counts.
* md5sum invocation:: Print or check message-digests.
File: coreutils.info, Node: wc invocation, Next: sum invocation, Up: Summarizing files
`wc': Print newline, word, and byte counts
==========================================
`wc' counts the number of bytes, characters, whitespace-separated
words, and newlines in each given FILE, or standard input if none are
given or for a FILE of `-'. Synopsis:
wc [OPTION]... [FILE]...
`wc' prints one line of counts for each file, and if the file was
given as an argument, it prints the file name following the counts. If
more than one FILE is given, `wc' prints a final line containing the
cumulative counts, with the file name `total'. The counts are printed
in this order: newlines, words, characters, bytes. Each count is
printed right-justified in a field with at least one space between
fields so that the numbers and file names normally line up nicely in
columns. The width of the count fields varies depending on the inputs,
so you should not depend on a particular field width. However, as a
GNU extension, if only one count is printed, it is guaranteed to be
printed without leading spaces.
By default, `wc' prints three counts: the newline, word, and byte
counts. Options can specify that only certain counts be printed.
Options do not undo others previously given, so
wc --bytes --words
prints both the byte counts and the word counts.
With the `--max-line-length' option, `wc' prints the length of the
longest line per file, and if there is more than one file it prints the
maximum (not the sum) of those lengths.
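As a small sketch of the points above, assuming GNU `wc' reading from a pipe: single counts are captured directly, while a combined request is split on whitespace rather than by column position, since the field width is not fixed. The sample text and variable names are arbitrary.

```shell
sample() { printf 'one two\nthree\n'; }

lines=$(sample | wc -l)     # newline count
words=$(sample | wc -w)     # word count
bytes=$(sample | wc -c)     # byte count
longest=$(sample | wc -L)   # length of the longest line

# Both options take effect; output order is fixed (words before bytes),
# whatever order the options were given in.
set -- $(sample | wc --bytes --words)
both_words=$1
both_bytes=$2
```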
The program accepts the following options. Also see *Note Common
options::.
`-c'
`--bytes'
Print only the byte counts.
`-m'
`--chars'
Print only the character counts.
`-w'
`--words'
Print only the word counts.
`-l'
`--lines'
Print only the newline counts.
`-L'
`--max-line-length'
Print only the maximum line lengths.
An exit status of zero indicates success, and a nonzero value
indicates failure.
File: coreutils.info, Node: sum invocation, Next: cksum invocation, Prev: wc invocation, Up: Summarizing files
`sum': Print checksum and block counts
======================================
`sum' computes a 16-bit checksum for each given FILE, or standard input
if none are given or for a FILE of `-'. Synopsis:
sum [OPTION]... [FILE]...
`sum' prints the checksum for each FILE followed by the number of
blocks in the file (rounded up). If more than one FILE is given, file
names are also printed (by default). (With the `--sysv' option,
corresponding file names are printed when there is at least one file
argument.)
By default, GNU `sum' computes checksums using an algorithm
compatible with BSD `sum' and prints file sizes in units of 1024-byte
blocks.
The program accepts the following options. Also see *Note Common
options::.
`-r'
Use the default (BSD compatible) algorithm. This option is
included for compatibility with the System V `sum'. Unless `-s'
was also given, it has no effect.
`-s'
`--sysv'
Compute checksums using an algorithm compatible with System V
`sum''s default, and print file sizes in units of 512-byte blocks.
`sum' is provided for compatibility; the `cksum' program (see next
section) is preferable in new applications.
An exit status of zero indicates success, and a nonzero value
indicates failure.
File: coreutils.info, Node: cksum invocation, Next: md5sum invocation, Prev: sum invocation, Up: Summarizing files
`cksum': Print CRC checksum and byte counts
===========================================
`cksum' computes a cyclic redundancy check (CRC) checksum for each
given FILE, or standard input if none are given or for a FILE of `-'.
Synopsis:
cksum [OPTION]... [FILE]...
`cksum' prints the CRC checksum for each file along with the number
of bytes in the file, and the filename unless no arguments were given.
`cksum' is typically used to ensure that files transferred by
unreliable means (e.g., netnews) have not been corrupted, by comparing
the `cksum' output for the received files with the `cksum' output for
the original files (typically given in the distribution).
The CRC algorithm is specified by the POSIX standard. It is not
compatible with the BSD or System V `sum' algorithms (see the previous
section); it is more robust.
The only options are `--help' and `--version'. *Note Common
options::.
An exit status of zero indicates success, and a nonzero value
indicates failure.
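The transfer check described above could look roughly like this. The files `original' and `received' stand in for the two ends of a transfer, and the appended byte simulates corruption. Reading from standard input keeps the file name out of the output, so the two results can be compared as plain strings.

```shell
dir=$(mktemp -d)
printf 'hello, world\n' > "$dir/original"
cp "$dir/original" "$dir/received"

sent=$(cksum < "$dir/original")    # "CRC BYTES", no file name
got=$(cksum < "$dir/received")

if [ "$sent" = "$got" ]; then transfer=ok; else transfer=corrupt; fi

# Simulate damage in transit and check again.
printf 'X' >> "$dir/received"
damaged=$(cksum < "$dir/received")
byte_count=${sent#* }              # second field: number of bytes
```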
File: coreutils.info, Node: md5sum invocation, Prev: cksum invocation, Up: Summarizing files
`md5sum': Print or check message-digests
========================================
`md5sum' computes a 128-bit checksum (or "fingerprint" or
"message-digest") for each specified FILE. If a FILE is specified as
`-' or if no files are given `md5sum' computes the checksum for the
standard input. `md5sum' can also determine whether a file and
checksum are consistent. Synopses:
md5sum [OPTION]... [FILE]...
md5sum [OPTION]... --check [FILE]
For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
a binary or text input file, and the filename. If FILE is omitted or
specified as `-', standard input is read.
The program accepts the following options. Also see *Note Common
options::.
`-b'
`--binary'
Treat all input files as binary. This option has no effect on Unix
systems, since they don't distinguish between binary and text
files. This option is useful on systems that have different
internal and external character representations. On MS-DOS and
MS-Windows, this is the default.
`-c'
`--check'
Read filenames and checksum information from the single FILE (or
from stdin if no FILE was specified) and report whether each named
file and the corresponding checksum data are consistent. The
input to this mode of `md5sum' is usually the output of a prior,
checksum-generating run of `md5sum'. Each valid line of input
consists of an MD5 checksum, a binary/text flag, and then a
filename. Binary files are marked with `*', text with ` '. For
each such line, `md5sum' reads the named file and computes its MD5
checksum. Then, if the computed message digest does not match the
one on the line with the filename, the file is noted as having
failed the test. Otherwise, the file passes the test. By
default, for each valid line, one line is written to standard
output indicating whether the named file passed the test. After
all checks have been performed, if there were any failures, a
warning is issued to standard error. Use the `--status' option to
inhibit that output. If any listed file cannot be opened or read,
if any valid line has an MD5 checksum inconsistent with the
associated file, or if no valid line is found, `md5sum' exits with
nonzero status. Otherwise, it exits successfully.
`--status'
This option is useful only when verifying checksums. When
verifying checksums, don't generate the default one-line-per-file
diagnostic and don't output the warning summarizing any failures.
Failures to open or read a file still evoke individual diagnostics
to standard error. If all listed files are readable and are
consistent with the associated MD5 checksums, exit successfully.
Otherwise exit with a status code indicating there was a failure.
`-t'
`--text'
Treat all input files as text files. This is the reverse of
`--binary'.
`-w'
`--warn'
When verifying checksums, warn about improperly formatted MD5
checksum lines. This option is useful only if all but a few lines
in the checked input are valid.
An exit status of zero indicates success, and a nonzero value
indicates failure.
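A minimal sketch of a generate-then-verify cycle, assuming GNU `md5sum' on a Unix-like system. The names `greeting.txt' and `MD5SUMS' are chosen for the example. The commands run inside the scratch directory so that the recorded file name is relative.

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/greeting.txt"

# Record the checksum line, then verify it.
digest_line=$(cd "$dir" && md5sum greeting.txt)
printf '%s\n' "$digest_line" > "$dir/MD5SUMS"
report=$(cd "$dir" && md5sum --check MD5SUMS)

# After the file changes, --status reports the failure only through
# the exit status.
printf 'tampered\n' > "$dir/greeting.txt"
if (cd "$dir" && md5sum --check --status MD5SUMS); then
  verdict=intact
else
  verdict=mismatch
fi
```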
File: coreutils.info, Node: Operating on sorted files, Next: Operating on fields within a line, Prev: Summarizing files, Up: Top
Operating on sorted files
*************************
These commands work with (or produce) sorted files.
* Menu:
* sort invocation:: Sort text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
* tsort invocation:: Topological sort.
* tsort background:: Where tsort came from.
File: coreutils.info, Node: sort invocation, Next: uniq invocation, Up: Operating on sorted files
`sort': Sort text files
=======================
`sort' sorts, merges, or compares all the lines from the given files,
or standard input if none are given or for a FILE of `-'. By default,
`sort' writes the results to standard output. Synopsis:
sort [OPTION]... [FILE]...
`sort' has three modes of operation: sort (the default), merge, and
check for sortedness. The following options change the operation mode:
`-c'
`--check'
Check whether the given files are already sorted: if they are not
all sorted, print an error message and exit with a status of 1.
Otherwise, exit successfully.
`-m'
`--merge'
Merge the given files by sorting them as a group. Each input file
must always be individually sorted. It always works to sort
instead of merge; merging is provided because it is faster, in the
case where it works.
A pair of lines is compared as follows: if any key fields have been
specified, `sort' compares each pair of fields, in the order specified
on the command line, according to the associated ordering options,
until a difference is found or no fields are left. Unless otherwise
specified, all comparisons use the character collating sequence
specified by the `LC_COLLATE' locale. (1)
If any of the global options `bdfgiMnr' are given but no key fields
are specified, `sort' compares the entire lines according to the global
options.
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), `sort' compares the entire
lines. The last resort comparison honors the `--reverse' (`-r') global
option. The `--stable' (`-s') option disables this last-resort
comparison so that lines in which all fields compare equal are left in
their original relative order. If no fields or global options are
specified, `--stable' (`-s') has no effect.
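The effect of the last-resort comparison can be seen in a sketch such as this one, where two lines tie on a numeric key. The input is invented, and `LC_ALL=C' is set only to make the result predictable.

```shell
input='b 1
a 1
c 0'

# Ties on field 2 are broken by comparing the whole lines.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)

# With -s, tied lines keep their original relative order.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2n)
```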
GNU `sort' (as specified for all GNU utilities) has no limit on
input line length or restrictions on bytes allowed within lines. In
addition, if the final byte of an input file is not a newline, GNU
`sort' silently supplies one. A line's trailing newline is not part of
the line for comparison purposes.
Exit status:
0 if no error occurred
1 if invoked with `-c' and the input is not properly sorted
2 if an error occurred
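A script can branch on these values. The sketch below feeds two small invented inputs to `sort -c' and discards the diagnostic that is printed for unsorted input.

```shell
if printf 'a\nb\n' | LC_ALL=C sort -c 2>/dev/null; then
  sorted_input=yes
else
  sorted_input=no
fi

# Unsorted input: the exit status is 1.
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null
unsorted_status=$?
```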
If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of `/tmp'. The
`--temporary-directory' (`-T') option in turn overrides the environment
variable.
The following options affect the ordering of output lines. They may
be specified globally or as part of a specific key field. If no key
fields are specified, global options apply to comparison of entire
lines; otherwise the global options are inherited by key fields that do
not specify any special options of their own. In pre-POSIX versions of
`sort', global options affect only later key fields, so portable shell
scripts should specify global options first.
`-b'
`--ignore-leading-blanks'
Ignore leading blanks when finding sort keys in each line. The
`LC_CTYPE' locale determines character types.
`-d'
`--dictionary-order'
Sort in "phone directory" order: ignore all characters except
letters, digits and blanks when sorting. The `LC_CTYPE' locale
determines character types.
`-f'
`--ignore-case'
Fold lowercase characters into the equivalent uppercase characters
when comparing so that, for example, `b' and `B' sort as equal.
The `LC_CTYPE' locale determines character types.
`-g'
`--general-numeric-sort'
Sort numerically, using the standard C function `strtod' to convert
a prefix of each line to a double-precision floating point number.
This allows floating point numbers to be specified in scientific
notation, like `1.0e-34' and `10e100'. The `LC_NUMERIC' locale
determines the decimal-point character. Do not report overflow,
underflow, or conversion errors. Use the following collating
sequence:
* Lines that do not start with numbers (all considered to be
equal).
* NaNs ("Not a Number" values, in IEEE floating point
arithmetic) in a consistent but machine-dependent order.
* Minus infinity.
* Finite numbers in ascending numeric order (with -0 and +0
equal).
* Plus infinity.
Use this option only if there is no alternative; it is much slower
than `--numeric-sort' (`-n') and it can lose information when
converting to floating point.
`-i'
`--ignore-nonprinting'
Ignore nonprinting characters. The `LC_CTYPE' locale determines
character types. This option has no effect if the stronger
`--dictionary-order' (`-d') option is also given.
`-M'
`--month-sort'
An initial string, consisting of any amount of blanks, followed by
a month name abbreviation, is folded to UPPER case and compared in
the order `JAN' < `FEB' < ... < `DEC'. Invalid names compare low
to valid names. The `LC_TIME' locale category determines the
month spellings.
`-n'
`--numeric-sort'
Sort numerically: the number begins each line; specifically, it
consists of optional blanks, an optional `-' sign, and zero or more
digits possibly separated by thousands separators, optionally
followed by a decimal-point character and zero or more digits.
The `LC_NUMERIC' locale specifies the decimal-point character and
thousands separator.
Numeric sort uses what might be considered an unconventional
method to compare strings representing floating point numbers.
Rather than first converting each string to the C `double' type
and then comparing those values, `sort' aligns the decimal-point
characters in the two strings and compares the strings a character
at a time. One benefit of using this approach is its speed. In
practice this is much more efficient than performing the two
corresponding string-to-double (or even string-to-integer)
conversions and then comparing doubles. In addition, there is no
corresponding loss of precision. Converting each string to
`double' before comparison would limit precision to about 16
digits on most systems.
Neither a leading `+' nor exponential notation is recognized. To
compare such strings numerically, use the `--general-numeric-sort'
(`-g') option.
`-r'
`--reverse'
Reverse the result of comparison, so that lines with greater key
values appear earlier in the output instead of later.
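To illustrate the difference between `-n' and `-g' on exponential notation, here is a short sketch with invented input. Under `-n' the string `2e2' is read as 2, because parsing stops at the `e'; under `-g' it is converted to 200.

```shell
input='2e2
30
4'

numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)   # 2e2 counts as 2
general=$(printf '%s\n' "$input" | LC_ALL=C sort -g)   # 2e2 counts as 200
```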
Other options are:
`-o OUTPUT-FILE'
`--output=OUTPUT-FILE'
Write output to OUTPUT-FILE instead of standard output. If
necessary, `sort' reads input before opening OUTPUT-FILE, so you
can safely sort a file in place by using commands like `sort -o F
F' and `cat F | sort -o F'.
On newer systems, `-o' cannot appear after an input file if
`POSIXLY_CORRECT' is set, e.g., `sort F -o F'. Portable scripts
should specify `-o OUTPUT-FILE' before any input files.
`-s'
`--stable'
Make `sort' stable by disabling the last-resort comparison that is
performed in some cases. By default, when lines compare equal
based on command line options that affect ordering, those lines
are ordered using a "last-resort comparison" that takes the entire
line as the key and acts as if no ordering options were specified.
But if `--reverse' (`-r') was specified along with other ordering
options, then the last-resort comparison does use `--reverse'. In
any case, when no ordering option is specified or when only
`--reverse' is specified, the last-resort comparison is not
performed.
`-S SIZE'
`--buffer-size=SIZE'
Use a main-memory sort buffer of the given SIZE. By default, SIZE
is in units of 1024 bytes. Appending `%' causes SIZE to be
interpreted as a percentage of physical memory. Appending `K'
multiplies SIZE by 1024 (the default), `M' by 1,048,576, `G' by
1,073,741,824, and so on for `T', `P', `E', `Z', and `Y'.
Appending `b' causes SIZE to be interpreted as a byte count, with
no multiplication.
This option can improve the performance of `sort' by causing it to
start with a larger or smaller sort buffer than the default.
However, this option affects only the initial buffer size. The
buffer grows beyond SIZE if `sort' encounters input lines larger
than SIZE.
`-t SEPARATOR'
`--field-separator=SEPARATOR'
Use character SEPARATOR as the field separator when finding the
sort keys in each line. By default, fields are separated by the
empty string between a non-blank character and a blank character.
That is, given the input line ` foo bar', `sort' breaks it into
fields ` foo' and ` bar'. The field separator is not considered
to be part of either the field preceding or the field following.
But note that sort fields that extend to the end of the line, as
`-k 2', or sort fields consisting of a range, as `-k 2,3', retain
the field separators present between the endpoints of the range.
To specify a zero byte (ASCII NUL (Null) character) as the field
separator, use the two-character string `\0', e.g., `sort -t '\0''.
`-T TEMPDIR'
`--temporary-directory=TEMPDIR'
Use directory TEMPDIR to store temporary files, overriding the
`TMPDIR' environment variable. If this option is given more than
once, temporary files are stored in all the directories given. If
you have a large sort or merge that is I/O-bound, you can often
improve performance by using this option to specify directories on
different disks and controllers.
`-u'
`--unique'
Normally, output only the first of a sequence of lines that compare
equal. For the `--check' (`-c') option, check that no pair of
consecutive lines compares equal.
`-k POS1[,POS2]'
`--key=POS1[,POS2]'
Specify a sort field that consists of the part of the line between
POS1 and POS2 (or the end of the line, if POS2 is omitted),
_inclusive_. Fields and character positions are numbered starting
with 1. So to sort on the second field, you'd use `--key=2,2'
(`-k 2,2'). See below for more examples.
`-z'
`--zero-terminated'
Treat the input as a set of lines, each terminated by a zero byte
(ASCII NUL (Null) character) instead of an ASCII LF (Line Feed).
This option can be useful in conjunction with `perl -0' or `find
-print0' and `xargs -0' which do the same in order to reliably
handle arbitrary pathnames (even those which contain Line Feed
characters).
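Two of these options in use, as a sketch: sorting a file in place with `-o' while dropping duplicates with `-u', and sorting zero-terminated records with `-z'. The file name `F' follows the text above; the data is invented, and `tr' is used only to make the NUL-separated result printable.

```shell
dir=$(mktemp -d)
printf 'pear\napple\npear\n' > "$dir/F"

# Safe in place: the input is read before the output file is opened.
LC_ALL=C sort -u -o "$dir/F" "$dir/F"
in_place=$(cat "$dir/F")

# Records end in NUL instead of newline.
nul_sorted=$(printf 'b\0a\0' | LC_ALL=C sort -z | tr '\0' '\n')
```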
Historical (BSD and System V) implementations of `sort' have
differed in their interpretation of some options, particularly `-b',
`-f', and `-n'. GNU sort follows the POSIX behavior, which is usually
(but not always!) like the System V behavior. According to POSIX, `-n'
no longer implies `-b'. For consistency, `-M' has been changed in the
same way. This may affect the meaning of character positions in field
specifications in obscure cases. The only fix is to add an explicit
`-b'.
A position in a sort field specified with the `-k' option has the
form `F.C', where F is the number of the field to use and C is the
number of the first character from the beginning of the field. In a
start position, an omitted `.C' stands for the field's first character.
In an end position, an omitted or zero `.C' stands for the field's
last character. If the `-b' option was specified, the `.C' part of a
field specification is counted from the first nonblank character of the
field.
A sort key position may also have any of the option letters `Mbdfginr'
appended to it, in which case the global ordering options are not used
for that particular field. The `-b' option may be independently
attached to either or both of the start and end positions of a field
specification, and if it is inherited from the global options it will
be attached to both. Keys may span multiple fields.
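The `F.C' notation and per-key option letters can be tried on a few invented colon-separated records. In this sketch the primary key is field two, compared numerically, and ties are broken on the third and fourth characters of field three.

```shell
input='x:10:aaZZ
y:9:aaAA
z:10:aaBB'

# -k 2,2n     field 2 only, numeric
# -k 3.3,3.4  characters 3 through 4 of field 3
by_keys=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2n -k 3.3,3.4)
```

The `y' record comes first because 9 is less than 10; `z' precedes `x' because `BB' sorts before `ZZ'.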
On older systems, `sort' supports an obsolete origin-zero syntax
`+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note
Standards conformance::) does not allow this; use `-k' instead.
Here are some examples to illustrate various combinations of options.
* Sort in descending (reverse) numeric order.
sort -nr
* Sort alphabetically, omitting the first and second fields. This
uses a single key composed of the characters beginning at the
start of field three and extending to the end of each line.
sort -k 3
* Sort numerically on the second field and resolve ties by sorting
alphabetically on the third and fourth characters of field five.
Use `:' as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4
Note that if you had written `-k 2' instead of `-k 2,2', `sort'
would have used all characters beginning in the second field and
extending to the end of the line as the primary _numeric_ key.
For the large majority of applications, treating keys spanning
more than one field as numeric will not do what you expect.
Also note that the `n' modifier was applied to the field-end
specifier for the first key. It would have been equivalent to
specify `-k 2n,2' or `-k 2n,2n'. All modifiers except `b' apply
to the associated _field_, regardless of whether the modifier
character is attached to the field-start and/or the field-end part
of the key specifier.
* Sort the password file on the fifth field and ignore any leading
blanks. Sort lines with equal values in field five on the numeric
user ID in field three.
sort -t : -k 5b,5 -k 3,3n /etc/passwd
An alternative is to use the global numeric modifier `-n'.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
Finally, to ignore both leading and trailing blanks, you could
have applied the `b' modifier to the field-end specifier for the
first key,
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
or used the global `-b' modifier instead of `-n' and an
explicit `n' with the second key specifier.
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
* Generate a tags file in case-insensitive sorted order.
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
The use of `-print0', `-z', and `-0' in this case means that
pathnames that contain Line Feed characters will not get broken up
by the sort operation.
---------- Footnotes ----------
(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to. In that case, set the `LC_ALL' environment
variable to `C'. Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set. Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value. For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
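As a sketch of the advice in this footnote, the following sorts a few invented lines with `LC_ALL' set to `C', which orders by byte value so that uppercase ASCII letters precede lowercase ones. The second command shows that a `LC_COLLATE' setting makes no difference while `LC_ALL' is set.

```shell
input='b
A
a
B'

c_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# LC_ALL takes precedence, so LC_COLLATE is ignored here.
still_c=$(printf '%s\n' "$input" | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort)
```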
File: coreutils.info, Node: uniq invocation, Next: comm invocation, Prev: sort invocation, Up: Operating on sorted files
`uniq': Uniquify files
======================
`uniq' writes the unique lines in the given INPUT, or standard input
if nothing is given or for an INPUT name of `-'. Synopsis:
uniq [OPTION]... [INPUT [OUTPUT]]
By default, `uniq' prints its input lines, except that it discards
all but the first of adjacent repeated lines, so that no output lines
are repeated. Optionally, it can instead discard lines that are not
repeated, or all repeated lines.
The input need not be sorted, but repeated input lines are detected
only if they are adjacent. If you want to discard non-adjacent
duplicate lines, perhaps you want to use `sort -u'.
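The adjacency rule is easy to see on a small invented input in which `a' appears both before and after `b':

```shell
input='a
b
a
a'

# Only the adjacent pair of "a" lines is collapsed.
adjacent_only=$(printf '%s\n' "$input" | uniq)

# Sorting first brings every duplicate together.
all_duplicates=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
```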
Comparisons use the character collating sequence specified by the
`LC_COLLATE' locale category.
If no OUTPUT file is specified, `uniq' writes to standard output.
The program accepts the following options. Also see *Note Common
options::.
`-f N'
`--skip-fields=N'
Skip N fields on each line before checking for uniqueness. Use a
null string for comparison if a line has fewer than N fields.
Fields are sequences of non-space non-tab characters that are
separated from each other by at least one space or tab.
On older systems, `uniq' supports an obsolete option `-N'. POSIX
1003.1-2001 (*note Standards conformance::) does not allow this;
use `-f N' instead.
`-s N'
`--skip-chars=N'
Skip N characters before checking for uniqueness. Use a null
string for comparison if a line has fewer than N characters. If
you use both the field and character skipping options, fields are
skipped over first.
On older systems, `uniq' supports an obsolete option `+N'. POSIX
1003.1-2001 (*note Standards conformance::) does not allow this;
use `-s N' instead.
`-c'
`--count'
Print the number of times each line occurred along with the line.
`-i'
`--ignore-case'
Ignore differences in case when comparing lines.
`-d'
`--repeated'
Discard lines that are not repeated. When used by itself, this
option causes `uniq' to print the first copy of each repeated line,
and nothing else.
`-D'
`--all-repeated[=DELIMIT-METHOD]'
Do not discard the second and subsequent repeated input lines, but
discard lines that are not repeated. This option is useful mainly
in conjunction with other options, e.g., to ignore case or to
compare only selected fields. The optional DELIMIT-METHOD tells
how to delimit groups of repeated lines, and must be one of the
following:
`none'
Do not delimit groups of repeated lines. This is equivalent
to `--all-repeated' (`-D').
`prepend'
Output a newline before each group of repeated lines.
`separate'
Separate groups of repeated lines with a single newline.
This is the same as using `prepend', except that there is no
newline before the first group, and hence may be better
suited for output direct to users.
Note that when groups are delimited and the input stream contains
two or more consecutive blank lines, then the output is ambiguous.
To avoid that, filter the input through `tr -s '\n'' to replace
each sequence of consecutive newlines with a single newline.
This is a GNU extension.
`-u'
`--unique'
Discard the first line of each group of repeated lines as well, so
that no repeated line is output. When used by itself, this option
causes `uniq' to print unique lines, and nothing else.
`-w N'
`--check-chars=N'
Compare at most N characters on each line (after skipping any
specified fields and characters). By default the entire rest of
the lines are compared.
An exit status of zero indicates success, and a nonzero value
indicates failure.
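As an illustration of how the skipping and selection options combine,
here is a small sketch. The log lines and the variable names are
hypothetical, and the exact spacing of the `-c' output may vary between
implementations, so only the options themselves should be taken from
this manual.

```shell
# Hypothetical session log: time stamp, event, user.
log='10:01 login alice
10:02 login alice
10:03 logout alice
10:04 login bob'

# Skip the time stamp (field 1) so only event and user are compared.
first_of_each=$(printf '%s\n' "$log" | uniq -f 1)

# Same comparison, but keep only the first copy of repeated lines ...
repeated=$(printf '%s\n' "$log" | uniq -f 1 -d)

# ... or only the lines that are not repeated.
not_repeated=$(printf '%s\n' "$log" | uniq -f 1 -u)

# -D (a GNU extension) keeps every line of each repeated group.
all_repeated=$(printf '%s\n' "$log" | uniq -f 1 -D)

# Skip the 6 characters of the time stamp, then compare only the next
# 3 characters (`log'); all four lines now form a single group.
prefix_counts=$(printf '%s\n' "$log" | uniq -s 6 -w 3 -c)

printf '%s\n' "$first_of_each" "$repeated" "$not_repeated" \
  "$all_repeated" "$prefix_counts"
```

Note that in every case the line that is printed for a group is the
first one read, time stamp included; the skipped text is ignored only
for the comparison.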
File: coreutils.info, Node: comm invocation, Next: ptx invocation, Prev: uniq invocation, Up: Operating on sorted files
`comm': Compare two sorted files line by line
=============================================
`comm' writes to standard output lines that are common, and lines that
are unique, to two input files; a file name of `-' means standard
input. Synopsis:
comm [OPTION]... FILE1 FILE2
Before `comm' can be used, the input files must be sorted using the
collating sequence specified by the `LC_COLLATE' locale. If an input
file ends in a non-newline character, a newline is silently appended.
The `sort' command with no options always outputs a file that is
suitable input to `comm'.
With no options, `comm' produces three-column output. Column one
contains lines unique to FILE1, column two contains lines unique to
FILE2, and column three contains lines common to both files. Columns
are separated by a single TAB character.
The options `-1', `-2', and `-3' suppress printing of the
corresponding columns. Also see *Note Common options::.
Unlike some other comparison utilities, `comm' has an exit status
that does not depend on the result of the comparison. Upon normal
completion `comm' produces an exit code of zero. If there is an error
it exits with nonzero status.
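The column-suppressing options are most often combined to select a
single column. The sketch below assumes two small, already sorted files
created in a temporary directory; the file names and contents are
invented for the example.

```shell
export LC_ALL=C            # keep the collating sequence predictable

dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/one"
printf '%s\n' banana cherry date  > "$dir/two"

# Suppress columns 1 and 2: only lines common to both files remain.
common=$(comm -12 "$dir/one" "$dir/two")

# Suppress columns 2 and 3: only lines unique to FILE1 remain.
only_first=$(comm -23 "$dir/one" "$dir/two")

# Suppress columns 1 and 3: only lines unique to FILE2 remain.
only_second=$(comm -13 "$dir/one" "$dir/two")

# With no options, all three columns appear, separated by TABs.
three_columns=$(comm "$dir/one" "$dir/two")

# The exit status is zero even though the files differ.
comm "$dir/one" "$dir/two" > /dev/null
status=$?

rm -r "$dir"
printf '%s\n' "$three_columns"
```

In the three-column output, a line unique to FILE2 is preceded by one
TAB and a line common to both files by two TABs.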
File: coreutils.info, Node: ptx invocation, Next: tsort invocation, Prev: comm invocation, Up: Operating on sorted files
`ptx': Produce permuted indexes
===============================
`ptx' reads a text file and essentially produces a permuted index, with
each keyword in its context. The calling sketch is one of:
ptx [OPTION ...] [FILE ...]
ptx -G [OPTION ...] [INPUT [OUTPUT]]
The `-G' (or its equivalent: `--traditional') option disables all
GNU extensions and reverts to traditional mode, thus introducing some
limitations and changing several of the program's default option values.
When `-G' is not specified, GNU extensions are always enabled. GNU
extensions to `ptx' are documented wherever appropriate in this
document. For the full list, see *Note Compatibility in ptx::.
Individual options are explained in the following sections.
When GNU extensions are enabled, there may be zero, one or several
FILEs after the options. If there is no FILE, the program reads the
standard input. If there are one or more FILEs, they give the names
of input files which are all read in turn, as if all the input files
were concatenated. However, there is a full contextual break between
each file and, when automatic referencing is requested, file names and
line numbers refer to individual text input files. In all cases, the
program outputs the permuted index to the standard output.
When GNU extensions are _not_ enabled, that is, when the program
operates in traditional mode, there may be zero, one or two parameters
besides the options. If there are no parameters, the program reads the
standard input and outputs the permuted index to the standard output.
If there is only one parameter, it names the text INPUT to be read
instead of the standard input. If two parameters are given, they give
respectively the name of the INPUT file to read and the name of the
OUTPUT file to produce. _Be very careful_ to note that, in this case,
the contents of the file given by the second parameter are destroyed. This
behavior is dictated by System V `ptx' compatibility; GNU Standards
normally discourage output parameters not introduced by an option.
Note that for _any_ file named as the value of an option or as an
input text file, a single dash `-' may be used, in which case standard
input is assumed. However, it would not make sense to use this
convention more than once per program invocation.
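A minimal invocation with GNU extensions enabled might look like the
following sketch. The input text is hypothetical, and `-i /dev/null' is
given only so that no Ignore file can hide a keyword; the exact layout
of the output depends on the width and gap settings described later.

```shell
# Hypothetical one-line input.
text='The quick brown fox'

# One output line is produced per keyword occurrence, with the keyword
# near the middle of the line and its context on both sides.
index=$(printf '%s\n' "$text" | ptx -i /dev/null)
printf '%s\n' "$index"

# A file name of `-' names standard input explicitly.
index_dash=$(printf '%s\n' "$text" | ptx -i /dev/null -)
```

Since the input holds four words, the permuted index should hold four
lines, one for each word taken as keyword.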
* Menu:
* General options in ptx:: Options which affect general program behavior.
* Charset selection in ptx:: Underlying character set considerations.
* Input processing in ptx:: Input fields, contexts, and keyword selection.
* Output formatting in ptx:: Types of output format, and sizing the fields.
* Compatibility in ptx::
File: coreutils.info, Node: General options in ptx, Next: Charset selection in ptx, Up: ptx invocation
General options
---------------
`-C'
`--copyright'
Print a short note about the copyright and copying conditions, then
exit without further processing.
`-G'
`--traditional'
As already explained, this option disables all GNU extensions to
`ptx' and switches to traditional mode.
`--help'
Print a short help on standard output, then exit without further
processing.
`--version'
Print the program version on standard output, then exit without
further processing.
An exit status of zero indicates success, and a nonzero value
indicates failure.
File: coreutils.info, Node: Charset selection in ptx, Next: Input processing in ptx, Prev: General options in ptx, Up: ptx invocation
Charset selection
-----------------
As it is set up now, the program assumes that the input file is coded
using 8-bit ISO 8859-1 code, also known as Latin-1 character set,
_unless_ it is compiled for MS-DOS, in which case it uses the character
set of the IBM-PC. (GNU `ptx' is not known to work on smaller MS-DOS
machines anymore.) Compared to 7-bit ASCII, the set of characters
which are letters is different; this alters the behavior of regular
expression matching. Thus, the default regular expression for a
keyword allows foreign or diacriticized letters. Keyword sorting,
however, is still crude; it obeys the underlying character set ordering
quite blindly.
`-f'
`--ignore-case'
Fold lower case letters to upper case for sorting.
File: coreutils.info, Node: Input processing in ptx, Next: Output formatting in ptx, Prev: Charset selection in ptx, Up: ptx invocation
Word selection and input processing
-----------------------------------
`-b FILE'
`--break-file=FILE'
This option provides an alternative (to `-W') method of describing
which characters make up words. It introduces the name of a file
which contains a list of characters which can_not_ be part of one
word; this file is called the "Break file". Any character which
is not part of the Break file is a word constituent. If both
options `-b' and `-W' are specified, then `-W' has precedence and
`-b' is ignored.
When GNU extensions are enabled, the only way to avoid newline as a
break character is to write all the break characters in the file
with no newline at all, not even at the end of the file. When GNU
extensions are disabled, spaces, tabs and newlines are always
considered as break characters even if not included in the Break
file.
`-i FILE'
`--ignore-file=FILE'
The file associated with this option contains a list of words
which will never be taken as keywords in concordance output. It
is called the "Ignore file". The file contains exactly one word
in each line; the end of line separation of words is not subject
to the value of the `-S' option.
There is a default Ignore file used by `ptx' when this option is
not specified, usually found in `/usr/local/lib/eign' if this has
not been changed at installation time. If you want to deactivate
the default Ignore file, specify `/dev/null' instead.
`-o FILE'
`--only-file=FILE'
The file associated with this option contains a list of words
which will be retained in concordance output; any word not
mentioned in this file is ignored. The file is called the "Only
file". The file contains exactly one word in each line; the end
of line separation of words is not subject to the value of the
`-S' option.
There is no default for the Only file. When both an Only file and
an Ignore file are specified, a word is considered a keyword only
if it is listed in the Only file and not in the Ignore file.
`-r'
`--references'
On each input line, the leading sequence of non-white space
characters will be taken to be a reference that has the purpose of
identifying this input line in the resulting permuted index. For
more information about reference production, see *Note Output
formatting in ptx::. Using this option changes the default value
for option `-S'.
Using this option, the program does not try very hard to remove
references from contexts in output, but it succeeds in doing so
_when_ the context ends exactly at the newline. If option `-r' is
used with `-S' default value, or when GNU extensions are disabled,
this condition is always met and references are completely
excluded from the output contexts.
`-S REGEXP'
`--sentence-regexp=REGEXP'
This option selects which regular expression will describe the end
of a line or the end of a sentence. In fact, this regular
expression is not the only distinction between end of lines or end
of sentences, and input line boundaries have no special
significance outside this option. By default, when GNU extensions
are enabled and if `-r' option is not used, end of sentences are
used. In this case, the default REGEXP is imported from GNU Emacs:
[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*
Whenever GNU extensions are disabled or if `-r' option is used, end
of lines are used; in this case, the default REGEXP is just:
\n
Using an empty REGEXP is equivalent to completely disabling end of
line or end of sentence recognition. In this case, the whole file
is considered to be a single big line or sentence. The user might
want to disallow all truncation flag generation as well, through
option `-F ""'. *Note Syntax of Regular Expressions:
(emacs)Regexps.
When the keywords happen to be near the beginning of the input
line or sentence, this often creates an unused area at the
beginning of the output context line; when the keywords happen to
be near the end of the input line or sentence, this often creates
an unused area at the end of the output context line. The program
tries to fill those unused areas by wrapping around context in
them; the tail of the input line or sentence is used to fill the
unused area on the left of the output line; the head of the input
line or sentence is used to fill the unused area on the right of
the output line.
As a matter of convenience to the user, many usual backslashed
escape sequences from the C language are recognized and converted
to the corresponding characters by `ptx' itself.
`-W REGEXP'
`--word-regexp=REGEXP'
This option selects which regular expression will describe each
keyword. By default, if GNU extensions are enabled, a word is a
sequence of letters; the REGEXP used is `\w+'. When GNU
extensions are disabled, a word is by default anything which ends
with a space, a tab or a newline; the REGEXP used is `[^ \t\n]+'.
An empty REGEXP is equivalent to not using this option. *Note
Syntax of Regular Expressions: (emacs)Regexps.
As a matter of convenience to the user, many usual backslashed
escape sequences, as found in the C language, are recognized and
converted to the corresponding characters by `ptx' itself.
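The interplay of the Ignore file and the Only file can be tried out as
follows. This is a sketch under assumptions: the input sentence, the
word lists and the file names are all invented, and the line counts in
the comments rely on each keyword occurrence producing exactly one
output line.

```shell
dir=$(mktemp -d)
printf '%s\n' 'the cat sat on a mat' > "$dir/input"

# Hypothetical word lists, one word per line.
printf '%s\n' the on a    > "$dir/ignore"
printf '%s\n' cat mat the > "$dir/only"

# Ignore file only: `cat', `sat' and `mat' remain as keywords.
with_ignore=$(ptx -i "$dir/ignore" "$dir/input")

# Only file only (default Ignore file deactivated): `cat', `mat', `the'.
with_only=$(ptx -i /dev/null -o "$dir/only" "$dir/input")

# Both: a keyword must be in the Only file and not in the Ignore file,
# which leaves `cat' and `mat'.
with_both=$(ptx -i "$dir/ignore" -o "$dir/only" "$dir/input")

rm -r "$dir"
printf '%s\n' "$with_both"
```

Ignored words still appear in the context of other keywords; they are
merely never placed in the keyword position.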
File: coreutils.info, Node: Output formatting in ptx, Next: Compatibility in ptx, Prev: Input processing in ptx, Up: ptx invocation
Output formatting
-----------------
Output format is mainly controlled by the `-O' and `-T' options
described in the table below. When neither `-O' nor `-T' is selected,
and if GNU extensions are enabled, the program chooses an output format
suitable for a dumb terminal. Each keyword occurrence is output to the
center of one line, surrounded by its left and right contexts. Each
field is properly justified, so the concordance output can be readily
observed. As a special feature, if automatic references are selected
by option `-A' and are output before the left context, that is, if
option `-R' is _not_ selected, then a colon is added after the
reference; this nicely interfaces with GNU Emacs `next-error'
processing. In this default output format, each white space character,
like newline and tab, is merely changed to exactly one space, with no
special attempt to compress consecutive spaces. This might change in
the future. Except for those white space characters, every other
character of the underlying set of 256 characters is transmitted
verbatim.
Output format is further controlled by the following options.
`-g NUMBER'
`--gap-size=NUMBER'
Select the size of the minimum white space gap between the fields
on the output line.
`-w NUMBER'
`--width=NUMBER'
Select the maximum output width of each final line. If references
are used, they are included or excluded from the maximum output
width depending on the value of option `-R'. If this option is not
selected, that is, when references are output before the left
context, the maximum output width takes into account the maximum
length of all references. If this option is selected, that is,
when references are output after the right context, the maximum
output width does not take into account the space taken by
references, nor the gap that precedes them.
`-A'
`--auto-reference'
Select automatic references. Each input line will have an
automatic reference made up of the file name and the line ordinal,
with a single colon between them. However, the file name will be
empty when standard input is being read. If both `-A' and `-r'
are selected, then the input reference is still read and skipped,
but the automatic reference is used at output time, overriding the
input reference.
`-R'
`--right-side-refs'
In the default output format, when option `-R' is not used, any
references produced by the effect of options `-r' or `-A' are
placed at the beginning of each output line, before the left
context. With default output format, when the `-R' option is
specified, references are instead placed to the far right of output
lines, after the right context. For any other output format, option `-R'
is ignored, with one exception: with `-R' the width of references
is _not_ taken into account in total output width given by `-w'.
This option is automatically selected whenever GNU extensions are
disabled.
`-F STRING'
`--flag-truncation=STRING'
This option will request that any truncation in the output be
reported using the string STRING. Most output fields
theoretically extend towards the beginning or the end of the
current line, or current sentence, as selected with option `-S'.
But there is a maximum allowed output line width, changeable
through option `-w', which is further divided into space for
various output fields. When a field has to be truncated because
it cannot extend beyond the beginning or the end of the current
line to fit in, then a truncation occurs. By default, the string
used is a single slash, as in `-F /'.
STRING may have more than one character, as in `-F ...'. Also, in
the particular case when STRING is empty (`-F ""'), truncation
flagging is disabled, and no truncation marks are appended in this
case.
As a matter of convenience to the user, many usual backslashed
escape sequences, as found in the C language, are recognized and
converted to the corresponding characters by `ptx' itself.
`-M STRING'
`--macro-name=STRING'
Select another STRING to be used instead of `xx', while generating
output suitable for `nroff', `troff' or TeX.
`-O'
`--format=roff'
Choose an output format suitable for `nroff' or `troff'
processing. Each output line will look like:
.xx "TAIL" "BEFORE" "KEYWORD_AND_AFTER" "HEAD" "REF"
so it will be possible to write a `.xx' roff macro to take care of
the output typesetting. This is the default output format when GNU
extensions are disabled. Option `-M' can be used to change `xx'
to another macro name.
In this output format, each non-graphical character, like newline
and tab, is merely changed to exactly one space, with no special
attempt to compress consecutive spaces. Each quote character: `"'
is doubled so it will be correctly processed by `nroff' or `troff'.
`-T'
`--format=tex'
Choose an output format suitable for TeX processing. Each output
line will look like:
\xx {TAIL}{BEFORE}{KEYWORD}{AFTER}{HEAD}{REF}
so it will be possible to write a `\xx' definition to take care of
the output typesetting. Note that when references are not being
produced, that is, neither option `-A' nor option `-r' is
selected, the last parameter of each `\xx' call is inhibited.
Option `-M' can be used to change `xx' to another macro name.
In this output format, some special characters, like `$', `%',
`&', `#' and `_' are automatically protected with a backslash.
Curly brackets `{', `}' are protected with a backslash and a pair
of dollar signs (to force mathematical mode). The backslash
itself produces the sequence `\backslash{}'. Circumflex and tilde
diacritical marks produce the sequence `^\{ }' and `~\{ }'
respectively. Other diacriticized characters of the underlying
character set produce an appropriate TeX sequence as far as
possible. The other non-graphical characters, like newline and
tab, and all other characters which are not part of ASCII, are
merely changed to exactly one space, with no special attempt to
compress consecutive spaces. Let me know how to improve this
special character processing for TeX.
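The output formats above can be compared side by side with a sketch
such as this one. The input file, its contents and the macro name `idx'
are assumptions made for the example; only the shape of the first few
characters of each line is relied upon in the comments.

```shell
dir=$(mktemp -d)
printf '%s\n' 'hello world' > "$dir/input"

# roff format: each line is a call to the `.xx' macro.
roff=$(ptx -i /dev/null -O "$dir/input")

# TeX format: each line is a call to `\xx'.
tex=$(ptx -i /dev/null -T "$dir/input")

# -M replaces `xx' by another macro name, here the made-up `idx'.
custom=$(ptx -i /dev/null -O -M idx "$dir/input")

# Automatic references (file name and line number), first in their
# default position, then moved to the right side with -R.
refs_left=$(cd "$dir" && ptx -i /dev/null -A input)
refs_right=$(cd "$dir" && ptx -i /dev/null -A -R input)

rm -r "$dir"
printf '%s\n' "$roff" "$tex" "$custom" "$refs_left" "$refs_right"
```

A `.xx' or `\xx' definition must still be supplied by the document that
includes this output; `ptx' only generates the calls.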
File: coreutils.info, Node: Compatibility in ptx, Prev: Output formatting in ptx, Up: ptx invocation
The GNU extensions to `ptx'
---------------------------
This version of `ptx' contains a few features which do not exist in
System V `ptx'. These extra features are suppressed by using the `-G'
command line option, unless overridden by other command line options.
Some GNU extensions cannot be recovered by overriding, so the simple
rule is to avoid `-G' if you care about GNU extensions. Here are the
differences between this program and System V `ptx'.
* This program can read many input files at once, and it always writes
the resulting concordance on standard output. On the other hand,
System V `ptx' reads only one file and sends the result to
standard output or, if a second FILE parameter is given on the
command, to that FILE.
Having output parameters not introduced by options is a dangerous
practice which GNU avoids as far as possible. So, for using `ptx'
portably between GNU and System V, you should always use it with a
single input file, and always expect the result on standard
output. You might also want to automatically configure in a `-G'
option to `ptx' calls in products using `ptx', if the configurator
finds that the installed `ptx' accepts `-G'.
* The only options available in System V `ptx' are options `-b',
`-f', `-g', `-i', `-o', `-r', `-t' and `-w'. All other options
are GNU extensions and are not repeated in this enumeration.
Moreover, some options have a slightly different meaning when GNU
extensions are enabled, as explained below.
* By default, concordance output is not formatted for `troff' or
`nroff'. It is rather formatted for a dumb terminal. `troff' or
`nroff' output may still be selected through option `-O'.
* Unless `-R' option is used, the maximum reference width is
subtracted from the total output line width. With GNU extensions
disabled, width of references is not taken into account in the
output line width computations.
* All 256 characters, even `NUL's, are always read and processed from
input file with no adverse effect, even if GNU extensions are
disabled. However, System V `ptx' does not accept 8-bit
characters, a few control characters are rejected, and the tilde
`~' is also rejected.
* Input line length is only limited by available memory, even if GNU
extensions are disabled. However, System V `ptx' processes only
the first 200 characters in each line.
* The break (non-word) characters default to every character
except the letters of the underlying character set, diacriticized
or not. When GNU extensions are disabled, the break characters
default to space, tab and newline only.
* The program makes better use of output line width. If GNU
extensions are disabled, the program rather tries to imitate
System V `ptx', but still, there are some slight disposition
glitches this program does not completely reproduce.
* The user can specify both an Ignore file and an Only file. This
is not allowed with System V `ptx'.
File: coreutils.info, Node: tsort invocation, Next: tsort background, Prev: ptx invocation, Up: Operating on sorted files
`tsort': Topological sort
=========================
`tsort' performs a topological sort on the given FILE, or standard
input if no input file is given or for a FILE of `-'. For more details
and some history, see *Note tsort background::. Synopsis:
tsort [OPTION] [FILE]
`tsort' reads its input as pairs of strings, separated by blanks,
indicating a partial ordering. The output is a total ordering that
corresponds to the given partial ordering.
For example
tsort <<EOF
a b c
d
e f
b c d e
EOF
will produce the output
a
b
c
d
e
f
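Because the input is read as a stream of pairs, line breaks have no
special meaning. As a sketch, the same relation can be written with one
pair per line, which may be easier to read; the variable names are
chosen for the example only.

```shell
# The same partial ordering as above, one pair per line:
# a before b, c before d, e before f, b before c, d before e.
pairs='a b
c d
e f
b c
d e'

order=$(printf '%s\n' "$pairs" | tsort)
printf '%s\n' "$order"
```

These pairs chain all six strings together, so there is only one total
ordering consistent with them, and the output is the same as before.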
Consider a more realistic example. You have a large set of
functions all in one file, and they may all be declared static except
one. Currently that one (say `main') is the first function defined in
the file, and the ones it calls directly follow it, followed by those
they call, etc. Let's say that you are determined to take advantage of
prototypes, so you have to choose between declaring all of those
functions (which means duplicating a lot of information from the
definitions) and rearranging the functions so that as many as possible
are defined before they are used. One way to automate the latter
process is to get a list for each function of the functions it calls
directly. Many programs can generate such lists. They describe a call
graph. Consider the following list, in which a given line indicates
that the function on the left calls the one on the right directly.
main parse_options
main tail_file
main tail_forever
tail_file pretty_name
tail_file write_header
tail_file tail
tail_forever recheck
tail_forever pretty_name
tail_forever write_header
tail_forever dump_remainder
tail tail_lines
tail tail_bytes
tail_lines start_lines
tail_lines dump_remainder
tail_lines file_lines
tail_lines pipe_lines
tail_bytes xlseek
tail_bytes start_