| SORT(1) | General Commands Manual | SORT(1) |
sort - sort, merge, or sequence check text and binary files
sort [-bCcdfgHhiMmnRrsuVz] [-k field1[,field2]] [-o output] [-S size]
     [-T dir] [-t char] [file ...]
The sort utility sorts the lines of text
or binary files. A line is a record separated from the subsequent record by
a newline (default) or NUL ‘\0’
character (-z option). A record can contain any
printable or unprintable characters. Comparisons are based on one or more
sort keys extracted from each line according to the specified command line
options. By default, sort uses entire lines for
comparison and sorts in ascii(7)
order.
If no file is specified, or if file is ‘-’, the standard input is used.
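As a small illustration of these defaults, the sketch below feeds three made-up lines to sort on standard input, once implicitly and once by naming ‘-’ as the file. LC_ALL=C is set explicitly only so that other sort implementations compare bytes in the same way; the variable names are arbitrary.

```shell
# Hypothetical sample input: three one-letter lines.
# Whole lines are compared in ascii(7) order, so "B" sorts before "a".
sorted=$(printf 'b\nB\na\n' | LC_ALL=C sort)

# Naming "-" as the file reads standard input as well.
sorted_dash=$(printf 'b\nB\na\n' | LC_ALL=C sort -)

printf '%s\n' "$sorted"
```

Both invocations should produce the lines in the order B, a, b.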
The options are as follows:
-C, --check=silent|quiet
-c, --check
    Like -C, but additionally write a message to stderr if the input
    file is not sorted.
-m, --merge
-o output, --output=output
-S size, --buffer-size=size
    By default, sort may use up to about 90% of available memory. If the
    input is too big to fit into the memory buffer, temporary files are
    used.
-s
-T dir, --temporary-directory=dir
    The default is the value of TMPDIR, or /tmp if TMPDIR is not defined.
-u, --unique
    Implies -s. If used with -C or -c, sort also checks that there are
    no lines with duplicate keys.

The following options override the default ordering rules. If
ordering options appear before the first -k option,
they apply globally to all sort keys. When attached to a specific key (see
-k), the ordering options override all global
ordering options for that key. Note that the ordering options intended to
apply globally should not appear after -k or results
may be unexpected.
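The interaction between global and per-key ordering options can be sketched as follows. The data and variable names are invented for this example; the options themselves are the ones described above.

```shell
# Hypothetical data: a number and a letter on each line.
data='2 a
10 b
2 c'

# -n appears before the first -k, so it applies globally and key 1
# inherits it. Key 2 carries its own modifier (r), which overrides the
# global ordering options for that key only.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k1,1 -k2,2r)

printf '%s\n' "$result"
```

Key 1 orders the lines numerically (2, 2, 10), and among the two lines that tie on key 1 the second key places "c" before "a".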
-d, --dictionary-order
-f, --ignore-case
-g, --general-numeric-sort, --sort=general-numeric
    Unlike -n, this option handles general floating point numbers. It
    has a more permissive format than that allowed by -n but it has a
    significant performance drawback.
-h, --human-numeric-sort, --sort=human-numeric
    Intended for sorting output produced with the -h or -H options
    (human-readable).
-i, --ignore-nonprinting
-M, --month-sort, --sort=month
-n, --numeric-sort, --sort=numeric
-R, --random-sort, --sort=random
    See --random-source. If multiple sort fields are specified, the same
    random hash function is used for all of them.
-r, --reverse
-V, --version-sort
    When comparing two strings, both strings are split into substrings
    such that the first and every other odd-numbered substring consists
    of non-digit characters only, while every even-numbered substring
    consists of digits only. These substrings are compared in turn from
    left to right until a difference is found. The first substring can
    be empty; all others cannot.

    Non-digit substrings are compared alphabetically, with upper case
    letters sorting before lower case letters, letters sorting before
    non-letters, and non-letters sorting in ascii(7) order. Substrings
    consisting of digits are compared as integer numbers.

    At the end of each string, zero or more suffixes that start with a
    dot, consist only of letters, digits, and tilde characters, and do
    not start with a digit are ignored, equivalent to the regular
    expression "(\.([A-Za-z~][A-Za-z0-9~]*)?)*". This is intended for
    ignoring filename suffixes such as “.tar.bz2”.

    In the following example, the first substring is "sort-" and the
    other odd-numbered substrings are all ".":

        $ ls sort* | sort -V
        sort-1.022.tgz
        sort-1.23.tgz
        sort-1.23.1.tgz
        sort-1.024.tgz
        sort-1.024.003.
        sort-1.024.003.tgz
        sort-1.024.07.tgz
        sort-1.024.009.tgz
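To show how the ordering options above change the result, the following sketch sorts the same three made-up lines as text, numerically with -n, and numerically in reverse with -nr. The variable names are arbitrary.

```shell
# Hypothetical input: three integers, one per line.
data='10
9
2'

# Default ordering: lines are compared as text, so "10" precedes "2".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort)

# -n compares by numeric value.
as_numbers=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

# -r reverses the sense of the comparison.
reversed=$(printf '%s\n' "$data" | LC_ALL=C sort -nr)

printf '%s\n' "$as_text" "$as_numbers" "$reversed"
```

The text ordering is 10, 2, 9; the numeric ordering is 2, 9, 10; the reversed numeric ordering is 10, 9, 2.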
The treatment of field separators can be altered using these options:
-b, --ignore-leading-blanks
    Ignore leading blanks when determining the start and end of a
    restricted sort key (see -k). If -b is specified before the first
    -k option, it applies globally to all key specifications. Otherwise,
    -b can be attached independently to each field argument of the key
    specifications. Note that -b should not appear after -k, and that it
    has no effect unless key fields are specified.
-k field1[,field2], --key=field1[,field2]
    Define a restricted sort key that starts at field1 and, if given,
    ends at field2. The -k option may be specified multiple times, in
    which case subsequent keys are compared after earlier keys compare
    equal. The -k option replaces the obsolete options +pos1 and -pos2,
    but the old notation is also supported.
-t char, --field-separator=char
    Use char as the field separator. If -t is not specified, the default
    field separator is a sequence of blank-space characters, and
    consecutive blank spaces do not delimit an empty field; further, the
    initial blank space is considered part of a field when determining
    key offsets. To use NUL as field separator, use -t '\0'.
-z, --zero-terminated
    NUL (‘\0’) is used as the record
    separator character.

Other options:
--batch-size=num
    The maximum number of files that may be opened by sort at once. This
    option affects behavior when having many input files or using
    temporary files. The minimum value is 2. The default value is 16.
--compress-program=program
    Use program to compress temporary files. When called with the -d
    option, it must decompress standard input to standard output. If
    program fails, sort will exit with an error. The compress(1) and
    gzip(1) utilities meet these requirements.
--debug
--files0-from=filename
--heapsort
    Cannot be used with -u and -s.
--help
-H, --mergesort
--mmap
--qsort
    Cannot be used with -u and -s.
--radixsort
--random-source=filename
--version

A field is defined as a maximal sequence of characters other than
the field separator and record separator (newline by default). Initial blank
spaces are included in the field unless -b has been
specified; the first blank space of a sequence of blank spaces acts as the
field separator and is included in the field (unless
-t is specified). For example, by default all blank
spaces at the beginning of a line are considered to be part of the first
field.
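The effect of leading blanks on a key can be seen in a short sketch. The two input lines are invented for the example; the first has two blanks between its fields, the second has one.

```shell
# Hypothetical data: two blanks before "b", one blank before "a".
data='x  b
x a'

# Without -b the blanks belong to field 2, so the key "  b" sorts
# before " a" (a blank sorts before any letter in ascii(7) order).
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# With the b modifier the key starts at the first non-blank character,
# so the keys compared are "b" and "a".
trimmed=$(printf '%s\n' "$data" | LC_ALL=C sort -k2b)

printf '%s\n' "$plain" "$trimmed"
```

Without the modifier the "x  b" line comes first; with it the "x a" line does.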
Fields are specified by the -k
field1[,field2] option. If
field2 is missing, the end of the key defaults to the
end of the line.
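A minimal sketch of key selection with an explicit separator follows. The colon-separated records (name, shell, numeric id) are made up for illustration, as are the variable names.

```shell
# Hypothetical colon-separated records: name:shell:id.
data='carol:/bin/ksh:1002
alice:/bin/sh:1000
bob:/bin/ksh:30'

# -t ':' makes every colon a field separator. -k3,3n restricts the key
# to field 3 and compares it numerically.
by_id=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k3,3n)

# -k2 omits field2, so the key runs from field 2 to the end of the
# line and the text after the shell takes part in the comparison.
by_shell=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2)

printf '%s\n' "$by_id" "$by_shell"
```

In the second command "carol" precedes "bob" because the key "/bin/ksh:1002" sorts before "/bin/ksh:30" as text. Writing -k2,2 instead would end the key at the shell field.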
The arguments field1 and
field2 have the form m.n
(m,n > 0) and can
be followed by one or more of the modifiers b,
d, f,
i, n,
g, M and
r, which correspond to the options discussed above.
When b is specified, it applies only to
the field1 or field2 it is attached to,
while the other modifiers apply to the whole key field
regardless of whether they are specified with field1,
field2, or both. A field1
position specified by m.n is interpreted as the
nth character from the beginning of the
mth field. A missing .n in
field1 means
‘.1’, indicating the first character
of the mth field; if the -b option
is in effect, n is counted from the first non-blank
character in the mth field; m.1b refers
to the first non-blank character in the mth field.
1.n refers to the
nth character from the beginning of the line; if
n is greater than the length of the line, the field is
taken to be empty.
nth positions are always counted from the field beginning, even if the field is shorter than the number of specified positions. Thus, the key can really start from a position in a subsequent field.
A field2 position specified by
m.n is interpreted as the nth character
(including separators) from the beginning of the mth
field. A missing .n indicates the last character of the
mth field; m = 0 designates the end of a
line. Thus the option -k
v.x,w.y is synonymous with the obsolete option
+v-1.x-1
-w-1.y; when
y is omitted,
-k v.x,w is synonymous with
+v-1.x-1
-w.0. The obsolete
+pos1
-pos2 option is still
supported, except for -w.0b,
which has no -k equivalent.
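As a sketch of the m.n character offsets described above, the command below sorts on the third and fourth characters of the first field only. The records are invented for the example.

```shell
# Hypothetical records: two letters followed by two digits, then a
# colon and a second field.
data='ab20:x
cd05:y
ef13:z'

# -k1.3,1.4 starts the key at character 3 of field 1 and ends it at
# character 4 of field 1, giving the keys "20", "05" and "13".
by_digits=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k1.3,1.4)

printf '%s\n' "$by_digits"
```

The letters at the start of each line play no part in the comparison, so the result is ordered 05, 13, 20.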
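TMPDIR
    Path in which to store temporary files. TMPDIR may be overridden by
    the -T option.

The sort utility exits with one of the following values:
0
    The input was sorted successfully or, if used with -C or -c, the
    input file already met the sorting criteria.
1
    Disorder was found with the -C or -c options.
2
    An error occurred.

The sort utility is compliant with the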
IEEE Std 1003.1-2008 (“POSIX.1”)
specification, except that it ignores the user's
locale(1) and always assumes
LC_ALL=C.
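The exit values that -C reports, as listed earlier, can be captured from a script roughly as follows. The inputs and variable names are made up for this sketch.

```shell
# -C checks silently; only the exit value reports the result.
# "|| var=$?" records a non-zero exit value without aborting scripts
# that run under "set -e".
sorted_status=0
printf '1\n2\n3\n' | LC_ALL=C sort -C || sorted_status=$?

unsorted_status=0
printf '3\n1\n2\n' | LC_ALL=C sort -C || unsorted_status=$?

# Combined with -u, lines with duplicate keys also fail the check.
duplicate_status=0
printf '1\n1\n2\n' | LC_ALL=C sort -C -u || duplicate_status=$?

echo "$sorted_status $unsorted_status $duplicate_status"
```

The first check should succeed with 0, while the other two should report 1.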
The flags [-gHhiMRSsTVz] are extensions to
that specification.
All long options are extensions to the specification. Some are
provided for compatibility with GNU sort, others are
specific to this implementation.
Some implementations of sort honor the
-b option even when no key fields are specified.
This implementation follows historic practice and IEEE Std
1003.1-2008 (“POSIX.1”) in only honoring
-b when it precedes a key field.
The historic practice of allowing the -o
option to appear after the file is supported for
compatibility with older versions of sort.
The historic key notations
+pos1 and
-pos2 are supported for
compatibility with older versions of sort but their
use is highly discouraged.
A sort command appeared in
Version 1 AT&T UNIX.
Gabor Kovesdan
<gabor@FreeBSD.org>
Oleg Moskalenko
<mom040267@gmail.com>
This implementation of sort has no limits
on input line length (other than imposed by available memory) or any
restrictions on bytes allowed within lines.
The performance depends highly on efficient choice of sort keys
and key complexity. The fastest sort is on whole lines, with option
-s. For key specifications, the simpler the lines are to
process, the faster the sort will be.
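For readers unfamiliar with -s: it requests a stable sort, meaning that lines whose keys compare equal keep their input order. A minimal sketch with invented data:

```shell
# Hypothetical data: the key is field 1; field 2 records input order.
data='b 2
a 2
b 1
a 1'

# With -s, lines that tie on key 1 stay in the order they were read.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k1,1)

printf '%s\n' "$stable"
```

Within each group the input order is kept, so "a 2" stays ahead of "a 1" and "b 2" ahead of "b 1".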
When sorting by arithmetic value, using -n
results in much better performance than -g so its
use is encouraged whenever possible.
| April 1, 2025 | openbsd |