Linux "gawk" Command Line Options and Examples
pattern scanning and processing language

Gawk is the GNU Project's implementation of the AWK programming language. It conforms to the definition of the language in the POSIX 1003.1 Standard.


Usage:

gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

Command Line Options:

-f program-file, --file program-file
Read the AWK program source from the file program-file, instead of from the first command line argument. Multiple -f (or --file) options may be used.
gawk --file program-file ...
-F fs, --field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
gawk --field-separator fs ...
-v var=val, --assign var=val
Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN rule of an AWK program.
gawk --assign var=val ...
-b, --characters-as-bytes
Treat all input data as single-byte characters. In other words, don't pay any attention to the locale information when attempting to process strings as multibyte characters. The --posix option overrides this one.
gawk --characters-as-bytes ...
-c, --traditional
Run in compatibility mode. In compatibility mode, gawk behaves identically to Brian Kernighan's awk; none of the GNU-specific extensions are recognized. See the GNU EXTENSIONS section of the gawk manual page for more information.
gawk --traditional ...
-C, --copyright
Print the short version of the GNU copyright information message on the standard output and exit successfully.
gawk --copyright ...
-d[file], --dump-variables[=file]
Print a sorted list of global variables, their types and final values to file. If no file is provided, gawk uses a file named awkvars.out in the current directory. Having a list of all the global variables is a good way to look for typographical errors in your programs. You would also use this option if you have a large program with a lot of functions, and you want to be sure that your functions don't inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like i, j, and so on.)
gawk --dump-variables[=file] ...
-D[file], --debug[=file]
Enable debugging of AWK programs. By default, the debugger reads commands interactively from the keyboard (standard input). The optional file argument specifies a file with a list of commands for the debugger to execute non-interactively.
gawk --debug[=file] ...
-e program-text, --source program-text
Use program-text as AWK program source code. This option allows the easy intermixing of library functions (used via the -f and --file options) with source code entered on the command line. It is intended primarily for medium to large AWK programs used in shell scripts.
gawk --source program-text ...
-E file, --exec file
Similar to -f, however, this option is the last one processed. This should be used with #! scripts, particularly for CGI applications, to avoid passing in options or source code (!) on the command line from a URL. This option disables command-line variable assignments.
gawk --exec file ...
-g, --gen-pot
Scan and parse the AWK program, and generate a GNU .pot (Portable Object Template) format file on standard output with entries for all localizable strings in the program. The program itself is not executed. See the GNU gettext distribution for more information on .pot files.
gawk --gen-pot ...
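This sketch extracts a localizable string. It assumes gawk is on the PATH and uses gawk's _"..." notation for marking a string as translatable. The text domain name "demo" is hypothetical. Because the program is only parsed, the string is not printed as program output; it appears only as a msgid entry.

```shell
# _"..." marks a string as translatable; the program is parsed, not run.
out=$(gawk --gen-pot 'BEGIN { TEXTDOMAIN = "demo"; print _"Hello, world" }')
printf '%s\n' "$out"
```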
-i include-file, --include include-file
Load an awk source library. This searches for the library using the AWKPATH environment variable. If the initial search fails, another attempt will be made after appending the .awk suffix. The file will be loaded only once (i.e., duplicates are eliminated), and the code does not constitute the main program source.
gawk --include include-file ...
-l lib, --load lib
Load a shared library lib. This searches for the library using the AWKLIBPATH environment variable. If the initial search fails, another attempt will be made after appending the default shared library suffix for the platform. The library initialization routine is expected to be named dl_load().
gawk --load lib ...
-L [value], --lint[=value]
Provide warnings about constructs that are dubious or non-portable to other AWK implementations. With an optional argument of fatal, lint warnings become fatal errors. This may be drastic, but its use will certainly encourage the development of cleaner AWK programs. With an optional argument of invalid, only warnings about things that are actually invalid are issued. (This is not fully implemented yet.)
gawk --lint[=value] ...
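This sketch contrasts plain lint warnings with fatal ones. It assumes gawk is on the PATH and that referencing a never-assigned variable triggers a lint warning.

- Plain --lint only warns, so the exit status stays successful.
- With --lint=fatal the same warning stops the run with a failing status.

```shell
# Referencing a variable that was never assigned: a lint warning only.
gawk --lint 'BEGIN { print never_set }' > /dev/null 2>&1
warn_status=$?

# With --lint=fatal the same warning becomes a fatal error.
gawk --lint=fatal 'BEGIN { print never_set }' > /dev/null 2>&1
fatal_status=$?

echo "plain --lint: $warn_status, --lint=fatal: $fatal_status"
```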
-M, --bignum
Force arbitrary precision arithmetic on numbers. This option has no effect if gawk is not compiled to use the GNU MPFR and MP libraries.
gawk --bignum ...
-n, --non-decimal-data
Recognize octal and hexadecimal values in input data. Use this option with great caution!
gawk --non-decimal-data ...
-N, --use-lc-numeric
This forces gawk to use the locale's decimal point character when parsing input data. Although the POSIX standard requires this behavior, and gawk does so when --posix is in effect, the default is to follow traditional behavior and use a period as the decimal point, even in locales where the period is not the decimal point character. This option overrides the default behavior, without the full draconian strictness of the --posix option.
gawk --use-lc-numeric ...
-o[file], --pretty-print[=file]
Output a pretty printed version of the program to file. If no file is provided, gawk uses a file named awkprof.out in the current directory.
gawk --pretty-print[=file] ...
-O, --optimize
Enable optimizations upon the internal representation of the program. Currently, this includes simple constant-folding, and tail call elimination for recursive functions. The gawk maintainer hopes to add additional optimizations over time.
gawk --optimize ...
-p[prof-file], --profile[=prof-file]
Start a profiling session, and send the profiling data to prof-file. The default is awkprof.out. The profile contains execution counts of each statement in the program in the left margin and function call counts for each user-defined function.
gawk --profile[=prof-file] ...
-P, --posix
This turns on compatibility mode, with the following additional restrictions:
· \x escape sequences are not recognized.
· Only space and tab act as field separators when FS is set to a single space; newline does not.
· You cannot continue lines after ? and :.
· The synonym func for the keyword function is not recognized.
· The operators ** and **= cannot be used in place of ^ and ^=.
gawk --posix ...
-r, --re-interval
Enable the use of interval expressions in regular expression matching (see Regular Expressions, below). Interval expressions were not traditionally available in the AWK language. The POSIX standard added them, to make awk and egrep consistent with each other. They are enabled by default, but this option remains for use with --traditional.
gawk --re-interval ...
-S, --sandbox
Runs gawk in sandbox mode, disabling the system() function, input redirection with getline, output redirection with print and printf, and loading dynamic extensions. Command execution (through pipelines) is also disabled. This effectively blocks a script from accessing local resources (except for the files specified on the command line).
gawk --sandbox ...
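This sketch checks that sandbox mode refuses system(). It assumes gawk is on the PATH. The echo command never runs, so nothing reaches standard output, and gawk exits with a failing status.

```shell
# system() is disabled in sandbox mode, so the echo never runs.
out=$(gawk --sandbox 'BEGIN { system("echo should-not-run") }' 2>/dev/null)
status=$?

printf 'stdout: [%s], exit status: %s\n' "$out" "$status"
```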
-t, --lint-old
Provide warnings about constructs that are not portable to the original version of UNIX awk.
gawk --lint-old ...
-V, --version
Print version information for this particular copy of gawk on the standard output. This is useful mainly for knowing if the current copy of gawk on your system is up to date with respect to whatever the Free Software Foundation is distributing. This is also useful when reporting bugs. (Per the GNU Coding Standards, this option causes an immediate, successful exit.)
gawk --version ...
Built-in Variables:

NF          The number of fields in the current input record.
NR          The total number of input records seen so far.
OFMT        The output format for numbers, "%.6g", by default.
OFS         The output field separator, a space by default.
ORS         The output record separator, by default a newline.
PREC        The working precision of arbitrary precision floating-point numbers, 53 by default.
PROCINFO    The elements of this array provide access to information about the running AWK program. On some systems, there may be elements in the array, "group1" through "groupn" for some n, which is the number of supplementary groups that the process has. Use the in operator to test for these elements. The following elements are guaranteed to be available:
            PROCINFO["egid"]         The value of the getegid(2) system call.
            PROCINFO["euid"]         The value of the geteuid(2) system call.
            PROCINFO["FS"]           "FS" if field splitting with FS is in effect, "FPAT" if field splitting with FPAT is in effect, or "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect.
            PROCINFO["gid"]          The value of the getgid(2) system call.
            PROCINFO["identifiers"]  A subarray, indexed by the names of all identifiers used in the text of the AWK program. The values indicate what gawk knows about the identifiers after it has finished parsing the program; they are not updated while the program runs.
            For each identifier, the value of the element is one of the following:
                "array"      The identifier is an array.
                "builtin"    The identifier is a built-in function.
                "extension"  The identifier is an extension function loaded via @load or -l.
                "scalar"     The identifier is a scalar.
                "untyped"    The identifier is untyped (could be used as a scalar or array, gawk doesn't know yet).
                "user"       The identifier is a user-defined function.
            PROCINFO["pgrpid"]    The process group ID of the current process.
            PROCINFO["pid"]       The process ID of the current process.
            PROCINFO["ppid"]      The parent process ID of the current process.
            PROCINFO["strftime"]  The default time format string for strftime().
            PROCINFO["uid"]       The value of the getuid(2) system call.
            PROCINFO["version"]   The version of gawk.
            The following elements are present if loading dynamic extensions is available:
            PROCINFO["api_major"]  The major version of the extension API.
            PROCINFO["api_minor"]  The minor version of the extension API.
            The following elements are available if MPFR support is compiled into gawk:
            PROCINFO["gmp_version"]   The version of the GNU MP library used for arbitrary precision number support in gawk.
            PROCINFO["mpfr_version"]  The version of the GNU MPFR library used for arbitrary precision number support in gawk.
            PROCINFO["prec_max"]      The maximum precision supported by the GNU MPFR library for arbitrary precision floating-point numbers.
            PROCINFO["prec_min"]      The minimum precision allowed by the GNU MPFR library for arbitrary precision floating-point numbers.
            The following elements may be set by a program to change gawk's behavior:
            PROCINFO["command", "pty"]         Use a pseudo-tty for two-way communication with command instead of setting up two one-way pipes.
            PROCINFO["input", "READ_TIMEOUT"]  The timeout in milliseconds for reading data from input, where input is a redirection string or a filename. A value of zero or less than zero means no timeout.
            PROCINFO["sorted_in"]  If this element exists in PROCINFO, then its value controls the order in which array elements are traversed in for loops. Supported values are "@ind_str_asc", "@ind_num_asc", "@val_type_asc", "@val_str_asc", "@val_num_asc", "@ind_str_desc", "@ind_num_desc", "@val_type_desc", "@val_str_desc", "@val_num_desc", and "@unsorted". The value can also be the name of any comparison function defined as follows:
                function cmp_func(i1, v1, i2, v2)
            where i1 and i2 are the indices, and v1 and v2 are the corresponding values of the two elements being compared. It should return a number less than, equal to, or greater than 0, depending on how the elements of the array are to be ordered.
ROUNDMODE   The rounding mode to use for arbitrary precision arithmetic on numbers, by default "N" (IEEE-754 roundTiesToEven mode). The accepted values are "N" or "n" for roundTiesToEven, "U" or "u" for roundTowardPositive, "D" or "d" for roundTowardNegative, "Z" or "z" for roundTowardZero, and if your version of GNU MPFR library supports it, "A" or "a" for roundTiesToAway.
RS          The input record separator, by default a newline.
RT          The record terminator. Gawk sets RT to the input text that matched the character or regular expression specified by RS.
RSTART      The index of the first character matched by match(); 0 if no match. (This implies that character indices start at one.)
RLENGTH     The length of the string matched by match(); -1 if no match.
SUBSEP      The character used to separate multiple subscripts in array elements, by default "\034".
SYMTAB      An array whose indices are the names of all currently defined global variables and arrays in the program. The array may be used for indirect access to read or write the value of a variable:
                foo = 5
                SYMTAB["foo"] = 4
                print foo    # prints 4
            The isarray() function may be used to test if an element in SYMTAB is an array.
            You may not use the delete statement with the SYMTAB array.
TEXTDOMAIN  The text domain of the AWK program; used to find the localized translations for the program's strings.

Arrays:

Arrays are subscripted with an expression between square brackets ([ and ]). If the expression is an expression list (expr, expr ...) then the array subscript is a string consisting of the concatenation of the (string) value of each expression, separated by the value of the SUBSEP variable. This facility is used to simulate multiply dimensioned arrays. For example:

    i = "A"; j = "B"; k = "C"
    x[i, j, k] = "hello, world\n"

assigns the string "hello, world\n" to the element of the array x which is indexed by the string "A\034B\034C". All arrays in AWK are associative, i.e., indexed by string values.

The special operator in may be used to test if an array has an index consisting of a particular value:

    if (val in array)
        print array[val]

If the array has multiple subscripts, use (i, j) in array.

The in construct may also be used in a for loop to iterate over all the elements of an array. However, the (i, j) in array construct only works in tests, not in for loops.

An element may be deleted from an array using the delete statement. The delete statement may also be used to delete the entire contents of an array, just by specifying the array name without a subscript.

gawk supports true multidimensional arrays. It does not require that such arrays be “rectangular” as in C or C++. For example:

    a[1] = 5
    a[2][1] = 6
    a[2][2] = 7

NOTE: You may need to tell gawk that an array element is really a subarray in order to use it where gawk expects an array (such as in the second argument to split()). You can do this by creating an element in the subarray and then deleting it with the delete statement.

Variable Typing And Conversion:

Variables and fields may be (floating point) numbers, or strings, or both. How the value of a variable is interpreted depends upon its context.
If used in a numeric expression, it will be treated as a number; if used as a string it will be treated as a string.

To force a variable to be treated as a number, add 0 to it; to force it to be treated as a string, concatenate it with the null string.

Uninitialized variables have the numeric value 0 and the string value "" (the null, or empty, string).

When a string must be converted to a number, the conversion is accomplished using strtod(3). A number is converted to a string by using the value of CONVFMT as a format string for sprintf(3), with the numeric value of the variable as the argument. However, even though all numbers in AWK are floating-point, integral values are always converted as integers. Thus, given

    CONVFMT = "%2.2f"
    a = 12
    b = a ""

the variable b has a string value of "12" and not "12.00".

NOTE: When operating in POSIX mode (such as with the --posix option), beware that locale settings may interfere with the way decimal numbers are treated: the decimal separator of the numbers you are feeding to gawk must conform to what your locale would expect, be it a comma (,) or a period (.).

Gawk performs comparisons as follows: If two variables are numeric, they are compared numerically. If one value is numeric and the other has a string value that is a “numeric string,” then comparisons are also done numerically. Otherwise, the numeric value is converted to a string and a string comparison is performed. Two strings are compared, of course, as strings.

Note that string constants, such as "57", are not numeric strings, they are string constants. The idea of “numeric string” only applies to fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array created by split() or patsplit() that are numeric strings. The basic idea is that user input, and only user input, that looks numeric, should be treated that way.

Octal and Hexadecimal Constants:

You may use C-style octal and hexadecimal constants in your AWK program source code.
For example, the octal value 011 is equal to decimal 9, and the hexadecimal value 0x11 is equal to decimal 17.

String Constants:

String constants in AWK are sequences of characters enclosed between double quotes (like "value"). Within strings, certain escape sequences are recognized, as in C. These are:

    \\      A literal backslash.
    \a      The “alert” character; usually the ASCII BEL character.
    \b      Backspace.
    \f      Form-feed.
    \n      Newline.
    \r      Carriage return.
    \t      Horizontal tab.
    \v      Vertical tab.
    \xhex digits
            The character represented by the string of hexadecimal digits following the \x. As in ISO C, all following hexadecimal digits are considered part of the escape sequence. (This feature should tell us something about language design by committee.) E.g., "\x1B" is the ASCII ESC (escape) character.
    \ddd    The character represented by the 1-, 2-, or 3-digit sequence of octal digits. E.g., "\033" is the ASCII ESC (escape) character.
    \c      The literal character c.

The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in regular expression constants. Thus, /a\52b/ is equivalent to /a\*b/.

Patterns and Actions:

AWK is a line-oriented language. The pattern comes first, and then the action. Action statements are enclosed in { and }. Either the pattern may be missing, or the action may be missing, but, of course, not both. If the pattern is missing, the action is executed for every single record of input. A missing action is equivalent to

    { print }

which prints the entire record.

Comments begin with the # character, and continue until the end of the line. Blank lines may be used to separate statements. Normally, a statement ends with a newline, however, this is not the case for lines ending in a comma, {, ?, :, &&, or ||. Lines ending in do or else also have their statements automatically continued on the following line.
In other cases, a line can be continued by ending it with a “\”, in which case the newline is ignored.

Multiple statements may be put on one line by separating them with a “;”. This applies to both the statements within the action part of a pattern-action pair (the usual case), and to the pattern-action statements themselves.

Patterns:

AWK patterns may be one of the following:

    BEGIN
    END
    BEGINFILE
    ENDFILE
    /regular expression/
    relational expression
    pattern && pattern
    pattern || pattern
    pattern ? pattern : pattern
    (pattern)
    ! pattern
    pattern1, pattern2

BEGIN and END are two special kinds of patterns which are not tested against the input. The action parts of all BEGIN patterns are merged as if all the statements had been written in a single BEGIN rule. They are executed before any of the input is read. Similarly, all the END rules are merged, and executed when all the input is exhausted (or when an exit statement is executed). BEGIN and END patterns cannot be combined with other patterns in pattern expressions. BEGIN and END patterns cannot have missing action parts.

BEGINFILE and ENDFILE are additional special patterns whose bodies are executed before reading the first record of each command line input file and after reading the last record of each file. Inside the BEGINFILE rule, the value of ERRNO will be the empty string if the file was opened successfully. Otherwise, there is some problem with the file and the code should use nextfile to skip it. If that is not done, gawk produces its usual fatal error for files that cannot be opened.

For /regular expression/ patterns, the associated statement is executed for each input record that matches the regular expression. Regular expressions are the same as those in egrep(1), and are summarized below.

A relational expression may use any of the operators defined below in the section on actions. These generally test whether certain fields match certain regular expressions.

The &&, ||, and ! operators are logical AND, logical OR, and logical NOT, respectively, as in C. They do short-circuit evaluation, also as in C, and are used for combining more primitive pattern expressions. As in most languages, parentheses may be used to change the order of evaluation.

The ?: operator is like the same operator in C. If the first pattern is true then the pattern used for testing is the second pattern, otherwise it is the third. Only one of the second and third patterns is evaluated.

The pattern1, pattern2 form of an expression is called a range pattern. It matches all input records starting with a record that matches pattern1, and continuing until a record that matches pattern2, inclusive. It does not combine with any other sort of pattern expression.

Regular Expressions:

Regular expressions are the extended kind found in egrep. They are composed of characters as follows:

    c           Matches the non-metacharacter c.
    \c          Matches the literal character c.
    .           Matches any character including newline.
    ^           Matches the beginning of a string.
    $           Matches the end of a string.
    [abc...]    A character list: matches any of the characters abc.... You may include a range of characters by separating them with a dash.
    [^abc...]   A negated character list: matches any character except abc....
    r1|r2       Alternation: matches either r1 or r2.
    r1r2        Concatenation: matches r1, and then r2.
    r+          Matches one or more r's.
    r*          Matches zero or more r's.
    r?          Matches zero or one r's.
    (r)         Grouping: matches r.
    r{n}
    r{n,}
    r{n,m}      One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regular expression r is repeated n times. If there are two numbers separated by a comma, r is repeated n to m times.
                If there is one number followed by a comma, then r is repeated at least n times.
    \y          Matches the empty string at either the beginning or the end of a word.
    \B          Matches the empty string within a word.
    \<          Matches the empty string at the beginning of a word.
    \>          Matches the empty string at the end of a word.
    \s          Matches any whitespace character.
    \S          Matches any nonwhitespace character.
    \w          Matches any word-constituent character (letter, digit, or underscore).
    \W          Matches any character that is not word-constituent.
    \`          Matches the empty string at the beginning of a buffer (string).
    \'          Matches the empty string at the end of a buffer.

The escape sequences that are valid in string constants (see String Constants) are also valid in regular expressions.

Character classes are a feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but where the actual characters themselves can vary from country to country and/or from character set to character set. For example, the notion of what is an alphabetic character differs in the USA and in France.

A character class is only valid in a regular expression inside the brackets of a character list. Character classes consist of [:, a keyword denoting the class, and :]. The character classes defined by the POSIX standard are:

    [:alnum:]   Alphanumeric characters.
    [:alpha:]   Alphabetic characters.
    [:blank:]   Space or tab characters.
    [:cntrl:]   Control characters.
    [:digit:]   Numeric characters.
    [:graph:]   Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
    [:lower:]   Lowercase alphabetic characters.
    [:print:]   Printable characters (characters that are not control characters.)
    [:punct:]   Punctuation characters (characters that are not letters, digits, control characters, or space characters).
    [:space:]   Space characters (such as space, tab, and formfeed, to name a few).
    [:upper:]   Uppercase alphabetic characters.
    [:xdigit:]  Characters that are hexadecimal digits.

For example, before the POSIX standard, to match alphanumeric characters, you would have had to write /[A-Za-z0-9]/. If your character set had other alphabetic characters in it, this would not match them, and if your character set collated differently from ASCII, this might not even match the ASCII alphanumeric characters. With the POSIX character classes, you can write /[[:alnum:]]/, and this matches the alphabetic and numeric characters in your character set, no matter what it is.

Two additional special sequences can appear in character lists. These apply to non-ASCII character sets, which can have single symbols (called collating elements) that are represented with more than one character, as well as several characters that are equivalent for collating, or sorting, purposes. (E.g., in French, a plain “e” and a grave-accented “è” are equivalent.)

Collating Symbols
    A collating symbol is a multi-character collating element enclosed in [. and .]. For example, if ch is a collating element, then [[.ch.]] is a regular expression that matches this collating element, while [ch] is a regular expression that matches either c or h.

Equivalence Classes
    An equivalence class is a locale-specific name for a list of characters that are equivalent. The name is enclosed in [= and =]. For example, the name e might be used to represent all of “e”, “é”, and “è”. In this case, [[=e=]] is a regular expression that matches any of e, é, or è.

These features are very valuable in non-English speaking locales.
The library functions that gawk uses for regular expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or equivalence classes.

The \y, \B, \<, \>, \s, \S, \w, \W, \`, and \' operators are specific to gawk; they are extensions based on facilities in the GNU regular expression libraries.

The various command line options control how gawk interprets characters in regular expressions.

No options
    In the default case, gawk provides all the facilities of POSIX regular expressions and the GNU regular expression operators described above.
Operators:

* / %   Multiplication, division, and modulus.
Environment:

The AWKLIBPATH environment variable can be used to provide a list of directories that gawk searches when looking for files named via the -l and --load options.

The GAWK_READ_TIMEOUT environment variable can be used to specify a timeout in milliseconds for reading input from a terminal, pipe or two-way communication including sockets.

For connection to a remote host via socket, GAWK_SOCK_RETRIES controls the number of retries, and GAWK_MSEC_SLEEP controls the interval between retries. The interval is in milliseconds. On systems that do not support usleep(3), the value is rounded up to an integral number of seconds.

If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly as if --posix had been specified on the command line.
Exit Status:

If the exit statement is used with a value, then gawk exits with the numeric value given to it.

Otherwise, if there were no problems during execution, gawk exits with the value of the C constant EXIT_SUCCESS. This is usually zero.

If an error occurs, gawk exits with the value of the C constant EXIT_FAILURE. This is usually one.

If gawk exits because of a fatal error, the exit status is 2. On non-POSIX systems, this value may be mapped to EXIT_FAILURE.

Version Information:

This man page documents gawk, version 4.1.

Authors:

The original version of UNIX awk was designed and implemented by Alfred Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian Kernighan continues to maintain and enhance it.

Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote gawk, to be compatible with the original version of awk distributed in Seventh Edition UNIX. John Woods contributed a number of bug fixes. David Trueman, with contributions from Arnold Robbins, made gawk compatible with the new version of UNIX awk. Arnold Robbins is the current maintainer.

See GAWK: Effective AWK Programming for a full list of the contributors to gawk and its documentation.

See the README file in the gawk distribution for up-to-date information about maintainers and which ports are currently supported.

Bug Reports:

If you find a bug in gawk, please send electronic mail to bug-gawk@gnu.org. Please include your operating system and its revision, the version of gawk (from gawk --version), which C compiler you used to compile it, and a test program and data that are as small as possible for reproducing the problem.

Before sending a bug report, please do the following things. First, verify that you have the latest version of gawk. Many bugs (usually subtle ones) are fixed at each release, and if yours is out of date, the problem may already have been solved. Second, please see if setting the environment variable LC_ALL to LC_ALL=C causes things to behave as you expect.
If so, it's a locale issue, and may or may not really be a bug. Finally, please read this man page and the reference manual carefully to be sure that what you think is a bug really is, instead of just a quirk in the language.

Whatever you do, do NOT post a bug report in comp.lang.awk. While the gawk developers occasionally read this newsgroup, posting bug reports there is an unreliable way to report bugs. Instead, please use the electronic mail addresses given above. Really.

If you're using a GNU/Linux or BSD-based system, you may wish to submit a bug report to the vendor of your distribution. That's fine, but please send a copy to the official email address as well, since there's no guarantee that the bug report will be forwarded to the gawk maintainer.

Bugs:

The -F option is not necessary given the command line variable assignment feature; it remains only for backwards compatibility.