mirror of
https://github.com/Perl/perl5.git
synced 2026-01-26 08:38:23 +00:00
This update makes it more clear that the double tilde in a format implies the single tilde (see the code of doparseform in pp_ctl.c). Inspired by a question from user hotpelmen at https://www.perlmonks.org/?node_id=11152452
457 lines
16 KiB
Plaintext
=head1 NAME
X<format> X<report> X<chart>

perlform - Perl formats

=head1 DESCRIPTION

Perl has a mechanism to help you generate simple reports and charts.  To
facilitate this, Perl helps you code up your output page close to how it
will look when it's printed.  It can keep track of things like how many
lines are on a page, what page you're on, when to print page headers,
etc.  Keywords are borrowed from FORTRAN: format() to declare and write()
to execute; see their entries in L<perlfunc>.  Fortunately, the layout is
much more legible, more like BASIC's PRINT USING statement.  Think of it
as a poor man's nroff(1).
X<nroff>

Formats, like packages and subroutines, are declared rather than
executed, so they may occur at any point in your program.  (Usually it's
best to keep them all together though.)  They have their own namespace
apart from all the other "types" in Perl.  This means that if you have a
function named "Foo", it is not the same thing as having a format named
"Foo".  However, the default name for the format associated with a given
filehandle is the same as the name of the filehandle.  Thus, the default
format for STDOUT is named "STDOUT", and the default format for filehandle
TEMP is named "TEMP".  The names just look the same; the format and the
filehandle are still separate things.

Output record formats are declared as follows:

    format NAME =
    FORMLIST
    .

If the name is omitted, format "STDOUT" is defined.  A single "." in
column 1 is used to terminate a format.  FORMLIST consists of a sequence
of lines, each of which may be one of three types:

=over 4

=item 1.

A comment, indicated by putting a '#' in the first column.

=item 2.

A "picture" line giving the format for one output line.

=item 3.

An argument line supplying values to plug into the previous picture line.

=back

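The three kinds of line can be seen together in a minimal sketch.  The
format name C<REPORT>, the variables C<$name> and C<$count>, and the helper
C<render_report()> are made up for illustration.  The helper writes through
an in-memory filehandle only so that the result can be returned as a string;
a plain C<write> to STDOUT works the same way.

```perl
use strict;
use warnings;

our ($name, $count);    # package variables, visible inside the format

format REPORT =
# A comment line: '#' in the first column, never printed.
Item: @<<<<<<<<< Count: @>>>
$name, $count
.

# Hypothetical helper: push one record through the REPORT format and
# return the formatted text instead of printing it.
sub render_report {
    ($name, $count) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    my $old = select($fh);
    $~ = 'REPORT';          # pick the format for this filehandle
    select($old);
    write($fh);             # picture line filled from the argument line
    close($fh);
    return $buffer;
}

print render_report('widgets', 42);
```

The picture line reserves ten columns for the name (left-justified) and
four for the count (right-justified); the argument line below it supplies
the two values in the same order.
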
Picture lines contain output field definitions, intermingled with
literal text.  These lines do not undergo any kind of variable interpolation.
Field definitions are made up from a set of characters, for starting and
extending a field to its desired width.  This is the complete set of
characters for field definitions:
X<format, picture line>
X<@> X<^> X<< < >> X<< | >> X<< > >> X<#> X<0> X<.> X<...>
X<@*> X<^*> X<~> X<~~>

    @    start of regular field
    ^    start of special field
    <    pad character for left justification
    |    pad character for centering
    >    pad character for right justification
    #    pad character for a right-justified numeric field
    0    instead of first #: pad number with leading zeroes
    .    decimal point within a numeric field
    ...  terminate a text field, show "..." as truncation evidence
    @*   variable width field for a multi-line value
    ^*   variable width field for next line of a multi-line value
    ~    suppress line with all fields empty
    ~~   repeat line until all fields are exhausted

Each field in a picture line starts with either "@" (at) or "^" (caret),
indicating what we'll call, respectively, a "regular" or "special" field.
The choice of pad characters determines whether a field is textual or
numeric.  The tilde operators are not part of a field.  Let's look at
the various possibilities in detail.

=head2 Text Fields
X<format, text field>

The length of the field is supplied by padding out the field with multiple
"E<lt>", "E<gt>", or "|" characters to specify a non-numeric field with,
respectively, left justification, right justification, or centering.
For a regular field, the value (up to the first newline) is taken and
printed according to the selected justification, truncating excess characters.
If you terminate a text field with "...", three dots will be shown if
the value is truncated.  A special text field may be used to do rudimentary
multi-line text block filling; see L</Using Fill Mode> for details.

 Example:
    format STDOUT =
    @<<<<<< @|||||| @>>>>>>
    "left", "middle", "right"
    .
 Output:
    left    middle    right

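Truncation and the "..." marker are easier to see side by side.  The
following sketch uses formline() and C<$^A> (described under
L</Accessing Formatting Internals>) so that each result is a plain string;
C<render()> is a hypothetical helper, and the square brackets are literal
text added only to make the field boundaries visible.

```perl
use strict;
use warnings;

# render() is a hypothetical helper, not part of Perl: it formats one
# picture line with formline() and returns the accumulated text.
sub render {
    my $picture = shift;
    $^A = "";
    formline("$picture\n", @_);
    my $out = $^A;
    $^A = "";
    return $out;
}

my $long = "abcdefghijklmnop";

# Plain ten-column field: the excess characters are dropped.
print render('[@<<<<<<<<<]', $long);

# Ten-column field ending in "...": the dots replace the last three
# columns when the value did not fit.
print render('[@<<<<<<...]', $long);

# A value that fits is padded as usual and no dots appear.
print render('[@<<<<<<...]', "short");
```

Note that the three dots count toward the width of the field, so both
pictures above occupy ten columns.
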
=head2 Numeric Fields
X<#> X<format, numeric field>

Using "#" as a padding character specifies a numeric field, with
right justification.  An optional "." defines the position of the
decimal point.  With a "0" (zero) instead of the first "#", the
formatted number will be padded with leading zeroes if necessary.
A special numeric field is blanked out if the value is undefined.
If the resulting value would exceed the width specified, the field is
filled with "#" as overflow evidence.

 Example:
    format STDOUT =
    @### @.### @##.### @### @### ^####
    42, 3.1415, undef, 0, 10000, undef
    .
 Output:
      42 3.142   0.000    0 ####

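The zero-padding, overflow, and blanking rules can be checked one field at
a time with a small sketch.  As before, C<render()> is a hypothetical
helper around formline(), and the brackets are literal text that only mark
where each field starts and ends.

```perl
use strict;
use warnings;

# render() is a hypothetical helper, not part of Perl: it formats one
# picture line with formline() and returns the accumulated text.
sub render {
    my $picture = shift;
    $^A = "";
    formline("$picture\n", @_);
    my $out = $^A;
    $^A = "";
    return $out;
}

my $missing;    # deliberately left undefined

# Fields, left to right:
#   @0##    four columns, padded with leading zeroes
#   @##.##  six columns, two decimal places
#   @##     three columns, too narrow for 12345: filled with '#'
#   ^###    special numeric field, blanked because the value is undefined
print render('[@0##] [@##.##] [@##] [^###]', 42, 3.14159, 12345, $missing);
```
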
=head2 The Field @* for Variable-Width Multi-Line Text
X<@*>

The field "@*" can be used for printing multi-line, nontruncated
values; it should (but need not) appear by itself on a line.  A final
line feed is chomped off, but all other characters are emitted verbatim.

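A short sketch of "@*", again using a hypothetical C<render()> helper built
on formline().  The value contains an embedded newline and a trailing one;
only the trailing newline is removed.

```perl
use strict;
use warnings;

# render() is a hypothetical helper, not part of Perl: it formats one
# picture line with formline() and returns the accumulated text.
sub render {
    my $picture = shift;
    $^A = "";
    formline("$picture\n", @_);
    my $out = $^A;
    $^A = "";
    return $out;
}

my $notes = "first line\nsecond line\n";

# The whole value is emitted, embedded newline included; the field is
# as wide as it needs to be.
print render('Notes: @*', $notes);
```

Because the continuation lines are emitted verbatim, they start in column 1
rather than under the field, which is one reason to put "@*" on a line by
itself.
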
=head2 The Field ^* for Variable-Width One-line-at-a-time Text
X<^*>

Like "@*", this is a variable-width field.  The value supplied must be a
scalar variable.  Perl puts the first line (up to the first "\n") of the
text into the field, and then chops off the front of the string so that
the next time the variable is referenced, more of the text can be printed.
The variable will I<not> be restored.

 Example:
    $text = "line 1\nline 2\nline 3";
    format STDOUT =
    Text: ^*
          $text
    ~~    ^*
          $text
    .
 Output:
    Text: line 1
          line 2
          line 3

=head2 Specifying Values
X<format, specifying values>

The values are specified on the following format line in the same order as
the picture fields.  The expressions providing the values must be
separated by commas.  They are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements.  The expressions may be spread out to more than
one line if enclosed in braces.  If so, the opening brace must be the first
token on the first line.  If an expression evaluates to a number with a
decimal part, and if the corresponding picture specifies that the decimal
part should appear in the output (that is, any picture except multiple "#"
characters B<without> an embedded "."), the character used for the decimal
point is determined by the current LC_NUMERIC locale if C<use locale> is in
effect.  This means that, if, for example, the run-time environment happens
to specify a German locale, "," will be used instead of the default ".".  See
L<perllocale> and L</"WARNINGS"> for more information.

=head2 Using Fill Mode
X<format, fill mode>

On text fields the caret enables a kind of fill mode.  Instead of an
arbitrary expression, the value supplied must be a scalar variable
that contains a text string.  Perl puts the next portion of the text into
the field, and then chops off the front of the string so that the next time
the variable is referenced, more of the text can be printed.  (Yes, this
means that the variable itself is altered during execution of the write()
call, and is not restored.)  The next portion of text is determined by
a crude line-breaking algorithm.  You may use the carriage return character
(C<\r>) to force a line break.  You can change which characters are legal
to break on by changing the variable C<$:> (that's
$FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a
list of the desired characters.

Normally you would use a sequence of fields in a vertical stack associated
with the same scalar variable to print out a block of text.  You might wish
to end the final field with the text "...", which will appear in the output
if the text was too long to appear in its entirety.

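Here is a small sketch of fill mode.  It assumes a hypothetical
C<render()> helper around formline() and uses "~~" (see
L</Repeating Format Lines>) instead of a hand-written stack of fields; the
width of 15 columns and the sample sentence are arbitrary.

```perl
use strict;
use warnings;

# render() is a hypothetical helper, not part of Perl: it formats one
# picture line with formline() and returns the accumulated text.
sub render {
    my $picture = shift;
    $^A = "";
    formline("$picture\n", @_);
    my $out = $^A;
    $^A = "";
    return $out;
}

my $width   = 15;
my $picture = '^' . ('<' x ($width - 1)) . ' ~~';   # caret field plus repeat

my $text = "The quick brown fox jumps over the lazy dog";

# Each repetition takes as many whole words as fit in the field and
# removes them from the front of $text.
print render($picture, $text);

# The variable has been consumed, as the paragraph above warns.
print "left in \$text: '$text'\n";
```

If the original string is still needed afterwards, pass a copy to the
format rather than the variable itself.
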
=head2 Suppressing Lines Where All Fields Are Void
X<format, suppressing lines>

Using caret fields can produce lines where all fields are blank.  You can
suppress such lines by putting a "~" (tilde) character anywhere in the
line.  The tilde will be translated to a space upon output.

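The effect of the tilde can be sketched with the same hypothetical
C<render()> helper around formline().  The first call uses up the whole
string, so the later calls see a field with nothing left to print.

```perl
use strict;
use warnings;

# render() is a hypothetical helper, not part of Perl: it formats one
# picture line with formline() and returns the accumulated text.
sub render {
    my $picture = shift;
    $^A = "";
    formline("$picture\n", @_);
    my $out = $^A;
    $^A = "";
    return $out;
}

my $text = "one line";

my $kept    = render('^<<<<<<<<<',  $text);  # prints the text, empties $text
my $blank   = render('^<<<<<<<<<',  $text);  # no tilde: an empty line
my $dropped = render('~^<<<<<<<<<', $text);  # tilde: no output at all

# When the line does print, the tilde itself comes out as a space.
my $more    = "more";
my $shifted = render('~^<<<<<<<<<', $more);

print $kept, $blank, $dropped, $shifted;
```
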
=head2 Repeating Format Lines
X<format, repeating lines>

If you put two contiguous tilde characters "~~" anywhere into a line,
then in addition to suppressing the line if all fields are blank,
the line will be repeated until all the fields on the line are exhausted,
i.e. undefined.  For special (caret) text fields this will occur sooner or
later, but if you use a regular (at) text field, the expression you
supply had better not give the same value every time forever!  (C<shift(@f)>
is a simple example that would work.)  Don't use a regular (at) numeric
field in such lines, because it will never go blank.

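The C<shift> idea can be sketched as follows.  The format name C<QUEUE>,
the array C<@queue>, and the helper C<drain_queue()> are hypothetical, and
the in-memory filehandle is there only to return the result as a string.
The argument line yields an empty string rather than C<undef> once the
array is empty; that is a choice made here to keep C<use warnings> quiet,
and an empty value ends the repetition just as well.

```perl
use strict;
use warnings;

our @queue;    # package variable, visible inside the format

format QUEUE =
@<<<<<<<<< ~~
@queue ? shift(@queue) : ''
.

# Hypothetical helper: load the queue, call write() once, and return
# everything the repeated line produced.
sub drain_queue {
    @queue = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    my $old = select($fh);
    $~ = 'QUEUE';
    select($old);
    write($fh);    # the argument line is re-evaluated for every repetition
    close($fh);
    return $buffer;
}

print drain_queue(qw(alpha beta gamma));
```
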
=head2 Top of Form Processing
X<format, top of form> X<top> X<header>

Top-of-form processing is by default handled by a format with the
same name as the current filehandle with "_TOP" concatenated to it.
It's triggered at the top of each page.  See L<perlfunc/write>.

Examples:

    # a report on the /etc/passwd file
    format STDOUT_TOP =
                            Passwd File
    Name                Login    Office   Uid   Gid Home
    ------------------------------------------------------------------
    .
    format STDOUT =
    @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
    $name,              $login,  $office,$uid,$gid, $home
    .

    # a report from a bug report form
    format STDOUT_TOP =
                            Bug Reports
    @<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
    $system,                     $%,          $date
    ------------------------------------------------------------------
    .
    format STDOUT =
    Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
             $subject
    Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
           $index,                       $description
    Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
              $priority,        $date,   $description
    From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $from,                         $description
    Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $programmer,            $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                         $description
    ~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
                                         $description
    .

It is possible to intermix print()s with write()s on the same output
channel, but you'll have to handle C<$-> (C<$FORMAT_LINES_LEFT>)
yourself.

=head2 Format Variables
X<format variables>
X<format, variables>

The current format name is stored in the variable C<$~> (C<$FORMAT_NAME>),
and the current top of form format name is in C<$^> (C<$FORMAT_TOP_NAME>).
The current output page number is stored in C<$%> (C<$FORMAT_PAGE_NUMBER>),
and the number of lines on the page is in C<$=> (C<$FORMAT_LINES_PER_PAGE>).
Whether to autoflush output on this handle is stored in C<$|>
(C<$OUTPUT_AUTOFLUSH>).  The string output before each top of page (except
the first) is stored in C<$^L> (C<$FORMAT_FORMFEED>).  These variables are
set on a per-filehandle basis, so you'll need to select() into a different
one to affect them:

    select((select(OUTF),
            $~ = "My_Other_Format",
            $^ = "My_Top_Format"
           )[0]);

Pretty ugly, eh?  It's a common idiom though, so don't be too surprised
when you see it.  You can at least use a temporary variable to hold
the previous filehandle.  (This is a much better approach in general,
because not only does legibility improve, you now have an intermediary
stage in the expression to single-step the debugger through.)

    $ofh = select(OUTF);
    $~ = "My_Other_Format";
    $^ = "My_Top_Format";
    select($ofh);

If you use the English module, you can even read the variable names:

    use English;
    $ofh = select(OUTF);
    $FORMAT_NAME     = "My_Other_Format";
    $FORMAT_TOP_NAME = "My_Top_Format";
    select($ofh);

But you still have those funny select()s.  So just use the FileHandle
module.  Now, you can access these special variables using lowercase
method names instead:

    use FileHandle;
    format_name     OUTF "My_Other_Format";
    format_top_name OUTF "My_Top_Format";

Much better!

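The same methods can be called with arrow syntax on a lexical filehandle.
The sketch below loads IO::Handle, which provides C<format_name> and
C<format_top_name>; the format names C<ITEM> and C<ITEM_TOP>, the variable
C<$item>, and the helper C<inventory()> are hypothetical, and the in-memory
filehandle is used only so that the report comes back as a string.

```perl
use strict;
use warnings;
use IO::Handle;

our $item;    # package variable, visible inside the format

format ITEM_TOP =
Inventory
---------
.

format ITEM =
* @<<<<<<<<<
$item
.

# Hypothetical helper: write one line per item, with the header
# supplied by the top-of-form format.
sub inventory {
    my @items  = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    $fh->format_name('ITEM');          # sets $~ for this handle only
    $fh->format_top_name('ITEM_TOP');  # sets $^ for this handle only
    for my $next (@items) {
        $item = $next;
        write($fh);
    }
    close($fh);
    return $buffer;
}

print inventory(qw(apples pears));
```

No select() is needed, and the currently selected filehandle is left alone.
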
=head1 NOTES

Because the values line may contain arbitrary expressions (for at fields,
not caret fields), you can farm out more sophisticated processing
to other functions, like sprintf() or one of your own.  For example:

    format Ident =
        @<<<<<<<<<<<<<<<
        &commify($n)
    .

To get a real at or caret into the field, do this:

    format Ident =
    I have an @ here.
              "@"
    .

To center a whole line of text, do something like this:

    format Ident =
    @|||||||||||||||||||||||||||||||||||||||||||||||
            "Some text line"
    .

There is no built-in way to say "float this to the right hand side
of the page, however wide it is."  You have to specify where it goes.
The truly desperate can generate their own format on the fly, based
on the current number of columns, and then eval() it:

    $format  = "format STDOUT = \n"
             . '^' . '<' x $cols . "\n"
             . '$entry' . "\n"
             . "\t^" . "<" x ($cols-8) . "~~\n"
             . '$entry' . "\n"
             . ".\n";
    print $format if $Debugging;
    eval $format;
    die $@ if $@;

Which would generate a format looking something like this:

    format STDOUT =
    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $entry
            ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
    $entry
    .

Here's a little program that's somewhat like fmt(1):

    format =
    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
    $_

    .

    $/ = '';
    while (<>) {
        s/\s*\n\s*/ /g;
        write;
    }

=head2 Footers
X<format, footer> X<footer>

While $FORMAT_TOP_NAME contains the name of the current header format,
there is no corresponding mechanism to automatically do the same thing
for a footer.  Not knowing how big a format is going to be until you
evaluate it is one of the major problems.  It's on the TODO list.

Here's one strategy:  If you have a fixed-size footer, you can get footers
by checking $FORMAT_LINES_LEFT before each write() and printing the footer
yourself if necessary.

Here's another strategy: Open a pipe to yourself, using C<open(MYSELF, "|-")>
(see L<perlfunc/open>) and always write() to MYSELF instead of STDOUT.
Have your child process massage its STDIN to rearrange headers and footers
however you like.  Not very convenient, but doable.

=head2 Accessing Formatting Internals
X<format, internals>

For low-level access to the formatting mechanism, you may use formline()
and access C<$^A> (the C<$ACCUMULATOR> variable) directly.  Note that
formline() always returns true; the formatted text ends up in C<$^A>,
not in the return value.

For example:

    formline <<'END', 1,2,3;
    @<<<  @|||  @>>>
    END

    print "Wow, I just stored '$^A' in the accumulator!\n";

Or to make an swrite() subroutine, which is to write() what sprintf()
is to printf(), do this:

    use Carp;
    sub swrite {
        croak "usage: swrite PICTURE ARGS" unless @_;
        my $format = shift;
        $^A = "";
        formline($format,@_);
        return $^A;
    }

    $string = swrite(<<'END', 1, 2, 3);
     Check me out
     @<<<  @|||  @>>>
    END
    print $string;

=head1 WARNINGS

The lone dot that ends a format can also prematurely end a mail
message passing through a misconfigured Internet mailer (and based on
experience, such misconfiguration is the rule, not the exception).  So
when sending format code through mail, you should indent it so that
the format-ending dot is not on the left margin; this will prevent
SMTP cutoff.

Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable.

If a program's environment specifies an LC_NUMERIC locale and C<use
locale> is in effect when the format is declared, the locale is used
to specify the decimal point character in formatted output.  Formatted
output cannot be controlled by C<use locale> at the time when write()
is called.  See L<perllocale> for further discussion of locale handling.

Within strings that are to be displayed in a fixed-length text field,
each control character is substituted by a space.  (But remember the
special meaning of C<\r> when using fill mode.)  This is done to avoid
misalignment when control characters "disappear" on some output media.