2305 Commits

Author SHA1 Message Date
Eric S. Raymond
8d0162b80a Update all the examples to use the new API elements.
Add a fully reentrant example.  And update the TODO file.
2020-10-12 21:07:14 -04:00
Eric S. Raymond
68711fc4a7 Replace YY_USER_INIT, YY_USER_ACTION, and YY_BREAK #defines with Flex options.
These were the last preprocessor abuses in the Flex interface.

Also, add a TODO listing tests that need to be added.
2020-10-12 16:23:59 -04:00
Eric S. Raymond
a4dd4772d2 Make yydecl an ordinary string-valued option rather than a special directive.
This corrects a minor mistake I made earlier because I did not yet understand
the full generality of the option syntax.

Also fixes some minor markup errors in the manual.
2020-10-12 11:06:26 -04:00
Eric S. Raymond
7bd654ca0c Deprecate #define YY_EXTRA_TYPE in favor of the extra-type option.
Follow this through in the examples and manual.
2020-10-12 10:38:18 -04:00
Eric S. Raymond
da0cd947c3 Make flex --help work again.
Fixes a stupid, trivial initialization-time bug introduced
during methodization.
2020-10-12 09:36:44 -04:00
Eric S. Raymond
aac8fe8896 Revise the manual so the reentrant interface is recommended early.
Also add the first example from the manual, and a reentrant variant
of it, to the example set.  And document how to integrate examples.
2020-10-12 09:16:40 -04:00
Eric S. Raymond
721622ea6e Rename unput() to yyunput(); follow through in option switches.
This change was not strictly necessary to sever a preprocessor
dependency nor make the API uniform across both C and C++.  But it
cried out to be made, because now *all* the rule hooks are in the yy*
namespace.  This makes the API easier to document and remember.

unput() is left in place as a compatibility macro for existing
users, but only documented as a legacy interface.  The "unput"
variants of switches and options have also been retained.
2020-10-12 06:56:56 -04:00
Eric S. Raymond
dad680611b Deprecate input(); document yyinput() for the C back end...
...leaving an "input" macro in place for legacy compatibility.

input() had already become yyinput() in the C++ back end in order to
avoid collision with predefined C++ input. In a multi-language world,
this is good policy in general.  There's no real reason for C to
be different, and excellent reason to pull all possible entry
points into the yy namespace.
2020-10-12 05:35:24 -04:00
Eric S. Raymond
526be1f459 Introduce the term "rule hook" so we can use the term "macro" less.
Non-C/C++ back ends won't have macros, so the documentation should
treat macro-ness as an implementation detail when it has to be
mentioned at all - and usually it doesn't.

The key thing about rule hooks like yyless(), yymore() etc. isn't that
they're macros, it's that they can only be used in rule actions (e.g.
inside the body of the generated yylex code).

This documentation patch removes the term "macro" where it isn't needed.
2020-10-11 16:55:08 -04:00
Eric S. Raymond
f8225f259e Implement and document yystart(), replacing YY_START.
As usual, legacy interface left in place but deprecated.
2020-10-11 15:46:05 -04:00
Eric S. Raymond
cecae8b6aa Implement and document %noyyread, replacing YY_INPUT.
2020-10-11 13:04:57 -04:00
Eric S. Raymond
956ac03b7d Refactor YY_INPUT so it calls a new yyread() internal function....
...rather than splicing a bunch of exposed guts into the middle of
yylex().  yyread() is put in the set of functions that gets
prefix-modified.

This means buffer refill can be documented without C-specific
references to YY_INPUT.

It should also enable actually having a non-macro replacement
for YY_INPUT, with a bit more work.
2020-10-11 11:05:12 -04:00
Eric S. Raymond
1094c2a137 Revise Flex manual to say useful things about multilanguage support.
No specific thing can be said about a non-C/C++ backend yet, but
this patch prepares the way by explaining which features and aspects
of the Flex interface are specific to C/C++.

It also fixes one pre-ANSI prototype - that of non-reentrant yylex(),
which should be declared yylex(void) in this day and age.

These changes make one new commitment. Observing that the YY_INPUT
macro is impossible to port out of the C/C++ context, I have observed that
it is probably extinct in the wild (due to the later introduction of
the multi-buffer primitives, though I don't say that explicitly). The
text says that people who really need the equivalent of this capability
in a non-C/C++ back end should file an issue with the Flex maintainers.

I don't actually expect this to happen.
2020-10-11 05:14:41 -04:00
Eric S. Raymond
3158c7f072 Implement and document %option yylmax to replace #define YYLMAX.
Test in tests/test-yylmax.
2020-10-10 19:25:32 -04:00
Eric S. Raymond
0d1959c595 Implement and document %yydecl directive to replace #define YY_DECL.
Also added a deprecation note about the old mechanism.
2020-10-10 17:21:11 -04:00
Eric S. Raymond
76affdf894 Deprecate ECHO in favor of yyecho().
I changed the tests to use yyecho(), but the ECHO macro is still
exercised in the bootstrap scanner.
2020-10-09 23:28:42 -04:00
Eric S. Raymond
3612bc281e Deprecate BEGIN in favor of yybegin(). Worst of the awkward squad.
2020-10-09 19:06:32 -04:00
Eric S. Raymond
62a13a3a3d First step towards an interface independent of C macros.
This patch implements and documents a yyreject() macro to replace
argumentless REJECT.  It does not remove REJECT, but warns that this
macro will not be supported in non-C languages and deprecates it.

This commit begins a new appendix in the Flex manual, to list
deprecated interfaces and explain why they have been superseded.
2020-10-09 08:29:55 -04:00
Eric S. Raymond
1e9c36271e Do consistent optimal packing of arrays.
Flex has a strategy of packing its arrays with int16 or int32 depending
on length, but it wasn't applied consistently. While I don't think this
kind of space optimization matters a lot in 2020, if we're going to do it at
all we should do it thoroughly.
2020-10-08 16:03:03 -04:00
Eric S. Raymond
723fd2c3b4 Remove unused code.
Independent of the retargeting changes.
2020-10-08 14:13:28 -04:00
Eric S. Raymond
700f9d3b6a Eliminate the c_like backend member.
No need for it, since the skel content is in core
and the relevant hook can be searched for.

This is a postscript to the retargeting series.  It's
not necessary, but it improves the code slightly.
2020-10-08 14:13:28 -04:00
Eric S. Raymond
8da857acc6 Update the documentation on writing a back end.
It will doubtless need expansion and revision when we actually
write one.

No diffs in generated test code.

#70 and last in the retargeting patch series
2020-10-07 19:11:19 -04:00
Eric S. Raymond
102c7a0149 Finish up macro abstraction.
Everything Flex ships to the skeleton-file expansion phase is
now either a macro expansion or a macro call.  This almost finishes
the retargeting patch series; the wrapup will be documentation.

Sadly, this does not get us *all* the way to target-syntax independence.
The problem is the inclusion of tables_shared.c when table serialization is
enabled. Which means table serialization is not practical to support
outside the C/C++ back end.

No diffs in generated test code from this commit.

#69 in the retargeting patch series
2020-10-07 17:19:31 -04:00
Eric S. Raymond
de76933658 Macroize all the remaining code-generation methods.
Produces a tediously large diff in generated test code that
is all table moving around.  This is due to them being shipped
as macros and being substituted in a fixed order determined
by the calls in the skel file, rather than being generated as
the functions originally emitting the tables are called.

#68 in the retargeting patch series
2020-10-07 16:35:59 -04:00
Eric S. Raymond
34771445d5 Inline all yydmap entries, get rid of %tables-yydmap.
What it used to do is now handled entirely by macro conditionals.
Besides being a good complexity reduction in itself, this is one
of the last steps in turning C backend methods into macro deliveries.

Order of the yydmap table is perturbed. No other non-whitespace
diffs and no logic changes.

#67 in the retargeting patch series
2020-10-07 06:28:09 -04:00
Eric S. Raymond
fd8748a311 Preparation for macroizing the last eight methods.
This commit collects several minor changes:

* Fix a minor type specification bug in a tablesext initializer.

* Macroize the trans_offset, mkctbl, and mkftbl methods.

* Fix a bug in footprint computation.

This commit produces no code diffs in the generated test code, but the
footprint reports change due to the bug fix.

#66 in the retargeting patch series
2020-10-06 14:25:56 -04:00
Eric S. Raymond
716cb63ff2 Tweak the indent style of tables with macroexpanded bodies...
...to have an indent style uniform with the rest of the code,
and one that makes it easier not to miss the trailing table delimiters.

Not all tables are generated this way yet.  I'm working on it.

This is isolated in its own commit so the format change can't confuse a
reviewer's eyeballs out of noticing real mutations in the table data.

#65 in the retargeting patch series
2020-10-06 01:17:19 -04:00
Eric S. Raymond
96e004a296 Macroize the yy_meta, gen_yy_trans and start_state_list methods.
Also macro-generate yydmap entry for the yymeta table.

We're now about 75% of the way through pushing all
C syntax out of the method table.
Permutes table order in the generated code.

#64 in the retargeting patch series
2020-10-05 21:02:51 -04:00
Eric S. Raymond
64cf032806 Eliminate a lurking cpp-ism from scan.l
Also, remove now-unused functions from buf.c.
And corral another global.

Produces no diffs in generated test code.

#63 in the retargeting patch series
2020-10-05 20:55:47 -04:00
Eric S. Raymond
fafc0ef10c Replace the comment method with a hook macro.
This required adding a new 0.0 breakpoint right after the
M4_HOOK_* definitions so they will be visible early.

Produces no diffs in generated test code.

#62 in the retargeting patch series
2020-10-05 09:28:02 -04:00
Eric S. Raymond
83d8bd5fb9 Eliminate several backend methods by shipping hook macros instead.
They were: geneol, fulltable, eecs, and debug.

To accomplish this, dataend's emission of trailing } needed to be
suppressable.

Also, remove a %% mark that is no longer required.

This doesn't change any of the generated tables, but does change the
order in which they're generated, producing large diffs in the
generated test code that don't actually mean anything.  The reason for
this is that tables used to come out in a variable order as functions
like geneecs were called at variable times depending on the
compression mode.  Now, instead, the order is fixed by where the
table-body macros these functions define are expanded.

More methods remain to be turned into macro generators.

#61 in the retargeting patch series, following an unnumbered
bugfix patch that I shipped in too much of a hurry.
2020-10-05 05:12:49 -04:00
Eric S. Raymond
aea91d7e4b Repair a build recipe bug introduced in #52...
...by an incautious replace operation.  Insidious because
it's undetectable until you run configure again.

Two-phase build systems suck.
2020-10-04 00:17:17 -04:00
Eric S. Raymond
e5386ba368 Refactor so all skelout calls are in visible sequence in flex_main().
This makes the overall control flow easier to understand.

#59 in the retargeting patch series
2020-10-03 13:03:28 -04:00
Eric S. Raymond
6c1b4a95f9 Begin replacing method table entries with hook macros
Do this for table opener/closer/continuation syntax, the trace-format
string, the state entry string, constant definitions, the state-dyad
format, and the three pieces of EOF state syntax.  The documentation
appendix on how to write a back end is also updated.

There are comment diffs because I decided generating an
explicit fallthrough marker and some other new explanatory comments
was a good idea.

#58 in the retargeting patch series
2020-10-03 13:03:28 -04:00
Eric S. Raymond
ea32296019 As of this commit, all mode symbols are finally visible.
All symbols except a handful dependent on nultrans and the number of
backups are now written in one visible group right at the start of m4
generation.  The exceptions are exceptions because their values
are not known until after DFA computation.

Has comment diffs in generated test code due to one symbol rename and
symbols becoming visible. Should be the last time the latter happens.

#57 in the retargeting patch series
2020-10-02 21:57:54 -04:00
Eric S. Raymond
0fc45ce97c Clean up various sporadic symbol definitions that weren't going through ctrl.
Includes handling of --nounistd, --always_interactive, --never_interactive, --stack,
their corresponding lexer items, and noinput.

An unavoidable side effect is that the place where "#define
YY_NO_INPUT 1" is inserted, if it's inserted, has to move because it's
done by a different route - m4 expansion rather than the action_define
function (which is now gone - this was the last use). I have put the
new insertion point just in time for the first reference to the macro.

Otherwise the only diffs in generated test code are symbol
definitions becoming visible.

#56 in the retargeting patch series
2020-10-02 13:13:40 -04:00
Eric S. Raymond
7e77d8f475 Move almost all m4 symbol setting to one spot.
Formerly, Flex's own lexer and the logic for processing command-line
options both did calls to write M4 conditionals to a buffer that was
later dumped into the beginning of the text that m4 expands, before
the body of the skel file.

This was bad layering.  Instead, both these places now set flags in
the ctrl structure.  Later, (almost) all the generated m4 conditionals
are shipped at once.

It's "almost" because there are a couple of awkward cases to be
cleaned up.  Again, this was the part that could be done
simply via almost mechanical cut and paste.

In generated code, there are some comment diffs because symbols that
used to be invisibly set are now visibly set - that is, shown at the
beginning of the generated C.

#55 in the retargeting patch series
2020-10-02 05:55:36 -04:00
Eric S. Raymond
c5d0e408e2 Methodize a suffix computation and move headerfile into ctrl.
This cleans up some loose ends before the next big move.

#54 in the retargeting patch series. #53 slipped out unnumbered.
2020-10-02 04:55:37 -04:00
Eric S. Raymond
9dbc704ad4 yytext_is_array moves to the ctrl structure.
This is separate from the big reorganization in commit #52 because
there's a comment about this variable in flexdef.h that makes me
nervous.  According to the comment this variable is a trit, but
it looks to me like flexinit sets it to false and I can't find
anywhere in the code that sets it to a non-boolean value.

This commit assumes that the comment is stale and the member
can be typed boolean. Should be audited.
2020-10-02 00:17:31 -04:00
Eric S. Raymond
99e6b1c89a Impose some namespace control on the global variables.
As I was working on some layer separation, I realized that I
was getting confused a lot by the huge pile of globals that
control this program.

In particular, I need to be able to clearly distinguish those that set
m4 conditional symbols from those that don't.  So I've done something
about it. Almost all globals that can be set by options are now
bundled into two context structures, "ctrl" for options that have
corresponding m4 symbols and "env" for options that don't.

The few I haven't moved have sufficiently tricky interdependencies
that I'm going to break out any changes related to them into smaller
patches that can be easier to review.  In this one I did only the bulk
of straightforward changes that could be done mechanically with search
and replace.

I changed one variable name to reflect its semantics better;
the performance_report global is now env.performance_hint.

Ideally there ought to be a third structure that bundles all the
shared state used by DFA/NDFSA table computation, so all globals would
live in one of three context structures.  I may do that in a later
commit, but this patch is already unpleasantly large as it is.

No diffs in generated test code, nor any logic changes.

#52 in the retargeting patch series.
2020-10-01 17:40:06 -04:00
Eric S. Raymond
5c6661bb2e Eliminate all uses of buf_strdefine().
There were only two left, for YY_MAIN, and that definition
was moved so it's in the visible controls.

This is a step towards making *all* conditionalization symbols
visible in generated comments.

This commit also cleans up some misnamed mode symbols.  There are
still a couple of duplicative pairs, to be cleaned up in a later
commit.

We can now report generated M4 symbols with values in the "m4
controls" part of a generated file. Partly as a result, the following
symbols become visible in generated code from the tests:
M4_MODE_PREFIX, M4_YY_TABLES_VERIFY, M4_YY_REENTRANT, and
M4_MODE_PREFIX.

No other diffs.

#51 in the retargeting patch series. #50 was accidentally
unnumbered.
2020-10-01 01:34:07 -04:00
Eric S. Raymond
7877454e42 Eliminate the epilog member from the method table.
It was a no-op anyway in the C version, there as a placeholder
in case other languages needed it.  But in the new organization
of things, with everything being done by conditional expansion in
the skeleton file, there's no point.

No diffs at all in generated test code.

This does remove some code that was conditioned out, an abandoned
attempt to undefine all #defines at the end of code generation.
2020-09-30 21:18:34 -04:00
Eric S. Raymond
8c2d23d9a8 Eliminate the prolog method from the method table.
Now that all the mode conditionals are visible early, everything that
used to be done in the prolog can be done as conditionalized code in the
skeleton.

Whitespace and comment diffs only.

#49 in the retargeting patch series
2020-09-30 20:53:57 -04:00
Eric S. Raymond
981867ff15 Land the footprint-reporting feature.
Also, clean up some unused and duplicative symbols.

In generated test code, comment and whitespace diffs only
except for YY_INT_ALIGNED going away.

#48 in the retargeting patch series
2020-09-30 08:37:39 -04:00
Eric S. Raymond
31c3f7703d Methodize section marker output and refactor initialization.
The point of this change is to move the setting of the M4_MODE_*
controls up to the front of the generated code so that they can be
used for conditionalization earlier, notably in replacing the prolog
method.  I tried to do this in #46 but didn't move the mode
setting far enough up.

(Also, rename instances of a duplicated mode switch.)

In generated code, the m4 controls move but nothing else changes.

#47 in the retargeting patch series
2020-09-30 07:29:43 -04:00
Eric S. Raymond
abe2c1fe70 Transplant where the mode controls are shipped to earlier.
This should make it possible to eliminate much of the C-specific
prolog code.

Sadly, because of the moves of the generated comments this makes
a rather noisy diff.  All comments and whitespace, though;
what looks like being other than that is pieces of generated code
being shifted around.

#46 in the retargeting patch series
2020-09-29 21:23:46 -04:00
Eric S. Raymond
a9e86a3299 Eliminate two backend methods in favor of m4 expansion.
Produces only whitespace diffs in generated code for tests, except the
order of items in the initializer for table serialization changes.

#45 in the retargeting patch series
2020-09-29 17:18:43 -04:00
Eric S. Raymond
4b952cbf80 Eliminate ugly %define-yytables magic in skelout().
This feature is better implemented with m4 macroexpansion;
that way skelout() does not have to know that #define is a thing.

Also in skelout(), use the backend comment method rather than
embedding knowledge about /* and */, and int_format_define
to factor out knowledge about #define.

Produces only comment diffs in the generated test code.

#44 in the retargeting patch series
2020-09-29 16:53:47 -04:00
Eric S. Raymond
34b84d5823 Narrow the driver interface.
This patch is a pure refactoring step.  It changes the
interface between gen.c and the back end so that the
method table can shed a number of methods and no headers
are generated in gen.c any more.

Most methods now return the amount of memory they
allocate.  Eventually this will be used to add
a report on this to the generated code.

No diffs in generated code, even without ignoring whitespace.

#43 in the retargeting patch series, which turned out
not to be finished after all. There is ugly magic in skelout()
that needs to be factored out.
2020-09-29 04:53:46 -04:00
Eric S. Raymond
103634e4ca Clean up the indentation and brace usage in the skeleton file.
It was in a mix of several different styles that made it hard
to read.  I've massaged it all into K&R with tabs and mandatory
braces.  No logic changes.
2020-09-27 11:08:36 -04:00