Re: about lr(1) parsers
From: Hans Aberg
Subject: Re: about lr(1) parsers
Date: Wed, 2 Aug 2006 19:44:06 +0200
On 2 Aug 2006, at 11:50, chinlu chinawa wrote:
Keywords, or hardwired names, are normally added
in the lexer, which may be generated by Flex. If
you want to have constructs such as 'define
<name>...', one way to do it is to make a lookup
table, where <name> is entered along with
syntactic and semantic data. When the lexer
finds a name, it checks the lookup table, and if
found, returns the correct token and passes the
semantic data to the parser.
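The lookup-table scheme described above can be sketched in C roughly as follows. The token numbers, the `classify` function, and the table contents are all illustrative assumptions, not the actual generated code:

```c
#include <string.h>

/* Hypothetical token numbers -- in a real Bison grammar these come from
 * the %token declarations. */
enum { TOK_IDENTIFIER = 258, TOK_DEFINED_NAME = 259 };

/* One entry of the lookup table: the name, its syntactic data (which
 * token to return), and its semantic data (here, a replacement text). */
struct entry {
    const char *name;
    int token;             /* syntactic data */
    const char *semantic;  /* semantic data passed to the parser */
};

static const struct entry table[] = {
    { "max",  TOK_DEFINED_NAME, "((x)>(y)?(x):(y))" },
    { "some", TOK_DEFINED_NAME, "thing" },
};

/* What the lexer would do on seeing a name: consult the table; if the
 * name is found, return its token and hand back the semantic data,
 * otherwise return a plain identifier token. */
int classify(const char *name, const char **semantic)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0) {
            *semantic = table[i].semantic;
            return table[i].token;
        }
    *semantic = NULL;
    return TOK_IDENTIFIER;
}
```

In a real .l file, the identifier rule's action would call something like `classify(yytext, ...)` and `return` its result.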
Hello Hans,
Um, what do you mean by semantic data?
With respect to the .y file, syntactic data is the grammar, and the
semantic data is what you put into the actions, plus the other
programming.
I've used your example, as I saw a simple cpp would be
a good starting point.
However, I'm far from being able to do things like
this, if that's what you mean by semantic data:
#define max ((x)>(y)?(x):(y))
As it would require expansions and, well, a lot of
things I cannot do yet. Instead I'm using simple
constructions such as:
#define some thing
You have a .y grammar rule that recognizes the '#define' token, plus
the syntax of the definition. Its action should then put the
definition name, 'max' (resp. 'some'), into the lookup table, together
with its definition, '((x)>(y)?(x):(y))' (resp. 'thing'), in some
suitable form. In addition, if you have more than one token type to be
defined, that should also go into the lookup table.
Then the lexer (.l file) has a rule
identifier {...}
that recognizes identifiers like 'max' (resp. 'some'). It then checks
whether the name 'max' (resp. 'some') is present in the table. If
it is not, it just returns a %token identifier.
If the name is present in the lookup table, it uses the
information there for the token return: the token type plus the
semantic value of the token, '((x)>(y)?(x):(y))' (resp. 'thing').
So far, I've started to understand finite automata,
and have implemented one, together with linked lists rather
than perfect hashing and lookup tables, which allows me
to start playing around with #define and #ifdef more easily.
Indeed, since I started to understand finite automata,
the same has happened with lr(1) parsers, and I've got two
questions today; I hope you can help.
An LR parser uses what is called a push-down automaton, which in
addition has a stack of values.
On one hand, I don't seem to fully understand what a
finite automaton should output, and how this output
should be fed to the parser.
If you use a program like Flex, the lexer it generates feeds the
parser that Bison generates with token numbers. You define these
token numbers by putting %token ... in the .y file.
As you said, I'm using hardcoded keywords, and can
recognize for example when a #define, #ifdef or just a
comment is coming over, though none of them looks like
a good example for this question.
Supposing that I have this:
sum 1+1
And that I recognize `sum' with the finite automaton,
what should I pass to the parser (apart from 1+1, I
guess)?
If 'sum' is a keyword, i.e., hardcoded, then you have in the .y file say
%token SUM "sum"
%token PLUS "+"
...
%%
summation:
"sum" NUMBER "+" NUMBER { $$ = $2 + $4; } /* Or: SUM, PLUS */
...
The lexer (.l file) then has rules like
"sum" { return SUM; }
"+" { return PLUS; }
If you want to have a loop, then you would need to build a "closure",
i.e., some code that can be executed after the parsing has taken
place.
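To make the lexer/parser split concrete, here is a hand-written sketch of what the generated code does for 'sum 1+1': a stand-in yylex() returns token numbers and fills yylval, and the parse routine does what the rule's action { $$ = $2 + $4; } would do. The function names and token values are illustrative, not Bison's actual output:

```c
#include <ctype.h>

/* Hypothetical token numbers, standing in for Bison's %token values. */
enum { TOK_SUM = 258, TOK_PLUS, TOK_NUMBER, TOK_EOF };

static const char *input;
static int yylval;              /* semantic value, as in Bison */

/* A hand-written stand-in for the Flex-generated yylex(): it returns
 * token numbers and leaves the semantic value of numbers in yylval. */
static int yylex(void)
{
    while (*input == ' ') input++;
    if (*input == '\0') return TOK_EOF;
    if (*input == '+') { input++; return TOK_PLUS; }
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return TOK_NUMBER;
    }
    /* 'sum' is the only keyword in this toy language. */
    if (input[0] == 's' && input[1] == 'u' && input[2] == 'm') {
        input += 3;
        return TOK_SUM;
    }
    return -1;                  /* lexical error */
}

/* Hand-parse 'sum NUMBER + NUMBER', doing what the rule's action
 * { $$ = $2 + $4; } would do on reduction.  Returns -1 on a syntax
 * error. */
int parse_summation(const char *text)
{
    int lhs, rhs;
    input = text;
    if (yylex() != TOK_SUM) return -1;
    if (yylex() != TOK_NUMBER) return -1;
    lhs = yylval;               /* $2 */
    if (yylex() != TOK_PLUS) return -1;
    if (yylex() != TOK_NUMBER) return -1;
    rhs = yylval;               /* $4 */
    return lhs + rhs;           /* $$ = $2 + $4 */
}
```

So the answer to "what should I pass?" is: just the token numbers, one per call, with the numeric values going through yylval.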
I thought feeding a parser with `1+1' would output the
result of that operation, though if you look at this:
http://en.wikipedia.org/wiki/LR_parser
At the end of the example part, it says:
"The rule numbers that will then have been written to
the output stream will be [ 5, 3, 5, 2 ] which is
indeed a rightmost derivation of the string "1 + 1" in
reverse."
And my question in this regard is: what am I supposed
to do with that `[ 5, 3, 5, 2 ]'?
You do not really have to worry about that, as long as you write your
rule actions semantically correctly. I.e., you only need to worry about
how $$ should be computed from the $k values.
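To see why: that [ 5, 3, 5, 2 ] is just the order in which rules were reduced, and Bison runs each rule's action, against the value stack, at exactly those reductions. A minimal sketch (rule numbering follows the Wikipedia example's grammar, with 2: E -> E + B, 3: E -> B, 5: B -> 1; the `replay` helper is hypothetical):

```c
/* Replay a reduction sequence against a value stack, doing what each
 * rule's action would do.  '+' carries no semantic value here, so only
 * the E and B values live on the stack. */
static int stack[16];
static int top = 0;

static void reduce(int rule)
{
    switch (rule) {
    case 5: stack[top++] = 1; break;            /* B -> 1   : $$ = 1       */
    case 4: stack[top++] = 0; break;            /* B -> 0   : $$ = 0       */
    case 3: break;                              /* E -> B   : $$ = $1      */
    case 2: top--;                              /* E -> E+B : $$ = $1 + $3 */
            stack[top - 1] += stack[top];
            break;
    }
}

/* Run a whole reduction sequence and return the final semantic value. */
int replay(const int *rules, int n)
{
    int i;
    top = 0;
    for (i = 0; i < n; i++)
        reduce(rules[i]);
    return stack[0];
}
```

Replaying [ 5, 3, 5, 2 ] leaves the value 2 on the stack, i.e. the result of 1 + 1 -- which is why, with correct actions, the rule-number stream itself is nothing you ever need to look at.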
Hans Aberg