java_cup
Class lexer
public class lexer
extends Object
This class implements a small scanner (aka lexical analyzer or lexer) for
the JavaCup specification. This scanner reads characters from standard
input (System.in) and returns a Symbol whose number identifies the next
terminal. Once end of input is reached, the EOF Symbol is returned on
every subsequent call.
Symbols currently returned include:
Symbol         Constant Returned     Symbol         Constant Returned
------         -----------------     ------         -----------------
"package"      PACKAGE               "import"       IMPORT
"code"         CODE                  "action"       ACTION
"parser"       PARSER                "terminal"     TERMINAL
"non"          NON                   "init"         INIT
"scan"         SCAN                  "with"         WITH
"start"        START                 "precedence"   PRECEDENCE
"left"         LEFT                  "right"        RIGHT
"nonassoc"     NONASSOC              "%prec"        PERCENT_PREC
[              LBRACK                ]              RBRACK
;              SEMI
,              COMMA                 *              STAR
.              DOT                   :              COLON
::=            COLON_COLON_EQUALS    |              BAR
identifier     ID                    {:...:}        CODE_STRING
"nonterminal"  NONTERMINAL
All symbol constants are defined in sym.java which is generated by
JavaCup from parser.cup.
In addition to the scanner proper (called first via init() then with
next_token() to get each Symbol) this class provides simple error and
warning routines and keeps a count of errors and warnings that is
publicly accessible.
This class is "static" (i.e., it has only static members and methods).
Version: last updated: 7/3/96
Author: Frank Flannery
Field Summary |
protected static int | absolute_position Absolute character position in the input. |
protected static Hashtable | char_symbols Table of single character symbols. |
protected static int | current_line Current line number for use in error messages. |
protected static int | current_position Character position in current line. |
static int | error_count Count of total errors detected so far. |
protected static int | EOF_CHAR EOF constant. |
protected static Hashtable | keywords Table of keywords. |
protected static int | next_char First character of lookahead. |
protected static int | next_char2 Second character of lookahead. |
protected static int | next_char3 Third character of lookahead. |
protected static int | next_char4 Fourth character of lookahead. |
static int | warning_count Count of warnings issued so far. |
Method Summary |
protected static void | advance() Advance the scanner one character in the input stream. |
static Symbol | debug_next_token() Debugging version of next_token(). |
protected static Symbol | do_code_string() Swallow up a code string. |
protected static Symbol | do_id() Process an identifier. |
static void | emit_error(String message) Emit an error message. |
static void | emit_warn(String message) Emit a warning message. |
protected static int | find_single_char(int ch) Try to look up a single-character symbol; returns -1 if not found. |
protected static boolean | id_char(int ch) Determine if a character is ok for the middle of an id. |
protected static boolean | id_start_char(int ch) Determine if a character is ok to start an id. |
static void | init() Initialize the scanner. |
static Symbol | next_token() Return one Symbol. |
protected static Symbol | real_next_token() The actual routine to return one Symbol. |
protected static void | swallow_comment() Handle swallowing up a comment. |
protected static int absolute_position
Absolute character position in the input.
protected static Hashtable char_symbols
Table of single character symbols. For ease of implementation, we
store all unambiguous single character Symbols in this table of Integer
objects keyed by Integer objects with the numerical value of the
appropriate char (currently Character objects have a bug which precludes
their use in tables).
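The table layout described above can be sketched as follows. This is only an illustration: the class name, the camelCase method name, and the symbol numbers are hypothetical (the real constants come from sym.java), and ':' is left out on the assumption that it must be told apart from "::=" by other means.

```java
import java.util.Hashtable;

// Hypothetical stand-in for the char_symbols table. The symbol numbers
// below are invented; the real ones come from the generated sym.java.
final class CharSymbolsSketch {
    static final int SEMI = 1, COMMA = 2, STAR = 3, DOT = 4,
                     BAR = 5, LBRACK = 6, RBRACK = 7;

    // Integer keys hold the numeric value of each character.
    static final Hashtable<Integer, Integer> charSymbols = new Hashtable<>();

    static {
        charSymbols.put((int) ';', SEMI);
        charSymbols.put((int) ',', COMMA);
        charSymbols.put((int) '*', STAR);
        charSymbols.put((int) '.', DOT);
        charSymbols.put((int) '|', BAR);
        charSymbols.put((int) '[', LBRACK);
        charSymbols.put((int) ']', RBRACK);
    }

    // Returns the symbol number for ch, or -1 if ch is not in the table.
    static int findSingleChar(int ch) {
        Integer symbol = charSymbols.get(ch);
        return symbol == null ? -1 : symbol;
    }
}
```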
protected static int current_line
Current line number for use in error messages.
protected static int current_position
Character position in current line.
public static int error_count
Count of total errors detected so far.
protected static final int EOF_CHAR
EOF constant.
protected static Hashtable keywords
Table of keywords. Keywords are initially treated as identifiers.
Just before they are returned we look them up in this table to see if
they match one of the keywords. The string of the name is the key here,
which indexes Integer objects holding the symbol number.
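A minimal sketch of the keyword lookup described above, assuming invented symbol numbers and a hypothetical classify() helper (neither appears in the original class): the scanned text is treated as an identifier unless the table says otherwise.

```java
import java.util.Hashtable;

// Hypothetical illustration of the keywords table. Only a few keywords
// are listed and the symbol numbers are made up for the example.
final class KeywordSketch {
    static final int ID = 0, PACKAGE = 1, IMPORT = 2, NON = 3, TERMINAL = 4;

    // Keyword text is the key; the value is the symbol number.
    static final Hashtable<String, Integer> keywords = new Hashtable<>();

    static {
        keywords.put("package", PACKAGE);
        keywords.put("import", IMPORT);
        keywords.put("non", NON);
        keywords.put("terminal", TERMINAL);
    }

    // Text that is not a keyword stays an ordinary identifier.
    static int classify(String text) {
        Integer symbol = keywords.get(text);
        return symbol == null ? ID : symbol;
    }
}
```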
protected static int next_char
First character of lookahead.
protected static int next_char2
Second character of lookahead.
protected static int next_char3
Third character of lookahead.
protected static int next_char4
Fourth character of lookahead.
public static int warning_count
Count of warnings issued so far.
protected static void advance()
Advance the scanner one character in the input stream. This moves
next_char2 to next_char and then reads a new next_char2.
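The shifting step can be sketched with two lookahead characters, as the description above states. This sketch makes several assumptions: it reads from a java.io.Reader rather than System.in so it can be exercised on a string, the value of EOF_CHAR is taken to be -1, and the line and position bookkeeping is a guess at how current_line and current_position might be maintained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical two-character lookahead buffer; names are illustrative.
final class LookaheadSketch {
    static final int EOF_CHAR = -1;   // assumed value
    static int nextChar = EOF_CHAR;
    static int nextChar2 = EOF_CHAR;
    static int currentLine = 1;
    static int currentPosition = 1;
    private static Reader in = new StringReader("");

    // Prime both lookahead slots from the given source.
    static void init(Reader source) {
        in = source;
        currentLine = 1;
        currentPosition = 1;
        nextChar = read();
        nextChar2 = (nextChar == EOF_CHAR) ? EOF_CHAR : read();
    }

    // Shift nextChar2 into nextChar, then refill nextChar2.
    static void advance() {
        int oldChar = nextChar;
        nextChar = nextChar2;
        nextChar2 = (nextChar == EOF_CHAR) ? EOF_CHAR : read();
        if (oldChar == '\n') {
            currentLine++;
            currentPosition = 1;
        } else if (oldChar != EOF_CHAR) {
            currentPosition++;
        }
    }

    private static int read() {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Once the end of input is reached, both slots hold EOF_CHAR and further calls to advance() leave them unchanged.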
public static Symbol debug_next_token()
Debugging version of next_token(). This routine calls the real scanning
routine, prints a message on System.out indicating what the Symbol is,
then returns it.
protected static Symbol do_code_string()
Swallow up a code string. Code strings begin with "{:" and include
all characters up to the first occurrence of ":}" (there is no way to
include ":}" inside a code string). The routine returns a Symbol
holding the String, suitable for return by the scanner.
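The delimiter rule above can be sketched as a scan over a String. This is an illustration only: the class and method names are hypothetical, the sketch works on a String index rather than the lookahead characters, and returning null for an unterminated code string is an assumption about error handling.

```java
// Hypothetical sketch of code-string scanning; not the original routine.
final class CodeStringSketch {
    // Returns the text between "{:" at index start and the first ":}",
    // or null if the closing delimiter never appears.
    static String scanCodeString(String input, int start) {
        if (!input.startsWith("{:", start)) {
            throw new IllegalArgumentException("expected \"{:\" at index " + start);
        }
        StringBuilder result = new StringBuilder();
        int i = start + 2;
        while (i < input.length()) {
            boolean closing = input.charAt(i) == ':'
                    && i + 1 < input.length()
                    && input.charAt(i + 1) == '}';
            if (closing) {
                return result.toString();
            }
            result.append(input.charAt(i));
            i++;
        }
        return null; // unterminated code string
    }
}
```

Because the scan stops at the first ":}", text such as `{: a :} b :}` yields only ` a `, which is why ":}" cannot appear inside a code string.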
protected static Symbol do_id()
Process an identifier. Identifiers begin with a letter, underscore,
or dollar sign, which is followed by zero or more letters, numbers,
underscores or dollar signs. This routine returns a Symbol suitable
for return by the scanner.
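The identifier rule above can be sketched as two character tests and a scan loop. The names are hypothetical, "letters" is assumed to mean ASCII letters, and returning an empty string when no identifier starts at the given index is a choice made for the example.

```java
// Hypothetical sketch of the identifier rule; ASCII letters are assumed.
final class IdSketch {
    // Letter, underscore, or dollar sign may start an identifier.
    static boolean idStartChar(int ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
                || ch == '_' || ch == '$';
    }

    // Digits are additionally allowed after the first character.
    static boolean idChar(int ch) {
        return idStartChar(ch) || (ch >= '0' && ch <= '9');
    }

    // Returns the identifier starting at index start, or "" if none does.
    static String scanId(String input, int start) {
        if (start >= input.length() || !idStartChar(input.charAt(start))) {
            return "";
        }
        int end = start + 1;
        while (end < input.length() && idChar(input.charAt(end))) {
            end++;
        }
        return input.substring(start, end);
    }
}
```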
public static void emit_error(String message)
Emit an error message. The message will be marked with both the
current line number and the position in the line. Error messages
are printed on standard error (System.err).
Parameters: message - the message to print.
public static void emit_warn(String message)
Emit a warning message. The message will be marked with both the
current line number and the position in the line. Messages are
printed on standard error (System.err).
Parameters: message - the message to print.
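The two reporting routines above can be sketched as follows. The exact message layout is an assumption (the description only says that the line number and position are included), and the class, method, and field names here are hypothetical.

```java
// Hypothetical sketch of the error and warning routines. The message
// format "Kind at line(position): message" is assumed, not documented.
final class DiagnosticsSketch {
    static int errorCount = 0;
    static int warningCount = 0;
    static int currentLine = 1;
    static int currentPosition = 1;

    // Builds the text that is printed; kept separate so it can be checked.
    static String format(String kind, String message) {
        return kind + " at " + currentLine + "(" + currentPosition + "): " + message;
    }

    // Prints to standard error and bumps the public error counter.
    static void emitError(String message) {
        System.err.println(format("Error", message));
        errorCount++;
    }

    // Prints to standard error and bumps the public warning counter.
    static void emitWarn(String message) {
        System.err.println(format("Warning", message));
        warningCount++;
    }
}
```

A caller can inspect the counters after scanning to decide whether to continue, for example by stopping when errorCount is non-zero.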
protected static int find_single_char(int ch)
Try to look up a single-character symbol; returns -1 if not found.
Parameters: ch - the character in question.
protected static boolean id_char(int ch)
Determine if a character is ok for the middle of an id.
Parameters: ch - the character in question.
protected static boolean id_start_char(int ch)
Determine if a character is ok to start an id.
Parameters: ch - the character in question.
public static void init()
Initialize the scanner. This sets up the keywords and char_symbols
tables and reads the first two characters of lookahead.
public static Symbol next_token()
Return one Symbol. This is the main external interface to the scanner.
It consumes sufficient characters to determine the next input Symbol
and returns it. To help with debugging, this routine actually calls
real_next_token() which does the work. If you need to debug the
parser, this can be changed to call debug_next_token() which prints
a debugging message before returning the Symbol.
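The delegation described above can be sketched like this. To stay self-contained the sketch makes several substitutions: tokens are plain Strings fed from a fixed list instead of Symbols scanned from input, and a runtime flag selects the debugging path, whereas the description speaks of editing next_token() itself. All names are hypothetical.

```java
// Hypothetical sketch of next_token() delegating to a real routine or
// to a debugging wrapper around it.
final class TokenDispatchSketch {
    static final String EOF = "EOF";
    static boolean debug = false;       // assumed switch, not in the original
    private static String[] pending = {};
    private static int index = 0;

    // Loads the token stream that the "real" routine will hand out.
    static void init(String... tokens) {
        pending = tokens.clone();
        index = 0;
    }

    // Does the work; keeps returning EOF once the input is used up.
    static String realNextToken() {
        return index < pending.length ? pending[index++] : EOF;
    }

    // Calls the real routine, reports the result, then returns it.
    static String debugNextToken() {
        String token = realNextToken();
        System.out.println("# next token => " + token);
        return token;
    }

    // Main entry point: a thin wrapper over one of the two routines.
    static String nextToken() {
        return debug ? debugNextToken() : realNextToken();
    }
}
```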
protected static Symbol real_next_token()
The actual routine to return one Symbol. This is normally called from
next_token(), but for debugging purposes can be called indirectly from
debug_next_token().
protected static void swallow_comment()
Handle swallowing up a comment. Both old-style C comments and new-style
C++ comments are handled.
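A minimal sketch of skipping both comment styles, assuming a String-and-index interface rather than the lookahead characters. The names are hypothetical, and treating an unterminated C comment as running to the end of input is an assumption about error handling.

```java
// Hypothetical sketch of comment skipping; not the original routine.
final class CommentSketch {
    // Returns the index of the first character after the comment that
    // starts at index start, or start itself if no comment begins there.
    static int skipComment(String input, int start) {
        if (input.startsWith("/*", start)) {
            // C style: runs to the first "*/".
            int end = input.indexOf("*/", start + 2);
            return end < 0 ? input.length() : end + 2;
        }
        if (input.startsWith("//", start)) {
            // C++ style: runs to the end of the line (terminator not consumed).
            int i = start + 2;
            while (i < input.length()
                    && input.charAt(i) != '\n'
                    && input.charAt(i) != '\r') {
                i++;
            }
            return i;
        }
        return start;
    }
}
```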