public class LexHTML extends LexML
This class differs slightly from LexML as follows: after certain tags, like the `<script>` tag, the body that follows is uninterpreted data and ends only at the next matching closing tag (in this case, `</script>`), not at the next "<" or ">" character. This is one way that HTML is not fully compliant with XML.
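The difference between the two termination rules can be illustrated with a small standalone sketch. It does not use LexHTML or reproduce its implementation: the class name `RawBodySketch` and both method names are hypothetical, and treating an unterminated body as running to the end of the input is an assumption.

```java
/** Standalone illustration only; not the LexHTML implementation. */
class RawBodySketch {
    /** XML-style rule: character data ends at the next '<'. */
    static int endAtNextBracket(String html, int bodyStart) {
        int idx = html.indexOf('<', bodyStart);
        return idx < 0 ? html.length() : idx;
    }

    /**
     * Raw-body rule: data ends only at the next "</tag", matched
     * case-insensitively. Simplification: the character following the
     * tag name is not checked.
     */
    static int endAtClosingTag(String html, int bodyStart, String tag) {
        String close = "</" + tag;
        for (int i = bodyStart; i + close.length() <= html.length(); i++) {
            if (html.regionMatches(true, i, close, 0, close.length())) {
                return i;
            }
        }
        // Assumption: an unterminated body runs to the end of the input.
        return html.length();
    }
}
```

For a script body such as `if (a < b) go();`, the first rule stops at the comparison operator, while the second keeps the whole body intact up to the closing tag.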
The default set of tags that have this special processing is `<script>`, `<style>`, and `<xmp>`. The user can change this by retrieving the Vector of special tags via `getClosingTags` and modifying it as needed.
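As a rough illustration of maintaining such a list, the sketch below builds a Vector with the three default names and checks membership case-insensitively. It is standalone: how LexHTML stores or compares the entries is not stated here, so the String elements, the class name `ClosingTagsSketch`, and its methods are all assumptions.

```java
import java.util.Vector;

/** Standalone illustration only; not the LexHTML implementation. */
class ClosingTagsSketch {
    /** Builds a Vector holding the three default tag names listed above. */
    static Vector<String> defaultClosingTags() {
        Vector<String> tags = new Vector<>();
        tags.addElement("script");
        tags.addElement("style");
        tags.addElement("xmp");
        return tags;
    }

    /** Case-insensitive membership test against the tag list. */
    static boolean isClosingTag(Vector<String> tags, String name) {
        for (String tag : tags) {
            if (tag.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }
}
```

With a real LexHTML instance, the same `addElement` and `removeElement` calls would presumably be applied to the Vector returned by `getClosingTags`.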
Constructor and Description |
---|
LexHTML(java.lang.String str): Creates a new HTML parser, which can be used to iterate over the tokens in the given string. |
Modifier and Type | Method and Description |
---|---|
java.util.Vector | getClosingTags(): Gets the set of HTML tags that have the special body-processing behavior mentioned above. |
java.lang.String | getTag(): Gets the tag name at the beginning of the current tag. |
boolean | nextToken(): Advances to the next token, correctly handling HTML tags that have the special body-processing behavior mentioned above. |
void | replace(java.lang.String str): Changes the string that this LexHTML is parsing. |
Methods inherited from class LexML: findClose, getArgs, getAttributes, getBody, getLocation, getString, getToken, getType, isSingleton, rest
public LexHTML(java.lang.String str)
Parameters: str - The HTML to parse.
public java.util.Vector getClosingTags()
Returns: The Vector of case-insensitive tag names that are only closed by seeing their "slashed" version.
public boolean nextToken()
This method returns the uninterpreted data making up the body of a special HTML tag as a token of type `LexML.STRING`, even if the body was actually a comment or another tag.
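To make the effect on token types concrete, here is a minimal standalone tokenizer sketch. It is not the LexHTML implementation: the `Kind` enum is a hypothetical stand-in for the LexML token types, the special tag names are hard-coded, and the "KIND:text" output format is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration only; not the LexHTML implementation. */
class RawTokenSketch {
    enum Kind { COMMENT, TAG, STRING } // hypothetical stand-ins for the LexML types

    static final String[] SPECIAL = { "script", "style", "xmp" };

    /** Splits html into "KIND:text" entries. */
    static List<String> tokenize(String html) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int end;
            Kind kind;
            if (html.startsWith("<!--", pos)) {
                int close = html.indexOf("-->", pos + 4);
                end = close < 0 ? html.length() : close + 3;
                kind = Kind.COMMENT;
            } else if (html.charAt(pos) == '<') {
                int close = html.indexOf('>', pos);
                end = close < 0 ? html.length() : close + 1;
                kind = Kind.TAG;
            } else {
                int next = html.indexOf('<', pos);
                end = next < 0 ? html.length() : next;
                kind = Kind.STRING;
            }
            String text = html.substring(pos, end);
            out.add(kind + ":" + text);
            pos = end;
            // After a special opening tag, everything up to its closing
            // tag becomes one STRING token, whatever it looks like.
            String name = (kind == Kind.TAG) ? specialName(text) : null;
            if (name != null) {
                int bodyEnd = indexOfIgnoreCase(html, "</" + name, pos);
                if (bodyEnd < 0) bodyEnd = html.length();
                if (bodyEnd > pos) {
                    out.add(Kind.STRING + ":" + html.substring(pos, bodyEnd));
                }
                pos = bodyEnd;
            }
        }
        return out;
    }

    /** Returns the special tag name opened by tagText, or null if none. */
    static String specialName(String tagText) {
        for (String s : SPECIAL) {
            int after = 1 + s.length();
            if (after < tagText.length()
                    && tagText.regionMatches(true, 1, s, 0, s.length())) {
                char c = tagText.charAt(after);
                if (c == '>' || Character.isWhitespace(c)) return s;
            }
        }
        return null;
    }

    /** Case-insensitive indexOf; returns -1 when needle is absent. */
    static int indexOfIgnoreCase(String hay, String needle, int from) {
        for (int i = from; i + needle.length() <= hay.length(); i++) {
            if (hay.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }
}
```

For the input `<script><!-- x --><b></script>`, the sketch yields a TAG for `<script>`, a single STRING holding `<!-- x --><b>`, and then a TAG for `</script>`; the comment and the `<b>` tag inside the body are never reported as tokens of their own.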
public java.lang.String getTag()
Overrides: getTag in class LexML
Returns: `null` if the current token does not have a tag name.
See Also: LexML.getTag()
public void replace(java.lang.String str)
Overrides: replace in class LexML
Parameters: str - The string that this LexHTML should now parse.
See Also: LexML.rest()