org.cyberneko.html.filters

Class Purifier

public class Purifier extends DefaultFilter

This filter purifies the HTML input to ensure XML well-formedness. The purification process includes:

Illegal characters in XML names are converted to the character sequence "_u####_", where "####" is the hexadecimal value of the Unicode character. Illegal characters appearing in document content are converted to the character sequence "\\u####".

In comments, the character '-' is replaced by the character sequence "- " to prevent "--" from ever appearing in the comment content. For CDATA sections, the character ']' is replaced by the character sequence "] " to prevent "]]" from appearing.

The URI used for synthesized namespace bindings is "http://cyberneko.org/html/ns/synthesized/number" where number is generated to ensure uniqueness.
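
The three rules above can be illustrated with a small standalone sketch. It does not use the Purifier class or any NekoHTML type, and it is not the filter's actual implementation. The class name PurifierRulesSketch and every helper in it are hypothetical. The name test is a simplified ASCII-only check, the hexadecimal digits are assumed to be uppercase and padded to four places, "\\u####" is assumed to mean a single backslash followed by "u" and the digits, and the counter is assumed to start at 1.

```java
class PurifierRulesSketch {

    // Prefix taken from the URI pattern quoted in the description above.
    static final String SYNTHESIZED_PREFIX = "http://cyberneko.org/html/ns/synthesized/";

    // Assumption: numbering starts at 1; the description only promises uniqueness.
    private int counter = 1;

    // Simplified ASCII-only tests (assumption); real XML names allow many more characters.
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Rule 1a: illegal name characters become _uXXXX_ (four hex digits).
    static String escapeName(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean legal = (i == 0) ? isNameStart(c) : isNameChar(c);
            if (legal) {
                out.append(c);
            } else {
                out.append("_u").append(String.format("%04X", (int) c)).append('_');
            }
        }
        return out.toString();
    }

    // Simplified content test: tab, LF, CR and anything from 0x20 up,
    // except 0xFFFE and 0xFFFF. Surrogates pass through unchanged here.
    static boolean isLegalContentChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c != 0xFFFE && c != 0xFFFF);
    }

    // Rule 1b: illegal content characters become a backslash, 'u', and four hex digits.
    static String escapeContent(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLegalContentChar(c)) {
                out.append(c);
            } else {
                out.append("\\u").append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Rule 2: a space after every '-' keeps "--" out of comment content.
    static String escapeComment(String text) {
        return text.replace("-", "- ");
    }

    // Rule 2: a space after every ']' keeps "]]" out of CDATA content.
    static String escapeCData(String text) {
        return text.replace("]", "] ");
    }

    // Rule 3: each call yields a fresh URI under the synthesized prefix.
    String nextSynthesizedUri() {
        return SYNTHESIZED_PREFIX + counter++;
    }

    public static void main(String[] args) {
        System.out.println(escapeName("a@b"));      // a_u0040_b
        System.out.println(escapeName("1st"));      // _u0031_st
        System.out.println(escapeComment("a--b"));  // a- - b
        System.out.println(escapeCData("x]]>"));    // x] ] >
        PurifierRulesSketch sketch = new PurifierRulesSketch();
        System.out.println(sketch.nextSynthesizedUri());
        System.out.println(sketch.nextSynthesizedUri());
    }
}
```

Note that the comment and CDATA rules insert a space after every occurrence of the character, not only where two occur in a row, so the escaped text is longer than the input even when the input was already well-formed.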

Version: $Id: Purifier.java,v 1.5 2005/02/14 03:56:54 andyc Exp $

Author: Andy Clark

Field Summary
protected static String AUGMENTATIONS
Include infoset augmentations.
protected boolean fAugmentations
Augmentations.
protected boolean fInCDATASection
True if inside a CDATA section.
protected NamespaceContext fNamespaceContext
Namespace information.
protected boolean fNamespaces
Namespaces.
protected String fPublicId
Public identifier of doctype declaration.
protected boolean fSeenDoctype
True if the doctype declaration was seen.
protected boolean fSeenRootElement
True if root element was seen.
protected int fSynthesizedNamespaceCount
Synthesized namespace binding count.
protected String fSystemId
System identifier of doctype declaration.
protected static String NAMESPACES
Namespaces.
protected static HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.
static String SYNTHESIZED_NAMESPACE_PREFX
Synthesized namespace binding prefix.

Method Summary
void characters(XMLString text, Augmentations augs)
Characters.
void comment(XMLString text, Augmentations augs)
Comment.
void doctypeDecl(String root, String pubid, String sysid, Augmentations augs)
Doctype declaration.
void emptyElement(QName element, XMLAttributes attrs, Augmentations augs)
Empty element.
void endCDATA(Augmentations augs)
End CDATA section.
void endElement(QName element, Augmentations augs)
End element.
protected void handleStartDocument()
Handle start document.
protected void handleStartElement(QName element, XMLAttributes attrs)
Handle start element.
void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.
protected String purifyName(String name, boolean localpart)
Purify name.
protected QName purifyQName(QName qname)
Purify qualified name.
protected XMLString purifyText(XMLString text)
Purify content.
void reset(XMLComponentManager manager)
void startCDATA(Augmentations augs)
Start CDATA section.
void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.
void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.
void startElement(QName element, XMLAttributes attrs, Augmentations augs)
Start element.
protected void synthesizeBinding(XMLAttributes attrs, String ns)
Synthesize namespace binding.
protected Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.
protected static String toHexString(int c, int padlen)
Returns a padded hexadecimal string for the given value.
void xmlDecl(String version, String encoding, String standalone, Augmentations augs)
XML declaration.

Field Detail

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

fAugmentations

protected boolean fAugmentations
Augmentations.

fInCDATASection

protected boolean fInCDATASection
True if inside a CDATA section.

fNamespaceContext

protected NamespaceContext fNamespaceContext
Namespace information.

fNamespaces

protected boolean fNamespaces
Namespaces.

fPublicId

protected String fPublicId
Public identifier of doctype declaration.

fSeenDoctype

protected boolean fSeenDoctype
True if the doctype declaration was seen.

fSeenRootElement

protected boolean fSeenRootElement
True if root element was seen.

fSynthesizedNamespaceCount

protected int fSynthesizedNamespaceCount
Synthesized namespace binding count.

fSystemId

protected String fSystemId
System identifier of doctype declaration.

NAMESPACES

protected static final String NAMESPACES
Namespaces.

SYNTHESIZED_ITEM

protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.

SYNTHESIZED_NAMESPACE_PREFX

public static final String SYNTHESIZED_NAMESPACE_PREFX
Synthesized namespace binding prefix.

Method Detail

characters

public void characters(XMLString text, Augmentations augs)
Characters.

comment

public void comment(XMLString text, Augmentations augs)
Comment.

doctypeDecl

public void doctypeDecl(String root, String pubid, String sysid, Augmentations augs)
Doctype declaration.

emptyElement

public void emptyElement(QName element, XMLAttributes attrs, Augmentations augs)
Empty element.

endCDATA

public void endCDATA(Augmentations augs)
End CDATA section.

endElement

public void endElement(QName element, Augmentations augs)
End element.

handleStartDocument

protected void handleStartDocument()
Handle start document.

handleStartElement

protected void handleStartElement(QName element, XMLAttributes attrs)
Handle start element.

processingInstruction

public void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.

purifyName

protected String purifyName(String name, boolean localpart)
Purify name.

purifyQName

protected QName purifyQName(QName qname)
Purify qualified name.

purifyText

protected XMLString purifyText(XMLString text)
Purify content.

reset

public void reset(XMLComponentManager manager)

startCDATA

public void startCDATA(Augmentations augs)
Start CDATA section.

startDocument

public void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.

startDocument

public void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.

startElement

public void startElement(QName element, XMLAttributes attrs, Augmentations augs)
Start element.

synthesizeBinding

protected void synthesizeBinding(XMLAttributes attrs, String ns)
Synthesize namespace binding.

synthesizedAugs

protected final Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.

toHexString

protected static String toHexString(int c, int padlen)
Returns a padded hexadecimal string for the given value.
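
A padded hexadecimal conversion of this kind might look like the sketch below. It is a standalone illustration, not the method's source: the class name HexPadSketch and the method name toPaddedHex are hypothetical, and the uppercase digits and the behavior for values wider than the pad length are assumptions, since the description above does not specify either.

```java
class HexPadSketch {

    // Converts value to hexadecimal and left-pads it with zeros up to padLen digits.
    // Assumption: values that already need more than padLen digits are returned unpadded.
    static String toPaddedHex(int value, int padLen) {
        StringBuilder sb = new StringBuilder(Integer.toHexString(value).toUpperCase());
        while (sb.length() < padLen) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPaddedHex(0x40, 4));    // 0040
        System.out.println(toPaddedHex(0x1F600, 4)); // 1F600
    }
}
```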

xmlDecl

public void xmlDecl(String version, String encoding, String standalone, Augmentations augs)
XML declaration.
(C) Copyright 2002-2005, Andy Clark. All rights reserved.