PocketLearn J2ME HTML Component

Accurate rendering of HTML on limited resource Java devices

 
 

Specification

Supported HTML Tags and Attributes

The following table contains all the HTML tags and their respective attributes that are supported by the component. The term "NO-OP" means that although the tag is in the list of supported tags, it is listed for completeness and requires no processing per se.

| HTML Tag | Supported Attributes | Notes |
| --- | --- | --- |
| <!-- --> | None | Structure tag… NO-OP |
| A | Href | Generates event |
| ABBR | None | |
| ACRONYM | None | |
| ADDRESS | None | |
| B | None | |
| BASEFONT | Size, Color, Face | |
| BDO | Dir (ltr/rtl) | |
| BIG | None | |
| BLOCKQUOTE | None | |
| BODY | Bgcolor, Text, Background, Link, Vlink, Alink | |
| BR | Clear (left/right/all) | |
| CAPTION | Align (top/bottom/left/right) | |
| CENTER | None | |
| CITE | None | |
| CODE | None | |
| DD | None | |
| DEL | None | No special handling, just display the text |
| DFN | None | |
| DIR | None | |
| DIV | Align (right/left/center/justify) | |
| DL | None | |
| DT | None | |
| EM | None | |
| FONT | Size, Color, Face | |
| H1 – H6 | Align (left/center/right/justify) | |
| HEAD | None | Structure tag… NO-OP |
| HR | Align (left/center/right), Size, Width, Noshade | |
| HTML | None | Structure tag… NO-OP |
| I | None | |
| IMG | Width, Height, Border, Vspace, Hspace, Align, Src | Generates event |
| INS | None | No special handling, just display the text |
| KBD | None | |
| LI | Type (disc/square/circle/1/a/A/i/I), Value | |
| MENU | None | |
| OL | Type (1/a/A/i/I), Start | |
| P | Align (left/center/right/justify) | |
| PRE | None | |
| Q | None | |
| S | None | Handled like <strike> |
| SAMP | None | |
| SMALL | None | |
| STRIKE | None | |
| STRONG | None | |
| SUB | None | |
| SUP | None | |
| TABLE | Width, Height, Border, Cellspacing, Cellpadding | |
| TD | Width, Height, Nowrap, Align (left/center/right/justify), Valign (top/middle/bottom/baseline) | |
| TH | Width, Height, Nowrap, Align (left/center/right/justify), Valign (top/middle/bottom/baseline) | |
| TITLE | None | Structure tag… NO-OP |
| TR | Align, Valign | |
| TT | None | |
| U | None | |
| UL | Type (disc/circle/square) | |
| VAR | None | |

The J2ME HTML Renderer Component

The WebBrowser component is a displayable javax.microedition.lcdui.Canvas implementation that allows for the loading, parsing, layout, and display of standard HTML documents.

Loading

The ResourceListener Interface

To allow for customized, application-specific loading of HTML documents, the logic controlling the location of raw HTML resources is abstracted by the ResourceListener interface. This interface defines only one method:

InputStream loadResource(String uriOfHtmlDocument, Object info)

The sole purpose of a ResourceListener instance is to attempt to locate the requested HTML document, based on the URI provided, and return the document data as a readable stream. The format of this URI will be that requested from within the browser. For example, if the user activates an HTML link within a document, the "href" attribute of that link will be the target URI that is eventually passed to the ResourceListener’s implementation of loadResource. Similarly, if a direct navigation request is passed to the browser (using the browser navigateTo method), the URI provided to the browser is the URI passed to the attached ResourceListener implementation. Every WebBrowser instance must have a single ResourceListener implementation attached to it, using the following method:

WebBrowser.setHtmlLoader(ResourceListener newLoader)

If no loader is provided for a WebBrowser, no HTML documents can be located or displayed on that browser instance.
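As an illustration of the contract described above, the following is a minimal sketch of a loader that serves documents from memory. It is written against plain Java SE so it can be run off-device. The ResourceListener interface is re-declared locally as a stand-in with the signature given above; the InMemoryLoader class, its put method, and the choice to return null for an unknown URI are assumptions made for this example, not part of the component.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class InMemoryLoaderDemo {

    /** Stand-in for the component's ResourceListener; the signature is the one given in the text. */
    interface ResourceListener {
        InputStream loadResource(String uriOfHtmlDocument, Object info);
    }

    /** Hypothetical loader that serves HTML from a map keyed by URI. */
    static class InMemoryLoader implements ResourceListener {
        private final Map<String, String> pages = new HashMap<>();

        void put(String uri, String html) {
            pages.put(uri, html);
        }

        @Override
        public InputStream loadResource(String uriOfHtmlDocument, Object info) {
            String html = pages.get(uriOfHtmlDocument);
            if (html == null) {
                return null; // assumption: null signals "document not found"
            }
            return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Drains a stream into a string so the loaded document can be inspected. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InMemoryLoader loader = new InMemoryLoader();
        loader.put("/index.html", "<html><body>Hello World</body></html>");
        System.out.println(readAll(loader.loadResource("/index.html", null)));
        System.out.println(loader.loadResource("/missing.html", null)); // prints "null"
    }
}
```

On a device, a loader of this shape would more likely read from the application JAR, a record store, or a network connection; the map keeps the sketch self-contained.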

The ImageResourceListener Interface

It is quite common for HTML documents to have external image content embedded within them, using the standard <img> tag or by specifying a background image. Once an HTML document is loaded by the browser, any related image data must be located as well. The responsibility of loading this data is removed from the WebBrowser class by the ImageResourceListener interface. This interface defines only one method:

Image loadImage(String uriOfImage, Object info)

In much the same fashion as the ResourceListener interface, this method must locate the resource at the target URI provided. However, in addition to simply locating the target data, an ImageResourceListener implementation is also responsible for reading the located data and creating the appropriate javax.microedition.lcdui.Image instance it represents. If no Image can be located or appropriately parsed, the listener must return null, indicating failure. To allow for the loading of multiple image schemes and formats from within the same WebBrowser instance, any number of ImageResourceListener implementations can be attached to a browser. A loader instance can be bound to a browser using the following method:

WebBrowser.addImageResourceListener(ImageResourceListener p)

The order in which ImageResourceListeners are added to the browser determines the order in which each image loader is polled when an image is required. Consider the following code:

WebBrowser browser = …

browser.addImageResourceListener(new MyJpegLoader());

browser.addImageResourceListener(new MyGifLoader());

browser.addImageResourceListener(new DefaultPngLoader());

When the browser needs to load an image, the MyJpegLoader ImageResourceListener instance is queried with the target image URI. If it cannot load the requested image (i.e. it returns null), the next image loader, the MyGifLoader instance, is queried, and so on. If none of the image loaders registered with a browser can locate an Image object for the given URI, the browser will be unable to display the target image.
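The polling order described above can be sketched as a simple loop over the registered loaders. This is an illustration only, written for plain Java SE: the Image class here is a stand-in for javax.microedition.lcdui.Image, ImageResourceListener is re-declared with the signature given above, and ExtensionLoader and resolve are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ImageLoaderChainDemo {

    /** Stand-in for javax.microedition.lcdui.Image so the sketch runs off-device. */
    static class Image {
        final String description;

        Image(String description) {
            this.description = description;
        }
    }

    /** Stand-in mirroring the ImageResourceListener signature given in the text. */
    interface ImageResourceListener {
        Image loadImage(String uriOfImage, Object info);
    }

    /** Hypothetical loader that only accepts URIs ending in one file extension. */
    static class ExtensionLoader implements ImageResourceListener {
        private final String extension;

        ExtensionLoader(String extension) {
            this.extension = extension;
        }

        @Override
        public Image loadImage(String uriOfImage, Object info) {
            if (!uriOfImage.toLowerCase(Locale.ROOT).endsWith(extension)) {
                return null; // this loader cannot handle the URI
            }
            return new Image(extension + " image from " + uriOfImage);
        }
    }

    /** Queries loaders in registration order; the first non-null result wins. */
    static Image resolve(List<ImageResourceListener> loaders, String uri) {
        for (ImageResourceListener loader : loaders) {
            Image image = loader.loadImage(uri, null);
            if (image != null) {
                return image;
            }
        }
        return null; // no registered loader could supply the image
    }

    public static void main(String[] args) {
        List<ImageResourceListener> loaders = new ArrayList<>();
        loaders.add(new ExtensionLoader(".jpg"));
        loaders.add(new ExtensionLoader(".gif"));
        loaders.add(new ExtensionLoader(".png"));

        System.out.println(resolve(loaders, "/logo.png").description); // prints ".png image from /logo.png"
        System.out.println(resolve(loaders, "/logo.bmp"));             // prints "null"
    }
}
```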

The NavigationListener Interface

Externally, a WebBrowser object provides the ability to request that a specific HTML document be loaded, via the method:

WebBrowser.navigateTo(String uriOfHtmlDocument)

However, this does not address any internal requests for HTML documents that may be generated, for example, when a user activates a link. It would be limiting to expect the browser itself to handle such internal requests. Instead, the WebBrowser object inspects every internal request it receives. If the browser decides the request is for a location within the current document (e.g. #topOfPage), the browser will handle the event internally and the containing application will not be notified. In any other case, however, the requested link "href" URI is broadcast to any navigation listeners attached to the browser. NavigationListeners are bound to a WebBrowser instance using the following method:

WebBrowser.addNavigationListener(NavigationListener listener)

The NavigationListener interface defines one method:

void navigateTo(String uriOfRequest, Object src, Object info)

This interface allows the containing application to determine which document requests it wants to process, and which it wants to ignore. In general, a NavigationListener implementation waits for a document request, determines whether or not to allow it and, if so, performs any required URI formatting before calling the WebBrowser navigateTo method with the target URI.
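A minimal sketch of such a listener follows. It is an illustration written for plain Java SE: NavigationListener is re-declared with the signature given above, FakeBrowser is a stand-in that merely records the URI it is asked to load, and the policy shown (ignore absolute http/https links, resolve relative links against a base path) is an assumption chosen for the example.

```java
class NavigationFilterDemo {

    /** Stand-in mirroring the NavigationListener signature given in the text. */
    interface NavigationListener {
        void navigateTo(String uriOfRequest, Object src, Object info);
    }

    /** Stand-in for the browser: it only records the last URI it was asked to load. */
    static class FakeBrowser {
        String lastLoaded;

        void navigateTo(String uri) {
            lastLoaded = uri;
        }
    }

    /** Hypothetical listener that allows only local documents. */
    static class LocalOnlyListener implements NavigationListener {
        private final FakeBrowser browser;
        private final String basePath;

        LocalOnlyListener(FakeBrowser browser, String basePath) {
            this.browser = browser;
            this.basePath = basePath;
        }

        @Override
        public void navigateTo(String uriOfRequest, Object src, Object info) {
            // Decide whether to allow the request (assumed policy: no external links).
            if (uriOfRequest.startsWith("http://") || uriOfRequest.startsWith("https://")) {
                return;
            }
            // Format the URI: relative links are resolved against the base path.
            String target = uriOfRequest.startsWith("/") ? uriOfRequest : basePath + uriOfRequest;
            // Hand the formatted URI back to the browser.
            browser.navigateTo(target);
        }
    }

    public static void main(String[] args) {
        FakeBrowser browser = new FakeBrowser();
        NavigationListener listener = new LocalOnlyListener(browser, "/docs/");

        listener.navigateTo("page2.html", null, null);
        System.out.println(browser.lastLoaded); // prints "/docs/page2.html"

        listener.navigateTo("http://example.com/", null, null);
        System.out.println(browser.lastLoaded); // still "/docs/page2.html"
    }
}
```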

Parsing

The HtmlDocumentProvider Class

In addition to the above loading-related interfaces, the WebBrowser also delegates all HTML parsing duties to another component, the HtmlDocumentProvider. The HtmlDocumentProvider depends heavily on the WebBrowser component itself for all loading abilities, and on the HtmlParser class for all text parsing. As such, it acts primarily as a translator between the browser and the parser. The default HtmlDocumentProvider implementation has three main purposes:

  • Use the ResourceListener implementation of the parent browser to load the raw HTML data stream associated with a requested URI.
  • Pass the loaded raw HTML data off to an HTML parser, and use the parsed document to create a displayable IHtmlDocument instance.
  • Use the browser’s ImageResourceListener implementations to locate any embedded image content.

In short, the HtmlDocumentProvider accepts a requested URI from the browser, and in turn provides an IHtmlDocument instance, ready for layout and display.

The HtmlParser Class

The HtmlParser class is responsible for tokenizing an input stream of raw input data and notifying a given document handler of any parse events. In essence, it is a simple SAX-like parser for HTML, instead of XML. The class exposes a single static method used to parse a text stream, and a public child interface, HtmlParser.DocumentHandler. As the raw document HTML data is being parsed, events are broadcast to the registered HtmlParser.DocumentHandler, alerting it to the change in state. The parse events currently supported by the parser are:

  • startDocument
    Triggered before the beginning tag of a new HTML document.
  • startElement
    Triggered when a new tag element is parsed (e.g. <a href="#top">). The name of the tag and its attributes are made available.
  • endElement
    Triggered when the current tag element ends (e.g. </a>).
  • entity
    Triggered when an entity is found in the HTML. The entity code is made available.
  • text
    Triggered when plain text is available within an element.
  • endDocument
    Triggered after the end tag of an HTML document.

Currently, the HtmlDocumentProvider class implements the HtmlParser.DocumentHandler interface and serves as the default handler. As such, an HtmlDocumentProvider instance reacts to the parse events exposed from the HtmlParser.parse method and handles creating the appropriate HTML elements in the proper order.
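To make the event model concrete, the sketch below shows a handler that turns parse events into an indented outline. Only the event names come from the list above; the method signatures of the DocumentHandler interface declared here, the OutlineHandler class, and the demo method are assumptions made for illustration. Because the real parser is not available off-device, the events are fired by hand in the order a parser might plausibly emit them.

```java
import java.util.Collections;
import java.util.Map;

class ParseEventOutlineDemo {

    /** Hypothetical shape of HtmlParser.DocumentHandler; only the event names are from the text. */
    interface DocumentHandler {
        void startDocument();
        void startElement(String name, Map<String, String> attributes);
        void endElement(String name);
        void entity(String code);
        void text(String content);
        void endDocument();
    }

    /** Records each event as one indented line, nesting by element depth. */
    static class OutlineHandler implements DocumentHandler {
        private final StringBuilder out = new StringBuilder();
        private int depth = 0;

        private void line(String s) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(s).append('\n');
        }

        @Override public void startDocument() { line("document"); depth++; }
        @Override public void startElement(String name, Map<String, String> attributes) {
            line("<" + name + ">");
            depth++;
        }
        @Override public void endElement(String name) { depth--; }
        @Override public void entity(String code) { line("&" + code + ";"); }
        @Override public void text(String content) { line("\"" + content + "\""); }
        @Override public void endDocument() { depth--; }

        String outline() {
            return out.toString();
        }
    }

    /** Fires a hand-written event sequence at the handler and returns the outline. */
    static String demo() {
        OutlineHandler handler = new OutlineHandler();
        Map<String, String> noAttributes = Collections.emptyMap();
        handler.startDocument();
        handler.startElement("body", noAttributes);
        handler.startElement("h1", noAttributes);
        handler.text("Hello World");
        handler.endElement("h1");
        handler.entity("nbsp");
        handler.endElement("body");
        handler.endDocument();
        return handler.outline();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```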

Layout and Rendering

Non-frames based HTML documents, by definition, consist of a single body container. This container will contain other elements, which may be containers themselves and contain further elements. In this manner, a container tree rooted at the HTML body is created, defining the structure of an HTML document. This structure makes it easy to traverse an HTML document in a top-down manner, and the layout and rendering operations of an IHtmlDocument rely heavily on it.

The IHtmlElement Interface

The base building block of an HTML document is the element. All visible and non-visible elements contained within a document make up the document content and structure. In general, any recognized tag within the HTML stream is interpreted as an element. A complete HTML document consists of a collection of these elements. The IHtmlElement interface exposes one property, a CssStyle collection.

The IHtmlContainer Interface

As indicated above, in order to represent the structure of an HTML document, certain elements must also be able to contain other HTML elements. The IHtmlContainer interface, which is itself an IHtmlElement, defines the structure and methods of any HTML element that may hold other elements. This includes the HTML body element, for example, which contains every other element within the document. The IHtmlContainer interface defines vertical and horizontal alignment properties, in addition to the basic container methods.

The IHtmlDisplayable Interface

Some HTML elements will have a visible representation, and some will not. For example, the ‘META’ and ‘EM’ elements have no visible representation in an HTML document, whereas the ‘DIV’ and ‘A’ elements do have a visible representation. The IHtmlDisplayable interface provides elements that wish to be displayed within a document with the means to publish layout information and complete rendering. The two main methods of this interface are:

int layout(int availableWidth, HtmlLayoutContext context)

int render(int availableWidth, HtmlRenderContext context)

These methods, when combined with the inherent container-based structure of an HTML document, allow layout and rendering of an entire document to happen. The layout method returns how much vertical and horizontal space is required to display the element, while the render method requests drawing of the element and returns how much horizontal space was actually used when rendering. The combination of these two methods identifies the two main functions of an HTML document – the one-time calculation of the required layout parameters for each contained element, and the repeated display of elements based on their layout criteria.

Layout and Render Contexts

The operations of layout and rendering are top-down, from the root HTML body container to the various leaf nodes. The circumstances surrounding the layout of the body container may not be the same as those for a paragraph tag nested within a table. For example, the horizontal width available for layout or rendering will vary depending on what elements have been laid out or rendered previously. At each step down the layout/render tree, the context of the current operation (layout or rendering) changes to reflect the requirements of the current container. This variable information needs to be made available to child elements when doing layout or display so that the elements know the constraints they must obey. For example, when doing layout within a paragraph, the current margins, padding, and borders of the paragraph must be removed from the available horizontal width to ensure that contained elements do not overlap the paragraph boundaries.

As noted above, the layout and render operations of the IHtmlDisplayable interface require a parameter of type HtmlLayoutContext and HtmlRenderContext respectively. The HtmlLayoutContext class wraps the state of layout/parsing at any given time within a specific container. It defines the current text font size and style, the foreground color, the horizontal width currently available, the total document width available, and various other state flags the element may require. The HtmlRenderContext, which extends the layout context, defines two additional properties: the current render line height and the javax.microedition.lcdui.Graphics object onto which elements must be rendered. As each container down the element tree is operated on, the current layout/render context is modified accordingly.

The IHtmlContextElement Interface

While many elements within an HTML document are visible (and thus implement IHtmlDisplayable), others are not. However, these elements may still have an effect on the current document context. For example, the HTML bold text tag (‘B’) has no visible document representation itself, but does indicate that any text that follows it must be rendered in bold type. The IHtmlContextElement provides this and similar elements with the means to affect the current HTML layout or render context without having to be displayed themselves. The interface defines only one method:

void updateContext(HtmlLayoutContext currentContext)

This method allows the element to change the current document layout/render context, which will affect all elements operated on from that point on. During rendering or layout, when an IHtmlContextElement instance is encountered, its updateContext method is called, thus giving it the chance to make the appropriate changes. For example, the bold tag mentioned above could alter the context provided to the updateContext method to make the font style bold, while an end bold tag could remove that font style from the context. Most of the HTML text decoration elements (e.g. B, I, EM, STRONG, TT) fall under this category of element.
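The bold example can be sketched as follows. This is an illustration for plain Java SE, not the component's code: HtmlLayoutContext is reduced to a stand-in with a single hypothetical bold flag, IHtmlContextElement is re-declared with the signature given above, and BoldStart, BoldEnd, and describe are names invented for the example. Plain strings stand in for displayable text elements.

```java
class ContextElementDemo {

    /** Stand-in for HtmlLayoutContext; the single 'bold' flag is an assumption. */
    static class HtmlLayoutContext {
        boolean bold;
    }

    /** Stand-in mirroring the IHtmlContextElement signature given in the text. */
    interface IHtmlContextElement {
        void updateContext(HtmlLayoutContext currentContext);
    }

    /** Hypothetical element created for an opening B tag. */
    static class BoldStart implements IHtmlContextElement {
        @Override
        public void updateContext(HtmlLayoutContext currentContext) {
            currentContext.bold = true;
        }
    }

    /** Hypothetical element created for a closing B tag. */
    static class BoldEnd implements IHtmlContextElement {
        @Override
        public void updateContext(HtmlLayoutContext currentContext) {
            currentContext.bold = false;
        }
    }

    /** Walks the elements in order; context elements mutate the context, text is "rendered". */
    static String describe(Object[] elements) {
        HtmlLayoutContext context = new HtmlLayoutContext();
        StringBuilder out = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof IHtmlContextElement) {
                ((IHtmlContextElement) element).updateContext(context);
            } else {
                out.append(context.bold ? "[bold:" : "[plain:").append(element).append("]");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Object[] elements = { "a ", new BoldStart(), "b", new BoldEnd(), " c" };
        System.out.println(describe(elements)); // prints "[plain:a ][bold:b][plain: c]"
    }
}
```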

Layout Algorithm Overview

Layout is completed via a top-down traversal of the HTML document tree. The initial HtmlLayoutContext is generated, containing the default document font, the width available on the device display, and so on. Layout is then performed on the root HTML body container, given the initial context parameters. The initial layout context is modified to account for any padding, margin, and similar properties, and layout is passed on to any child containers directly held within the body using this modified context. As each child element is laid out, the space available within the current container is modified accordingly. This behaviour continues recursively until every element held within every container within the HTML document has determined the size required for its display.
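The recursion can be illustrated with a deliberately simplified sketch for plain Java SE. It assumes that children are stacked vertically, that padding is the only box property, and that the context consists solely of the available width; the Node, Leaf, and Box types are hypothetical and are not the component's classes.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveLayoutDemo {

    /** Hypothetical element type: layout receives the available width and returns the height used. */
    interface Node {
        int layout(int availableWidth);
    }

    /** Leaf with a fixed height; it remembers the width it was offered. */
    static class Leaf implements Node {
        final int height;
        int widthSeen;

        Leaf(int height) {
            this.height = height;
        }

        @Override
        public int layout(int availableWidth) {
            widthSeen = availableWidth;
            return height;
        }
    }

    /** Container that removes its padding from the width before laying out its children. */
    static class Box implements Node {
        final int padding;
        final List<Node> children = new ArrayList<>();

        Box(int padding) {
            this.padding = padding;
        }

        @Override
        public int layout(int availableWidth) {
            int innerWidth = availableWidth - 2 * padding; // modified "context" for the children
            int height = 2 * padding;                      // top and bottom padding
            for (Node child : children) {
                height += child.layout(innerWidth);        // recurse into each child
            }
            return height;
        }
    }

    public static void main(String[] args) {
        Leaf heading = new Leaf(20);
        Leaf paragraphText = new Leaf(12);
        Box paragraph = new Box(5);
        paragraph.children.add(paragraphText);
        Box body = new Box(0);
        body.children.add(heading);
        body.children.add(paragraph);

        System.out.println(body.layout(176));        // prints 42 (20 + 5 + 12 + 5)
        System.out.println(paragraphText.widthSeen); // prints 166 (176 minus 5px padding per side)
    }
}
```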

Basic HTML Containers – Box vs. Flow

An HTML document may contain only two types of container elements, box containers and flow containers.

    • Box containers
      Always occupy a rectangular area, regardless of contents, so layout and rendering are easy
    • Flow containers
      Occupy the minimum linear space required to display their contained elements relative to their own containers, and are harder to lay out and render because of their irregular shape.

All elements within an HTML document are contained within the top-level body container. The body container is a box container implementation, as the display of an HTML document will always be represented within the entire rectangular device display. A paragraph, div, and table are also box implementations, as they all require a regular rectangular area for display. A span or link anchor element, however, is a flow container: it can contain any number of elements that are displayed in a linear fashion.

The two base implementations of these concepts, DisplayableHtmlBoxContainer and DisplayableHtmlFlowContainer, make up the bulk of the layout and rendering code for the web browser. The DisplayableHtmlBoxContainer handles layout and rendering of its child elements completely on its own, since all layout calculations for a box element start at location (0,0) within the box container. This makes the layout of box containers straightforward, as the starting and ending positions of each child element are within the dimensions of the box. The DisplayableHtmlFlowContainer, however, could be located at some unknown (x,y) location within its parent container at the time of layout. This makes it impossible for the flow container alone to handle layout or rendering of its child elements. To overcome this, flow-based containers relay all layout operations back to their parent container. Thus, layout calls for flow containers actually navigate backwards up the render tree until a parent box container is located that can adequately address the element layout request, based on a known element location. The result of this is that the only container layout algorithm that actually does any calculation is that of the DisplayableHtmlBoxContainer. This also means that DisplayableHtmlBoxContainers have much more work to do, and are slower to lay out and render because of this.
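The delegation from flow containers to the nearest enclosing box container can be sketched in a few lines. This is an illustration only: the Container, BoxContainer, and FlowContainer classes and the layoutChild method are hypothetical simplifications, and the "layout" performed is reduced to reporting which container ends up doing the work.

```java
class FlowDelegationDemo {

    /** Hypothetical base type: every container knows its parent. */
    static abstract class Container {
        final String name;
        final Container parent;

        Container(String name, Container parent) {
            this.name = name;
            this.parent = parent;
        }

        abstract String layoutChild(String child);
    }

    /** A box knows its own origin, so it can position children itself. */
    static class BoxContainer extends Container {
        BoxContainer(String name, Container parent) {
            super(name, parent);
        }

        @Override
        String layoutChild(String child) {
            return name + " positions " + child;
        }
    }

    /** A flow container has no fixed origin, so it relays the request to its parent. */
    static class FlowContainer extends Container {
        FlowContainer(String name, Container parent) {
            super(name, parent);
        }

        @Override
        String layoutChild(String child) {
            // Assumes a box ancestor exists, which holds because the body is a box.
            return parent.layoutChild(child);
        }
    }

    public static void main(String[] args) {
        Container body = new BoxContainer("body", null);
        Container paragraph = new BoxContainer("p", body);
        Container anchor = new FlowContainer("a", paragraph);
        Container span = new FlowContainer("span", anchor);

        System.out.println(span.layoutChild("text")); // prints "p positions text"
    }
}
```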

Layout and Rendering using the HtmlRenderQueue

To increase the browser responsiveness and decrease the required application calculation, layout information for each of the base container implementations is cached after the initial layout. This caching involves building an ordered list of elements as they need to be displayed, grouped into rows (or lines). Specifically, all elements within a container that can be reasonably displayed within the available container context width are added to the same line of the container HtmlRenderQueue. An HtmlRenderQueue instance is essentially a list of ordered element lists (a vector of vectors). The first line in the queue will contain all the container child displayable elements that can be shown on the first container line, the second line will contain all the elements that can be displayed on the second container line (after a line break), and so on. When the layout method is called on a container element, this queue of renderable lines is used to store the following information:

  • the width and height of each line
  • the displayable HTML elements within each line
  • the absolute X,Y coordinates of the origin for each line, relative to the parent document display

Later, when the render method is called to display the container and all its elements, a simple iteration over each of the queued lines in turn allows for the rapid display of the child elements.
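The "list of lines" idea can be sketched with a greedy line-breaking routine. This is a simplified illustration for plain Java SE, not the component's HtmlRenderQueue: elements are represented only by their widths, the buildLines name is invented for the example, and per-line heights and origins are omitted.

```java
import java.util.ArrayList;
import java.util.List;

class RenderQueueDemo {

    /**
     * Groups element widths into lines so that each line fits the available width.
     * An element wider than the available width is placed on a line of its own.
     */
    static List<List<Integer>> buildLines(int[] elementWidths, int availableWidth) {
        List<List<Integer>> lines = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int width : elementWidths) {
            // Start a new line when the next element would overflow the current one.
            if (!current.isEmpty() && used + width > availableWidth) {
                lines.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(width);
            used += width;
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] widths = { 60, 50, 80, 30, 100 };
        // With 120 pixels available, the elements fall onto three lines.
        System.out.println(buildLines(widths, 120)); // prints [[60, 50], [80, 30], [100]]
    }
}
```

Rendering then reduces to iterating over the outer list and drawing each inner list left to right, which is why caching the queue after the first layout pays off on repeated repaints.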

The WebBrowser Document Request Cycle

A complete example of the lifecycle of an HTML document will help to illuminate the methods being used to run the browser component. A WebBrowser, when initialized, has no HTML document available for display. Let the following HTML document, located at "/index.html", serve as the example file to be loaded.

<html>

<body bgcolor="red">

<h1>Hello World</h1>

<p width="75%" style="padding: 5px;">

This is an example <em>HTML</em> page!

</p>

Some additional&nbsp;

trailing text.

</body>

</html>

A request for an HTML document must be initiated by calling the browser’s navigateTo("/index.html") method. This sets in motion the load-parse-layout-display processes within the browser.

  1. When navigateTo is called, the browser asks its contained HtmlDocumentProvider to locate the target "/index.html" HTML document.
  2. The HtmlDocumentProvider asks the browser’s established ResourceListener instance to load the "/index.html" HTML file.
  3. If located, the InputStream for the raw HTML data is passed off to the HtmlParser.parse method, requesting that the HtmlDocumentProvider itself be notified of parse events.
  4. As the document stream is parsed, the HtmlDocumentProvider handles creating a new IHtmlDocument instance. With each startElement event, for example, a new IHtmlElement is created. The exact sequence of parse events is as follows:
      • startDocument (<html>) – a new IHtmlDocument is created.
      • startElement (<body bgcolor="red">) – the document body tag is initialized, and the bgcolor attribute is parsed. It becomes the current container.
      • startElement (<h1>) – a new heading text container is created, it is added to the body, and becomes the current container.
      • text (Hello World) – a new displayable text element is created and added to the current container, the heading element.
      • endElement (</h1>) – the heading tag is popped, and the body again becomes the current container.
      • startElement (<p width="75%" style="padding: 5px;">) – a new paragraph box container is created, it is added to the body, and becomes the current container. The style associated with the paragraph is parsed and applied.
      • text (This is an example ) – a new displayable text element is created and added to the current container, the paragraph element.
      • startElement (<em>) – a new font-altering context element is created, and added to the paragraph.
      • text (HTML) – a new displayable text element is created and added to the current container, the paragraph element.
      • endElement (</em>) – a new font-restoring context element is created, and added to the paragraph.
      • text ( page!) – a new displayable text element is created and added to the current container, the paragraph element.
      • endElement (</p>) – the paragraph tag is popped, and the body again becomes the current container.
      • text (Some additional) – a new displayable text element is created and added to the current container, the body container.
      • entity (&nbsp;) – the non-breaking space entity is detected. A new HTML whitespace element is added to the body container.
      • text (trailing text.) – a new displayable text element is created and added to the current container, the body container.
      • endElement (</body>) – the body tag is ended
      • endDocument (</html>) – the document is complete, and parsing finishes.
  5. Layout is initiated for the document, given an initial layout context. The document consists of a heading container element, a paragraph container element, a text element, a non-breaking space element, and a second text element. A layout queue is created for the HTML body storing the elements in the order they will need to be displayed. Queues are also defined for the nested paragraph and heading container tags. Once layout is complete, it is not recalculated.
  6. Rendering of the document occurs, given an initial render context, recursively from the body down through the heading and paragraph containers, and finally for the inline text and whitespace elements. Elements are rendered based on the queue positions and information. Rendering occurs whenever the component needs to be redisplayed.

 

CSS Supported Properties

The following CSS attributes and range values are currently supported. A good reference for the exact meaning of these can be found at http://devguru.com/technologies/css2/index.asp.

width / height

  • Supports percentage and absolute pixel values
  • Examples: 12%, 145px, 65 (assumed to be pixels)
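As an illustration of the three accepted forms, the following sketch resolves a width or height value to pixels. It is not the component's parser: the CssLengthDemo class and resolve method are hypothetical, and resolving a percentage against the size of the containing element is an assumption, since the text does not say what percentages are relative to.

```java
class CssLengthDemo {

    /**
     * Resolves a CSS length to pixels.
     * Percentages are taken relative to containerSize (an assumption);
     * a bare number is treated as pixels, as described in the text.
     * Throws NumberFormatException for malformed input.
     */
    static int resolve(String value, int containerSize) {
        String v = value.trim();
        if (v.endsWith("%")) {
            int percent = Integer.parseInt(v.substring(0, v.length() - 1).trim());
            return containerSize * percent / 100; // integer arithmetic, rounds down
        }
        if (v.endsWith("px")) {
            v = v.substring(0, v.length() - 2).trim();
        }
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        System.out.println(resolve("12%", 200));   // prints 24
        System.out.println(resolve("145px", 200)); // prints 145
        System.out.println(resolve("65", 200));    // prints 65
    }
}
```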

color / background-color

  • Supports the 16 basic pre-defined color names
  • Supports hexadecimal notation #RRGGBB
  • Supports the CSS form "rgb(RRR,GGG,BBB)"
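The three notations can be handled by one small routine, sketched below for plain Java SE. The CssColorDemo class and parse method are hypothetical; returning -1 for an unrecognized value and clamping rgb() channels to 0..255 are choices made for this example. Only four of the 16 basic color names are listed to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CssColorDemo {

    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        // A subset of the 16 basic color names; the rest would be added the same way.
        NAMED.put("black", 0x000000);
        NAMED.put("white", 0xFFFFFF);
        NAMED.put("red", 0xFF0000);
        NAMED.put("navy", 0x000080);
    }

    /** Returns the color as 0xRRGGBB, or -1 if the value is not recognized. */
    static int parse(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.matches("#[0-9a-f]{6}")) {
            return Integer.parseInt(v.substring(1), 16);
        }
        if (v.startsWith("rgb(") && v.endsWith(")")) {
            String[] parts = v.substring(4, v.length() - 1).split(",");
            if (parts.length != 3) {
                return -1;
            }
            try {
                return (channel(parts[0]) << 16) | (channel(parts[1]) << 8) | channel(parts[2]);
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        Integer named = NAMED.get(v);
        return named == null ? -1 : named;
    }

    /** Parses one decimal channel and clamps it to the range 0..255. */
    private static int channel(String text) {
        int c = Integer.parseInt(text.trim());
        return Math.max(0, Math.min(255, c));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(parse("#FF0000")));        // prints ff0000
        System.out.println(Integer.toHexString(parse("rgb(255, 0, 0)"))); // prints ff0000
        System.out.println(Integer.toHexString(parse("navy")));           // prints 80
        System.out.println(parse("mauve"));                               // prints -1
    }
}
```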

padding / margin

  • Supports the 1, 2, 3, and 4 value versions
  • Examples: padding:1px; (all sides) padding:1px 3px; (1 on top and bottom, 3 on left and right)
  • Also supports side modifiers, e.g. padding-left, margin-top.
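The expansion of the 1, 2, 3, and 4 value forms follows the standard CSS rule and can be sketched as below. The BoxShorthandDemo class and expand method are hypothetical names; the sketch assumes pixel values only (with or without the "px" suffix) and expects the value without the property name or trailing semicolon.

```java
import java.util.Arrays;

class BoxShorthandDemo {

    /** Parses "3px" or "3" as a pixel count. */
    private static int px(String token) {
        String t = token.endsWith("px") ? token.substring(0, token.length() - 2) : token;
        return Integer.parseInt(t);
    }

    /** Expands a padding or margin shorthand into {top, right, bottom, left}. */
    static int[] expand(String shorthand) {
        String[] parts = shorthand.trim().split("\\s+");
        int[] v = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = px(parts[i]);
        }
        switch (v.length) {
            case 1: return new int[] { v[0], v[0], v[0], v[0] }; // all sides
            case 2: return new int[] { v[0], v[1], v[0], v[1] }; // vertical, horizontal
            case 3: return new int[] { v[0], v[1], v[2], v[1] }; // top, horizontal, bottom
            case 4: return v;                                    // top, right, bottom, left
            default:
                throw new IllegalArgumentException("expected 1 to 4 values: " + shorthand);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(expand("1px")));         // prints [1, 1, 1, 1]
        System.out.println(Arrays.toString(expand("1px 3px")));     // prints [1, 3, 1, 3]
        System.out.println(Arrays.toString(expand("1px 2px 3px"))); // prints [1, 2, 3, 2]
    }
}
```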

text-align

  • Supports values "center", "left", "right"

text-decoration

  • Supports "none" and "underline"

font-style

  • Supports "normal" and "italic"

font-weight

  • Supports "normal", "lighter", "bold", "bolder", and integer values from 100 to 900

font-family

  • Supports "monospace" and "courier"

font-size

  • Supports "smaller", "small", "medium", "large", "larger" and any pixel value

vertical-align

  • Supports values "top", "middle", "bottom"

border

  • Supports only two styles, "none" and "solid"
  • Size can be any pixel value, or one of "thin", "medium", or "thick"
  • Color restrictions are the same as above for color / background-color
  • All variants supported, e.g. border-left-style, border, border-top.

 


Copyright © 2006 - PocketLearn Inc.  All Rights Reserved
Java is a trademark of Sun Microsystems