The Pipeline: Transforming HTML

The Sandbox fully supports client-side processing of well-structured HTML. This is necessary to support the HTML-based DOM APIs (e.g., innerHTML and document.write). However, parsing an HTML string is an expensive operation. To improve performance, the original HTML is transformed into JSON, which can be processed faster.

The HTML transformation process generates a hierarchical JSON structure representing the original HTML elements and their attributes.

Elements are transformed to:

  {
    "elementName" :
    {
       // Attributes
       "a" : {"attributeName" : "attributeValue", ...},
       // Children consist of 0-n content strings and 0-n child elements
       "c" : ["content", element]
    }
  }
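
To make the shape of this structure concrete, the following is a minimal sketch that walks a transformed tree and serializes it back to an HTML string. The names toHtml and escapeHtml are hypothetical and are not part of the Sandbox; the sketch assumes every element node has exactly one key and that attribute values are plain strings. It does not handle void elements such as br or img.

```javascript
// Hypothetical helper: escape text so it is safe inside HTML content or
// a double-quoted attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: serialize one transformed node back to HTML.
// A node is either a content string or an object with a single key (the
// element name) whose value holds "a" (attributes) and "c" (children).
function toHtml(node) {
  if (typeof node === "string") {
    return escapeHtml(node);
  }
  var name = Object.keys(node)[0];
  var body = node[name] || {};
  var attrs = body.a || {};
  var children = body.c || [];
  var html = "<" + name;
  Object.keys(attrs).forEach(function (key) {
    html += " " + key + '="' + escapeHtml(attrs[key]) + '"';
  });
  html += ">";
  children.forEach(function (child) {
    html += toHtml(child); // recurse into content strings and child elements
  });
  return html + "</" + name + ">";
}

var tree = {
  "p" : {
    "a" : {"class" : "note"},
    "c" : ["Hello, ", {"b" : {"a" : {}, "c" : ["world"]}}]
  }
};
console.log(toHtml(tree)); // <p class="note">Hello, <b>world</b></p>
```

Because the children array mixes strings and nested element objects, a single recursive function is enough to visit the whole tree.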

Extracting JavaScript

JavaScript receives special processing. When an inline event property is parsed, the string-based script is extracted and wrapped in a standard function. This is necessary because the Web Sandbox does not support late-bound evaluation and execution of arbitrary code, except for JSON objects. The example below illustrates how an onclick handler is extracted:

<div onclick='alert("You Clicked Me")'>Click Here</div>
is transformed to:
  {
    "div" :
    {
      "a" : {"onclick" : function() {alert("You Clicked Me")}},
      "c" : ["Click Here"]
    }
  }
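
Because the handler arrives as a function rather than a string, the client can attach it without evaluating any code. The sketch below shows one way this could work; it is an assumption, not the Sandbox's actual implementation. The names applyAttributes and makeStubElement are hypothetical, and the stub stands in for a DOM element so the example can run outside a browser.

```javascript
// Hypothetical client-side step: apply the "a" map of a transformed node.
// `el` is anything with setAttribute and addEventListener (a DOM element
// in a browser, or a stub when experimenting elsewhere).
function applyAttributes(el, attrs) {
  Object.keys(attrs).forEach(function (name) {
    var value = attrs[name];
    if (typeof value === "function" && name.indexOf("on") === 0) {
      // "onclick" -> "click"; the handler is already a function, so no
      // string evaluation is needed.
      el.addEventListener(name.slice(2), value);
    } else {
      el.setAttribute(name, String(value));
    }
  });
  return el;
}

// Minimal stand-in for an element (assumption, for illustration only).
function makeStubElement() {
  return {
    attributes: {},
    listeners: {},
    setAttribute: function (name, value) { this.attributes[name] = value; },
    addEventListener: function (type, fn) {
      (this.listeners[type] = this.listeners[type] || []).push(fn);
    }
  };
}

var clicks = [];
var el = applyAttributes(makeStubElement(), {
  "id" : "greeting",
  "onclick" : function () { clicks.push("You Clicked Me"); }
});
el.listeners.click[0](); // simulate a click on the stub
```

In a browser, the same applyAttributes function could be given a real element created with document.createElement.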

Script blocks are transformed differently. When a script block is encountered, it is assigned an identifier and then extracted into the gadget's meta-data:

<script>
  alert("HI")
</script>
is transformed in the HTML stream to:
  {
    "script" : "code1"
  }

The script itself is encapsulated and transformed separately in the meta-data for the gadget and associated with the identifier (code1). Because each block can be managed independently, this architecture is how we will enable dynamic code loading.
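
The indirection through an identifier can be sketched as a simple lookup table. The layout of the meta-data is not specified here, so the gadgetMeta object, its scripts property, and the runScriptNode function are all hypothetical. The script body writes to a log array instead of calling alert so that the sketch runs outside a browser.

```javascript
// Assumed shape of the gadget meta-data: identifier -> encapsulated script.
var gadgetMeta = {
  scripts: {
    "code1" : function (env) { env.log.push("HI"); }
  }
};

// Hypothetical: resolve a {"script" : id} node from the HTML stream and
// run the function registered under that identifier.
function runScriptNode(node, meta, env) {
  var id = node.script;
  var code = meta.scripts[id];
  if (typeof code !== "function") {
    throw new Error("Unknown script block: " + id);
  }
  return code(env);
}

var env = { log: [] };
runScriptNode({"script" : "code1"}, gadgetMeta, env);

// A block loaded later only needs to be registered under a new identifier.
gadgetMeta.scripts["code2"] = function (env) { env.log.push("LATER"); };
runScriptNode({"script" : "code2"}, gadgetMeta, env);
```

Keeping each block behind its own identifier means a new block can be added or replaced without touching the HTML stream that references it.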

Special Parsing Considerations

Unfortunately, tree generation does not behave consistently across browsers. For example, the type attribute of input elements, as well as various other properties, must be specified before the element is appended to the tree. Tables generated through the DOM require a surrounding TBODY element. In some cases, we protect against these differences during the transformation (e.g., by injecting TBODY); in others, we handle them during parsing on the client. We will provide documentation on the full set of parsing rules.
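
As an illustration of the TBODY case, the following sketch wraps consecutive tr children of a table node in a tbody node. It operates on the JSON structure described earlier. The function name injectTbody is hypothetical and the real transformation rules may differ; in particular, this sketch treats any non-row child (including a whitespace string) as ending the current run of rows.

```javascript
// Hypothetical transformation step: make sure rows placed directly under
// a table are wrapped in a tbody. Returns a new tree; the input is not
// modified.
function injectTbody(node) {
  if (typeof node === "string") {
    return node; // content strings pass through unchanged
  }
  var name = Object.keys(node)[0];
  var body = node[name];
  if (typeof body !== "object" || body === null) {
    return node; // e.g. {"script" : "code1"} has no children to inspect
  }
  var children = (body.c || []).map(injectTbody);
  if (name.toLowerCase() === "table") {
    var wrapped = [];
    var rows = null; // children array of the tbody currently being filled
    children.forEach(function (child) {
      var isRow = typeof child !== "string" &&
                  Object.keys(child)[0].toLowerCase() === "tr";
      if (isRow) {
        if (!rows) {
          rows = [];
          wrapped.push({"tbody" : {"a" : {}, "c" : rows}});
        }
        rows.push(child);
      } else {
        rows = null; // an existing tbody, thead, etc. is kept as-is
        wrapped.push(child);
      }
    });
    children = wrapped;
  }
  var result = {};
  result[name] = {"a" : body.a || {}, "c" : children};
  return result;
}

var table = {
  "table" : {
    "a" : {},
    "c" : [
      {"tr" : {"a" : {}, "c" : [{"td" : {"a" : {}, "c" : ["1"]}}]}},
      {"tr" : {"a" : {}, "c" : [{"td" : {"a" : {}, "c" : ["2"]}}]}}
    ]
  }
};
var fixed = injectTbody(table);
```

After this step, fixed.table.c holds a single tbody node containing both rows, and a table that already has a tbody is left unchanged.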