Web Sandbox Architecture: Pipeline
The Web Sandbox Pipeline translates a web component (gadget), along with all of the source files it references, into a safe version that can be executed by the client-side Web Sandbox virtual machine. Additional Pipeline features allow for:
- Caching of files referenced within the source of a gadget,
- Stacking of translation layers above the Pipeline.
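The caching feature can be pictured as a decorator around the fetch interface. The following minimal Python sketch assumes that a fetcher is simply a callable from a URL to its source text (a hypothetical stand-in for the IFetch interface described in Step 2). The class name CachingFetch is illustrative, and the cache is in-memory with no expiry.

```python
from typing import Callable, Dict

# Hypothetical stand-in for IFetch: a callable mapping a URL to its source text.
Fetch = Callable[[str], str]


class CachingFetch:
    """Wraps a fetcher so each URL is retrieved at most once (in-memory, no expiry)."""

    def __init__(self, inner: Fetch) -> None:
        self._inner = inner
        self._cache: Dict[str, str] = {}

    def __call__(self, url: str) -> str:
        if url not in self._cache:
            # Cache miss: delegate to the wrapped fetcher and remember the result.
            self._cache[url] = self._inner(url)
        return self._cache[url]
```

Because CachingFetch has the same call shape as the fetcher it wraps, it can be passed anywhere a fetcher is expected, so repeated references to the same file cost a single fetch.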
The Web Sandbox Transformation Pipeline follows a general asynchronous execution pattern, depicted in the diagram below.
STEP 1: Fetching Source
This optional step is performed only when no layer above "Process HTML Phase 1" passes raw HTML directly into the Pipeline. It fetches the resource specified by a Url and passes the returned data, as raw HTML, to the next step. The Url also serves as the basis for the default Base Href in the next step.
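A minimal sketch of this step, assuming a synchronous fetcher callable (a hypothetical stand-in for IFetch) and assuming that the default Base Href is the directory part of the Url. The function name fetch_source is illustrative only.

```python
from typing import Callable, Tuple
from urllib.parse import urljoin

# Hypothetical stand-in for IFetch: a callable mapping a URL to its source text.
Fetch = Callable[[str], str]


def fetch_source(url: str, fetch: Fetch) -> Tuple[str, str]:
    """Step 1 sketch: return the raw HTML at `url` and a default Base Href."""
    raw_html = fetch(url)
    # Resolving "." against the Url drops its last path segment (and any query),
    # leaving the directory that relative references will be resolved against.
    base_href = urljoin(url, ".")
    return raw_html, base_href
```

For example, a Url of http://example.com/gadgets/clock.html yields a default Base Href of http://example.com/gadgets/.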
STEP 2: Process HTML Phase 1
This step is the entry point when another source is converted or translated to HTML before being passed to the Pipeline. The parameters passed to this step are: the raw HTML; a Key value (a unique identifier of the Pipeline instance being executed); the Url the HTML originated from (this can be null if a Base Href is included in the raw HTML); and an implementation of a standard interface, IFetch, used to retrieve all of the external references. This step is broken down into a few sub-steps:
- Analyzing the complete source
- In this sub-step, every HTML node is visited. If a node references an external source, that source is fetched via IFetch. Certain tags, such as <table>, are checked to make sure their child nodes are <thead>, <tbody>, and/or <tfoot> elements; otherwise, their contents are wrapped in a <tbody>. Other elements with relative links, such as images and anchors, have those links translated into fully qualified URLs.
- Metadata and manifest information is extracted from the source.
- Fetched resources are inlined
- In this phase, each fetched resource is put back in its original location, with its external reference replaced by the inlined content. For example:
<script type="text/javascript" src="test.js"></script>
turns into
<script type="text/javascript">/* test.js source is here */</script>
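The link-qualification and inlining sub-steps above can be sketched as follows. This is a simplified Python illustration, not the Pipeline's implementation: it uses regular expressions instead of visiting every HTML node, it does not cover table fixing or metadata extraction, and it assumes an asynchronous fetcher callable (a hypothetical stand-in for IFetch) so that all external references can be fetched concurrently. It also assumes that an explicit <base href> takes precedence over the Url.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional
from urllib.parse import urljoin

# Hypothetical async stand-in for IFetch: takes a URL, resolves to its text.
AsyncFetch = Callable[[str], Awaitable[str]]

BASE_RE = re.compile(r'<base\s+href="([^"]+)"', re.IGNORECASE)
# href/src attributes on <a> and <img> elements (relative links to qualify).
LINK_RE = re.compile(r'(<(?:a|img)\b[^>]*?\s(?:href|src)=")([^"]*)(")', re.IGNORECASE)
# An external script: attributes before src, the src value, attributes after.
SCRIPT_SRC_RE = re.compile(
    r'<script\b([^>]*?)\s+src="([^"]+)"([^>]*)>\s*</script>', re.IGNORECASE
)


async def process_html_phase1(
    raw_html: str, key: str, url: Optional[str], fetch: AsyncFetch
) -> str:
    """Simplified Phase 1: qualify links and inline external scripts.

    `key` identifies the Pipeline run; this sketch does not use it.
    """
    base_match = BASE_RE.search(raw_html)
    base = base_match.group(1) if base_match else url  # assumed: <base href> wins
    if base is None:
        raise ValueError("a Url or a <base href> is required")

    # Translate relative links on <a> and <img> into fully qualified ones.
    html = LINK_RE.sub(
        lambda m: m.group(1) + urljoin(base, m.group(2)) + m.group(3), raw_html
    )

    # Fetch every distinct external script concurrently.
    urls = sorted({urljoin(base, m.group(2)) for m in SCRIPT_SRC_RE.finditer(html)})
    results = await asyncio.gather(*(fetch(u) for u in urls)) if urls else []
    sources = dict(zip(urls, results))

    # Put each fetched source back in its original location.
    return SCRIPT_SRC_RE.sub(
        lambda m: f"<script{m.group(1)}{m.group(3)}>"
        f"{sources[urljoin(base, m.group(2))]}</script>",
        html,
    )
```

Fetching distinct URLs once and then substituting keeps the original document order intact while the network work happens in parallel.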
STEP 3: Process HTML Phase 2
In this step, the HTML head is reprocessed and the body of the document is extracted:
- Everything above the <body> tag is reprocessed:
- All of the CSS is extracted
- The JavaScript within the <head></head> tags is condensed into one large block, with its original order preserved.
- The outer HTML of the body tags is extracted.
Note: All scripts within the body are left in place for later processing. This follows the script-processing behavior common to all browsers, in which each script in the body runs after the HTML that precedes it has been parsed.
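A rough sketch of Phase 2 in Python, assuming Phase 1 has already inlined external stylesheets and scripts into <style> and <script> elements. The regular-expression approach and the names Phase2Result and process_html_phase2 are illustrative, not the Pipeline's own.

```python
import re
from dataclasses import dataclass

STYLE_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body\b.*?</body>", re.IGNORECASE | re.DOTALL)


@dataclass
class Phase2Result:
    css: str          # all CSS found above <body>, in order
    head_script: str  # all head JavaScript condensed into one block, in order
    body_html: str    # outer HTML of <body>; its scripts are left in place


def process_html_phase2(html: str) -> Phase2Result:
    body_match = BODY_RE.search(html)
    if body_match is None:
        raise ValueError("no <body> element found")
    head_part = html[: body_match.start()]  # everything above the <body> tag
    css = "\n".join(s.strip() for s in STYLE_RE.findall(head_part))
    head_script = "\n".join(s.strip() for s in SCRIPT_RE.findall(head_part))
    return Phase2Result(css, head_script, body_match.group(0))
```

Only the head is mined for scripts here; the body is returned untouched so that its scripts can be handled separately in the next step.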
STEP 4: Convert to JSON Representation and Instrument (Crunch) JavaScript Code
This step translates all of the data structures and markup into a form suitable for client-side processing. Its sub-steps, in order, are:
- The CSS is processed
- First, the CSS is parsed and converted into an abstract internal representation.
- Then, this parsed structure is converted into a JSON representation.
- For instance, a CSS snippet is converted into the JSON representation shown in Figure 2 (JSON CSS).
- The outer HTML of the body tags is processed
- All of the HTML nodes are converted to JSON. Each time a <script> tag is found, its JavaScript is placed in the Metadata structure under "scripts" (a hashtable), and a reference to that entry is added to the resulting JSON representation so that the script can be looked up on initialization.
- An HTML snippet is thus converted into a nested JSON structure:
Each node has child elements, listed in the "c" array, and/or attributes, listed in the "a" array. A special "script" node has a string as its value; this string is the key of the associated JavaScript in the Metadata "scripts" property. This indirection is needed because the JavaScript must be crunched separately from this JSON object. The client-side engine processes the whole structure to create the actual HTML layout of the original snippet.
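The two conversions can be sketched in Python as below. Beyond the "c", "a" and "script" names, the JSON schema is not given here, so the rest is assumed: the "n" key for the tag name, [name, value] pairs in "a", the "s"/"d" keys for CSS rules, and the "s0", "s1", ... script keys are all hypothetical. The CSS parser handles only flat rule sets (no at-rules or comments), and whitespace-only text nodes are dropped.

```python
import re
from html.parser import HTMLParser
from typing import Any, Dict, List, Tuple

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")  # flat "selector { declarations }"


def parse_css(css: str) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Internal representation: (selector, [(property, value), ...]) tuples."""
    rules = []
    for selector, body in RULE_RE.findall(css):
        decls = []
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls.append((prop.strip(), value.strip()))
        rules.append((selector.strip(), decls))
    return rules


def css_to_json(css: str) -> List[Dict[str, Any]]:
    # Hypothetical schema: "s" = selector, "d" = [property, value] pairs.
    return [{"s": sel, "d": [list(d) for d in decls]} for sel, decls in parse_css(css)]


VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToJson(HTMLParser):
    """Builds nested nodes; script bodies go to a separate `scripts` table."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.root: Dict[str, Any] = {}
        self.stack: List[Dict[str, Any]] = [self.root]
        self.scripts: Dict[str, str] = {}  # stands in for Metadata "scripts"
        self._script_parts: List[str] = []
        self._in_script = False

    def _append(self, child: Any) -> None:
        self.stack[-1].setdefault("c", []).append(child)

    def handle_starttag(self, tag, attrs):
        if tag == "script":  # script attributes are ignored in this sketch
            self._in_script, self._script_parts = True, []
            return
        node: Dict[str, Any] = {"n": tag}  # "n" (tag name) is hypothetical
        if attrs:
            node["a"] = [[name, value or ""] for name, value in attrs]
        self._append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_data(self, data):
        if self._in_script:
            self._script_parts.append(data)
        elif data.strip():
            self._append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            key = f"s{len(self.scripts)}"  # hypothetical key scheme
            self.scripts[key] = "".join(self._script_parts)
            self._append({"script": key})  # a reference, not the code itself
            self._in_script = False
        elif tag not in VOID_TAGS:
            # Pop back to the nearest open element with this name, if any.
            for i in range(len(self.stack) - 1, 0, -1):
                if self.stack[i]["n"] == tag:
                    del self.stack[i:]
                    break


def html_to_json(body_html: str) -> Tuple[List[Any], Dict[str, str]]:
    parser = HtmlToJson()
    parser.feed(body_html)
    parser.close()
    return parser.root.get("c", []), parser.scripts
```

Keeping only a key in the node tree lets each script be crunched on its own while the markup stays plain, serializable data (for example via json.dumps).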
- JavaScript is instrumented/crunched
- All of the JavaScript above the opening <body> tag is prepended with a document.initializeHTML( [{0}] ); call, where {0} is the JSON HTML structure. The result is then crunched by adding all of the marshalling and QoS (quality of service) calls into the client-side Web Sandbox code.
- A sample of the crunched output is shown in Figure 4 (Crunched JavaScript).
In the crunched output, every property get and set, function, invocation, and prefix and postfix operator is wrapped by the sandbox, so every call is verified before it is executed.
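The following Python sketch illustrates the two ideas in this sub-step. prepend_initializer fills the document.initializeHTML( [{0}] ); template from the text. SandboxGuard is a hypothetical toy model of what the inserted wrappers do at run time, namely checking each get, set and invocation against a policy before performing it; the real crunched output is JavaScript that calls into the client-side Web Sandbox runtime, whose functions are not named here.

```python
import json
from typing import Any, List, Set, Tuple


def prepend_initializer(head_js: str, html_nodes: List[Any]) -> str:
    """Prefix the condensed head script with the HTML initialization call."""
    nodes_json = ",".join(json.dumps(node) for node in html_nodes)
    # Template from the text: document.initializeHTML( [{0}] );
    return "document.initializeHTML( [{0}] );".format(nodes_json) + "\n" + head_js


class SandboxGuard:
    """Toy model (hypothetical) of the wrappers that crunching inserts:
    each get, set, or invocation is checked against a policy first."""

    def __init__(self, allowed: Set[Tuple[str, str]]) -> None:
        self.allowed = allowed  # permitted (operation, member-name) pairs

    def permits(self, op: str, name: str) -> bool:
        return (op, name) in self.allowed

    def _check(self, op: str, name: str) -> None:
        if not self.permits(op, name):
            raise PermissionError(f"{op} of {name!r} denied by policy")

    def get(self, obj: Any, name: str) -> Any:
        self._check("get", name)
        return getattr(obj, name)

    def set(self, obj: Any, name: str, value: Any) -> None:
        self._check("set", name)
        setattr(obj, name, value)

    def invoke(self, obj: Any, name: str, *args: Any) -> Any:
        self._check("invoke", name)
        return getattr(obj, name)(*args)
```

In this toy model, a statement such as a.b = c.d() would run as guard.set(a, "b", guard.invoke(c, "d")), so a member outside the policy raises an error instead of executing.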
- Metadata is then converted to JSON
- The complete Metadata structure is converted to JSON; the result closely resembles the C# class, minus its methods. The scripts section is processed before being written out: each script is crunched separately and then added to the JSON object, so that the scripts can be executed in order, each with the right amount of HTML already initialized.
- An example of the Metadata output is shown in Figure 5 (JSON Metadata).
- The "scripts" node lists all of the <script> tags that were within the body, each crunched separately.
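A minimal sketch of the Metadata serialization, assuming a Python dataclass as a hypothetical mirror of the C# Metadata class (the fields shown are illustrative) and a crunch callable as a placeholder for the real instrumentation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict


@dataclass
class Metadata:
    """Hypothetical Python mirror of the C# Metadata class: data only, no methods."""
    key: str                                               # unique ID of the Pipeline run
    scripts: Dict[str, str] = field(default_factory=dict)  # script key -> source


def metadata_to_json(meta: Metadata, crunch: Callable[[str], str]) -> str:
    data = asdict(meta)
    # Crunch each body script on its own, keeping document order, so the client
    # can run them one at a time as the HTML before each one is initialized.
    data["scripts"] = {k: crunch(src) for k, src in meta.scripts.items()}
    return json.dumps(data)
```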
At this point, all of the code and content has been converted; all that remains is to initialize the code on the client. This is a simple process performed by a few lines of JavaScript: a statement that registers the converted code, and a statement that creates a sandbox instance to run it.
The placeholders in the code-registration statement are:
- {0} - the crunched JavaScript (see Figure 4 Crunched JavaScript)
- {1} - a Unique ID for the code being registered
- {2} - the settings for the code being registered
- The JSON CSS (see Figure 2 JSON CSS) is a sub-object of this object, e.g. {{ css : {0} }}, where {0} is the JSON CSS
- {3} - the JSON Metadata object (see Figure 5 JSON Metadata)
The placeholders in the sandbox-creation statement are:
- {0} - the variable name of the newly created sandbox instance
- {1} - the HTML element in which the new sandbox instance will be created
- Most of the time, this is a document.getElementById("id") call.
- {2} - the Policy used for this instance of the sandbox
- {3} - the Unique ID used to register the code for this instance
- The code only has to be registered once for each Unique ID.
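The client bootstrap can be sketched as a small server-side generator. The function names registerCode and createSandbox in the templates below are hypothetical placeholders, since the actual Web Sandbox calls are not reproduced in this text; the settings template {{ css : {0} }} and the rule that code is registered only once per Unique ID come from the description above. Quoting the Unique ID as a JSON string is also an assumption.

```python
import json
from typing import Set

# HYPOTHETICAL templates: registerCode and createSandbox are placeholder names,
# not the real Web Sandbox API. Doubled braces are str.format escapes.
REGISTER_TEMPLATE = "registerCode({0}, {1}, {2}, {3});"
CREATE_TEMPLATE = "var {0} = createSandbox({1}, {2}, {3});"
SETTINGS_TEMPLATE = "{{ css : {0} }}"  # settings object holding the JSON CSS


class Bootstrapper:
    """Emits client start-up JavaScript, registering each Unique ID only once."""

    def __init__(self) -> None:
        self._registered: Set[str] = set()

    def emit(self, unique_id: str, crunched_js: str, css_json: str,
             metadata_json: str, var_name: str, element_expr: str,
             policy_expr: str) -> str:
        uid = json.dumps(unique_id)  # assumption: the ID is passed as a JS string
        lines = []
        if unique_id not in self._registered:
            settings = SETTINGS_TEMPLATE.format(css_json)
            lines.append(REGISTER_TEMPLATE.format(crunched_js, uid, settings, metadata_json))
            self._registered.add(unique_id)
        lines.append(CREATE_TEMPLATE.format(var_name, element_expr, policy_expr, uid))
        return "\n".join(lines)
```

A second sandbox instance for the same Unique ID on the same page therefore produces only the creation statement, reusing the code that was already registered.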