Hacking

Copyright: 2010 University of Southampton, IT Innovation Centre

Introduction

This guide is for people wanting to understand or modify the source code.

Code layout

  • src/main/e/core contains the core library code.
  • src/main/e/governance contains code to implement the governance components.
  • src/main/e/gui contains code to implement the core GUI.
  • src/main/e/persistence contains a general-purpose persistence framework.
  • src/main/e/plugins contains code that extends the GUI to support particular types of service.
  • src/main/e/xml contains a simple XML support library.
  • src/test/e contains test cases.

Sturdy refs

E has two kinds of reference: live and sturdy. A sturdy ref is like an EPR (endpoint reference): it is a persistent address that can be used to establish a connection and obtain a live ref. A live ref can be used to invoke methods, but dies when the TCP connection ends. For example:

def liveRef := sturdyRef<-getRcvr()
liveRef<-method()

Because clients should generally be able to persist references to remote objects, we almost always pass sturdy refs to the client.
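
The difference between the two kinds of reference can be illustrated outside E. The following Python sketch is an analogy only, not how E implements references: Connection, LiveRef, SturdyRef, Job and the directory lookup are all hypothetical names invented for this illustration.

```python
class Connection:
    """Stands in for a TCP connection to the vat hosting the object."""
    def __init__(self, target):
        self.target = target
        self.open = True

    def close(self):
        self.open = False


class LiveRef:
    """Usable for invoking methods, but only while its connection is open."""
    def __init__(self, connection):
        self.connection = connection

    def invoke(self, method, *args):
        if not self.connection.open:
            raise ConnectionError("live ref died with its connection")
        return getattr(self.connection.target, method)(*args)


class SturdyRef:
    """A persistent address: safe to store, used only to obtain live refs."""
    def __init__(self, address, directory):
        self.address = address        # plain data, so it can be written to disk
        self._directory = directory   # hypothetical address -> object lookup

    def get_rcvr(self):
        # Each call establishes a fresh connection and returns a new live ref.
        return LiveRef(Connection(self._directory[self.address]))


class Job:
    """Hypothetical remote object used in the usage example."""
    def status(self):
        return "running"
```

A client that has stored the sturdy ref can recover after a dropped connection by asking for a new live ref:

```python
directory = {"example-address": Job()}
sturdy = SturdyRef("example-address", directory)
live = sturdy.get_rcvr()
live.invoke("status")       # works while the connection is open
live.connection.close()     # the connection ends, so this live ref is dead
live = sturdy.get_rcvr()    # the sturdy ref still yields a working live ref
```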

GUI code

Each object in the tree in the left pane is an SWT TreeItem. There is a mapping from TreeItems to Handlers. A Handler is the object-specific code that provides a GUI for a remote object. For example, a remote batch job object is represented in the GUI by a local jobHandler.

The handler has methods to respond to double-clicks and drags, and to create the details pane when selected.
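
As a rough illustration of the item-to-handler mapping, here is a minimal Python sketch. It does not use SWT, and every name in it (Tree, JobHandler, on_double_click, make_details_pane) is an assumption made for the example rather than the real API of the GUI code.

```python
class JobHandler:
    """Hypothetical handler providing the GUI for one remote batch job."""
    def __init__(self, job_name):
        self.job_name = job_name

    def on_double_click(self):
        return f"open {self.job_name}"

    def make_details_pane(self):
        return f"details for {self.job_name}"


class Tree:
    """Stands in for the left-pane tree; items are opaque keys here."""
    def __init__(self):
        self._handlers = {}   # tree item -> handler

    def add_item(self, item, handler):
        self._handlers[item] = handler

    def double_click(self, item):
        # GUI events are routed to the handler registered for the item.
        return self._handlers[item].on_double_click()

    def select(self, item):
        # Selecting an item asks its handler to build the details pane.
        return self._handlers[item].make_details_pane()
```

The tree itself stays generic: all object-specific behaviour lives in the handler looked up for the item.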

Swiss numbers

When an object is to be shared, a long unguessable random number is generated for it. This is the "Swiss base". The Swiss base is hashed (using a secure hash) to get the "Swiss number". This is then hashed again to get the "Swiss hash":

  • Knowing the Swiss base allows you to be the object.
  • Knowing the Swiss number allows you to access the object.
  • Knowing the Swiss hash allows you to identify the object (i.e. compare it with another object).
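
The derivation chain can be sketched in a few lines of Python. This is an illustration under stated assumptions: the text only says "a secure hash", so SHA-256 and the 20-byte base length are choices made for the sketch, and the function names are hypothetical.

```python
import hashlib
import secrets


def secure_hash(data):
    # Assumption: SHA-256 stands in for the unspecified secure hash.
    return hashlib.sha256(data).digest()


def new_swiss_base(num_bytes=20):
    """Generate a long unguessable random number (length is an assumption)."""
    return secrets.token_bytes(num_bytes)


def derive(swiss_base):
    """Return (swiss_number, swiss_hash) derived from the Swiss base."""
    swiss_number = secure_hash(swiss_base)    # grants access to the object
    swiss_hash = secure_hash(swiss_number)    # only identifies the object
    return swiss_number, swiss_hash


def same_object(hash_a, hash_b):
    """Compare two objects by Swiss hash without gaining access to either."""
    return secrets.compare_digest(hash_a, hash_b)
```

Because each step is a one-way hash, a holder of the Swiss base can compute the number and the hash, but a holder of the Swiss hash cannot work backwards to the number or the base.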

Persistence

There is a tree of persisted objects. For each object, we save the name and arguments of a method call on its parent's builder that can be used to (re)load it. Each object has access to its own persistNode object, which can be used to add and remove child nodes and perform other persistence-related tasks. The root object is created by the application. For example, a typical setup is to use a genericServer object as the root, with hosting containers as its children:

persistNode  <-----------> Generic Server
 |---persistNode <-------> Paint Hosting Container
 `---persistNode <-------> Swirl Hosting Container
      `-- persistNode <--> Swirl Service

Each persistent object follows this pattern:

def makeFoo(persistNode, fooArg) {
    # Register the builder used to (re)load foo's children.
    persistNode.setBuilder(def _ {
        to loadBar(childNode, [barArg]) {
            return makeBar(childNode, barArg)
        }
    })

    return def foo {
        # Create a new persistent bar as a child of foo.
        to createBar(barArg) {
            return persistNode.makeChild("loadBar", [barArg]).getObject()
        }
    }
}

When the persistence system wants to revive a foo object, it creates a persistNode object for it and calls foo's parent's builder with the saved details. The parent calls makeFoo, passing in foo's persistNode and any extra arguments needed. The resulting foo object gets associated with the persistNode. The foo object can have child objects (bars in this example), so it also registers a builder of its own, used to revive them.

The foo.createBar method allows holders of the foo object to create persistent bar objects. Internally, it tells the persistence system how to make a bar (by calling loadBar in this case). makeChild returns a persistNode for the child. Calling getObject then actually loads the object (triggering a call to loadBar).

If every sub-object shares some common authority (e.g. a timer), then loadBar can simply call makeBar(childNode, barArg, timer). There is no need to persist the reference to the timer (i.e. do not pass timer to makeChild).

Similarly, if objects have authority which can be derived from the parent's authority then this doesn't need to be saved either. For example, if every bar has access to a subdirectory named after barArg, then loadBar can simply call makeBar(childNode, <file:store>[barArg]). Only the leafname (barArg) is persisted, not the whole file authority. This also makes it easy to relocate the store, since we never persist full pathnames.
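
To make the save and revive cycle concrete, here is a small Python model of the pattern. It is a sketch of the idea, not a port of the real persistence framework: PersistNode, saved_state and revive_children are hypothetical, bars are plain dictionaries, and the timer is just a string standing in for shared authority.

```python
class PersistNode:
    """One node in the tree of persisted objects (sketch only)."""
    def __init__(self, parent=None, method=None, args=None):
        self.parent = parent
        self.method = method       # name of a method on the parent's builder
        self.args = args or []     # saved arguments for that method
        self.children = []
        self.builder = None
        self._object = None

    def set_builder(self, builder):
        self.builder = builder

    def make_child(self, method, args):
        child = PersistNode(self, method, list(args))
        self.children.append(child)
        return child

    def get_object(self):
        # Load lazily by calling the saved method on the parent's builder.
        if self._object is None:
            load = getattr(self.parent.builder, self.method)
            self._object = load(self, self.args)
        return self._object

    def saved_state(self):
        # Only method names and arguments are saved, never authority.
        return {"method": self.method, "args": self.args,
                "children": [c.saved_state() for c in self.children]}


def make_bar(persist_node, bar_arg, timer):
    return {"kind": "bar", "arg": bar_arg, "timer": timer}


def make_foo(persist_node, timer):
    class FooBuilder:
        def loadBar(self, child_node, args):
            [bar_arg] = args
            # The shared timer comes from foo, not from the saved data.
            return make_bar(child_node, bar_arg, timer)

    persist_node.set_builder(FooBuilder())

    class Foo:
        def create_bar(self, bar_arg):
            return persist_node.make_child("loadBar", [bar_arg]).get_object()

    return Foo()


def revive_children(node, state):
    """Recreate child nodes from saved state and load them via the builder."""
    for child_state in state["children"]:
        child = node.make_child(child_state["method"], child_state["args"])
        child.get_object()
        revive_children(child, child_state)
```

In this model, creating a bar records only ("loadBar", [bar_arg]). Reviving the saved state under a foo built with a different timer produces bars that use the new timer, which shows that the authority came from the parent and not from the saved data.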

Extensions

It is often useful to give objects access to other authority linked to their identity. For example, we may want objects to be able to store log messages in a database, but only logged under the object's own identity and removed when the object is destroyed. The persistence system therefore allows extra features to be registered. Features used in the prototype (see core/persisterSetup.emaker) include:

  • Sturdy (allows an object to export a SturdyRef to itself, making it accessible over the network)
  • State (store arbitrary key/value pairs)
  • Usage (record usage statistics in a database table)
  • Log (as described above)
  • Access control (create child objects exposing limited access to the parent)

Comparison to E's timeMachine

This mechanism is different to the default E persistence system, but maintains the essential property that an object can't exploit the persistence system to gain extra authority (for example, a naive system might allow an object to say it should be revived with access to the filesystem, when it never had that authority originally).

Initially, we tried using the default E system, but we found the following limitations:

  • The default system has to save all objects in the system at once. This does not scale well. The new system allows the persisted arguments for each object to be independent, meaning that we only have to write out state for the objects that actually changed.
  • Each object has to save all its authority, even though many objects share the same authorities (e.g. every job has a timer). This is inefficient and makes the saved data less robust against upgrades. By moving responsibility for creating objects to their parents, we ensure that the required authority is already available.
  • There are many code paths to test. Each object must implement three functions: create, save and restore. Objects not designed for persistence typically only implement the create function. For example, if an object is created with a verb facet (obj.method) then the object cannot be persisted, because verb facets cannot be persisted. In the new system, the builder will create a new verb facet when loading, rather than requiring any special support from the object itself. If an object can be created, it is very likely that it can also be saved and restored, because all code is exercised during creation.
  • Objects are not organised. Once you give makeSturdyRef to an object, it can create any objects it pleases. There is no way to find and remove these objects easily. The new system only allows objects to create children; when the object is removed, its children are removed too.

Pipelining

To deal with high-latency networks, it is essential to pipeline messages (i.e. send more messages without waiting for previous ones to return). E ensures that messages sent using a particular proxy object will be delivered in order. For example, this code sends two messages to a and continues without waiting for either to complete:

a<-invoke("foo")
a<-invoke("bar")

More impressively, you can send messages to an object even if you don't yet know its address. For example:

def sla := template<-propose()
def service := sla<-getService("swirlService")

The call to propose returns a "promise" of an SLA, which the vat hosting the template object is responsible for resolving. When the response message arrives, the promise turns into a live reference to the sla. (Note: in reality, propose returns a sturdy ref, but let's keep things simple here.)

Invoking getService on the promise optimistically sends the request to the template's vat, in the hope that the sla object will turn out to be nearby. By the time the getService message arrives at the remote vat, the result of the propose operation is known, and the getService message is forwarded to the resulting sla object. If the sla turned out to be remote to the template, the message would be forwarded on.
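
The queue-and-forward behaviour of a promise can be modelled in a single process. The Python sketch below is an analogy and not E's implementation: Promise, send, resolve, value and the Sla class are hypothetical names, and no network is involved. In E the buffering happens in the remote vat, which is what saves the round trip.

```python
class Promise:
    """Placeholder for a result that is not yet known (sketch only)."""
    def __init__(self):
        self._target = None
        self._resolved = False
        self._pending = []    # messages sent before resolution, in order

    def send(self, method, *args):
        """Eventual send: returns a promise for the result immediately."""
        result = Promise()
        if self._resolved:
            self._deliver(method, args, result)
        else:
            self._pending.append((method, args, result))
        return result

    def resolve(self, target):
        self._target = target
        self._resolved = True
        # Forward queued messages in the order they were sent.
        pending, self._pending = self._pending, []
        for method, args, result in pending:
            self._deliver(method, args, result)

    def _deliver(self, method, args, result):
        result.resolve(getattr(self._target, method)(*args))

    def value(self):
        if not self._resolved:
            raise RuntimeError("promise not yet resolved")
        return self._target


class Sla:
    """Hypothetical SLA object used in the usage example."""
    def getService(self, name):
        return f"service:{name}"
```

The two-line E example above then corresponds to:

```python
sla = Promise()                                    # stands for template<-propose()
service = sla.send("getService", "swirlService")   # sent before sla is known
sla.resolve(Sla())                                 # the propose result arrives
service.value()                                    # the service is now available
```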

To make this work efficiently:

  • The caller should know statically what kind of response to expect (e.g. when calling template.propose the client must already know that the SLA will provide a swirl service so it can call getService without waiting).
  • The caller should be able to send as many messages as possible without waiting for replies. For example, the client should be able to call copyFrom(src) without waiting to be notified that src contains data. The copy should simply wait until src is ready.

See: http://www.erights.org/elib/distrib/captp/index.html