A Semantics for Object-Oriented Systems



Introduction
Current object-oriented design notations such as OMT [14], Booch [3] and UML [16] are syntax-bound and semantic-free in the sense that they typically employ a large and rigorously enforceable collection of construction rules, but rarely provide a model to explain what is being constructed. Whilst this omission clearly does not prevent such notations being used effectively in the development of object-oriented software systems, it must raise questions regarding the long-term viability of notations which are not adequately anchored in a semantic theory.
The aims of this work are to provide a semantic basis which is suitable for such notations and which can form a basis for rigorous object-oriented development. Our approach is to take as a starting point the computational behaviour of objects and to provide a semantic model of incremental system development.
A system is defined as the solution to a set of simultaneous equations which specify its computational behaviour and structure. We use category theory [1] [15] [9] as a tool to express the equations since this theory provides standard constructions and results which conveniently express semantics without getting unnecessarily entangled in syntactical issues.
Furthermore, the aspects of category theory which we use have a constructive flavour which allows us to program up the results and use them to express systems as non-deterministic specifications. A functional language is used to express the specifications since it supports refinement transformations which are used to produce a deterministic implementation from a specification.
This approach has the benefit of focussing on the semantics of object-oriented systems, unlike other approaches which propose particular languages, for example Z or modal logic, as their starting point. We claim that this leads to a fundamental model of object-oriented systems behaviour which can be denoted using a variety of languages, including Z, modal logics and concrete programming languages, which are chosen to suit the development method or application.
Our approach is highly compositional which allows the semantics of system components to correspond very closely to the design elements which are used to denote them. This is in contrast to other approaches, for example those based on first order logic, in which the distinction between system components is blurred.
Early results show that our approach is constructive in the sense that we can produce program elements which correspond closely to semantic components. This leads us to believe that we can produce a system development technique which involves mapping elements of design notations to executable components whose semantics are expressed using categorical concepts and whose proof theory and transformation system is expressed in terms of a clear semantic model.
This paper reports early stages of this research and defines a semantic model, outlines how the model can be used as a semantic basis for current object-oriented design notations and gives a simple example of how the model facilitates program development.
The paper is structured as follows: Sections 2 to 6 define the basic semantic model which we use to express the behaviour of object systems. Section 7 outlines how we express properties of systems, how we might establish the properties and how we can view system development as a series of semantic preserving transformations. Section 8 outlines how we can generalise the semantic model to include dynamic system behaviour in terms of class instantiation. Section 9 describes how we can use the model to interpret the standard features which are employed by typical object-oriented design notations. Section 10 gives a constructive interpretation to the categorical tools employed in defining the semantic model and uses them to specify and subsequently implement a very simple object-oriented program. Section 11 concludes by describing further work which is planned and compares our approach with related research.

Objects
Systems are constructed as a collection of objects. Each object is a separate computational system with its own state which is modified in response to handling messages. A snapshot of an object is $(\iota, ms, \sigma)$ where $\iota$ is an object identifier, $ms$ is a sequence of messages, referred to as a message history, and $\sigma$ is the state of the object. All objects have a unique object identifier. There is no restriction on the contents of an object state; we may think of it as a tuple of simple data items.
Objects are computational systems which perform calculations in response to receiving messages. A message is a package of information which is sent from one object to another. When the message is handled by the receiver, the result is a change in state and possibly new objects and further messages. If we observed a single object, it would consume messages over a period of time and possibly change state each time a message is consumed. This gives rise to object calculations which are sequences of object states of the form:
$$\ldots \longmapsto (\iota, m{:}ms, \sigma) \longmapsto (\iota, ms, \sigma') \longmapsto \ldots$$
in which the object with identifier $\iota$ consumes the next message $m$ causing its state to change from $\sigma$ to $\sigma'$. All object calculations have a starting state, but may not have an end state if the object continues to exist indefinitely.
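The notion of an object calculation can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function `calculation`, the handler `counter_handler` and the integer state are hypothetical names chosen for the example and do not appear in the model itself.

```python
def calculation(ident, messages, state, handle):
    """Return the snapshots (ident, pending messages, state) of one calculation."""
    pending = tuple(messages)
    snapshots = [(ident, pending, state)]
    while pending:
        # Consume the next message; handling it may change the state.
        message, pending = pending[0], pending[1:]
        state = handle(state, message)
        snapshots.append((ident, pending, state))
    return snapshots


def counter_handler(state, message):
    # Hypothetical handler: "inc" increments the counter, other messages are ignored.
    return state + 1 if message == "inc" else state


trace = calculation("obj", ["inc", "noop", "inc"], 0, counter_handler)
for snapshot in trace:
    print(snapshot)
```

Each printed triple plays the role of one snapshot, and the list as a whole is one finite calculation for a fixed message history.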
Objects have a number of messages which they recognise and can handle. In an object-oriented program these correspond to all the possible legal method calls for an object. Correspondingly, an object can be associated with a set of the possible message histories which it can receive during its life-time. Suppose that we represent such a set of message histories as $M$, and the set of possible object histories for a deterministic object $O$ as $O(M)$.
A set of message histories $M$ has the following structure: suppose that we select a history $ms \in M$ and then generate all of the history prefixes $prefixes(ms)$; in order for $M$ to be well formed, every prefix $ms' \in prefixes(ms)$ must also be in $M$, i.e. $ms' \in M$.
In conclusion, objects and object specifications are functors from the category of sets of message histories, with inclusions as morphisms, to the categories of object calculations and object specification calculations. Since the former appears to be a special case of the latter, we will use object specifications from now on.
A sufficient condition for a morphism $\phi$ to exist between two objects $O_1$ and $O_2$ occurs when $O_1$ represents an extension of the same object as $O_2$. The extension is expressed in terms of the object state by adding extra components, for example when a two-dimensional point is extended with a z co-ordinate to become a three-dimensional point. In such cases, the morphism $\phi_M$ is constructed from a projection on the state which simply drops the extra state components, i.e. the mapping $(x, y, z) \mapsto (x, y)$.
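Two ingredients of this discussion lend themselves to a short sketch: the well-formedness (prefix-closure) condition on a set of message histories, and the state projection which drops the extra co-ordinate. The code is a minimal illustration, assuming that histories are tuples of message names and that snapshots are triples; the names `prefixes`, `is_well_formed` and `project_calculation` are hypothetical.

```python
def prefixes(history):
    """All initial segments of a history, including the empty one and the history itself."""
    history = tuple(history)
    return {history[:i] for i in range(len(history) + 1)}


def is_well_formed(histories):
    """A set of histories is well formed when it contains every prefix of each member."""
    histories = {tuple(h) for h in histories}
    return all(p in histories for h in histories for p in prefixes(h))


def project_calculation(calculation):
    """Drop the z component from every snapshot: (x, y, z) -> (x, y)."""
    return [(ident, pending, state[:2]) for ident, pending, state in calculation]


calc_3d = [("p", ("setx",), (0, 0, 5)), ("p", (), (1, 0, 5))]
calc_2d = project_calculation(calc_3d)
```

Applying `project_calculation` to every calculation of the three-dimensional point gives one possible reading of the morphism built from the projection.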
Consider the case where $M = M_1 = M_2$ but $O_2$ is only defined with respect to some sub-set of $M$. This corresponds to the case where $O_1$ is an extension of $O_2$ and has added some new methods. In this case the morphism $\phi_M : O_1(M) \to O_2(M)$ must uniquely associate calculations involving the new messages in $O_1(M)$ with error calculations in $O_2(M)$.

Systems
A system of objects is constructed by composing individual objects such that the original objects can be recovered. Consider a pair of completely independent objects $O_1$ and $O_2$ which represent sets of possible calculation sets. An object system which is made up of these two objects will have calculations which are made up of all pairs of calculations such that the original calculations can be recovered simply by projecting the pairs onto their components:
$$(\iota_1, m{:}ms, \sigma_1) \longmapsto (\iota_1, ms, \sigma_1') \longmapsto \ldots$$
$$(\iota_2, m{:}ms, \sigma_2) \longmapsto (\iota_2, ms, \sigma_2') \longmapsto \ldots$$
Alternatively, two objects may be inter-related. In order for this to be the case all calculations for the objects will have the same object identifier. The system will consist of single calculations which are produced by merging corresponding calculations from both objects. For example, if the first object contains a calculation with an object state $(\iota, ms_1, \sigma_1)$, and the second object contains a calculation with an object state $(\iota, ms_2, \sigma_2)$, then the calculations can be merged to produce an object state $(\iota, ms, \sigma_1 \sqcup \sigma_2)$ when $ms = ms_1 = ms_2$ and $\sigma_1 \sqcup \sigma_2$ is the smallest object state containing the information from both $\sigma_1$ and $\sigma_2$. The original calculations can be recovered by dropping the extra state components in each case. Note that not all the calculations from both "views" of the same object may be in the product since they have to agree on message histories.
Systems of objects can be expressed in terms of object products. Given two objects $O_1$ and $O_2$, we can think of the product $O_1 \times O_2$ as containing calculations whose behaviour is a pairing of component object calculations. Given a collection of independent objects, a product can be constructed pair-wise, forming a single system object which contains the behaviour of its components.
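Both flavours of product described above can be sketched on finite collections. The following Python is an illustration only: states are assumed to be dictionaries of named components, and the rule that two views must agree on any shared component is an assumption made for the example rather than something stated in the model.

```python
from itertools import product as cartesian


def independent_product(calcs1, calcs2):
    """Pair every calculation of one object with every calculation of the other."""
    return list(cartesian(calcs1, calcs2))


def fst(pair):
    return pair[0]  # projection recovering the first component


def snd(pair):
    return pair[1]  # projection recovering the second component


def merge(snapshot1, snapshot2):
    """Merge two views of the same object, or return None when they are incompatible."""
    ident1, history1, state1 = snapshot1
    ident2, history2, state2 = snapshot2
    if ident1 != ident2 or history1 != history2:
        return None  # views must agree on identifier and message history
    shared = state1.keys() & state2.keys()
    if any(state1[k] != state2[k] for k in shared):
        return None  # assumption: shared components must carry the same value
    return (ident1, history1, {**state1, **state2})


pairs = independent_product(["c1", "c2"], ["d1"])
merged = merge(("p", ("m",), {"x": 1}), ("p", ("m",), {"y": 2}))
```

The projections `fst` and `snd` recover the component calculations, while `merge` drops any pair of views which disagree on their message histories.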

System Behaviour
In general, a system is not composed from a collection of completely independent objects. Objects can be used to place restrictions on the behaviour of other objects and to extend the state or message histories of other objects. These interactions are expressed on an object diagram which is a graph whose nodes are labelled with objects and whose edges are labelled with object morphisms.
Suppose that we have an object $O$ which represents sets of object calculation sets. We wish to restrict the collection of $O$ calculations. The restriction can be expressed as an object $R_1$ with a morphism $\phi_1 : R_1 \to O$ which picks out just those calculations from $O$ which are allowed by $R_1$, possibly after applying some mapping. Given any number of these restrictions $\phi_i : R_i \to O$, we can produce a diagram:
[Diagram: the restriction objects $R_i$, each with a morphism $\phi_i$ into $O$.]
The behaviour of the complete system is the object which makes all of the constraints hold. We can think of the restrictions $\phi_i$ as simultaneous equations which apply to the object $O$ and the system as a set of calculation sets which is a solution to the equations. In order to solve the equations we use the categorical construct of limits which has been proposed by Goguen in [8] as a means for expressing the behaviour of a system.
In general, a diagram $\Delta$ is a graph $G$ with a mapping from the nodes of $G$ to objects and a mapping from the edges of $G$ to morphisms between the source and target objects. Let $\Delta_n$ represent the object at node $n$ in the diagram and $\Delta_e$ represent the morphism at edge $e$. A cone on a diagram $\Delta$ is an object $O$, which is not necessarily in the diagram, together with, for each object $\Delta_n$, a morphism $\psi_n : O \to \Delta_n$ such that for all edges $e : n \to m$ in $G$ we have $\Delta_e \circ \psi_n = \psi_m$.
A cone on a system expressed as a diagram of objects is a collection of object calculations which contains some of the behaviour of all the objects on the diagram and which observes all of the restrictions on the diagram. Notice, however, that there is no condition on which particular collection of calculations to pick for the cone. There are many possible choices for cones, including an object which contains all possible calculation sets which are consistent with the diagram and also the object which contains no calculations at all (this is an initial object which can always be mapped to another object using the empty mapping).
A cone arrow on a diagram $\Delta$ from cone $\psi_n : O \to \Delta_n$ on $\Delta$ to cone $\psi'_n : O' \to \Delta_n$ is an object morphism $\alpha : O' \to O$ such that for all nodes $n$ in $G$ we have $\psi_n \circ \alpha = \psi'_n$. Cone arrows allow us to place restrictions on cones. For example, if there is a cone arrow $\alpha : O' \to O$ we know that the calculations performed by $O'$ are more restricted than those performed by $O$.
A limit on a diagram $\Delta$ is a cone $O$ on $\Delta$ such that for any other cone $O'$ on $\Delta$ there is a unique cone arrow $u : O' \to O$. The definition of a limit captures the behaviour of a system because it must contain all of the objects on the diagram and must obey all of the constraints on the diagram. Furthermore, a limit contains as many calculations as possible without adding any extra state components to objects or changing the messages which objects can handle.
A standard result of category theory is that any category having a terminal object, binary products and equalizers of pairs of arrows has all finite limits [15]. This theorem allows us to construct the limit on a diagram providing that certain simple properties hold.
A terminal object in the category of objects is an object with a single possible behaviour for any message history. We could represent this as:
$$(\iota, m{:}ms, \sigma) \longmapsto (\iota, ms, \sigma) \longmapsto \ldots$$
where $m$ and $ms$ are any message and message history respectively. A terminal object represents the degenerate case where there are no objects on a diagram; the resulting behaviour is unobservable in the sense that all messages are ignored.
A product can always be found for two objects since the product either contains pairs of calculations or merges corresponding calculations which arise from observing the same object.
An equalizer for two object morphisms $f, g : O_1 \to O_2$ is defined as follows: it is an object $O$ together with a morphism $j : O \to O_1$ on which $f$ and $g$ agree, and which is the largest such object.
In conclusion, we have established that the behaviour of a system of objects is expressed as a limit on a diagram containing the component objects and relationships between them. The limit can be constructed using the categorical constructs of terminal object, product and equalizer.
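The construction of a limit from a product and an equalizer can be illustrated in the much simpler setting of finite sets. The sketch below is an assumption-laden analogy, not the construction in the category of objects: calculations are stood in for by plain tuples, the two restrictions are finite lists, and the names `equalizer`, `limit_of_two_restrictions` and `identity` are hypothetical.

```python
from itertools import product as cartesian


def equalizer(xs, f, g):
    """The largest sub-collection of xs on which f and g agree."""
    return [x for x in xs if f(x) == g(x)]


def limit_of_two_restrictions(r1, r2, phi1, phi2):
    """Limit of the diagram r1 -> O <- r2, built as an equalizer on the product."""
    pairs = cartesian(r1, r2)
    return equalizer(pairs, lambda p: phi1(p[0]), lambda p: phi2(p[1]))


def identity(calculation):
    return calculation


# Stand-ins for calculations of O: r1 allows those with x == 1, r2 those with y == 1.
r1 = [(1, 0), (1, 1)]
r2 = [(0, 1), (1, 1)]
solution = limit_of_two_restrictions(r1, r2, identity, identity)
```

Only the pair whose components map to the same element of $O$ survives, which mirrors the reading of the limit as the solution to the simultaneous restrictions.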

Communication
Objects communicate by sending messages. A source object sends a message to a target object via a communication channel. We may think of objects as having two message streams: an input stream (the message history) and an output stream. Since elements of the output stream are produced as a result of handling an input message, we may think of the output stream as being part of an object's state; however, we will extend the model of object calculations to isolate the output stream:
$$(\iota, m{:}ms, \sigma, M) \longmapsto (\iota, ms, \sigma', M')$$
where $M$ and $M'$ are sets of output messages. The messages $M'$ are produced as a result of handling the input message $m$.
A wide variety of design choices arise when modelling object communication. In particular, we must distinguish between concurrent systems and sequential systems. In a concurrent system, multiple messages may be received by a single object and the order of system execution may be non-deterministic since objects can perform computations at the same time. We will limit the discussion to the description of object systems which will eventually be realised using sequential implementation languages. As a result, we can simplify the definition of output messages, because an object may produce at most a single output message:
$$(\iota, m{:}ms, \sigma, o) \longmapsto (\iota, ms, \sigma', o')$$
where $o$ and $o'$ are single output messages or some unique value denoting no output message, which will be omitted from a state description.
Communication between two objects occurs when the output message produced by the first object is received by the second. The following pair of calculation schemas show how this works:
$$(\iota_1, m{:}ms_1, \sigma_1) \longmapsto (\iota_1, ms_1, \sigma_1', o) \longmapsto (\iota_1, ms_1, \sigma_1')$$
$$(\iota_2, o{:}ms_2, \sigma_2) \longmapsto (\iota_2, o{:}ms_2, \sigma_2) \longmapsto (\iota_2, ms_2, \sigma_2')$$
Initially, the object $\iota_1$ is currently active and consumes the next input message $m$ which produces an output message $o$ which we assume has $\iota_2$ as its target. Therefore, the next input message waiting to be consumed by the object $\iota_2$ is $o$. Note that we are analysing sequential systems so only one object may be active at any given time. Furthermore, we restrict our systems even further by requiring that any output message must be handled by the next system transition. We do not require that message passing is synchronous since this can easily be modelled by sending a return message including result data.
Objects are defined to produce output messages when certain conditions arise in their input. However, the objects do not necessarily know the target of the output message. We require a mechanism which allows us to plug together two objects using a communication medium. This is easily achieved using a system object which controls the flow of messages between objects. Consider the case where we have two objects $O_1$ and $O_2$. Both objects produce messages on their output and we wish to connect the output of one object to the input of the other. This is achieved by defining a system object $O$ and two object morphisms $f$ and $g$. The system object $O$ is a restriction on the freely generated system $O_1 \times O_2$ which requires that whenever an output message is produced by one of the objects it is the next input message to be handled by the other object. We can apply several separate restrictions using this technique and the overall behaviour of the system will require that all the restrictions hold.
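The sequential discipline described above, in which an output message must be handled by the very next system transition, can be sketched as a small scheduler. This is an operational illustration under assumptions, not the categorical restriction itself: handlers are assumed to return a new state together with either `None` or a `(target, message)` pair, and the names `run`, `sender` and `receiver` are hypothetical.

```python
def run(handlers, states, active, message, max_steps=10):
    """Run a sequential system: each output is the next input of its target."""
    states = dict(states)  # work on a copy so the caller's mapping is untouched
    trace = []
    for _ in range(max_steps):
        states[active], output = handlers[active](states[active], message)
        trace.append((active, message, states[active]))
        if output is None:
            break  # no output message, so no object becomes active
        active, message = output
    return states, trace


def sender(state, message):
    # Hypothetical object: counts handled messages and always pings the receiver.
    return state + 1, ("receiver", "ping")


def receiver(state, message):
    # Hypothetical object: records what it receives and produces no output.
    return state + [message], None


final, trace = run(
    {"sender": sender, "receiver": receiver},
    {"sender": 0, "receiver": []},
    "sender",
    "start",
)
```

Only one object is active at each step, and the output of `sender` is consumed by `receiver` in the following transition.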

System Properties
During system development, the software engineer must establish properties and perform system transformations. Current object-oriented design notations provide little support for these activities due to their weak semantics. A category theoretic framework for expressing the behaviour of object-oriented systems facilitates the definition of system properties and transformations. This section gives a flavour of the structures which can be defined in order to support system development.
The behaviour of an object-oriented system is expressed as a limit $O$ on the diagram $\Delta$ containing all system objects and morphisms between them. A system property can be expressed as a limit $P$ on a second diagram $\Delta'$ which describes just the behaviour which is required by the property. If the system exhibits all of the required behaviour then all of the behaviour expressed by $P$ must be contained in $O$. Over the message histories $M$ it is possible to associate every calculation in $O(M)$ to some calculation in $P(M)$.
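On finite collections, the final condition amounts to checking that every system calculation is sent to some property calculation. The sketch below assumes that calculations are hashable tuples of states and that the association is given by a function `view`; both the function `satisfies` and the example data are hypothetical.

```python
def satisfies(system_calcs, property_calcs, view):
    """True when view sends every system calculation to a property calculation."""
    allowed = set(property_calcs)
    return all(view(calc) in allowed for calc in system_calcs)


def x_trace(calc):
    # Forget the y co-ordinate of every state in a calculation.
    return tuple(x for x, _ in calc)


# Hypothetical property: x starts at 0 and moves to 1, 2 or 3.
property_calcs = {(0, 1), (0, 2), (0, 3)}
good_system = [((0, 0), (1, 0)), ((0, 0), (2, 7))]
bad_system = good_system + [((0, 0), (5, 0))]
```

Here `satisfies(good_system, property_calcs, x_trace)` holds, while the extra calculation in `bad_system` has no counterpart in the property.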

Classes
A class is a collection of objects each of which has the same behaviour but a different identity. An object is brought into existence by instantiating a class; class instantiation can occur at any time during the life-time of a system and is performed by an object making a request for a new object of a certain type.
Classes are added to the system model described so far by a simple generalisation as follows.
Rather than build systems out of individual objects we can generalise our model and construct systems out of classes. Just as an object denotes all of the possible calculations which it can perform, a class denotes all of the possible objects which it can create. We assume that all possible objects exist, but that an object can only start to compute after another object has caused it to come into existence using a message $new$. A system always starts with a single active object. All other objects must be created as a result of messages initiated by the root object.
The following three object calculations show how class instantiation is modelled. Initially, object $\iota_1$ is active; it creates a new object $\iota_2$ and sends it a message $m_1$, then becomes dormant. Object $\iota_2$ becomes active and handles the message $m_1$ when it is sent and then instantiates the object $\iota_3$ before sending it a message $m_2$ and becoming dormant. Finally, object $\iota_3$ handles message $m_2$.
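The instantiation scenario with three objects can be replayed with a small bookkeeping class. This is a sketch under assumptions: identifiers are consecutive integers, the class `System` and its methods `new` and `send` are hypothetical, and only the log of events is modelled, not object state.

```python
from itertools import count


class System:
    """Tracks which objects have been created and logs every event."""

    def __init__(self):
        self._next_id = count(1)
        self.created = set()
        self.log = []

    def new(self, creator):
        # An object only comes into existence when another object requests it;
        # the root object is the single exception and has no creator.
        ident = next(self._next_id)
        self.created.add(ident)
        self.log.append(("new", creator, ident))
        return ident

    def send(self, sender, target, message):
        if target not in self.created:
            raise ValueError(f"object {target} has not been created")
        self.log.append(("send", sender, target, message))


system = System()
root = system.new(creator=None)    # the single initially active object
second = system.new(creator=root)
system.send(root, second, "m1")
third = system.new(creator=second)
system.send(second, third, "m2")
```

The resulting log records the same order of events as the prose: each object is created before any message is sent to it.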
Object-oriented notations often provide a static modelling language which expresses the structure of classes in terms of their interfaces. The semantics of such static models is given as freely generated classes (see §8) where messages can make arbitrary modifications to object state as described in §2.
Classes in static models are often organised into generalisation hierarchies where a sub-class is linked to its super-class. These links correspond to morphisms (see §3) between classes such that the calculations performed by sub-class instances are equivalent to the calculations performed by super-class instances after a projection. Classes are linked using associations which indicate that instances of the classes may communicate during system execution. Such associations correspond to freely constructed systems of objects which communicate as described in §6.
The dynamic behaviour of object-oriented systems is often expressed using models which express the traces of messages between objects. These are restrictions on the calculations which may be performed by objects in a system as described in §5. Unlike current design notations, the semantic model proposed in this paper explains how individual dynamic descriptions can be composed to produce an overall system behaviour.
The dynamic behaviour of a class of objects is often expressed using state transition models. These models are used to restrict the sequence of messages which may legally be handled by an object and to express the state changes which occur as a result of handling a message in a particular state. Our model has parameterised object behaviours with respect to message histories and therefore provided a concrete framework within which message restrictions can be expressed and studied.

Program Development
This section shows how the proposed model of object-oriented system specification and development can be used to produce a program using step-wise refinement. The approach validates the use of the categorical machinery by showing how a system can be constructed from reusable definitions for categorical product and equalizer.
The example system defines a simple class for two-dimensional points. We start by defining the behaviour of the system which corresponds to just the interface of $Point$, i.e. all instances of $Point$ arbitrarily change state in response to receiving any message.
Next, we define two independent constraints on the behaviour. Firstly, $R_1$ defines that when a $Point$ object receives a message $setx$ it must appropriately set the value of the x co-ordinate. Secondly, $R_2$ places a similar constraint on receipt of a $sety$ message.
We require that the behaviour of $Point$ objects in the complete system satisfy both constraints at the same time. This is achieved by constructing a limit on the diagram containing the definitions of $Point$, $R_1$ and $R_2$. We will show that we can define procedures for constructing products and equalizers and then use these to construct a limit. The situation is shown in the following diagram where $X$ is a product of $R_1$ and $R_2$ and $L$ is constructed by finding an equalizer for arrows $f \circ \pi_1$ and $g \circ \pi_2$.
[Diagram: $Point$, $R_1$ and $R_2$ with arrows $f$ and $g$; the product $X$ with projections $\pi_1$ and $\pi_2$; the equalizer $L$.]
We start the program development with underspecified objects. This is achieved using a simple functional programming language. The language is standard except for the following features: Underspecification is achieved using the non-deterministic operator $sel$ which arbitrarily selects an element from a set. The operator $fail$ is used to induce failure. Objects are tagged and untagged with unique labels using the operators $seal$ and $unseal$. Objects are represented as tagged records where $\oplus$ is used to concatenate records. Name lookup in records shadows on the right and the concatenation operator can be undone.
The class $Point$ has a state $x$ and $y$ and a message interface containing $setx$ and $sety$. In the following specification, the tag is supplied each time the class is instantiated. The system is now complete and is described by the diagram containing $Point$, $R_1$, $R_2$ and the arrows $f$ and $g$. In order to complete the system we must find a limit on this diagram. In order to do this we can construct a product $X = R_1 \times R_2$ and then find an equalizer of the arrows $f \circ \pi_1$ and $g \circ \pi_2$. Consider constructing a product of two class definitions. The product of two classes produces a triple containing a class and two projection arrows. Note that if the product class "fuses" together two parts of the same larger class then we can simplify the product by merging the tags $\tau_1$ and $\tau_2$. We can now construct the limit on the diagram using the following construction:
$$(L, i) = equalize(X, f \circ \pi_1, g \circ \pi_2) \quad \textbf{where}\ (X, \pi_1, \pi_2) = product(R_1, R_2) \qquad (8)$$
The limit is a non-deterministic program which can be thought of as performing generate-and-test when used to construct instances. Although the limit has an executable semantics, it is an infeasible program due to the amount of backtracking necessary. We shall now show how program transformation can be used to produce a feasible deterministic implementation of the limit on the diagram.
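The generate-and-test reading of the limit can be imitated in Python by enumerating every choice that $sel$ could make. The sketch below is a loose analogy under several assumptions: co-ordinates range over a tiny finite domain, an instance is a dictionary recording the result of each method for one fixed call, the extra method layer is kept under a hypothetical `"base"` key, and `peel` stands in for both $f$ and $g$. None of these names come from the specification language itself.

```python
from itertools import product as cartesian

COORDS = range(3)  # assumption: a small finite domain so that sel can be enumerated
STATES = [(x, y) for x in COORDS for y in COORDS]


def point(state):
    """Underspecified Point: sel may pick any resulting state for each method."""
    for after_setx, after_sety in cartesian(STATES, STATES):
        yield {"setx": after_setx, "sety": after_sety}


def r1(state, value):
    """Point extended with a deterministic setx layer."""
    _, y = state
    for base in point(state):
        yield {"base": base, "setx": (value, y)}


def r2(state, value):
    """Point extended with a deterministic sety layer."""
    x, _ = state
    for base in point(state):
        yield {"base": base, "sety": (x, value)}


def peel(instance):
    """Plays the role of f and g: drop the extra layer to recover the Point instance."""
    return instance["base"]


def product(c1, c2):
    return list(cartesian(c1, c2))


def equalize(candidates, a1, a2):
    """Keep the candidates on which both arrows agree; disagreement models fail."""
    return [c for c in candidates if a1(c) == a2(c)]


state, value = (0, 0), 2
x_pairs = product(r1(state, value), r2(state, value))
limit = equalize(x_pairs, lambda p: peel(p[0]), lambda p: peel(p[1]))
behaviours = {(o1["setx"], o2["sety"]) for o1, o2 in limit}
```

Thousands of candidate pairs are generated and most are rejected, yet every survivor exhibits the same deterministic `setx` and `sety` behaviour, which is the motivation for transforming the limit into a direct implementation.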
From (8) we note that the arrow $i$ is not required for the purposes of generating a class definition so we omit it. We unfold $product$ and $equalize$ using their definitions (6) and (7) to produce (9).
From (9) we unfold $X$ using its definition from $product$ to produce (10). From (10) we unfold $f$ and $g$ using their definitions (4) and (5). In order to do this we introduce two new local variables $o_3$ and $o_4$ in (11). … $m_2$, so we can replace these pairs with a single pair. Furthermore, the non-deterministic method definitions $setx$ and $sety$ from $Point$ are shadowed by deterministic definitions from $R_1$ and $R_2$ respectively. Therefore, we may drop these components from $Point$ in the definition of $L$ given in (13).
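For comparison, a deterministic class of the kind that such a refinement aims at might look as follows in Python. This is not the definition (13) derived in the text; it is only a hand-written sketch which assumes that `setx` and `sety` replace the corresponding co-ordinate and return a new point rather than updating in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Deterministic two-dimensional point with functional updates."""

    x: int
    y: int

    def setx(self, x):
        # Only the x co-ordinate changes; y is carried over unchanged.
        return Point(x, self.y)

    def sety(self, y):
        # Only the y co-ordinate changes; x is carried over unchanged.
        return Point(self.x, y)


p = Point(0, 0).setx(3).sety(4)
```

No search or backtracking is involved: each message determines exactly one successor state.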

Conclusion
The aims of this research are to identify a semantic model which is suitable for object-oriented software development and particularly suitable for analysing object-oriented designs.
We have laid the foundations of achieving this aim by describing a model of object-oriented systems based on objects as computational entities. We have shown that this model is compositional and how it can be used to establish system properties and to define system refinement. We have outlined how the model can be used to provide a semantic framework for the major features which are found in current object-oriented design notations.
We have chosen to use category theory to express the semantic model because this allows us to focus on semantic issues whilst leaving open issues relating to the notation used to express the model. Such a notation may be logical, such as a modal logic, or first order, such as VDM and Z. Alternatively, a suitable notation may be a calculus based on $\lambda$-notation.
We have provided a very simple example program refinement which shows how the model may be put into practice. The example shows the utility of using standard results from category theory since we have been able to "code up" the categorical constructs and use them to produce a program.
The next stage of this research is to investigate how the techniques scale up by using them to develop some small-scale object-oriented applications. Also, we would like to investigate a close coupling between object-oriented design notations such as UML and the proposed semantic model since this will produce a mechanism for analysing the UML designs.
The analysis and semantic foundations of object-oriented designs and development is currently an active research area [13]. For example, [5] shows how message diagrams equivalent to UML collaboration diagrams can be given a semantics in terms of a partial order on events; [4] shows how the specification language Larch can be used to give a formal semantics to static object diagrams; and [12] can be used to produce executable object-oriented designs.
Our approach contributes to this area by identifying a potentially fruitful collaboration of ideas. The use of category theory to capture the essential characteristics of systems dates back to Goguen [10] who updated the approach to address object-oriented systems in [8]. We take a slightly less abstract view in that we wish to focus on the development of sequential object-oriented programs rather than capturing general object-oriented behaviour.
A related approach which addresses object-oriented system execution is the use of modal logics; examples are Object Calculus [2], [6] and [11]. This approach differs from that which is taken in this work in that it uses a modal logic framework to express and analyse object execution. By abstracting away from notational issues we are able to select a notation (executable or otherwise) as appropriate.
A number of researchers such as [7] have used first order logical notations for expressing the semantics of object-oriented design notations. Although this approach will capture the behaviour of abstract systems, these notations do not have an executable semantics and are weak at capturing temporal system properties.
Let $f : O_1 \to O_2$ and $g : O_1 \to O_2$ be two arrows which map calculations performed by $O_1$ to calculations performed by $O_2$. An equalizer of $f$ and $g$ is an object $O$ together with an arrow $j : O \to O_1$ such that $f \circ j = g \circ j$ and, for any arrow $h : O' \to O_1$ with $f \circ h = g \circ h$, there is a unique arrow $k : O' \to O$ such that $j \circ k = h$. The object $O$ can be viewed as a sub-set of $O_1$ calculations and $j$ is an inclusion such that $f$ and $g$ agree on the sub-set $O$. The extra components of the definition, $O'$ and $k$, force $O$ to be the largest sub-set which satisfies the constraint.

$$R_1(n_1, n_2) = Point(n_1, n_2) \oplus seal\ \{setx(x) = R_1(n_1 + x, n_2)\} \qquad (2)$$
$$R_2(n_1, n_2) = Point(n_1, n_2) \oplus seal\ \{sety(y) = R_2(n_1, y + n_2)\} \qquad (3)$$
We define morphisms $f$ and $g$ from the extensions to the class $Point$. Each morphism recovers the original definitions of $setx$ and $sety$ by peeling back the extra method layer:
$$f(o) = \textbf{let}\ r \oplus \{setx = \_\} = unseal\ o\ \textbf{in}\ seal\ r \qquad (4)$$
$$g(o) = \textbf{let}\ r \oplus \{sety = \_\} = unseal\ o\ \textbf{in}\ seal\ r \qquad (5)$$

$\tau_2$ and instances $o_1$ and $o_2$.
An equalizer of two arrows is constructed using the following program:
$$equalize(c, a_1, a_2) = (c', a) \quad \textbf{where}\ c'\,v = \textbf{let}\ o = c\,v\ \textbf{in if}\ a_1\,o = a_2\,o\ \textbf{then}\ o\ \textbf{else}\ fail$$
The definition of $equalize$ constructs a new class $c'$ whose instances are just those instances of $c$ which produce the same object under morphisms $a_1$ and $a_2$.
Now, consider two sets of message histories for the same object, $M_1$ and $M_2$, such that $M_1 \subseteq M_2$, i.e. there is an inclusion function $f : M_1 \hookrightarrow M_2$. Given two sets of object calculations $O(M_1)$ and $O(M_2)$, by definition all the calculations in $O(M_1)$ are contained in $O(M_2)$ so $O(f) : O(M_1) \hookrightarrow O(M_2)$.
Given objects $O_1$ and $O_2$, the product is an object $O_1 \times O_2$ together with two object morphisms $\pi_1 : O_1 \times O_2 \to O_1$ and $\pi_2 : O_1 \times O_2 \to O_2$ such that for any object $O$ with morphisms $f : O \to O_1$ and $g : O \to O_2$ there is a unique arrow $u : O \to O_1 \times O_2$ such that $\pi_1 \circ u = f$ and $\pi_2 \circ u = g$.
During development, a system description is refined by applying modifications. Assuming that the process does not go wrong and therefore no backtracking is required, there are two major types of refinement modification which can be applied to a system $O_1$ to produce $O_2$. Firstly, extra properties may be added to a system, for example by extending the state components in objects. In this case, each of the calculations which are contained in $O_2$ can be transformed into a calculation in $O_1$ by forgetting the extra state, therefore there is an object morphism $\phi : O_2 \to O_1$. Secondly, a system may be made more deterministic by removing unwanted calculations, in which case all of the calculations which are present in $O_2$ will also be present in $O_1$ and therefore there exists an object morphism $\phi : O_2 \to O_1$.
Each instance of $Point$ has a different tag. Note that $Point$ is underspecified since the methods use $sel$ to select arbitrary integer co-ordinates.