Strategies for the Curation of CAD Engineering Models

Product Lifecycle Management (PLM) has become increasingly important in the engineering community over the last decade or so, due to the globalisation of markets and the rising popularity of products provided as services. It demands the efficient capture, representation, organisation, retrieval and reuse of product data over its entire life. Simultaneously, there is now a much greater reliance on CAD models for communicating designs to manufacturers, builders, maintenance crews and regulators, and for definitively expressing designs. Creating the engineering record digitally, however, presents problems not only for its long-term maintenance and accessibility - due in part to the rapid obsolescence of the hardware, software and file formats involved - but also for recording the evolution of designs, artefacts and products. We examine the curation and preservation requirements in PLM and suggest ways of alleviating the problems of sustaining CAD engineering models through the use of lightweight formats, layered annotation and the collection of Representation Information as defined in the Open Archival Information System (OAIS) Reference Model. We describe two tools which have been specifically developed to aid in the curation of CAD engineering models in the context of PLM: Lightweight Models with Multilayered Annotation (LiMMA) and a Registry/Repository of Representation Information for Engineering (RRoRIfE).


Introduction
Within the engineering community, Product Lifecycle Management (PLM) - the management of product data across the enterprise - has become increasingly important over the last decade or so (Stark, 2007). There are several reasons for this development. The first is an increase in the globalisation of markets, resulting in collaborative practices in which product development, manufacture and maintenance occur in a geographically distributed and networked environment, with the result that much of the data relating to a particular product or artefact is dispersed over a number of organisations and locations. Secondly, there is an emerging economic and business paradigm shift in which companies that design and build products are increasingly being required to enter into contracts to provide through-life support - that is, products are no longer being purchased as artefacts, but rather as services. Within the aerospace industry, for example, the concept of "power by the hour" has recently been introduced. For products such as schools, hospitals, cruise ships, aircraft and rolling stock for railways, this could mean a commitment to providing support for as long as the product is in service, extending to 30-50 years or in some cases even longer. Consequently, PLM has gained prominence in the engineering, manufacturing, contracting and service sectors amongst others; it requires the efficient capture, representation, organisation, retrieval and reuse of product data over its entire life.
At the same time, there is a much greater reliance on Computer Aided Design (CAD) models, which have now supplanted paper-based technical drawings and documentation as the main carriers of definitive product data. Within the last ten years or so, the engineering industry has gradually converted to using CAD models directly for communicating designs to manufacturers, builders, maintenance crews and regulators. This switch to creating the engineering record digitally, however, presents problems not only for its long-term maintenance and accessibility - due in part to the rapid obsolescence of the hardware, software and file formats involved - but also for recording the evolution of designs, artefacts and products.
The fragility of digital information has been widely recognised and extensively documented (Digital Curation Centre [DCC], 2007). Digital curation is a multi-faceted and complex process involving social, political, organisational and financial as well as technical issues. In order to clarify these numerous aspects and the relationships between them, the DCC has developed a Curation Lifecycle Model (DCC, 2008). The Model provides a graphical high-level overview of the stages required for the successful curation and preservation of data from initial conceptualisation to disposal. It can be used to plan activities within an organisation and enables granular functionality to be mapped against the Model and against particular information workflows. The DCC Curation Lifecycle Model applies to and defines both digital objects and databases, and splits curation processes into those that are applicable across the full lifecycle, those that are undertaken sequentially and those that are undertaken only occasionally.
However, in an engineering context, the curation and preservation of digital information in PLM presents its own set of challenges. For example, the types of information that need to be dealt with are diverse and particularly complex, including product geometry, finite element analysis models, manufacturing process models, etc. Engineering organisations need to communicate this information to a wide range of different stakeholders, each with different information needs and access rights. The purposes to which the information may be put are also varied, from manufacturing through redesigning and upgrading to incident investigation and marketing.
We begin by discussing the curatorial issues of CAD engineering models and the additional complications caused by PLM. We then go on to suggest ways of alleviating the problems associated with the sustained representation of CAD engineering models through the use of lightweight formats, layered annotation and the collection of Representation Information (RI) as defined in the Open Archival Information System (OAIS) Reference Model (Consultative Committee for Space Data Systems [CCSDS], 2003). We describe two tools which have been specifically developed to aid in the curation of CAD engineering models within PLM: Lightweight Models with Multilayered Annotation (LiMMA) and a Registry/Repository of Representation Information for Engineering (RRoRIfE).

Product Lifecycle Management
Projects in industries such as shipbuilding, aerospace and civil engineering all need to track product development and modifications to the original design over time. We can distinguish a number of major phases that are involved in PLM: Conceptualisation (innovation, requirements); Design Organisation (people, infrastructure, knowledge); Design (product, process); Evaluation (analysis, simulation, performance, quality); Manufacture and Delivery (production, supply, delivery); Sales and Distribution (advertising, marketing); Service and Support (maintenance, upgrades, warranties); Decommissioning (retirement, recycle, disposal).
The Knowledge and Information Management through Life Project (KIM) is currently investigating the implications of PLM and the paradigm shift to a product-service approach (Ball, Patel, McMahon, Culley & Green, 2006). A scenario has been devised by the project to illustrate ideal information flows in a product's lifecycle; Figure 1 shows a summary. A major challenge is that this data needs to be shared and exchanged between multiple organisations involved in the lifecycle of the product.
It is important to actively manage all the information and data relating to a product for many reasons including: legal issues (e.g. accident investigation; failure analysis; customer delivery disputes; mergers and acquisitions; patent infringements); operational support issues (e.g. maintenance; replenishment of spare parts; recycling; disposal); and product development management issues (e.g. tracing design rationale; design reuse; customisation and upgrade; reverse engineering; testing and validation). The diversity of the information to be managed is immense and includes: design rationale; minutes from meetings and design reviews; electronic mail; CAD models; engineering drawings; test data (photographs, images, load diagrams, finite element analyses); manufacturing process plans; assembly plans; inspection, maintenance and service data; design, production and manufacturing logs; contracts; standards; procedures and processes; and many more varieties of information. Such information ranges from highly structured data through unstructured textual documents to the tacit knowledge held by employees.
The active management of all product-related data is therefore vital to PLM as this data is created, added to, modified and extracted over the course of the lifecycle of a product. To facilitate the development of new generations of a product in the face of greater awareness of environmental impact and efficiency, it is necessary to cater for long-term retention and preservation so that older designs can be reused and adapted or customised. In addition, it is apparent that this data needs to be trustworthy, reliable and accurate for use in downstream processes. Although various PLM systems are under development (e.g. Arena, EDS, Dassault, MatrixOne, PTC), at present they do not directly address curation and preservation issues, focusing instead on product information exchange. Furthermore, as yet no single PLM system addresses all the flows of product information that are required throughout the useful life of a product.

Issues with Curating CAD Model Data
Whilst acknowledging the enormity of the task of developing full PLM systems that meet all the requirements alluded to in the previous section, we have been focusing largely on one particular type of data. CAD models have traditionally been used in the design, evaluation and manufacturing phases of PLM; up until the turn of the millennium, engineering software was used to support a paper-based workflow - CAD packages were used to create virtual models of designs, from which drawings and other design documentation could be produced. The manufacture or construction process was based on the resulting documentation. However, current digital environments necessitate an electronic flow of information between heterogeneous systems for Computer Aided Design (CAD), Computer Aided Engineering (CAE) and Computer Aided Manufacture (CAM) as well as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) and Supply Chain Management (SCM). As a result, there is an increasing reliance on CAD models, which are now being used as the method for recording definitive product data.
Two main forms of underlying CAD model representation are in use within CAD software systems: procedural definitions that store the construction history of the model (e.g. Constructive Solid Geometry (CSG), which constructs a model as a combination of simple solid primitives, such as cuboids, cylinders, spheres and cones) and explicit representations that store the underlying mathematical forms (e.g. Boundary representation (B-rep), which represents shapes by their external boundaries: structured collections of faces, edges and vertices) (Requicha & Rossignac, 1992). B-rep is more flexible and has a much richer operation set, and has therefore been widely adopted in current commercial CAD systems. Further techniques such as freeform surface modelling and feature-based modelling have also been extensively used. Freeform surface modelling represents component parts with complex surface curvatures using functions or approximations to represent those surfaces, such as Non-uniform Rational B-spline (NURBS) and Bézier surfaces. Feature-based or parametric-based modelling aims to encapsulate the engineering significance of portions of the product geometry and, as such, is applicable in product design, product definition and reasoning about the product in a variety of applications such as manufacturing planning (Shah & Mäntylä, 1995). Currently, most CAD systems implement a hybrid modelling strategy to try to combine the strengths of the various approaches.
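To make the procedural idea concrete, the following Python fragment is a minimal sketch of a CSG tree: the model is stored as a combination of primitives, and a point-membership query is answered by walking that construction history. The class names (Sphere, Box, Union, Difference) and the example part are illustrative assumptions and do not correspond to any particular CAD kernel; a B-rep would instead store the resulting faces, edges and vertices explicitly.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple

Point = Tuple[float, float, float]


class Solid(Protocol):
    """Anything that can say whether a point lies inside it."""

    def contains(self, p: Point) -> bool: ...


@dataclass(frozen=True)
class Sphere:
    centre: Point
    radius: float

    def contains(self, p: Point) -> bool:
        # Compare squared distance with squared radius.
        return sum((a - b) ** 2 for a, b in zip(p, self.centre)) <= self.radius ** 2


@dataclass(frozen=True)
class Box:
    lo: Point  # corner with the smallest coordinates
    hi: Point  # corner with the largest coordinates

    def contains(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


@dataclass(frozen=True)
class Union:
    a: Solid
    b: Solid

    def contains(self, p: Point) -> bool:
        return self.a.contains(p) or self.b.contains(p)


@dataclass(frozen=True)
class Difference:
    a: Solid
    b: Solid

    def contains(self, p: Point) -> bool:
        # Material of `a` with the volume of `b` removed.
        return self.a.contains(p) and not self.b.contains(p)


# Hypothetical part: a block with a spherical pocket cut into its top face.
part = Difference(Box((0, 0, 0), (2, 2, 2)), Sphere((1, 1, 2), 0.5))
```

Because the tree records how the shape was built, changing the pocket radius only requires editing one primitive, which is the reuse advantage usually attributed to construction-history models.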
Whilst CAD models are an invaluable aid to designers during product development, for a complex, long-lived product it is highly likely that the product will outlive the CAD/CAE computer system that was originally used to design, produce and maintain it. In fact, in some cases the software will be replaced several times over during the lifecycle of the product - the problem is not only that the software will become obsolete within a few years, but that compatibility with newer generations tends to be very unreliable.
The CAD software industry is competitive and characterised by an array of commercial CAD systems (e.g. AutoCAD, CATIA, Pro/ENGINEER, SolidWorks), each with its own proprietary, closed file format which is often subject to frequent change (Wikipedia, 2008). As a consequence, data created using a particular application is in danger of becoming inaccessible once software is retired or replaced as part of ongoing modernisation. Not only is interoperability between such systems virtually non-existent, but it is not economically viable for an organisation to install and run multiple CAD systems in order to view or manipulate proprietary product representations. Furthermore, in such a complex and dynamic environment, it becomes extremely difficult to retrieve and trace provenance information to check the veracity, reliability and quality of data.
CAD applications typically require considerable processing and therefore make use of hardware features to enhance performance, accentuating the interdependencies between file formats, software applications and computer hardware. An additional complication is that current CAD formats native to a particular CAD application are resource-heavy: the file sizes tend to be very large, causing problems when storing product models and transferring them between organisations and users. Typically, a model of a simple component can easily be more than a megabyte in size, resulting in gigabytes of data for complex products such as aircraft or cars.
There are several strategies to be considered for preserving product model data, including reducing a 3D model to 2D and storing it as a hard copy either on paper or on other long-lived media. This option, however, entails considerable information loss as well as constraining reuse of the data. Emulating old software and hardware (virtual machines) requires a heavy investment in IT support, and may also be difficult to incorporate into more modern and evolving workflows and systems. Constant migration to successive proprietary formats exposes the data to a continuous risk of data loss and subtle design corruption; additionally, the cost of checking and validating models after migration can be prohibitive. Even the use of open and neutral standards, such as the Initial Graphics Exchange Specification (IGES) (US Product Data Association, 1996) and the Standard for the Exchange of Product Model Data (STEP) (ISO 10303, n.d.), is not without issues; the rigours and long timescales of developing a comprehensive exchange standard for CAD models mean that it is difficult to keep up to date with the latest capabilities of commercial CAD tools. Furthermore, the level of support for such standards can vary considerably between tools.
The engineering community is highly attuned to these issues and the extension of the STEP standard is well underway to provide support for the various stages of PLM (ISO 10303-239, 2005). STEP was originally conceived to handle final explicit geometry-based models, but is now being extended to cater for procedural CAD model representations as well. Final models and construction history models each have their advantages: final models are more compact and considered to be geometrically more stable for long-term preservation than a native CAD model, whereas models that record the construction history are easier to reuse. STEP will therefore recommend keeping models in both forms. STEP is also beginning to address issues relating to the reliability and quality of data (ISO 10303-59, n.d.).
Other efforts to capture data over and above the product geometry include the NIST Core Product Model (CPM) and Open Assembly Model (OAM) (Rachuri, 2007). The CPM is based on the form, function and behaviour of a product; it is able to capture and share the full engineering context in product development. The OAM tackles the problem of attaching assembly- and system-level tolerance information to archival product models through the use of annotations.

Additional Complications Due to PLM
Traditional CAD models are centred on geometric and topologic depictions of the product and lack the ability to model high-level design and engineering context and semantics (Ding, Matthews, McMahon & Mullineux, 2007). We have seen in previous sections that product information continues to develop during the whole product lifecycle and such information needs to be reflected in the definitive product model.
Within PLM there is a requirement to support global and distributed collaboration. However, product data tends to be amongst the most valuable intellectual property (IP) of a company, which will therefore only be prepared to share selective information. For example, a designer might mark up a CAD model with the reasoning behind a particular design choice. If the model subsequently needs to be passed to a partner company for manufacture, this information may need to be removed to protect the IP of the original company. However, if the exchange is with another engineer from the same company, perhaps tasked with producing a variant of the product, then the full detail of the markup will need to be exposed. In addition, different stages of PLM require different subsets of product data, and different interpretations of the same data: these are known as different viewpoints on the data. For example, machining features are useful for manufacturing engineers, but not for marketing staff, for whom a visualisation of a product, unencumbered with production and manufacturing information, is of far greater use. Here the notion of significant properties comes to the fore; significant properties are those aspects of the digital object which must be preserved over time in order for it to remain accessible and meaningful (Wilson, 2007). In PLM the significant properties of a product vary depending on the viewpoint and the stage in the lifecycle.
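The two requirements described above, viewpoint-specific subsets and the protection of IP, can be illustrated with a small filter over annotations. This is a sketch only: the Annotation fields, the viewpoint names and the internal_only flag are assumptions introduced for illustration and are not taken from any existing tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Annotation:
    entity_id: str               # hypothetical reference to a face or feature in the model
    text: str
    viewpoints: FrozenSet[str]   # lifecycle viewpoints for which the note is relevant
    internal_only: bool = False  # True when the note carries protected IP


def visible_annotations(
    annotations: List[Annotation], viewpoint: str, same_company: bool
) -> List[Annotation]:
    """Keep notes relevant to the viewpoint, hiding protected ones from outsiders."""
    return [
        a
        for a in annotations
        if viewpoint in a.viewpoints and (same_company or not a.internal_only)
    ]


notes = [
    Annotation(
        "face-12",
        "Fillet radius chosen to reduce stress concentration.",
        frozenset({"design", "manufacturing"}),
        internal_only=True,  # design rationale: not to be shared with partners
    ),
    Annotation(
        "hole-3",
        "Drill and ream to the stated tolerance.",
        frozenset({"design", "manufacturing"}),
    ),
]

# A partner manufacturer sees only the machining note; a colleague sees both.
for_partner = visible_annotations(notes, "manufacturing", same_company=False)
for_colleague = visible_annotations(notes, "design", same_company=True)
```

A marketing viewpoint would receive neither note here, which corresponds to a visualisation unencumbered with production and manufacturing information.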
Finally, as explained in the previous section, issues relating to technological obsolescence are exacerbated in PLM, mainly due to the state of the CAD software industry and the move to the product-service paradigm - complex dependencies and relationships exist between file formats, software and hardware revisions.

Proposed Strategies
Although CAD models are taking on more and more importance in engineering practice, as we have seen they suffer several specific limitations when one attempts to use them to support the information flows in PLM. Closed proprietary file formats which are reliant on commercial software applications are unsuitable for data exchange as well as long-term preservation. CAD models tend to be very large and are designed primarily to store geometric and topological information, so that it is difficult to record design rationale, in-service data and other information useful in an iterative design process. Moreover, full CAD models do not cater for multiple viewpoints or the protection of IP. It is apparent that for CAD models to support all the processes in PLM, they need to be extended or augmented in some way.
Below, we propose several strategies for improving the robustness of CAD engineering product data in PLM. A framework of Lightweight Models with Multilayered Annotations (LiMMA) combines open, lightweight CAD formats with annotations to augment geometric data with supplementary information generated throughout the life of a product. In addition, we propose the use of a preservation planning tool - a Registry/Repository of Representation Information for Engineering (RRoRIfE), which supports decisions relating to the migration of file formats for continuous and long-term access and reuse. Both LiMMA and RRoRIfE represent emerging curation tools which have been developed specifically to cater for CAD engineering models and PLM.

Lightweight Formats (LWF)
Lightweight formats are product models that are missing some of the richness of a full CAD model; they are analogous to the notion of desiccated formats (Kunze, 2005). The major characteristics of lightweight representations are: reduced file sizes via compression techniques; platform and application independence; progressive streaming; and multiple levels of detail (LOD) for rapid display. It is easier and less expensive to write software tools for such formats and therefore one can expect much wider and longer-lasting support than for proprietary formats. Indeed, the benefits of such formats are not just for preservation, but also for immediate access, collaboration and dissemination. There are several different LWFs in current use, each with properties and characteristics better suited to some purposes than others. Table 1 provides a summary of the characteristics of a selected number of LWFs with a particular regard to their capabilities in respect of: fidelity to the full model, metadata storage, data security, file size reduction, support for the format by software and openness of the file format specification. At present, efforts focus on two aspects: lightweight 3D model visualisation for distributed collaborative work, for example U3D (ECMA-363, 2007)
A CAD markup environment which caters for marking up models both internally and externally to a CAD application (UGS NX) has been implemented (see Figure 2). In order to facilitate application independence, both the internal and external markup environments store annotated information in an XML document with references back into the CAD model. An internal markup environment has been developed for UGS NX, as shown in Figure 3, as well as an external markup environment for the 3D PDF viewers Adobe Acrobat and Adobe Reader. Based on the markup interface in Adobe Acrobat, the annotated information is exported to an external XML document, which can then be read and linked back to a specific entity in the original CAD model through a transfer interface executed outside the CAD environment based on the NX3 Open C API.
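The idea of keeping annotations in a separate XML document that refers back to entities in the CAD model can be sketched as follows. The element and attribute names (annotations, annotation, entityRef, layer) and the entity identifiers are hypothetical and chosen only for illustration; the actual LiMMA schema is not reproduced here. The fragment uses Python's standard xml.etree.ElementTree module to group notes by the model entity they refer to.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical annotation document stored alongside, not inside, the CAD model.
ANNOTATION_XML = """\
<annotations model="bracket.prt">
  <annotation id="a1" entityRef="FACE_17" layer="design-rationale">
    <author>designer</author>
    <text>Wall thickened after fatigue analysis.</text>
  </annotation>
  <annotation id="a2" entityRef="EDGE_4" layer="manufacturing">
    <author>planner</author>
    <text>Deburr after milling.</text>
  </annotation>
</annotations>
"""


def annotations_by_entity(xml_text: str) -> Dict[str, List[dict]]:
    """Index annotations by the CAD entity each one points back to."""
    root = ET.fromstring(xml_text)
    index: Dict[str, List[dict]] = {}
    for node in root.findall("annotation"):
        record = {
            "id": node.get("id"),
            "layer": node.get("layer"),
            "text": node.findtext("text", default="").strip(),
        }
        index.setdefault(node.get("entityRef"), []).append(record)
    return index


index = annotations_by_entity(ANNOTATION_XML)
```

Because the notes live outside the model file, the same document could in principle be read by an internal or an external markup environment, provided both resolve the entity references in the same way.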

Registry/Repository of Representation Information for Engineering (RRoRIfE)
While the use of LiMMA addresses many of the challenges of PLM, there remains a problem in selecting a format which is the most appropriate not only for a particular use and view of the product, but also for long-term retention.
The Registry/Repository of Representation Information for Engineering (RRoRIfE) is a decision-making tool whose functionality is based on a narrow subset of RI concerned with the capabilities of file formats and software (see Figure 4). In general, RI is any information that is required to render, process, visualise and interpret data, and includes: file formats, software, algorithms and standards as well as semantic information (CCSDS, 2003). The premise behind the tool is that an intellectual object can only be faithfully reproduced in a new format or environment if the latter supports properties or characteristics equivalent to those used by the intellectual object in its native format or environment. Furthermore, different tools may be better or worse at re-expressing the constructs of the old format or environment in the constructs of the new.
Underlying RRoRIfE is an ontology of properties, characteristics and constructs of engineering information; as RRoRIfE is presently focused on CAD models, the current ontology includes various two-dimensional and three-dimensional geometric entities, as well as different compression techniques and forms of metadata. This ontology was derived from a superset of the properties supported by a sample of CAD formats. Two XML schemas have been written using this ontology. The first relates to file formats and describes whether or not the format supports a particular property. An intermediate value of 'partial' support is allowed, to indicate that support is limited in some way; for example, NURBS surfaces may be allowed, but only with 256 or fewer control points. In cases of partial support, explanatory text must be provided.
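The rule that partial support must carry explanatory text can be expressed as a simple validation step. The sketch below is an assumption-laden illustration rather than the RRoRIfE schema: the level names 'full' and 'none', the property identifiers and the format names FormatA and FormatB are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Support(Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


@dataclass(frozen=True)
class PropertySupport:
    level: Support
    note: str = ""

    def __post_init__(self) -> None:
        # Partial support is only meaningful with an explanation of the limits.
        if self.level is Support.PARTIAL and not self.note.strip():
            raise ValueError("partial support requires explanatory text")


Registry = Dict[str, Dict[str, PropertySupport]]


def formats_supporting(
    registry: Registry, significant: Iterable[str], allow_partial: bool = True
) -> List[str]:
    """List formats that support every significant property."""
    accepted = {Support.FULL, Support.PARTIAL} if allow_partial else {Support.FULL}
    wanted = list(significant)
    missing = PropertySupport(Support.NONE)  # unlisted properties count as unsupported
    return sorted(
        name
        for name, props in registry.items()
        if all(props.get(p, missing).level in accepted for p in wanted)
    )


registry: Registry = {
    "FormatA": {
        "nurbs_surface": PropertySupport(Support.PARTIAL, "at most 256 control points"),
        "metadata": PropertySupport(Support.FULL),
    },
    "FormatB": {
        "nurbs_surface": PropertySupport(Support.NONE),
        "metadata": PropertySupport(Support.FULL),
    },
}
```

Whether a partially supporting format is acceptable depends on the model in hand, which is why the explanatory note is kept with the record and the allow_partial switch is left to the caller.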
The second XML schema relates to processing software. For each format conversion the software is able to perform, it is recorded how well the conversion preserves each property. Four levels of preservation are allowed: 'none' indicates that the property has never knowingly survived the conversion intact (perhaps because the destination format does not support the property); 'good' indicates that the conversion has so far preserved examples of the property sufficiently well that it would be possible to reconstruct the original expression of the property from the new expression; 'poor' is used when tests have found it at least as likely for the property to be corrupted or lost as it is to survive; while 'fair' is used otherwise, alongside an explanatory note. Where preservation is less than 'good', it is possible to record whether the property survives in a degraded form, and if so, whether this degradation always happens in a fixed way, a configurable way or an unpredictable way. For example, when moving from a format that supports NURBS to one that only supports tessellating triangles, there may be a fixed algorithm for approximating surfaces, or one may be able to specify how detailed the approximation is. RRoRIfE reads files in these two XML formats, and uses them to answer simple preservation planning queries. As well as being able to look up the characteristics of formats and conversions individually, it also allows one to select certain characteristics as significant and discover which formats support them. It can generate possible migration pathways between two formats, and given a starting format and set of significant characteristics, it can generate a list of suitable destination formats and conversion pathways. RRoRIfE is written in Java 6.0; Figure 5
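A pathway query of the kind described above can be sketched as a search over recorded conversions. The fragment below is not RRoRIfE's algorithm, which is not given here: it assumes that a pathway is acceptable only if every hop preserves every significant property at or above a chosen level, and the format names, tool names and preservation ratings are invented for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

# Ordering of the four preservation levels, weakest first.
LEVELS = {"none": 0, "poor": 1, "fair": 2, "good": 3}


class Conversion(NamedTuple):
    source: str
    target: str
    tool: str                     # hypothetical software performing the conversion
    preservation: Dict[str, str]  # property name -> preservation level


def acceptable(conv: Conversion, significant: Iterable[str], minimum: str) -> bool:
    """True if every significant property survives at `minimum` or better."""
    return all(
        LEVELS[conv.preservation.get(p, "none")] >= LEVELS[minimum]
        for p in significant
    )


def migration_pathways(
    conversions: List[Conversion],
    start: str,
    significant: Iterable[str],
    minimum: str = "good",
) -> List[List[Conversion]]:
    """Return every loop-free chain of acceptable conversions leaving `start`."""
    wanted = list(significant)
    usable = [c for c in conversions if acceptable(c, wanted, minimum)]
    paths: List[List[Conversion]] = []

    def extend(current: str, visited: Set[str], path: List[Conversion]) -> None:
        for conv in usable:
            if conv.source == current and conv.target not in visited:
                new_path = path + [conv]
                paths.append(new_path)
                extend(conv.target, visited | {conv.target}, new_path)

    extend(start, {start}, [])
    return paths


conversions = [
    Conversion("FormatA", "FormatB", "ToolX", {"nurbs_surface": "good", "metadata": "good"}),
    Conversion("FormatB", "FormatC", "ToolY", {"nurbs_surface": "fair", "metadata": "good"}),
    Conversion("FormatA", "FormatC", "ToolZ", {"nurbs_surface": "none", "metadata": "good"}),
]

strict = migration_pathways(conversions, "FormatA", ["nurbs_surface"])
relaxed = migration_pathways(conversions, "FormatA", ["nurbs_surface"], minimum="fair")
```

Judging a pathway by its weakest hop is a deliberately conservative choice for this sketch; other policies, such as weighting hops or taking account of degradation that is fixed or configurable, would be equally plausible.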

Conclusions
We have examined the digital curation challenges of CAD engineering models posed by PLM and suggested several techniques to improve the robustness of product data to serve the needs of both PLM and long-term accessibility. Full CAD models tend to have closed, proprietary formats and are difficult to exchange. Lightweight formats provide a more promising approach in that: they have open specifications; they are simpler and have smaller file sizes; they can cater to the need for multiple viewpoints; and they can restrict access for security purposes. In addition, multilayered annotation allows models to be extended with much valuable data including that required for preservation. Finally, the accumulation of RI facilitates informed decision making with respect to which LWFs and conversion software to use, both for data exchange within PLM and for long-term retention. Both LiMMA and RRoRIfE add to the suite of tools already available (e.g. JHOVE, DROID, PRONOM, iRODS, TRAC, DRAMBORA) with a specific focus on CAD models. More broadly, there is a need for best practice guidelines and cost-benefit models to aid in choosing appropriate curation strategies, since the business of deciding on a suitable path is non-trivial and contingent on many factors; for example, products have different lifetimes, from a few months for a mobile phone to decades for an aircraft.

Figure 1. An example of information flows in the lifecycle of a product.

Figure 3. Interface of the internal markup environment.

The International Journal of Digital Curation Issue 1, Volume 4 |2009

Table 1. A summary of the characteristics of selected Lightweight Formats.