Data Model Governance

From PHENOM Portal Knowledgebase

Introduction

PHENOM™ Portal is a collaborative online ecosystem for data model management. This Knowledgebase provides documentation that describes some of the key concepts foundational to this approach, their importance, and how PHENOM uses them to serve you.

In addition, this document will explain the maturity of each of the features and when you, the user, can expect to see them integrated into the portal.

Furthermore, this will capture some guidance for how the data model might be managed by an organization.

Why?

PHENOM aspires to deliver high-quality, always valid™ data models. That is, if you export a model from PHENOM, you can be certain that it will conform to the corresponding standard. However, conformance is only a measure of adherence to the standard, not an actual measure of model quality. As a result, it is possible to have a perfectly conformant model that is entirely useless.

By having a clear understanding of the processes and capabilities, you will be able to leverage all the capabilities of PHENOM to deliver the best product and process for your organization.

About Proprietary Data

Classes of Proprietary Data

There are two main classes of proprietary data associated with PHENOM:

  1. that which is created by Skayl LLC (the creator of PHENOM)
  2. that which is created by users

Skayl Proprietary

Skayl claims the implementation of PHENOM as its own, proprietary intellectual property. However, Skayl makes no claim on the data created with PHENOM. This is very much like the application Microsoft Word. Microsoft does not tell you how Word was created, but they do not claim any ownership of the documents you create with their tool. PHENOM acts in a similar manner.

PHENOM allows users access to all information stored in the data models. Beyond the implementation of PHENOM itself, the only thing that is maintained as proprietary data is the traceability of all model content. This information is an ongoing log of all operations on the data models and it is a significant amount of metadata (data about data) used to keep models glued together over time. Skayl has developed special algorithms that operate on this data for large scale use of data models as they change over time.

Domain Specific Data Models (DSDMs)

Although Skayl holds the copyright to the Air System DSDM (also referred to as: Skayl DSDM, uasModel, and 4586 DSDM), this model is distributed for broad use under Government Purpose Rights. This is intended to protect the government (and taxpayers) from being required to purchase the same thing many times.

PHENOM exports the data model content in standard, open formats. These formats include The Open Group's Future Airborne Capability Environment (FACE) XML Metadata Interchange (XMI) format (versions 2.1 and 3.0) and a UML XMI v1.2 file.

PHENOM has the ability to capture additional information. At this time, this information is limited to:

  • Descriptions on View Attributes
  • On-the-wire representation of messages

Neither type of information is supported in the standard formats, and the FACE XMI format does not allow extensions to the data model because extensions could limit interoperability. For these cases, Skayl makes it possible for users to export this information in virtually any format they wish through a user-defined template.

User Proprietary Data

PHENOM has many different types of users. While some users could be maintaining the DSDM for the benefit of all users, others may privately be developing proprietary or competitive products. By default, all user-generated content is private until it is shared with its parent. Permission and ownership of the data are transferred when it is shared with the parent. As long as data remains within a specific project, this should not cause any concern to an organization. However, as soon as the Project Admin shares the content with the DSDM, the ownership of the information is transferred to Skayl LLC so that it can be included in the baseline DSDM with the appropriate Government Purpose Rights.

Does Proprietary Data Hurt Data Models?

Not really.

Data Models are intended to be used for documentation of interfaces. Documentation is meant to be shared. As long as the data model is shared with the people who need to see it, then it serves its purpose.

Contributing modeled content back to the larger community can save a lot of redundant work and adds efficiency to the process. The only real challenge arises when duplicate semantics are created: while some information is held in a private branch, equivalent content may be created publicly. Although this may have long-term effects on interoperability, these structures are traceable and there is a path to merge them with the shared data model should the need ever arise.

Technical Concepts

Version Control

Version Control is the process of tracking the evolution and history of a product. In its simplest form, version control ensures that the distinct instances of a product are clearly and unambiguously identified. This allows specific versions of the product to be reliably and accurately referenced.

Version Control can be handled in many ways as long as the primary goal (faithful reproduction of the model given a specific label identifier) is accomplished.

Due to the way PHENOM stores its information, version control is greatly simplified.

How PHENOM Supports Version Control

Tag-based Version Control

With tag-based version control, users are able to mark the current state of their model with a label indicating a desired version of the model. This will allow them to pull that version of the model in the future.

Post-Hoc Version Control – Available 2Q2019

Post-hoc version control will allow users to create a label for their model at any point in the past. While it may be beneficial to create labels as you model, this feature will allow retroactive creation of these labels.
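Both kinds of label can be pictured with a small sketch. It assumes the model is kept as an append-only log of operations, which the traceability log described earlier suggests but does not confirm; the `ModelLog` class and its methods are hypothetical and are not PHENOM's API. Under that assumption a label is just a named position in the log, so labelling a past state is no harder than labelling the current one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelLog:
    """Hypothetical append-only log of model operations (not PHENOM's API)."""
    operations: list = field(default_factory=list)  # (element, value) pairs
    labels: dict = field(default_factory=dict)      # label -> log position

    def apply(self, element: str, value) -> None:
        """Record an operation; a value of None means 'delete the element'."""
        self.operations.append((element, value))

    def tag(self, label: str, position: Optional[int] = None) -> None:
        """Tag-based: label the current state. Post-hoc: label a past position."""
        if position is None:
            position = len(self.operations)
        if not 0 <= position <= len(self.operations):
            raise ValueError("position is outside the recorded history")
        self.labels[label] = position

    def checkout(self, label: str) -> dict:
        """Rebuild the model by replaying the log up to the labelled position."""
        state = {}
        for element, value in self.operations[: self.labels[label]]:
            if value is None:
                state.pop(element, None)
            else:
                state[element] = value
        return state


log = ModelLog()
log.apply("Aircraft.altitude", "meters")
log.apply("Aircraft.heading", "degrees")
log.apply("Aircraft.altitude", "feet")
log.tag("v2.0")                   # tag-based: label the current state
log.tag("v1.0", position=2)       # post-hoc: label a point in the past
print(log.checkout("v1.0"))
print(log.checkout("v2.0"))
```

Because nothing is ever overwritten in the log, a label created after the fact reproduces the same model as one created at the time.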

Change Management

Change management is related to the idea of version control, but it is more focused on keeping the product consistent and cohesive as multiple users simultaneously make changes. When individuals make changes in different, unrelated parts of the product, change management is very tractable. In this case, changes can exist in their respective areas of the product and they may never interact with each other.

In other cases, users may be changing related things (or even the same thing). In this case, it is not possible for the system to automatically determine which user change is the “correct” change. While modern software version control and change management systems have gotten very good at deconflicting these cases, many cases still require a human to manually merge the two changes.

Most modern systems are not entirely project aware. When two components change, there may be a downstream effect that breaks the entire system. Even if the changes appeared unrelated, their combined impact caused the merged product to break. There are many tools that help mitigate this problem, but this is another challenge related to change management.

Trusted Peer Environment

In an open development environment, the workflow may be relatively simple. When the individual contributors can be trusted (in both their skill and their character), change management can be much simpler. In this situation, contributors are viewed as peers. Each contribution is accepted as an incremental improvement to the core project and verification is a part of the test and release process.

There are many implementations of this system, but this type of approach is often found within a software development team that works together for a single organization. Not all teams are quite as cohesive, or as trusted.

Untrusted Peer Environment

In some projects, such as an open source project, there may be an “approver” who moderates the incorporation of new changes. In these types of projects, there are many dynamics that may not (or, ideally, do not) occur in a corporate environment. Many open source projects are volunteer driven. This means that all changes are made on a best effort basis. One developer may make a significant contribution, but what if that change negatively impacts another piece of the software? It may take weeks (or months) for that other area of the software to get fixed since the developers working on that portion of the software may not be available.

Developers may even have a disagreement about the best way to implement a feature. Unchecked, this could cause conflict within a team. When the process is not managed, it is also possible for bad actors to introduce defects into the code base.

In yet other cases, there may be a need for multiple approvers (often called a change control board (CCB)) to manage a set of changes. Like an approver, a CCB typically represents a trusted set of individuals to whom the authority to approve changes has been granted.

While all CCB members may share the same level of vote (i.e. one vote per person), they may not all represent the technical aspects of the product. It is possible that some members of the CCB are appointed to maintain the overarching vision of the product to ensure that changes do not cause unnecessary deviation.

In CCBs, it is important to have clear acceptance criteria, clear voting guidelines, and a clear understanding of the role of each member.

Change Management of Digital Formats

In the change management of digital products, a simplified description of the change management process is that every byte of data is compared with the corresponding byte in the previous version. The presence (or absence) of each byte is annotated in a format that, when applied as a transformation, converts the previous version to the latest version.
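The idea of a delta that converts the previous version into the latest one can be sketched with Python's standard `difflib` module. This is only an illustration of the principle; real version control tools use their own algorithms and storage formats, and the `make_delta` and `apply_delta` helpers are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Record the steps that turn `old` into `new`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Bytes present in both versions: refer back to the old data.
            delta.append(("copy", i1, i2))
        else:
            # replace / insert / delete: old[i1:i2] is dropped and
            # new[j1:j2] (possibly empty) is emitted in its place.
            delta.append(("emit", new[j1:j2]))
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Apply a delta to the previous version to obtain the latest version."""
    out = bytearray()
    for step in delta:
        if step[0] == "copy":
            out += old[step[1]:step[2]]
        else:
            out += step[1]
    return bytes(out)


previous = b"entity Aircraft { altitude; heading }"
latest = b"entity Aircraft { altitude; speed; heading }"
delta = make_delta(previous, latest)
assert apply_delta(previous, delta) == latest
```

Note that the delta depends entirely on byte positions, which is why this style of tracking is sensitive to the order in which content is stored.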

Changes can be difficult to detect. When data is added or removed, it is necessary for the change tracker to determine where the original content continues. In cases where there may be repeated patterns, the annotation may be unexpectedly complex.

There may also be times when multiple independent contributors change the same aspects of the product. How does the system determine whose changes are accepted? In most cases, these changes cannot be automatically resolved. The contributors are notified of such conflict when the products are merged together and their input is requested as needed. In these cases, the deconfliction process is manual and it is up to the contributors to decide which version of the product is correct. This often requires out-of-band communication to resolve the conflict.
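A minimal sketch of this situation is a three-way merge: two contributors start from a common base, and only edits to the same item that disagree are reported as conflicts. The function below is hypothetical and treats the product as a simple dictionary of named items, which is an assumption made purely for illustration.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two sets of edits made against a common base.

    Returns (merged, conflicts). Items changed by only one side merge
    automatically; items changed differently by both sides are listed as
    conflicts and keep their base value until a human resolves them.
    """
    missing = object()  # sentinel meaning "item not present"
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, missing)
        o = ours.get(key, missing)
        t = theirs.get(key, missing)
        if o == t:        # both sides agree, or neither touched it
            winner = o
        elif o == b:      # only "theirs" changed it
            winner = t
        elif t == b:      # only "ours" changed it
            winner = o
        else:             # both changed it, in different ways
            conflicts.append(key)
            winner = b
        if winner is not missing:
            merged[key] = winner
    return merged, conflicts


base = {"altitude": "meters", "heading": "degrees"}
ours = {"altitude": "feet", "heading": "degrees"}
theirs = {"altitude": "flight level", "heading": "radians"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # heading merges cleanly; altitude stays at the base value
print(conflicts)  # altitude needs a human decision
```

The merge can tell that a conflict exists, but not which side is right; that decision is the out-of-band conversation described above.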

Change Management of Data Models

Traditional change management tools are insufficient for some highly connected forms of data such as data models. As previously described, these tools focus on the textual differences between subsequent versions of the same data. This presumes that the order of all the data is a relevant metric for determining a difference. In the case of data models, it is entirely possible to have two completely equivalent models that are not stored in a byte-wise congruent format.

If order does not matter for data models, what determines equivalence? While the order of the data is not entirely relevant, the content of the data is and, just as importantly, so are the connections between the data. Thus, for data models, the relevant meaning (beyond the data itself) is captured in the structure of the content. As such, existing tools and processes (which focus on data – primarily the order of the data) are not effective for maintaining changes to data models.
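The difference between byte-wise and structural comparison can be shown with a small sketch. It assumes each node carries a stable identifier and that a model can be reduced to a set of nodes plus a set of labelled connections; the dictionary layout and the `canonical_form` helper are hypothetical, not a PHENOM or FACE format.

```python
import json


def canonical_form(model: dict):
    """Reduce a model to order-independent sets of nodes and connections."""
    nodes = frozenset((n["id"], n["kind"]) for n in model["nodes"])
    edges = frozenset(
        (e["from"], e["relation"], e["to"]) for e in model["edges"]
    )
    return nodes, edges


def equivalent(a: dict, b: dict) -> bool:
    """Two models are equivalent if their content and connections match."""
    return canonical_form(a) == canonical_form(b)


model_a = {
    "nodes": [{"id": "Aircraft", "kind": "entity"},
              {"id": "Altitude", "kind": "observable"}],
    "edges": [{"from": "Aircraft", "relation": "has", "to": "Altitude"}],
}
# Same content and connections, stored in a different order.
model_b = {
    "nodes": [{"id": "Altitude", "kind": "observable"},
              {"id": "Aircraft", "kind": "entity"}],
    "edges": [{"from": "Aircraft", "relation": "has", "to": "Altitude"}],
}

print(json.dumps(model_a) == json.dumps(model_b))  # textual comparison
print(equivalent(model_a, model_b))                # structural comparison
```

A text-based tool would report these two files as different even though they describe the same model. Without stable identifiers the comparison becomes a much harder graph-matching problem, which this sketch does not attempt.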

It is possible to apply existing tools to highly connected data models. The UCS Working Group successfully applied Subversion (SVN) version control software to manage their data model. While it was possible to leverage this software, it was accompanied by a 12-step check-in/check-out procedure that was required to maintain system consistency. This is not meant to be a condemnation of the UCS Working Group; it merely seeks to highlight that, while a tool may be used, it is not necessarily the best tool for the job. Furthermore, Subversion is still based on the ordering of content and primarily functions by tracking changes in highly ordered text files. While it was, in fact, able to help the system manage changes, it required the content in the text files to be maintained in a proper order.

Change Management of UCS/FACE Data Models

UCS and FACE data models are highly connected and highly structured. Their construction follows a set of rules which must be followed to yield a valid model. These rules specify the relationships within the models: the metamodel governs how the nodes themselves may relate, and the OCL (Object Constraint Language) constraints govern the data in the model.

These data models simultaneously exist in (at least) three different levels of abstraction. Each level of the model represents a different set of properties, all of which are, in some way, related. This means that when data changes at one level of the model, it may (or may not) affect data in other levels of the model.

When a highly connected model like this is changed, it is possible that other nodes are impacted. As with basic change management, it is possible for two contributors to modify the same (or related) parts of the model. In the case of a highly linked model, it is necessary to calculate the possible impact of one change elsewhere in the model. That is, since data is characterized by its relationships just as much as by its contents, it is also necessary to manage changes to the relationships.
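One way to picture such an impact calculation is a traversal of the dependency graph: starting from the changed node, collect everything that depends on it directly or indirectly. The sketch below assumes the relationships are available as a simple "depends on" mapping; the node names and level prefixes are illustrative only, and `impacted_by` is a hypothetical helper.

```python
from collections import deque


def impacted_by(changed: str, depends_on: dict) -> set:
    """Return every node that directly or indirectly depends on `changed`."""
    # Invert the edges so we can ask "who depends on this node?"
    dependents = {}
    for node, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(node)

    impacted, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in impacted:
                impacted.add(node)
                queue.append(node)
    return impacted


# Illustrative model spanning three levels of abstraction plus a view.
depends_on = {
    "logical.Altitude": ["conceptual.Altitude"],
    "platform.AltitudeFeet": ["logical.Altitude"],
    "view.NavStatus": ["platform.AltitudeFeet", "platform.HeadingDeg"],
    "logical.Heading": ["conceptual.Heading"],
    "platform.HeadingDeg": ["logical.Heading"],
}
print(sorted(impacted_by("conceptual.Altitude", depends_on)))
```

A change near the top of the hierarchy reaches more of the model than one near the bottom, which is the kind of information an approver needs before accepting a change.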

Change management systems tend to make no assumptions about the products they contain. For the most part, these products are just large collections of binary data. While this makes for an excellent, general purpose tool, imagine the kind of efficiencies that could be realized if the system were programmed with some knowledge of the type of data it contained.

Change Management & PHENOM

Taking the aforementioned concept of change management a step further, PHENOM applies the rules of construction (metamodel) and validity (OCL) to data models.

How PHENOM Supports Change Management

Branch-based Change Management

Each user gets their own model branch to which all of their changes are applied. Some users may have a branch they share with other users. In this case, users may choose their branch or another branch they have access to. If working in someone else’s branch, it is still necessary for the users to perform a little bit of manual deconfliction.

For the time being, this means that user changes are relegated to the branch. However, as a result of the way PHENOM stores these changes, this work will be able to be accessed as additional capabilities come online.

Basic Approver & Merge Support

This feature implements a statically configured approval process for a given customer. It allows a “change approver” to be assigned. If a change is approved, other branches in the project are notified and given the option to accept it.

Custom Workflows – Available 4Q2019

Organizations may have different ways they want the change management process to be implemented. Custom workflows will allow users to define a change management process specific to their organization.

Roles in a Change Management Process

The primary user roles are contributor, approver, and change control board. These are the crucial roles and it is important to understand what needs to go into selecting each role.

Contributor

A contributor is, in effect, a user of the system. These are the members of the team making technical changes and adding information/structure to the system. It is possible for a contributor to also be a member of the Change Control Board, but that user should abstain from voting on their own changes.

A contributor is allowed to read and write content to the model. The data modelers and engineering team are often contributors. Although this team needs a fundamental understanding of data modeling concepts and mechanics, it is not required for them to have a comprehensive understanding. There may be patterns the contributors are required to follow, but these are subject to approval when submitted to the rest of the model.

Approver

An approver is a party entrusted to evaluate a change proposal and either approve or disapprove the change. Should the change be approved, the approver may or may not be the person who implements it. Also, a disapproved change may not be entirely rejected. It is possible for the approver to return the change and request additional information.

It is possible that approvers have varying levels of knowledge. Approvers may be technically astute and know myriad details about the content. Approvers may also be experts in a subject matter and merely approve the correctness of the change. It is also possible for approvers to know more about the desired/allowed structure of the model and have little technical knowledge whatsoever.

A change approver is an important role. This is an individual who has been granted the administrative capability of approving user-submitted content for inclusion into the baseline data model. Anyone serving as an approver should have a firm grasp of the data architecture and any architectural patterns the team wishes to espouse. They are the ultimate authority for what content is shared within their project.

The approver is also responsible for accepting new data model content from its parent (the model from which it is derived). This operation requires the approver to have an idea of what impact these new changes will have. Fortunately, PHENOM will provide a full report of all impacts the approval operation will have on the data model, but it will be incumbent upon the approver to understand the impacts and their proposed resolutions.

Currently, while PHENOM may allow many approvers for a data model, approval only requires the authority of one approver. Although this is the default behavior, the customizable workflow feature (4Q2019) will allow the process to be changed.

An approver for a particular node also has the ability to push changes to the parent model. When this data is being shared within an organization (or a specific project), little additional consideration is needed. However, if the approver pushes a project's changes into the top-level model (DSDM), they are releasing these changes to the community, effectively assigning them the same data rights as the parent model.

Change Control Board (CCB)

A CCB is a committee of approvers who are all tasked with the responsibility of approving or disapproving a change. Typically, some sort of majority vote is required for a CCB to accept a change.
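Since the CCB is a process rather than a PHENOM feature, the following is only a sketch of how an organization might tally such a vote. The quorum rule, the treatment of ties as rejection, and the names `Vote` and `ccb_decision` are all assumptions made for illustration.

```python
from enum import Enum


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ABSTAIN = "abstain"


def ccb_decision(votes: dict, quorum: int) -> str:
    """Decide a change by simple majority of the non-abstaining votes."""
    cast = [v for v in votes.values() if v is not Vote.ABSTAIN]
    if len(cast) < quorum:
        return "deferred"  # too few members voted to make a decision
    approvals = sum(1 for v in cast if v is Vote.APPROVE)
    # Assumption: a tie is treated as a rejection.
    return "approved" if approvals > len(cast) / 2 else "rejected"


votes = {
    "architect": Vote.APPROVE,
    "domain_expert": Vote.APPROVE,
    "program_lead": Vote.REJECT,
    "contributor": Vote.ABSTAIN,  # the author of the change abstains
}
print(ccb_decision(votes, quorum=3))
```

Writing the rule down this explicitly, even informally, forces the board to agree on quorum, ties, and abstentions before the first contested change arrives.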

Although it is not a formal part of the existing workflow, it is recommended that the approver be a technical representative for (or participant in) a change control board. Ultimately, it is recommended that there be a committee that understands the technical aspects of the proposed changes. Features are either approved or disallowed based on the recommendations of the CCB.

Since the CCB is not a technical feature of PHENOM, this would be a process individual organizations would be left to implement.

Data Sharing, Protection & Partitioning

PHENOM is a collaborative data model editing platform. This article will describe how data is shared, protected, partitioned and managed across PHENOM's many users.

Allocation of Domains

In the context of data models, a domain is a collection of related information. Partitioning data this way limits the scope of a particular data model so it does not attempt to "capture the entire world." While it might be helpful to have a model capturing all aspects of our world, the increased content would make it more difficult to search and use. Furthermore, it would invite too much nuance in our definitions when much simpler documentation is sufficient.

As a result, PHENOM is deployed on a per-domain basis. This allows for collaborative contribution to a domain specific data model. The Air System DSDM (Skayl's flagship data model) contains over 1,000 entities and over 13,000 attributes all related to air vehicles. While not exhaustive, this model provides an extensive (and remarkably useful) starting point.

The Structure of Sharing

This section will explain how data models are shared and private at the same time. The following diagram illustrates the structure that PHENOM uses to both partition and share data.

The hierarchical nature of PHENOM's structure is characterized in the diagram. The pattern represents the canonical layout of a PHENOM deployment. The platform has tremendous flexibility to allow for the creation of different structures; however, these, too, follow the general pattern described below.

The DSDM lives at the top of this hierarchical organization. The DSDM is the baseline model for all models on a particular server.

The next organizational unit is a project. A project allows one group (company or other) to own a single, baselined copy of the data model. When a new project is created, it inherits a complete copy of the existing DSDM.

The next level down is a user. When a user is added to a project, they are given their own copy of the project's model. Their copy is based on the latest state of the data model with all of the accepted changes incorporated into it.
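The three levels can be sketched as a chain of branches, each starting life as a complete copy of its parent's model. The `Branch` class below is a hypothetical illustration of that structure and not a description of how PHENOM stores data.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    """Hypothetical node in the DSDM -> project -> user hierarchy."""
    name: str
    parent: Optional["Branch"] = None
    model: dict = field(default_factory=dict)

    def derive(self, name: str) -> "Branch":
        """Create a child that starts as a full copy of this branch's model."""
        return Branch(name=name, parent=self, model=copy.deepcopy(self.model))


dsdm = Branch("DSDM", model={"Aircraft": {"altitude": "meters"}})
project = dsdm.derive("Project A")   # a project inherits the whole DSDM
alice = project.derive("alice")      # a user inherits the project's model

# Alice's edits stay in her own copy until they are explicitly shared.
alice.model["Aircraft"]["altitude"] = "feet"
print(project.model["Aircraft"]["altitude"])
print(alice.model["Aircraft"]["altitude"])
```

The deep copy is what keeps the levels isolated: nothing a user does can reach the project or the DSDM without an explicit action.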

Change Approvals

Each level is isolated from another by a Change Approval process. This prevents proprietary, incorrect, or experimental modeling from leaking out of a branch. In order for any changes (additions or modifications) at one level of the model to be shared, a user must explicitly initiate a "push" action. After making changes, a user may select which changes they want pushed into the project and then issue a push command. At this point, the model remains unchanged, but the Project Admin will receive a notification that there is new content available to approve.

The Project Admin is another user who has been empowered to approve user changes to the model. During the approval process, the Project Admin may choose which changes to admit (or not).

Once changes to the project model have been accepted, users of that project will be notified that there are model updates that they may accept into their copy of the model. Within a single project, it is highly advised that users accept changes from the parent branch.
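The push, approve, and accept steps described above can be sketched as follows. The `ModelBranch` class, its method names, and the flat dictionary model are hypothetical; the sketch also simplifies acceptance to a plain overwrite, whereas a real system would report the impacts first.

```python
class ModelBranch:
    """Hypothetical branch with a push / approve / accept cycle."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.model = dict(parent.model) if parent else {}
        self.children = []
        self.pending = []        # change sets awaiting approval
        self.notifications = []  # messages for this branch's owner
        if parent:
            parent.children.append(self)

    def push(self, keys):
        """Submit selected changes to the parent; nothing is merged yet."""
        changes = {k: self.model[k] for k in keys}
        self.parent.pending.append((self.name, changes))
        self.parent.notifications.append(
            f"{self.name} pushed {sorted(changes)}"
        )

    def approve(self, index=0):
        """Admin action: merge one pending change set, then notify children."""
        _source, changes = self.pending.pop(index)
        self.model.update(changes)
        for child in self.children:
            child.notifications.append(f"updates available from {self.name}")

    def accept_from_parent(self):
        """Explicitly pull the parent's accepted content into this branch."""
        self.model.update(self.parent.model)


project = ModelBranch("project")
alice = ModelBranch("alice", project)
bob = ModelBranch("bob", project)

alice.model["Aircraft.altitude"] = "meters"
alice.model["scratch"] = "experimental"
alice.push(["Aircraft.altitude"])  # only the selected change is submitted
project.approve()                  # the Project Admin admits the change
bob.accept_from_parent()           # Bob opts in to the update
print(bob.model)
```

Every transfer of content is an explicit action by someone with the authority to take it, which is what keeps experimental or proprietary work inside its branch.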

Changes to & from the DSDM

The project can also be considered the boundary of protection for proprietary data. No project information will appear in the DSDM unless it has been through the change approval process described above. Once data from a project is "pushed" to the DSDM, the DSDM Admin must approve the changes.

At this point, the changes are available for existing projects to accept as they so choose. These changes will comprise the baseline for new projects.

Even as the DSDM is improved over time, it still requires an action on behalf of the project administrator to accept these changes.

Current & Future Implementation Notes

At the current moment, Change Approvals cannot be done piecemeal. That is, a Project Admin is required to accept all or nothing for a given approval.

In the future, PHENOM will provide the ability to create custom approval workflows. This is envisioned to allow different approval schemes, such as vote counting by a change management committee. It may also be possible to push changes down to user branches without requiring their explicit acceptance.

Data Protection

The structure of data management within PHENOM prevents data from leaking into other, possibly proprietary data models. Data model content can only be shared in two different ways.

First, users may share each other's branches. In general, this approach is not recommended because it makes it possible for two users to develop redundant content. However, in a team with co-located individuals who communicate frequently, this is a convenient mechanism to keep model content in sync.

The second mechanism is the formal data model change management process previously discussed. Changes must be passed to the parent model (the "parent" is the model from which the current model is directly derived), approved, and then accepted by the peer.