Private Cloud Infrastructure as a Service Automation and Orchestration

Automation and orchestration within a private cloud is one of the key enabling technologies that allow both consumers and providers of a Private Cloud Infrastructure as a Service platform to deliver IT services in a predictable, secure and repeatable manner that conforms to industry and business standards and practices.

All layers of the Private Cloud Reference Model are influenced or operated upon by automation in the private cloud. Because a private cloud is composed of several process and implementation technologies throughout the layers of the reference architecture, cloud computing platforms similarly make several forms of automation technology available to the architect.

This article discusses automation in a Private Cloud Infrastructure as a Service platform from the perspectives of both the consumer and the provider of that automation.


Note:
This document is part of a collection of documents that comprise the Reference Architecture for Private Cloud document set. The Reference Architecture for Private Cloud documentation is a community collaboration project. Please feel free to edit this document to improve its quality. If you would like to be recognized for your work on improving this article, please include your name and any contact information you wish to share at the bottom of this page.

This article is no longer being updated by the Microsoft team that originally published it.  It remains online for the community to update, if desired.  Current documents from Microsoft that help you plan for cloud solutions with Microsoft products are found at the TechNet Library Solutions or Cloud and Datacenter Solutions pages.


1 Capability as a Service

In cloud computing there are several domain areas where “as a Service” is used to describe the capability provided by the domain. In Infrastructure as a Service, the Private Cloud Reference Architecture defines the cloud-like characteristics presented by the platform by taking the service provider’s approach. This approach has implications and expectations for both the provider and the consumer of the platform providing the service.

Automation is the enabler that allows the provider of the Infrastructure as a Service platform to offer building blocks of resources to consumers that request these resources and compose them into higher level services to meet their business needs.

The automated collection of resources and services combined with management and service operations offered by the platform form the essence of providing infrastructure as a service capability.

1.1 Consumer View

Consumers increasingly expect certain characteristics of IT in fulfilling their business needs. These expectations align with three of the characteristics in the NIST definition of cloud computing. These are:

  • On-demand self-service. A consumer can independently and unilaterally provision computing capabilities, such as compute time, network connectivity and storage, as needed automatically without requiring human interaction with each service’s provider.
  • Rapid elasticity. Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
  • Measured Service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, compute, bandwidth, active user accounts, etc.). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

Consumers of IT platform services have the business on their minds and have decomposed needs and problem areas into processes, tasks and timelines for addressing those needs. In short, they have a plan that includes IT capabilities. Part of that plan will include execution steps for development, test, staging and deployment of business applications hosted on IT resources. The consumer wants to select from a set of IT resources and services, acquire them through self-service, and compose them into a higher level of service to meet their needs.

Over time consumers will need to expand or collapse the resources that make up services to respond to changes in business need. This elastic capability also carries the implied characteristic of responding in a rapid or agile manner.

The consumer also has the expectation that resources will be managed and optimized on their behalf in the most efficient and secure manner. That is, the user is not concerned with the details of how a resource presents a capability. Their concern is that the capability is reliably there and that they are charged appropriately for the amount of the resource or service they use.
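As an illustration of charging in proportion to use, the following Python sketch totals a usage record against a rate table. The resource names, the rates and the `charge_cents` function are hypothetical and are not taken from any particular metering product; amounts are kept in whole cents to avoid rounding surprises.

```python
# Hypothetical rate table: cost in cents per metered unit.
RATES_CENTS = {
    "cpu_core_hour": 5,
    "gb_storage_hour": 1,
}


def charge_cents(usage: dict, rates: dict) -> int:
    """Return the total charge in cents for a mapping of resource -> metered quantity."""
    total = 0
    for resource, quantity in usage.items():
        if resource not in rates:
            raise ValueError(f"no rate defined for resource: {resource}")
        total += quantity * rates[resource]
    return total


# One day of usage for a hypothetical tenant: 2 cores and 100 GB for 24 hours.
daily_usage = {"cpu_core_hour": 48, "gb_storage_hour": 2400}
daily_charge = charge_cents(daily_usage, RATES_CENTS)
print(daily_charge)
```

The consumer sees only the metered quantities and the resulting charge, not how the platform delivers the underlying capability.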

To compose services holistically, the platform must provide the capability to stitch together the resources and services it offers into user-defined services. The user-defined services become an automated operation available to the user through self-service.

1.2 Provider View

The provider view takes the form of internal IT delivering the Private Cloud Infrastructure as a Service platform within their organization. In addition to the above cloud computing characteristics the remaining characteristics also apply:

  • Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms.
  • Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources, but may be able to specify location at a higher level of abstraction (e.g., country, state, region or datacenter). Examples of computing resources include storage, processing (compute), memory, network bandwidth, and virtual machines.

The cloud computing Deployment Models defined by NIST also drive automation requirements and capabilities since resources and services may be deployed privately on-premise, hosted or in a combined hybrid cloud deployment model.

Providers require automation to drive predictable results in every layer of the Private Cloud Reference Architecture starting with the foundation of providing Infrastructure as a Service through higher service models that include Platform and Software as a Service.

Starting at the lowest layers of the infrastructure, providers use automation to perform bare-metal provisioning and configuration of hardware resources that compose pools of these resources that are allocated by provisioning jobs.

IT uses automation to create and update the enterprise service catalog and configuration management data stores dynamically as resources are built. The service and configuration stores are leveraged by automation to make further downstream decisions such as host selection and intelligent placement.
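As a rough illustration of how configuration data can drive a downstream placement decision, the following Python sketch selects a host for a new virtual machine. The `HostRecord` fields and the `select_host` function are hypothetical, and the "most free memory" rule is only an assumption for the example; a real fabric manager would weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostRecord:
    """Hypothetical configuration-store record describing one virtualization host."""
    name: str
    free_memory_gb: int
    free_cpu_cores: int


def select_host(hosts: List[HostRecord], memory_gb: int, cpu_cores: int) -> Optional[HostRecord]:
    """Return the fitting host with the most free memory, or None if nothing fits."""
    candidates = [
        h for h in hosts
        if h.free_memory_gb >= memory_gb and h.free_cpu_cores >= cpu_cores
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_memory_gb)


inventory = [
    HostRecord("host-a", free_memory_gb=16, free_cpu_cores=4),
    HostRecord("host-b", free_memory_gb=64, free_cpu_cores=8),
    HostRecord("host-c", free_memory_gb=128, free_cpu_cores=2),
]
chosen = select_host(inventory, memory_gb=32, cpu_cores=4)
print(chosen.name if chosen else "no capacity")
```

Because the inventory is data rather than code, the same selection logic keeps working as hosts are added to or removed from the configuration store.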

Automation continues throughout the service management lifecycle to include business processes such as the onboarding of new applications or users to the platform and performing routine management and updates of the platform.

2 Alignment to the Reference Model

As mentioned earlier in this article, the automation capability is key in enabling the platform to deliver on cloud computing characteristics. In this section we’ll look at each layer of the reference architecture and its components and define where automation influences component operation.

2.1 Service Delivery

The Service Delivery layer is the interface between business and IT. It serves as the conduit for translating business requirements into IT services and is responsible for managing ongoing delivery of those services.

Service Delivery contains several components that are directly integrated into, have a dependency upon or influence private cloud platform automation. Each component is listed here. For more information about the component refer to the Private Cloud Reference Model.

2.1.1 Financial Management

Financial Management incorporates the functions and processes used to meet a service provider’s budgeting, accounting, metering, and charging requirements. While the primary concern of Financial Management is providing cost transparency, the overall cost of operating infrastructure components is directly tied to how efficiently these components are operated throughout the lifecycle of the infrastructure.

Financial Management must provide data artifacts about facility operating costs that include power, cooling and other environmental costs that impact the total operating costs of the infrastructure. This information may also include data that define peak periods where costs are at their highest so automation activities can take this data into account when determining task start times and areas of the infrastructure that can be taken offline and powered down.
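One way automation might take peak-cost periods into account when choosing a task start time is sketched below in Python. The peak windows, their hour-based format and the function names are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical peak-cost windows as (start_hour, end_hour) on a 24-hour clock.
# The end hour is exclusive, so (9, 12) covers 09:00 up to but not including 12:00.
PEAK_WINDOWS: List[Tuple[int, int]] = [(9, 12), (14, 18)]


def is_peak(hour: int, windows: List[Tuple[int, int]]) -> bool:
    """Return True if the hour falls inside any peak window."""
    return any(start <= hour < end for start, end in windows)


def next_off_peak_hour(earliest_hour: int, windows: List[Tuple[int, int]]) -> int:
    """Return the first hour at or after earliest_hour, wrapping past midnight,
    that lies outside every peak window."""
    for offset in range(24):
        hour = (earliest_hour + offset) % 24
        if not is_peak(hour, windows):
            return hour
    raise ValueError("every hour of the day falls inside a peak window")


print(next_off_peak_hour(10, PEAK_WINDOWS))
```

A scheduler could call `next_off_peak_hour` with the earliest acceptable hour for a deferrable task, such as powering down or patching part of the infrastructure.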

2.1.2 Demand Management

Demand Management involves understanding and influencing customer demands for services, plus the provision of capacity to meet these demands. The process of understanding service demand is a business intelligence function that allows the organization to model the characteristics of service needs and project them on the Infrastructure as a Service platform to identify new areas of automation that must be defined to meet demand.

Demand Management initiates automation to respond to business demand for resources while ensuring the necessary resiliency exists to meet demand in the presence of resource decay.

2.1.3 Service Catalog and Life Cycle Management

Service Catalog Management is influenced by Business Relationship Management and takes the end-to-end view of services offered by the platform. Automation includes activities that continually update the state of the service catalog entries and their attributes.

Service Life Cycle Management includes the continual improvement of a service, which involves reviewing and evolving the automation capabilities included in the service.

2.1.4 Service Level Management

In delivering a platform capability, the provider designs for and builds capability that will allow it to meet its published Service Level Agreement (SLA). A deciding factor in determining an SLA is the predictability of establishing and operating the service, both in the steady state and during periods of degraded state. Automation is used not only in establishing a service but also in reacting to failure conditions during periods of diminished capability. An automated response to failure is referred to as remediation. This automation is always triggered as the result of an incident.

2.1.5 Continuity and Availability Management

Availability Management defines the processes necessary to achieve the perception of continuous availability. Those processes will always include the use of automated procedures throughout the infrastructure fabric management to create redundancy where needed and a resilient set of resources to maintain the published SLA availability.

2.1.6 Capacity Management

Capacity Management defines the processes necessary to achieve the perception of infinite capacity. Capacity must be managed to meet existing and future peak demand while controlling under-utilization. Capacity Management is closely related to Demand Management, and the same resource automation is leveraged to provision and collapse capacity as demand changes.
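The balance between meeting peak demand and controlling under-utilization can be sketched as a simple sizing calculation. In the Python example below, the 75 percent target utilization, the notion of abstract "demand units" and both function names are assumptions for illustration, not values from the reference architecture.

```python
import math
from typing import Tuple


def hosts_required(demand_units: int, units_per_host: int, target_utilization: float = 0.75) -> int:
    """Return the number of hosts needed so that demand stays at or below the
    target utilization of the provisioned capacity."""
    if demand_units <= 0:
        return 0
    return math.ceil(demand_units / (units_per_host * target_utilization))


def capacity_action(current_hosts: int, demand_units: int, units_per_host: int) -> Tuple[str, int]:
    """Decide whether to provision hosts, collapse hosts, or hold steady."""
    delta = hosts_required(demand_units, units_per_host) - current_hosts
    if delta > 0:
        return ("provision", delta)
    if delta < 0:
        return ("collapse", -delta)
    return ("hold", 0)


print(capacity_action(current_hosts=2, demand_units=300, units_per_host=100))
print(capacity_action(current_hosts=6, demand_units=300, units_per_host=100))
```

The same function serves both directions of elasticity: it requests more hosts when demand outgrows capacity and releases them when capacity sits idle.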

2.1.7 Information Security Management

Information Security Management strives to make sure that all requirements are met for confidentiality, integrity, and availability of the organization’s assets, information, data, and services. Infrastructure automation must take into account security attributes assigned to resources during task operations to meet multi-tenancy requirements to prevent information security issues from occurring during the management of pooled resources.

2.2 Infrastructure Layer

The Infrastructure Layer provides hypervisor services (VM resources) to the Platform and Software Layers. It defines the capabilities necessary for these VMs to execute; it includes hypervisor, physical servers, network devices, storage systems, and facilities (which include space, power, cooling, and physical interconnects).

The Infrastructure Layer includes the physical hardware from many vendors. This creates the opportunity for many different types of automation technologies that may need to interact with others present in this layer.

The automation that exists in the layer will directly affect facility and hardware configuration while updating state in the service and configuration data stores.

2.2.1 Facilities

The facilities component is quite broad and contains many industrial control interfaces for monitoring and operation of power, cooling, airflow and other environmental concerns. The component also includes the interfaces that operate and monitor hardware racks in the datacenter.

Facilities also include the core communication capabilities and interconnect between devices in the datacenter. Automation in facility equipment may be specific to a component or hardware vendor and may communicate only the resulting state of the component. For example, the failure of an air handling unit may cause the unit to shut down and trigger an alarm that must be acted upon by a different set of automation.

2.2.2 Compute

Compute components include the physical servers used to host physical application workloads or virtual machine hosts. It is inclusive of all the device components within the server that can be operated and monitored externally.

Compute component automation includes the bare-metal provisioning of server hardware up to the point where the server is configured into the private cloud fabric management to assume the role of physical application host or virtual machine host.

2.2.3 Storage

Storage components represent physical storage devices that present units of storage consistent with the architecture of the component. They expose proprietary management interfaces, industry-standard interfaces (SMI-S), or both, allowing the discovery and provisioning of the storage capability provided by the component.

Automation at the component level will include:

  • Automated creation of higher level storage units such as arrays or volumes.
  • Establishment of hardware-provided high availability characteristics.
  • Path and connectivity management.
  • Unit Management

2.2.4 Network

Network services provide addressing and packet delivery for the provider’s physical infrastructure and the consumer’s VMs. Network capability includes physical and virtual network switches, routers, firewalls, and Virtual Local Area Network (VLAN).

Network automation is provided by proprietary management interfaces, although there are emerging industry-wide standards efforts that the architect should monitor.

2.3 Service Operations Layer

The Service Operations Layer defines the operational processes and procedures necessary to deliver IT as a Service. The main focus of the Service Operations Layer is to define the business requirements of the organization. Cloud-like service attributes cannot be achieved through technology alone; mature IT service management is also required.

2.3.1 Change Management

Change Management is responsible for controlling the life cycle of all changes. Its primary objective is to implement beneficial changes with minimum disruption to the perception of continuous availability.

Changes are developed in a non-production environment that mirrors the production environment, so that development and testing occur where results are most likely to match those in production. Changes receive final testing in a staging environment.

Changes are implemented in the fabric management automation, which is therefore a contributing technology that raises the overall maturity of the IT service management capability. Automation is software development and must come under the organization’s Software Development Lifecycle (SDL) practices. The automation is itself under change control and subject to the same change control processes as the rest of the IT organization.

2.3.2 Service Asset and Configuration Management

Automation is a consumer of configuration management and influences service assets. The automation components of a private cloud are designed and authored to consume declarative data held in a configuration store. Avoiding the hardcoding of configuration data allows automation reuse in private cloud fabric management scenarios simply by updating configuration held in the configuration data stores.
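The idea of automation consuming declarative data instead of hardcoded values can be sketched as follows. The JSON layout, the template names and the `build_provision_request` function are hypothetical; in practice the data would be read from the configuration store instead of an inline string.

```python
import json

# Hypothetical configuration-store content, inlined here so the sketch is self-contained.
CONFIG_JSON = """
{
  "vm_templates": {
    "small": {"cpu_cores": 2, "memory_gb": 4},
    "large": {"cpu_cores": 8, "memory_gb": 32}
  }
}
"""


def build_provision_request(config: dict, template: str, vm_name: str) -> dict:
    """Build a provisioning request from a named template held in configuration data."""
    try:
        spec = config["vm_templates"][template]
    except KeyError:
        raise ValueError(f"unknown template: {template}") from None
    # The automation contains no sizing values of its own; they all come from config.
    return {"name": vm_name, **spec}


config = json.loads(CONFIG_JSON)
request = build_provision_request(config, "small", "web-01")
print(request)
```

Adding a new template or resizing an existing one is then a configuration change, and the automation code is reused unchanged.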

Reuse results in less development and testing effort of the fabric management automation. Over time this has the effect of increased predictability and agility of fabric management operations in the private cloud.

2.3.3 Release Management

Instantiation or upgrade of a release is accomplished by transitioning the release from staging to production through the use of automation that has been previously tested on the staging environment.

Automation is used by fabric management to perform updates of the private cloud infrastructure by defining a resource upgrade domain and performing the update in a predictable and repeatable manner. Updates continue for each upgrade domain until completed and the appropriate configuration and service management records are updated to reflect the change.
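A minimal Python sketch of updating one upgrade domain at a time is shown below. The domain and host names, the `update_host` callback and the choice to halt at the first failed domain are all assumptions for illustration; the source does not prescribe a failure policy.

```python
from typing import Callable, Dict, List


def rolling_update(
    upgrade_domains: Dict[str, List[str]],
    update_host: Callable[[str], bool],
) -> Dict[str, str]:
    """Update domains in order and return a status record per domain.

    Assumed policy: if any host in a domain fails to update, mark that domain
    as failed and leave the remaining domains untouched.
    """
    status = {name: "pending" for name in upgrade_domains}
    for name, hosts in upgrade_domains.items():
        results = [update_host(host) for host in hosts]
        if all(results):
            status[name] = "updated"
        else:
            status[name] = "failed"
            break
    return status


# Hypothetical fabric with three upgrade domains; host "h3" is made to fail.
domains = {"ud0": ["h1", "h2"], "ud1": ["h3"], "ud2": ["h4"]}
outcome = rolling_update(domains, update_host=lambda host: host != "h3")
print(outcome)
```

The returned status mapping stands in for the configuration and service management records that would be updated to reflect the change.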

2.3.4 Knowledge Management

Knowledge Management is responsible for sharing and storing information in the enterprise. Automation plays both a direct and an indirect role in knowledge management. The notification processes used within an enterprise are driven by configuration data and automation. This automation usually implements a decision tree to select notification levels based on the severity or type of event being raised.
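A decision tree of this kind might look like the following Python sketch. The severity values, event types and notification levels are invented for the example and would normally come from configuration data.

```python
def notification_level(severity: str, event_type: str) -> str:
    """Walk a small decision tree and return a hypothetical notification level."""
    if severity == "critical":
        # Critical outages interrupt a person; other critical events go to a mailbox.
        if event_type == "outage":
            return "page_on_call"
        return "email_operations"
    if severity == "warning":
        # Only capacity warnings are mailed; the rest appear on a dashboard.
        if event_type == "capacity":
            return "email_operations"
        return "dashboard_only"
    # Anything else is recorded without notifying anyone.
    return "log_only"


print(notification_level("critical", "outage"))
print(notification_level("warning", "capacity"))
print(notification_level("info", "heartbeat"))
```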

The fabric management automation is responsible for updating the service and configuration data stores when management operations are performed and this has the indirect effect of sharing information in the organization.

2.3.5 Incident and Problem Management

Incident management benefits from automation through well-defined processes that staff use to record and process an incident. Fabric Management uses automation to initially triage or remediate issues systematically. When automated remediation is not possible, the data collection artifacts gathered by automation aid problem resolution.
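The triage flow described above can be sketched in Python as a lookup of known remediations with a fallback to data collection. The incident types, remediation names and the shape of the returned record are hypothetical.

```python
# Hypothetical mapping from incident type to an automated remediation action.
REMEDIATIONS = {
    "service_stopped": "restart_service",
    "disk_full": "purge_temp_files",
}


def triage(incident: dict) -> dict:
    """Remediate a known incident type, or escalate with collected diagnostics."""
    action = REMEDIATIONS.get(incident["type"])
    if action is not None:
        return {"outcome": "remediated", "action": action}
    # No automated fix is known: hand the gathered data to problem management.
    return {
        "outcome": "escalated",
        "diagnostics": {"host": incident["host"], "type": incident["type"]},
    }


print(triage({"host": "host-a", "type": "disk_full"}))
print(triage({"host": "host-b", "type": "memory_fault"}))
```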

2.3.6 Request Fulfillment

Requests for fulfillment of IT operations are the result of users making requests of the platform or through systematic activities requiring a resource change. These requests trigger automation to allocate, provision or change resources on behalf of the user or process making the request.

2.3.7 Access Management

Automation is used by fabric management to configure access control on resources that have been provisioned on behalf of a user or process. A service hosted on the private cloud infrastructure contains many access control boundaries that must be configured for appropriate access before the service is made available to the consumer. Automation of access control requests increases service predictability.

2.3.8 Systems Management

Systems management involves encoding common tasks that are performed often or on a scheduled basis. IT staff have long encoded common tasks into scripts that are reused as needed. This is a form of automation that carries forward to the private cloud. This same automation may be integrated into fabric management to perform common or remedial tasks on the infrastructure. In the case of the private cloud this system management automation is subject to change control.

2.4 Management Layer

The Management Layer contains the tooling capabilities required to execute and implement the Service Operations and Service Delivery processes and procedures that support IaaS, PaaS, and SaaS. These capabilities are incremental moving up through the Infrastructure, Platform and Software Layers. 

2.4.1 Service Reporting

Service Reporting in a private cloud can be a complex operation since the number of instances reported on may be quite large. Automation may be used to facilitate the gathering of event and performance information from instances and the correlation of that data to a service and tenant.
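Correlating instance-level events to a service and tenant can be sketched as a join against an instance map. In the Python example below, the instance names, the map and the event format are assumptions; platform monitoring tools would normally supply both.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical map from instance name to (service, tenant).
INSTANCE_MAP: Dict[str, Tuple[str, str]] = {
    "vm-01": ("web", "tenant-a"),
    "vm-02": ("web", "tenant-a"),
    "vm-03": ("db", "tenant-b"),
}


def correlate(events: List[dict], instance_map: Dict[str, Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count events per (service, tenant), skipping events from unknown instances."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for event in events:
        key = instance_map.get(event["instance"])
        if key is None:
            continue  # the instance is not registered, so it cannot be attributed
        totals[key] += 1
    return dict(totals)


events = [
    {"instance": "vm-01", "type": "cpu_high"},
    {"instance": "vm-02", "type": "cpu_high"},
    {"instance": "vm-03", "type": "disk_latency"},
    {"instance": "vm-99", "type": "cpu_high"},
]
report = correlate(events, INSTANCE_MAP)
print(report)
```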

This level of automation is generally provided by the private cloud platform monitoring tools to handle the details of collection and correlation of resource instance data. Once data has been captured and correlated the architect must define the appropriate thresholds for triggering Service Management operations.

2.4.2 Configuration Management

The configuration management system is a critical component of the private cloud responsible for collecting, maintaining and exposing configuration data to all layers of the private cloud. Automation of management operations in each respective area of the private cloud causes configuration items for a resource to be created or updated. Automation continually uses configuration management data to define, create or update services on the platform.

Over time configuration may fall out of the desired state and trigger remediation automation to correct the condition and update the appropriate data stores.
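Detecting and correcting drift from the desired state can be sketched as a comparison of two settings dictionaries. The setting names and both functions in the Python example below are hypothetical, and the sketch only computes the corrected state; it does not touch a real system.

```python
from typing import Dict, List, Tuple


def find_drift(desired: dict, actual: dict) -> Dict[str, tuple]:
    """Return {setting: (desired_value, actual_value)} for each setting that differs."""
    return {
        key: (want, actual.get(key))
        for key, want in desired.items()
        if actual.get(key) != want
    }


def remediate(desired: dict, actual: dict) -> Tuple[dict, List[str]]:
    """Return the corrected state and the sorted names of the settings that were changed."""
    drift = find_drift(desired, actual)
    corrected = dict(actual)
    for key, (want, _) in drift.items():
        corrected[key] = want
    return corrected, sorted(drift)


desired_state = {"ntp_server": "10.0.0.1", "firewall": "on"}
actual_state = {"ntp_server": "10.0.0.1", "firewall": "off", "hostname": "host-a"}
corrected_state, changed = remediate(desired_state, actual_state)
print(changed)
```

The list of changed settings is the kind of information that would be written back to the configuration data stores after remediation.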

2.4.3 Fabric Management

Fabric Management is the toolset responsible for managing workloads of virtual hosts, virtual networks, and storage. Fabric Management provides the automation necessary to manage the life cycle of a consumer’s workload.

2.4.4 Deployment Management

Instantiation or upgrade of a release is accomplished by transitioning the release from staging to production through the use of automation that has been previously tested on the staging environment.

Automation is used by fabric management to perform updates of private cloud infrastructure by defining a resource upgrade domain and performing the update in a predictable and repeatable manner. Updates continue for each upgrade domain until completed and the appropriate configuration and service management records are updated to reflect the change.