Interior
Chapter 7: Distributed Systems Management Architecture
Version 2.0

Chapter 7. Distributed Systems Management Architecture
7.1 Introduction and Background
Principle 1: Provide Reliable Metrics
Principle 2: Maintain Network Interoperability
Principle 3: Support Business Continuity
Principle 4: Information Access
Principle 5: Reuse Technology Components
Principle 6: Support Security, Privacy and Confidentiality
7.3.1 Authentication / Single Sign-on (SSO)
7.3.2 Supporting Network Services
The focus of the Interior Enterprise Architecture is on providing guidance for information technology (IT) issues and initiatives that are Interior-wide or multi-bureau in scope. The Distributed Systems Management (DSM) architecture defines how the hardware and software components of the environment will be controlled. Perhaps more than any other domain, the success of distributed systems management depends upon comprehensive governance policies, procedures and processes being in place and enforced.
If used correctly, the Interior Enterprise Architecture will act as a catalyst for those looking to capitalize on its contents and better understand the full meaning of its guidance. This understanding will permit IT personnel to better engage the non-IT organization in discussions around tradeoffs and priorities within the proper governance structure (e.g., Management Improvement Team (MIT), Information Technology Management Committee). The Interior Enterprise Architecture is not intended to be the “last word” (e.g., some automated checklist for product selection). It is intended to be one of the “first words” to assure that Interior’s mission priorities and its IT priorities remain closely aligned.
Because Interior is incorporating the OMB’s Federal Enterprise Architecture (FEA) models, the technical guidance provided by the subject area experts within a domain spans both the Service Component Reference Model (SRM) as well as the Technical Reference Model (TRM). For the Distributed Systems Management domain, the SRM elements are as follows:
Service Domain(s): The Back Office Services Domain defines the set of capabilities that support the management of enterprise planning and transactional-based functions.
The Customer Services Domain defines the set of capabilities that are directly related to an internal or external customer, the business’ interaction with the customer, and the customer driven activities or functions. The Customer Services domain represents those capabilities and services that are at the front end of a business, and interface at varying levels with the customer.
The Support Services Domain defines the set of cross-functional capabilities that can be leveraged independent of Service Domain objective and / or mission.
The Business Management Services Domain defines the set of capabilities that support the management of business functions and organizational activities that maintain continuity across the business and value chain participants. The Business Management Services domain represents those capabilities and services that are necessary for projects, programs and planning within a business operation to successfully be managed.
Service Type(s): Assets / Materials Management – defines the set of capabilities that support the acquisition, oversight and tracking of an organization's assets.
Customer Relationship Management - defines the set of capabilities that are used to plan, schedule and control the activities between the customer and the enterprise both before and after a product or service is offered.
Communication - defines the set of capabilities that support the transmission of data, messages and information in multiple formats and protocols.
Customer Initiated Assistance - defines the set of capabilities that allow customers to proactively seek assistance and service from an organization.
Systems Management – defines the set of capabilities that support the administration and upkeep of an organization’s technology assets, including the hardware, software, infrastructure, licenses and components that comprise those assets.
Organizational Management – defines the set of capabilities that support both collaboration and communication within an organization.
Security Management – defines the set of capabilities that support the protection of an organization's hardware/software and related assets.
Component(s): Asset Cataloging / Identification – defines the set of capabilities that support the listing and specification of available assets.
Asset Transfer, Allocation, and Maintenance – defines the set of capabilities that support the movement, assignment, and replacement of assets.
Computers / Automation Management – defines the set of capabilities that support the identification, upgrade, allocation and replacement of physical devices, including servers and desktops, used to facilitate production and process-driven activities.
Facilities Management – defines the set of capabilities that support the construction, management and maintenance of facilities for an organization.
Property / Asset Management – defines the set of capabilities that support the identification, planning and allocation of an organization's physical capital and resources.
Contact Management – defines the set of capabilities that keep track of people and the related activities of an organization.
Customer Analytics - defines the set of capabilities that allow for the analysis of an organization's customers as well as the scoring of third party information as it relates to an organization’s customers.
Customer / Account Management – defines the set of capabilities that support the retention and delivery of a service or product to an organization's clients.
Customer Feedback – defines the set of capabilities that are used to collect, analyze and handle comments and feedback from an organization's customers.
Event / News Management – defines the set of capabilities that monitor servers, workstations and network devices for routine and non-routine events.
Assistance Request - defines the set of capabilities that support the solicitation of support from a customer.
Online Help – defines the set of capabilities that provide an electronic interface to customer assistance.
Self-Service – defines the set of capabilities that allow an organization's customers to sign up for a particular service at their own initiative.
Change Management – defines the set of capabilities that control the process for updates or modifications to the existing documents, software or business processes of an organization.
Configuration Management – defines the set of capabilities that control the hardware and software environments, as well as documents of an organization.
License Management – defines the set of capabilities that support the purchase, upgrade and tracking of legal usage contracts for system software and applications.
Remote Systems Control – defines the set of capabilities that support the monitoring, administration and usage of applications and enterprise systems from locations outside of the immediate system environment.
Software Distribution – defines the set of capabilities that support the propagation, installation and upgrade of written computer programs, applications and components.
System Resource Monitoring – defines the set of capabilities that support the balance and allocation of memory, usage, disk space and performance on computers and their applications.
Network Management - defines the set of capabilities involved in monitoring and maintaining a communications network in order to diagnose problems, gather statistics and provide general usage.
Role / Privilege Management - defines the set of capabilities that support the granting of abilities to users or groups of users of a computer, application or network.
User Management – defines the set of capabilities that support the administration of computer, application and network accounts within an organization.
These SRM service elements are likewise supported by Interior’s IT (technical) infrastructure (e.g., servers, networks). Within this infrastructure are individual TRM components for which this domain team is providing guidance. The graphic below outlines those TRM elements for this domain that support the service needs of the SRM.
Additionally, it is doubtful that a single domain chapter from the TRM can be used to address a substantive issue. More realistically, a few architecture domains may need to be reviewed when addressing an important IT decision. For example, if Interior were considering the creation of a new Interior-wide Web application that could be used both by the general public and Interior personnel, then TRM chapters like Data Management Technologies, Information Security, Distributed Systems Management and Application Development might all need to be reviewed.
The principles listed below provide guidance for the design and selection of technology components that will support the distributed systems management needs of Interior-wide IT initiatives.
Principle 1: Provide Reliable Metrics
Select appropriate tools to provide reliable metrics information and reports for proactive distributed systems management.
Rationale:
Implications:
4. Need to ensure that the “overhead” of management tools
6. Need appropriate training for tools to understand and utilize full capabilities.
Principle 2: Maintain Network Interoperability
Use network management, systems management and performance monitoring tools to maintain the interoperability of the network.
Rationale:
Implications:
Principle 3: Support Business Continuity
Use distributed systems management tools to support business continuity planning and operations.
Rationale:
Implications:
Principle 4: Information Access
Ensure that information is stored so that it is accessible for short and long term needs.
Rationale:
· Provides capacity and growth planning metrics.
Implications:
Principle 5: Reuse Technology Components
Use distributed systems management tools to determine the availability and appropriateness of reusable technology components.
Rationale:
Implications:
4. Need a formal mechanism to proactively promote technology reuse opportunities.
Principle 6: Support Security, Privacy and Confidentiality
Select distributed systems management tools that are aligned with security, privacy and confidentiality legislation and policies.
Rationale:
· Reduces the likelihood of divulging employee and customer privacy information or sensitive systems information.
· Enhances Interior’s security posture.
· Reduces Interior’s legal risk.
· Enhances public trust.
Implications:
1. Need to be aware of unintended consequences (i.e., using certain tools increases the risk of exposing sensitive information).
2. DSM tools and IT staff need to have a high level of authority to function; therefore, the IT staff needs higher security awareness and accountability.
3. Need to know which security, privacy and confidentiality legislation and policies are in place.
4. IT staff needs appropriate training on DSM tools; users and managers need to be informed about the purpose, appropriate use, functionality, capabilities and limitations of DSM.
5. DSM tool usage needs to be limited to the appropriate operational levels.
The Distributed Systems Management components in this domain include:
· Authentication / Single Sign-on (SSO) – Refers to a method that lets users log in one time and gain authenticated access to all their applications and resources.
· Supporting Network Services - These consist of the protocols that define the format and structure of data and information that is either accessed from a directory or exchanged through communications.
· Deployment Management – Refers to the capability of software delivery to remote networked desktops, servers, and mobile devices across an enterprise.
· Other Applications – Refers to software applications that do not fit in any of the other aforementioned software categories. Due to the nature of distributed systems management, applications in this category will be differentiated by the TRM Sub-component.
The classifications for any products or standards within this domain are:
Life Cycle Classification: Definition/Meaning
Preferred: Product/standard of choice; support available; recommended.
Contained: Develop solutions using these standards or products only if there are no suitable alternatives categorized as preferred; if a preferred product is available that will meet the requirements, plans should be developed to move from contained to preferred as soon as practical.
Obsolete: Being phased out (e.g., vendor support ending); plans should be developed to rapidly phase out and replace (often to avoid substantial risks).
Research: Product/standard to be used in conjunction with technology research efforts only (e.g., testing, pilots).
Rejected: Product/standard has been evaluated and found not to meet technical architecture needs.
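As a rough illustration of how the classifications above might drive product selection, the following Python sketch encodes them as an enumeration and applies the "Contained only when nothing Preferred is suitable" rule. The catalog contents, the function name `selectable` and the `research_effort` flag are hypothetical, and the sketch assumes every candidate passed in already meets the functional requirements.

```python
from enum import Enum


class Lifecycle(Enum):
    """Life cycle classifications used in this domain."""
    PREFERRED = "Preferred"
    CONTAINED = "Contained"
    OBSOLETE = "Obsolete"
    RESEARCH = "Research"
    REJECTED = "Rejected"


def selectable(candidates, research_effort=False):
    """Return the product names a new solution may be built on.

    candidates: dict mapping product name -> Lifecycle (hypothetical catalog).
    Preferred products are returned when any exist; Contained products are
    returned only when no Preferred product is available. Research products
    are added only for research efforts (testing, pilots). Obsolete and
    Rejected products are never returned.
    """
    preferred = sorted(n for n, c in candidates.items() if c is Lifecycle.PREFERRED)
    contained = sorted(n for n, c in candidates.items() if c is Lifecycle.CONTAINED)
    research = sorted(n for n, c in candidates.items() if c is Lifecycle.RESEARCH)

    production = preferred if preferred else contained
    return production + (research if research_effort else [])


# Hypothetical catalog entries, for illustration only.
catalog = {
    "ToolA": Lifecycle.PREFERRED,
    "ToolB": Lifecycle.CONTAINED,
    "ToolC": Lifecycle.OBSOLETE,
    "ToolD": Lifecycle.RESEARCH,
}
```

With this catalog, `selectable(catalog)` yields only the Preferred product; removing `ToolA` would make the Contained product the fallback.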
Authentication / Single Sign-on (SSO) refers to a method that lets users log in one time and gain authenticated access to all their applications and resources.
Supporting Network Services consist of the protocols that define the format and structure of data and information that is either accessed from a directory or exchanged through communications.
Standards:
Products:
Deployment Management refers to the capability of software delivery to remote networked desktops, servers, and mobile devices across an enterprise. Deployment automation tools provide centralized and accelerated delivery of applications to users via push technologies, eliminating the need for manual installation and configuration.
Other Applications refers to software products that do not fit in any of the other aforementioned software categories but also are used in conjunction with data management processes. Applications in this category perform a wide range of distributed systems management functions and are represented by TRM Service Sub-standard.
Network Element Manager:
LAN and System Element Manager: (See Network Element Manager above for the preferred tool.)
Software Distribution:
Asset Management:
Help Desk:
Performance Management:
Capacity Planning:
Change Control:
Backup & Recovery:
Event Fault Manager:
Miscellaneous:
The Domain Principles, because they are derived from Interior's business direction and strategies, provide the primary direction and guidance around technology decisions within this domain. Additional benefit may sometimes be obtained by reviewing Select Best Practices. These reflect the valuable insights from either domain team members’ experiences or other public sector organizations.
SRM Focus
Select Best Practice 1: Resolution databases - Resolution databases that contain solutions to recurring problems should be built to improve quality and contain costs. The effort to resolve recurring problems is significantly reduced. Education of new personnel is improved because a knowledge base is developed and available to more quickly resolve new problems with similarities to those previously encountered.
Select Best Practice 2: Tiered support - Multiple tiers or levels of client support should be employed to leverage support resources and provide effective client support. Front-line support staff can handle most problems, while more difficult problems will need quick escalation to additional levels of expertise. A tiered system with defined response times uses limited talent most effectively.
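The tiered model with defined response times can be sketched in a few lines of Python. This is only an illustration: the three tiers, the time limits in `TIER_RESPONSE_LIMITS`, and the `Ticket` fields are all assumed values, not figures from this chapter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical response-time limits per support tier; real values would be
# taken from the organization's service level agreements.
TIER_RESPONSE_LIMITS = {
    1: timedelta(minutes=30),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
}


@dataclass
class Ticket:
    ticket_id: str
    entered_tier_at: datetime  # when the ticket arrived at its current tier
    tier: int = 1
    resolved: bool = False


def escalate_if_overdue(ticket, now):
    """Move an unresolved ticket up one tier once its tier's limit has passed.

    Returns True if the ticket was escalated. Tickets already at the top
    tier stay where they are.
    """
    if ticket.resolved:
        return False
    top_tier = max(TIER_RESPONSE_LIMITS)
    overdue = now - ticket.entered_tier_at > TIER_RESPONSE_LIMITS[ticket.tier]
    if overdue and ticket.tier < top_tier:
        ticket.tier += 1
        ticket.entered_tier_at = now  # restart the clock for the new tier
        return True
    return False
```

Restarting the clock on escalation gives each tier its own full response window; an alternative design would measure every limit from the time the ticket was first opened.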
Select Best Practice 3: Single point of contact - All technical support or help desk implementations should have a single point of contact. The users are those being served and should not have to expend additional effort to report problems and faults. Simplifying the reporting process helps ensure rapid action.
Select Best Practice 4: Designing to support an enterprise model - A single consolidated Tier-1 help desk supports an enterprise model. A consolidated help desk does not have to be physically located in one place. However, it should have one constituency, one phone number, one set of procedures, one set of defined services, and one set of integrated network systems management (NSM) platforms and applications. It should take advantage of advanced technology tools to help improve responsiveness and results in user support.
Select Best Practice 5: Define Metrics - Reliable metrics and reports should be defined and used to assist managers, help desk staff, and the client community to assess the effectiveness of the help desk in meeting organizational goals. Both consolidated high-level and low-level detailed measures are critical to successful service desk operations. Metrics should be used to identify trends and to support a proactive management approach that anticipates and avoids problems. Methods and procedures to solve problems should be developed, published, followed and measured. Service level agreements (SLAs) should be developed stating responsibilities of both the help desk and its clients. SLA criteria are one method to evaluate help desk performance.
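One way to turn SLA criteria into a measurable metric is to compute the share of requests resolved within a target time and then track how that share changes between reporting periods. The Python sketch below assumes a single resolution-time target and weekly reporting; the function names and sample figures are hypothetical.

```python
from datetime import timedelta


def sla_compliance(resolution_times, target):
    """Return the fraction of requests resolved within the target time.

    resolution_times: list of timedelta values, one per closed request.
    Returns None when there is nothing to measure.
    """
    if not resolution_times:
        return None
    met = sum(1 for elapsed in resolution_times if elapsed <= target)
    return met / len(resolution_times)


def period_trend(rates):
    """Return the change in compliance between consecutive periods.

    Negative values indicate declining performance and can prompt
    action before an SLA is actually breached.
    """
    return [round(later - earlier, 4) for earlier, later in zip(rates, rates[1:])]


# Hypothetical sample: four closed requests measured against a 4-hour target.
sample = [timedelta(hours=1), timedelta(hours=3), timedelta(hours=5), timedelta(hours=2)]
compliance = sla_compliance(sample, timedelta(hours=4))
```

The compliance figure is a consolidated high-level measure, while the per-request resolution times it is built from remain available as the low-level detail.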
Select Best Practice 6: Automated Report Card/Dashboard - To track performance and/or tuning results, interface all areas toward a central repository and owner, with appropriate authority granted to the owner. Outline a common report card for monitoring performance results, and provide online via Web-services. Determine the method of tracking a “dashboard” for current values and issues within the system. This can coincide with the report card but should be managed and implemented differently due to the timeliness of data from these mechanisms. Publish SLAs for both expected and actual results, and educate the staff on their meanings.
Select Best Practice 7: Design to Share Information - Geographically dispersed help desk units must inter-operate and share information. All requests for service should reside in a database that is shared by technology and application-based help desk units serving specific constituencies throughout the organization. This process shares information and makes it possible for one help desk to electronically pass a service request to another help desk without forcing the user to make another contact attempt. The use of technological advances, such as distributed processing, dynamic control of users' desktops, improved telephony, and client support software, makes it possible for geographically dispersed help desk groups to function as a cohesive support unit.
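A shared request database with electronic hand-off between help desks might look like the following minimal sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table layout, column names and help desk names are assumptions made for illustration; a production system would use a shared server database rather than SQLite.

```python
import sqlite3


def open_request_db(path=":memory:"):
    """Open (and if needed create) the shared service-request table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        " id INTEGER PRIMARY KEY,"
        " requester TEXT NOT NULL,"
        " summary TEXT NOT NULL,"
        " help_desk TEXT NOT NULL)"
    )
    return conn


def log_request(conn, requester, summary, help_desk):
    """Record a new request against the help desk that received it."""
    cur = conn.execute(
        "INSERT INTO requests (requester, summary, help_desk) VALUES (?, ?, ?)",
        (requester, summary, help_desk),
    )
    conn.commit()
    return cur.lastrowid


def transfer_request(conn, request_id, new_help_desk):
    """Pass a request to another help desk; the user does not re-submit it."""
    cur = conn.execute(
        "UPDATE requests SET help_desk = ? WHERE id = ?",
        (new_help_desk, request_id),
    )
    conn.commit()
    return cur.rowcount == 1


def queue_for(conn, help_desk):
    """List the requests currently assigned to one help desk."""
    rows = conn.execute(
        "SELECT id, requester, summary FROM requests WHERE help_desk = ? ORDER BY id",
        (help_desk,),
    )
    return rows.fetchall()
```

Because every unit reads and writes the same table, a transfer is a single update to the `help_desk` column and the original request record travels with it.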
Select Best Practice 8: Configure for Remote Management and Support - Equipment deployed in virtual data centers must be configured to facilitate remote management and support. Identical configurations of rack-mounted servers are placed in secure locations (closets). For reliability and ease of support, each major application should be placed on a uniformly configured server. This may require that each major application be implemented on its own server. Use the same reference configuration on these servers. Important items to consider when planning for consistency include using the same versions of network software, using the same network hardware cards, etc. Systems management tools, consistently applied, allow management of multiple instances of the identical network configurations at remote sites as if they were on the data center floor.
Select Best Practice 9: Manage the Life Cycle of Software/Applications - Leveraging common life cycle management techniques will help reduce costs in software and replacement, along with the effort required to implement new applications and services. Develop a single standard installation process for workstations and servers, providing a common platform and standards to use for application installs and upgrades. Develop a standard process for certifying products and upgrades, and use this in testing and certifying new applications, hardware components, etc. This will allow for verification of these products as new operating systems are implemented. Develop a joint business/IT process for exceptions-escalation that is driven by business needs and integrated into the acquisition process.
Select Best Practice 10: Structure for Audit and Policy Management - Systems and tools selected for distributed systems should be deployed with tools for auditing and managing storage processes. This would include the following areas: auditing and reporting on space usage, aging of data/files, and ownership of data; trend reporting for space usage to plan ahead for future needs.
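The audit areas named above (space usage, aging of files, and ownership) can be illustrated with a small Python sketch built on the standard library. The function name `audit_storage` and the one-year staleness threshold are assumptions; ownership is reported as the numeric user ID from `os.stat`, which is meaningful on POSIX systems but not on Windows. Trend reporting is not covered here; it would require storing the results of repeated runs.

```python
import os
import time
from collections import defaultdict


def audit_storage(root, stale_after_days=365, now=None):
    """Walk a directory tree and summarize space usage and file aging.

    Returns (bytes_by_owner, stale_files):
      bytes_by_owner maps the numeric owner ID (st_uid) to total bytes owned;
      stale_files lists paths not modified within stale_after_days.
    The 365-day default is an illustrative threshold, not a policy value.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    bytes_by_owner = defaultdict(int)
    stale_files = []

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            bytes_by_owner[info.st_uid] += info.st_size
            if info.st_mtime < cutoff:
                stale_files.append(path)

    return dict(bytes_by_owner), sorted(stale_files)
```

Running the function on a schedule and saving each result would provide the raw data for space-usage trend reports.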
Select Best Practice 11: Develop and Maintain Simple Designs - An attempt at tuning may generate worse results when engineers try to make changes that are unproven. The focus of IT should be to minimize changes from the industry defaults/standards unless such change is warranted and proven as appropriate. Changes in defaults are often of little value and may result in future work to keep the standard in place. Maintain default values unless a change has been tested and proven, or is recommended by appropriate resources at the product vendor. Document changes from defaults and the reasoning for the changes in a Change Log Manual. Review changes semi-annually and determine if additional issues should be reviewed or updated.
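Documenting changes from defaults is easier when the deviations can be listed mechanically. The following Python sketch compares a current configuration with the vendor defaults and returns the differences that would need a change log entry. The setting names and values are hypothetical, as is the `"<unset>"` marker used for settings present on only one side.

```python
UNSET = "<unset>"  # marker for a setting that exists on only one side


def changes_from_defaults(defaults, current):
    """List every setting whose current value differs from the default.

    Returns (setting, default_value, current_value) tuples sorted by
    setting name. Settings missing from either side are reported with
    the UNSET marker so that additions and removals are also logged.
    """
    changes = []
    for key in sorted(set(defaults) | set(current)):
        default_value = defaults.get(key, UNSET)
        current_value = current.get(key, UNSET)
        if default_value != current_value:
            changes.append((key, default_value, current_value))
    return changes


# Hypothetical settings, for illustration only.
vendor_defaults = {"mtu": 1500, "retries": 3}
running_config = {"mtu": 9000, "retries": 3, "log_level": "debug"}
deviations = changes_from_defaults(vendor_defaults, running_config)
```

Each tuple in `deviations` still needs a human-written justification in the change log; the sketch only identifies what has to be justified at the semi-annual review.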
Select Best Practice 12: Communicate the Maintenance Schedule - All systems require periodic maintenance, enhancements and changes. Distributed systems also require consistent maintenance, which involves scheduled downtime. The schedule should be communicated to the user community and followed to ensure maximum availability for users and mobile staff. Outline the expectations and schedule for maintenance and communicate them to the users. Provide a change management process for testing and implementing maintenance that does not impact the users negatively and reduces the risk of a poorly implemented change. Ensure all phases of the maintenance schedule are communicated, including completion, to ensure closure so that users can plan work accordingly.
The quality of the Interior-wide guidance provided within this TRM chapter is a reflection of the efforts of the Distributed Systems Management Architecture team. The members of the team are:
Organization: Name
Office of the Special Trustee: Frank Olguin
Office of Surface Mining: Ron Bryan
Minerals Management Service: Greg Mormile
National Park Service: John Snyder
Bureau of Reclamation: Kevin Kelly
Fish and Wildlife Service: Rhoda Upshur-Dunn
Bureau of Land Management: David Pearson
Bureau of Land Management: Bruce Allen
US Geological Survey: Bill Reilly