U.S. Department of the Interior

 

 

Interior Enterprise Architecture

 

 

 

 

Chapter 7

Distributed Systems Management Architecture

Version 2.0

 

 

 


 

 

October 15, 2003



7.1       Introduction and Background

 

The focus of the Interior Enterprise Architecture is on providing guidance for information technology (IT) issues and initiatives that are Interior-wide or multi-bureau in scope. The Distributed Systems Management (DSM) architecture defines how the hardware and software components of the environment will be controlled.  Perhaps more than any other domain, the success of distributed systems management depends upon comprehensive governance policies, procedures and processes being in place and enforced.

 

If used correctly, the Interior Enterprise Architecture will act as a catalyst for those looking to capitalize on its contents and better understand the full meaning of its guidance. This understanding will permit IT personnel to better engage the non-IT organization in discussions around tradeoffs and priorities within the proper governance structure (e.g., Management Improvement Team (MIT), Information Technology Management Committee). The Interior Enterprise Architecture is not intended to be the “last word” (e.g., some automated checklist for product selection). It is intended to be one of the “first words” to assure that Interior’s mission priorities and its IT priorities remain closely aligned.

 

Because Interior is incorporating the OMB’s Federal Enterprise Architecture (FEA) models, the technical guidance provided by the subject area experts within a domain spans both the Service Component Reference Model (SRM) and the Technical Reference Model (TRM). For the Distributed Systems Management domain, the SRM elements are as follows:

 

Service Domain(s):    The Back Office Services Domain defines the set of capabilities that support the management of enterprise planning and transactional-based functions.

 

The Customer Services Domain defines the set of capabilities that are directly related to an internal or external customer, the business’ interaction with the customer, and the customer driven activities or functions. The Customer Services domain represents those capabilities and services that are at the front end of a business, and interface at varying levels with the customer.

 

The Support Services Domain defines the set of cross-functional capabilities that can be leveraged independent of Service Domain objective and / or mission.

 

The Business Management Services Domain defines the set of capabilities that support the management of business functions and organizational activities that maintain continuity across the business and value chain participants. The Business Management Services domain represents those capabilities and services that are necessary for projects, programs and planning within a business operation to successfully be managed.

 

Service Type(s):         Assets / Materials Management – defines the set of capabilities that support the acquisition, oversight and tracking of an organization's assets.

 

Customer Relationship Management - defines the set of capabilities that are used to plan, schedule and control the activities between the customer and the enterprise both before and after a product or service is offered.

 

Communication - defines the set of capabilities that support the transmission of data, messages and information in multiple formats and protocols.

 

Customer Initiated Assistance - defines the set of capabilities that allow customers to proactively seek assistance and service from an organization.

 

Systems Management – defines the set of capabilities that support the administration and upkeep of an organization’s technology assets, including the hardware, software, infrastructure, licenses and components that comprise those assets.

 

Organizational Management – defines the set of capabilities that support both collaboration and communication within an organization.

 

Security Management – defines the set of capabilities that support the protection of an organization's hardware/software and related assets.

                                   

Component(s):            Asset Cataloging / Identification – defines the set of capabilities that support the listing and specification of available assets.

 

Asset Transfer, Allocation, and Maintenance – defines the set of capabilities that support the movement, assignment, and replacement of assets.

 

Computers / Automation Management – defines the set of capabilities that support the identification, upgrade, allocation and replacement of physical devices, including servers and desktops, used to facilitate production and process-driven activities.

 

Facilities Management – defines the set of capabilities that support the construction, management and maintenance of facilities for an organization.

 

Property / Asset Management – defines the set of capabilities that support the identification, planning and allocation of an organization's physical capital and resources.

 

Call Center Management - defines the set of capabilities that handle telephone sales and/or service to the end customer.

 

Contact Management – defines the set of capabilities that keep track of people and the related activities of an organization.

 

Customer Analytics - defines the set of capabilities that allow for the analysis of an organization's customers as well as the scoring of third party information as it relates to an organization’s customers.

 

Customer / Account Management – defines the set of capabilities that support the retention and delivery of a service or product to an organization's clients.

 

Customer Feedback – defines the set of capabilities that are used to collect, analyze and handle comments and feedback from an organization's customers.

 

Event / News Management – defines the set of capabilities that monitor servers, workstations and network devices for routine and non-routine events.

 

Assistance Request - defines the set of capabilities that support the solicitation of support from a customer.

 

Online Help – defines the set of capabilities that provide an electronic interface to customer assistance.

 

Self-Service – defines the set of capabilities that allow an organization's customers to sign up for a particular service at their own initiative.

 

Change Management – defines the set of capabilities that control the process for updates or modifications to the existing documents, software or business processes of an organization.

 

Configuration Management – defines the set of capabilities that control the hardware and software environments, as well as documents of an organization.

 

License Management – defines the set of capabilities that support the purchase, upgrade and tracking of legal usage contracts for system software and applications.

 

Remote Systems Control – defines the set of capabilities that support the monitoring, administration and usage of applications and enterprise systems from locations outside of the immediate system environment.

 

Software Distribution – defines the set of capabilities that support the propagation, installation and upgrade of written computer programs, applications and components.

 

System Resource Monitoring – defines the set of capabilities that support the balance and allocation of memory, usage, disk space and performance on computers and their applications.

 

Network Management - defines the set of capabilities involved in monitoring and maintaining a communications network in order to diagnose problems, gather statistics and provide general usage.

 

Role / Privilege Management - defines the set of capabilities that support the granting of abilities to users or groups of users of a computer, application or network.

 

User Management – defines the set of capabilities that support the administration of computer, application and network accounts within an organization.

 

These SRM service elements are likewise supported by Interior’s IT (technical) infrastructure (e.g., servers, networks). Within this infrastructure are individual TRM components for which this domain team is providing guidance. The graphic below outlines those TRM elements for this domain that support the service needs of the SRM.

[Image 004: graphic of the TRM elements for this domain]

Additionally, it’s doubtful that a single domain chapter from the TRM can be used to address a substantive issue.  More realistically, a few architecture domains may need to be reviewed when addressing an important IT decision.  For example, if Interior were considering the creation of a new Interior-wide Web application that could be used both by the general public and Interior personnel, then TRM chapters such as Data Management Technologies, Information Security, Distributed Systems Management and Application Development might all need to be reviewed.

 

7.2       Architectural Principles

 

The principles listed below provide guidance for the design and selection of technology components that will support the distributed systems management needs of Interior-wide IT initiatives.

 

Principle 1: Provide Reliable Metrics

 

Select appropriate tools to provide reliable metrics information and reports for proactive distributed systems management.

 

Rationale:

  • Creates reliable metrics and reports to measure success, provide feedback and enable future planning.

 

Implications

  1. Need to develop and follow an appropriate use policy for using measurement tools.
  2. Need to understand and follow the laws and regulations governing monitoring.
  3. Need to develop metrics definitions and select tools that support those definitions.
  4. Need to ensure that the “overhead” of management tools is not so intrusive that it outweighs the value they provide.
  5. Need to overcome resistance to having machines touched by monitoring tools; need management buy-in.
  6. Need appropriate training for tools to understand and utilize full capabilities.

 

 

Principle 2:     Maintain Network Interoperability

 

Use network management, systems management and performance monitoring tools to maintain the interoperability of the network.

 

Rationale:

  • Enhances sharing of data and information.

 

  • Enables near-real-time fault identification.

 

Implications

  1. Will drive Interior to standardized network architectures.
  2. Requires changes to the network environment (money and resources).
  3. Enables optimization of IT resources.

 

 

Principle 3:      Support Business Continuity

 

Use distributed systems management tools to support business continuity planning and operations.

 

Rationale:

  • Contributes to capturing total cost of ownership information and utilization.

 

  • Enables assessment and development of continuity of operations and disaster recovery plans; thereby, enabling better ability to recover from a disaster.

 

  • Allows the Interior to better recognize failure by establishing baselines and monitoring the IT infrastructure.

 

  • Reduces the downtime resulting from failures.

 

  • Supports compliance with Office of Management and Budget (OMB) A-130, “Management of Federal Information Resources.”

 

  • Enhances public trust.

 

  • Enables future planning.

 

Implications

  1. Need identified owners responsible for establishing and maintaining the baseline.
  2. Need to develop an understanding of cost/risk relationship; costs will increase to mitigate the risk of disaster.
  3. Need an agreed upon process/approach to conducting and maintaining the baseline.
  4. Requires a cultural change for IT staff to report changes in the IT environment.
  5. Better baseline information enables better planning and decision-making.
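
The baseline idea in this principle can be illustrated with a short sketch. It assumes a hypothetical inventory format (host name mapped to a recorded configuration); the function name `diff_against_baseline` and the sample hosts are illustrative only and are not part of the Interior architecture. The sketch compares a current snapshot against the agreed baseline and reports what was added, removed or changed.

```python
def diff_against_baseline(baseline, current):
    """Return hosts added, removed, or changed relative to the baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        host for host in set(baseline) & set(current)
        if baseline[host] != current[host]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical inventory records: host name -> recorded configuration.
baseline = {
    "srv-01": {"os": "A", "disk_gb": 80},
    "srv-02": {"os": "A", "disk_gb": 40},
}
current = {
    "srv-01": {"os": "A", "disk_gb": 120},
    "srv-03": {"os": "B", "disk_gb": 40},
}

report = diff_against_baseline(baseline, current)
```

A report of this kind gives the baseline owner a concrete list of unreported changes to follow up on.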

 

 

Principle 4:     Information Access

 

Ensure that information is stored so that it is accessible for short and long term needs.

 

Rationale:

  • Enables data reuse.

 

  • Enables data and disaster recovery.

 

  • Ensures access to current information in a format that is useful internally and externally.

 

  • Provides capacity and growth planning metrics.

 

Implications

  1. Software must be distributed that enables stewards to provide their data to users.
  2. Need replication technology as appropriate for the information.
  3. Need to follow departmental records management policies and procedures.
  4. Need to ensure that storage products are industry standard and included as part of the data or information when retired; need to identify storage media and hardware that has a significant expected longevity.
  5. Need to maintain a controlled information storage environment.
  6. Will require proper capacity planning, performance monitoring, network, and LAN/systems tools.

 

 

Principle 5: Reuse Technology Components

 

Use distributed systems management tools to determine the availability and appropriateness of reusable technology components.

 

Rationale:

  • Allows redistribution of resources.

 

  • Enables proactive planning.

 

  • Helps inform reuse, buy, or build decisions.

Implications

  1. Need to establish and maintain a baseline.
  2. Enables the appropriate allocation of budget dollars.
  3. Need to dedicate resources to actively monitor systems performance and assets.
  4. Need a formal mechanism to proactively promote technology reuse opportunities.

 

Principle 6:      Support Security, Privacy and Confidentiality

 

Select distributed systems management tools that are aligned with security, privacy and confidentiality legislation and policies.

 

Rationale:

  • Reduces the likelihood of divulging employee and customer privacy information or sensitive systems information.

  • Enhances Interior’s security posture.

  • Reduces Interior’s legal risk.

  • Enhances public trust.

 

Implications

  1. Need to be aware of unintended consequences (i.e., using certain tools increases the risk of exposing sensitive information).
  2. DSM tools and IT staff need to have a high level of authority to function; therefore, the IT staff needs higher security awareness and accountability.
  3. Need to know which security, privacy and confidentiality legislation and policies are in place.
  4. IT staff needs appropriate training on DSM tools; users and managers need to be informed about the purpose, appropriate use, functionality, capabilities and limitations of DSM.
  5. DSM tool usage needs to be limited to the appropriate operational levels.

 

7.3       Technology Components

 

The Distributed Systems Management components in this domain include:

·        Authentication / Single Sign-on (SSO) – Refers to a method that provides users with the ability to log in one time, getting authenticated access to all their applications and resources.

·        Supporting Network Services - These consist of the protocols that define the format and structure of data and information that is either accessed from a directory or exchanged through communications.

·        Deployment Management – Refers to the capability of software delivery to remote networked desktops, servers, and mobile devices across an enterprise.

·        Other Applications – Refers to software applications that do not fit in any of the other aforementioned software categories.  Due to the nature of distributed systems management, applications in this category are differentiated by the TRM Sub-component.

 

The classifications for any products or standards within this domain are:

 

Life Cycle Classification     Definition/Meaning

 

Preferred                      Product/standard of choice; support available; recommended.

                       

Contained                     Develop solutions using these standards or products only if there are no suitable alternatives categorized as preferred; if a preferred product is available that will meet the requirements, plans should be developed to move from contained to preferred as soon as practical.

 

Obsolete                      Being phased out; (e.g., vendor support ending); plans should be developed to rapidly phase out and replace (often to avoid substantial risks).

                                               

Research                      Product/standard to be used in conjunction with technology research efforts only (e.g., testing, pilots).

                       

Rejected                       Product/standard has been evaluated and found not to meet technical architecture needs.
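
The classification definitions above can be read as a simple decision procedure. The following is a minimal sketch, assuming a hypothetical in-memory catalog for one product category; the names `CATALOG` and `allowed_for_new_solution` are illustrative and not part of the Interior architecture. The sample entries are taken from the Software Distribution list in Section 7.3.4. As a simplifying assumption, a Preferred product in the same catalog is treated as a "suitable alternative"; in practice that judgment depends on the requirements.

```python
from enum import Enum


class LifeCycle(Enum):
    PREFERRED = "Preferred"
    CONTAINED = "Contained"
    OBSOLETE = "Obsolete"
    RESEARCH = "Research"
    REJECTED = "Rejected"


# Hypothetical catalog for one category (Software Distribution entries).
CATALOG = {
    "Microsoft Systems Management Server (SMS)": LifeCycle.PREFERRED,
    "Novell Zenworks": LifeCycle.PREFERRED,
    "Network Associates ZAC": LifeCycle.CONTAINED,
    "Tivoli Tivoli Suite": LifeCycle.CONTAINED,
    "Intel LanDesk": LifeCycle.CONTAINED,
}


def allowed_for_new_solution(product, catalog, research_effort=False):
    """Apply the life cycle definitions to one product in a category."""
    status = catalog[product]
    if status is LifeCycle.PREFERRED:
        return True
    if status is LifeCycle.CONTAINED:
        # Assumption: any Preferred entry counts as a suitable alternative.
        return not any(s is LifeCycle.PREFERRED for s in catalog.values())
    if status is LifeCycle.RESEARCH:
        # Research items are limited to testing and pilot efforts.
        return research_effort
    return False  # Obsolete or Rejected
```

For example, `allowed_for_new_solution("Intel LanDesk", CATALOG)` evaluates to `False` because Preferred products exist in the same category.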

 

7.3.1    Authentication / Single Sign-on (SSO)

 

Authentication / Single Sign-on (SSO) refers to a method that provides users with the ability to log in one time, getting authenticated access to all their applications and resources.

 

  • Use of Novell eDirectory (Version 8.7+) is classified as Preferred

 

7.3.2    Supporting Network Services

 

Supporting Network Services consist of the protocols that define the format and structure of data and information that is either accessed from a directory or exchanged through communications.

 

Standards:

 

  • Use of LDAP is classified as Preferred

 

  • Use of Simple Network Management Protocol (SNMP) (Version 1 or 3) is classified as Preferred

 

  • Use of Management Information Base (MIB) (Version II) is classified as Preferred

 

Products:

 

  • Use of Novell eDirectory (Version 8.7+) is classified as Preferred

 

  • Use of Microsoft Active Directory is classified as Preferred

 

7.3.3    Deployment Management

 

Deployment Management refers to the capability of software delivery to remote networked desktops, servers, and mobile devices across an enterprise. Deployment automation tools provide centralized and accelerated delivery of applications to users via push technologies, eliminating the need for manual installation and configuration.

 

  • Use of Microsoft Systems Management Server (SMS) is classified as Preferred

 

  • Use of Novell Zenworks is classified as Preferred

 

7.3.4    Other Applications

 

Other Applications refers to software products that do not fit in any of the other aforementioned software categories but also are used in conjunction with data management processes.  Applications in this category perform a wide range of distributed systems management functions and are represented by TRM Service Sub-standard.

 

Network Element Manager:

  • Use of Cisco Campus Manager is classified as Preferred
  • Use of Cisco VPN Management Solutions is classified as Preferred
  • Use of Cisco Works is classified as Contained
  • Use of Fore Systems Foreview is classified as Contained
  • Use of Nortel Networks Optivity is classified as Contained
  • Use of Prism TechSpectrum View is classified as Contained
  • Use of Express Software Manager is classified as Contained
  • Use of Internet Performance Manager is classified as Contained
  • Use of NetManager is classified as Contained
  • Use of Legato NetWorker is classified as Contained
  • Use of NetViz is classified as Research

LAN and System Element Manager: (See Network Element Manager above for the preferred tool.)

  • Use of Network Associates ZAC is classified as Contained
  • Use of Tivoli Tivoli Suite is classified as Contained
  • Use of Intel LanDesk is classified as Contained
  • Use of LANview is classified as Contained
  • Use of Microsoft LAN Workplace is classified as Contained

Software Distribution:

  • Use of Microsoft Systems Management Server (SMS) is classified as Preferred
  • Use of Novell Zenworks is classified as Preferred
  • Use of Network Associates ZAC is classified as Contained
  • Use of Tivoli Tivoli Suite is classified as Contained
  • Use of Intel LanDesk is classified as Contained

Asset Management:

  • Use of Cisco Cisco Works 2000 is classified as Preferred
  • Use of Microsoft Systems Management Server (SMS) is classified as Preferred
  • Use of Sun StorEdge Volume Manager Server Administration is classified as Preferred
  • Use of Novell Zenworks is classified as Preferred
  • Use of Computer Associates Unicenter is classified as Contained
  • Use of Tivoli Tivoli Suite is classified as Contained
  • Use of Network Associates ZAC is classified as Contained
  • Use of WRQ Express Meter (16-bit) is classified as Contained
  • Use of WRQ Express Software Manager Client (16-bit) is classified as Contained
  • Use of Zero Administration Kit for Windows NT is classified as Contained
  • Use of NetCensus is classified as Obsolete
  • Use of Powerquest BootMagic is classified as Research

Help Desk:

  • Use of Front Range HEAT is classified as Preferred
  • Use of Magic Magic Total Service is classified as Preferred
  • Use of Remedy Action Request System is classified as Contained
  • Use of Blue Ocean Track IT is classified as Contained
  • Use of Clarify Clarify Suite is classified as Contained
  • Use of Siebel Siebel Suite is classified as Contained
  • Use of Vantive Helpdesk is classified as Contained

Performance Management:

  • Use of Cisco Cisco Works 2000 is classified as Preferred
  • Use of Microsoft Microsoft Operations Manager is classified as Preferred
  • Use of Novell Zenworks is classified as Preferred
  • Use of Concord eHealth is classified as Contained
  • Use of ISS Real Secure is classified as Contained
  • Use of Lucent E-Pro is classified as Contained
  • Use of Tivoli Tivoli Suite is classified as Contained
  • Use of Multi Router Traffic Grapher (MRTG) is classified as Contained
  • Use of Snort ACID is classified as Contained
  • Use of Network Forensics is classified as Research

Capacity Planning:

  • Use of Cisco Cisco Works 2000 is classified as Preferred
  • Use of Concord eHealth is classified as Contained
  • Use of Lucent E-Pro is classified as Contained
  • Use of Net Realty Wise LAN is classified as Contained
  • Use of Packeteer PacketShaper is classified as Contained

Change Control:

  • Use of Novell Zenworks is classified as Preferred
  • Use of Tivoli Tivoli Suite is classified as Contained

Backup & Recovery:

  • Use of Sun StorEdge Enterprise NetBackup/HSM Media Manager is classified as Preferred
  • Use of SUN Wnetbp Sun StorEdge Enterprise NetBackup is classified as Preferred
  • Use of Veritas VERITAS Suite is classified as Preferred
  • Use of Sun StorEdge Enterprise NetBackup is classified as Preferred
  • Use of Microsoft BackOffice Tools is classified as Preferred
  • Use of Microsoft Backup is classified as Contained
  • Use of NovaBack+ for Windows 95/NT QIC is classified as Contained
  • Use of Backup Agent for MS Exchange Server is classified as Contained
  • Use of Legato Storage Manager Client is classified as Contained
  • Use of Computer Associates ArcServe / Backup software is classified as Contained

Event Fault Manager:

  • Use of Cabletron Spectrum is classified as Contained
  • Use of Computer Associates Unicenter is classified as Contained
  • Use of Hewlett Packard Openview is classified as Contained
  • Use of Tivoli Netview is classified as Contained
  • Use of Veritas Nerve Center is classified as Contained
  • Use of SUN Net Manager is classified as Contained

Miscellaneous:

  • Use of Netview is classified as Contained
  • Use of webHancer Customer Companion is classified as Contained
  • Use of Wise InstallMaster is classified as Contained
  • Use of Xteq X-Setup is classified as Rejected

7.4       Select Best Practices

 

The Domain Principles, because they are derived from Interior's business direction and strategies, provide the primary direction and guidance around technology decisions within this domain.  Additional benefit may sometimes be obtained by reviewing Select Best Practices. These reflect the valuable insights from either domain team members’ experiences or other public sector organizations.

 

SRM Focus

 

Select

Best Practice 1:          Resolution databases- Resolution databases that contain solutions to recurring problems should be built to improve quality and contain costs. The effort to resolve recurring problems is significantly reduced. Education of new personnel is improved because a knowledge base is developed and available to more quickly resolve new problems with similarities to those previously encountered.

Select

Best Practice 2:          Tiered support- Multiple tiers or levels of client support should be employed to leverage support resources and provide effective client support.  Front-line support staff can handle most problems, while more difficult problems will need quick escalation to additional levels of expertise. A tiered system with defined response times uses limited talent most effectively.
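
Escalation against defined response times can be sketched as follows. This is an illustration only: the tier numbers, the response targets in `TIER_RESPONSE_MINUTES`, and the `Ticket` structure are hypothetical values chosen for the example, not Interior policy.

```python
from dataclasses import dataclass

# Hypothetical response-time targets, in minutes, for each support tier.
TIER_RESPONSE_MINUTES = {1: 30, 2: 120, 3: 480}


@dataclass
class Ticket:
    ticket_id: str
    tier: int = 1
    minutes_open: int = 0


def escalate_if_overdue(ticket):
    """Move a ticket up one tier when it exceeds its tier's response target."""
    limit = TIER_RESPONSE_MINUTES[ticket.tier]
    top_tier = max(TIER_RESPONSE_MINUTES)
    if ticket.minutes_open > limit and ticket.tier < top_tier:
        ticket.tier += 1
    return ticket
```

A ticket that has been open 45 minutes at tier 1 would move to tier 2 under these sample targets, while a ticket already at the top tier stays where it is.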

Select 

Best Practice 3:          Single point of contact- All technical support or help desk implementations should have a single point of contact.   The users are those being served and should not have to expend additional effort to report problems and faults.  Simplifying the reporting process helps ensure rapid action.

Select 

Best Practice 4:          Designing to support an enterprise model- A single consolidated Tier-1 help desk supports an enterprise model.  A consolidated help desk does not have to be physically located in one place. However, it should have one constituency, one phone number, one set of procedures, one set of defined services, and one set of integrated network systems management (NSM) platforms and applications.  It should take advantage of advanced technology tools to help improve responsiveness and results in user support.

Select 

Best Practice 5:          Define Metrics- Reliable metrics and reports should be defined and used to assist managers, help desk staff, and the client community to assess the effectiveness of the help desk in meeting organizational goals.  Both consolidated high-level and detailed low-level measures are critical to successful service desk operations.   Metrics should be used to identify trends and to support a proactive management approach that anticipates and avoids problems.  Methods and procedures to solve problems should be developed, published, followed and measured.   Service level agreements (SLAs) should be developed stating responsibilities of both the help desk and its clients. SLA criteria are one method to evaluate help desk performance.
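
One way to turn an SLA criterion into a metric is to compute the share of tickets resolved within a target time and then track it by month. The sketch below assumes resolution times are available in minutes and keyed by month; the function names and the data layout are hypothetical.

```python
def sla_compliance(resolution_minutes, target_minutes):
    """Fraction of tickets resolved within the SLA target (0.0 if no tickets)."""
    if not resolution_minutes:
        return 0.0
    met = sum(1 for minutes in resolution_minutes if minutes <= target_minutes)
    return met / len(resolution_minutes)


def monthly_trend(by_month, target_minutes):
    """Compliance per month, in sorted month order, to expose trends."""
    return {
        month: sla_compliance(times, target_minutes)
        for month, times in sorted(by_month.items())
    }
```

The per-month values give the consolidated high-level view, while the underlying resolution times remain available as the detailed measure.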

Select 

Best Practice 6:          Automated Report Card/Dashboard- To track performance and/or tuning results, interface all areas toward a central repository and owner, with appropriate authority granted to the owner.  Outline a common report card for monitoring performance results, and provide it online via Web services.  Determine the method of tracking a “dashboard” for current values and issues within the system. This can coincide with the report card but should be managed and implemented differently due to the timeliness of data from these mechanisms.  Publish SLAs for both expected and actual results, and educate the staff on their meanings.

Select 

Best Practice 7:          Design to Share Information- Geographically dispersed help desk units must inter-operate and share information.  All requests for service should reside in a database that is shared by technology and application-based help desk units serving specific constituencies throughout the organization. This process shares information and makes it possible for one help desk to electronically pass a service request to another help desk without forcing the user to make another contact attempt. The use of technological advances, such as distributed processing, dynamic control of users' desktops, improved telephony, and client support software, makes it possible for geographically dispersed help desk groups to function as a cohesive support unit.

Select

Best Practice 8:          Configure for Remote Management and Support- Equipment deployed in virtual data centers must be configured to facilitate remote management and support.  Identical configurations of rack-mounted servers are placed in secure locations (closets).  For reliability and ease of support, each major application should be placed on a uniformly configured server. This may require that each major application be implemented on its own server. Use the same reference configuration on these servers. Important items to consider when planning for consistency include using the same versions of network software, using the same network hardware cards, etc.  Systems management tools, consistently applied, allow management of multiple instances of the identical network configurations at remote sites as if they were on the data center floor.

 

Select 

Best Practice 9:          Manage the Life Cycle of Software/Applications- Leveraging common life cycle management techniques will help reduce costs in software and replacement, along with the effort required to implement new applications and services.  Develop a single standard installation process for workstations and servers, providing a common platform and standards to use for application installs and upgrades.   Develop a standard process for certifying products and upgrades, and use this in testing and certifying new applications, hardware components, etc. This will allow for verification of these products as new operating systems are implemented. Develop a joint business/IT process for exceptions-escalation that is business needs driven and integrated into the acquisition process.

 

Select 

Best Practice 10:        Structure for Audit and Policy Management- Systems and tools selected for distributed systems should be deployed with tools for auditing and managing storage processes. This would include the following areas:  auditing and reporting on space usage, aging of data/files, and ownership of data; trend reporting for space usage to plan ahead for future needs.
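
The audit areas listed above (space usage, aging of files, ownership) can be illustrated with a short sketch built on the Python standard library. It is not a substitute for a storage management product; the function name `audit_storage` and the one-year staleness threshold are assumptions made for the example. Ownership is reported as the numeric owner id from `os.stat`, which is most meaningful on Unix-like systems.

```python
import os
import time
from collections import defaultdict


def audit_storage(root, stale_days=365, now=None):
    """Summarize bytes used per owner id and list files not modified recently."""
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400
    bytes_by_owner = defaultdict(int)
    stale_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            bytes_by_owner[info.st_uid] += info.st_size
            if info.st_mtime < cutoff:
                stale_files.append(path)
    return dict(bytes_by_owner), sorted(stale_files)
```

Saving the per-owner totals from periodic runs would provide the raw data for the trend reporting mentioned above.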

Select 

Best Practice 11:        Develop and Maintain Simple Designs- An attempt at tuning may generate worse results when engineers try to make changes that are unproven. The focus of IT should be to minimize changes from the industry defaults/standards unless such change is warranted and proven as appropriate. Changes in defaults are often of little value and may result in future work to keep the standard in place.  Maintain default values unless a change is tested and proven, or instructed by appropriate resources within the product vendor.  Document changes from defaults and reasoning for changes in a Change Log Manual.  Review changes semi-annually and determine if additional issues should be reviewed or updated.
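
Documenting deviations from defaults can be partly automated. The sketch below assumes settings are available as simple key/value mappings; the function name `changes_from_defaults` and the entry layout are hypothetical. It lists every setting whose current value differs from the vendor default so that each one can be recorded, with its reasoning, in the change log.

```python
def changes_from_defaults(defaults, current):
    """List settings whose current value differs from the vendor default."""
    entries = []
    for key in sorted(set(defaults) | set(current)):
        default = defaults.get(key)
        value = current.get(key)
        if default != value:
            entries.append({"setting": key, "default": default, "current": value})
    return entries
```

Running the comparison before each semi-annual review gives reviewers a current list of deviations to confirm or roll back.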

Select 

Best Practice 12:        Communicate the Maintenance Schedule- All systems require periodic maintenance, enhancements and changes. Distributed systems also require consistent maintenance, which involves scheduled downtime. This downtime should be communicated to the user community and followed appropriately to ensure the maximum availability for users and mobile staff.  Outline the expectations and schedule for maintenance and communicate them to the users.   Provide a change management process for testing and implementing maintenance that does not impact the users negatively and reduces the risk of a poorly implemented change.  Ensure all phases of the maintenance schedule are communicated, including completion, to ensure closure and so that users can plan work accordingly.


7.5       Contributors

 

The quality of the Interior-wide guidance provided within this TRM chapter is a reflection of the efforts of the Distributed Systems Management Architecture team. The members of the team are:

 

Organization                                         Name

                                   

Office of the Special Trustee                 Frank Olguin

 

Office of Surface Mining                       Ron Bryan

 

Minerals Management Service                   Greg Mormile

 

National Park Service                           John Snyder

 

Bureau of Reclamation                         Kevin Kelly

 

National Business Center                      Estle Lewis

 

National Business Center                      Steve Woodka

 

Fish and Wildlife Service                       Rhoda Upshur-Dunn

 

Bureau of Land Management                David Pearson

 

Bureau of Land Management                Bruce Allen

 

US Geological Survey                           Bill Reilly

