Interior
Chapter 3
Data Management Architecture
Version 2.0

CHAPTER 3. DATA MANAGEMENT ARCHITECTURE
3.1 Introduction and Background
Data is the representation of facts, concepts or instructions in a formalized manner suitable for communication, interpretation or processing. When data is combined appropriately, information is derived. Much like the natural resources it manages, Interior’s data and information are valuable assets that must be managed. The full value of data and information resources is realized when Interior is able to appropriately share that data and information internally, as well as with external partners.
The focus of the Interior Enterprise Architecture is on providing guidance for information technology (IT) issues and initiatives that are Interior-wide or multi-bureau in scope. The Data Management architecture defines the mechanisms and standards for collecting, documenting, accessing, managing, maintaining the integrity of and securing Interior’s electronic data assets.
If used correctly, the Interior Enterprise Architecture will act as a catalyst for those looking to capitalize on its contents and better understand the full meaning of its guidance. This understanding will permit IT personnel to better engage the non-IT organization in discussions around tradeoffs and priorities within the proper governance structure (e.g., Management Initiatives Team (MIT), Information Technology Management Council (ITMC)). The Interior Enterprise Architecture is not intended to be the “last word” (e.g., some automated checklist for product selection). It is intended to be one of the “first words” to assure that Interior’s mission priorities and its IT priorities remain closely aligned.
There are many instances within Interior of data sharing and reuse. Conversely, there are also many examples where data is not reused and shared enterprise-wide but is collected and duplicated in innumerable databases throughout Interior or even within a single Bureau (e.g., names, addresses, and social security numbers may be stored and maintained in every application system that needs that particular data). As a result, it is difficult to determine which database stores the most current or correct information. Storing and maintaining multiple copies of the same data throughout the enterprise is also time-consuming and expensive.
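To make the duplication problem concrete, the following minimal Python sketch compares copies of one person's record held by several application systems and reports the fields on which they disagree. The system names, field names and sample values are hypothetical and are not drawn from any Interior system.

```python
from collections import defaultdict

# Hypothetical copies of the same person's record held by three application systems.
copies = {
    "payroll": {"name": "Pat Doe", "address": "12 Elm St"},
    "permits": {"name": "Pat Doe", "address": "98 Oak Ave"},
    "training": {"name": "Pat Doe", "address": "12 Elm St"},
}


def find_conflicts(copies):
    """Return {field: {value: [systems holding that value]}} for fields whose values differ."""
    by_field = defaultdict(lambda: defaultdict(list))
    for system, record in copies.items():
        for field, value in record.items():
            by_field[field][value].append(system)
    # A field is in conflict when more than one distinct value is stored for it.
    return {
        field: dict(values)
        for field, values in by_field.items()
        if len(values) > 1
    }


conflicts = find_conflicts(copies)
print(conflicts)
```

The check can show that the copies disagree, but nothing in the records themselves indicates which address is current. That is the difficulty an authoritative source is meant to remove.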
Because Interior is incorporating the OMB’s Federal Enterprise Architecture (FEA) models, the technical guidance provided by the subject area experts within a domain spans both the Service Component Reference Model (SRM) and the Technical Reference Model (TRM). For the Data Management domain, the SRM elements are as follows:
Service Domain(s): The Back Office Services Domain defines the set of capabilities that support the management of enterprise planning and transactional-based functions.
Service Type(s): Data Management - defines the set of capabilities that support the usage, processing and general administration of unstructured information.
Development and Integration - defines the set of capabilities that support the communication between hardware/software applications and the activities associated with deployment of software applications.
Component(s): Data Classification – defines the set of capabilities that allow the classification of data.
Data Cleansing – defines the set of capabilities that support the removal of incorrect or unnecessary characters and data from a data source.
Data Exchange – defines the set of capabilities that support the interchange of information between multiple systems or applications.
Data Recovery – defines the set of capabilities that support the restoration and stabilization of data sets to a consistent, desired state.
Extraction and Transformation – defines the set of capabilities that support the manipulation and change of data.
Loading and Archiving – defines the set of capabilities that support the population of a data source with external data.
Data Mart – defines the set of capabilities that support a subset of a data warehouse for a single department or function within an organization.
Data Warehouse – defines the set of capabilities that support the archiving and storage of large volumes of data.
Data Integration - defines the set of capabilities that support the organization of data from separate data sources into a single source using middleware or application integration as well as the modification of system data models to capture new information within a single system.
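The Data Cleansing, Extraction and Transformation, and Loading components above can be pictured as stages of a simple pipeline. The Python sketch below is illustrative only. The record layout, the cleansing rule (strip surrounding whitespace and drop records with an empty name) and the in-memory list standing in for the target data source are assumptions, not Interior specifications.

```python
def cleanse(rows):
    """Data Cleansing: remove unnecessary characters and drop rows with no usable name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if name:
            cleaned.append({"name": name, "acres": row["acres"].strip()})
    return cleaned


def transform(rows):
    """Extraction and Transformation: change the data into the target representation."""
    return [{"name": r["name"].title(), "acres": float(r["acres"])} for r in rows]


def load(rows, target):
    """Loading: populate the target data source with the external data."""
    target.extend(rows)
    return target


# Hypothetical source extract with stray whitespace and one empty record.
source = [
    {"name": "  north ridge ", "acres": " 120.5"},
    {"name": "   ", "acres": "0"},
    {"name": "elk meadow", "acres": "87"},
]

warehouse = load(transform(cleanse(source)), [])
print(warehouse)
```

Keeping each stage as a separate function mirrors the way the SRM lists them as separate components, so a stage can be replaced without touching the others.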

These SRM service elements are likewise supported by Interior’s IT (technical) infrastructure (e.g., servers, networks). Within this infrastructure are individual TRM components for which this domain team is providing guidance. The graphic below outlines those TRM elements for this domain that support the service needs of the SRM.
Additionally, it is doubtful that a single domain chapter from the TRM can be used to address a substantive issue. More realistically, a few architecture domains may need to be reviewed when addressing an important IT decision. For example, if Interior were considering the creation of a new Interior-wide Web application that could be used both by the general public and Interior personnel, then TRM chapters such as Data Management Technologies, Information Security, Distributed Systems Management and Application Development might all need to be reviewed.
The principles listed below provide guidance for the design and selection of technology components that will support the data management needs of Interior-wide IT initiatives.
Principle 1: Data Sharing
Data and information must be managed to facilitate data sharing across Interior, with our partners and the public.
Rationale:
Implications:
12. Need to balance the desire to share data with sensitivity, privacy and confidentiality restrictions.
13. Need to take electronic records management requirements into consideration.
Principle 2: Data Collection and Reuse
In considering data requirements, we should look to reuse existing data before we buy. If no data exists within Interior, consider acquisition of data from external sources before collecting/creating new data.
Rationale:
Implications:
Principle 3: Data Security
Data needs to be secured according to its sensitivity.
Rationale:
Implications:
Principle 4: Data Contingency Planning
Contingency planning processes need to be in place to ensure data availability.
Rationale:
· Allows Interior to continue its mission and meet legal requirements.
Implications:
Principle 5: Data Lifecycle
Information is valued as an Interior asset; therefore, Interior data needs to be managed throughout its lifecycle.
Rationale:
Implications:
Principle 6: Data Stewardship
Data and information must be managed and maintained as a stewardship responsibility to support the mission of the department.
Rationale:
· Data stewardship promotes the establishment of authoritative sources.
· Complies with requirements of Section 515 of the Treasury and Consolidated Agency Appropriation Act.
Implications:
Principle 7: Data Standards
Interior will strive to create, acquire, and share data that adheres to data standards defined internally, with consideration to existing national standards.
Rationale:
Implications:
Principle 8: Mainstream Technologies
Data management will use industry-proven and mainstream technologies.
Rationale:
Implications:
The Data Management components in this domain include:
· Database – Refers to a collection of information organized in such a way that a computer program can quickly select desired pieces of data.
· Modeling – The process of representing entities, data, business logic, and capabilities for aiding in software engineering.
· Utilities – Refers to software tools that address various miscellaneous processes for technology applications and users.
· Other Applications – Refers to software applications that do not fit in any of the other aforementioned software categories.
· Static Display - Static Display consists of the software protocols that are used to create a pre-defined, unchanging graphical interface between the user and the software.
· Data Exchange – Data Exchange is concerned with the sending of data over a communications network and the definition of data communicated from one application to another.
· Database Connectivity - Defines the protocol or method by which an application connects to a data store or database.
· Reporting and Analysis - Consists of the tools, languages and protocols used to extract data from a data store and process it into useful information.
· Data Format / Classification – Defines the structure of a file.
· Data Types / Validation – Refers to specifications used in identifying and affirming common structures and processing rules.
· Data Transformation - Data Transformation consists of the protocols and languages that change the presentation of data within a graphical user interface or application.
· Service Discovery - Defines the method in which applications, systems or web services are registered and discovered.
The classifications for any products or standards within this domain are:
Life Cycle Classification – Definition/Meaning
· Preferred – Product/standard of choice; support available; recommended.
· Contained – Develop solutions using these standards or products only if there are no suitable alternatives categorized as preferred; if a preferred product is available that will meet the requirements, plans should be developed to move from contained to preferred as soon as practical.
· Obsolete – Being phased out (e.g., vendor support ending); plans should be developed to rapidly phase out and replace (often to avoid substantial risks).
· Research – Product/standard to be used in conjunction with technology research efforts only (e.g., testing, pilots).
· Rejected – Product/standard has been evaluated and found not to meet technical architecture needs.
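The classifications, and in particular the rule for Contained items, can be expressed as a small selection check. This is a sketch under stated assumptions: the function name, its parameters and the product names and classifications in the sample catalog are invented for illustration and do not reflect Interior's actual product list.

```python
from enum import Enum


class LifeCycle(Enum):
    PREFERRED = "Preferred"
    CONTAINED = "Contained"
    OBSOLETE = "Obsolete"
    RESEARCH = "Research"
    REJECTED = "Rejected"


def usable_for_new_solution(candidate, suitable_alternatives, research_effort=False):
    """Apply the classification meanings to one candidate product or standard.

    suitable_alternatives: classifications of other products that also meet the requirements.
    """
    if candidate is LifeCycle.PREFERRED:
        return True
    if candidate is LifeCycle.CONTAINED:
        # Contained is acceptable only when no suitable Preferred alternative exists.
        return LifeCycle.PREFERRED not in suitable_alternatives
    if candidate is LifeCycle.RESEARCH:
        # Research items are for testing and pilots only.
        return research_effort
    # Obsolete items are being phased out; Rejected items do not meet architecture needs.
    return False


# Hypothetical catalog; names and classifications are illustrative only.
catalog = {"ProductA": LifeCycle.PREFERRED, "ProductB": LifeCycle.CONTAINED}

with_alternative = usable_for_new_solution(catalog["ProductB"], [catalog["ProductA"]])
without_alternative = usable_for_new_solution(catalog["ProductB"], [])
print(with_alternative, without_alternative)
```

In this sketch the Contained product is turned down while a suitable Preferred alternative exists and accepted once none does, which follows the Contained definition above.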
Refers to a collection of information organized in such a way that a computer program can quickly select desired pieces of data. Databases organize data and information into physical structures, which are then accessed and updated through the services of a database management system. Databases may be relational, hierarchical, flat files or any other formal collection of data.
Standards:
Products:
The process of representing entities, data, business logic, and capabilities for aiding in software engineering is referred to as Modeling.
Standards:
Products:
Generally, Utilities are software tools that address various miscellaneous processes for technology applications and users. More specifically, the utilities identified in this document provide the means to manage data.
Other Applications refers to software products that do not fit in any of the other aforementioned software categories but are also used in conjunction with data management processes. Applications in this category perform a wide range of data management functions and should not be compared to each other for classification.
Static Display consists of the software protocols that are used to create a pre-defined, unchanging graphical interface between the user and the software.
Data Exchange is the format by which non-graphical data is exchanged over a communications network and the definition of data communicated from one application to another.
Database Connectivity defines the protocol or method by which an application connects to a data store or database.
Standards:
Products:
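As an illustration of a database connectivity method, the sketch below uses Python's standard DB-API through the built-in sqlite3 module. SQLite and the table and column names are stand-ins chosen so the example is self-contained; they are not Interior-designated standards or products.

```python
import sqlite3

# Connect to a data store; ":memory:" creates a temporary in-memory database.
conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Parameter placeholders (?) keep data values separate from the SQL text.
    conn.executemany(
        "INSERT INTO site (name) VALUES (?)",
        [("North Ridge",), ("Elk Meadow",)],
    )
    conn.commit()
    rows = conn.execute("SELECT id, name FROM site ORDER BY id").fetchall()
finally:
    # Always release the connection, even if a statement fails.
    conn.close()

print(rows)
```

Because the application code talks to the connectivity layer rather than to a specific engine, swapping the underlying database would mainly affect the connect call and any engine-specific SQL.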
Reporting and Analysis consists of the tools, languages and protocols used to extract data from a data store and process it into useful information.
Standards:
Products:
Data Format/Classification defines the structure of a file. There are well over 500 data formats and classifications in existence. For the purpose of this document, only preferred formats and classifications are listed.
Standards:
Products:
Data Types / Validation refers to specifications used in identifying and affirming common structures and processing rules.
Standards:
Products:
Data Transformation consists of the protocols and languages that change the presentation of data within a graphical user interface or application.
Standards:
Products:
Service Discovery defines the method in which applications, systems or web services are registered and discovered.
The Domain Principles, because they are derived from Interior's business direction and strategies, provide the primary direction and guidance around technology decisions within this domain. Additional benefit may sometimes be obtained by reviewing Select Best Practices. These reflect the valuable insights from either domain team members’ experiences or other public sector organizations.
SRM Focused
Select Best Practice 1: Classify Data Sensitivity - Establish and use a consistent process to classify the sensitivity of all data and information as a basis for ensuring the security, privacy and confidentiality of Interior's data and information assets.
Select Best Practice 2: Data Stewardship Roles - Establish an Interior data stewardship program with clearly defined roles and responsibilities.
Select Best Practice 3: Data Standards Process - Establish and follow a consistent standard and process for defining, maintaining and archiving Interior data.
Select Best Practice 4: Data Exchange Protocols - Establish a process for determining data exchange protocols and identify the protocols to be used across Interior.
Select Best Practice 5: Metadata Definitions - Establish and follow a consistent process for determining and maintaining metadata definitions. Seek guidance from FGDC and Bureau Data Stewards.
Select Best Practice 6: Adopt Standards - Adopt existing national/international standards based on OMB Circulars A-16, A-119, and A-130.
Select Best Practice 7: Corporate Metadata - Describe all databases in the Corporate Metadata Repository.
Select Best Practice 8: Reuse to Facilitate Sharing - Reuse data models and data sets to facilitate sharing of data across the Department and with business partners.
Select Best Practice 9: Data Lifecycle - Develop and use a data lifecycle process to promote the release of Interior data to the public in a timely fashion.
Select Best Practice 10: Metadata Integration - Integrate metadata into all data management processes not only as a documentation tool but also as a dynamic reference for all applications that access or update data.
Select Best Practice 11: Data Cleansing - Develop and use consistent data cleansing rules. Data Dictionaries and Data Models should assist in describing how the data should be cleansed. Knowledge of data dependencies, constraints, data types, etc. is important.
Select Best Practice 12: Minimize Duplication - Minimize data duplication by identifying and using authoritative data sources. Consult the Data Management Steering Committee for guidance on availability and identification of authoritative data sources.
Select Best Practice 13: Backup & Recovery - Develop and document back-up and recovery procedures to support the published Continuity of Operations Plans (COOP).
Select Best Practice 14: Data Version Control - Develop Enterprise standards for time stamp (version) control of data to enable near-real-time data recovery.
Select Best Practice 15: Authoritative Sources - Use authoritative data sources when developing data marts and data warehouses.
Select Best Practice 16: Business Rules - Design new
Select Best Practice 17: Data Quality - Data Stewards are responsible for monitoring the quality of the data in their repository.
Select Best Practice 18: Business Rules for Data - Data Stewards are responsible for determining the business rules for the data in their repository.
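Best Practice 14 above refers to time stamp (version) control of data. One way to picture it is a store that keeps every timestamped version of a record so that the state as of a chosen moment can be recovered. The Python sketch below is a minimal illustration only. The class name, method names, record key and values are hypothetical, and a real implementation would depend on the database products in use.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedStore:
    """Keeps every timestamped version of each record instead of overwriting it."""

    def __init__(self):
        self._versions = {}  # key -> list of (timestamp, value), kept in time order

    def put(self, key, value, timestamp):
        history = self._versions.setdefault(key, [])
        history.append((timestamp, value))
        history.sort(key=lambda item: item[0])

    def as_of(self, key, timestamp):
        """Return the value that was current at the given time, or None if none existed."""
        history = self._versions.get(key, [])
        stamps = [ts for ts, _ in history]
        index = bisect_right(stamps, timestamp)
        return history[index - 1][1] if index else None


# Hypothetical record key and values, for illustration only.
store = VersionedStore()
store.put("well-17", "active", datetime(2003, 1, 10, 9, 0))
store.put("well-17", "plugged", datetime(2003, 3, 2, 14, 30))

before_update = store.as_of("well-17", datetime(2003, 2, 1))
after_update = store.as_of("well-17", datetime(2003, 4, 1))
before_any = store.as_of("well-17", datetime(2002, 12, 31))
print(before_update, after_update, before_any)
```

Because no version is overwritten, recovery to a point in time is a lookup rather than a restore from backup. The trade-off is additional storage for the retained history.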
The quality of the Interior-wide guidance provided within this TRM chapter is a reflection of the efforts of the Data Management Architecture team. The members of the team are:
Organization Name
Bureau of Reclamation Gary Hardman
Minerals Management Service Joe Chetodal
Minerals Management Service Gwendolyn Young
National Park Service Lance Gridley
Office of Surface Mining Donna Hale
US Fish and Wildlife Service Barb White
US Geological Survey Raymond Obuch
Bureau of Land Mgmt. Melanie Rhinehart
Bureau of Land Mgmt. Stephen Adams