Data element registry services support the management and use of environmental data maintained in EPA and partner systems. Using the services promotes discovery of, and access to, comparable information across EPA, which makes the data easier to understand. The services include automated searching and downloads of critical metadata, so that the names, definitions, and meanings of fields in agency data systems can be displayed and compared. As such, the registry is a comprehensive, authoritative reference for information about environmental data. It includes:
- Valid values
Data element registry services support the EPA’s business processes and data management by providing access to information about the agency's data assets including its data standards. These services allow distributed stewardship of data elements, code sets, and standards used in program systems. Managing the quality of metadata facilitates access to and understanding of environmental information.
Data element registry services also facilitate the transformation of equivalent data between different representations. For example, “31” in the FIPS 5-2 standard, “NE” in the U.S. Postal Service standard, and “NB” in the FBI's National Crime Information Center (NCIC) standard all mean the State of Nebraska.
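The translation described above can be pictured as a lookup in a small crosswalk table. The Python sketch below is illustrative only: `CROSSWALK` and `translate` are hypothetical names, not part of any EPA service, and the table holds only the Nebraska codes quoted in the text.

```python
# Hypothetical crosswalk: each row lists the equivalent codes for one state
# under the three standards named in the text.
CROSSWALK = [
    {"FIPS": "31", "USPS": "NE", "NCIC": "NB"},  # Nebraska
]


def translate(code, source, target):
    """Return the code in the `target` standard that is equivalent to
    `code` in the `source` standard."""
    for row in CROSSWALK:
        if row[source] == code:
            return row[target]
    raise KeyError(f"No {source} code {code!r} in the crosswalk")


print(translate("31", "FIPS", "USPS"))  # NE
print(translate("NE", "USPS", "NCIC"))  # NB
```

A real crosswalk would hold one row per state and would be maintained centrally, so that every system translating codes uses the same equivalences.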
What is a Data Element Registry?
The Data Element Registry is a "storage container" that provides important information about environmental data that is called metadata. The metadata includes:
- Semantic information that can help a user understand the data's meaning
- Representational information that specifies the form of presentation; e.g., format, length, permitted values, and datatype
- Naming and identification information
- Specifications for standard data to be incorporated in new and reengineered systems to promote data interchange
- Administrative information about the metadata steward and the quality of the metadata
It can also be the repository where the data in an information system is described.
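As a rough illustration of the kinds of metadata listed above, a single registered data element could be modeled as a record like the one below. This is a minimal sketch under stated assumptions: the `DataElement` class, its field names, and the sample values are all hypothetical and do not reflect the Registry's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    # Naming and identification information
    identifier: str
    name: str
    # Semantic information
    definition: str
    # Representational information
    datatype: str
    max_length: Optional[int] = None
    permitted_values: frozenset = frozenset()
    # Administrative information
    steward: str = ""

    def accepts(self, value: str) -> bool:
        """Check a value against the representational metadata."""
        if self.max_length is not None and len(value) > self.max_length:
            return False
        if self.permitted_values and value not in self.permitted_values:
            return False
        return True


# Hypothetical example entry; identifier, steward, and values are invented.
state_code = DataElement(
    identifier="EX-0001",
    name="State Postal Code",
    definition="Two-letter U.S. Postal Service abbreviation for a state.",
    datatype="string",
    max_length=2,
    permitted_values=frozenset({"NE", "KS", "IA"}),
    steward="Example Steward",
)

print(state_code.accepts("NE"))        # True
print(state_code.accepts("Nebraska"))  # False: longer than max_length
```

Keeping the permitted values and format rules in the metadata record means a system can validate incoming data without hard-coding those rules itself.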
The Registry provides a system to record what is known about environmental data and to understand that data. Using the Registry, environmental data developers and collectors can document data for which they are responsible one time, and one time only, so that there is no need to repeatedly explain the data to each organization or individual interested in the data (unless the system changes).
The Registry is a tool designed to:
- Promote uniform data documentation
- Foster information integration and sharing across EPA and with states, tribes, and other EPA partners
- Support implementation of data standards
- Follow international standards in its design
The Registry does not contain the actual data from EPA information systems but rather the information about that data: the metadata that enables a user to better understand, access, and share a system’s information.
It serves as a:
- Repository for data dictionaries of EPA’s major information collections
- Repository for EPA standard data elements with names, definitions, and format information for system developers to use
- Repository for EPA code sets (also known as permissible or valid value lists)
- Data management tool
It enables enterprise and system architects to analyze information managed by EPA and to reduce redundancies.
The Registry organizes data elements across EPA by application system and by common concepts. The data elements are uniquely defined and identified so that information can be shared throughout the EPA and integrated across EPA systems.
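Organizing elements both by application system and by common concept amounts to keeping two indexes over the same registrations. The sketch below assumes a hypothetical list of registrations; the system names, element names, and concepts are invented for illustration and are not drawn from the Registry.

```python
from collections import defaultdict

# Hypothetical registrations: (application system, element name, common concept)
REGISTRATIONS = [
    ("System A", "STATE_CD", "State"),
    ("System B", "FacilityState", "State"),
    ("System A", "FAC_NAME", "Facility Name"),
]


def index_by(key_position):
    """Group registrations by the tuple field at `key_position`."""
    index = defaultdict(list)
    for record in REGISTRATIONS:
        index[record[key_position]].append(record)
    return dict(index)


by_system = index_by(0)   # all elements registered for each system
by_concept = index_by(2)  # all elements that represent the same concept

# Elements in different systems that share the "State" concept
for system, element, _ in by_concept["State"]:
    print(system, element)
```

The concept index is what lets an architect see that differently named fields in separate systems carry the same information, which is the starting point for reducing redundancy.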
Metadata is plain, ordinary, everyday, garden-variety data. The concept of metadata is often confusing, partly because it lacks a clear definition.
Metadata is a type of data that describes and defines other data, but what makes it different from ordinary data is how it is used. It is found in documents, messages, images, sound streams, and videos. The term metadata or "data about data" also refers to data that are used to describe a data set, such as the content, quality, and condition of data. It is the information that answers questions like:
- Who owns the data?
- What is the meaning of the data?
- How was the data collected?
- How is the data named and identified?
- Who may use the data?
- How current is the data?
- How is the data represented?
It is a set of facts about data and other information elements. It is everything except for the data itself, and it is undeniably important.
ISO 9000 defines quality as “Degree to which a set of inherent characteristics fulfills requirements. Note: Quality may relate to a product or service regarding its ability to meet the stated or implied needs and expectations of the user.”
To have quality, data must be understandable to the user. Well-defined, well-documented data is critical to quality. In addition to having good technical definitions and clear English-language meanings, associating data elements with their concepts helps to maintain consistent, controlled vocabularies, thus reducing ambiguity and misunderstanding.
In addition to knowing what the data means, it is highly important to know where and how to access it as well as where it came from and how complete it is. All of these data attributes are captured in the Data Registry and made readily available to assist EPA and its partners in performing their mission in a consistent and knowledgeable fashion.
For the registry to be successful and of quality content, the American National Standards Institute (ANSI) states that it must have results that:
a. Meet a well-defined need, use, or purpose;
b. Satisfy customers' expectations;
c. Comply with applicable standards and specifications;
d. Comply with statutory (and other) requirements of society; and
e. Reflect consideration of cost and economics. [ANSI/ASQ E4-2004]
Environmental data elements from important EPA systems are registered and maintained as data dictionaries. Code sets (also known as valid or permissible value lists) are registered and updated centrally. The metadata registered for each data element documents its quality characteristics and any quality improvements. These characteristics support quality in the following ways:
- Provenance (source) – tells where the information comes from so users can decide whether to use it
- Understandability – provides definitions and meaning
- Accessibility – shows which system holds the data and what it is called there
- Consistency – promotes standards and the reuse of good data names
- Completeness – lets users search more EPA sources and compare them
The quality of the metadata within the Data Registry is managed using processes and procedures specified in international standards, along with additional processes and procedures within EPA. Documenting these processes makes them repeatable and supports continuous process improvement. The stewardship program improves the quality of the registry's content by transferring the entry and maintenance of data and metadata to the subject matter experts.