Data governance overview
The key to successful data governance is to break down structured data into data entities and data subject areas. You can then use a data governance solution to surround your specific data entities and data subject areas with people, processes, policies, and technology. The solution helps you govern your data entities' lifecycles. Establishing a common business vocabulary in a glossary within your data catalog also helps you govern your data.
Your data catalog technology is critical. You can't govern your data if you don't know where the data is or what it means. Data catalog software provides automatic data discovery, automatic profiling that determines data quality, and automatic sensitive data detection. Data catalog technology also helps you map disparate data to the common data names and definitions in your catalog's business glossary, so you can understand what the data means.
Data classification categorizes data assets by assigning them unique logical labels or classes based on business context. Examples of classification labels or classes include:
- Passport number.
- Driver's license number.
- Credit card number.
- SWIFT code.
- Individual's name.
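As a rough illustration of how automatic detection can assign such labels, the following minimal sketch labels a column when most of its sampled values match a pattern. The patterns, the `classify_column` function, and the 0.8 threshold are assumptions made for this example, not the behavior of any particular catalog product. Real detection is considerably more robust, for example by adding checksum validation and context from column names.

```python
import re

# Hypothetical, deliberately simplified patterns for two of the labels above.
PATTERNS = {
    "Credit card number": re.compile(r"\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}"),
    "SWIFT code": re.compile(r"[A-Z]{6}[A-Z0-9]{2}(?:[A-Z0-9]{3})?"),
}


def classify_column(values, threshold=0.8):
    """Return labels whose pattern matches at least `threshold` of the non-empty values."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        # fullmatch requires the whole value to fit the pattern, not just a substring.
        hits = sum(1 for v in samples if pattern.fullmatch(v))
        if hits / len(samples) >= threshold:
            labels.append(label)
    return labels


card_column = ["4111 1111 1111 1111", "5500-0000-0000-0004", "4012888888881881"]
swift_column = ["DEUTDEFF", "NWBKGB2L", "BOFAUS3NXXX"]
print(classify_column(card_column))
print(classify_column(swift_column))
```

Pattern matching alone produces false positives (any eight uppercase letters look like a SWIFT code here), which is one reason a steward typically reviews automatically assigned labels.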
You can define data classification schemes, such as a data confidentiality classification scheme, in your data catalog. To define the scheme, you associate policies and rules in your catalog with different classification levels.
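The idea of attaching rules to classification levels can be sketched as follows. The level names, role names, and the `decide` function are hypothetical and chosen only for illustration; your catalog defines its own levels and policy model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidentiality(IntEnum):
    # Hypothetical levels, ordered from least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Rule:
    read_roles: frozenset      # roles allowed to read the data at all
    unmasked_roles: frozenset  # roles allowed to see unmasked values


# The scheme: one rule per classification level (assumed roles).
SCHEME = {
    Confidentiality.PUBLIC: Rule(frozenset({"guest", "analyst", "steward"}),
                                 frozenset({"guest", "analyst", "steward"})),
    Confidentiality.INTERNAL: Rule(frozenset({"analyst", "steward"}),
                                   frozenset({"analyst", "steward"})),
    Confidentiality.CONFIDENTIAL: Rule(frozenset({"analyst", "steward"}),
                                       frozenset({"steward"})),
    Confidentiality.RESTRICTED: Rule(frozenset({"steward"}),
                                     frozenset({"steward"})),
}


def decide(level: Confidentiality, role: str) -> str:
    """Return 'deny', 'mask', or 'allow' for a role reading data at a given level."""
    rule = SCHEME[level]
    if role not in rule.read_roles:
        return "deny"
    return "allow" if role in rule.unmasked_roles else "mask"


print(decide(Confidentiality.CONFIDENTIAL, "analyst"))
```

Because the rules hang off the level rather than off individual columns, changing a rule once changes how every attribute at that level is governed.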
A data lifecycle retention classification scheme provides different retention classifications for data lifecycle management. A custom microservice lifecycle application can use this scheme to maintain the data lifecycle within your environment.
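A lifecycle application that consumes a retention scheme might, in its simplest form, look like the sketch below. The retention class names, the periods, and the `lifecycle_action` function are assumptions for illustration only. Real retention periods come from your legal and regulatory requirements.

```python
from datetime import date, timedelta

# Hypothetical retention classes and periods, in days.
RETENTION_DAYS = {
    "short_term": 90,
    "standard": 3 * 365,
    "regulatory": 7 * 365,
}


def lifecycle_action(retention_class: str, created: date, today: date) -> str:
    """Return 'retain' while the record is inside its retention period, else 'delete'."""
    expires = created + timedelta(days=RETENTION_DAYS[retention_class])
    return "delete" if today >= expires else "retain"


# A record created on 2024-01-01 with a 90-day class expires on 2024-03-31.
print(lifecycle_action("short_term", date(2024, 1, 1), date(2024, 3, 30)))
print(lifecycle_action("short_term", date(2024, 1, 1), date(2024, 3, 31)))
```

A fuller version would likely add intermediate actions, such as archiving, and would honor legal holds before deleting anything.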
Label or tag data attributes in your business glossary with confidentiality and retention classifications that specify their governance. Labeling an attribute in your glossary automatically defines how to govern data mapped to the attribute in underlying data stores. Your data catalog maps the physical data attributes in different data stores to the business glossary attributes.
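The following minimal sketch shows why labeling the glossary attribute is enough: physical columns inherit their governance through the catalog mapping. The glossary terms, store and column names, and helper functions are all hypothetical.

```python
# Hypothetical glossary: business attribute -> governance labels.
glossary = {
    "Customer.EmailAddress": {"confidentiality": "confidential", "retention": "standard"},
    "Customer.Country": {"confidentiality": "internal", "retention": "standard"},
}

# Hypothetical catalog mappings: (data store, table, column) -> glossary attribute.
mappings = {
    ("crm_db", "contacts", "email_addr"): "Customer.EmailAddress",
    ("warehouse", "dim_customer", "cust_email"): "Customer.EmailAddress",
    ("warehouse", "dim_customer", "country_cd"): "Customer.Country",
}


def governance_for(store, table, column):
    """Look up the governance labels a physical column inherits, or None if unmapped."""
    term = mappings.get((store, table, column))
    return glossary.get(term) if term else None


def columns_labeled(confidentiality):
    """List every physical column that inherits the given confidentiality label."""
    return sorted(
        col for col, term in mappings.items()
        if glossary[term]["confidentiality"] == confidentiality
    )


# Relabeling the glossary attribute once changes governance for every mapped column.
glossary["Customer.EmailAddress"]["confidentiality"] = "restricted"
print(columns_labeled("restricted"))
```

Unmapped columns return `None`, which highlights a practical point: data that the catalog hasn't mapped to the glossary falls outside this form of governance.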
You can integrate multiple technologies with your data catalog to access these attributes and enforce policies and rules across all data stores in your distributed data landscape. You can also apply the same classification labels to unstructured data.
Master data entities are important because their data is widely shared. Master data entities are often associated with documents. Examples of master data entity and document pairings include customer and invoice, supplier and contract, and asset and operating manual. By using this type of connection, you can tag related documents with master data values, such as a supplier name, and preserve relationships between structured and unstructured data.
You can build pipelines that produce trusted data assets by using the common vocabulary data entities from your data catalog. You can then publish these assets in a data marketplace so that others can find and share them.
The key takeaway is that you can use available data governance methods to get your data under control. After your data is trusted, you can use the data to drive value. How well you organize and coordinate your data governance determines your level of success.
The data governance maturity model describes how well you cover all governance aspects across your data landscape. The following four tables can help you assess your current position in the model. In order, they cover people, processes, policies, and technology.
Ungoverned | Stage 1 | Stage 2 | Fully governed |
---|---|---|---|
No stakeholder executive sponsor | Stakeholder sponsor in place | Stakeholder sponsor in place | Stakeholder sponsor in place |
No roles and responsibilities | Roles and responsibilities defined | Roles and responsibilities defined | Roles and responsibilities defined |
No data governance control board | Data governance control board in place but no data | Data governance control board in place with data | Data governance control board in place with data |
No data governance working groups | No data governance working groups | Some data governance working groups in place | All data governance working groups in place |
No data owners accountable for data | No data owners accountable for data | Some data owners in place | All data owners in place |
No data stewards appointed with responsibility for data quality | Some data stewards in place for data quality, but with too broad a scope, such as a whole department | Data stewards in place and assigned to data governance working groups for specific data | Data stewards in place and assigned to data governance working groups for specific data |
No one accountable for data privacy | No one accountable for data privacy | Chief privacy officer accountable for privacy, no tools | Chief privacy officer accountable for privacy with tools |
No one accountable for access security | IT accountable for access security | IT security accountable for access security | IT security accountable for access security and responsible for privacy enforcement |
No trusted data asset producer | Data publisher identified and accountable for producing trusted data | Data publisher identified and accountable for producing trusted data | Data publisher identified and accountable for producing trusted data |
No subject-matter experts (SMEs) identified for data entities | Some SMEs identified, but not engaged | SMEs identified and in data governance working groups | SMEs identified and in data governance working groups |
Ungoverned | Stage 1 | Stage 2 | Fully governed |
---|---|---|---|
No common business vocabulary | Common business vocabulary begun in a glossary | Common business vocabulary established | Common business vocabulary complete and maintained |
No way to know data location, quality, or sensitivity | Data catalog auto data discovery, profiling, and sensitive data detection on some systems | Data catalog auto data discovery, profiling, and sensitive data detection on all structured data | Data catalog auto data discovery, profiling, and sensitive data detection on structured and unstructured data in all systems, with full auto-tagging |
No process to govern authoring or maintenance of policies and rules | Governance of data access security, policy authoring, and maintenance on some systems | Governance of data access security, privacy, and retention policy authoring and maintenance | Governance of data access security, privacy, and retention policy authoring and maintenance |
No way to enforce policies and rules | Piecemeal enforcement of data access security policies and rules across systems with no catalog integration | Enforcement of data access security and privacy policies and rules across systems with catalog integration | Enforcement of data access security, privacy, and retention policies and rules across all systems |
No process to monitor data quality, data privacy, or data access security | Some ability to monitor data quality, some ability to monitor privacy, such as queries | Monitoring and stewardship of data quality and data privacy on core systems with database management system (DBMS) masking | Monitoring and stewardship of data quality and data privacy on all systems with dynamic masking |
No availability of fully trusted data assets | Development started for a small set of trusted data assets using data fabric software | Several core trusted data assets created using data fabric | Continuous delivery of trusted data assets through enterprise data marketplace |
No way to detect a policy violation and no process to follow if one occurs | Data access security violation detection in some systems | Data access security violation detection in all systems | Data access security violation detection in all systems |
No vulnerability testing process | Limited vulnerability testing process | Vulnerability testing process for all systems | Vulnerability testing process for all systems |
No common process for master data creation, maintenance, and sync | Master data management (MDM) with common master data create, read, update, and delete (CRUD) and sync processes for single entities | MDM with common master data CRUD and sync processes for some data entities | MDM with common master data CRUD and sync processes for all master data entities |
Ungoverned | Stage 1 | Stage 2 | Fully governed |
---|---|---|---|
No data governance classification schemes on confidentiality and retention | Data governance classification scheme for confidentiality | Data governance classification scheme for both confidentiality and retention | Data governance classification scheme for both confidentiality and retention |
No policies and rules to govern data quality | Policies and rules to govern data quality begun in common vocabulary in business glossary | Policies and rules to govern data quality defined in common vocabulary in catalog business glossary | Policies and rules to govern data quality defined in common vocabulary in catalog business glossary |
No policies and rules to govern data access security | Some policies and rules to govern data access security created in different technologies | Policies and rules to govern data access security consolidated in the data catalog using a classification scheme | Policies and rules to govern data access security consolidated in the data catalog using a classification scheme and enforced everywhere |
No policies and rules to govern data privacy | Some policies and rules to govern data privacy | Policies and rules to govern data privacy consolidated in the data catalog using a classification scheme | Policies and rules to govern data privacy consolidated in the data catalog using a classification scheme and enforced everywhere |
No policies and rules to govern data retention | Some policies and rules to govern data retention | Policies and rules to govern data retention consolidated in the data catalog using a classification scheme | Policies and rules to govern data retention consolidated in the data catalog using a classification scheme and enforced everywhere |
No policies and rules to govern master data maintenance | Policies and rules to govern master data maintenance for a single master data entity | Policies and rules to govern master data maintenance for some master data entities | Policies and rules to govern master data maintenance for all master data entities |
Ungoverned | Stage 1 | Stage 2 | Fully governed |
---|---|---|---|
No data catalog with auto data discovery, profiling, and sensitive data detection | Data catalog with auto data discovery, profiling, and sensitive data detection purchased | Data catalog with auto data discovery, profiling, and sensitive data detection purchased | Data catalog with auto data discovery, profiling, and sensitive data detection purchased |
No data fabric software with multicloud edge and datacenter connectivity | Data fabric software with multicloud edge and datacenter connectivity and catalog integration purchased | Data fabric software with multicloud edge and datacenter connectivity and catalog integration purchased | Data fabric software with multicloud edge and datacenter connectivity and catalog integration purchased |
No metadata lineage | Metadata lineage available in data catalog on trusted assets being developed by using fabric | Metadata lineage available in data catalog on trusted assets being developed by using fabric | Metadata lineage available in data catalog on trusted assets being developed by using fabric |
No data stewardship tools | Data stewardship tools available as part of the data fabric software | Data stewardship tools available as part of the data fabric software | Data stewardship tools available as part of the data fabric software |
No data access security tool | Data access security in multiple technologies | Data access security in multiple technologies | Data access security enforced in all systems |
No data privacy enforcement software | No data privacy enforcement software | Data privacy enforcement software in some database management systems | Data privacy enforcement software in all data stores |
No MDM system | Single entity MDM system | Multientity MDM system | Multientity MDM system |
After you determine where you currently stand in the governance maturity model, meet with your key stakeholders to map out a strategy to increase your maturity. Start by defining your requirements and your strategy for technology, data quality, metadata, data sharing, and master data.
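To bring a concrete starting point to that stakeholder meeting, you could record the stage you assessed for each row of a maturity table and summarize it. The sketch below assumes a "weakest link" rule, where an area is only as mature as its least mature row. That scoring rule, the row names, and the function names are assumptions for illustration; the maturity model itself doesn't prescribe how to aggregate rows.

```python
STAGES = ["Ungoverned", "Stage 1", "Stage 2", "Fully governed"]


def area_stage(assessed: dict[str, str]) -> str:
    """Summarize an area as its least mature row (assumed weakest-link rule)."""
    if not assessed:
        raise ValueError("no rows assessed")
    return STAGES[min(STAGES.index(stage) for stage in assessed.values())]


def gaps(assessed: dict[str, str], target: str = "Fully governed") -> list[str]:
    """List the rows that are still below the target stage."""
    target_index = STAGES.index(target)
    return sorted(
        row for row, stage in assessed.items()
        if STAGES.index(stage) < target_index
    )


# Hypothetical self-assessment of three rows from the people table.
people = {
    "Executive sponsor": "Stage 2",
    "Data owners": "Stage 1",
    "Data stewards": "Stage 2",
}
print(area_stage(people))
print(gaps(people, target="Stage 2"))
```

The `gaps` list gives the meeting an agenda: each entry is a row where a specific change is needed to reach the target stage.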