What is a Data Contract?
A data contract is a shared agreement that covers the format, structure, and guarantees of a data product. It is more than a service level agreement (SLA): it is a comprehensive definition that helps stakeholders understand and trust the data they work with.
Key Aspects of Data Contracts
- Automation: Data contracts should be machine-readable to enable automatic enforcement of rules and guarantees, ensuring that data products adhere to agreed-upon standards.
- Data Validation: Data contracts should define specific rules for data quality and integrity. Automated checks can trigger alerts if these rules are violated, ensuring that data issues are identified and addressed promptly.
- Predefined Schemas: Data contracts should specify the expected schema for each data product so that breaking changes can be detected before they reach consumers, ensuring compatibility across different systems.
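To make these three aspects concrete, here is a minimal Python sketch, assuming a contract is held as a plain dictionary that maps column names to Python types. `SCHEMA`, `RULES`, `validate`, and `breaking_changes` are hypothetical names for illustration, not part of any data contract tool. `validate` enforces the schema and quality rules on a batch of records, and `breaking_changes` compares two schema versions, treating removed columns and type changes as breaking and added columns as compatible (an assumption; real tools may apply stricter rules).

```python
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical contract schema: column name -> expected Python type.
SCHEMA: dict[str, type] = {"order_id": int, "customer_id": int, "amount": float}

# Hypothetical quality rules: (description, predicate applied to one record).
RULES: list[tuple[str, Callable[[Record], bool]]] = [
    ("amount must be non-negative", lambda r: r["amount"] >= 0),
]


def validate(records: list[Record],
             schema: dict[str, type] = SCHEMA,
             rules: list[tuple[str, Callable[[Record], bool]]] = RULES) -> list[str]:
    """Check each record against the schema and rules; return all violations."""
    violations: list[str] = []
    for i, record in enumerate(records):
        missing = schema.keys() - record.keys()
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
            continue  # the rules assume every column is present
        type_errors = [col for col, expected in schema.items()
                       if not isinstance(record[col], expected)]
        for col in type_errors:
            violations.append(f"row {i}: {col} should be {schema[col].__name__}")
        if type_errors:
            continue  # the rules assume correctly typed values
        for description, predicate in rules:
            if not predicate(record):
                violations.append(f"row {i}: rule failed: {description}")
    return violations


def breaking_changes(old: dict[str, type], new: dict[str, type]) -> list[str]:
    """List changes in `new` that would break consumers of `old`.

    Removing a column or changing its type is treated as breaking;
    adding a column is treated as backward compatible.
    """
    changes: list[str] = []
    for col, typ in old.items():
        if col not in new:
            changes.append(f"removed column {col}")
        elif new[col] is not typ:
            changes.append(f"changed type of {col}")
    return changes
```

A non-empty list from either function is the point where an automated check would raise an alert; how that alert is delivered is left out of this sketch.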
Example Data Contract
Data contracts are currently often written in YAML, although future tools may offer more user-friendly interfaces. A typical data contract would include:
- Name: A clear and descriptive name for the data product.
- Ownership: Information about who owns and is responsible for the data product.
- Stakeholders: A list of teams and individuals who rely on the data product.
- Description: A detailed explanation of the data product's purpose and content.
- Output: Information on where and how the data product can be accessed.
- Schema: Detailed information about the structure of the data, including column names, data types, and constraints.
- Sensitivity Tags: Tags that identify sensitive data elements to ensure proper protection and compliance with privacy regulations.
- Service Level Objectives (SLOs): Guarantees about data quality, such as completeness, freshness, and distribution of values.
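The fields listed above can be sketched as a Python structure. Data contracts are usually written in YAML, so this only mirrors that shape; the `DataContract` and `SLO` classes, the `check_slos` function, and every example value (team names, the `warehouse.analytics.orders_daily` output, the 99% completeness and 24-hour freshness targets) are hypothetical. The sketch checks two of the SLOs mentioned, completeness and freshness; checks on the distribution of values are omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass
class SLO:
    min_completeness: float   # minimum fraction of non-null values in every column
    max_staleness: timedelta  # maximum allowed age of the newest record
    freshness_column: str     # timestamp column used to measure freshness


@dataclass
class DataContract:
    name: str
    owner: str
    stakeholders: list[str]
    description: str
    output: str                  # where consumers can read the data
    schema: dict[str, str]       # column name -> declared type
    sensitive_columns: set[str]  # sensitivity tags, e.g. columns holding PII
    slo: SLO


# Hypothetical example contract for an orders data product.
orders = DataContract(
    name="orders_daily",
    owner="checkout-team",
    stakeholders=["finance", "marketing-analytics"],
    description="One row per completed order, refreshed daily.",
    output="warehouse.analytics.orders_daily",
    schema={"order_id": "int", "email": "string", "amount": "float",
            "created_at": "timestamp"},
    sensitive_columns={"email"},
    slo=SLO(min_completeness=0.99, max_staleness=timedelta(hours=24),
            freshness_column="created_at"),
)


def check_slos(contract: DataContract, rows: list[dict[str, Any]],
               now: datetime) -> list[str]:
    """Return a description of every SLO the given rows fail to meet."""
    failures: list[str] = []
    slo = contract.slo
    # Completeness: share of rows with a non-null value, per declared column.
    for col in contract.schema:
        non_null = sum(1 for row in rows if row.get(col) is not None)
        completeness = non_null / len(rows) if rows else 0.0
        if completeness < slo.min_completeness:
            failures.append(f"{col}: completeness {completeness:.2%} "
                            f"below target {slo.min_completeness:.2%}")
    # Freshness: the newest timestamp must be within max_staleness of `now`.
    stamps = [row[slo.freshness_column] for row in rows
              if row.get(slo.freshness_column) is not None]
    if not stamps or now - max(stamps) > slo.max_staleness:
        failures.append("freshness: newest record exceeds max staleness")
    return failures
```

Because `sensitive_columns` is part of the contract itself, downstream tooling could read it to decide which columns to mask or restrict; that step is not shown here.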
Service Level Objectives