Data Access Policies
Policies consist of a definition of sensitive data and a set of rules specifying how the data can be accessed, expressed in YAML format. Here is an example policy along with descriptions of each field.
    sensitiveAttrs:
      - card_number
      - credit_limit
      - card_family
    locations:
      - repo: invoices
        schema: finance
        table: cards
    rules:
      - identities:
          - scientist
        reads:
          allow: true
          attributes: ["*"]
          rows: 10
        updates:
          allow: true
          attributes:
            - credit_limit
          rows: 1
        deletes:
          allow: true
          rows: 1
    defaultRule:
      reads:
        allow: true
        attributes:
        rows: 1
      updates:
        allow: false
      deletes:
        allow: false
A policy can be broken into two sections: data specification, comprising the fields sensitiveAttrs and locations; and data access rules, comprising the fields rules and defaultRule.
Data Specification
Users specify sensitive data to be managed by the policy through two fields:
- sensitiveAttrs is the dataset managed by the policy
- locations is the set of locations where the dataset exists; each location is defined by the repository, schema, and table containing the dataset, entered in the fields repo, schema, and table, respectively
In the following example, we specify that the attributes card_number, card_family, and credit_limit are sensitive and exist in the table cards under the schema playground in the clinics repository as well as in the credit repository.
    sensitiveAttrs: [card_number, card_family, credit_limit]
    locations:
      - repo: credit
        schema: playground
        table: cards
      - repo: clinics
        schema: playground
        table: cards
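To make the data specification concrete, here is a minimal Python sketch of how the sensitiveAttrs and locations fields might be matched against a column reference. The specification is written as a plain dict mirroring the YAML above. The helper name is_sensitive and its matching logic are illustrative assumptions, not the product's actual implementation.

```python
# Data specification as a plain dict mirroring the YAML example above.
POLICY_SPEC = {
    "sensitiveAttrs": ["card_number", "card_family", "credit_limit"],
    "locations": [
        {"repo": "credit", "schema": "playground", "table": "cards"},
        {"repo": "clinics", "schema": "playground", "table": "cards"},
    ],
}


def is_sensitive(spec, repo, schema, table, attribute):
    """Hypothetical helper: True if the attribute at this location is covered."""
    # The attribute must be listed as sensitive...
    if attribute not in spec["sensitiveAttrs"]:
        return False
    # ...and the (repo, schema, table) triple must match one of the locations.
    location = {"repo": repo, "schema": schema, "table": table}
    return location in spec["locations"]
```

Under these assumptions, card_number in the cards table of the credit repository is covered, while the same attribute in a repository not listed under locations is not.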
Data Access Rules
Users can manage how sensitive data can be accessed by specifying data access rules.
A data access rule comprises these fields:
- reads, updates, and deletes respectively specify restrictions on reads, updates, and deletes of sensitive data. For each action, a user can specify these fields:
  - allow is true when the action is allowed, and false otherwise
  - attributes is the set of sensitive attributes for which the action is allowed
    - the value ["*"] is used to specify that all attributes can be accessed
    - allowed attributes are only specified for read and update actions, because attributes cannot be selectively deleted without deleting the row
  - rows specifies the maximum number of rows that can be accessed or affected by the action
    - the value -1 is used to specify that there is no limit on the number of rows that can be accessed
- identities is the set of entities affected by the rule
  - this field is not specified for the default rule, which applies to accesses that do not match any of the identities specified by rules in the rules field
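The semantics of allow, attributes, and rows can be sketched as a small check over one action block (reads, updates, or deletes). This is a hedged illustration only: the function name is_action_permitted is hypothetical, and treating a missing rows field as "no limit" is an assumption made for the sketch, not documented behavior.

```python
UNLIMITED_ROWS = -1  # per the field descriptions, -1 means no row limit


def is_action_permitted(action_rule, attributes=None, row_count=0):
    """Hypothetical check of a single action block against one request."""
    # The action must be explicitly allowed.
    if not action_rule.get("allow", False):
        return False
    # Attributes are only checked when the request names some
    # (reads and updates); deletes affect whole rows.
    if attributes:
        allowed = action_rule.get("attributes") or []
        if "*" not in allowed and not set(attributes) <= set(allowed):
            return False
    # Assumption: a missing rows field is treated like -1 (no limit).
    limit = action_rule.get("rows", UNLIMITED_ROWS)
    return limit == UNLIMITED_ROWS or row_count <= limit


# Example action blocks taken from the policy shown earlier.
reads = {"allow": True, "attributes": ["*"], "rows": 10}
updates = {"allow": True, "attributes": ["credit_limit"], "rows": 1}
```

With these blocks, a read of 10 rows across any attributes passes, a read of 11 rows fails, and an update touching card_number fails because only credit_limit is listed.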
For example, the following rule dictates that the identity scientist can read up to 10 rows of any attributes, update a single row of the attribute credit_limit, and delete at most one row of sensitive data managed by the policy containing this rule.
    identities:
      - scientist
    reads:
      allow: true
      attributes: ["*"]
      rows: 10
    updates:
      allow: true
      attributes:
        - credit_limit
      rows: 1
    deletes:
      allow: true
      rows: 1
A single policy can have several such rules, specified in the field rules, as well as a rule that applies to any identities not covered by those rules, specified in the field defaultRule.
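The relationship between rules and defaultRule can be sketched as a lookup by identity. The function name select_rule is hypothetical, and returning the first matching rule when several list the same identity is an assumption of this sketch; the text above does not say how such overlaps are resolved.

```python
# Abbreviated policy as a plain dict, mirroring the rules/defaultRule layout.
POLICY = {
    "rules": [
        {
            "identities": ["scientist"],
            "reads": {"allow": True, "attributes": ["*"], "rows": 10},
            "updates": {"allow": True, "attributes": ["credit_limit"], "rows": 1},
            "deletes": {"allow": True, "rows": 1},
        }
    ],
    "defaultRule": {
        "reads": {"allow": True, "rows": 1},
        "updates": {"allow": False},
        "deletes": {"allow": False},
    },
}


def select_rule(policy, identity):
    """Hypothetical lookup: the rule listing this identity, else defaultRule."""
    for rule in policy.get("rules", []):
        # Assumption: the first rule that lists the identity wins.
        if identity in rule.get("identities", []):
            return rule
    # No rule names this identity, so the default rule applies.
    return policy["defaultRule"]
```

Here select_rule(POLICY, "scientist") returns the scientist rule, while any other identity falls through to defaultRule, where updates and deletes are disallowed.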