Getting Started With Dwork

Dwork helps you analyze personal data without compromising privacy. It is written in Python and has an intuitive interface that makes it easy to work with data. To get started, we first need to install Dwork, which we can do via `pip`. Please note that Dwork requires Python 3.5 or higher.
To load a dataset into Dwork, we first import the necessary functions from pandas and Dwork and then define a data schema for the dataset. The schema tells Dwork the types and ranges of the individual attributes. Dwork can also try to infer this information automatically from the dataset, but this is often a bad idea because it can reveal personal information: knowing the largest or smallest value of an attribute already tells you something about individuals in the dataset. For our example dataset we define just two attributes, `Weight` and `Height`, both with integer values in the range `[0,200]`.
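To see why declaring ranges up front is safer than inferring them, the following sketch uses plain Python rather than Dwork's schema classes. The `ATTRIBUTE_RANGES` dict and both helper functions are hypothetical names chosen for illustration; they are not part of Dwork's API.

```python
# Hypothetical illustration in plain Python; none of these names come from Dwork.

# Declared up front, independent of the actual data (mirrors the example schema).
ATTRIBUTE_RANGES = {"Weight": (0, 200), "Height": (0, 200)}


def inferred_range(values):
    """Infer the range from the data itself. This exposes the exact extremes."""
    return min(values), max(values)


def clip_to_declared(attribute, values):
    """Clamp values to the declared range so each one has a bounded influence."""
    low, high = ATTRIBUTE_RANGES[attribute]
    return [min(max(v, low), high) for v in values]


weights = [62, 75, 81, 148]
weights_without_heaviest = [62, 75, 81]

# The inferred upper bound changes when one person is removed, so publishing
# it would reveal that person's exact weight.
print(inferred_range(weights))                   # (62, 148)
print(inferred_range(weights_without_heaviest))  # (62, 81)

# The declared range is identical for both datasets and reveals nothing.
print(ATTRIBUTE_RANGES["Weight"])                # (0, 200)
```

The declared range is also what makes a sensitivity calculation possible: once every value is known to lie in `[0,200]`, the influence of a single person on a sum is bounded no matter what the data looks like.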
The `PandasDataset` instance that we've created can now be used almost like a normal pandas DataFrame. For example, to calculate the mean weight of all persons in the dataset, we can simply write:
    result = ds["Weight"].sum() / ds.len()
Now, `result` is not a numerical value but an instance of a Dwork `Expression`. We can get the true value of the expression by calling `result.true()`, or a differentially private value by calling `result.dp(epsilon=0.5)`. Dwork automatically calculates the sensitivity for us and adds the proper amount of noise. Neat, isn't it?
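Dwork's internals are not shown here, so the following is only a sketch of the general idea behind a differentially private mean, using the Laplace mechanism in plain Python. The functions `laplace_noise`, `true_mean`, and `dp_mean` are hypothetical, and splitting epsilon evenly between the sum and the count is one simple choice, not necessarily what Dwork does.

```python
import random


def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def true_mean(values):
    """The exact, non-private mean."""
    return sum(values) / len(values)


def dp_mean(values, low, high, epsilon, rng=None):
    """Noisy mean computed as a noisy sum divided by a noisy count (a sketch)."""
    rng = rng or random.Random()
    # Clamp to the declared range so one person's influence is bounded.
    clipped = [min(max(v, low), high) for v in values]

    # Assumption: half of the privacy budget goes to each of the two queries.
    eps_sum = epsilon / 2
    eps_count = epsilon / 2

    # Adding or removing one person changes the sum by at most the largest
    # absolute value in the range, and the count by exactly 1.
    sum_sensitivity = max(abs(low), abs(high))
    count_sensitivity = 1

    noisy_sum = sum(clipped) + laplace_noise(sum_sensitivity / eps_sum, rng)
    noisy_count = len(clipped) + laplace_noise(count_sensitivity / eps_count, rng)

    # Guard against a noisy count that is zero or negative.
    return noisy_sum / max(noisy_count, 1.0)


weights = [62, 75, 81, 148]
print(true_mean(weights))                                # 91.5
print(dp_mean(weights, low=0, high=200, epsilon=0.5))    # varies from run to run
```

With only four rows and `epsilon=0.5` the noise dominates the result. On larger datasets the sum grows while the noise scale stays fixed, so the private mean moves closer to the true one.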
Filtering
The filtering functionality is still under construction and not yet merged into the `master` branch.
Dwork allows you to filter a dataset by specifying a conditional expression.
    # return only rows with Weight > 100
    dsf = ds[ds["Weight"] > 100]
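The filter syntax appears to mirror boolean indexing in pandas. Since the Dwork feature is still under construction, the sketch below shows the equivalent operation on a plain pandas DataFrame with made-up example data; it does not use Dwork at all.

```python
import pandas as pd

# Made-up example data for illustration only.
df = pd.DataFrame(
    {"Weight": [62, 75, 104, 148], "Height": [158, 171, 180, 192]}
)

# The comparison yields one boolean per row ...
mask = df["Weight"] > 100

# ... and indexing with that boolean Series keeps only the rows marked True.
heavy = df[mask]

print(heavy["Weight"].tolist())  # [104, 148]
```

Note that a filter on a sensitive attribute is itself data-dependent, so any statistic computed on the filtered rows still needs to be released through a differentially private mechanism.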
Grouping
The grouping functionality is still under construction and not yet merged into the `master` branch.
Dwork also allows you to group the data by one or more attributes. This is useful, for example, to generate statistics for subgroups of your dataset.
    # group the dataset by weight, using 10 kg intervals, as well as by height using
    # 10 cm intervals
    dsg = ds.group_by(ds['Weight'].discretize(10), ds['Height'].discretize(10))
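As a rough picture of what grouping by discretized attributes means, here is the same idea in plain pandas with made-up data. It assumes that `discretize(10)` assigns each value to a fixed-width interval of 10 units and labels the interval by its lower edge; Dwork's actual behavior may differ.

```python
import pandas as pd

# Made-up example data for illustration only.
df = pd.DataFrame(
    {"Weight": [62, 68, 75, 104], "Height": [158, 155, 171, 180]}
)

# Map each value to the lower edge of its 10-unit interval, e.g. 68 -> 60.
weight_bin = (df["Weight"] // 10) * 10
height_bin = (df["Height"] // 10) * 10

# Group by both binned attributes and count the rows in each subgroup.
group_sizes = df.groupby([weight_bin, height_bin]).size()

for (weight_low, height_low), count in group_sizes.items():
    print(f"weight from {weight_low} kg, height from {height_low} cm: {count} rows")
```

Here the first two persons fall into the same subgroup (weight bin 60, height bin 150), while the other two each form a subgroup of their own. Small subgroups like these are exactly where differentially private noise matters most.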