Shan Gao Ma is a consultancy specializing in collecting sensitive data and navigating privacy ethics and regulation.
We help you translate your goals into ones you can measure.
We figure out what to measure, which means drawing up a master list of questions you want to "ask of your data."
We identify who needs to participate in the program for you to get the answers you're looking for, and we work out how to give participants the right incentives to provide accurate data.
To that end, we address privacy concerns upfront and build them into the technical design of the data collection system.
Making use of data by understanding its limitations.
Our process is iterative: we work through trial and error to refine both the questions you ask of your data and the way you collect what you need to answer them. Expect to work actively with us throughout the project's life cycle as we continually refine our approach.
We also assume that your work will keep evolving as you learn from your data, so we set you up with a stewardship process for tracking what data your system collects and what it means.
We are currently developing The Data Dictionary to help you automate the stewardship process.
The Data Dictionary
Data documentation and issue tracking in a structured database.

Centralize Documentation
Document what you're collecting
Document what it means
Annotate with analysis
Track issues
Broaden Data Use in Your Organization
Easier access to data
Faster ramp up for new colleagues
Share and Collaborate
Update documentation together
Track issues together
Share analysis
The Common Data Project (CDP) is a non-profit whose mission is to encourage and enable the disclosure of sensitive data for public re-use by creating a technological and legal framework for anonymized data sharing with a measurable privacy guarantee. Shan Gao Ma (SGM) is working closely with CDP to build out the core technology components behind the CDP open data sharing service, including the Datatrust Platform.
SGM is donating its time to CDP to conduct technical explorations of PINQ, the differential privacy technology that will be a core feature of the CDP Datatrust.
Datatrust Platform
The Datatrust Platform will be an "open data" sharing platform for releasing sensitive data records to the public with a measurable privacy guarantee.
The platform will enable more timely data releases by doing away with the need for labor-intensive and inexact anonymization methods like scrubbing, swapping or synthesizing data.
Instead, privacy will be guaranteed in a quantifiable way using an adapted version of differential privacy. Anonymization will happen on-the-fly, on a query-by-query basis.
To date, no anonymization method in use can measure its own effectiveness or allow arbitrary queries of the data. Instead, most anonymization is subjective, and it results either in woefully inadequate privacy protection or in pre-digested aggregate reports that limit the accuracy and usefulness of the data.
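To make "anonymization on a query-by-query basis" more concrete, here is a minimal sketch of the textbook Laplace mechanism for count queries, paired with a fixed privacy budget (epsilon). This is an illustrative assumption, not the Datatrust's design or PINQ's API: PINQ is a C#/LINQ library, and the "adapted version of differential privacy" mentioned above is not shown here. The names `NoisyCounter`, `BudgetExhausted`, `total_epsilon` and `count` are hypothetical. Each query gets its own fresh noise and spends part of the budget, and once the budget is used up, further queries are refused.

```python
import random
from typing import Callable, Optional


class BudgetExhausted(Exception):
    """Raised when a query would spend more epsilon than remains (hypothetical)."""


class NoisyCounter:
    """Hypothetical sketch: answers count queries with Laplace noise and
    deducts each query's epsilon from a fixed total privacy budget."""

    def __init__(self, records: list, total_epsilon: float, seed: Optional[int] = None):
        self._records = list(records)
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale: float) -> float:
        # The difference of two independent Exp(1) draws is Laplace(0, 1);
        # multiplying by `scale` gives Laplace(0, scale).
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def count(self, predicate: Callable[[dict], bool], epsilon: float) -> float:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted(f"requested {epsilon}, only {self.remaining} left")
        self.remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one record changes a count by at most 1
        # (sensitivity 1), so the Laplace scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


records = [{"age": a} for a in (23, 35, 41, 52, 67)]
counter = NoisyCounter(records, total_epsilon=1.0, seed=0)
print(counter.count(lambda r: r["age"] > 40, epsilon=0.5))  # noisy answer near 3
```

The trade-off is visible in the parameters: a smaller epsilon per query means more noise and stronger privacy, and the total budget caps how much any sequence of queries can reveal about a single record.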
Read more about our work on defining a measurable privacy guarantee and see our demo of PINQ differential privacy in action.

