Shan Gao Ma is a consultancy specializing in collecting sensitive data and navigating privacy ethics and regulation.

We help you translate your goals into ones you can measure.

We figure out what to measure, which means we come up with a master list of questions you want to "ask of your data."

We identify who you need to participate in the program to get the answers you're looking for. And we think about how to give participants the proper incentives to provide accurate data.

That means we address privacy concerns upfront and bake them into the technical design of the data collection system.

Making use of data by understanding its limitations.

Our process is iterative, working through trial and error to refine both the questions you ask of your data and how you collect what you need to answer those questions. So expect to be working actively with us through the entire life cycle of the project to continually refine what we're doing.

We also assume that you're going to be continually evolving what you do as you learn from your data, so we set you up with a stewardship process to keep track of the data your system is collecting and what it means.

We are currently developing The Data Dictionary to help you automate the stewardship process.

The Data Dictionary

Data documentation and issue tracking in a structured database.

Centralize Documentation
    Document what you're collecting
    Document what it means
    Annotate with analysis
    Track issues

Broaden Data Use In Your Organization
    Easier access to data
    Faster ramp up for new colleagues

Share and Collaborate
    Update documentation together
    Track issues together
    Share analysis
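
As a rough illustration of what "documentation and issue tracking in a structured database" could look like, here is a minimal Python sketch. It is not The Data Dictionary's actual schema or API: the class names (DataDictionary, FieldEntry, Issue), the method names, and the sample field are all hypothetical, chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A tracked problem with a documented field (hypothetical structure)."""
    summary: str
    resolved: bool = False


@dataclass
class FieldEntry:
    """One documented data field: what is collected and what it means."""
    name: str
    meaning: str
    annotations: list = field(default_factory=list)
    issues: list = field(default_factory=list)


class DataDictionary:
    """In-memory stand-in for a shared, structured documentation store."""

    def __init__(self):
        self._entries = {}

    def document(self, name, meaning):
        # Create the entry, or update its meaning if it already exists.
        entry = self._entries.get(name)
        if entry is None:
            entry = FieldEntry(name, meaning)
            self._entries[name] = entry
        else:
            entry.meaning = meaning
        return entry

    def annotate(self, name, note):
        # Attach a piece of analysis to an existing entry.
        self._entries[name].annotations.append(note)

    def track_issue(self, name, summary):
        # Record an open issue against an existing entry.
        issue = Issue(summary)
        self._entries[name].issues.append(issue)
        return issue

    def open_issues(self):
        # List (field name, issue summary) pairs that are still unresolved.
        return [
            (entry.name, issue.summary)
            for entry in self._entries.values()
            for issue in entry.issues
            if not issue.resolved
        ]


# Example usage with a made-up field.
dd = DataDictionary()
dd.document("signup_date", "Date the participant enrolled, in UTC")
dd.annotate("signup_date", "Enrollment spikes at the start of each quarter")
ticket = dd.track_issue("signup_date", "Some early records are in local time")
```

Keeping meaning, annotations, and issues on the same record is one way to let a new colleague see what a field is, what has been learned from it, and what is known to be wrong with it in one place.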

In 2007, realizing that there were surmountable technical and policy roadblocks to sharing sensitive personal information, SGM's Principal Alex Selkirk started The Common Data Project to create an "open data" sharing service to safely release sensitive data to the public.

The Common Data Project is a non-profit whose mission is to encourage and enable the disclosure of sensitive data for public re-use through the creation of a technology and legal framework for anonymized data-sharing with a measurable privacy guarantee. Shan Gao Ma is working closely with CDP to build out the core technology components behind the CDP open data sharing service, including the Datatrust Platform.

SGM is donating time to CDP to conduct technical explorations of the PINQ Differential Privacy technology that will be a core feature of the CDP Datatrust.


Datatrust Platform

The Datatrust Platform will be an "open data" sharing platform for releasing sensitive data records to the public with a measurable privacy guarantee.

The platform will enable more timely data releases by doing away with the need for labor-intensive and inexact anonymization methods like scrubbing, swapping or synthesizing data.

Instead, privacy will be guaranteed in a quantifiable way using an adapted version of differential privacy. Anonymization will happen on-the-fly, on a query-by-query basis.
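
To make "query-by-query" anonymization concrete, here is a minimal Python sketch of the textbook Laplace mechanism for a counting query, with a simple privacy budget. It is an illustration under stated assumptions, not the Datatrust's adapted method and not PINQ's API: the PrivateDataset class, its method names, and the sample records are hypothetical.

```python
import random


class PrivateDataset:
    """Answers counting queries with Laplace noise and tracks a privacy budget.

    Hypothetical illustration of standard epsilon-differential privacy;
    not the Datatrust or PINQ implementation.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    @property
    def remaining_epsilon(self):
        return self._remaining

    def noisy_count(self, predicate, epsilon):
        """Return the number of matching records plus Laplace(1/epsilon) noise."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        # Each answered query spends part of the total budget.
        self._remaining -= epsilon

        true_count = sum(1 for record in self._records if predicate(record))

        # Adding or removing one person changes a count by at most 1, so the
        # noise scale is 1/epsilon. The difference of two independent
        # exponential draws with rate epsilon is Laplace(0, 1/epsilon).
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return true_count + noise


# Example usage with made-up records: the noise is drawn fresh for each query.
people = [{"age": 30}, {"age": 45}, {"age": 62}]
dataset = PrivateDataset(people, total_epsilon=1.0)
over_40 = dataset.noisy_count(lambda r: r["age"] > 40, epsilon=0.5)
```

A smaller epsilon means more noise and stronger privacy. The budget is what makes the guarantee quantifiable: once it is spent, the dataset refuses further queries.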

To date, no anonymization method in use can measure its own effectiveness or support arbitrary queries of the data. Instead, most anonymization is subjective and results in either woefully inadequate privacy protection or pre-digested aggregate reports that limit the accuracy and usefulness of the data.

Read more about our work on defining a measurable privacy guarantee and see our demo of PINQ differential privacy in action.