The Data Loss Prevention API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams.
- A Google Cloud project with billing enabled
- Enable the DLP API.
- (Local testing) Create a service account and set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the downloaded credentials file.
- (Local testing) Set the DLP_DEID_WRAPPED_KEY environment variable to an AES-256 key encrypted ('wrapped') with a Cloud Key Management Service (KMS) key.
- (Local testing) Set the DLP_DEID_KEY_NAME environment variable to the path-name of the Cloud KMS key you wrapped DLP_DEID_WRAPPED_KEY with.
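The wrapped-key prerequisite starts from a raw AES-256 key. The following is a minimal sketch of generating one with the JDK alone. The class and method names (GenerateAesKey, newAes256Key) are hypothetical and not part of the samples. Wrapping the key with Cloud KMS is not shown, and the Base64 output is only a convenient text form; the encoding that DLP_DEID_WRAPPED_KEY expects is not stated here.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.KeyGenerator;

// Hypothetical helper: produces the raw key material that would later be wrapped with Cloud KMS.
final class GenerateAesKey {

  // Returns 32 random bytes (256 bits) generated by the JDK's AES key generator.
  static byte[] newAes256Key() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(256); // key size in bits
      return generator.generateKey().getEncoded();
    } catch (NoSuchAlgorithmException e) {
      // AES is a required algorithm on standard JDKs, so this is not expected in practice.
      throw new IllegalStateException("AES key generation is unavailable", e);
    }
  }

  public static void main(String[] args) {
    byte[] rawKey = newAes256Key();
    // Local testing only: this prints unencrypted key material to the terminal.
    System.out.println(Base64.getEncoder().encodeToString(rawKey));
  }
}
```

The printed value is the unwrapped key. It still has to be encrypted with the Cloud KMS key named in DLP_DEID_KEY_NAME before it can serve as DLP_DEID_WRAPPED_KEY.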
This project uses the Assembly Plugin to build an uber jar. Run:
mvn clean package -DskipTests
An InfoType identifier represents an element of sensitive data.
InfoTypes are updated periodically. Use the API to retrieve the most current InfoTypes.
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata
The Quickstart demonstrates using the DLP API to identify an InfoType in a given string.
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar dlp.snippets.QuickStart
Use the DLP API to inspect strings, local files, and data stored in Google Cloud Storage, Cloud Datastore, and BigQuery.
Note: image scanning is not currently supported on Google Cloud Storage. For more information, refer to the API documentation. Optional flags are explained in this resource.
Automatic redaction produces an output image with sensitive data matches removed.
Commands:
-f <string> Source image file
-o <string> Destination image file
Options:
--help Show help
-minLikelihood [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
[default: "LIKELIHOOD_UNSPECIFIED"]
specifies the minimum reporting likelihood threshold.
-infoTypes set of infoTypes to search for [e.g. PHONE_NUMBER US_PASSPORT]
- Redact phone numbers and email addresses from test.png:
  java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -f src/test/resources/test.png -o test-redacted.png -infoTypes PHONE_NUMBER EMAIL_ADDRESS
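To illustrate what a minimum reporting likelihood threshold means, here is a small local sketch that drops findings below a chosen level. It does not call the DLP API. The Finding class and the atLeast method are hypothetical stand-ins, and the sketch assumes the likelihood choices are ordered from least to most confident as listed in the options. It also treats LIKELIHOOD_UNSPECIFIED as the lowest value, whereas the service may substitute its own default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local illustration only; not the DLP service's implementation.
final class LikelihoodFilter {

  // Declared in ascending order of confidence, mirroring the -minLikelihood choices.
  enum Likelihood {
    LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
  }

  // Hypothetical stand-in for a single match reported by an inspection.
  static final class Finding {
    final String infoType;
    final Likelihood likelihood;

    Finding(String infoType, Likelihood likelihood) {
      this.infoType = infoType;
      this.likelihood = likelihood;
    }
  }

  // Keeps only the findings whose likelihood is at or above the threshold.
  static List<Finding> atLeast(List<Finding> findings, Likelihood minLikelihood) {
    List<Finding> kept = new ArrayList<>();
    for (Finding finding : findings) {
      if (finding.likelihood.compareTo(minLikelihood) >= 0) {
        kept.add(finding);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Finding> findings = Arrays.asList(
        new Finding("PHONE_NUMBER", Likelihood.VERY_LIKELY),
        new Finding("EMAIL_ADDRESS", Likelihood.UNLIKELY));
    for (Finding finding : atLeast(findings, Likelihood.POSSIBLE)) {
      System.out.println(finding.infoType + " " + finding.likelihood);
    }
  }
}
```

Raising the threshold trades recall for precision: fewer regions are redacted, but those that remain are the matches reported with higher confidence.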
- Ensure that GOOGLE_APPLICATION_CREDENTIALS points to an authorized service account credentials file.
- Create a Google Cloud Storage bucket and upload test.txt.
- Set the GCS_PATH environment variable to point to the path for the bucket.
- Copy and paste the data below into a CSV file and create a BigQuery table from the file:
  Name,TelephoneNumber,Mystery,Age,Gender
  James,(567) 890-1234,8291 3627 8250 1234,19,Male
  Gandalf,(123) 456-7890,4231 5555 6781 9876,27,Male
  Dumbledore,(313) 337-1337,6291 8765 1095 7629,27,Male
  Joe,(452) 123-1234,3782 2288 1166 3030,35,Male
  Marie,(452) 123-1234,8291 3627 8250 1234,35,Female
  Carrie,(567) 890-1234,2253 5218 4251 4526,35,Female
- Set the BIGQUERY_DATASET and BIGQUERY_TABLE environment variables.
- Create a Google Cloud Pub/Sub topic and a subscription that is subscribed to the topic.
- Set the PUB_SUB_TOPIC and PUB_SUB_SUBSCRIPTION environment variables to the corresponding values.
- Create a Google Cloud Datastore kind and add an entity with properties:
  property1: [email protected]
  property2: 343-343-3435
- Update the Datastore kind in InspectTests.java.
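Because the integration tests depend on several environment variables, a quick pre-flight check can save a failed run. The sketch below is a hypothetical helper, not part of the samples. It assumes the variable names from the setup steps above and only checks that each one is set to a non-blank value, not that the value is valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the variables named in the test setup steps.
final class CheckTestEnvironment {

  // Extend this list if your tests need more variables (for example the DLP_DEID_* ones).
  static final List<String> REQUIRED = Arrays.asList(
      "GOOGLE_APPLICATION_CREDENTIALS",
      "GCS_PATH",
      "BIGQUERY_DATASET",
      "BIGQUERY_TABLE",
      "PUB_SUB_TOPIC",
      "PUB_SUB_SUBSCRIPTION");

  // Returns the required names that are absent or blank in the given environment map.
  static List<String> missing(Map<String, String> env) {
    List<String> result = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = env.get(name);
      if (value == null || value.trim().isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> missing = missing(System.getenv());
    if (missing.isEmpty()) {
      System.out.println("All required environment variables are set.");
    } else {
      System.err.println("Missing environment variables: " + String.join(", ", missing));
      System.exit(1);
    }
  }
}
```

Taking the environment as a Map keeps the check easy to exercise without changing the real process environment.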
Run all tests:
mvn clean verify