The Data Loss Prevention API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams.
- A Google Cloud project with billing enabled.
- Enable the DLP API.
- (Local testing) Create a service account and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded credentials file.
- (Local testing) Set the DLP_DEID_WRAPPED_KEY environment variable to an AES-256 key encrypted ("wrapped") with a Cloud Key Management Service (KMS) key.
- (Local testing) Set the DLP_DEID_KEY_NAME environment variable to the full resource name of the Cloud KMS key used to wrap DLP_DEID_WRAPPED_KEY.
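The DLP_DEID_WRAPPED_KEY prerequisite starts from a raw AES-256 key. The following minimal sketch assumes you create that key locally with the JDK's standard javax.crypto API. It generates 32 random bytes and prints them base64-encoded. The class name GenerateAesKey is hypothetical and is not part of the samples. Encrypting the result with your Cloud KMS key (for example with gcloud kms encrypt) is a separate step that is not shown here. Check the sample code for the exact encoding it expects in DLP_DEID_WRAPPED_KEY.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/**
 * Hypothetical helper: generates a random AES key locally. The printed value is the
 * plaintext (unwrapped) key; it still has to be encrypted with a Cloud KMS key.
 */
final class GenerateAesKey {

  /** Returns a fresh random AES key of the given size in bits (256 for AES-256). */
  static SecretKey generateAesKey(int bits) {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(bits); // random bytes come from the provider's default SecureRandom
      return generator.generateKey();
    } catch (NoSuchAlgorithmException e) {
      // AES is a required KeyGenerator algorithm on every Java platform.
      throw new IllegalStateException("AES not available", e);
    }
  }

  /** Base64-encodes the raw key bytes (32 bytes become 44 characters). */
  static String encode(SecretKey key) {
    return Base64.getEncoder().encodeToString(key.getEncoded());
  }

  public static void main(String[] args) {
    System.out.println(encode(generateAesKey(256)));
  }
}
```

The output is the plaintext key. Once you have wrapped it, avoid leaving the plaintext in shell history or logs.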
This project uses the Maven Assembly Plugin to build an uber JAR. Run:
mvn clean package -DskipTests
An InfoType identifier represents an element of sensitive data.
InfoTypes are updated periodically, so use the API to retrieve the most current list:
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata
The Quickstart demonstrates using the DLP API to identify an InfoType in a given string.
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.QuickStart
Inspect strings, local files, and data stored in Google Cloud Storage, Cloud Datastore, and BigQuery with the DLP API.
Note: image scanning is not currently supported on Google Cloud Storage. For more information, refer to the API documentation. Optional flags are explained in this resource.
Commands:
-s <string> Inspect a string using the Data Loss Prevention API.
-f <filepath> Inspect a local text, PNG, or JPEG file using the Data Loss Prevention API.
-gcs -bucketName <bucketName> -fileName <fileName> Inspect a text file stored on Google Cloud Storage using the Data Loss Prevention API.
-ds -projectId [projectId] -namespace [namespace] -kind <kind> Inspect a Datastore kind using the Data Loss Prevention API.
Options:
--help Show help
-minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
[default: "LIKELIHOOD_UNSPECIFIED"]
specifies the minimum reporting likelihood threshold.
-f, --maxFindings [number] [default: 0]
maximum number of results to retrieve
-q, --includeQuote [boolean] [default: true] include matching string in results
-t, --infoTypes set of infoTypes to search for [eg. PHONE_NUMBER US_PASSPORT]
-customDictionaries set of comma-separated dictionary words to search for as customInfoTypes
-customRegexes set of regex patterns to search for as customInfoTypes
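-minLikelihood acts as a reporting floor. The following is a local model of that comparison. It uses the six choices listed above, ranked in the order given there. LikelihoodLevel and LikelihoodThresholdDemo are hypothetical names, not the client library's Likelihood type. The API, not this sketch, defines how the service treats LIKELIHOOD_UNSPECIFIED; the sketch simply ranks it lowest.

```java
import java.util.Locale;

/** Hypothetical local stand-in for the -minLikelihood choices, weakest to strongest. */
enum LikelihoodLevel {
  LIKELIHOOD_UNSPECIFIED,
  VERY_UNLIKELY,
  UNLIKELY,
  POSSIBLE,
  LIKELY,
  VERY_LIKELY;

  /** Parses a flag value case-insensitively, e.g. "possible" becomes POSSIBLE. */
  static LikelihoodLevel parse(String flagValue) {
    return valueOf(flagValue.trim().toUpperCase(Locale.ROOT));
  }

  /** True if a finding at this level is at or above the given minimum. */
  boolean meets(LikelihoodLevel minimum) {
    // Enum constants compare by declaration order, which mirrors the list above.
    return compareTo(minimum) >= 0;
  }
}

final class LikelihoodThresholdDemo {
  public static void main(String[] args) {
    LikelihoodLevel minimum = LikelihoodLevel.parse(args.length > 0 ? args[0] : "POSSIBLE");
    for (LikelihoodLevel level : LikelihoodLevel.values()) {
      System.out.println(level + (level.meets(minimum) ? " -> reported" : " -> dropped"));
    }
  }
}
```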
- Inspect a string:
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]" --infoTypes PHONE_NUMBER EMAIL_ADDRESS
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]" -customDictionaries [email protected] -customRegexes "\(\d{3}\) \d{3}-\d{4}"
- Inspect a local file (text or image):
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.txt --infoTypes PHONE_NUMBER EMAIL_ADDRESS
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.png --infoTypes PHONE_NUMBER EMAIL_ADDRESS
- Inspect a file on Google Cloud Storage:
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt --infoTypes PHONE_NUMBER EMAIL_ADDRESS
- Inspect a Google Cloud Datastore kind:
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind --infoTypes PHONE_NUMBER EMAIL_ADDRESS
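It can help to preview what a -customRegexes pattern matches before sending it to the API. The sketch below is a hypothetical helper named RegexPreview. It runs the pattern from the string example through java.util.regex locally. The DLP service evaluates custom regexes with its own engine, so this only approximates server behavior. The pattern here uses only basic constructs. In Java source the backslashes must be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists every substring a regex matches, for local previewing only. */
final class RegexPreview {

  /** Returns each match of regex in text, in the order it occurs. */
  static List<String> findAll(String regex, String text) {
    Matcher matcher = Pattern.compile(regex).matcher(text);
    List<String> matches = new ArrayList<>();
    while (matcher.find()) {
      matches.add(matcher.group());
    }
    return matches;
  }

  public static void main(String[] args) {
    // Same pattern as the -customRegexes example, with backslashes escaped for Java.
    String phonePattern = "\\(\\d{3}\\) \\d{3}-\\d{4}";
    String sample = "Call (123) 456-7890 or 343-343-3435";
    System.out.println(findAll(phonePattern, sample)); // prints [(123) 456-7890]
  }
}
```

The unparenthesized 343-343-3435 (the phone number used in the Datastore test entity) is not matched. This custom pattern alone would therefore not flag that value.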
Automatic redaction produces an output image with sensitive data matches removed.
Commands:
-f <string> Source image file
-o <string> Destination image file
Options:
--help Show help
-minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
[default: "LIKELIHOOD_UNSPECIFIED"]
specifies the minimum reporting likelihood threshold.
-infoTypes set of infoTypes to search for [eg. PHONE_NUMBER US_PASSPORT]
- Redact phone numbers and email addresses from test.png:
java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -f src/test/resources/test.png -o test-redacted.png -infoTypes PHONE_NUMBER EMAIL_ADDRESS
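Conceptually, the redacted output is the original image with each matched region painted over. The sketch below is a hypothetical class, BoxRedactionSketch, that uses only standard java.awt. It illustrates that final step on a copy of an image. The rectangles are made-up inputs. The Redact sample, by contrast, relies on the DLP API both to locate and to redact the sensitive text.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

/** Illustration only: covers the given regions of an image with a solid color. */
final class BoxRedactionSketch {

  /** Returns a redacted copy; the source image is left unchanged. */
  static BufferedImage redact(BufferedImage source, List<Rectangle> boxes, Color fill) {
    BufferedImage copy =
        new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
    Graphics2D g = copy.createGraphics();
    try {
      g.drawImage(source, 0, 0, null); // start from the original pixels
      g.setColor(fill);
      for (Rectangle box : boxes) {
        g.fill(box); // cover each (hypothetical) sensitive region completely
      }
    } finally {
      g.dispose();
    }
    return copy;
  }

  public static void main(String[] args) {
    BufferedImage image = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = image.createGraphics();
    g.setColor(Color.WHITE);
    g.fillRect(0, 0, 100, 40); // plain white stand-in for a scanned image
    g.dispose();

    BufferedImage redacted = redact(image, List.of(new Rectangle(10, 10, 30, 10)), Color.BLACK);
    System.out.printf("%06x%n", redacted.getRGB(15, 15) & 0xFFFFFF); // prints 000000
  }
}
```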
- Create a Google Cloud Storage bucket and upload test.txt.
- Create a Google Cloud Datastore kind and add an entity with properties:
property1: [email protected]
property2: 343-343-3435
- Update the Google Cloud Storage path and Datastore kind in InspectIT.java.
- Ensure that GOOGLE_APPLICATION_CREDENTIALS points to an authorized service account credentials file.
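A failing test run is often caused by a missing environment variable. The sketch below is an optional, hypothetical preflight check; it is not part of the samples or InspectIT.java. It assumes the tests need all three variables from the prerequisites. It checks that each one is set and that GOOGLE_APPLICATION_CREDENTIALS names a readable regular file. It does not verify that the service account is actually authorized.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical preflight: reports missing test configuration before `mvn clean verify`. */
final class TestEnvPreflight {
  static final List<String> REQUIRED =
      List.of("GOOGLE_APPLICATION_CREDENTIALS", "DLP_DEID_WRAPPED_KEY", "DLP_DEID_KEY_NAME");

  /** Returns a human-readable problem for each unset variable or unreadable credentials file. */
  static List<String> problems(Map<String, String> env) {
    List<String> problems = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = env.get(name);
      if (value == null || value.isBlank()) {
        problems.add(name + " is not set");
      }
    }
    String credentials = env.get("GOOGLE_APPLICATION_CREDENTIALS");
    if (credentials != null && !credentials.isBlank()) {
      Path path = Paths.get(credentials);
      if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
        problems.add("GOOGLE_APPLICATION_CREDENTIALS does not point to a readable file: " + credentials);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<String> found = problems(System.getenv());
    found.forEach(p -> System.err.println("preflight: " + p));
    if (!found.isEmpty()) {
      System.exit(1); // non-zero status so scripts can stop before running the tests
    }
    System.out.println("preflight: environment looks ready");
  }
}
```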
Run all tests:
mvn clean verify