Cloud Data Loss Prevention (DLP) API Samples

The Data Loss Prevention API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams.

Setup

  • Create a Google Cloud project with billing enabled.
  • Enable the DLP API.
  • (Local testing) Create a service account and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded credentials file.
  • (Local testing) Set the DLP_DEID_WRAPPED_KEY environment variable to an AES-256 key encrypted ('wrapped') with a Cloud Key Management Service (KMS) key.
  • (Local testing) Set the DLP_DEID_KEY_NAME environment variable to the resource name of the Cloud KMS key used to wrap DLP_DEID_WRAPPED_KEY.
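
Before running the samples locally, it can help to confirm that the three local-testing variables listed above are actually set. The following is a minimal sketch using only the Java standard library; the class name DlpEnvCheck and its missing helper are hypothetical and not part of the samples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the DLP samples.
final class DlpEnvCheck {
    // Variable names taken from the setup list above.
    static final List<String> REQUIRED = Arrays.asList(
        "GOOGLE_APPLICATION_CREDENTIALS",
        "DLP_DEID_WRAPPED_KEY",
        "DLP_DEID_KEY_NAME");

    /** Returns the required variables that are unset or blank in the given environment. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All local-testing variables are set.");
        } else {
            System.out.println("Missing: " + String.join(", ", unset));
        }
    }
}
```

The check only verifies that each variable is non-blank; it does not validate the credentials file or the wrapped key.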

Build

This project uses the Maven Assembly Plugin to build an uber jar. Run:

   mvn clean package -DskipTests

Retrieve InfoTypes

An InfoType identifier represents an element of sensitive data.

InfoTypes are updated periodically. Use the API to retrieve the most current InfoTypes.

  java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata

Run the quickstart

The Quickstart demonstrates using the DLP API to identify an InfoType in a given string.

   java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar dlp.snippets.QuickStart

Inspect data for sensitive elements

Use the DLP API to inspect strings, local files, and data stored in Google Cloud Storage, Cloud Datastore, and BigQuery.

Note: image scanning is not currently supported on Google Cloud Storage. For more information, refer to the API documentation. Optional flags are explained in this resource.

Automatic redaction of sensitive data from images

Automatic redaction produces an output image with sensitive data matches removed.

Commands:
  -f <string>          Source image file
  -o <string>          Destination image file
Options:
  --help               Show help
  -minLikelihood       [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
                       [default: "LIKELIHOOD_UNSPECIFIED"]
                       Specifies the minimum reporting likelihood threshold.
  -infoTypes           Set of infoTypes to search for [e.g. PHONE_NUMBER US_PASSPORT]

Example

  • Redact phone numbers and email addresses from test.png:
      java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -f src/test/resources/test.png -o test-redacted.png -infoTypes PHONE_NUMBER EMAIL_ADDRESS
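
To illustrate what a minimum likelihood threshold means for the -minLikelihood option, the sketch below ranks the six documented values and checks whether a finding would meet a given threshold. The class LikelihoodThreshold and the method meetsThreshold are hypothetical names used only for illustration. The ranking follows the order in which the choices are listed, and treating LIKELIHOOD_UNSPECIFIED as the lowest rank is an assumption of this sketch, not a statement about how the service handles that value.

```java
// Hypothetical illustration, not part of the DLP samples or client library.
final class LikelihoodThreshold {
    // Declared from least to most likely so that enum order reflects rank.
    // Assumption: LIKELIHOOD_UNSPECIFIED is treated as the lowest rank here.
    enum Likelihood {
        LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
    }

    /** Returns true when a finding's likelihood is at or above the minimum. */
    static boolean meetsThreshold(Likelihood finding, Likelihood min) {
        return finding.compareTo(min) >= 0;
    }

    public static void main(String[] args) {
        // Optional first argument: the threshold name, e.g. POSSIBLE.
        Likelihood min = Likelihood.valueOf(
            args.length > 0 ? args[0] : "LIKELIHOOD_UNSPECIFIED");
        for (Likelihood l : Likelihood.values()) {
            System.out.println(l + " meets " + min + ": " + meetsThreshold(l, min));
        }
    }
}
```

In the actual samples the threshold is passed to the API, which applies it when reporting matches; this local comparison only shows the ordering idea.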
    

Integration tests

Setup

  • Ensure that GOOGLE_APPLICATION_CREDENTIALS points to an authorized service account credentials file.
  • Create a Google Cloud Storage bucket and upload test.txt.
    • Set the GCS_PATH environment variable to point to the path for the bucket.
  • Copy and paste the data below into a CSV file and create a BigQuery table from the file:
    Name,TelephoneNumber,Mystery,Age,Gender
    James,(567) 890-1234,8291 3627 8250 1234,19,Male
    Gandalf,(123) 456-7890,4231 5555 6781 9876,27,Male
    Dumbledore,(313) 337-1337,6291 8765 1095 7629,27,Male
    Joe,(452) 123-1234,3782 2288 1166 3030,35,Male
    Marie,(452) 123-1234,8291 3627 8250 1234,35,Female
    Carrie,(567) 890-1234,2253 5218 4251 4526,35,Female
    
    
    • Set the BIGQUERY_DATASET and BIGQUERY_TABLE environment variables.
  • Create a Google Cloud Pub/Sub topic and a subscription to that topic.
    • Set the PUB_SUB_TOPIC and PUB_SUB_SUBSCRIPTION environment variables to the corresponding values.
  • Create a Google Cloud Datastore kind and add an entity with properties:
  • Update the Datastore kind in InspectTests.java.
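
The CSV step in the list above can also be scripted instead of copied by hand. The sketch below writes the same seven lines to a local file that can then be loaded into BigQuery. The class name TestTableCsv and the default file name dlp-test-table.csv are hypothetical; creating the BigQuery table itself is not covered here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the DLP samples.
final class TestTableCsv {
    // Header and rows copied verbatim from the setup list above.
    static final List<String> ROWS = Arrays.asList(
        "Name,TelephoneNumber,Mystery,Age,Gender",
        "James,(567) 890-1234,8291 3627 8250 1234,19,Male",
        "Gandalf,(123) 456-7890,4231 5555 6781 9876,27,Male",
        "Dumbledore,(313) 337-1337,6291 8765 1095 7629,27,Male",
        "Joe,(452) 123-1234,3782 2288 1166 3030,35,Male",
        "Marie,(452) 123-1234,8291 3627 8250 1234,35,Female",
        "Carrie,(567) 890-1234,2253 5218 4251 4526,35,Female");

    /** Writes the rows to the target path, one per line, and returns the path. */
    static Path write(Path target) throws IOException {
        return Files.write(target, ROWS, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Optional first argument: output path (the default name is an assumption).
        Path target = Paths.get(args.length > 0 ? args[0] : "dlp-test-table.csv");
        System.out.println("Wrote " + write(target).toAbsolutePath());
    }
}
```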

Run

Run all tests:

   mvn clean verify