As more organizations move sensitive data into the cloud, there is increasing pressure to identify what kinds of data are in various cloud storage models, to track data in all cloud environments and to control data protection throughout all cloud deployments.
The motivations are multiplying rapidly. They range from internal security best practices and data classification needs to regulatory and privacy requirements, such as GDPR and the California Consumer Privacy Act. That said, this is a tall order, and discovery alone can be a huge headache. Fortunately, a growing number of tools and services can help organizations discover sensitive data in cloud environments.
Cloud data discovery governance
Before exploring tools and services to track data, it is worth discussing the governance needed to support cloud-oriented data discovery and monitoring. Ideally, any sound cloud governance program should involve teams from across the organization, including legal, HR, infosec, IT, and risk and compliance. For any cloud project or deployment, the data types that may be involved should be categorized and assessed for risk, with data classification policies already in place to guide that assessment. For each cloud service provider chosen, whether existing or new, data discovery tools and methods should be evaluated when sensitive data is involved in the deployment. Data types and sensitivity tiers should be a core element of all facets of cloud risk and governance.
Emerging tools and services can enable organizations to discover, categorize and track data in the cloud. The starting point for tracking and identifying data sent to and downloaded from SaaS tools is likely a brokering service, such as a cloud access security broker (CASB). Because every SaaS offering is different, it is challenging to develop a consistent tracking and discovery process for data patterns, especially without a third-party CASB service layer in place.
PaaS and IaaS data discovery tools
Larger PaaS and IaaS providers offer various cloud-native services that can orchestrate data discovery and tracking. In AWS, the Macie service can be used to automatically identify and protect sensitive data stored in S3. Custom monitoring scenarios can also be implemented through integration with the Amazon CloudWatch service.
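To make the idea of automated discovery concrete, here is a minimal sketch of pattern-based scanning over a set of stored objects. It does not use Macie's API or reproduce its detection logic. The in-memory `sample_objects` dictionary stands in for bucket contents, and the `PATTERNS` names and regular expressions are illustrative assumptions that a real service would replace with far more robust detectors.

```python
import re

# Hypothetical detectors for illustration; real services use validated,
# far more robust rules than these two regular expressions.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_objects(objects):
    """Return {object_key: {pattern_name: match_count}} for objects with hits."""
    findings = {}
    for key, text in objects.items():
        hits = {}
        for name, pattern in PATTERNS.items():
            count = len(pattern.findall(text))
            if count:
                hits[name] = count
        if hits:
            findings[key] = hits
    return findings


# Assumed sample data standing in for object storage contents.
sample_objects = {
    "hr/records.txt": "Employee 123-45-6789, contact jane@example.com",
    "public/readme.txt": "No sensitive content here.",
}
findings = scan_objects(sample_objects)
```

In this sketch only `hr/records.txt` appears in `findings`, since the second object contains no matching patterns. A managed service adds what the sketch omits: scheduled scanning, validated detectors and alerting on findings.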
Microsoft Azure has several integrated services to identify data. The Data Discovery & Classification service for Azure SQL Data Warehouse is a fully cloud-native service. It offers data discovery and security recommendations, labeling and classification tagging, and monitoring and auditing of identified data, as well as reporting on all access, usage and storage. Azure Information Protection is another service security teams can use to tag and track data, primarily documents and email content within Azure and Office 365 cloud environments.
Google Cloud Platform also has several tools available for data discovery and control. Data Catalog is a metadata management service that acts as a search engine for data stored in Google Cloud, providing automated searching, discovery and cataloging. The service integrates with another critical discovery and data security tool, Google Cloud Data Loss Prevention (DLP), which can assess data in Cloud Storage, BigQuery and Cloud Datastore for sensitive data patterns. Cloud DLP also supports API integration with external data and event analysis tools. Additionally, Cloud DLP can mask or tokenize data elements to protect sensitive data patterns when they are discovered.
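The difference between masking and tokenizing can be shown with a small sketch. This is not the Cloud DLP API. It is a self-contained illustration in which `SSN_PATTERN`, the `mask` and `tokenize` helpers and the demo key are all assumptions made for the example. The tokenization shown is a one-way keyed hash, whereas production tokenization schemes may be reversible and rely on managed keys.

```python
import hashlib
import hmac
import re

# Assumed pattern for illustration: US Social Security number format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask(text, keep_last=4, mask_char="*"):
    """Replace all but the last `keep_last` characters of each match."""
    def _mask(match):
        value = match.group(0)
        return mask_char * (len(value) - keep_last) + value[-keep_last:]
    return SSN_PATTERN.sub(_mask, text)


def tokenize(text, key):
    """Replace each match with a deterministic keyed token (HMAC-SHA256)."""
    def _token(match):
        digest = hmac.new(key, match.group(0).encode(), hashlib.sha256).hexdigest()
        return "tok_" + digest[:16]
    return SSN_PATTERN.sub(_token, text)


record = "SSN on file: 123-45-6789"
masked = mask(record)  # "SSN on file: *******6789"
# Hypothetical key; a real deployment would use a managed secret.
tokenized = tokenize(record, key=b"demo-key-not-for-production")
```

Masking keeps part of the value readable for support or verification workflows. Tokenizing removes the value entirely, but the same input and key always yield the same token, so tokenized records can still be joined or counted.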
Other cloud data discovery tools
Aside from cloud provider and CASB data discovery services, there is a range of third-party tools available to integrate with cloud data stores and deployments. If an enterprise already uses a specific set of products internally for data discovery and DLP, it may make sense to continue with the same tools in cloud environments, provided they offer the same capabilities there.
When looking at cloud data discovery tools, be sure to consider:
- coverage of data sources and types
- data pattern matching flexibility
- data control policies, if available
- monitoring and reporting capabilities
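One way to apply these criteria consistently is a simple weighted scorecard. The sketch below is only an illustration. The weights, the 0-to-5 rating scale and the tool names `tool_a` and `tool_b` are hypothetical, and each organization would set its own priorities.

```python
from dataclasses import dataclass

# Assumed integer weights for the four criteria above (they sum to 10).
WEIGHTS = {
    "coverage": 4,
    "pattern_flexibility": 3,
    "control_policies": 1,
    "monitoring_reporting": 2,
}


@dataclass
class ToolRating:
    name: str
    scores: dict  # criterion -> rating from 0 (absent) to 5 (excellent)

    def weighted_score(self):
        # Criteria missing from `scores` count as 0.
        return sum(weight * self.scores.get(criterion, 0)
                   for criterion, weight in WEIGHTS.items())


def rank(tools):
    """Return tools sorted from highest to lowest weighted score."""
    return sorted(tools, key=lambda tool: tool.weighted_score(), reverse=True)


# Hypothetical candidates with made-up ratings.
candidates = [
    ToolRating("tool_a", {"coverage": 5, "pattern_flexibility": 3,
                          "control_policies": 0, "monitoring_reporting": 4}),
    ToolRating("tool_b", {"coverage": 3, "pattern_flexibility": 4,
                          "control_policies": 4, "monitoring_reporting": 3}),
]
ranked = rank(candidates)
```

With these assumed weights, `tool_a` scores 37 and `tool_b` scores 34, so broad coverage outweighs the missing control policies. Changing the weights changes the ranking, which is why they should be agreed on before any tools are scored.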
Data discovery tools and services across all cloud environments are still evolving. They can be expensive to implement in a multi-cloud architecture. Keep an eye on this emerging technology, as it is likely to improve significantly in the next year or two.