Episode 6: Developing a DLP Strategy. Data, People and Technology
Today Nikoloz talks about Data Leakage Prevention (DLP) strategy. The episode covers how to develop the strategy, the key points to focus on and the main features that you would want your DLP to have. If you are looking for specific tools or solutions, then this episode is not for you, but if you want to find out how to define a vision and strategy, how to choose a solution, what features to look for and how to organize data governance, then you are advised to listen. Stay tuned for the experience!
Data leakage is probably one of the main problems that organizations face. Companies often have no view of the data that is processed or transferred on their premises or outside them. This leads to an absence of controls, which is then exploited by malicious employees or simply results in data leaking by mistake.
Let’s agree that the human factor plays a key role in everything related to information security. A DLP solution is one of the controls that helps prevent mistakes, boosts user awareness of data sensitivity and gives you an opportunity to see the data flow. Imagine your company is a human body in which the flow of blood is vital. DLP is like an advanced blood pressure monitor that also tells you what is in your blood: it gives you a feel for your pressure and lets you decide what medicine to take or how to improve your health.
So who needs DLP? DLP is for every medium or large organization that has sensitive customers or processes personal data and/or credit card data. Every organization that is obliged to obey privacy legislation such as GDPR, that has to stay compliant with standards and that wants to maintain trust in its services should have some kind of Data Leakage Prevention in place.
Now you will tell me that you do not want to invest in DLP because you have mitigating controls in place: you block USB ports, you prevent access to cloud storage and your policies require encryption of files that are sent via email. That would be correct, but I want you to think about all the data leakage points that still exist. Are you sure that users will not be able to upload files to some custom file transfer service by accessing an IP address that does not have a well-known hostname? How confident are you that malicious insiders will not use a mobile hotspot to share your customer data with competitors, bypassing your proxy restrictions? When you allow an exception for a user to send files to your client via cloud storage, how can you make sure that the user will not send personally identifiable data to their personal accounts or to competitors?
This is where DLP comes into play. Without such a solution you will have a hard time distinguishing personal data from loads of public data. DLP can be used to discover personal data within your organization’s premises, identify various forms of personal data, from names and phone numbers to government identifiers and credit card numbers, assemble multiple subsets of such data to accurately identify a whole record, and even do all of this in multiple languages.
DLP is a solution capable of detecting and preventing intentional or unintentional data leakage in a corporate environment. There are various DLP solutions on the market; however, at a high level their methodology and features are quite similar. Next, I will give a very high-level introduction to the core processes included in DLP:
- First, the organization needs to define the types of data that need to be protected:
  - Is it personal data such as ID numbers, passport numbers, addresses, names and birthdates?
  - Maybe you want to focus on financial and card data such as PANs, IBANs, financial amounts and currencies?
  - Or do you have specific types of documents that your organization really cares about, such as signed files, classified files, images and spreadsheets?
  - Why not all of them, right?
- After you have identified the types of data that are most important for you, you need to create rules for the DLP, meaning that you need to translate the identified data into patterns that your DLP solution can understand. Do not be afraid of this step, because the rules that come with most DLP solutions will probably cover you by default. Still, here are the rule types you are most likely to need:
  - Regular expressions, or regexes, which are basically patterns such as, let’s say, LetterLetter-NumberNumber-Letter
  - Signatures, which are useful when the organization signs or classifies its documents
  - Machine learning, where you periodically feed various types of files and data to the DLP to increase its knowledge and help it better identify the data in question
- When it comes to DLP solutions, there are usually three types: network-based, host-based and hybrid
  - Network-based DLP, as you might have guessed, is deployed on the network; it might be standalone hardware or embedded into a firewall. It acts as a proxy, reads data in network traffic, and decrypts and inspects the traffic if it is encrypted
  - Host-based DLP is installed on host machines as software with administrative privileges
    - It monitors user activities at the operating system layer, such as opening and writing files, archiving, typing, clipboard use and more
    - It might have a slight impact on endpoint performance
- DLP usually comes with various modes of action, but detection and prevention modes are always available options.
  - Detection mode detects the transfer of sensitive data and logs the incident as a detected incident. It often provides the ability to escalate incidents, for example by sending a notification to the user or the data owner
  - Prevention mode instantly blocks the transfer of sensitive data as defined in your policies and logs the incident as a prevented incident. So, if your policy allows unencrypted zip files to be transferred only on your premises, then when a user attempts to upload one to Google Drive, the action will be blocked.
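To make the rule and mode ideas above a bit more concrete, here is a minimal Python sketch. It is not how any particular DLP product works. The rule name `internal-id`, the `Incident` record and the `inspect` function are all made up for illustration, and the regular expression is just one possible reading of the LetterLetter-NumberNumber-Letter pattern (for example `AB-12-C`).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECTION = "detection"    # log the incident, let the transfer through
    PREVENTION = "prevention"  # log the incident and block the transfer


@dataclass
class Incident:
    rule: str
    status: str    # "detected" or "prevented"
    blocked: bool


# Hypothetical rule: two letters, two digits, one letter, e.g. "AB-12-C".
RULES = {
    "internal-id": re.compile(r"\b[A-Za-z]{2}-\d{2}-[A-Za-z]\b"),
}


def inspect(text: str, mode: Mode) -> list[Incident]:
    """Return one incident for every rule that matches the outgoing text."""
    incidents = []
    for name, pattern in RULES.items():
        if pattern.search(text):
            blocked = mode is Mode.PREVENTION
            status = "prevented" if blocked else "detected"
            incidents.append(Incident(name, status, blocked))
    return incidents
```

Calling `inspect("ref AB-12-C attached", Mode.DETECTION)` returns a single incident that is logged but not blocked, while the same text in `Mode.PREVENTION` comes back with `blocked=True`. A real solution would add escalation, exceptions and many more rule types on top of this.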
Now let’s talk about strategy. How should we approach it? I will present my way of thinking about and managing this topic. This does not mean that my method is the best or that it will work in your organization, but it will probably be useful in creating your own strategy.
Before typing the first letter of any strategic document, I always divide it into the following core parts: Introduction, Vision and Strategy. In the case of DLP we will also have a DLP part. The Introduction provides very high-level information in a couple of sentences to present the risk that needs to be solved by DLP. The Vision provides precise details of why the risk exists and proposes what I want to achieve, what my goal is. The DLP part describes Data Leakage Prevention and presents how DLP will help me achieve the goals proposed in the Vision. The Strategy part is the biggest one and tells the reader how I will approach the risks and the implementation of DLP. Keep in mind that you will most likely be trying to get approval from the C-suite or top management, so you need to be precise and write as little text as possible, because a shorter text gives a clearer idea of why you need their support. You can also use some visuals to make your strategy more fun to read.
I believe that DLP is not only about technology; it’s about data, technology, people and implementation. Data is about understanding the types and patterns of data that need to be protected. Technology is about the features that you require from the solution, the type of solution and the infrastructure requirements. People, a vital part of every organization, is about data ownership, user education and positive communication with employees. Implementation is about setting up a test environment, scoping the pilot and going to full deployment.
Let’s follow the trail and start with the data. Companies usually have a data classification guideline, typically located in their information security or data governance policies. Data classification allows you to sort all the data in your organization into categories or classes. For example, a policy might state that all personal data is confidential information and that tax payment data is restricted information. Or you can come up with different classifications. Remember that you can never implement DLP alone, nor will you be able to protect data alone. So for classification purposes you might seek assistance from business line managers, heads of divisions or departments and even the C-suite. You can set up small workshops to understand what kind of data you have, how your organization can best classify it and how to distribute responsibilities. It might be useful to create a small chart where you display all the classes and come up with examples that fall into each of them. This will be very useful when communicating with data custodians and other people who will be actively using the data. As an information security professional, I would recommend that you guide your stakeholders during this process but allow them to take ownership of the data. Information security should not own data other than security incidents, logs or other similar pieces of information.
Since a data loss prevention solution is software, you need to identify the details of how your types of data are structured, as this will aid you in making the selection and determining the best deployment strategy. One thing that can help you is creating a data type and pattern mapping form. You can use any format you like; the form should allow stakeholders to provide you with the common patterns for their data. This might not be necessary if you are focusing on just one type of data because, as I mentioned before, most of the solutions on the market have default policies that might cover you.
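As a rough illustration of what such a mapping form could capture, here is a small Python sketch. The field names, the two example rows and the `check_form` helper are all assumptions made for this example, not a prescribed format. The one idea it adds is asking stakeholders for a made-up sample value, so that each proposed pattern can be sanity-checked before it goes anywhere near a DLP policy.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternMapping:
    data_type: str        # what the data is, in business terms
    classification: str   # class taken from your data classification guideline
    owner: str            # stakeholder who owns this type of data
    pattern: str          # regular expression proposed by the stakeholder
    sample: str           # made-up value that the pattern is expected to match


def check_form(rows):
    """Return the data types whose pattern is invalid or misses its sample."""
    problems = []
    for row in rows:
        try:
            matched = re.fullmatch(row.pattern, row.sample) is not None
        except re.error:
            matched = False  # the pattern itself does not compile
        if not matched:
            problems.append(row.data_type)
    return problems


# Hypothetical rows, as they might come back from two departments.
form = [
    PatternMapping("Customer account number", "Confidential", "Head of Retail",
                   r"CU\d{8}", "CU00012345"),
    PatternMapping("Employee ID", "Confidential", "Head of HR",
                   r"E-\d{5}", "E-1234"),  # sample is one digit short
]

print(check_form(form))  # prints ['Employee ID']
```

A spreadsheet with the same columns would do the job just as well; the point is only that each row ties a data type to a class, an owner and a testable pattern.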
Once you have a good understanding of the data in your organization, you need to switch to technology. I will propose the main features that you might want your DLP to have. However, remember that what works in my experience might not work for you, so feel free to add or remove features from my suggestions. So, the features that I mostly look for are:
- Protection against common data leakage vectors:
  - File transfer protocols
  - Instant messaging
  - Removable media
  - Social media
  - Web pages
- Ability to build a test environment to evaluate the effectiveness of the DLP solution, detect problems, identify false positives and fine-tune the policies overall.
- Predefined rules that cover your standards, jurisdiction and legal requirements, such as GDPR and PCI DSS
  - Since I am located in Europe, I also check that the vendor and the solution are compliant with GDPR
- Customizable built-in sensitive information types. This means that the DLP should detect an email that includes your own organization’s numbering scheme for personnel information or customer accounts
- Customizable rules are an absolute must
- Customizable and multilingual notifications that tell offending users about the policy violation and that their data has been blocked or removed. This will be important if you are a multilingual company or operate in different countries
- You definitely need customizable encryption requirements for sensitive data before it is stored, processed or transferred. For example, you will need to force encryption when transferring sensitive data outside the organization’s premises or to external storage or USB devices
- Ability to mask sensitive data. There is no use in having DLP if your IT support can read personal information that an employee mistakenly transferred to Google Drive.
- Detect sensitive data at rest, during processing and in transit
- Detect and protect sensitive information both on local systems and in the cloud
- Detect and protect sensitive information sent via SMTP, HTTP, HTTPS, NNTP, FTP, IM and other protocols or custom services.
- Logging and reporting are crucial. Usually you would want the ability to generate reports in multiple formats, get a report for any point in time, and report on the users and departments with the most incidents. The ability to continuously update risk profiles is also a good thing to have. Remember that if your organization is using some kind of SIEM solution, you need to check with the vendor that the DLP will be fully compatible with it
- Incident management is a core part of DLP solutions. You should look for at least the ability to mark incidents by category, by data and as false positives, to escalate incidents to the employee’s manager and to temporarily allow some actions, because you do not want to block business operations
- To make sure that the solution will work as intended, check that it is compatible with your infrastructure and security measures, for example with your operating systems, deployed software, browsers, websites, endpoint protection, encryption and so on. Make sure that you request confirmation from vendors on this.
- The DLP should be able to integrate with Active Directory, where it can gather information about users, roles, positions, departments, countries, organizational units and more.
- It’s 2018, so you will probably want single sign-on to avoid multiple accounts and improve the onboarding and offboarding processes
- One of the features that DLP solutions have is continuous data discovery, although each vendor might name it differently. This means that the solution should scan your organization’s premises for sensitive data, let you schedule scans, identify and categorize the discovered data and notify data owners that it exists
- The next feature is steganalysis (I am not sure how many vendors offer it, but it’s definitely something to look for). As you know, sensitive pieces of data can be hidden in plain sight in the bits and bytes of ordinary images and videos. To prevent data leakage by tech-savvy users, you might need to have this in place
- Data fingerprinting allows the DLP to collect data in a variety of formats such as Word, Excel and PDF. It then fingerprints that data with a hashing algorithm to produce an index that can be deployed as part of a DLP policy
- The last feature that I recommend you look for is vector machine learning, or VML. Here you provide the DLP with positive and negative examples of data at the training stage, during which features of the data are extracted to build a statistical profile that is used to classify the unstructured textual data that should be protected.
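To give a feel for the data fingerprinting idea from the list above, here is a simplified Python sketch. It assumes the text has already been extracted from the Word, Excel or PDF file, and it uses hashed five-word windows as the fingerprint. Real products use their own, more robust algorithms, so treat the function names and the window size as illustrative choices only.

```python
from __future__ import annotations

import hashlib
import re


def shingles(text: str, size: int = 5):
    """Yield overlapping windows of `size` lowercased words."""
    words = re.findall(r"\w+", text.lower())
    for i in range(max(len(words) - size + 1, 1)):
        yield " ".join(words[i:i + size])


def fingerprint(text: str, size: int = 5) -> set[str]:
    """Hash every window so that the index holds no readable content."""
    return {hashlib.sha256(s.encode("utf-8")).hexdigest()
            for s in shingles(text, size)}


def overlap(index: set[str], outgoing: str, size: int = 5) -> float:
    """Fraction of the outgoing text's windows that appear in the index."""
    candidate = fingerprint(outgoing, size)
    return len(candidate & index) / len(candidate)


protected = ("The quarterly salary report for all employees "
             "of the finance department is attached")
index = fingerprint(protected)  # this index is what a policy would reference
```

With this index, `overlap(index, protected)` is `1.0`, an unrelated message scores `0.0`, and a message that quotes part of the protected text lands somewhere in between. A policy could then trigger above a threshold that you tune during the pilot.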
Okay, now we know what to look for, but where should you place the DLP? While network-based solutions can be deployed more quickly and provide coverage on the internal network, they generally cannot scan encrypted traffic unless they are set up to decrypt it, and they can impact network performance when they are blocking inappropriate traffic in prevention mode. Additionally, if you want better protection on your endpoints no matter where they are located or how their network connection works, you might want to look into a host-based solution. A host-based solution is somewhat more difficult and slower to deploy, but it provides more direct control over user actions and covers systems that connect from outside the corporate network.
Lastly, I would suggest that you communicate with your IT and infrastructure teams and set up some workshops to understand how you want to structure the solution and to calculate the potential costs and resources that you will need.
The next step is people. The success of the DLP strategy will largely depend on them. Employees should be aware of the DLP implementation, your strategy, the consequences and, most importantly, their duties and responsibilities. If your organization does not already have a data governance policy, then I would suggest that you go for it and draft a simple instruction that you will use to define the roles. Depending on the size and culture of your organization, you would need at least the following roles.
Data owners are usually heads of department who are both responsible and accountable for the protection and classification of specific datasets. They appoint data stewards or, in some cases, data custodians, and delegate administrative activities to the data custodians.
You might also have a data steward, who can be a line manager and who operationalizes the data strategy and defines business rules.
The data custodian, on the other hand, is a data administrator responsible for implementing and maintaining security controls to meet the requirements determined by the data owner.
Data processors are users who utilize, process, access and transfer data. They are limited by the security controls put in place by the custodian as determined by the owner.
So in my example we have data owners, data stewards, data custodians and data processors. Now all that’s left is to define the responsibilities for each process. For this I would recommend going with a simple RACI chart. RACI stands for Responsible, Accountable, Consulted and Informed. With a little internet searching you can find some examples of it. Additionally, a RACI chart is good practice when presenting your strategy to management.
When responsibilities are defined, the next step for people is education and awareness. You might need to create and distribute training materials, gather questions from users and, in the beginning, conduct some Q&A sessions. Also remember the power of positive communication: try to present the DLP solution as something that will enhance trust in the company and improve everyday processes. Do not play the bad cop in this part; after all, you need DLP to support and protect the business.
Now, the implementation part is where you need to define how you will start and continue working with DLP. Try to structure the implementation process in phases, where the initial pilot covers high-value networks, hosts and departments. This will allow you to prove the technology, increase buy-in as benefits are realized, and gradually build skills, rules and data analysis capabilities. Use the pilot to learn about the data flow and find out what kind of data is going through that specific part of your organization. Start with detection mode and learn from it to improve your policies. Once you are confident that the DLP is working, try switching to prevention mode periodically and test how it goes. These short experiments will help you get even better at DLP implementation and give you the experience to start covering other parts of your organization.
That’s about it for DLP strategy. It is definitely a long-term engagement, especially for large organizations, and it requires hard work, but I think it’s totally worth it!