SQL 2016 Dynamic data masking

SQL 2016 Dynamic data masking

Dynamic data masking in SQL Server 2016 limits sensitive data exposure by masking it to non-privileged users. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. Dynamic data masking complements other SQL Server security features, including auditing, encryption, and row-level security. Using the dynamic data masking feature, in conjunction with other SQL Server security features, helps better protect sensitive data.

Set-up is fairly simple. You choose a table and column, determine how much of the data you want the users to see, and apply a function to mask it. If there is a group that needs to see all the data, you can apply permissions for them to do so.

Before Masking

I’m going to run one query:

SELECT PO.PurchaseOrderID, PO.OrderDate, PO.SupplierID, POL.PurchaseOrderLineID, POL.OrderedOuters, POL.ReceivedOuters, POL.StockItemID, POL.Description

FROM Purchasing.PurchaseOrders AS PO

INNER JOIN Purchasing.PurchaseOrderLines AS POL ON ON POL.PurchaseOrderID = PO.PurchaseOrderID;

The result is

Applying a Mask

The mask gets applied by issuing an ALTER TABLE…ALTER COLUMN…ADD MASKED WITH command.

The options for masking functions are:

Function Before mask After mask
Default() – masks all the data Number: 487

Text: Hany

(FUNCTION = ‘default()’)

Number: 000

Text: XXX

Email() – masks everything but first letter, @, and suffix Hany@email.com (FUNCTION = ’email()’)

jXX@XXXXX.com

Partial() – custom mask in which you determine how much will be shown 985-744-5238 (FUNCTION = ‘partial(1,”XXXXXXX”,0)’)

9XXXXXXXXX

 

(FUNCTION = ‘partial(3,”XXXXXXX”,0)’)

985XXXXXXX

Random() – mask a number within a range 487 (FUNCTION = ‘random(1,1000)’)

198, or perhaps 633, or maybe 1000

 

 

I’m applying the default mask to the Supplier ID column

 

ALTER TABLE Purchasing.PurchaseOrders

ALTER COLUMN SupplierID ADD MASKED WITH (FUNCTION = ‘default()’);

Controlling access to masked data

The key to controlling access to masked data is in the privileges granted to the user accounts. When I created the table and then queried the data, I was logged in with an administrative account. However, an account that has been granted only SELECT permission on the table will see the data as masked.

To test this out, let’s start by creating a local user account on the EmpData4 database and granting that account the SELECT permission:

1

2

3

4

5

USE EmpData4;

  GO

  CREATE USER user1 WITHOUT LOGIN;

  GRANT SELECT ON OBJECT::dbo.EmpInfo TO user1;  

  GO

This should all be fairly straightforward. We’ve created an account in the database that is not linked to a login (to keep things simple for our testing), and then we granted the SELECT permission to that account to allow read access to the EmpInfo table.

Now let’s run our SELECT statement again, only this time, we’ll do it within the context of the new user:

1

2

3

4

5

6

EXECUTE AS USER = ‘user1’;

  SELECT TOP 5

    EmpID, FirstName, LastName, 

    Birthdate, CurrentFlag, SalesLastYear 

  FROM EmpInfo;

  REVERT;

We’re simply using the EXECUTE AS and REVERT statements to get the data that the new user would see. As the following table shows, the values in the four columns are now very different from how they’re stored in the database.

EmpID FirstName LastName Birthdate CurrentFlag SalesLastYear
274 Stephen xxxx 1900-01-01 0 0.00
275 Michael xxxx 1900-01-01 0 0.00
276 Linda xxxx 1900-01-01 0 0.00
277 Jillian xxxx 1900-01-01 0 0.00
279 Tsvi xxxx 1900-01-01 0 0.00

To begin with, the database engine returns each LastName value as xxxx. The LastName column is defined with the NVARCHAR(50) data type. The database engine returns the xxxx value for all masked character data, unless the column is defined at a smaller size. For example, a CHAR(2) column will return a masked value of xx.

The database engine handles date/time values differently. Consider the Birthdate column, which is configured with the DATE data type. Each value is returned as 1900-01-01 (or whatever format is consistent with your regional settings). If this had been some variation of the DATETIME data type, such as DATETIME2, the returned values would be 1900-01-01 00:00:00:0000000.

For numeric columns, the database engine returns 0 or 0.00, if decimals are involved. In this case, the CurrentFlag column is configured with the BIT data type, which qualifies as numeric, and the SalesLastYear is configured with the MONEY data type.

Regardless of the data type, the default function results in the entire value being somehow masked, which means an unauthorized user would not be able to infer the actual value other than to know the type of data, such as character or date.

In some cases, you might want to allow a user to view the data as unmasked. To do so, you would grant the UNMASK permission to the account:

1

2

GRANT UNMASK TO user1;

GO

Now when you run the SELECT statement as User1, the results will show the unmasked data. However, if you then decide that the account should not be able to access the data as unmasked, you can revoke the permission:

1

2

REVOKE UNMASK TO user1;

GO

After you run this statement, the user will again see the data as masked, as defined by the masking rules in the column definitions.

Verifying column masks

If you want to view which columns are masked in your database and how that masking is implemented, you can use the sys.masked_columns system view, as shown in the following SELECT statement:

1

2

3

4

5

SELECT OBJECT_NAME(object_id) TableName,

name ColumnName,

masking_function MaskFunction

FROM sys.masked_columns

ORDER BY TableName, ColumnName;

If you run the statement against our example database, you should see the following results.

TableName ColumnName MaskFunction
EmpInfo Birthdate default()
EmpInfo CurrentFlag default()
EmpInfo LastName default()
EmpInfo SalesLastYear default()

In this case, we’ve defined masking rules on only four columns in the EmpInfo table, with all rules using the default function. If we had used functions that require arguments, those arguments would be shown here as well.

Applying an email mask

The next function we’ll look at is email, which is used specifically to mask email addresses. The email function is as simple to implement as the default function, except that it returns somewhat different results. The default function masks the entire value, whereas the email function masks all but the first letter and, in some cases, the Internet domain (e.g., .com, .org, .edu).

The behavior of the email function might seem a bit odd because it returns the .com domain for all email addresses. For example, if the email address ends in .org or .edu, the masked value will read .com, as it would if the domain actually were .com. Another way to look at this is that the function masks the domain for all email addresses except those that end with .com.

Regardless of the domain issue, the process of defining the masking rule is quite straightforward:

1

2

3

ALTER TABLE EmpInfo

ALTER COLUMN EmailAddress NVARCHAR(50)

MASKED WITH (FUNCTION = ’email()’) NULL;

In this case, we’re simply modifying the EmailAddress column to include the masking rule, with the email function specified. Now let’s query the table as our privileged user:

1

2

SELECT TOP 5 EmpID, EmailAddress

FROM EmpInfo;

As expected, the SELECT statement returns the results shown in the following table.

EmpID EmailAddress
274 stephen0@adventure-works.com
275 michael9@adventure-works.com
276 linda3@adventure-works.com
277 jillian0@adventure-works.com
279 tsvi0@adventure-works.com

This time, we’ll run the SELECT statement as user1:

1

2

3

4

EXECUTE AS USER = ‘user1’;

SELECT TOP 5 EmpID, EmailAddress

FROM EmpInfo;

REVERT;

Now our results now look quite different.

EmpID EmailAddress
274 sXXX@XXXX.com
275 mXXX@XXXX.com
276 lXXX@XXXX.com
277 jXXX@XXXX.com
279 tXXX@XXXX.com

That’s all there is to masking email addresses. You might decide it’s just as easy to go with the default function to mask the data. In most cases, the value xxxx will work as effectively as tXXX@XXXX.com. The only advantage to the latter is that it telegraphs to the user that this value is a masked email address, and perhaps that’s exactly what you want to do.

Giving users rights to see the masked data

There may be some groups of users that should be able to view the data in an unmasked state. In that case, you can grant them permission to see unmasked data.

GRANT UNMASK TO PurchaseOrdersApp;

GO

If the user or group should go back to seeing masked data, you can revoke the permission.

REVOKE UNMASK FROM PurchaseOrdersApp;

GO

Remove data masking

At a future point, you may choose to remove data masking. This is easily accomplished with the DROP MASKED command.

ALTER TABLE Purchasing.PurchaseOrders

ALTER COLUMN SupplierID DROP MASKED;

 

You can query the sys.masked_columns view to determine which columns of which tables in your database are masked

 

SELECT c.name, tbl.name as table_name, c.is_masked, c.masking_function

 

FROM sys.masked_columns AS c

JOIN sys.tables AS tbl ON c.[object_id] = tbl.[object_id]

WHERE is_masked = 1;

is_masked = 1;

Protecting sensitive data

Dynamic data masking is a great tool to use in combination with other security features (such as encryption) to protect sensitive data.

 

Author:Hany Ali