A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Friday, March 15, 2013

SSIS-Encrypt data while in transit

A coworker asked if I had any experience with encrypting data while in transit with SSIS. Their client specifies that PII data must be protected at rest and in flight. They enforce the data at rest through something, it's not my client so I don't know the particulars but we'll be pulling data like a social security number, SSN, out of the financial system and into the data warehouse and while that data is between places, our ETL needs to safeguard that data.

I had never had to deal with encryption but in SSIS, or any ADO.NET connection, we can encrypt the connection and everything flowing down the wire should be encrypted. Encrypting Connections to SQL Server It looks like there’s some setup involved but once it’s done, it should be a matter of simply updating the connection string from a usage perspective. Update 2013-05-09 After attending Steve Jones's talk on The Encryption Primer, he suggested using IPSec tunneling as it will be far easier to implement than setting up SSL certs.

The second option was to use the native, but new-to-me, TSQL function for encryption and decryption. The following example shows me using Encrypt/DecryptBYPASSPHRASE functions to encrypt/decrypt the names from spt_values. At this point I will also point out, that crypto stuff has never been anything I've been passionate about so do your due diligence before you go implementing something you read on the interblag.
See also: Cryptographic functions

   1:  DECLARE @PassphraseEnteredByUser  nvarchar(128);
   2:  SET @PassphraseEnteredByUser  = 'Bill is the greatest';
   3:  WITH SRC AS
   4:  (
   5:      SELECT
   6:          SV.name
   7:      ,   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS  FakeSK
   8:      FROM
   9:          master.dbo.spt_values AS SV
  10:  )
  11:  , ENCRYPTO AS
  12:  (
  13:      SELECT
  14:          S.name
  15:      ,   EncryptByPassPhrase
  16:          (
  17:              @PassphraseEnteredByUser
  18:          ,   S.name
  19:          ,   1
  20:          ,   CONVERT( varbinary,  S.FakeSK)
  21:          ) AS  encrypted_text
  22:      ,   S.FakeSK
  23:      FROM
  24:          SRC S
  25:  )
  26:  , DECRYPTO AS
  27:  (
  28:      SELECT
  29:          E.*
  30:      ,   CONVERT(nvarchar, DECRYPTBYPASSPHRASE
  31:          (
  32:              @PassphraseEnteredByUser
  33:          ,   E.encrypted_text
  34:          ,   1
  35:          ,   CONVERT( varbinary,  E.FakeSK)
  36:          )) AS  decrypted_text
  37:      FROM
  38:          ENCRYPTO E
  39:  )
  40:  SELECT
  41:      D.* 
  42:  FROM 
  43:      DECRYPTO AS D;

In the SSIS world, we could use this as our source query. The challenge around doing that we’d either need to write the encrypted value into a staging table and then decrypt in a post cleanup step (Execute SQL task) or decrypt in the data flow prior to writing to the target table. But at that point, we’d have unencrypted SSNs in memory. If we run out of memory in the DF and spill to disk, the temp files would have the unencrypted SSN in them too. The automatic, post execution step would handle the cleanup on those files but there’d be the ghost image of them if people were NSA crazy about getting the data back. Either way, that might violate the client's rules on PII security.

The other thing to note about encrypting the data is that you’re going to go from a simple 9 byte allocation per row to varbinary(8000) which will result in the data flow either writing the binary portion to disk and carrying a pointer along to that field in the in memory portion or the engine will have to allocate 8k bytes of memory per row in the buffer. Either way, you’re probably going to feel a performance impact for large datasets but we’d have to test before we can know the possible severity.

No comments: