Compress & Uncompress Query Results for Faster Data Transfer over TDS
Working with remote data in the cloud has become a challenge over the last few years. When you're on-premises, querying remote data can be slow. Two big factors working against you are the limits of your network bandwidth and the TDS protocol. I'm not deeply versed in TDS, but from what I've read, it sends query results uncompressed from server to client; the protocol has no native compression support. (I've also read there are some third-party software/hardware solutions that compress TDS data streams. It may also be possible to create a VPN tunnel that compresses the TDS traffic.)
Consider the following linked server query involving a table of GeneralJournalAccountEntry data from Dynamics 365. It takes about the same amount of time to complete whether table compression is enabled or not: ROW compression, PAGE compression, a clustered COLUMNSTORE index...none of that makes any difference to the data sent over TDS.
SELECT TOP(10000) * FROM [MyLinkedServer].[RemoteAzureDatabase].dbo.GeneralJournalAccountEntry;
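For context, these are the kinds of storage-level compression options I'm talking about. The statements below are only a sketch to show roughly how ROW/PAGE compression or a clustered columnstore index would be applied to the remote table; they would run against the remote database, and the index name CCI_GeneralJournalAccountEntry is just a placeholder.

-- ROW or PAGE compression on the remote table:
ALTER TABLE dbo.GeneralJournalAccountEntry
    REBUILD WITH (DATA_COMPRESSION = ROW);   -- or DATA_COMPRESSION = PAGE

-- Or, alternatively, a clustered columnstore index:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_GeneralJournalAccountEntry
    ON dbo.GeneralJournalAccountEntry;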
Now consider this alternative. OPENQUERY is used to "pass through" a query to the remote server in Azure. It selects 10,000 rows from a table FOR JSON AUTO, compresses the JSON data via the native T-SQL function COMPRESS(), and assigns it to variable @RemoteVarBinData...all on the remote server. The compressed VARBINARY data is sent across the network (via the TDS protocol) to the client, where it is assigned to local variable @VarBinData. From there, the data is decompressed via the T-SQL function DECOMPRESS() and assigned to local variable @JsonData. Lastly, the JSON data is shredded to a tabular result set via OPENJSON().
DECLARE @VarBinData VARBINARY(MAX);

SELECT @VarBinData =
(
    SELECT *
    FROM OPENQUERY([MyLinkedServer], '
        DECLARE @RemoteVarBinData VARBINARY(MAX);
        SELECT @RemoteVarBinData =
            COMPRESS(
                (
                    SELECT TOP(10000) *
                    FROM [RemoteAzureDatabase].dbo.GeneralJournalAccountEntry
                    FOR JSON AUTO
                )
            );
        SELECT @RemoteVarBinData;') AS oq
);

DECLARE @JsonData NVARCHAR(MAX) = CAST(DECOMPRESS(@VarBinData) AS NVARCHAR(MAX));

SELECT *
FROM OPENJSON(@JsonData)
WITH
(
    [$FileName] NVARCHAR(1024) '$."$FileName"',
    [_SysRowId] BIGINT '$."_SysRowId"',
    [LSN] NVARCHAR(60) '$."LSN"',
    [LastProcessedChange_DateTime] DATETIME2 '$."LastProcessedChange_DateTime"',
    [DataLakeModified_DateTime] DATETIME2 '$."DataLakeModified_DateTime"',
    [RECID] BIGINT '$."RECID"',
    [ACCOUNTINGCURRENCYAMOUNT] DECIMAL(32,6) '$."ACCOUNTINGCURRENCYAMOUNT"',
    [ALLOCATIONLEVEL] BIGINT '$."ALLOCATIONLEVEL"',
    [GENERALJOURNALENTRY] BIGINT '$."GENERALJOURNALENTRY"',
    [HISTORICALEXCHANGERATEDATE] DATETIME2 '$."HISTORICALEXCHANGERATEDATE"',
    [ISCORRECTION] BIGINT '$."ISCORRECTION"',
    [ISCREDIT] BIGINT '$."ISCREDIT"',
    [LEDGERACCOUNT] NVARCHAR(100) '$."LEDGERACCOUNT"',
    [LEDGERDIMENSION] BIGINT '$."LEDGERDIMENSION"',
    [POSTINGTYPE] BIGINT '$."POSTINGTYPE"',
    [QUANTITY] DECIMAL(32,6) '$."QUANTITY"',
    [REPORTINGCURRENCYAMOUNT] DECIMAL(32,6) '$."REPORTINGCURRENCYAMOUNT"',
    [TEXT] NVARCHAR(1000) '$."TEXT"',
    [TRANSACTIONCURRENCYAMOUNT] DECIMAL(32,6) '$."TRANSACTIONCURRENCYAMOUNT"',
    [TRANSACTIONCURRENCYCODE] NVARCHAR(100) '$."TRANSACTIONCURRENCYCODE"',
    [MAINACCOUNT] BIGINT '$."MAINACCOUNT"',
    [PROJID_SA] NVARCHAR(100) '$."PROJID_SA"',
    [PROJTABLEDATAAREAID] NVARCHAR(100) '$."PROJTABLEDATAAREAID"',
    [REASONREF] BIGINT '$."REASONREF"',
    [PAYMENTREFERENCE] NVARCHAR(100) '$."PAYMENTREFERENCE"',
    [ASSETLEASEPOSTINGTYPES] BIGINT '$."ASSETLEASEPOSTINGTYPES"',
    [ASSETLEASETRANSACTIONTYPE] BIGINT '$."ASSETLEASETRANSACTIONTYPE"',
    [PARTITION] BIGINT '$."PARTITION"',
    [RECVERSION] BIGINT '$."RECVERSION"',
    [CREATEDTRANSACTIONID] BIGINT '$."CREATEDTRANSACTIONID"'
);
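If the moving parts are unfamiliar, here is the same idea reduced to a minimal round trip you can run locally, with no linked server involved. This is just an illustrative sketch; the #Widget temp table and its columns are made up.

-- A made-up table, purely to demonstrate the round trip.
CREATE TABLE #Widget (WidgetId INT, WidgetName NVARCHAR(50));
INSERT INTO #Widget VALUES (1, N'Anvil'), (2, N'Sprocket');

-- "Remote" half: serialize the rows to JSON and compress the result.
DECLARE @Compressed VARBINARY(MAX) =
    COMPRESS((SELECT * FROM #Widget FOR JSON AUTO));

-- "Local" half: decompress and shred the JSON back into a tabular result set.
DECLARE @Json NVARCHAR(MAX) = CAST(DECOMPRESS(@Compressed) AS NVARCHAR(MAX));

SELECT *
FROM OPENJSON(@Json)
WITH
(
    WidgetId   INT          '$.WidgetId',
    WidgetName NVARCHAR(50) '$.WidgetName'
);

DROP TABLE #Widget;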
The full query is a lot more code than the first one, but hopefully it's not too complicated to follow. I ran tests with 10,000 rows, 100,000 rows, and 1 million rows. Here are my results:
Elapsed time (mm:ss.ff)

Rows Selected | Traditional/Uncompressed | Compressed JSON
10k           | 00:02.39                 | 00:01.49
100k          | 00:23.15                 | 00:14.85
1M            | 03:43.00                 | 02:20.00
Elapsed time savings increase as the size of the result set increases. I did not try any result sets larger than a million rows, so I'm not sure how well this scales. It's hard to imagine using this approach for ad hoc queries. But for a stored procedure where performance is imperative, it might make sense.
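For example, the whole pattern could be wrapped in a stored procedure on the local server so callers simply get rows back. The sketch below is illustrative only: the procedure name is hypothetical, and the OPENJSON mapping lists just two columns instead of the full column set shown earlier.

CREATE OR ALTER PROCEDURE dbo.GetGeneralJournalAccountEntry_Compressed
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @VarBinData VARBINARY(MAX);

    -- Fetch the compressed JSON payload from the remote server
    -- (same pass-through query as shown above).
    SELECT @VarBinData =
    (
        SELECT *
        FROM OPENQUERY([MyLinkedServer], '
            DECLARE @RemoteVarBinData VARBINARY(MAX);
            SELECT @RemoteVarBinData =
                COMPRESS((SELECT TOP(10000) *
                          FROM [RemoteAzureDatabase].dbo.GeneralJournalAccountEntry
                          FOR JSON AUTO));
            SELECT @RemoteVarBinData;') AS oq
    );

    -- Decompress and shred locally. Only two columns are mapped here to keep
    -- the sketch short; a real procedure would list the full column set.
    DECLARE @JsonData NVARCHAR(MAX) = CAST(DECOMPRESS(@VarBinData) AS NVARCHAR(MAX));

    SELECT *
    FROM OPENJSON(@JsonData)
    WITH
    (
        [RECID]         BIGINT        '$."RECID"',
        [LEDGERACCOUNT] NVARCHAR(100) '$."LEDGERACCOUNT"'
    );
END;

A caller would then just run EXEC dbo.GetGeneralJournalAccountEntry_Compressed; and work with an ordinary result set.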
What if you're not using SSMS? What then? I did some very basic testing in C#. I was able to run the COMPRESS query on a remote SQL Server, read the VARBINARY(MAX) result into a C# byte[] array, decompress it to a JSON string, and deserialize it into a DataTable. Since COMPRESS() uses the GZIP algorithm, the payload can be decompressed client-side with GZipStream; and because FOR JSON produces NVARCHAR (UTF-16) data, the decompressed bytes are read with Encoding.Unicode.
using System;
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.Data.SqlClient;
using Newtonsoft.Json;

using (SqlConnection conn = new SqlConnection())
{
    SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
    builder.DataSource = "RemoteSqlServerHost";
    builder.IntegratedSecurity = true;
    builder.TrustServerCertificate = true;
    conn.ConnectionString = builder.ConnectionString;
    conn.Open();

    using (SqlCommand cmd = new SqlCommand())
    {
        cmd.Connection = conn;

        // Run the COMPRESS query on the remote server and return the VARBINARY payload.
        cmd.CommandText =
            "DECLARE @RemoteVarBinData VARBINARY(MAX);" + Environment.NewLine +
            "SELECT @RemoteVarBinData = " + Environment.NewLine +
            "    COMPRESS(" + Environment.NewLine +
            "        (" + Environment.NewLine +
            "            SELECT TOP(10000) * " + Environment.NewLine +
            "            FROM RemoteAzureDatabase.dbo.GeneralJournalAccountEntry" + Environment.NewLine +
            "            FOR JSON AUTO" + Environment.NewLine +
            "        )" + Environment.NewLine +
            "    )" + Environment.NewLine +
            "SELECT @RemoteVarBinData AS RemoteData;";

        SqlDataReader dr = cmd.ExecuteReader();

        if (dr.Read())
        {
            // COMPRESS() produces GZIP data, so GZipStream can decompress it.
            byte[] CompressedData = (byte[])dr[0];

            using (var inputStream = new MemoryStream(CompressedData))
            {
                using (var gZipStream = new GZipStream(inputStream, CompressionMode.Decompress, true))
                {
                    // FOR JSON returns NVARCHAR (UTF-16), hence Encoding.Unicode.
                    using (var streamReader = new StreamReader(gZipStream, Encoding.Unicode))
                    {
                        string jsonData = streamReader.ReadToEnd();

                        // Newtonsoft.Json can deserialize a JSON array directly into a DataTable.
                        DataTable? dt = JsonConvert.DeserializeObject(jsonData, typeof(DataTable)) as DataTable;
                    }
                }
            }
        }
    }
}