Skip to main content
Dave Mason - Mastodon Dave Mason - LinkedIn Dave Mason - GitHub Dave Mason - Counter Social

Pivoting Data Without TSQL

Dave Mason - SQL Server - sp_execute_external_script

Pivoting data in SQL Server is something that comes up infrequently for me. Whenever the need arises, I have to pause and ask myself "What is it I'm trying to do again?". Next I go to the documentation for the T-SQL PIVOT syntax (which I'll never commit to memory) and realize for the umpteenth time the pivoted values have to be hard coded. Then I ponder using dynamic T-SQL because I won't always know the values to pivot at query design time.

If T-SQL isn't a good hammer to PIVOT's nail, perhaps R is. There are different packages that make summarizing and transposing data frames somewhat easy. It can even dynamically pivot unknown values at runtime. But there is a catch, which I'll get to in a bit.

Here's an [AdventureWorks] query showing purchase orders, including the employee that created each PO, the subtotal, and the shipping method.


SELECT
p.FirstName + ' ' + p.LastName AS EmployeeName,
h.SubTotal,
sm.Name AS ShippingMethod
FROM AdventureWorks.Purchasing.PurchaseOrderHeader h
JOIN AdventureWorks.HumanResources.Employee e
ON e.BusinessEntityID = h.EmployeeID
JOIN AdventureWorks.Person.Person p
ON p.BusinessEntityID = h.EmployeeID
JOIN AdventureWorks.Purchasing.ShipMethod sm
ON sm.ShipMethodID = h.ShipMethodID

SQL Server AdventureWorks

While there are likely countless methods and packages that can be used to pivot a data frame in R, I'll focus on the dcast function from the reshape2 package. Here's a small block of code to demonstrate, along with the output:



library(RODBC)
library(reshape2)

dbhandle <- odbcDriverConnect(
'driver={SQL Server};server=.\\SQL2017;database=AdventureWorks;trusted_connection=true')
res <- sqlQuery(dbhandle,
"SELECT
p.FirstName + ' ' + p.LastName AS EmployeeName,
h.SubTotal,
sm.Name AS ShippingMethod
FROM Purchasing.PurchaseOrderHeader h
JOIN HumanResources.Employee e
ON e.BusinessEntityID = h.EmployeeID
JOIN Person.Person p
ON p.BusinessEntityID = h.EmployeeID
JOIN Purchasing.ShipMethod sm
ON sm.ShipMethodID = h.ShipMethodID"
)
po_sum <- dcast(res, EmployeeName ~ ShippingMethod, value.var =
"SubTotal", fun.aggregate = sum)
po_sum

Dave Mason - Microsoft R Client

Most of the R code is a database connection and query. The result set is assigned to the res data frame. The pivoting is done with a single line of code at line 16 that invokes the dcast function. It specifies EmployeeNames as row values, ShippingMethods pivoted as columns, and SubTotals summed. No matter how many different ShippingMethods there are, the dcast function dynamically creates columns for them as needed.

Now, let's see what this looks like using R in-database with sp_execute_external_script:


DECLARE @QueryIn NVARCHAR(MAX) = N'SELECT
p.FirstName + '' '' + p.LastName AS EmployeeName,
h.SubTotal,
sm.Name AS ShippingMethod
FROM AdventureWorks.Purchasing.PurchaseOrderHeader h
JOIN AdventureWorks.HumanResources.Employee e
ON e.BusinessEntityID = h.EmployeeID
JOIN AdventureWorks.Person.Person p
ON p.BusinessEntityID = h.EmployeeID
JOIN AdventureWorks.Purchasing.ShipMethod sm
ON sm.ShipMethodID = h.ShipMethodID'
;

EXEC sp_execute_external_script
@language = N'R',
@script = N'
library(reshape2)
po_sum <- dcast(purchase_orders,
purchase_orders$EmployeeName ~ purchase_orders$ShippingMethod,
value.var = "SubTotal", fun.aggregate = sum)'
,
@input_data_1 = @QueryIn,
@input_data_1_name = N'purchase_orders',
@output_data_1_name = N'po_sum';

Dave Mason - SSMS - sp_execute_external_script

We have our same pivoted result set of 12 rows and 6 columns. But something's missing: the column names. And that's the catch. If you don't use WITH RESULT SETS when invoking sp_execute_external_script, you don't get column names. That's especially important here because the pivoted data values would normally be the column names.

This is one aspect of sp_execute_external_script that vexes me. SQL Server won't infer the column names on its own, even though that meta data is readily available. R has a lot of potential for developers and DBAs, even though we may not be data scientists per se. I'm hoping Microsoft will address this shortcoming in a future Cumulative Update in all versions from SQL 2016 forward.

And maybe you can help! If you would, please upvote this customer feedback item. Thanks for reading...and thanks for voting!

Comments

Post comment