Skip to main content
Dave Mason - Mastodon Dave Mason - LinkedIn Dave Mason - GitHub Dave Mason - Counter Social

STRING_AGG Return Types

Dave Mason SQL Server STRING_AGG

I had a chance to use the STRING_AGG function recently, or so I thought. Since I couldn't use it for "real world" work, I decided to give it a test drive on my own. My initial thought was to do some performance testing and blog about that, but I ran into an interesting problem that I'll discuss here for now.

I started by populating a table with some "string" data:


SELECT TOP(100000)
CAST(ROW_NUMBER() OVER (ORDER BY o.object_id) % 10 AS CHAR(1)) AS Digit,
CAST(ROW_NUMBER() OVER (ORDER BY o.object_id) AS VARCHAR(16)) AS RowNumber,
o.name AS ObjName
INTO tempdb.guest.Strings
FROM sys.objects o
CROSS JOIN sys.objects o2
CROSS JOIN sys.objects o3

My very first query attempt with STRING_AGG bombed:


SELECT STRING_AGG(s.digit, CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s

Msg 9829, Level 16, State 0, Line 5
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.

OK, fair enough. I've got 100k rows, times 3 bytes per row. That's well beyond the 8k bytes threshold. But why is that threshold only 8k bytes? Microsoft Docs has some answers. The [digit] column that I pass in as the first argument is data type CHAR(1). It's not a unicode data type, so I'll assume the return type is the same as for VARCHAR. Casting the [digit] column to VARCHAR allows my query to run:


SELECT STRING_AGG(CAST(s.digit AS VARCHAR(MAX)), CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s

I guessed the data type conversion via CAST might slow things down. And it probably does to a certain extent. But the results come back fast. Really fast. Now, back to that issue with the 8k limit. Here's something else I tried that still didn't work:


SELECT TOP(100) STRING_AGG(s.digit, CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s

Msg 9829, Level 16, State 0, Line 5
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.

100 rows times 3 bytes per row...that should work, right? If I look at the estimated execution plan, there's a table scan. The estimated number of rows is 100k, not 100. So I can see why the optimizer isn't cooperating.

Dave Mason SQL Server STRING_AGG Table Scan

This is a little bit of a mystery for me, but I'm not going to lose any sleep over it. (Please leave a comment if you know something that I don't.) As shown above, using CAST to VARCHAR(MAX) works. Even though the performance of STRING_AGG appears really fast, I'm hoping to do some further testing to compare its performance to the old way of concatenating strings from a result set. Stay tuned.

Comments

Post comment