STRING_AGG Return Types
- Posted in:
- Development
- SQL Server 2017
- T-SQL
I had a chance to use the STRING_AGG function recently, or so I thought. Since I couldn't use it for "real world" work, I decided to give it a test drive on my own. My initial thought was to do some performance testing and blog about that, but I ran into an interesting problem that I'll discuss here for now.
SELECT TOP(100000)
CAST(ROW_NUMBER() OVER (ORDER BY o.object_id) % 10 AS CHAR(1)) AS Digit,
CAST(ROW_NUMBER() OVER (ORDER BY o.object_id) AS VARCHAR(16)) AS RowNumber,
o.name AS ObjName
INTO tempdb.guest.Strings
FROM sys.objects o
CROSS JOIN sys.objects o2
CROSS JOIN sys.objects o3
My very first query attempt with STRING_AGG bombed:
SELECT STRING_AGG(s.digit, CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s
Msg 9829, Level 16, State 0, Line 5
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.
OK, fair enough. I've got 100k rows, times 3 bytes per row. That's well beyond the 8k bytes threshold. But why is that threshold only 8k bytes? Microsoft Docs has some answers. The [digit] column that I pass in as the first argument is data type CHAR(1). It's not a unicode data type, so I'll assume the return type is the same as for VARCHAR. Casting the [digit] column to VARCHAR allows my query to run:
SELECT STRING_AGG(CAST(s.digit AS VARCHAR(MAX)), CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s
I guessed the data type conversion via CAST might slow things down. And it probably does to a certain extent. But the results come back fast. Really fast. Now, back to that issue with the 8k limit. Here's something else I tried that still didn't work:
SELECT TOP(100) STRING_AGG(s.digit, CHAR(13) + CHAR(10))
FROM tempdb.guest.Strings s
Msg 9829, Level 16, State 0, Line 5
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.
100 rows times 3 bytes per row...that should work, right? If I look at the estimated execution plan, there's a table scan. The estimated number of rows is 100k, not 100. So I can see why the optimizer isn't cooperating.
This is a little bit of a mystery for me, but I'm not going to lose any sleep over it. (Please leave a comment if you know something that I don't.) As shown above, using CAST to VARCHAR(MAX) works. Even though the performance of STRING_AGG appears really fast, I'm hoping to do some further testing to compare its performance to the old way of concatenating strings from a result set. Stay tuned.
Comments