Analytic Function FIRST_VALUE

2021-10-02

Last week, I found myself with a T-SQL problem. A customer has a database where multiple "widget" records might exist in a table. End users might enter the duplicate rows, or they might be inserted by other automated processes. My task was to keep the most recent widget record. But, if there were any NULL values, look in the previous "duplicate" rows and use the most recent missing value for each column.

Here's a contrived data example. I needed to take the following result set...

Row_ID	Tomash_ID	Hoppler_ID	Atrovel_ID	Mogotrevo_ID	Buwheal_ID	Izzillent_ID
1	NULL	2	3	14	55	16
2	21	22	3	4	5	NULL
3	12	NULL	3	44	NULL	6
4	11	22	NULL	4	NULL	66
5	11	2	3	14	NULL	NULL
6	1	2	3	NULL	NULL	NULL

...and reduce it to this:

Row_ID	Tomash_ID	Hoppler_ID	Atrovel_ID	Mogotrevo_ID	Buwheal_ID	Izzillent_ID
6	1	2	3	14	5	66

There was an existing stored procedure that resolved the duplicates. But it had quite a bit of procedural code. There were separate queries that looked up the most recent value for each column, followed by an update to the "good" row. I was looking for a set-based approach. I'd made a few various attempts. Some were close, others were ridiculous, all were unsuccessful. Then I stumbled upon the FIRST_VALUE analytic function for T-SQL.

FIRST_VALUE

As its name indicates, FIRST_VALUE returns the first value in an ordered set of values. It requires an OVER clause along with an ORDER BY clause. There are optional PARTITION BY and ROWS/RANGE clauses -- check out the documentation for all the details.

Below is the query I used to get the desired result (along with some sample data so you can try yourself). The CASE expression is used to avoid returning NULL values--it orders them last. (There are IGNORE NULLS | RESPECT NULLS parameters that would seem to work well here, but the documentation says those only apply to Azure SQL Edge.)

SELECT TOP(1)
	Row_ID,
	FIRST_VALUE(Tomash_ID) OVER (ORDER BY CASE WHEN Tomash_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Tomash_ID_MostRecent,
	FIRST_VALUE(Hoppler_ID) OVER (ORDER BY CASE WHEN Hoppler_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Hoppler_ID_MostRecent,
	FIRST_VALUE(Atrovel_ID) OVER (ORDER BY CASE WHEN Atrovel_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Atrovel_ID_MostRecent,
	FIRST_VALUE(Mogotrevo_ID) OVER (ORDER BY CASE WHEN Mogotrevo_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Mogotrevo_ID_MostRecent,
	FIRST_VALUE(Buwheal_ID) OVER (ORDER BY CASE WHEN Buwheal_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Buwheal_ID_MostRecent,
	FIRST_VALUE(Izzillent_ID) OVER (ORDER BY CASE WHEN Izzillent_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Izzillent_ID_MostRecent
FROM #Test
ORDER BY Row_ID DESC

DROP TABLE IF EXISTS #Test;

CREATE TABLE #Test (
	Row_ID INT IDENTITY NOT NULL,
	Tomash_ID INT, 
	Hoppler_ID INT, 
	Atrovel_ID INT, 
	Mogotrevo_ID INT, 
	Buwheal_ID INT, 
	Izzillent_ID INT
);

INSERT INTO #Test VALUES(NULL,2,3,14,55,16);
INSERT INTO #Test VALUES(21,22,3,4,5,NULL);
INSERT INTO #Test VALUES(12,NULL,3,44,NULL,6);
INSERT INTO #Test VALUES(11,22,NULL,4,NULL,66);
INSERT INTO #Test VALUES(11,2,3,14,NULL,NULL);
INSERT INTO #Test VALUES(1,2,3,NULL,NULL,NULL);

The presence of a FIRST_VALUE function might lead you to believe there is a LAST_VALUE function. And there is! It returns the last value in an ordered set of values, but is otherwise the same in syntax and usage as FIRST_VALUE is. Happy coding, everyone.

FIRST_VALUE

Comments