Last week, I found myself with a T-SQL problem. A customer has a database where multiple "widget" records might exist in a table. End users might enter the duplicate rows, or they might be inserted by other automated processes. My task was to keep the most recent widget record. But, if there were any NULL values, look in the previous "duplicate" rows and use the most recent missing value for each column.
Here's a contrived data example. I needed to take the following result set...
...and reduce it to this:
There was an existing stored procedure that resolved the duplicates. But it had quite a bit of procedural code. There were separate queries that looked up the most recent value for each column, followed by an update to the "good" row. I was looking for a set-based approach. I'd made a few various attempts. Some were close, others were ridiculous, all were unsuccessful. Then I stumbled upon the FIRST_VALUE analytic function for T-SQL.
As its name indicates, FIRST_VALUE returns the first value in an ordered set of values. It requires an OVER clause along with an ORDER BY clause. There are optional PARTITION BY and ROWS/RANGE clauses -- check out the documentation for all the details.
Below is the query I used to get the desired result (along with some sample data so you can try yourself). The CASE expression is used to avoid returning NULL values--it orders them last. (There are IGNORE NULLS | RESPECT NULLS parameters that would seem to work well here, but the documentation says those only apply to Azure SQL Edge.)
SELECT TOP(1) Row_ID, FIRST_VALUE(Tomash_ID) OVER (ORDER BY CASE WHEN Tomash_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Tomash_ID_MostRecent, FIRST_VALUE(Hoppler_ID) OVER (ORDER BY CASE WHEN Hoppler_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Hoppler_ID_MostRecent, FIRST_VALUE(Atrovel_ID) OVER (ORDER BY CASE WHEN Atrovel_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Atrovel_ID_MostRecent, FIRST_VALUE(Mogotrevo_ID) OVER (ORDER BY CASE WHEN Mogotrevo_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Mogotrevo_ID_MostRecent, FIRST_VALUE(Buwheal_ID) OVER (ORDER BY CASE WHEN Buwheal_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Buwheal_ID_MostRecent, FIRST_VALUE(Izzillent_ID) OVER (ORDER BY CASE WHEN Izzillent_ID IS NULL THEN 1 ELSE 0 END, Row_ID DESC) AS Izzillent_ID_MostRecent FROM #Test ORDER BY Row_ID DESC
DROP TABLE IF EXISTS #Test; CREATE TABLE #Test ( Row_ID INT IDENTITY NOT NULL, Tomash_ID INT, Hoppler_ID INT, Atrovel_ID INT, Mogotrevo_ID INT, Buwheal_ID INT, Izzillent_ID INT ); INSERT INTO #Test VALUES(NULL,2,3,14,55,16); INSERT INTO #Test VALUES(21,22,3,4,5,NULL); INSERT INTO #Test VALUES(12,NULL,3,44,NULL,6); INSERT INTO #Test VALUES(11,22,NULL,4,NULL,66); INSERT INTO #Test VALUES(11,2,3,14,NULL,NULL); INSERT INTO #Test VALUES(1,2,3,NULL,NULL,NULL);
The presence of a FIRST_VALUE function might lead you to believe there is a LAST_VALUE function. And there is! It returns the last value in an ordered set of values, but is otherwise the same in syntax and usage as FIRST_VALUE is. Happy coding, everyone.