How to get the last not-null value in an ordered column of a huge table?


























I have the following input:



 id | value
----+-------
  1 |   136
  2 |  NULL
  3 |   650
  4 |  NULL
  5 |  NULL
  6 |  NULL
  7 |   954
  8 |  NULL
  9 |   104
 10 |  NULL


I expect the following result:



 id | value
----+-------
  1 |   136
  2 |   136
  3 |   650
  4 |   650
  5 |   650
  6 |   650
  7 |   954
  8 |   954
  9 |   104
 10 |   104


The trivial solution would be to join the table to itself on t1.id <= t2.id, and then select the MAX id in a GROUP BY:



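WITH tmp AS (
    -- for each row t2, find the largest id <= t2.id whose value is not NULL
    SELECT t2.id, MAX(t1.id) AS lastKnownId
    FROM t t1
    INNER JOIN t t2
        ON t2.id >= t1.id
    WHERE t1.value IS NOT NULL
    GROUP BY t2.id
)
-- look up the value stored at that id
SELECT tmp.id, t.value
FROM t
INNER JOIN tmp
    ON t.id = tmp.lastKnownId;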


However, a naive execution of this query internally produces a number of rows proportional to the square of the input row count (O(n^2)). I expected T-SQL to optimize this away: at the block/record level the task is simple and linear, essentially a for loop (O(n)).
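The two algorithms being compared can be sketched in Python. This models the logic only, not how SQL Server executes it, and the names `fill_naive`, `fill_linear` and `QUESTION_ROWS` are illustrative. `fill_naive` mirrors the self-join plus `GROUP BY` and inspects every pair of rows, while `fill_linear` is the single ordered pass described above:

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (id, value); None stands for NULL

# the question's sample input
QUESTION_ROWS: List[Row] = [
    (1, 136), (2, None), (3, 650), (4, None), (5, None),
    (6, None), (7, 954), (8, None), (9, 104), (10, None),
]


def fill_naive(rows: List[Row]) -> List[Row]:
    """Model of the self-join + GROUP BY: looks at O(n^2) row pairs."""
    value_by_id = dict(rows)
    result: List[Row] = []
    for id2, _ in sorted(rows, key=lambda r: r[0]):  # t2
        known = [id1 for id1, v in rows               # t1
                 if v is not None and id2 >= id1]
        if known:  # the inner join drops rows with no earlier non-NULL row
            result.append((id2, value_by_id[max(known)]))
    return result


def fill_linear(rows: List[Row]) -> List[Row]:
    """One pass in id order, remembering the last non-NULL value: O(n) once rows are ordered."""
    result: List[Row] = []
    last: Optional[int] = None
    for id_, v in sorted(rows, key=lambda r: r[0]):
        if v is not None:
            last = v
        result.append((id_, last))
    return result


print(fill_linear(QUESTION_ROWS))
```

Both return the expected result for the sample input. They differ only when the table starts with NULLs: the join drops those rows, while the loop returns them with None.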



In my experiments, however, the latest MS SQL 2016 cannot optimize this query properly, which makes it impossible to execute on a large input table.



Furthermore, the query has to run quickly, which rules out a similarly simple (but very different) cursor-based solution.



Using a memory-backed temporary table could be a good compromise, but I am not sure it would run significantly faster, considering that my example query using subqueries didn't work.



I am also thinking of digging out some window function from the T-SQL docs that could be tricked into doing what I want. For example, a cumulative sum does something very similar, but I couldn't trick it into returning the latest non-null element instead of the sum of the preceding elements.



The ideal solution would be a fast query without procedural code or temporary tables. A solution that uses temporary tables is also acceptable, but iterating over the table procedurally is not.










Tags: t-sql

edited 7 hours ago by peterh
asked 7 hours ago by peterh




















3 Answers

































> I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).




That's not the query that you wrote. Depending on some otherwise minor detail of the table schema, that linear algorithm may not even be equivalent to your query. You're expecting too much from the query optimizer.



          With the right indexing you can get the algorithm that you seek through the following T-SQL:



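SELECT t1.ID, ca.[VALUE]
FROM dbo.[BIG_TABLE(FOR_U)] t1
CROSS APPLY (
    -- seek backwards from t1.ID to the nearest row with a non-NULL VALUE
    SELECT TOP (1) t2.[VALUE]
    FROM dbo.[BIG_TABLE(FOR_U)] t2
    WHERE t2.ID <= t1.ID AND t2.[VALUE] IS NOT NULL
    ORDER BY t2.ID DESC
) ca
-- ORDER BY t1.ID ASC   -- uncomment if the result must be ordered
;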


For each row, the query processor traverses the index backwards and stops at the first row with a non-NULL value for [VALUE]. On my machine this finishes in about 90 seconds for 100 million rows in the source table. The query runs longer than necessary because some of that time is spent on the client discarding all of those rows.
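A rough Python model of that per-row backwards walk (hypothetical function name; it models the logic, not SQL Server's actual I/O). It counts the index rows read, which makes the cost visible: each output row costs one read per row from the current row back to the nearest non-NULL row, inclusive. It also shows that CROSS APPLY returns nothing for a row with no earlier non-NULL value; OUTER APPLY would keep such a row with a NULL.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (ID, VALUE); None stands for NULL


def fill_apply(rows: List[Row]) -> Tuple[List[Row], int]:
    """Model of CROSS APPLY (SELECT TOP (1) ... ORDER BY t2.ID DESC) over an index on ID.

    Returns the output rows and the number of index rows read.
    """
    index = sorted(rows, key=lambda r: r[0])  # stands in for the clustered index on ID
    result: List[Row] = []
    rows_read = 0
    for i, (id1, _) in enumerate(index):
        j = i                                  # seek to t2.ID <= t1.ID ...
        while j >= 0:                          # ... and walk backwards
            rows_read += 1
            if index[j][1] is not None:        # TOP (1): stop at the first non-NULL VALUE
                result.append((id1, index[j][1]))
                break
            j -= 1
        # no non-NULL row at or before id1: CROSS APPLY emits nothing for it
    return result, rows_read


rows = [(1, 136), (2, None), (3, 650), (4, None), (5, None),
        (6, None), (7, 954), (8, None), (9, 104), (10, None)]
print(fill_apply(rows))
```

For the question's ten rows this reads 19 index rows. The total grows with the length of the NULL runs, not with the square of the table size.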



          It's not clear to me if you need ordered results or what you plan on doing with such a large result set. The query can be adjusted to meet the actual scenario. The biggest advantage of this approach is that it does not require a sort in the query plan. That can help for larger result sets. One disadvantage is that performance will not be optimal if there are a lot of NULLs in the table because many rows will be read from the index and discarded. You should be able to improve performance with a filtered index that excludes NULLs for that case.
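To illustrate the filtered-index suggestion, here is a Python sketch (hypothetical function name) in which a sorted list of only the non-NULL IDs plays the role of such an index, for example one keyed on ID with a `WHERE [VALUE] IS NOT NULL` filter. Each lookup becomes a single binary-search seek, so long NULL runs no longer cost extra reads:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (ID, VALUE); None stands for NULL


def fill_filtered(rows: List[Row]) -> List[Row]:
    """Model of the TOP (1) lookup against an index that only contains non-NULL rows."""
    ordered = sorted(rows, key=lambda r: r[0])
    # stands in for a filtered index on ID ... WHERE [VALUE] IS NOT NULL
    nn_ids = [id_ for id_, v in ordered if v is not None]
    nn_vals = [v for _, v in ordered if v is not None]
    result: List[Row] = []
    for id1, _ in ordered:
        pos = bisect_right(nn_ids, id1)  # number of non-NULL rows with ID <= id1
        if pos:                          # pos == 0: nothing earlier, CROSS APPLY drops the row
            result.append((id1, nn_vals[pos - 1]))
    return result


print(fill_filtered([(1, 136), (2, None), (3, 650), (4, None), (5, None)]))
```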



          Sample data for the test:



DROP TABLE IF EXISTS #t;

-- helper table holding the numbers 0 .. 9999
CREATE TABLE #t (
    ID BIGINT NOT NULL
);

INSERT INTO #t WITH (TABLOCK)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

DROP TABLE IF EXISTS dbo.[BIG_TABLE(FOR_U)];

CREATE TABLE dbo.[BIG_TABLE(FOR_U)] (
    ID BIGINT NOT NULL,
    [VALUE] BIGINT NULL
);

-- 10,000 x 10,000 = 100 million rows; VALUE is non-NULL only where (t1.ID + t2.ID) % 3 = 1
INSERT INTO dbo.[BIG_TABLE(FOR_U)] WITH (TABLOCK)
SELECT 10000 * t1.ID + t2.ID,
       CASE WHEN (t1.ID + t2.ID) % 3 = 1 THEN t2.ID ELSE NULL END
FROM #t t1
CROSS JOIN #t t2;

-- the clustered index on ID provides the ordering that the backwards seek relies on
CREATE UNIQUE CLUSTERED INDEX ADD_ORDERING ON dbo.[BIG_TABLE(FOR_U)] (ID);
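A quick way to see what this test data looks like is a Python sketch with hypothetical helper names, run at a smaller scale. Because ID = 10000 * t1.ID + t2.ID and 9999 is divisible by 3, (t1.ID + t2.ID) % 3 equals ID % 3, so VALUE is non-NULL exactly on every third ID. NULL runs are therefore at most two rows long, which means the backwards seek above never reads more than three index rows here; data with long NULL runs will behave worse, as noted. It also follows that ID 0 has no earlier non-NULL row, so the CROSS APPLY query returns one row fewer than the table holds.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (ID, VALUE); None stands for NULL


def sample_rows(n: int) -> List[Row]:
    """Scaled-down copy of the answer's INSERT: n * n rows instead of 10000 * 10000."""
    return [
        (n * a + b, b if (a + b) % 3 == 1 else None)
        for a in range(n)  # t1.ID
        for b in range(n)  # t2.ID
    ]


def longest_null_run(rows: List[Row]) -> int:
    """Length of the longest run of consecutive NULL values in ID order."""
    best = run = 0
    for _, v in sorted(rows, key=lambda r: r[0]):
        run = run + 1 if v is None else 0
        best = max(best, run)
    return best


# (100 - 1) is divisible by 3, just like (10000 - 1), so the NULL pattern matches the full-size data
rows = sample_rows(100)
print(longest_null_run(rows), sum(v is not None for _, v in rows))
```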





answered 2 hours ago by Joe Obbish












































One method, using OVER() with MAX() and COUNT() and based on this source, could be:



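SELECT ID, MAX(value) OVER (PARTITION BY Value2) AS value
FROM
(
    -- running count of non-NULL values: it only increases on a non-NULL row,
    -- so each non-NULL row starts a group that the following NULL rows join
    SELECT ID, value,
           COUNT(value) OVER (ORDER BY ID) AS Value2
    FROM dbo.HugeTable
) a
ORDER BY ID;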


Result:

 ID | value
----+-------
  1 |   136
  2 |   136
  3 |   650
  4 |   650
  5 |   650
  6 |   650
  7 |   954
  8 |   954
  9 |   104
 10 |   104
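A Python model of this grouping trick (hypothetical function name; it reproduces the logic, not SQL Server's plan). The running `COUNT(value)` acts as a group number: every non-NULL row opens a new group and the NULL rows after it fall into the same group, so the group's MAX is the last non-NULL value. With a unique ID, the default RANGE frame of `COUNT(value) OVER (ORDER BY ID)` gives the same result as an explicit ROWS frame.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (ID, value); None stands for NULL


def fill_count_groups(rows: List[Row]) -> List[Row]:
    """Model of MAX(value) OVER (PARTITION BY Value2), Value2 = COUNT(value) OVER (ORDER BY ID)."""
    ordered = sorted(rows, key=lambda r: r[0])
    # COUNT(value) skips NULLs, so the running count only steps up on a non-NULL row
    value2 = list(accumulate(0 if v is None else 1 for _, v in ordered))
    group_max: Dict[int, int] = {}
    for g, (_, v) in zip(value2, ordered):
        if v is not None:  # each group holds at most one non-NULL value (its first row)
            group_max[g] = max(group_max.get(g, v), v)
    # group 0 (leading NULLs, if any) has no non-NULL value, so its MAX is NULL
    return [(id_, group_max.get(g)) for g, (id_, _) in zip(value2, ordered)]


print(fill_count_groups([(1, 136), (2, None), (3, 650), (4, None)]))
```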



Another method based on this source, closely related to the first example:



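;WITH CTE AS
(
    -- same grouping key as above: running count of non-NULL values
    SELECT value,
           Id,
           COUNT(value) OVER (ORDER BY Id) AS Value2
    FROM dbo.HugeTable
),
CTE2 AS
(
    -- the first row of each group (by Id) is the non-NULL row that started it
    SELECT Id,
           value,
           FIRST_VALUE(value) OVER (PARTITION BY Value2 ORDER BY Id) AS UpdatedValue
    FROM CTE
)
SELECT Id, UpdatedValue
FROM CTE2
ORDER BY Id;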
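The FIRST_VALUE variant relies on one property: within each Value2 group, the row with the lowest Id is the non-NULL row that opened the group. A Python sketch of that (hypothetical function name, modelling the logic only):

```python
from itertools import accumulate, groupby
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (Id, value); None stands for NULL


def fill_first_value(rows: List[Row]) -> List[Row]:
    """Model of FIRST_VALUE(value) OVER (PARTITION BY Value2 ORDER BY Id)."""
    ordered = sorted(rows, key=lambda r: r[0])
    value2 = accumulate(0 if v is None else 1 for _, v in ordered)
    result: List[Row] = []
    # the running count never decreases, so each partition is one consecutive run
    for _, part in groupby(zip(value2, ordered), key=lambda p: p[0]):
        members = [row for _, row in part]
        first = members[0][1]  # the non-NULL row that opened the group (None for leading NULLs)
        result.extend((id_, first) for id_, _ in members)
    return result


print(fill_first_value([(1, None), (2, 5), (3, None)]))
```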

























Comment by Joe Obbish, 2 hours ago: Consider adding details about how these approaches perform with a "huge table".
































A common solution to this type of problem is given by Itzik Ben-Gan in his article "The Last non NULL Puzzle":



DROP TABLE IF EXISTS dbo.Example;

CREATE TABLE dbo.Example
(
    id integer PRIMARY KEY,
    val integer NULL
);

INSERT dbo.Example
    (id, val)
VALUES
    (1, 136),
    (2, NULL),
    (3, 650),
    (4, NULL),
    (5, NULL),
    (6, NULL),
    (7, 954),
    (8, NULL),
    (9, 104),
    (10, NULL);

SELECT
    E.id,
    E.val,
    lastval =
        CAST(
            SUBSTRING(
                -- id and val packed into 8 bytes; a NULL val makes the whole
                -- expression NULL, which MAX ignores, so the running MAX picks
                -- the highest (non-negative) id that has a non-NULL val
                MAX(CAST(E.id AS binary(4)) + CAST(E.val AS binary(4))) OVER (
                    ORDER BY E.id
                    ROWS UNBOUNDED PRECEDING),
                5, 4)  -- bytes 5 to 8 hold the val part
        AS integer)
FROM dbo.Example AS E
ORDER BY
    E.id;


            Demo: db<>fiddle
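The packing trick can be modelled in Python (hypothetical function names; `struct.pack(">i", ...)` stands in for `CAST(... AS binary(4))` on an integer, which produces 4 big-endian bytes). Because binary values compare byte by byte, the 8-byte key with the largest id prefix wins the running MAX, and bytes 5 to 8 carry that row's val. The sketch also exposes an assumption the trick makes: ids must be non-negative, since a negative id's sign bit makes its bytes compare higher than any positive id.

```python
import struct
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (id, val); None stands for NULL


def to_binary4(n: int) -> bytes:
    """Like CAST(n AS binary(4)) for an integer: 4 bytes, big-endian two's complement."""
    return struct.pack(">i", n)


def fill_binary_max(rows: List[Row]) -> List[Row]:
    result: List[Row] = []
    running_max: Optional[bytes] = None  # MAX(...) OVER (ORDER BY id ROWS UNBOUNDED PRECEDING)
    for id_, val in sorted(rows, key=lambda r: r[0]):
        # binary + NULL yields NULL, and MAX ignores NULLs
        if val is not None:
            key = to_binary4(id_) + to_binary4(val)
            if running_max is None or key > running_max:
                running_max = key
        # SUBSTRING(..., 5, 4) is 1-based, i.e. bytes [4:8] here; then CAST AS integer
        last = None if running_max is None else struct.unpack(">i", running_max[4:8])[0]
        result.append((id_, last))
    return result


print(fill_binary_max([(1, 136), (2, None), (3, -7), (4, None)]))
```

Negative vals round-trip correctly, but a negative id does not: `fill_binary_max([(-1, 7), (0, 8), (1, None)])` returns 7 for ids 0 and 1 instead of 8. The question's ids are all positive, so it is not affected.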





            share























              Your Answer








              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "182"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f233610%2fhow-to-get-the-last-not-null-value-in-an-ordered-column-of-a-huge-table%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              4















              I expected t-sql to optimize it out - on a block/record level, the
              task to do is very easy and linear, essentially a for loop ( O(n) ).




              That's not the query that you wrote. It may not be equivalent to the query that you wrote depending on some otherwise minor detail of the table schema. You're expecting too much from the query optimizer.



              With the right indexing you can get the algorithm that you seek through the following T-SQL:



              SELECT t1.id, ca.[VALUE] 
              FROM dbo.[BIG_TABLE(FOR_U)] t1
              CROSS APPLY (
              SELECT TOP (1) [VALUE]
              FROM dbo.[BIG_TABLE(FOR_U)] t2
              WHERE t2.ID <= t1.ID AND t2.[VALUE] IS NOT NULL
              ORDER BY t2.ID DESC
              ) ca; --ORDER BY t1.ID ASC


              For each row, the query processor traverses the index backwards and stops when it finds a row with a non null value for [VALUE]. On my machine this finishes in about 90 seconds for 100 million rows in the source table. The query runs longer than necessary because some amount of time is wasted on the client discarding all of those rows.



              It's not clear to me if you need ordered results or what you plan on doing with such a large result set. The query can be adjusted to meet the actual scenario. The biggest advantage of this approach is that it does not require a sort in the query plan. That can help for larger result sets. One disadvantage is that performance will not be optimal if there are a lot of NULLs in the table because many rows will be read from the index and discarded. You should be able to improve performance with a filtered index that excludes NULLs for that case.



              Sample data for the test:



              DROP TABLE IF EXISTS #t;

              CREATE TABLE #t (
              ID BIGINT NOT NULL
              );

              INSERT INTO #t WITH (TABLOCK)
              SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
              FROM master..spt_values t1
              CROSS JOIN master..spt_values t2
              OPTION (MAXDOP 1);

              DROP TABLE IF EXISTS dbo.[BIG_TABLE(FOR_U)];

              CREATE TABLE dbo.[BIG_TABLE(FOR_U)] (
              ID BIGINT NOT NULL,
              [VALUE] BIGINT NULL
              );

              INSERT INTO dbo.[BIG_TABLE(FOR_U)] WITH (TABLOCK)
              SELECT 10000 * t1.ID + t2.ID, CASE WHEN (t1.ID + t2.ID) % 3 = 1 THEN t2.ID ELSE NULL END
              FROM #t t1
              CROSS JOIN #t t2;

              CREATE UNIQUE CLUSTERED INDEX ADD_ORDERING ON dbo.[BIG_TABLE(FOR_U)] (ID);





              share|improve this answer



























                4















                I expected t-sql to optimize it out - on a block/record level, the
                task to do is very easy and linear, essentially a for loop ( O(n) ).




                That's not the query that you wrote. It may not be equivalent to the query that you wrote depending on some otherwise minor detail of the table schema. You're expecting too much from the query optimizer.



                With the right indexing you can get the algorithm that you seek through the following T-SQL:



                SELECT t1.id, ca.[VALUE] 
                FROM dbo.[BIG_TABLE(FOR_U)] t1
                CROSS APPLY (
                SELECT TOP (1) [VALUE]
                FROM dbo.[BIG_TABLE(FOR_U)] t2
                WHERE t2.ID <= t1.ID AND t2.[VALUE] IS NOT NULL
                ORDER BY t2.ID DESC
                ) ca; --ORDER BY t1.ID ASC


                For each row, the query processor traverses the index backwards and stops when it finds a row with a non null value for [VALUE]. On my machine this finishes in about 90 seconds for 100 million rows in the source table. The query runs longer than necessary because some amount of time is wasted on the client discarding all of those rows.



                It's not clear to me if you need ordered results or what you plan on doing with such a large result set. The query can be adjusted to meet the actual scenario. The biggest advantage of this approach is that it does not require a sort in the query plan. That can help for larger result sets. One disadvantage is that performance will not be optimal if there are a lot of NULLs in the table because many rows will be read from the index and discarded. You should be able to improve performance with a filtered index that excludes NULLs for that case.



                Sample data for the test:



                DROP TABLE IF EXISTS #t;

                CREATE TABLE #t (
                ID BIGINT NOT NULL
                );

                INSERT INTO #t WITH (TABLOCK)
                SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
                FROM master..spt_values t1
                CROSS JOIN master..spt_values t2
                OPTION (MAXDOP 1);

                DROP TABLE IF EXISTS dbo.[BIG_TABLE(FOR_U)];

                CREATE TABLE dbo.[BIG_TABLE(FOR_U)] (
                ID BIGINT NOT NULL,
                [VALUE] BIGINT NULL
                );

                INSERT INTO dbo.[BIG_TABLE(FOR_U)] WITH (TABLOCK)
                SELECT 10000 * t1.ID + t2.ID, CASE WHEN (t1.ID + t2.ID) % 3 = 1 THEN t2.ID ELSE NULL END
                FROM #t t1
                CROSS JOIN #t t2;

                CREATE UNIQUE CLUSTERED INDEX ADD_ORDERING ON dbo.[BIG_TABLE(FOR_U)] (ID);





                share|improve this answer

























                  4












                  4








                  4








                  I expected t-sql to optimize it out - on a block/record level, the
                  task to do is very easy and linear, essentially a for loop ( O(n) ).




                  That's not the query that you wrote. It may not be equivalent to the query that you wrote depending on some otherwise minor detail of the table schema. You're expecting too much from the query optimizer.



                  With the right indexing you can get the algorithm that you seek through the following T-SQL:



                  SELECT t1.id, ca.[VALUE] 
                  FROM dbo.[BIG_TABLE(FOR_U)] t1
                  CROSS APPLY (
                  SELECT TOP (1) [VALUE]
                  FROM dbo.[BIG_TABLE(FOR_U)] t2
                  WHERE t2.ID <= t1.ID AND t2.[VALUE] IS NOT NULL
                  ORDER BY t2.ID DESC
                  ) ca; --ORDER BY t1.ID ASC


                  For each row, the query processor traverses the index backwards and stops when it finds a row with a non null value for [VALUE]. On my machine this finishes in about 90 seconds for 100 million rows in the source table. The query runs longer than necessary because some amount of time is wasted on the client discarding all of those rows.



                  It's not clear to me if you need ordered results or what you plan on doing with such a large result set. The query can be adjusted to meet the actual scenario. The biggest advantage of this approach is that it does not require a sort in the query plan. That can help for larger result sets. One disadvantage is that performance will not be optimal if there are a lot of NULLs in the table because many rows will be read from the index and discarded. You should be able to improve performance with a filtered index that excludes NULLs for that case.



                  Sample data for the test:



                  DROP TABLE IF EXISTS #t;

                  CREATE TABLE #t (
                  ID BIGINT NOT NULL
                  );

                  INSERT INTO #t WITH (TABLOCK)
                  SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
                  FROM master..spt_values t1
                  CROSS JOIN master..spt_values t2
                  OPTION (MAXDOP 1);

                  DROP TABLE IF EXISTS dbo.[BIG_TABLE(FOR_U)];

                  CREATE TABLE dbo.[BIG_TABLE(FOR_U)] (
                  ID BIGINT NOT NULL,
                  [VALUE] BIGINT NULL
                  );

                  INSERT INTO dbo.[BIG_TABLE(FOR_U)] WITH (TABLOCK)
                  SELECT 10000 * t1.ID + t2.ID, CASE WHEN (t1.ID + t2.ID) % 3 = 1 THEN t2.ID ELSE NULL END
                  FROM #t t1
                  CROSS JOIN #t t2;

                  CREATE UNIQUE CLUSTERED INDEX ADD_ORDERING ON dbo.[BIG_TABLE(FOR_U)] (ID);





                  share|improve this answer














                  I expected t-sql to optimize it out - on a block/record level, the
                  task to do is very easy and linear, essentially a for loop ( O(n) ).




                  That's not the query that you wrote. It may not be equivalent to the query that you wrote depending on some otherwise minor detail of the table schema. You're expecting too much from the query optimizer.



                  With the right indexing you can get the algorithm that you seek through the following T-SQL:



                  SELECT t1.id, ca.[VALUE] 
                  FROM dbo.[BIG_TABLE(FOR_U)] t1
                  CROSS APPLY (
                  SELECT TOP (1) [VALUE]
                  FROM dbo.[BIG_TABLE(FOR_U)] t2
                  WHERE t2.ID <= t1.ID AND t2.[VALUE] IS NOT NULL
                  ORDER BY t2.ID DESC
                  ) ca; --ORDER BY t1.ID ASC


                  For each row, the query processor traverses the index backwards and stops when it finds a row with a non null value for [VALUE]. On my machine this finishes in about 90 seconds for 100 million rows in the source table. The query runs longer than necessary because some amount of time is wasted on the client discarding all of those rows.



                  It's not clear to me if you need ordered results or what you plan on doing with such a large result set. The query can be adjusted to meet the actual scenario. The biggest advantage of this approach is that it does not require a sort in the query plan. That can help for larger result sets. One disadvantage is that performance will not be optimal if there are a lot of NULLs in the table because many rows will be read from the index and discarded. You should be able to improve performance with a filtered index that excludes NULLs for that case.



                  Sample data for the test:



                  DROP TABLE IF EXISTS #t;

                  CREATE TABLE #t (
                  ID BIGINT NOT NULL
                  );

                  INSERT INTO #t WITH (TABLOCK)
                  SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
                  FROM master..spt_values t1
                  CROSS JOIN master..spt_values t2
                  OPTION (MAXDOP 1);

                  DROP TABLE IF EXISTS dbo.[BIG_TABLE(FOR_U)];

                  CREATE TABLE dbo.[BIG_TABLE(FOR_U)] (
                  ID BIGINT NOT NULL,
                  [VALUE] BIGINT NULL
                  );

                  INSERT INTO dbo.[BIG_TABLE(FOR_U)] WITH (TABLOCK)
                  SELECT 10000 * t1.ID + t2.ID, CASE WHEN (t1.ID + t2.ID) % 3 = 1 THEN t2.ID ELSE NULL END
                  FROM #t t1
                  CROSS JOIN #t t2;

                  CREATE UNIQUE CLUSTERED INDEX ADD_ORDERING ON dbo.[BIG_TABLE(FOR_U)] (ID);






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 2 hours ago









                  Joe ObbishJoe Obbish

                  21.6k43190




                  21.6k43190























                      3














                      One method, by using OVER() and MAX() and COUNT() based on this source could be:



                      SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
                      FROM
                      (
                      SELECT ID, value
                      ,COUNT(value) OVER (ORDER BY ID) AS Value2
                      FROM dbo.HugeTable
                      ) a
                      ORDER BY ID;


                      Result



                      Id UpdatedValue
                      1 136
                      2 136
                      3 650
                      4 650
                      5 650
                      6 650
                      7 954
                      8 954
                      9 104
                      10 104



Another method, also based on this source and closely related to the first, reuses the same running count but takes FIRST_VALUE() of each group:



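;WITH CTE AS
(
    -- Same grouping column as in the first method
    SELECT value,
           Id,
           COUNT(value) OVER (ORDER BY Id) AS Value2
    FROM dbo.HugeTable
),
CTE2 AS
(
    -- In Id order, the first row of each group holds its non-NULL value
    -- (rows before the first non-NULL value form a group of NULLs)
    SELECT Id,
           value,
           FIRST_VALUE(value) OVER (PARTITION BY Value2 ORDER BY Id) AS UpdatedValue
    FROM CTE
)
SELECT Id, UpdatedValue
FROM CTE2
ORDER BY Id;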





                      edited 5 hours ago

























                      answered 6 hours ago









Randi Vertongen






                        Consider adding details about how these approaches perform with a "huge table".

                        – Joe Obbish
                        2 hours ago


























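A common solution to this type of problem is given by Itzik Ben-Gan in his article "The Last non NULL Puzzle". Each row's id and val are cast to binary(4) and concatenated. A NULL val makes the concatenation NULL, so the running MAX skips that row; for the other rows the id bytes come first and, for non-negative ids, sort in id order, so the maximum is the concatenation from the latest row so far with a non-NULL val. SUBSTRING(..., 5, 4) then extracts its val bytes, which are cast back to integer: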



DROP TABLE IF EXISTS dbo.Example;

CREATE TABLE dbo.Example
(
    id integer PRIMARY KEY,
    val integer NULL
);

INSERT dbo.Example
    (id, val)
VALUES
    (1, 136),
    (2, NULL),
    (3, 650),
    (4, NULL),
    (5, NULL),
    (6, NULL),
    (7, 954),
    (8, NULL),
    (9, 104),
    (10, NULL);

SELECT
    E.id,
    E.val,
    lastval =
        CAST(
            SUBSTRING(
                MAX(CAST(E.id AS binary(4)) + CAST(E.val AS binary(4))) OVER (
                    ORDER BY E.id
                    ROWS UNBOUNDED PRECEDING),
                5, 4)
        AS integer)
FROM dbo.Example AS E
ORDER BY
    E.id;


                      Demo: db<>fiddle
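The byte-level trick can be checked with a small Python model. This is an illustration only (the name `last_non_null` is hypothetical); it assumes that CAST(int AS binary(4)) produces the 4-byte big-endian two's-complement form, which `struct.pack(">i", ...)` mirrors, and that binary values compare byte by byte, as Python `bytes` do.

```python
import struct
from typing import List, Optional, Tuple


def last_non_null(
    rows: List[Tuple[int, Optional[int]]]
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Model of CAST(SUBSTRING(MAX(CAST(id AS binary(4)) + CAST(val AS binary(4)))
    OVER (ORDER BY id ROWS UNBOUNDED PRECEDING), 5, 4) AS integer)."""
    result = []
    running_max: Optional[bytes] = None
    for row_id, val in sorted(rows, key=lambda r: r[0]):
        if val is not None:
            # id bytes first, val bytes second; a NULL val would make the
            # concatenation NULL, so such rows never become candidates.
            key = struct.pack(">i", row_id) + struct.pack(">i", val)
            # Lexicographic byte comparison, like binary values in SQL Server.
            if running_max is None or key > running_max:
                running_max = key
        # SUBSTRING(..., 5, 4) keeps bytes 5-8 (the val part); cast back to int.
        last = None if running_max is None else struct.unpack(">i", running_max[4:8])[0]
        result.append((row_id, val, last))
    return result


rows = [(1, 136), (2, None), (3, 650), (4, None), (5, None),
        (6, None), (7, 954), (8, None), (9, 104), (10, None)]
print([last for _, _, last in last_non_null(rows)])
# [136, 136, 650, 650, 650, 650, 954, 954, 104, 104]
```

The model also shows the main precondition: the byte comparison only follows id order for non-negative ids. A negative id such as -1 becomes 0xFFFFFFFF and sorts above every positive id, so its value would win the running MAX for the rest of the table. Negative val values are fine, because the id bytes decide the comparison.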





                          answered 4 mins ago









Paul White
