In Hoc Signo Vinces (part 11 of n): TPC-H Q2, Q10 - Late Projection
    Analytics is generally about making something small out of something large. This reduction is obtained by a TOP k operator (e.g., show only the 10 best by some metric) and/or by grouping and aggregation (e.g., for a set of items, show some attributes of these items and a sum, count, or other aggregate of dependent items for each).

    In this installment we will look at late projection, also sometimes known as late materialization. If many attributes are returned and there is a cutoff of some sort, then the query does not need to be concerned about attributes on which there are no conditions, except for fetching them at the last moment, only for the entities which in fact will be returned to the user.
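    The idea can be sketched in a few lines of Python (toy data and hypothetical column names, not the actual execution engine): sort and cut on the narrow ordering column first, then fetch the wide columns only for the k survivors.

```python
import heapq

# Toy rows: (acctbal, suppkey) pairs; the wide, condition-free columns
# live in a separate store and are never touched during the sort.
keys = [(acctbal, suppkey)
        for suppkey, acctbal in enumerate([3.5, 9.1, 0.4, 7.7, 5.0])]
wide_store = {suppkey: {"address": f"addr-{suppkey}", "comment": "x" * 100}
              for _, suppkey in keys}

K = 2

# Early projection would drag the wide columns through the sort.
# Late projection sorts on the narrow key columns only ...
top = heapq.nlargest(K, keys)
# ... and fetches the wide columns for just the K surviving rows.
result = [(acctbal, suppkey, wide_store[suppkey])
          for acctbal, suppkey in top]
```

Here only K lookups into the wide store happen, regardless of how many rows the scan produced.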

    We look at TPC-H Q2 and Q10.


      SELECT  TOP 100
              s_acctbal,
              s_name,
              n_name,
              p_partkey,
              p_mfgr,
              s_address,
              s_phone,
              s_comment
        FROM  part,
              supplier,
              partsupp,
              nation,
              region
       WHERE  p_partkey = ps_partkey
         AND  s_suppkey = ps_suppkey
         AND  p_size = 15
         AND  p_type LIKE '%BRASS'
         AND  s_nationkey = n_nationkey
         AND  n_regionkey = r_regionkey
         AND  r_name = 'EUROPE'
         AND  ps_supplycost =
                ( SELECT  MIN(ps_supplycost)
                    FROM  partsupp,
                          supplier,
                          nation,
                          region
                   WHERE  p_partkey = ps_partkey
                     AND  s_suppkey = ps_suppkey
                     AND  s_nationkey = n_nationkey
                     AND  n_regionkey = r_regionkey
                     AND  r_name = 'EUROPE'
                )
    ORDER BY  s_acctbal DESC,
              n_name,
              s_name,
              p_partkey

    The intent is to return information about parts and suppliers, such that the part is available from a supplier in Europe, and the supplier has the lowest price for the part among all European suppliers.


      SELECT  TOP 20
              c_custkey,
              c_name,
              SUM(l_extendedprice * (1 - l_discount)) AS revenue,
              c_acctbal,
              n_name,
              c_address,
              c_phone,
              c_comment
        FROM  customer,
              orders,
              lineitem,
              nation
       WHERE  c_custkey = o_custkey
         AND  l_orderkey = o_orderkey
         AND  o_orderdate >= CAST ('1993-10-01' AS DATE)
         AND  o_orderdate < DATEADD ('month', 3, CAST ('1993-10-01' AS DATE))
         AND  l_returnflag = 'R'
         AND  c_nationkey = n_nationkey
    GROUP BY  c_custkey,
              c_name,
              c_acctbal,
              c_phone,
              n_name,
              c_address,
              c_comment
    ORDER BY  revenue DESC

    The intent is to list the customers who cause the greatest loss of revenue in a given quarter by returning items ordered in said quarter.

    We notice that both queries return many columns on which there are no conditions, and that both have a cap on returned rows. The difference is that in Q2 the major ORDER BY is on a grouping column, and in Q10 it is on the aggregate of the GROUP BY. Thus the TOP k trick discussed in the previous article does apply to Q2 but not to Q10.
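    The distinction can be sketched as follows: when the ordering column comes straight from a scanned row, as with s_acctbal in Q2, a running top-k cutoff can discard a row the moment it fails to beat the current k-th best; an aggregate like Q10's revenue is only known once its group is complete, so no such cutoff applies during the scan. A toy Python sketch of the cutoff, with made-up balances:

```python
import heapq

K = 3
rows = [("s%d" % i, bal) for i, bal in enumerate([5, 1, 9, 4, 8, 2, 7])]

# ORDER BY on a scanned column: keep a min-heap of the k best seen so far;
# heap[0] is the current cutoff value.
heap = []
dropped = 0
for name, bal in rows:
    if len(heap) < K:
        heapq.heappush(heap, (bal, name))
    elif bal > heap[0][0]:
        # Beats the current k-th best: replace the cutoff row.
        heapq.heapreplace(heap, (bal, name))
    else:
        # Cut immediately, before any further work on this row.
        dropped += 1
```

With an aggregate ordering key, every group must be fully accumulated before this comparison is possible, which is why the trick applies to Q2 but not to Q10.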

    The profile for Q2 follows:

    time   6.1e-05% fanout         1 input         1 rows
    time       1.1% fanout         1 input         1 rows
    { hash filler
    Subquery 27 
    time    0.0012% fanout         1 input         1 rows
    REGION         1 rows(t10.R_REGIONKEY)
     R_NAME = <c EUROPE>
    time   0.00045% fanout         5 input         1 rows
    NATION         5 rows(t9.N_NATIONKEY)
    time       1.6% fanout     40107 input         5 rows
    SUPPLIER   4.2e+04 rows(t8.S_SUPPKEY)
    After code:
          0: t8.S_SUPPKEY :=  := artm t8.S_SUPPKEY
          4: BReturn 0
    time       0.1% fanout         0 input    200535 rows
    Sort hf 49 (t8.S_SUPPKEY)
    time    0.0004% fanout         1 input         1 rows
    { fork
    time        21% fanout     79591 input         1 rows
    PART     8e+04 rows(.P_PARTKEY)
     P_TYPE LIKE <c %BRASS> LIKE <c > ,  P_SIZE =  15 
    time        44% fanout  0.591889 input     79591 rows
          0: { 
    time     0.083% fanout         1 input     79591 rows
    time      0.13% fanout         1 input     79591 rows
    { fork
    time        24% fanout  0.801912 input     79591 rows
     inlined  PS_PARTKEY = k_.P_PARTKEY
    hash partition+bloom by 62 (tmp)hash join merged always card       0.2 -> ()
    time       1.3% fanout         0 input     63825 rows
    Hash source 49 merged into ts not partitionable       0.2 rows(.PS_SUPPKEY) -> ()
    After code:
          0:  min min.PS_SUPPLYCOSTset no set_ctr
          5: BReturn 0
    After code:
          0: aggregate :=  := artm min
          4: BReturn 0
    time      0.19% fanout         0 input     79591 rows
    Subquery Select(aggregate)
          8: BReturn 0
    PARTSUPP     5e-08 rows(.PS_SUPPKEY)
     inlined  PS_PARTKEY = k_.P_PARTKEY PS_SUPPLYCOST = k_scalar
    time       5.9% fanout  0.247023 input     47109 rows
    SUPPLIER unq       0.9 rows (.S_ACCTBAL, .S_NATIONKEY, .S_NAME, .S_SUPPKEY)
     inlined  S_SUPPKEY = .PS_SUPPKEY
    top k on S_ACCTBAL
    time     0.077% fanout         1 input     11637 rows
    NATION unq         1 rows (.N_REGIONKEY, .N_NAME)
     inlined  N_NATIONKEY = .S_NATIONKEY
    time     0.051% fanout         1 input     11637 rows
    REGION unq       0.2 rows ()
    time      0.42% fanout         0 input     11637 rows
    time    0.0016% fanout       100 input         1 rows
    top order by read (.S_SUPPKEY, .P_PARTKEY, .N_NAME, .S_NAME, .S_ACCTBAL)
    time      0.02% fanout         1 input       100 rows
    PART unq      0.95 rows (.P_MFGR)
     inlined  P_PARTKEY = .P_PARTKEY
    time     0.054% fanout         1 input       100 rows
    SUPPLIER unq         1 rows (.S_PHONE, .S_ADDRESS, .S_COMMENT)
     inlined  S_SUPPKEY = k_.S_SUPPKEY
    time   6.7e-05% fanout         0 input       100 rows
     128 msec 1007% cpu,    196992 rnd 2.53367e+07 seq   50.4135% same seg   45.3574% same pg 

    The query starts with a scan looking for the qualifying parts. It then looks for the best price for each part from a European supplier. All the European suppliers have been previously put in a hash table by the hash filler subquery at the start of the plan. Thus, to find the minimum price, the query takes the partsupp for the part by index, and then eliminates all non-European suppliers by a selective hash join. After this, there is a second index lookup on partsupp where we look for the part and the price equal to the minimum price found earlier. These operations could in principle be merged, as the minimum price partsupp has already been seen. The gain would not be very large, though.
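    The two-step shape described above -- a scalar subquery producing the minimum, then a second lookup for the rows matching it -- can be sketched on toy data (hypothetical supplier keys and costs):

```python
# Toy partsupp slice for one part: (suppkey, supplycost) pairs,
# already restricted to European suppliers by the hash join.
ps = [(101, 7.5), (102, 3.2), (103, 3.2), (104, 9.0)]

# Step 1: the scalar subquery computes the minimum supply cost ...
min_cost = min(cost for _, cost in ps)

# Step 2: ... then a second lookup finds the rows equal to that minimum.
winners = [suppkey for suppkey, cost in ps if cost == min_cost]
```

Merging the two passes into a single argmin scan, as the text notes, would save the second lookup, though the gain would be small since the qualifying rows have already been seen.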

    Here we note that the cost model guesses that very few rows will survive the check of ps_supplycost = minimum cost. It does not know that the minimum is not just any value, but one of the values that do occur in the ps_supplycost column for the part. Because of this, the remainder of the plan is carried out by index, which is just as well. The point is that if very few rows of input are expected, it is not worthwhile to make a hash table for a hash join. The hash table made for the European suppliers could be reused here, maybe with some small gain. It would however need more columns, which might make it not worthwhile. We note that the major order with the TOP k is on the supplier s_acctbal, hence as soon as there are 100 suppliers found, one can add a restriction on the s_acctbal for subsequent ones.

    At the end of the plan, after the TOP k ORDER BY and the reading of the results, we have a separate index-based lookup for getting only the columns that are returned. We note that this is done on 100 rows whereas the previous operations are done on tens-of-thousands of rows. The TOP k restriction produces some benefit, but it is relatively late in the plan, and not many operations follow it.

    The plan is easily good enough, with only small space for improvement. Q2 is one of the fastest queries of the set.

    Let us now consider the execution of Q10:

    time   1.1e-06% fanout         1 input         1 rows
    time   4.4e-05% fanout         1 input         1 rows
    { hash filler
    time   1.6e-05% fanout        25 input         1 rows
    NATION        25 rows(.N_NATIONKEY, .N_NAME)
    time   6.7e-06% fanout         0 input        25 rows
    Sort hf 35 (.N_NATIONKEY) -> (.N_NAME)
    time   1.5e-06% fanout         1 input         1 rows
    { fork
    time   2.4e-06% fanout         1 input         1 rows
    { fork
    time        13% fanout 5.73038e+06 input         1 rows
    ORDERS   5.1e+06 rows(.O_ORDERKEY, .O_CUSTKEY)
     O_ORDERDATE >= <c 1993-10-01> < <c 1994-01-01>
    time       4.8% fanout   2.00042 input 5.73038e+06 rows
    time        25% fanout         1 input 1.14632e+07 rows
          0: temp := artm  1  - .L_DISCOUNT
          4: temp := artm .L_EXTENDEDPRICE * temp
          8: BReturn 0
    CUSTOMER unq         1 rows (.C_NATIONKEY, .C_CUSTKEY)
     inlined  C_CUSTKEY = k_.O_CUSTKEY
    hash partition+bloom by 39 (tmp)hash join merged always card         1 -> (.N_NAME)
    time    0.0023% fanout         1 input 1.14632e+07 rows
    Hash source 35 merged into ts          1 rows(.C_NATIONKEY) -> (.N_NAME)
    time       2.3% fanout         1 input 1.14632e+07 rows
    Stage 2
    time       3.6% fanout         0 input 1.14632e+07 rows
    Sort (q_.C_CUSTKEY, .N_NAME) -> (temp)
    time       0.6% fanout 3.88422e+06 input         1 rows
    group by read node  
    (.C_CUSTKEY, .N_NAME, revenue)in each partition slice
    time      0.57% fanout         0 input 3.88422e+06 rows
    Sort (revenue) -> (.N_NAME, .C_CUSTKEY)
    time   6.9e-06% fanout        20 input         1 rows
    top order by read (.N_NAME, revenue, .C_CUSTKEY)
    time   0.00036% fanout         1 input        20 rows
    CUSTOMER unq         1 rows (.C_PHONE, .C_NAME, .C_ACCTBAL, .C_ADDRESS, .C_COMMENT)
     inlined  C_CUSTKEY = .C_CUSTKEY
    time   1.1e-06% fanout         0 input        20 rows
     2153 msec 2457% cpu, 1.71845e+07 rnd 1.67177e+08 seq   76.3221% same seg   21.1204% same pg 

    The plan is by index, except for the lookup of the nation name for the customer. The most selective condition is on o_orderdate, followed by l_returnflag on lineitem. Getting the customer by index turns out to be better than by hash, even though almost all customers are hit; see the input cardinality above the first customer entry in the plan -- over 10M rows. The key point is that only c_custkey and c_nationkey are fetched, which saves a lot of time. In fact, fetching c_custkey is needless, since it is anyway equal to o_custkey, but this makes little difference.

    One could argue that customer should be between lineitem and orders in join order. Doing this would lose the ORDER BY on orders and lineitem, but would prevent some customer rows from being hit twice for a single order. The difference would not be large, though. For a scale-out setting, one definitely wants to have orders and lineitem without customer in between if the former are partitioned on the same key.

    The c_nationkey is next translated into a n_name by hash, and there is a partitioned GROUP BY on c_custkey. The GROUP BY is partitioned because there are many different c_custkey values (155M for 100G scale).
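    The partitioned GROUP BY can be sketched as follows (toy data and a handful of slices; the real system uses many slices and runs them in parallel): each row is routed to a slice by a hash of the grouping key, so every distinct c_custkey is aggregated wholly within one slice, and the final result is simply the union of the per-slice tables, with no merge step.

```python
from collections import defaultdict

N_PART = 4  # number of partition slices (illustrative)

# Toy (c_custkey, revenue contribution) rows.
rows = [(7, 10.0), (3, 5.0), (7, 2.5), (8, 1.0), (3, 4.0)]

# Route each row to a slice by hash of the grouping key and
# aggregate within the slice.
slices = [defaultdict(float) for _ in range(N_PART)]
for custkey, rev in rows:
    slices[hash(custkey) % N_PART][custkey] += rev

# The result is the disjoint union of the slices.
groups = {k: v for s in slices for k, v in s.items()}
```

Partitioning keeps each per-slice hash table small enough to build and probe efficiently even with very many distinct keys, such as the 155M customers at 100G scale.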

    The most important trick is fetching all the many dependent columns of c_custkey only after the TOP k ORDER BY. The last access to customer in the plan does this and is only executed on 20 rows.

    Without the TOP k trick, the plan is identical, except that the dependent columns are fetched for nearly all customers. If this is done, the run time is 16s, which is bad enough to sink the whole score.

    There is another approach to the challenge of this query: If foreign keys are declared and enforced, the system will know that every order has an actually existing customer and that every customer has a country. If so, the whole GROUP BY and TOP k can be done without any reference to customer, which is a notch better still, at least for this query. In this implementation, we do not declare foreign keys, thus the database must check that the customer and its country in fact exist before doing the GROUP BY. This makes the late projection trick mandatory, but does save the expense of checking foreign keys on updates. In both cases, the optimizer must recognize that the columns to be fetched at the end (late projected) are functionally dependent on a grouping key (c_custkey).

    The late projection trick is generally useful, since almost all applications aside from bulk data export have some sort of limit on result-set size. A column store benefits especially, since some columns of a row can be read without even touching the others. A row store can also benefit, in the form of decreased intermediate result size. The gain is greatest when returning long columns, such as text fields or blobs, on which there are most often no search conditions; when there are conditions on such columns, these will most often be implemented via a special text index and not a scan.

    *           *           *           *           *

    In the next installment we will have a look at the overall 100G single server performance. After this we will recap the tricks so far. Then it will be time to look at implications of scale out for performance and run at larger scales. After the relational ground has been covered, we can look at implications of schema-lastness, i.e., triples for this type of workload.

    So, while the most salient tricks have been at least briefly mentioned, we are far from having exhausted this most foundational of database topics.

    To be continued...

    In Hoc Signo Vinces (TPC-H) Series

  • In Hoc Signo Vinces (part 11 of n): TPC-H Q2, Q10 - Late Projection

Date Created: 2014-04-07T16:30:54Z
Date Modified: 2015-06-10T16:10:19Z