Monday 5 May 2014

Adaptive query optimization - SQL execution enhancement in Oracle Database 12c

Oracle Database 12c has ability to automatically detect when an SQL plan performed sub-optimally, and make corrections for subsequent invocations. This behavior is part of its adaptive query optimization feature set, which includes adaptive plans, adaptive statistics, and sql plan directives,

Here, we discuss one aspect of adaptive statistics, which is known as adaptive re-optimization.

Following the initial execution of a query, Oracle analyzes run-time metrics like actual volume and quality of data present in the tables, and compares it against the assumptions/estimates made at the time of initial plan generation. If any significant differences are found, it will attempt to make corrections by marking the query as re-optimizable.

Re-optimizable means, for the next invocation it will not re-use the original cursor (and plan).
Instead, it will use the execution information collected previously, and if necessary, put the sql through a re-optimization process, generating a different, and a potentially better performing plan.

This process can repeat over multiple times, with Oracle making adjustments to the plan as it learns more about the nature of the data in the database from each execution. This feature is known as adaptive re-optimization. Following such a re-optimization, Oracle Database 12c creates Sql Plan Directives in the data dictionary, which are the results of its learnings about the data from previous executions, and are the potential input to future invocations or future statistics gathering efforts.


We will see a small example of adaptive re-optimization occurring on in 12c, due to a table which has had its optimizer statistics set to incorrect values on purpose.

Scenario
Container db
SALESPDB
Demo table schema/owner
SALES.SALES


Step # 1. First, create a table called sales in the sales schema of the SALESPDB container.

Make sure statistics_level is set to ALL.


[oracle@laboms ~]$ sqlplus sales/sales@localhost/salespdb

SQL*Plus: Release 12.1.0.1.0 Production on Mon May 5 17:58:47 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Mon May 05 2014 17:40:56 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options



SQL> alter session set statistics_level='ALL';

Session altered.

SQL> show con_name

CON_NAME
------------------------------
SALESPDB

SQL > create table sales ( time_id date, region_id number(3), amt number(5));
       
Table created.




Step # 2. Populate sales table with some test data. We create about 50000 rows.


SQL> insert into sales
2   select to_date('01012000','ddmmyyyy') + rownum, 1, 100 from dual
3   connect by level<=50000;
              
50000 rows created
SQL > commit;
              
Commit complete.

Step # 3. Set the table statistics per to some incorrect values.
Run the following pl/sql block to set the number of rows to 1 million.


begin
        dbms_stats.set_table_stats(ownname=>'SALES',tabname=>'SALES',
           numrows=>1000000);
end;
/

Step # 6. Issue this query in the SALES schema:
This query lists the table contents for a date range. In this case, about 1946 rows are returned.


select * from sales where region_id in (1,2) and time_id between
to_date('01012014','ddmmyyyy') and to_date('30042019','ddmmyyyy');

TIME_ID    REGION_ID        AMT
--------- ---------- ----------
20-OCT-18          1        100
21-OCT-18          1        100
22-OCT-18          1        100
23-OCT-18          1        100
24-OCT-18          1        100
25-OCT-18          1        100
26-OCT-18          1        100
27-OCT-18          1        100
28-OCT-18          1        100
29-OCT-18          1        100

1946 rows selected.

Step # 7.  Check the plan generated.

You may need to start another command shell window and connect to the database as sysdba to query v$sql. Determine the sql id of the above sql using v$sql view. It was '978mxpdcdrnrm'. In the DBMS_XPLAN call, the 3rd parameter is set to +ALLSTATS which enable reporting of sql plan statistics. Note that it its estimate is inaccurate (projected 25 rows v/s 1946 rows actual).

1* select * from table(dbms_xplan.display_cursor('978mxpdcdrnrm',0,'+ALLSTATS'))
SQL> /
SQL_ID  978mxpdcdrnrm, child number 0
-------------------------------------
select * from sales where region_id in (1,2) and time_id between
to_date('01012014','ddmmyyyy') and to_date('30042019','ddmmyyyy')

Plan hash value: 781590677

-------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |   1946 |00:00:00.02 |     337 |
|*  1 |  TABLE ACCESS FULL| SALES |      1 |     25 |   1946 |00:00:00.02 |     337 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("REGION_ID") AND "TIME_ID">=TO_DATE('
              2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"<=TO_DATE('
              2019-04-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))


21 rows selected.                             
                                   
During execution of the SQL, the optimizer encountered cardinality mis-match due to the large number of rows hard-coded into the optimizer stats. Due to the statistics feedback mechanism, the optimizer would have also created a number of  sql plan directives against this table. Let's see next, the evidence of plan directive creation

Step # 8. Force a write of of any generated sql plan directives to disk.
This is done so that  it can be queried later from SqlPlus.
Note:
  1. We are doing this as sysdba, user sales does not have required privileges.
  2. We have to make sure to connect to the container which contains the SALES table. Note that some performance metadata are stored in the pluggable database and not in the container.
           

1  begin
2   dbms_spd.flush_sql_plan_directive();
3* end;
SQL> /

PL/SQL procedure successfully completed.

Step # 9. Re-execute the query in the original session and check v$sql. 

           
Checking v$sql after repeat execution, we include the IS_REOPTIMIZABLE column, it indicates which SQL are subject to re-optimization due to adaptive statistics feature.

select sql_id, child_number, is_reoptimizable, sql_text from v$sql where  sql_id='978mxpdcdrnrm'

SQL_ID          CHILD_NB SQL_TEXT                                                   IS_REOPTIMIZABLE
------------- -------- ------------------------------------------------------------ ----------------
978mxpdcdrnrm        0 select * from sales where region_id in (1,2) and time_id bet Y
                       ween to_date('01012014','ddmmyyyy') and to_date('30042019','
                       ddmmyyyy')

978mxpdcdrnrm        1 select * from sales where region_id in (1,2) and time_id bet N
                       ween to_date('01012014','ddmmyyyy') and to_date('30042019','

                       ddmmyyyy')

There are 2 cursors for the sql and one of them is marked as re-optimizable (third column, its shown as ‘Y’). That would be the sql from the original invocation (child_number=0), which hit the incorrect stats condition.

Examining the plan for the new cursor, having child_number=1, (this is no longer re-optimizable, is_reoptimizable='N for this one) we get

1* select * from table(dbms_xplan.display_cursor('978mxpdcdrnrm',1,'+ALLSTATS'))
SQL> /
SQL_ID  978mxpdcdrnrm, child number 1
-------------------------------------
select * from sales where region_id in (1,2) and time_id between
to_date('01012014','ddmmyyyy') and to_date('30042019','ddmmyyyy')

Plan hash value: 781590677

-------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |   1946 |00:00:00.02 |     333 |
|*  1 |  TABLE ACCESS FULL| SALES |      1 |   1946 |   1946 |00:00:00.02 |     333 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("REGION_ID") AND "TIME_ID">=TO_DATE('
              2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID"<=TO_DATE('
              2019-04-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - statistics feedback used for this statement


25 rows selected.
                                         

The Note section has a remark that statistics feedback was used for this plan. The optimizer detected the bad statistics during first execution, and marked the original sql as re-optimizable. During the second invocation, the optimizer would have done a dynamic sampling of the data instead of relying on the statistics.

Looking at the plan, now, the estimated number of rows returned is now accurate (1946 v/s 25 before).

Further, an SQL plan directive was also created. Let us see what the SQL Plan directive asked the optimizer to do.

Below, we will query the directives created, to see what they instruct the optimizer
Run the following script as sysdba:

column owner format a10
column object_name format a15
column reason format a40 wrapped
column type format a20
set lines 300

SELECT  o.OWNER, o.OBJECT_NAME,
       d.TYPE, d.REASON
FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
AND    o.OWNER IN ('SALES');

Note:
The database connection is made to the pluggable database, not the CDB. Because, in the query, we are referring to the DBA_* views and not the CDB_* views. If connected to the container CDB$ROOT, you’d have to query the CDB_SQL_PLAN_* views.
             

[oracle@laboms ~]$ sqlplus sys/oracle@localhost/salespdb as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Mon May 5 17:41:40 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options

SQL> column owner format a10
column object_name format a15
column reason format a40 wrapped
column type format a20
set lines 300

SELECT  o.OWNER, o.OBJECT_NAME,
       d.TYPE, d.REASON
FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
AND    o.OWNER IN ('SALES')SQL> SQL> SQL> SQL> SQL> SQL>   2    3    4    5  ;

OWNER      OBJECT_NAME     TYPE                 REASON
---------- --------------- -------------------- ----------------------------------------
SALES      SALES           DYNAMIC_SAMPLING     SINGLE TABLE CARDINALITY MISESTIMATE
SALES      SALES           DYNAMIC_SAMPLING     SINGLE TABLE CARDINALITY MISESTIMATE
SALES      SALES           DYNAMIC_SAMPLING     SINGLE TABLE CARDINALITY MISESTIMATE

SQL>

The optimizer has created a number of Sql Plan directives, which instructs that dynamic sampling should be done on the sales table due to cardinality mis-estimate. This causes the optimizer to ignore the inaccurate statistics stored in the dictionary and sample the data on the fly, which as we saw, resulted in a more accurate row count estimate (though not a modified plan). Sql Plans and access methods may have also changed (it didnt in this case), bringing about a noticeable improvement in later executions.

Oracle has attempted to recover from the inaccurate statistics condition in the dictionary, and do a dynamic sampling for 2nd execution. Oracle 12c is thus able to generate adaptive sql plans by statistics feedback.

If you want to off this behavior, set OPTIMIZER_ADAPTIVE_REPORTING_ONLY=TRUE. Setting it to TRUE still lets internal feedback reporting will take place, but Oracle will not generate and use new plans based on the reporting.

No comments:

Post a Comment