Tuesday, 9 December 2014

Ilm in 12c - Temporal Validity

About Temporal validity

Temporal validity is an ILM feature introduced in Oracle 12c and applies to tables having date or timestamp columns. Temporal means related to time, and temporal validity means we attach a time-based validity attribute to the records in a database.

The validity of a data with respect to business rules changes over time. A record of an employee has attributes like start and end dates, which are end point markers for the period in time when the employment was in force.

The idea of temporal validity is that the application should only see data which are valid in a given temporal frame of reference (i.e. valid within a specific interval of time; within a date range interval).

Oracle allows a table designer to attach a temporal validity attributes to a record, based on values on a pair of its columns, which should be a date-time type.
Taking the example of the employee table…

Create table emp (empno number, name varchar(20), start_date date, end_date date);

To this normal table, it is possible to attach a temporal validity attribute, saying that only those records whose start/end dates envelop a specific date are valid (and visible).

Alter table emp      
add period for emp_valid_time (start_date, end_date);

By this alter table command, an invisible column by the name emp_valid_time would be added to the table to track the validity.

This sets up the emp table records, and by default, all records are valid and all records will be returned by a SELECT.  To enforce a specific temporal validity rule, it is necessary to execute a DBMS_FLASHBACK_ARCHIVE procedure called enable_at_valid_time.

This procedure specifies a date or timestamp value which must fall within the start and end date columns, in order that the record be considered temporally valid. Thus, to define a rule that only those records be considered temporally valid; for which the current system date falls between start and end date values, the procedure call would be:

  1  Begin
  2     dbms_flashback_archive.enable_at_valid_time('CURRENT');
  3* end;
SQL> /
PL/SQL procedure successfully completed.

With the rule in force, let’s examine the SQL plan for a SELECT * from emp .

The plan shows a predicate filter being applied transparently to the SELECT due to the temporal validity rule in force during that session.

SQL> select * from emp;

Execution Plan
----------------------------------------------------------
Plan hash value: 3848736563

----------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    43 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    43 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("T"."START_DATE" IS NULL OR
              SYS_EXTRACT_UTC(INTERNAL_FUNCTION("T"."START_DATE"))<=SYS_EXTRACT_UTC(SY
              STIMESTAMP(6))) AND ("T"."END_DATE" IS NULL OR
              SYS_EXTRACT_UTC(INTERNAL_FUNCTION("T"."END_DATE"))>SYS_EXTRACT_UTC(SYSTI
              MESTAMP(6))))

SQL>



The rule specified by the DBMS_FLASHBACK_ARCHIVE procedure call will apply for the duration of the session only, for queries against all tables subject to temporal validity.

Below is an example with test data…

Creating the table…


SQL> Create table emp (empno number, name varchar(20), start_date date, end_date date);

Table created


Adding a valid-time dimension to the table…

SQL> Alter table emp
  2  add period for test_valid_time (start_date, end_date);

Table altered


Populating with test data… 1 record each for Jan, Feb, and March 2014.

Emp no
Start Date
End Date
1
01 jan 2014
31 jan 2014
2
01 feb 2014
28 feb 2014
3
01 mar 2013
31 mar 2014


SQL> begin
  2    insert into emp values (1, 'emp1',to_date('01jan2014','ddmmyyyy'),
  3                                   to_date('31jan2014','ddmmyyyy'));
  4    insert into emp values (2, 'emp1',to_date('01feb2014','ddmmyyyy'),
  5                                   to_date('28feb2014','ddmmyyyy'));
  6    insert into emp values (3, 'emp1',to_date('01mar2014','ddmmyyyy'),
  7                                   to_date('31mar2014','ddmmyyyy'));
  8    commit;
  9  end;
 10  /

PL/SQL procedure successfully completed

SQL>

Configuring the dbms_flashback_archive setting... for 20 Feb 2014 only .. and doing a select to test.


SQL>
SQL>   begin
  2     dbms_flashback_archive.enable_at_valid_time(
  3           level => 'ASOF',
  4          query_time => to_date('20feb2014','ddmonyyyy')
  5          );
  6    end;
  7  /

PL/SQL procedure successfully completed

SQL> SELECT * FROM EMP;

     EMPNO NAME                 START_DATE  END_DATE
---------- -------------------- ----------- -----------
         2 emp1                 01-Feb-14   28-Feb-14


Only the record with the specified valid time (20th Feb 2014) falling in between the start/end date are returned.

About valid-time flashback queries


It is possible to over ride the DBMS_FLASHBACK_ARCHIVE.enable_at_valid_time setting, by using what’s known as a valid-time flashback query. In this query, there is the VERSIONS clause and instead of specifying an SCN range, a valid time range is specified using this syntax.

SELECT <col-list> FROM TABLE
VERSIONS FOR <valid-time-col> between <start-value> and <end-value>

For example…

SQL> Select *
  2    from emp VERSIONS PERIOD
  3     FOR valid_time between to_date('01012014', 'ddmmyyyy') and
  4     to_date('31072014', 'ddmmyyyy');

     EMPNO NAME                 START_DATE  END_DATE
---------- -------------------- ----------- -----------
         1 emp1                 01-Jan-14   31-Jan-14
         2 emp1                 01-Feb-14   28-Feb-14
         3 emp1                 01-Mar-14   31-Mar-14
  

Configuring the dbms_flashback_archive setting... to the default ALL.. and doing a select to test.

SQL>   begin
  2     dbms_flashback_archive.enable_at_valid_time(level => 'ALL');
  3    end;
  4  /

PL/SQL procedure successfully completed

SQL> select * from emp;

     EMPNO NAME                 START_DATE  END_DATE
---------- -------------------- ----------- -----------
         1 emp1                 01-Jan-14   31-Jan-14
         2 emp1                 01-Feb-14   28-Feb-14
         3 emp1                 01-Mar-14   31-Mar-14


Now all rows are returned, as the temporal validity rule has been reset. 


About the hidden valid-time temporal column



Valid-time temporal as shown above adds a new invisible column to the employee table to track the temporal validity. The column name is emp_valid_time, as specified in the ALTER TABLE DDL.



In addition, Oracle also sets up a check constraint on the table's data. Consider this record on the emp table which has the valid-time temporal rule:

SQL> select * from emp where empno=1;

     EMPNO NAME                 START_DATE  END_DATE
---------- -------------------- ----------- -----------
         1 emp1                 01-Jan-14   31-Jan-14

When I try to update the record's start date with a date later than its end date, the update fails.

SQL> update emp set start_date=sysdate where empno=1;

update emp set start_date=sysdate where empno=1

ORA-02290: check constraint (SCOTT.VALID_TIMEB08640) violated


Checking up the constraint, we get

SQL> select constraint_name, search_condition_vc from user_constraints where table_name='EMP';

CONSTRAINT_NAME         SEARCH_CONDITION_VC
----------------------  -----------------------------
VALID_TIMEB08640        (START_DATE < END_DATE) and
                        (VALID_TIME > 0)

The valid-time temporal ALTER table statement has also resulted in this check constraint being defined automatically on the table.
 

Wednesday, 3 December 2014

Attribute clustering in Oracle Database 12c

Attribute clustering in Oracle database 12c.

This is a new feature in Oracle Database version 12.1.0.2 which allows the dba to lay down an order for the table’s records while writing them to the disk, keeping physically close together on disk (clustering together, co-locating) those records which have the similar values for specified attributes (columns).

By clustering together records which have similar values, it becomes likelier that data matching a specifc sql filter condition (like where dept_id = 10) will be stored on the same block or on adjacent blocks on the disk. With this kind of data placement, it is possible to retrieve the requested data with fewer disk IO operations, than when data is stored in the order that it is inserted. Because the nature of queries expected against the table can be anticipated, it is advantageous to tailor the physical ordering to match them, and that is the point of attribute clustering.

Below is a simple example which creates a table named employees for which we define a simple clustering rule, that records will be ordered first by employee id and then by department id (see bolded text showing the new syntax) when writing to the disk.


  CREATE TABLE EMPLOYEES
   (           EMP_ID NUMBER,    NAME VARCHAR2(20),
               JOIN_DATE DATE,  DEPT_ID NUMBER,
               MGR NUMBER
   )
 CLUSTERING
 BY LINEAR ORDER (EMP_ID, DEPT_ID)
   YES ON LOAD  YES ON DATA MOVEMENT;


In the sections below, we are exploring some attribute clustering basics.

Attribute clustering does not work for conventional dml. It only works for these  operations:
  1. Create table as select
  2. Bulk loads using direct path insert like :
insert /*+ append */ select … from <table.
  1. Data movement operations like
§       Alter table <table> move
§       Online table redefinition

This means, if you define a table with attribute clustering and do a conventional insert, then the record created will not be placed in the expect sort position.

See the plan below for a conventional insert which inserts data from a source table, in which records have been created in a random order for DEPT_ID=1. It’s a conventional insert, though the table is attribute-clustered. Due to use of a normal insert, there won’t be any attribute clustering done for this set of rows.


PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 1772640232

--------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |           |     6 |   150 |     4   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | EMPLOYEES |       |       |            |          |
|*  2 |   TABLE ACCESS FULL      | TRIAL     |     6 |   150 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------

   2 - filter("DEPT_ID"=1)

14 rows selected.


Source table data (note that emp_id is in random order) :
SQL> select * from scott.trial where dept_id=1;

    EMP_ID NAME                 JOIN_DATE    DEPT_ID        MGR
---------- -------------------- --------- ---------- ----------
         1 emptest              01-JAN-14          1          1
         5 emptest              01-JAN-14          1          1
         2 emptest              01-JAN-14          1          1
         4 emptest              01-JAN-14          1          1
         3 emptest              01-JAN-14          1          1
         7 emptest              01-JAN-14          1          1

6 rows selected.


And below are the results of the insert…note that the data are still stored in the same random order in scott.employees as in the trial table.

SQL> insert into
  2   scott.employees
  3     select * from scott.trial where dept_id=1;

6 rows created.

SQL> commit;

Commit complete.

SQL> select * from scott.employees;

    EMP_ID NAME                 JOIN_DATE    DEPT_ID        MGR
---------- -------------------- --------- ---------- ----------
         1 emptest              01-JAN-14          1          1
         5 emptest              01-JAN-14          1          1
         2 emptest              01-JAN-14          1          1
         4 emptest              01-JAN-14          1          1
         3 emptest              01-JAN-14          1          1
         7 emptest              01-JAN-14          1          1

6 rows selected.

SQL>




Whereas if you do a direct path insert, a SORT operation will have been inserted into the plan...See below a direct path insert on the same table, for dept_id = 2

SQL>  explain plan for
  2    insert /*+ append */ into scott.employees
  3     select * from scott.trial
  4       where dept_id=2;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 1954480923

----------------------------------------------------------------------------------------------
| Id  | Operation                        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |           |     6 |   150 |     5  (20)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | EMPLOYEES |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |           |     6 |   150 |     5  (20)| 00:00:01 |
|   3 |    SORT ORDER BY                 |           |     6 |   150 |     5  (20)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL            | TRIAL     |     6 |   150 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("DEPT_ID"=2)

16 rows selected.

And the records in the employees table (dept_id=2) will be sorted… See below the result of a simple select (no order by specified, but the data is still sorted).


SQL>  insert /*+ append */ into scott.employees
  2    select * from scott.trial
  3     where dept_id = 2;

6 rows created.

SQL> commit;

Commit complete.

SQL> select * from scott.employees where dept_id=2;

    EMP_ID NAME                 JOIN_DATE    DEPT_ID        MGR
---------- -------------------- --------- ---------- ----------
         8 emptest              01-JAN-14          2          3
         9 emptest              01-JAN-14          2          3
        10 emptest              01-JAN-14          2          3
        11 emptest              01-JAN-14          2          3
        12 emptest              01-JAN-14          2          3
        14 emptest              01-JAN-14          2          3

6 rows selected.



Now, the table scott.employees has accumulated both unsorted data (dept_id = 1)
and sorted data (dept_id = 2).

Recalling the DDL of the table…

  CREATE TABLE EMPLOYEES
   (           EMP_ID NUMBER,    NAME VARCHAR2(20),
               JOIN_DATE DATE,  DEPT_ID NUMBER,
               MGR NUMBER
   )
 CLUSTERING
 BY LINEAR ORDER (EMP_ID, DEPT_ID)
   YES ON LOAD  YES ON DATA MOVEMENT;



Because it was defined with YES ON DATA MOVEMENT, an ALTER TABLE move will force the sort. (as will a table redefinition).

We can see below, that the table after an ALTER..MOVE operation gets sorted by employee id and department id


SQL> alter table scott.employees move;

Table altered.

SQL> select * from scott.employees;

    EMP_ID NAME                 JOIN_DATE    DEPT_ID        MGR
---------- -------------------- --------- ---------- ----------
         1 emptest              01-JAN-14          1          1
         2 emptest              01-JAN-14          1          1
         3 emptest              01-JAN-14          1          1
         4 emptest              01-JAN-14          1          1
         5 emptest              01-JAN-14          1          1
         7 emptest              01-JAN-14          1          1
         8 emptest              01-JAN-14          2          3
         9 emptest              01-JAN-14          2          3
        10 emptest              01-JAN-14          2          3
        11 emptest              01-JAN-14          2          3
        12 emptest              01-JAN-14          2          3

    EMP_ID NAME                 JOIN_DATE    DEPT_ID        MGR
---------- -------------------- --------- ---------- ----------
        14 emptest              01-JAN-14          2          3

12 rows selected.

SQL>


Further illustrating that attribute clustering works only on the specific set of being inserted, we will do a trial direct insert of a few records with dept_id = 3, generating 3 records (with a select … union all, to avoid sorting) and generate data in a random order.
insert  /*+ append */ into scott.employees
 select
17, 'emptest',sysdate, 3, 4 from dual
 union all
 select
20, 'emptest',sysdate, 3, 4 from dual
 union all
 select
13, 'emptest',sysdate, 3, 4 from dual

It’s a direct path insert, therefore, was an implicit sort was eventually done to enforce clustering.

SQL> explain plan for
  2  insert /*+ append */ into scott.employees
 select 17, 'emptest',sysdate, 3, 4 from dual
 union all
 select 20, 'emptest',sysdate, 3, 4 from dual
 union all
 select 13, 'emptest',sysdate, 3, 4 from dual  3    4    5    6    7  ;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 2013921891

--------------------------------------------------------------------------------------
| Id  | Operation                        | Name      | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |           |     3 |     7  (15)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | EMPLOYEES |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |           |     3 |     6   (0)| 00:00:01 |
|   3 |    SORT ORDER BY                 |           |     3 |     6   (0)| 00:00:01 |
|   4 |     UNION-ALL                    |           |       |            |          |
|   5 |      FAST DUAL                   |           |     1 |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL                   |           |     1 |     2   (0)| 00:00:01 |
|   7 |      FAST DUAL                   |           |     1 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

14 rows selected.

SQL>


The sort is not seen for the same insert, without the /*+ APPEND */ hint.

SQL> explain plan for
  2   insert  into scott.employees
 select 17, 'emptest',sysdate, 3, 4 from dual
 union all
 select 20, 'emptest',sysdate, 3, 4 from dual
 union all
 select 13, 'emptest',sysdate, 3, 4 from dual  3    4    5    6    7  ;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 2623540522

------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |           |     3 |     6   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | EMPLOYEES |       |            |          |
|   2 |   UNION-ALL              |           |       |            |          |
|   3 |    FAST DUAL             |           |     1 |     2   (0)| 00:00:01 |
|   4 |    FAST DUAL             |           |     1 |     2   (0)| 00:00:01 |
|   5 |    FAST DUAL             |           |     1 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

12 rows selected.

SQL>



What we saw above was the case of a heap-table attribute clustering using a simple LINEAR ORDERing BY 2 columns (EMP_ID, DEPT_ID).  This kind of ordering will benefit queries which  always prefix EMP_ID at least to the filter clause as in:

SELECT…Where EMP_ID = 10  and DEPT_ID = 123;

But not a query which does not  specify the EMP_ID column first, as in:

SELECT… where DEPT_ID=123;


A variation of attribute clustering is clustering by interleaved ordering.
In the case of interleaved clustering the data is ordered by a z-order function calculated on the user-specified columns.

  CREATE TABLE EMPLOYEES
   (           EMP_ID NUMBER,    NAME VARCHAR2(20),
               JOIN_DATE DATE,  DEPT_ID NUMBER,
               MGR NUMBER
   )
 CLUSTERING
 BY INTERLEAVED ORDER (EMP_ID, DEPT_ID)
   YES ON LOAD  YES ON DATA MOVEMENT;

During loads, a z-order function operates on the values of emp_id and dept_id of record and the physical data placement as per the calculated z-order.

Due to z-ordering the ordering is not simply by employee id and within that, by dept_id, each of the two column’s data values will have a weight and affect the record’s position in the sort.

Consider a source table having these records.



Going from 1st record to 5th the department id assigned is random.
We can see the result of an interleaved ordered clustering by loading this source data into our modified employee table which is now clustered by interleaved ordering of (emp_id, dept_id). Loading the data into the interleaved-ordered table…

SQL> insert /*+ append */ into scott.employees
 select * from scott.trial2;
   2
5 rows created.






Rank # 1 goes to record with emp_id = 2 and dept_id =1.
Rank # 2 goes to record with emp_id = 3, dept_id = 2.
Rank # 3 goes to record with emp_id=5, dept_id = 1.

The z-ordering effect is seen here; its considering both columns during the sort.

Adding, a record with emp_id=dept_id=1 and a second one with emp_id=dept_id=5 to the source table and re-loading the resulting set, we get this:



Again, it looks like the z-ordering is assign a weight to data points from each of the columns specified, and computing a sort position for the record by considering both weights. Therefore, where a record is located is depends on the relative weights of its data point columns.

It seems to achieve a wider spread of different data points across the storage space, as compared to linear ordered clustering as applied to the same set of input data.

As per Oracle 12c documentation, Data Warehousing And Business Intelligence Guide, Chapter 12, this kind of clustering is favorable to queries with predicate on emp_id and/or dept_id.

Conclusion:

In this document we explored the attribute clustering feature’s basics, and seen how it works. Due to its very nature, attribute clustering is suited to data warehousing type of applications where there is little conventional DML and the tables are subject to frequent bulk data loading.