Tuesday, 24 May 2016

JSON searches using a domain index

Oracle text enables searches on documents like text, html, etc using indexes.
A special index, called a text index is used, which indexes documents/sections of a document. Typically, there is a table containing a clob column which has the actual data that is being indexed. Applications can be built around that capability to do various kinds of searches on the clobs and retrieve them.

Oracle text allows the user to write an SQL CONTAINS query – this query uses a text search index to look up and return a list of all documents containing occurrences of a user specified text phrase.

Here, we will see an example of a domain index that allows searching for text within JSON-formatted data.

The Oracle JSON_TEXTCONTAINS function uses a JSON-compatible text index to search for text within JSON data, and accepts JSON path expressions.

But creating and using an Oracle text index requires some terms to be described.

Index types:
Different types of indexes are provided by Oracle text to perform searches. Text search use an index type known as CONTEXT. There are other text index types provided by Oracle.

Index section group:
The idea of a section group is that structured documents (XML, HTML) can be divided into zones (section groups) and searches can be done on these zones for text.

As an example, a HTML doc contains tags which identify the header, body, etc which can be considered as section zones. A section group can be considered as a collection of such section zones and searching a section group lets Oracle determine which parts of the document contain the searched text.

There is an Oracle provided HTML_SECTION_GROUP to index and search html docs. For JSON formatted documents, there is now the JSON_SECTION_GROUP which allows indexing and querying of JSON data.

Index synchronization:
Some types of text indexes need to be explicitly updated to keep them in sync with DML that changes the underlying documents that they index. The synchronization is done with the help of the CTX_DDL db package.



JSON_TEXTCONTAINS function
Consider  the table below, named customer_data. It contains customer information, and its custdata column will contain JSON formatted data:

SQL> create table customer_data (recid number(21), customer_id number(12), custdata varchar(200));
Table created
Inserting some data into the table…
SQL> insert into customer_data values (1,1022,2,  '[{Birthdate:"19800925"},{Email:"test1@yahoo.com"},{Mobile:"6583343355"}]' );
1 row inserted
SQL> insert into customer_data values (2,1023,2,
'[{Birthdate:"19850528"},{Email:"test2@gmail.com"},{Mobile:"6456565565"}]');
1 row inserted
SQL> insert into customer_data values (3,1023,2,  '[{Birthdate:"19850528"},{Mobile:"6456565565"}]');
1 row inserted
SQL> commit;
Commit complete

The custdata column has JSON formatted data. But we cannot yet use a JSON_TEXTCONTAINS query on it, because this query needs a text index to work with…
SQL> select * from customer_data where json_textcontains(custdata,'$','test1');
select * from customer_data where json_textcontains(custdata,'$','test1')
ORA-40467: JSON_TEXTCONTAINS() cannot be evaluated without JavaScript Object Notation (JSON) index



Creating the JSON search capable index...


SQL> create index idx_custdata_jsontext on customer_data (custdata)
  2  indextype is CTXSYS.CONTEXT
  3  parameters ('section group ctxsys.json_section_group sync (on commit)');
Index created

In the above CREATE INDEX DDL, the following index parameters are specified.
§       INDEXTYPE IS CTXSYS.CONTEXT – requests a text search index

§       SECTION GROUP CTXSYS.JSON_SECTION_GROUPS - requests creation of JSON section groups to do JSON searches.

§       SYNC (ON COMMIT) - means data will be synchronized to index           immediately on commit.
After this index has been created, the JSON_TEXTCONTAINS function can be used to query the data.

SQL> select * from customer_data where json_textcontains(custdata,'$','test1');
     RECID CUSTOMER_ID CUSTDATA
---------- ----------- --------------------------------------------------
         1        1022 [{Birthdate:"19800925"},{Email:"test1@yahoo.com"},
                           {Mobile:"6583343355"}]

The JSON_TEXTCONTAINS function accepts 3 arguments
1) The column to search
2) A JSON path expression specifying the search criteria (JSON field to look in)
3) The search string.

And below is the optimizer plan showing the DOMAIN INDEX operation.





Thursday, 19 May 2016

Oracle JSON functions

In this doc we will see some JSON functions supported by Oracle 12c, like JSON_EXISTS, IS JSON [STRICT/LAX], AND JSON_TABLE. We will also see how to retrieve JSON object arrays using the json_query.

JSON_EXISTS:
This function searches the input document (for example, the contents of a clob field) for a user specified property name and returns a boolean (true/false) if that property exists.

Consider this transaction table having this structure:

SQL> CREATE TABLE txn
  2     (txnid number(12),
  3      txndate date,
  4      trxdata CLOB
  5      CONSTRAINT ensure_json CHECK (trxdata IS JSON));
Table created


Inserting some test data into it…


SQL> insert into txn values (1, to_date('01012016','ddmmyyyy'),
  2  '{
  3      txndetail_id :"1000",
  4      itemid     : "10",
  5      qty        : "1",
  6      value      : "5"
  7   }');
1 row inserted
SQL> insert into txn values (2, to_date('01012016','ddmmyyyy'),
  2  '{
  3      txndetail_id :"1001",
  4      itemid     : "10",
  5      qty        : "3",
  6      value      : "1",
  7      ref        : "34867778"
  8   }');
1 row inserted
SQL> commit;
Commit complete

Querying the table using JSON_EXISTS on trxdata:


SQL> select txnid, txndate, json_value(trxdata,'$.txndetail_id') txndetail_id, trxdata
  2  from txn where json_exists( trxdata,'$.ref');
 TXNID TXNDATE     TXNDETAIL_ID    TRXDATA
------ ----------- --------------- -------------------------------------
     2 01-Jan-16   1001            {
                                       txndetail_id :"1001",
                                       itemid     : "10",
                                       qty        : "3",
                                       value      : "1",
                                       ref        : "34867778"
                                    }  

Note:
The full json content (trxdata) as well as the json_value-extracted fields are seen.

The function json_exists (trxdata,’$.ref) searches for the ‘ref attribute within the trxdata clob content, and returns those records who have that attribute.

The function json_value(trxdata,’$.txndetail_id’) extracts the value stored against the txndetail_id attribute.


IS JSON
The IS JSON function checks the input data to see whether it is well-formatted JSON data. It returns a true or false accordingly. There are a number of syntax rules that describe how JSON data must be represented, like.
  1. No duplicate field names
  2. Each field and named enclosed in double-quotes.
  3. No trailing comma after last field.

By default, Oracle’s JSON syntax checking is lax, therefore, it allows such data representations not supported in the strict JSON definition. Below is an example of JSON syntax checking in Oracle:

Using the IS JSON constraint on a column:
A table’s column can have an IS JSON constraint imposed upon it. All dml against that column will be validated as per Oracle’s JSON syntax rules, and bad data will be rejected.


CREATE TABLE merchant
   (merchant_id number(8),
   merchant_name varchar(200),
    contact_persons CLOB
    CONSTRAINT ensure_contact_json CHECK (contact_persons IS JSON));


Below is an example where a record is created bad JSON data and is rejected.

SQL> insert into merchant values (4, 'M00000004', 'bad json data');
insert into merchant values (4, 'M00000004', 'bad json data')
*
ERROR at line 1:
ORA-02290: check constraint (SYS.ENSURE_CONTACT_JSON) violated

Below is a record that does get inserted - note that it has duplicate Address field and the field names are not double-quote delimited. It is JSON but with lax syntax. We will use this record in an later example.


SQL> insert into merchant values (3, 'M0000003',
  2  '{ name : "Lax",
  3     Address:{"street"  : "Street 13",
  4                  "city"  : "Singapore",
  5                "zipcode" : "54345"
  6               },
  7      Address:{"street"  : "test",
  8                  "city"  : "Singapore",
  9                "zipcode" : "3435345"
 10               },
 11    Phone : [{"type" : "Office", "number" : "33435335"},
 12                {"type" : "Mobile", "number" : "1343435"}]}');

1 row created.

SQL> COMMIT;

Commit complete.




Using IS JSON in an SQL statement
Its possible to apply the new IS JSON expression on the above table to select rows conditionally, as shown


SQL> select * from merchant where contact_persons is not json strict;
 MERCHANT_ID MERCHANT_NAME   CONTACT_PERSONS
------------ --------------- ------------------------------------------------------------
           3 M0000003        { name : "Lax",
                                Address:{"street"  : "Street 13",
                                             "city"  : "Singapore",
                                           "zipcode" : "730713"
                                          },
                                 Address:{"street"  : "test",
                                             "city"  : "Singapore",
                                           "zipcode" : "730688"
                                          },
                               Phone : [{"type" : "Office", "number" : "33435335"},
                                           {"type" : "Mobile", "number" : "1343435"}]}


Note the syntax above – It specifies IS NOT JSON STRICT, requesting to return rows which comply to only the lax JSON syntax, and it returns the record inserted with duplicate Address field.

Let’s create a record complying with the strict JSON standard as below:


SQL> insert into merchant values (1, 'M0000001',
  2  '{ "name" : "Alex",
  3     "Address":{"street"  : "Street 32",
  4                  "city"  : "Singapore",
  5                "zipcode" : "999999"
  6               },
  7    "Phone" : [{"type":"Office","number":"33435335"},
  8                {"type":"Mobile","number":"1343435"}]}');
1 row inserted

SQL> commit;

Commit complete.

Using IS JSON STRICT/LAX check on the table.

SQL> select * from merchant where contact_persons is json strict;

MERCHANT_ID MERCHANT_NAME   CONTACT_PERSONS
----------- --------------- --------------------------------------------------
          1 M0000001        { "name" : "Alex",
                               "Address":{"street"  : "Street 32",
                                            "city"  : "Buona Vista",
                                          "zipcode" : "123000"
                                         },
                              "Phone" : [{"type":"Office","number":"33435335"}
                            ,
                                          {"type":"Mobile","number":"1343435"}
                            ]}


All strictly JSON compliant data is of course lax compliant also. So if we query for IS JSON LAX, we get both STRICT and LAX-compliant records ( lax json compliant record is in bold below)

SQL> select * from merchant where contact_persons is json lax;
 MERCHANT_ID MERCHANT_NAME   CONTACT_PERSONS
------------ --------------- ------------------------------------------------------------
           3 M0000003        { name : "Lax",
                                Address:{"street"  : "Street 13",
                                             "city"  : "Singapore",
                                           "zipcode" : "730713"
                                          },
                                 Address:{"street"  : "test",
                                             "city"  : "Singapore",
                                           "zipcode" : "730688"
                                          },
                               Phone : [{"type" : "Office", "number" : "33435335"},
                                           {"type" : "Mobile", "number" : "1343435"}]}
           1 M0000001        { "name" : "Alex",
                                "Address":{"street"  : "Street 32",
                                             "city"  : "Singapore",
                                           "zipcode" : "999999"
                                          },
                               "Phone" : [{"type":"Office","number":"33435335"},
                                           {"type":"Mobile","number":"1343435"}]}

SQL>


JSON_table function
The JSON table function can be used to project json field values as a table.
It is possible to do such a projection using the JSON_VALUE function (which we'll see here first), but involving multiple JSON_VALUE function calls, one per projected column:

SQL> select txnid, txndate,
  2      json_value(trxdata,'$.txndetail_id') txndetail_id,
  3      json_value(trxdata,'$.itemid') itemid,
  4      json_value(trxdata,'$.qty') qty,
  5      json_value(trxdata,'$.value') value
  6      from txn;
 TXNID TXNDATE     TXNDETAIL_ID    ITEMID     QTY   VALUE
------ ----------- --------------- ---------- ----- -----
     1 01-Jan-16   1000            10         1     5
     2 01-Jan-16   1001            10         3     1



The same can be accomplished using the JSON_TABLE function. This function requires, however, a number of inputs. Below is  a sample invocation : (the row sources txn and jt are joined implicitly)


SQL> select jt.txndetail_id, jt.itemid, jt.quantity, jt.has_ref, jt.ref from txn,
  2      json_table( trxdata, '$'
  3        columns ( txndetail_id number(12) PATH '$.txndetail_id',
  4                  itemid number(5) PATH '$.itemid',
  5                  quantity number(10,4) PATH '$.qty',
  6                  value  number(10,2) PATH '$.value',
  7                  has_ref varchar2(5 char) exists path '$.ref',
  8                  ref varchar(20 char) path '$.ref'
  9       )) jt;
   TXNDETAIL_ID     ITEMID   QUANTITY HAS_REF REF
--------------- ---------- ---------- ------- --------------------
           1000         10          1 false  
           1001         10          3 true    34867778


Below are the inputs required by the JSON_TABLE function:
  1. Name of the input clob column (trxdata).
  2. JSON path expression indicating start of parse ($ in this case)
  3. A COLUMNS clause containing the list of columns to project.
  4. Columns projected are specified by a column name, display format, and a JSON path.

( txndetail_id number(12) PATH '$.txndetail_id',


With reference to the fragment above
Column name is txndetail_id
Format is number(12)
JSON path is specified as PATH ‘$.txndetail_id’

Note:

The REF  field is not present in all of the JSON-formatted records. Its not present for record # 1.

   TXNDETAIL_ID     ITEMID   QUANTITY HAS_REF REF
--------------- ---------- ---------- ------- --------------------
           1000         10          1 false  
           1001         10          3 true    34867778



The existence of the JSON field AMOUNT can be probed for and tested using the EXISTS PATH <EXPRESSION> syntax (see below the fragment from the JSON_TABLE function):

      has_ref varchar2(5 char) exists path '$.ref',

            It returns a true/false.

Below is the explain plan for the SQL, note the JSONTABLE EVALUATION operation that appears for the JSON_TABLE call. The cardinality numbers, however, are totally off, despite gathering statistics.

SQL>  select jt.txndetail_id, jt.itemid, jt.quantity, jt.has_ref, jt.ref from txn,
        json_table( trxdata, '$'
          columns ( txndetail_id number(12) PATH '$.txndetail_id',
                    itemid number(5) PATH '$.itemid',
                    quantity number(10,4) PATH '$.qty',
                    value  number(10,2) PATH '$.value',
                    has_ref varchar2(5 char) exists path '$.ref',
                    ref varchar(20 char) path '$.ref'
         )) jt;
  2    3    4    5    6    7    8    9 
Execution Plan
----------------------------------------------------------
Plan hash value: 1752208875

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      | 16336 |  3573K|    58   (0)| 00:00:01 |
|   1 |  NESTED LOOPS          |      | 16336 |  3573K|    58   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL    | TXN  |     2 |   428 |     2   (0)| 00:00:01 |
|   3 |   JSONTABLE EVALUATION |      |       |       |            |          |
-------------------------------------------------------------------------------




JSON_QUERY
This function is similar to JSON_VALUE, it extracts value portion of a field named in the JSON path expression form, but it can extract an array of values whereas JSON_VALUE can only extract scalar values.

Consider this simple table containing 2 records. It stores an owner id and a collection of associated phone numbers. Each phone number has a type (mobile/home) and a number. The number of phone entries for a record can be single or multiple.

create table phones ( owner_id number(2), phones varchar(200));

insert into test values
 (1, '{
   Phone:{  "type":"mobile",
           "number":"123"}
     }');


insert into test values
 (2, '{
   Phone:[{  "type":"mobile","number":"456"},{  "type":"home","number":"789"}]
     }');


In the above sample, the first record has just a single number of type ‘mobile’.
The second record has two entries, one each for the types mobile and home.
It is specified as an array by wrapping it with a pair of [] brackets.

For data in this form JSON_VALUE cannot be used for extraction of the fields. See the result:

SQL> select owner_id, json_value(contact_nos,'$.Phone[*].number' ) phones from phones;
OWNER_ID PHONES
---------- --------------------
1 123
2 <null>



Using JSON_QUERY, we get:

SQL> select owner_id, json_query(contact_nos,'$.Phone[*].number' with conditional wrapper) phones from phones;

  OWNER_ID PHONES
---------- --------------------
         1 ["123"]
         2 ["456","789"]

Note:
  1. The first record has only a single entry. Its data is returned as an array. Note the [].
  2. The second array has 2 elements and is returned as an array, like it should be.

  1. We queried with a JSON path expression ‘$.Phone[*].number’. The field[*].property expression means that we are expecting the field to be multi-valued and we want all values. That expression also works for the single-valued entry for record # 1 transparently.

  1. We can also interrogate the phone field, as if it contained a single-valued scalar field.

SQL> select owner_id, json_query(contact_nos,'$.Phone.number' with conditional wrapper) phones from phones;

  OWNER_ID PHONES
---------- --------------------
         1 ["123"]
         2 ["456","789"]


In the above query we have tried to interrogate the phone field as  a scalar value. The expression ‘$Phone.number’ is for a single-valued field.  But we can see that this expression works implicitly for the multi-valued content in the second record also. This behaviour is described in Oracle docs as implicit wrapping/unwrapping of data.