Tuesday 24 May 2016

JSON searches using a domain index

Oracle text enables searches on documents like text, html, etc using indexes.
A special index, called a text index is used, which indexes documents/sections of a document. Typically, there is a table containing a clob column which has the actual data that is being indexed. Applications can be built around that capability to do various kinds of searches on the clobs and retrieve them.

Oracle text allows the user to write an SQL CONTAINS query – this query uses a text search index to look up and return a list of all documents containing occurrences of a user specified text phrase.

Here, we will see an example of a domain index that allows searching for text within JSON-formatted data.

The Oracle JSON_TEXTCONTAINS function uses a JSON-compatible text index to search for text within JSON data, and accepts JSON path expressions.

But creating and using an Oracle text index requires some terms to be described.

Index types:
Different types of indexes are provided by Oracle text to perform searches. Text search use an index type known as CONTEXT. There are other text index types provided by Oracle.

Index section group:
The idea of a section group is that structured documents (XML, HTML) can be divided into zones (section groups) and searches can be done on these zones for text.

As an example, a HTML doc contains tags which identify the header, body, etc which can be considered as section zones. A section group can be considered as a collection of such section zones and searching a section group lets Oracle determine which parts of the document contain the searched text.

There is an Oracle provided HTML_SECTION_GROUP to index and search html docs. For JSON formatted documents, there is now the JSON_SECTION_GROUP which allows indexing and querying of JSON data.

Index synchronization:
Some types of text indexes need to be explicitly updated to keep them in sync with DML that changes the underlying documents that they index. The synchronization is done with the help of the CTX_DDL db package.



JSON_TEXTCONTAINS function
Consider  the table below, named customer_data. It contains customer information, and its custdata column will contain JSON formatted data:

SQL> create table customer_data (recid number(21), customer_id number(12), custdata varchar(200));
Table created
Inserting some data into the table…
SQL> insert into customer_data values (1,1022,2,  '[{Birthdate:"19800925"},{Email:"test1@yahoo.com"},{Mobile:"6583343355"}]' );
1 row inserted
SQL> insert into customer_data values (2,1023,2,
'[{Birthdate:"19850528"},{Email:"test2@gmail.com"},{Mobile:"6456565565"}]');
1 row inserted
SQL> insert into customer_data values (3,1023,2,  '[{Birthdate:"19850528"},{Mobile:"6456565565"}]');
1 row inserted
SQL> commit;
Commit complete

The custdata column has JSON formatted data. But we cannot yet use a JSON_TEXTCONTAINS query on it, because this query needs a text index to work with…
SQL> select * from customer_data where json_textcontains(custdata,'$','test1');
select * from customer_data where json_textcontains(custdata,'$','test1')
ORA-40467: JSON_TEXTCONTAINS() cannot be evaluated without JavaScript Object Notation (JSON) index



Creating the JSON search capable index...


SQL> create index idx_custdata_jsontext on customer_data (custdata)
  2  indextype is CTXSYS.CONTEXT
  3  parameters ('section group ctxsys.json_section_group sync (on commit)');
Index created

In the above CREATE INDEX DDL, the following index parameters are specified.
§       INDEXTYPE IS CTXSYS.CONTEXT – requests a text search index

§       SECTION GROUP CTXSYS.JSON_SECTION_GROUPS - requests creation of JSON section groups to do JSON searches.

§       SYNC (ON COMMIT) - means data will be synchronized to index           immediately on commit.
After this index has been created, the JSON_TEXTCONTAINS function can be used to query the data.

SQL> select * from customer_data where json_textcontains(custdata,'$','test1');
     RECID CUSTOMER_ID CUSTDATA
---------- ----------- --------------------------------------------------
         1        1022 [{Birthdate:"19800925"},{Email:"test1@yahoo.com"},
                           {Mobile:"6583343355"}]

The JSON_TEXTCONTAINS function accepts 3 arguments
1) The column to search
2) A JSON path expression specifying the search criteria (JSON field to look in)
3) The search string.

And below is the optimizer plan showing the DOMAIN INDEX operation.





No comments:

Post a Comment