Oracle text enables searches on documents like text, html, etc
using indexes.
A special index, called a text index is used, which indexes
documents/sections of a document. Typically, there is a table containing a clob
column which has the actual data that is being indexed. Applications can be built around that capability to do various kinds of searches on the clobs and retrieve them.
Oracle text allows the user to write an SQL CONTAINS query – this
query uses a text search index to look up and return a list of all documents containing
occurrences of a user specified text phrase.
Here, we will see an example of a domain index that allows
searching for text within JSON-formatted data.
The Oracle JSON_TEXTCONTAINS function uses a JSON-compatible text index
to search for text within JSON data, and accepts JSON path expressions.
But creating and using an Oracle text index requires some terms to
be described.
Index types:
Different types of indexes are provided by Oracle text to perform
searches. Text search use an index type known as CONTEXT. There are other text
index types provided by Oracle.
Index section group:
The idea of a section group is that structured documents (XML,
HTML) can be divided into zones (section groups) and searches can be done on
these zones for text.
As an example, a HTML doc contains tags which identify the header,
body, etc which can be considered as section zones. A section group can be
considered as a collection of such section zones and searching a section group
lets Oracle determine which parts of the document contain the searched text.
There is an Oracle provided HTML_SECTION_GROUP to index and search
html docs. For JSON formatted documents, there is now the JSON_SECTION_GROUP
which allows indexing and querying of JSON data.
Index synchronization:
Some types of text indexes need to be explicitly updated to keep them
in sync with DML that changes the underlying documents that they index. The
synchronization is done with the help of the CTX_DDL db package.
JSON_TEXTCONTAINS function
Consider the table below,
named customer_data. It contains customer information, and its custdata column
will contain JSON formatted data:
SQL>
create table customer_data (recid number(21), customer_id number(12), custdata
varchar(200));
Table
created
Inserting some data into the
table…
SQL>
insert into customer_data values (1,1022,2,
'[{Birthdate:"19800925"},{Email:"test1@yahoo.com"},{Mobile:"6583343355"}]'
);
1
row inserted
SQL>
insert into customer_data values (2,1023,2,
'[{Birthdate:"19850528"},{Email:"test2@gmail.com"},{Mobile:"6456565565"}]');
'[{Birthdate:"19850528"},{Email:"test2@gmail.com"},{Mobile:"6456565565"}]');
1
row inserted
SQL>
insert into customer_data values (3,1023,2,
'[{Birthdate:"19850528"},{Mobile:"6456565565"}]');
1
row inserted
SQL>
commit;
Commit
complete
The custdata column has JSON formatted data. But we cannot yet use a JSON_TEXTCONTAINS query on it, because this query needs a text index to work with…
SQL>
select * from customer_data where json_textcontains(custdata,'$','test1');
select
* from customer_data where json_textcontains(custdata,'$','test1')
ORA-40467:
JSON_TEXTCONTAINS() cannot be evaluated without JavaScript Object Notation
(JSON) index
Creating the JSON search capable index...
SQL> create index idx_custdata_jsontext on customer_data (custdata)
2 indextype is CTXSYS.CONTEXT
3 parameters ('section group ctxsys.json_section_group sync (on commit)');
Index createdIn the above CREATE INDEX DDL, the following index parameters are specified.
§
INDEXTYPE
IS CTXSYS.CONTEXT – requests a text
search index
§
SECTION
GROUP CTXSYS.JSON_SECTION_GROUPS - requests
creation of JSON section groups to do JSON searches.
§
SYNC
(ON COMMIT) - means data will be
synchronized to index immediately
on commit.
After this index has been
created, the JSON_TEXTCONTAINS function can be used to query the data.
SQL>
select * from customer_data where json_textcontains(custdata,'$','test1');
RECID CUSTOMER_ID CUSTDATA
---------- ----------- --------------------------------------------------
1 1022 [{Birthdate:"19800925"},{Email:"test1@yahoo.com"},
{Mobile:"6583343355"}]
---------- ----------- --------------------------------------------------
1 1022 [{Birthdate:"19800925"},{Email:"test1@yahoo.com"},
{Mobile:"6583343355"}]
The JSON_TEXTCONTAINS function accepts 3 arguments
1) The column to search
2) A JSON path expression specifying the search criteria (JSON field to look in)
3) The search string.
And below is the optimizer plan showing the DOMAIN INDEX operation.
No comments:
Post a Comment