Oracle8 ConText Cartridge Application Developer's Guide
This chapter explains how to use the ConText linguistics to generate linguistic output for English text. It also provides some tips and suggestions for building linguistically-enhanced text applications.
When a ConText server with the Linguistic personality is started, ConText automatically loads a default setting configuration (GENERIC) from the database. The default setting configuration remains active during your database session unless you explicitly specify the label of a different setting configuration with the CTX_LING.SET_SETTINGS_LABEL procedure.
You can specify one of the two predefined setting configurations (GENERIC or SA) provided with ConText or a custom setting configuration that you create using the administration tool.
To specify a setting configuration for a session, use the CTX_LING.SET_SETTINGS_LABEL procedure with a setting label. For example, to process all-uppercase or all-lowercase text for your current session:
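The call for this example does not appear in this copy of the text. A minimal sketch, assuming that SA is the predefined configuration intended for all-uppercase or all-lowercase text (GENERIC being the default) and that the procedure takes the label as its only argument:

```sql
-- Assumption: 'SA' is the label of the predefined configuration
-- for text that is entirely uppercase or entirely lowercase.
begin
  ctx_ling.set_settings_label('SA');
end;
/
```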
When you specify a setting configuration label, ConText checks the label against the setting configuration that is currently active. If the specified setting configuration is not already active, ConText loads the new settings from the database before any documents are processed by ConText servers with the Linguistic personality.
The specified setting configuration is active for your session until SET_SETTINGS_LABEL is called with a new setting configuration label.
You can use the CTX_LING.GET_SETTINGS_LABEL function to return the label for the active setting configuration for the current session.
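As an illustration, the label could be read and printed as follows. This is a sketch that assumes the function takes no arguments and returns a character value; the variable name and its length are hypothetical.

```sql
-- Run "set serveroutput on" in SQL*Plus first to see the output.
declare
  l_label varchar2(80);  -- hypothetical variable; length is an assumption
begin
  -- Assumption: GET_SETTINGS_LABEL takes no arguments
  l_label := ctx_ling.get_settings_label;
  dbms_output.put_line('Active setting configuration: ' || l_label);
end;
/
```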
Before theme and Gist information can be used in an application, you must perform the following tasks:
For ConText to generate linguistic output, at least one server must be running with the Linguistic (L) personality. For more information about ConText Servers, see Oracle8 ConText Cartridge Administrator's Guide.
To create a theme table called CTX_THEMES, issue the following SQL statement:
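The statement itself is not present in this copy of the text. The sketch below is inferred from the two-column theme table shown later in this section, with pk1 and pk2 collapsed into a single pk column as in the Gist table; treat the exact column list as an assumption.

```sql
-- Inferred from the two-column version of ctx_themes in this section
create table ctx_themes (
  cid    number,          -- column ID
  pk     varchar2(64),    -- primary key (textkey) of the document
  theme  varchar2(2000),
  weight number
);
```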
To create a Gist table called CTX_GIST, issue the following SQL statement:
create table ctx_gist ( cid number, pk varchar2(64), pov varchar2(80), gist long);
Because the combination of the CID (column ID) and PK (primary key) columns in the output tables uniquely identifies each document in a text column, you can use the same output tables to store theme and Gist information for multiple text columns. Alternatively, you can create separate output tables to store the theme and Gist information for each text column.
To create a theme table whose textkey has two columns, issue the following SQL statement:
create table ctx_themes ( cid number, pk1 varchar2(64), pk2 varchar2(64), theme varchar2(2000), weight number);
To create a Gist table whose textkey has two columns, issue the following SQL statement:
create table ctx_gist ( cid number, pk1 varchar2(64), pk2 varchar2(64), pov varchar2(80), gist long);
To generate linguistic output for the documents in a text column, you first call CTX_LING.REQUEST_GIST or CTX_LING.REQUEST_THEMES (or both) for each document in the column, then call CTX_LING.SUBMIT to enter the requests for that document in the services queue as a single transaction.
The following example shows how you could use the procedures and functions in the CTX_LING package to generate linguistic output:
declare
    handle number;
begin
    ctx_ling.request_themes( 'CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_THEMES');
    ctx_ling.request_gist( 'CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_GIST');
    handle := ctx_ling.submit;
end;
The first two calls request theme and Gist output for document 7039 in the text column for the DOC_POLICY policy. These procedures store the themes and Gists in the linguistic output tables (ctx_themes and ctx_gist), which were created in the previous step.
The final API call submits the requests as one batch request to the services queue and returns a handle which can be used to monitor the status of the request. Because the two requests are submitted as one batch request, ConText parses the document only once while still generating both theme and Gist output.
By default, ConText generates single theme names when you request a list of themes with CTX_LING.REQUEST_THEMES. To generate the hierarchical theme information with theme names, you must set the full themes flag to TRUE with CTX_LING.SET_FULL_THEMES.
For example, the following SQL statements generate and output single theme information for a document identified by pk:
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
SQL> exec ctx_ling.submit(200)
SQL> select theme from ctx_themes;

THEME
-------------------------------------------------------------------------------
NASDAQ - National Association of Securities Dealers Automated Quotation System
stocks
indexes
weakness
composites
prices
franchises
shares
cellularity
declining issues
measures
analysts
OTC
purchases
Wall Street
lows

16 rows selected.
However, when you set the full themes flag to TRUE, ConText generates hierarchical theme information:
SQL> exec ctx_ling.set_full_themes(TRUE)
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
SQL> exec ctx_ling.submit(200)
SQL> select theme from ctx_themes;

THEME
-------------------------------------------------------------------------------
:stock market:NASDAQ - National Association of Securities Dealers Automated Quotation System:
:stock market:stocks:
:catalogs, itemization:indexes:
:weakness, fatigue:weakness:
:combination, mixture:composites:
:retail trade industry:prices:
:business fundamentals:franchises:
:possession, ownership:shares:
:cellularity:
:stock market:declining issues:
:analysis, evaluation:measures:
:analysis, evaluation:analysts:
:OTC:
:general commerce:purchases:
:general investment:Wall Street:
:bottoms, undersides:lows:
Generating hierarchical theme information in this way helps you match themes with the theme summaries generated with CTX_LING.REQUEST_GIST.
When you submit a request to the services queue with CTX_LING.SUBMIT, a handle is returned. With this handle, you can use procedures in the CTX_SVC package to perform the following tasks:
To monitor the status of requests in the Services Queue, use the CTX_SVC.REQUEST_STATUS function. This function returns one of the following statuses:
PENDING: The request has not yet been picked up by a ConText server.
RUNNING: The request is being processed by a ConText server.
ERROR: The request errored.
SUCCESS: The request completed successfully.
For example, the following PL/SQL procedure submits a request to generate themes and a Gist for a document with an ID of 49. It then checks the status of the request.
CREATE OR REPLACE PROCEDURE GENERATE_THEMES AS
    v_Handle number;
    v_Status varchar2(10);
    v_Time date;
    v_Errors varchar2(60);
BEGIN
    DBMS_OUTPUT.PUT_LINE('Begin generate_themes procedure' );
    ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes' );
    ctx_ling.request_gist('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_gist' );
    v_Handle := ctx_ling.submit;
    DBMS_OUTPUT.PUT_LINE( v_Handle );
    v_Status := ctx_svc.request_status( v_Handle, v_Time, v_Errors );
    DBMS_OUTPUT.PUT_LINE( v_Status );
    DBMS_OUTPUT.PUT_LINE( v_Time );
    DBMS_OUTPUT.PUT_LINE( substr( v_Errors, 1, 20 ) );
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE(' Exception handling' );
END GENERATE_THEMES;
/
This procedure binds the return value of REQUEST_STATUS to v_Status for the linguistic request identified by v_Handle. The value for v_Handle is returned by the call to CTX_LING.SUBMIT which placed the requests for the themes and gists in the Services Queue.
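Because a single call to REQUEST_STATUS only reports the status at that moment, an application may want to poll until the request leaves the queue. The following is a sketch only: the handle value, the retry limit, and the two-second delay are hypothetical, the status names are those used elsewhere in this section, and DBMS_LOCK.SLEEP requires execute privilege on DBMS_LOCK.

```sql
declare
  l_handle number := 3321;      -- hypothetical handle returned by ctx_ling.submit
  l_status varchar2(10);
  l_time   date;
  l_errors varchar2(2000);      -- assumed to be large enough for the error text
  l_tries  pls_integer := 0;
begin
  loop
    -- same three-argument call as in GENERATE_THEMES above
    l_status := ctx_svc.request_status(l_handle, l_time, l_errors);
    -- stop once the request is no longer waiting or running,
    -- or after 30 attempts (arbitrary limit)
    exit when l_status not in ('PENDING', 'RUNNING') or l_tries >= 30;
    dbms_lock.sleep(2);         -- wait two seconds between checks
    l_tries := l_tries + 1;
  end loop;
  dbms_output.put_line('Last status: ' || l_status);
  if l_status = 'ERROR' then
    dbms_output.put_line(substr(l_errors, 1, 200));
  end if;
end;
/
```

Polling keeps the example simple; the completion and error callbacks described later in this chapter avoid the wait loop altogether.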
To remove requests with a status of PENDING from the Services Queue, use the CTX_SVC.CANCEL procedure.
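The call itself does not appear in this copy of the text. A minimal sketch, assuming the procedure takes the request handle as its only argument:

```sql
-- Assumption: CANCEL takes the handle returned by ctx_ling.submit
begin
  ctx_svc.cancel(3321);
end;
/
```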
In this example, a pending request with handle 3321 is removed from the Services Queue.
If a request has a status of RUNNING, ERROR, or SUCCESS, it cannot be removed from the Services Queue.
To remove requests with a status of ERROR from the Services Queue, use the CTX_SVC.CLEAR_ERROR procedure.
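The call itself does not appear in this copy of the text. A minimal sketch, assuming the procedure takes the request handle as its only argument:

```sql
-- Assumption: CLEAR_ERROR takes the handle of the errored request
begin
  ctx_svc.clear_error(3321);
end;
/
```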
In this example, a request with handle 3321 is removed from the Services Queue.
If a value of 0 (zero) is specified for the handle, all requests with a status of ERROR are removed from the queue. If a request has a status of PENDING, RUNNING, or SUCCESS, it cannot be removed from the queue using CLEAR_ERROR.
To specify a procedure to be called when a linguistic request completes or errors, use the SET_COMPLETION_CALLBACK and SET_ERROR_CALLBACK procedures in CTX_LING. ConText invokes the procedure defined by SET_COMPLETION_CALLBACK after it processes a linguistic request; ConText invokes the procedure defined by SET_ERROR_CALLBACK when it encounters an error.
The following is an example of how to define and use a completion callback procedure. This example is taken from genling.sql in the ctxling demonstration provided with the ConText distribution package.
For every linguistic request processed, ling_comp_callback keeps track of the number of articles still to be processed by decrementing num_docs, which was previously set to the number of articles in the table. The procedure also keeps track of any errors by incrementing num_errors.
create or replace procedure LING_COMP_CALLBACK (
    p_handle in number,
    p_status in varchar2,
    p_errors in varchar2
) is
    l_total number;
    l_pk varchar2(64);
BEGIN
    -- decrement the count in the tracking table
    update ling_tracking set num_docs = num_docs - 1;
    -- if the request errored, increment the error count in the tracking table
    IF (p_status = 'ERROR') then
        update ling_tracking set num_errors = num_errors + 1;
    end IF;
    commit;
END;
/
The following code is an anonymous PL/SQL block that sets the linguistic completion callback procedure to ling_comp_callback and then generates linguistic output for every document in the articles table:
declare
    cursor c1 is select article_id from articles;
    l_handle number;
begin
    -- set the completion callback procedure to keep the tracking table
    -- in sync with the number of documents processed (completed requests)
    -- and the number of errored requests
    -- (assumption: the callback is registered by passing its name)
    begin
        ctx_ling.set_completion_callback('LING_COMP_CALLBACK');
    end;
    -- loop through all articles in the article table, requesting themes
    -- and gists
    for crec in c1 loop
        ctx_ling.request_themes('DEMO_POLICY', crec.article_id, 'ARTICLE_THEMES');
        ctx_ling.request_gist('DEMO_POLICY', crec.article_id, 'ARTICLE_GISTS');
        l_handle := ctx_ling.submit;
    end loop;
end;
/
At start-up of a ConText server, the logging of linguistic parse information is disabled by default.
To enable logging of the parse information generated by ConText linguistics during a session, use the CTX_LING.SET_LOG_PARSE procedure.
Once you enable parse logging for a session, it remains active until you explicitly disable it during the session. You can use the CTX_LING.GET_LOG_PARSE function to check whether parse logging is enabled or disabled for the session.
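The sketch below shows one way these two calls might be combined. It assumes that SET_LOG_PARSE accepts a Boolean flag and that GET_LOG_PARSE returns a Boolean; check the package specification for the actual parameter types before relying on it.

```sql
begin
  -- Assumption: a Boolean argument turns parse logging on or off
  ctx_ling.set_log_parse(TRUE);

  -- Assumption: GET_LOG_PARSE returns a Boolean
  if ctx_ling.get_log_parse then
    dbms_output.put_line('Parse logging is enabled');
  end if;

  -- ... request and submit linguistic output here ...

  -- disable logging again once the problem has been investigated
  ctx_ling.set_log_parse(FALSE);
end;
/
```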
Parse logging is a useful feature if you are having difficulty generating linguistic output and you want to monitor how ConText is parsing your documents; however, parse logging may affect performance considerably. As such, you should only enable parse logging if you encounter problems with generating linguistic output.
Theme queries allow you to search a set of documents for a given theme. The result is a hitlist containing the IDs of the documents that match the theme.
Generating a list of themes is a good way of extending theme queries. For a document in a theme query hitlist, the user can learn more about the document by reading its list of themes or its Gist.
For example, suppose a theme query on music returns a hitlist containing 20 documents. If these documents are lengthy, the user might not want to read every single document to find out what each is about. Rather than return to the user the document text, you can return a list of themes or a Gist for each document, which the user could skim.
Generally, you can generate linguistic output for a document set at two different times:
You can generate linguistic output (creating the list of themes in this case) at indexing time, that is, before the queries are issued against the document set. When you do so, the linguistic output is returned to the user immediately, since the output was already created.
However, while the retrieval time for the linguistic output is good, the drawback to this method is that you have to maintain a permanent theme output table, using your own triggers to keep it updated. A permanent theme table for an entire document set also takes up system disk space.
You could also generate a list of themes after executing a query. The advantage of generating themes as needed is that the output table lasts only for the user session; you need not maintain a permanent theme table for all your documents.
However, generating lists of themes on the fly takes time, depending on the number of documents, the length of the documents, and how your linguistic servers are configured. A user might not want to wait a few minutes while a large number of documents is processed.
The example below shows how to generate linguistic output after a theme query.
The following PL/SQL code illustrates how to generate a list of themes for every document in a hitlist table. (You can use the same method to loop through any text table, once the text column table has a policy attached to it.)
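The listing is not present in this copy of the text. The following sketch is reconstructed from the description that follows it; the cursor name, the textkey column of ctx_temp, and the policy name DOC_POLICY are assumptions, not taken from the original listing.

```sql
declare
  -- Assumption: ctx_temp is the hitlist table populated by the theme
  -- query and holds each matching document's key in a textkey column.
  cursor ctx_cur is select textkey from ctx_temp;
  l_handle number;
begin
  -- ctx_cur_rec is implicitly declared by the cursor FOR loop
  for ctx_cur_rec in ctx_cur loop
    -- request a list of themes for this document;
    -- 'DOC_POLICY' is a hypothetical policy name
    ctx_ling.request_themes('DOC_POLICY', ctx_cur_rec.textkey, 'CTX_THEMES');
    -- submit the request to the services queue
    l_handle := ctx_ling.submit;
  end loop;
end;
/
```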
This routine first declares a cursor that selects the rows from the ctx_temp result table, to be populated with a theme query on birds.
The cursor FOR loop opens the cursor, executing the select statement that retrieves all textkeys in the ctx_temp table. The loop index ctx_cur_rec is implicitly declared as a record of the cursor's %ROWTYPE.
Every iteration of the loop calls the CTX_LING.REQUEST_THEMES procedure with the document textkey derived from ctx_cur_rec. Each request is submitted to the services queue with CTX_LING.SUBMIT, which returns a handle.
The theme output is written to the ctx_themes table.