SQL Server 2008 provides the functionality for applications and users to issue full-text queries against character-based data in SQL Server tables. Before full-text queries can be run on a given table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). Each full-text index indexes one or more columns from the base table, and each column can have a specific language. Beginning in SQL Server 2008, full-text search supports more than 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi. For a complete list of the supported full-text languages, see sys.fulltext_languages (Transact-SQL).
For each supported language, SQL Server provides language-specific linguistic components, including a word breaker and stemmer and an empty thesaurus file. For each full-text language, SQL Server also provides a file in which you can optionally define language-specific synonyms to extend the scope of search queries (a thesaurus file). In addition, beginning in SQL Server 2008, a system stoplist is provided. To support specific language or business scenarios, you can alter the system stoplist by adding and removing stopwords (also known as noise words), and you can create additional stoplists as needed.
For writing full-text queries, SQL Server provides a set of full-text predicates (CONTAINS and FREETEXT) and rowset-valued functions (CONTAINSTABLE and FREETEXTTABLE). Using these, applications and users can perform a variety of types of full-text searches, such as searching on a single word or phrase (and optionally ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
Note: Full-text search is an optional component of the SQL Server Database Engine. For more information, see Installing SQL Server 2008.
How to Configure a Database for Full-Text Searching
For any scenario, a database administrator performs the following basic steps to configure table columns in a database for full-text search:
1. Create a full-text catalog.
2. On each table that you want to search, create a full-text index by:
a. Identify each text columns that you want to include in the full-text index.
b. If a given column contains documents stored as binary data (varbinary, varbinary(max), or image data), you must specify a table column (the type column) that identifies the type of each document in the column being indexed.
c. Specify the language that you want full-text search to use on the documents in the column.
d. Choose the change-tracking mechanism that you want to use on the full-text index to track changes in the base table and its columns.
Full-text search supports multiple languages through the use of the following linguistic components: word breakers and stemmers, stoplists that contain stopwords (also known as noise words), and thesaurus files. Thesaurus files and, in some cases, stoplists require configuration by a database administrator. A given thesaurus file supports all full-text indexes that use the corresponding language, and a given stoplist can be associated with as many full-text indexes as you want.
After the columns have been added to a full-text index, applications and users can run full-text queries on the text in the columns. These queries can search for any of the following:
1. One or more specific words or phrases (simple term)
2. A word or a phrase where the words begin with specified text (prefix term)
3. Inflectional forms of a specific word (generation term)
4. A word or phrase close to another word or phrase (proximity term)
5. Synonymous forms of a specific word (thesaurus)
6. Words or phrases using weighted values (weighted term)
Full-text queries all use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). However, the search goals of a given business scenario influence the exact structure of the full-text queries;
for example:
a. e-business—searching for a product on a website:
SELECT product_id FROM products WHERE CONTAINS(product_description, ”Snap Happy 100EZ” OR FORMSOF(THESAURUS,’Snap Happy’) OR ‘100EZ’) AND product_cost<200
b. Recruitment scenario—searching for job candidates that have experience working with SQL Server:
SELECT candidate_name,SSN FROM candidates WHERE CONTAINS(candidate_resume,”SQL Server”) AND candidate_division =DBA
Thursday, August 27, 2009
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment