Monday, August 30, 2010

Differences in full-text search between SQL 2005 and SQL 2000


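SQL 2005 and SQL 2000 build the full-text index differently: SQL 2000 breaks
text into individual characters, whereas SQL 2005 breaks it into words. Word
breaking performs better than character breaking, but the query results are
less accurate. The following is excerpted from Microsoft's explanation: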


1. How the word breaker behaves differently in SQL 2005 and SQL 2000
The SQL Server 2005 word breaker parses a sentence word by word, while the SQL
Server 2000 word breaker parses it character by character. For example,
“微軟公司” is parsed into “微軟” and “公司” by SQL Server 2005, and into “微”,
“軟”, “公”, “司” by SQL Server 2000. However, the exact word breaker behavior
varies from sentence to sentence.
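
To make the difference concrete, here is a minimal Python sketch of the two
strategies. It is only a toy: SQL Server's real word breaker is not public
code, so the greedy longest-match segmentation and the two-word dictionary
below are assumptions chosen purely to reproduce the “微軟公司” example.

```python
# Toy illustration only: neither function is SQL Server's actual word breaker.

def break_by_character(text):
    """Character-level breaking, as described for SQL Server 2000."""
    return [ch for ch in text if not ch.isspace()]


def break_by_word(text, dictionary):
    """Greedy longest-match segmentation against a hypothetical dictionary."""
    longest = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


TOY_DICTIONARY = {"微軟", "公司"}  # assumed vocabulary, for illustration only

print(break_by_character("微軟公司"))             # ['微', '軟', '公', '司']
print(break_by_word("微軟公司", TOY_DICTIONARY))  # ['微軟', '公司']
```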

2. Differences between breaking by word and breaking by character
For Traditional Chinese, the difference between breaking by word and
breaking by character is that breaking by character finds every required
result but can also return data that is not directly related. Breaking
sentences by word makes the returned results more concise. However, because
the word breaker's output varies, the returned results are sometimes not
100% accurate.
For example, in SQL Server 2000, a full-text search for the keyword “微軟公司”
can also return results such as “公車司機”. With the SQL Server 2005 word
breaker, we only hit content that contains “微軟” or “公司”.
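
The “公車司機” case can be reproduced with a small Python sketch. Everything in
it is an assumption made for illustration: the dictionary, the greedy
longest-match tokenizer, and the rule that a document matches when it shares
at least one token with the query. It is not how SQL Server is implemented
internally.

```python
# Toy model only: the tokenizers and matching rule are assumptions,
# not SQL Server's real implementation.

def char_tokens(text):
    """Character-level tokens (SQL Server 2000 style)."""
    return set(text)


def word_tokens(text, dictionary):
    """Word-level tokens via greedy longest-match on a hypothetical dictionary."""
    tokens, i = set(), 0
    longest = max(map(len, dictionary))
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.add(piece)
                i += size
                break
    return tokens


def search(query, documents, tokenize):
    """Return the documents that share at least one token with the query."""
    wanted = tokenize(query)
    return [doc for doc in documents if wanted & tokenize(doc)]


DICTIONARY = {"微軟", "公司", "公車", "司機"}  # assumed vocabulary
DOCUMENTS = ["微軟公司", "公車司機"]

by_char = search("微軟公司", DOCUMENTS, char_tokens)
by_word = search("微軟公司", DOCUMENTS, lambda t: word_tokens(t, DICTIONARY))
print(by_char)  # ['微軟公司', '公車司機']
print(by_word)  # ['微軟公司']
```

The character index matches “公車司機” because it shares “公” and “司” with the
query, while the word index sees only “公車” and “司機” and finds no overlap.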

3. Differences between the LIKE syntax and full-text search
Compared with LIKE, which strictly uses string comparison to search for the
keyword from the start of the text to the end, full-text search parses a
document with a particular word breaker and builds its own index beforehand.
It then traverses the index tree to search for keywords. This can be faster,
but by design it can only guarantee a fuzzy result.
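
The contrast can be sketched in Python as follows. This is a simplified model
under stated assumptions: the sample rows are made up, a plain dictionary
stands in for the index tree, and the index uses character tokens with
match-any semantics. It only shows why a scan is exact and an index lookup is
fast but fuzzy.

```python
from collections import defaultdict

ROWS = {1: "微軟公司", 2: "公車司機", 3: "軟體公司"}  # hypothetical sample rows


def like_search(keyword, rows):
    """Mimics LIKE '%keyword%': scan every row and compare substrings."""
    return sorted(row_id for row_id, text in rows.items() if keyword in text)


def build_index(rows):
    """Build an inverted index once, ahead of time (character tokens here)."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in text:
            index[token].add(row_id)
    return index


def index_search(keyword, index):
    """Look up each token in the index; return rows containing any of them."""
    hits = set()
    for token in keyword:
        hits |= index.get(token, set())
    return sorted(hits)


INDEX = build_index(ROWS)
print(like_search("微軟公司", ROWS))    # [1]
print(index_search("微軟公司", INDEX))  # [1, 2, 3]
```

The scan touches every row on every query but returns only the exact match;
the index lookup touches only the token entries but also returns loosely
related rows.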

4. Differences in word breaking between the English and Chinese versions of the product
As we know, English and other Western languages have a native advantage:
they all use spaces to separate words. So the English version of the word
breaker generally looks for the spaces between words to break a sentence
into keywords. Chinese has no such separators, so it is hard to break a
sentence into keywords. For example, the Chinese word breaker is still not
intelligent enough to parse “流行性感冒” into keywords.
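
A short Python sketch shows why spaces make the English case easy. The
candidate word list is an assumption added for illustration; it is not taken
from any SQL Server dictionary.

```python
def break_on_spaces(sentence):
    """Space-based breaking, as the English word breaker is described above."""
    return sentence.split()


print(break_on_spaces("full text search"))  # ['full', 'text', 'search']
print(break_on_spaces("流行性感冒"))         # ['流行性感冒']

# Assumed vocabulary: several plausible words overlap inside the same phrase,
# so there is no single obvious place to cut.
CANDIDATES = ["流行", "流行性", "性感", "感冒"]
overlapping = [word for word in CANDIDATES if word in "流行性感冒"]
print(overlapping)  # ['流行', '流行性', '性感', '感冒']
```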


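To address this, in a SQL 2005 environment you can switch the indexing method
by replacing a *.dll, so that SQL Server 2005 still indexes by character and
returns more accurate query results. The steps are as follows: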

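1. Enable SQL 2005 full-text search (see "SQL2005For SmartIT 6 5 全文檢索設定步驟.pdf").
2. Stop the MSSQLSERVER service.
3. Copy the SQL 2000 Chinese word breaker, CHTBRKR.DLL (version 2.0.1.1629), into C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn (backing up the original file first is recommended).
4. Remove the full-text index from every table that has one defined.
5. Follow steps (7) to (16) in "SQL2005For SmartIT 6 5 全文檢索設定步驟.pdf" to redefine the full-text indexes.
6. To revert, repeat the steps above, but in step 3 replace the file with the SQL 2005 version of CHTBRKR.DLL.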
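
Step 3, including the recommended backup, could be scripted roughly as below.
This is a sketch under assumptions, not a tested deployment script: the
function name, the ".bak" backup name, and the D:\sql2000 source path in the
usage comment are hypothetical, and the MSSQLSERVER service must already be
stopped (step 2) before the file is replaced.

```python
import shutil
from pathlib import Path


def swap_word_breaker(binn_dir, replacement_dll, dll_name="CHTBRKR.DLL"):
    """Back up the existing word breaker DLL, then copy the replacement over it.

    Assumes the MSSQLSERVER service is already stopped (step 2).
    Returns the path of the backup copy, or None if no original file was present.
    """
    binn_dir = Path(binn_dir)
    target = binn_dir / dll_name
    backup = None
    if target.exists():
        backup = binn_dir / (dll_name + ".bak")  # hypothetical backup name
        if not backup.exists():  # never overwrite an earlier backup
            shutil.copy2(target, backup)
    shutil.copy2(replacement_dll, target)
    return backup


# Example call (the source path is a placeholder, adjust it to your own copy):
# swap_word_breaker(
#     r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn",
#     r"D:\sql2000\CHTBRKR.DLL",
# )
```

Because an existing backup is never overwritten, the first backup always holds
the original SQL 2005 file, which is the one needed for the revert in step 6.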




