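Abstract: The volume of Tibetan-language information on the Internet keeps growing, and Tibetan search engines have become a common tool and channel for retrieving it. The inverted index is one of the core technologies of a search engine and directly affects both the retrieval results and the response speed. This paper describes in detail a self-developed inverted index system for Tibetan web pages. The system takes the tag content of XML documents as its indexing objects and defines concepts such as the document and the document attribute. The paper further explains the key techniques and implementation methods for building, in C#, an inverted index over the body text of Tibetan web pages, and implements the underlying XML-based inverted index database for Tibetan web pages, providing a technical reference for related work. With this method, the speed and accuracy of information retrieval in Tibetan search engines are improved.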
Keywords: XML; Tibetan web pages; inverted index
CLC number: TP274
Document code: A
Fund: Supported by the Qinghai Provincial Department of Science and Technology (2016-ZJ-Y04).
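The abstract does not spell out the index layout, so the following is only a minimal Python sketch of the idea, not the authors' C# implementation. It assumes a hypothetical XML layout in which each `<doc id="...">` element is one web page (the document) and each child tag such as `<title>` or `<body>` is a document attribute; every term is mapped to the documents, tags and positions where it occurs. Tokenization is deliberately naive and is also an assumption: it splits on the Tibetan tsheg (U+0F0B), the shad (U+0F0D) and whitespace, which yields syllables rather than segmented words.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout: one <doc> per crawled Tibetan page, with an "id"
# attribute; each child tag is treated as a document attribute.
SAMPLE = """<docs>
  <doc id="1"><title>བོད་ཡིག</title><body>བོད་ཡིག་དྲ་ངོས།</body></doc>
  <doc id="2"><title>དྲ་ངོས</title><body>འཚོལ་བཤེར་མ་ལག</body></doc>
</docs>"""

# Assumption: syllable-level tokens split on tsheg (U+0F0B), shad (U+0F0D)
# and whitespace; a production system would use a Tibetan word segmenter.
DELIMS = re.compile(r"[\u0F0B\u0F0D\s]+")


def tokenize(text):
    """Split Tibetan text into non-empty syllable tokens."""
    return [t for t in DELIMS.split(text or "") if t]


def build_index(xml_text):
    """Build term -> {doc_id -> [(tag, position), ...]} from XML pages."""
    index = defaultdict(lambda: defaultdict(list))
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        doc_id = int(doc.get("id"))
        for field in doc:  # each child element is one document attribute
            for pos, term in enumerate(tokenize(field.text)):
                index[term][doc_id].append((field.tag, pos))
    return index


if __name__ == "__main__":
    idx = build_index(SAMPLE)
    for term in sorted(idx):
        print(term, dict(idx[term]))
```

Keeping the tag name in each posting lets a query restrict matches to, for example, titles only, and keeping positions leaves room for phrase queries; both are design choices of this sketch rather than details taken from the paper.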
Research and Implementation of Inverted Index of Tibetan Web Pages Based on XML Documents

ZHAXI Ladan, ANJIAN Cairang

(College of Computer Science, Qinghai Nationalities University, Xining 810007, China)
Abstract: Tibetan search engines are a commonly used tool and channel for information retrieval. The inverted index is one of the core technologies of a search engine and directly affects both the search results and the response speed. This paper introduces a self-developed inverted index system for Tibetan web pages, which uses the tag content of XML documents as its indexing objects, defines the concepts of the document and the document attribute, and builds an inverted index over the body text of Tibetan web pages in C#. The key techniques and implementation methods of the index are elaborated, and a low-level implementation of an XML-based inverted index database for Tibetan web pages is achieved, providing a technical reference for related research. With this method, the speed and accuracy of information retrieval in Tibetan search engines are improved.
Keywords: XML; Tibetan web pages; inverted index
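The abstract credits the inverted index with faster retrieval but does not describe how queries are processed. The sketch below illustrates the usual reason, assuming a conjunctive (AND) query over posting lists of sorted document IDs: only the lists for the query terms are read and merged in linear time, so no page is scanned. This is the standard textbook merge, not necessarily the paper's method; `INDEX`, `intersect` and `and_query` are hypothetical names, and the terms are Wylie transliterations used only for readability.

```python
# Hypothetical toy index: term (Wylie transliteration) -> sorted doc IDs.
INDEX = {
    "bod": [1, 3, 5, 9],
    "yig": [1, 2, 5, 9, 12],
    "dra": [5, 9],
}


def intersect(p1, p2):
    """Merge two sorted doc-ID lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, terms):
    """Return doc IDs containing every term, intersecting shortest lists first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = list(postings[0])  # copy so the index itself is never aliased
    for p in postings[1:]:
        if not result:  # no candidates left, stop early
            break
        result = intersect(result, p)
    return result


print(and_query(INDEX, ["bod", "yig"]))  # [1, 5, 9]
```

Starting from the shortest posting list keeps every intermediate result as small as possible, which is where most of the speed of conjunctive queries comes from.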