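Abstract: HBase, a distributed storage database, is a prominent technology in big data storage and offers an effective solution to the storage problems that come with rapid growth in information systems. To address HBase's inefficient retrieval and the enterprise need for loosely coupled, highly extensible systems, we analyze why retrieval in HBase is difficult and design an index middleware. The middleware builds a secondary index with Lucene, a full-text search engine library, and provides its services through a unified interface. Experiments show that the index middleware effectively improves query performance while still meeting write requirements, and that retrieval stays at the millisecond level with tens of millions of records. It is also loosely coupled, easy to deploy, and quick to integrate into existing systems, which makes it broadly applicable.
Keywords: HBase; Lucene; middleware; index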
CLC number: TP311.1
Document code: A
Funding: Zhejiang Provincial Key Research and Development Program (2021C01048).
Design and Implementation of a Distributed Storage Index Middleware
HUANG Jing, BIE Yaoxiang, XIE Xuan
(School of Information, Zhejiang Sci-Tech University, Hangzhou 310018, China)
syhj_sy@163.com; 1009214965@qq.com; 15951612952@163.com
Abstract: HBase, a distributed storage database, is a widely used technology in big data storage and provides an effective solution to the storage problems created by the rapid development of information technology. To address the low efficiency of HBase retrieval and the enterprise demand for systems with low coupling and high scalability, this paper analyzes why retrieval in HBase is difficult and designs an index middleware. The middleware uses Lucene, a full-text search engine library, to build a secondary index and exposes it through a unified interface. Experiments show that the proposed index middleware effectively improves query performance while still meeting write requirements, and that retrieval remains at the millisecond level with tens of millions of records. In addition, it has low coupling, is easy to deploy, and can be integrated quickly into existing systems, which gives it strong versatility.
Keywords: HBase; Lucene; middleware; index