Launch of StarRocks 2.0

StarRocks, a new-generation massively parallel processing (MPP) database designed for all analytical scenarios, has launched version 2.0. The new version delivers performance improvements in both single-table and multi-table query scenarios. According to StarRocks, single-table query performance is twice that of its competitors, and multi-table query performance is five to ten times that of other database systems. StarRocks 2.0 also introduces a new data model, the primary key model, which improves performance in real-time update scenarios by three to ten times. In addition, the memory management scheme has been redesigned in 2.0 to meet customers’ requirements for higher availability and stability.

Last September, StarRocks opened its source code to the global community, and the community has since become a key driving force behind the improvement of StarRocks. StarRocks received more than 2,000 GitHub stars within the first 135 days after the code was opened. Hundreds of large and medium-sized enterprises have been drawn to StarRocks.

2X Single-Table Query Performance Compared to Competitors

StarRocks 2.0 targets both single-table and multi-table queries. For single-table queries, StarRocks 2.0 uses global dictionaries to optimize queries on low-cardinality fields, delivering single-table query performance twice that of its earlier versions and of other leading database service providers. For multi-table queries, StarRocks 2.0 has redesigned the cost-based optimizer (CBO) to handle complex multi-table queries, improving multi-table query performance twofold and making StarRocks 2.0 five to ten times faster than other database systems.
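To illustrate why a dictionary helps on low-cardinality fields, here is a minimal Python sketch of dictionary encoding. It is not StarRocks code: the function names and the sample `cities` column are assumptions made for illustration. The idea is that each distinct string is replaced by a small integer, so grouping and counting work on integers instead of repeated string comparisons.

```python
# Illustrative sketch only; all names here are hypothetical, not StarRocks APIs.

def build_dictionary(values):
    """Assign each distinct value a small integer code, in first-seen order."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
    return dictionary


def encode(values, dictionary):
    """Replace every value in the column with its integer code."""
    return [dictionary[value] for value in values]


def count_by_code(codes, num_codes):
    """Group-by-count on integer codes using a plain array of counters."""
    counts = [0] * num_codes
    for code in codes:
        counts[code] += 1
    return counts


# A low-cardinality column: many rows, few distinct values.
cities = ["Paris", "Tokyo", "Paris", "Lima", "Tokyo", "Paris"]

dictionary = build_dictionary(cities)
codes = encode(cities, dictionary)
counts = count_by_code(codes, len(dictionary))

# Decode only once, at the very end, when results are returned to the user.
decoded = {name: counts[code] for name, code in dictionary.items()}
print(decoded)
```

In a real engine the dictionary would be maintained across the whole table rather than rebuilt per query, which is presumably what "global" refers to in the announcement.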

In terms of data updates, traditional OLAP systems use the merge-on-read mode to update data, which is not the best solution because it pursues data loading efficiency at the cost of query performance. As real-time data update requirements keep rising in the finance and logistics sectors, this mode no longer lives up to expectations. StarRocks 2.0 introduces a new data model, the primary key model, which updates data in delete-and-insert mode. This enhances query performance by three to ten times in real-time update scenarios.
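The difference between the two update modes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the StarRocks implementation: the class names, the in-memory lists, and the sample order keys are all hypothetical. Merge-on-read appends every version cheaply and resolves duplicates at query time; delete-and-insert marks the old row as deleted at write time, so a scan only has to skip flagged rows.

```python
# Toy model of the two update modes; class and field names are hypothetical.

class MergeOnReadTable:
    """Writes are cheap appends; every scan must merge versions per key."""

    def __init__(self):
        self.versions = []  # append-only list of (key, value)

    def upsert(self, key, value):
        self.versions.append((key, value))

    def scan(self):
        latest = {}
        for key, value in self.versions:  # merge work happens on every read
            latest[key] = value
        return latest


class DeleteAndInsertTable:
    """Writes locate and flag the old row; scans just skip flagged rows."""

    def __init__(self):
        self.rows = []     # list of (key, value)
        self.deleted = []  # delete flag per row position
        self.index = {}    # primary key -> position of the live row

    def upsert(self, key, value):
        old_position = self.index.get(key)
        if old_position is not None:
            self.deleted[old_position] = True  # "delete" the old version
        self.index[key] = len(self.rows)
        self.rows.append((key, value))         # "insert" the new version
        self.deleted.append(False)

    def scan(self):
        return {
            key: value
            for (key, value), is_deleted in zip(self.rows, self.deleted)
            if not is_deleted
        }


updates = [("order-1", "new"), ("order-2", "new"), ("order-1", "shipped")]
mor, dai = MergeOnReadTable(), DeleteAndInsertTable()
for key, value in updates:
    mor.upsert(key, value)
    dai.upsert(key, value)

print(mor.scan())
print(dai.scan())
```

Both tables return the same rows; what changes is where the work is done. The delete-and-insert variant pays for a primary key lookup on each write so that reads avoid the per-key merge.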

In addition, the memory management scheme is redesigned in StarRocks 2.0 to improve system stability. A pipeline execution engine built for higher concurrency and faster complex queries on multi-core machines has been released for trial use. This engine will be officially released in StarRocks 2.1.

Five Technical Highlights and R&D Directions in 2022

StarRocks announced its five major R&D directions for 2022 to the community.

Resource Management

StarRocks will introduce a new resource management mechanism to provide separate resource groups for different businesses. This mechanism guarantees sufficient resource quotas and isolated resources for businesses. This way, different services can run on the same cluster, which simplifies O&M and improves cluster resource utilization.

Materialized Views with JOINs

Data modeling in a majority of companies requires complex data development from data engineers. Materialized views with JOINs enable data engineers to create various types of materialized views to construct data models. This significantly reduces the workload of data engineers and simplifies data modeling.
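As a rough illustration of what a materialized view with a JOIN buys, the following Python sketch precomputes a join plus aggregation once and lets later queries read the stored result. It is a conceptual model only: the tables, column names, and the `refresh_revenue_by_region` helper are invented for this example and do not reflect StarRocks syntax or internals.

```python
# Conceptual sketch; the data and function names are hypothetical.

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 15.0},
    {"order_id": 3, "customer_id": 10, "amount": 40.0},
]

# Dimension table: customer_id -> region.
customers = {10: "EU", 11: "US"}


def refresh_revenue_by_region(orders, customers):
    """Join orders to customers and aggregate revenue per region, once."""
    view = {}
    for order in orders:
        region = customers[order["customer_id"]]  # the JOIN step
        view[region] = view.get(region, 0.0) + order["amount"]
    return view


# The "materialized view": computed at refresh time, stored for reuse.
revenue_by_region = refresh_revenue_by_region(orders, customers)

# Later queries read the precomputed result instead of repeating the join.
print(revenue_by_region["EU"])
```

The engineering work the article refers to is largely in keeping such a stored result up to date as the base tables change, which this sketch does not attempt.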

StarRocks also introduces intelligent materialized views. This feature intelligently recommends materialized views to users based on query behavior to accelerate queries.

Separation of Storage and Compute

In the earlier versions of StarRocks, compute and storage are tightly coupled for excellent query performance. However, this architecture cannot achieve on-demand resource allocation and may result in unnecessary costs. In 2022, StarRocks will implement a new architecture where storage and compute are decoupled. This new architecture supports offline analytics in parallel with real-time analytics and can be deployed on public, private, and multiple clouds.

Lightning Fast Data Lake Analytics

Currently, StarRocks serves more as a data warehouse: customers import high-value data from data lakes into StarRocks for ultra-fast analytics. In 2022, StarRocks will continue to enhance its data lake analytics capabilities and provide a unified, blazing-fast analytics experience for customers.

The StarRocks community has completed the first phase of development for data queries on Iceberg, in collaboration with renowned communities and developers at the world’s leading cloud computing companies. Test results show that StarRocks offers a 5X performance improvement compared to Trino. In the future, the StarRocks community will extend its support to Hudi and offer more feature enhancements.

Unified Batch and Stream Processing

StarRocks plans to implement unified stream and batch processing across hundreds of nodes. This way, customers’ raw data can be processed and then analyzed all in StarRocks. This guarantees a one-stop, unified, and blazing fast data processing and analytics experience, bringing the vision of unification to a new level. (https://www.starrocks.com)
