Spark Performance Tuning

2022-07-19 · Category Technology

Spark SQL is the most active component in the Spark 3.0 release: most of the resolved tickets are for Spark SQL. These enhancements benefit all the higher-level libraries, including Structured Streaming and MLlib, and the higher-level APIs, including SQL and DataFrames. Various related optimizations were added in the latest release.

Elasticsearch Wildcard Search

2022-05-12 · Category Technology

I ran into a word-segmentation problem when searching Chinese text in Elasticsearch (ES). Elasticsearch is a distributed, RESTful search and analytics engine. You can use it to store, search, and manage data for logs, metrics, a search backend, application monitoring, and endpoint security.

TensorFlow: Identifying Simple CAPTCHAs

2022-03-31 · Category Technology

CAPTCHA stands for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’. With the rise of deep learning and computer vision, it is now possible to solve CAPTCHAs automatically.

Flink Performance Tuning

2021-06-23 · Category Technology

Flink optimization covers resource configuration, backpressure handling, data skew, KafkaSource tuning, and Flink SQL optimization.

Data Warehouse: ClickHouse With Flink

2021-04-13 · Category Technology

Some systems, such as HBase and Bigtable, store the values of different columns separately but cannot process analytical queries effectively because they are optimized for other scenarios. In these systems you would get a throughput of around a hundred thousand rows per second, but not hundreds of millions of rows per second.

Data Warehouse: Real-Time, part Ⅱ

2021-03-21 · Category Technology

A data warehouse is a system that pulls together data derived from operational systems and external data sources within an organization for reporting and analysis. It is a central repository of information that provides users with current and historical decision-support information.

Data Warehouse: Real-Time, part Ⅰ

2021-03-08 · Category Technology

A data warehouse is a system that pulls together data derived from operational systems and external data sources within an organization for reporting and analysis. It is a central repository of information that provides users with current and historical decision-support information.

Data Warehouse: Offline Tuning

2021-01-08 · Category Technology

A data warehouse is a system that pulls together data derived from operational systems and external data sources within an organization for reporting and analysis. It is a central repository of information that provides users with current and historical decision-support information.