Life is the sum of experiences

Spark: A Framework for Large-Scale Data Processing, Part Ⅰ

2020-05-27 · Category Technology

Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. The Dataframe API was released as an abstraction on top of the RDD, followed by the Dataset API.

The Development of App Architecture

2020-05-26 · Category Technology

The features of large applications: high availability, high concurrency and big data. High availability: system need to provide service without interruption. High concurrency: still stable under the big access. Big data: store and manage big data well.

The Final Key Word In Java

2020-05-26 · Category Technology

Final key word no doubt be mentioned most time in Java language. There are some points about final key word. First of all, final key word can modify three objects: first is modify variables, second is modify way, third is modify class.

ZooKeeper: A Coordination Service for Distributed Systems

2020-05-19 · Category Technology

ZooKeeper is an open source distributed coordination framework. It is positioned to provide consistent services for distributed applications and is the administrator of the entire big data system. ZooKeeper will encapsulate key services that are complex and error-prone, and provide users with efficient, stable, and easy-to-use services.

Kafka: A Stream Processing Platform for Distributed Applications

2020-04-27 · Category Technology

Apache Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a “message set” abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This “leads to larger network packets, larger sequential disk operations, contiguous memory blocks […] which allows Kafka to turn a bursty stream of random message writes into linear writes.”

HDFS NFS Gateway: Benefits, Configuration, and Usage

2020-03-26 · Category Technology

The major difference between the two is Replication/Fault Tolerance. HDFS was designed to survive failures. NFS does not have any fault tolerance built in. Other than fault tolerance, HDFS does support multiple replicas of files. This eliminates (or eases) the common bottleneck of many clients accessing a single file. Since files have multiple replicas, on different physical disks, reading performance scales better than NFS.

Scala: A Multi-Paradigm Programming Language for the JVM

2020-02-08 · Category Technology

Scala combines object-oriented and functional programming in one concise, high-level language. Scala’s static types help avoid bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to huge ecosystems of libraries.

Static Keyword in Java

2020-01-22 · Category Technology

The static keyword can be used for variables, methods, code blocks, and inner classes to indicate that a particular member belongs only to a class itself, and not to an object of that class.