Structured, Semi-structured and Unstructured data

Big Data is characterized by huge volume, high velocity, and a wide variety of data. That data falls into three types: structured, semi-structured, and unstructured.

  1. Structured data is data whose elements are addressable for effective analysis. It has been organised into a formatted repository, typically a database. Example: a relational database.
  2. Semi-structured data is information that does not reside in a relational database but that has some organizational properties that make it easier to analyse. With some processing it can be stored in a relational database (this can be very hard for some kinds of semi-structured data), but semi-structured formats exist to save space. Examples: XML, JSON.
  3. Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing it. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Examples: Word documents, PDFs, text, media logs.
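
To make the distinction concrete, here is a minimal Java sketch that holds the same customer information in each of the three shapes. The class, field names, and sample values are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;

class DataShapesSketch {
    // Structured: a fixed schema, so every row has the same typed columns.
    static final class CustomerRow {
        final int id;
        final String name;
        final String city;

        CustomerRow(int id, String name, String city) {
            this.id = id;
            this.name = name;
            this.city = city;
        }
    }

    // Semi-structured: self-describing keys; fields can be nested, repeated, or absent.
    static Map<String, Object> customerDocument() {
        return Map.of(
                "id", 1,
                "name", "Ada",
                "phones", List.of("555-0100", "555-0101"),
                "address", Map.of("city", "London"));
    }

    // Unstructured: free text with no data model; facts have to be extracted by analysis.
    static String customerNote() {
        return "Ada rang from London on Monday and asked about her order.";
    }

    public static void main(String[] args) {
        CustomerRow row = new CustomerRow(1, "Ada", "London");
        System.out.println(row.city); // direct access to a known column

        Map<?, ?> address = (Map<?, ?>) customerDocument().get("address");
        System.out.println(address.get("city")); // navigate by key through nested structure

        System.out.println(customerNote().contains("London")); // only text search is possible
    }
}
```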

NoSQL (Not Only SQL database)

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.

Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an associated value.
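
The key-value model can be sketched in a few lines of Java on top of a hash map. This is only an in-memory illustration; the class name and the `session::42` key format are assumptions, not part of any product.

```java
import java.util.HashMap;
import java.util.Map;

class KeyValueSketch {
    // Each unique key maps to exactly one opaque value.
    private final Map<String, String> items = new HashMap<>();

    void put(String key, String value) {
        items.put(key, value);
    }

    // Returns null when the key is unknown.
    String get(String key) {
        return items.get(key);
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        store.put("session::42", "{\"user\":\"ada\"}");
        store.put("session::42", "{\"user\":\"ada\",\"cart\":3}"); // same key: value is replaced
        System.out.println(store.get("session::42"));
        System.out.println(store.get("session::99")); // unknown key
    }
}
```

The store never inspects the value, which is what keeps the model simple: all access goes through the key.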

Document databases, also called document stores, store semi-structured data and descriptions of that data in document format. They allow developers to create and update programs without needing to reference a master schema. Use of document databases has increased along with the use of JavaScript and the JavaScript Object Notation (JSON).

Wide-column stores organize data tables as columns instead of as rows.

Graph data stores organize data as nodes, which are like records in a relational database, and edges, which represent connections between nodes.
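
As a rough illustration of nodes and edges, the following Java sketch keeps a labelled adjacency list in memory. The names (`GraphSketch`, `FOLLOWS`, `LIKES`) are hypothetical, and a real graph database adds persistence, indexing, and a query language on top of this idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GraphSketch {
    // An edge is a labelled connection that points at another node.
    static final class Edge {
        final String label;
        final String to;

        Edge(String label, String to) {
            this.label = label;
            this.to = to;
        }
    }

    // node id -> outgoing edges
    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void addNode(String id) {
        outgoing.putIfAbsent(id, new ArrayList<>());
    }

    void addEdge(String from, String label, String to) {
        addNode(from);
        addNode(to);
        outgoing.get(from).add(new Edge(label, to));
    }

    // Follows edges with the given label from one node and returns the nodes reached.
    List<String> neighbours(String from, String label) {
        List<String> result = new ArrayList<>();
        for (Edge edge : outgoing.getOrDefault(from, List.of())) {
            if (edge.label.equals(label)) {
                result.add(edge.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GraphSketch graph = new GraphSketch();
        graph.addEdge("alice", "FOLLOWS", "bob");
        graph.addEdge("alice", "FOLLOWS", "carol");
        graph.addEdge("alice", "LIKES", "post1");
        System.out.println(graph.neighbours("alice", "FOLLOWS"));
    }
}
```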


Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package that is optimized for interactive applications. Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Couchbase Inc. describes Couchbase as an Engagement Database, a new category of database that enables enterprises to continually create and reinvent the customer experience. Unlike traditional databases, the Engagement Database taps into dynamic data, at any scale and across any channel or device, to liberate data’s full potential at a time when the strategic use of data to create exceptional customer experiences has become a key competitive differentiator for businesses.

In the Engagement Database architecture, data is first cached in memory, then replicated for availability, and finally written to disk.
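
The ordering described above (memory first, then replication, then disk) can be pictured with a toy Java model. This is a sketch under simplifying assumptions, not Couchbase's implementation: the replica and the disk are plain maps, the queues are drained by explicit method calls rather than background threads, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class WritePathSketch {
    final Map<String, String> memory = new HashMap<>();  // served to readers immediately
    final Map<String, String> replica = new HashMap<>(); // stands in for a copy on another node
    final Map<String, String> disk = new HashMap<>();    // stands in for persistent storage

    private final Queue<String> toReplicate = new ArrayDeque<>();
    private final Queue<String> toPersist = new ArrayDeque<>();

    // Step 1: the write lands in memory; the slower steps are only queued.
    void write(String key, String value) {
        memory.put(key, value);
        toReplicate.add(key);
    }

    // Step 2: copy pending keys to the replica, then queue them for disk.
    void replicate() {
        String key;
        while ((key = toReplicate.poll()) != null) {
            replica.put(key, memory.get(key));
            toPersist.add(key);
        }
    }

    // Step 3: flush replicated keys to disk.
    void persist() {
        String key;
        while ((key = toPersist.poll()) != null) {
            disk.put(key, memory.get(key));
        }
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch();
        node.write("user::1", "{\"name\":\"Ada\"}");
        System.out.println("in memory: " + node.memory.containsKey("user::1"));
        node.replicate();
        System.out.println("replicated: " + node.replica.containsKey("user::1"));
        node.persist();
        System.out.println("on disk: " + node.disk.containsKey("user::1"));
    }
}
```

The point of the ordering is latency: a reader can see the value as soon as it is in memory, while durability catches up afterwards.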

Core features of Couchbase

Data: Couchbase Server stores data as items. Each item consists of a key, by which the item is referenced; and an associated value, which must be either binary or a JSON document.

Buckets, Memory, and Storage: Items are stored in named Buckets and are kept either only in memory, or both in memory and on disk.

Services: Services can be deployed to support different forms of data access. Details are given in the next section.

Clusters and Availability: A single node running Couchbase Server is considered a cluster of one node. As successive nodes are initialized, each can be configured to join the existing cluster.

Across the nodes of each cluster, Couchbase data is evenly distributed and replicated: nodes can be removed, and node-failure handled, without data-loss. Data can be selected for replication across clusters residing in different data centres, to ensure high availability.
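
One common way to spread keys evenly over nodes is to hash each key to a fixed partition and then assign partitions to nodes. The Java sketch below illustrates that idea only; the partition count, the modulo-based assignment, and the "next node holds the replica" rule are assumptions made for the example, and the real cluster maintains its own partition map.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class PartitionSketch {
    static final int PARTITIONS = 1024; // assumed partition count for this sketch

    // Hash the key to a stable partition number in [0, PARTITIONS).
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Node that holds the active copy (simplified: partitions are dealt out round-robin).
    static int activeNode(String key, int nodeCount) {
        return partitionFor(key) % nodeCount;
    }

    // Node that holds the replica: a different node, so one failure cannot lose both copies.
    static int replicaNode(String key, int nodeCount) {
        return (activeNode(key, nodeCount) + 1) % nodeCount;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user::1", "user::2", "order::77"}) {
            System.out.println(key + " -> partition " + partitionFor(key)
                    + ", active node " + activeNode(key, 3)
                    + ", replica node " + replicaNode(key, 3));
        }
    }
}
```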


Couchbase Server provides the following services:

  1. Data: Supports the storing, setting, and retrieving of data-items, specified by key.
  2. Query: Parses queries specified in the N1QL query-language, executes the queries, and returns results. The Query Service interacts with both the Data and Index services.
  3. Index: Creates indexes, for use by the Query and Analytics services.
  4. Search: Creates indexes specially purposed for Full Text Search. This supports language-aware searching, allowing users to search for, say, the word beauties and additionally obtain results for beauty and beautiful.
  5. Analytics: Supports join, set, aggregation, and grouping operations, which are expected to be large, long-running, and highly consumptive of memory and CPU resources.
  6. Eventing: Supports near real-time handling of changes to data: code can be executed both in response to document-mutations, and as scheduled by timers.


N1QL (pronounced "nickel") is used for manipulating the JSON data in Couchbase, just as SQL manipulates data in an RDBMS. It has SELECT, INSERT, UPDATE, DELETE, and MERGE statements to operate on JSON data.

The N1QL data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The N1QL data model is also a proper superset and generalization of the relational model.


Like Query
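
A LIKE predicate matches a string against a pattern in which % stands for any run of characters and _ for exactly one character. The N1QL statement in the comment below is an illustrative example against a hypothetical `people` keyspace. The Java around it is a sketch that mimics the predicate over an in-memory list; it is not how the Query service evaluates it, and it ignores escape characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LikeQuerySketch {
    // Illustrative N1QL (keyspace and field names are hypothetical):
    //   SELECT p.name FROM people p WHERE p.name LIKE "Jo%";

    // Translate a LIKE pattern into an equivalent regular expression.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");  // any run of characters, including none
            } else if (c == '_') {
                regex.append('.');   // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Keep only the names that match the whole pattern, as LIKE does.
    static List<String> selectNamesLike(List<String> names, String like) {
        Pattern pattern = likeToRegex(like);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = List.of("John", "Joan", "Mary", "Jo");
        System.out.println(selectNamesLike(names, "Jo%"));
        System.out.println(selectNamesLike(names, "Jo_n"));
    }
}
```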

Array Query
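
Because documents can contain arrays, N1QL lets a WHERE clause test the elements of an array with ANY ... SATISFIES ... END. The statement in the comment is an illustrative example over a hypothetical `people` keyspace with a `hobbies` array. The Java code is a sketch of what that predicate means, applied to in-memory maps standing in for JSON documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ArrayQuerySketch {
    // Illustrative N1QL (keyspace and field names are hypothetical):
    //   SELECT p.name FROM people p
    //   WHERE ANY h IN p.hobbies SATISFIES h = "chess" END;

    // Return the names of documents whose "hobbies" array contains the given hobby.
    static List<String> namesWithHobby(List<Map<String, Object>> people, String hobby) {
        List<String> names = new ArrayList<>();
        for (Map<String, Object> person : people) {
            Object hobbies = person.get("hobbies");
            // A missing or non-array field never satisfies the predicate.
            if (hobbies instanceof List && ((List<?>) hobbies).contains(hobby)) {
                names.add((String) person.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = new ArrayList<>();
        people.add(Map.<String, Object>of("name", "Ada", "hobbies", List.of("chess", "go")));
        people.add(Map.<String, Object>of("name", "Bob", "hobbies", List.of("golf")));
        people.add(Map.<String, Object>of("name", "Cy")); // no hobbies field at all
        System.out.println(namesWithHobby(people, "chess"));
    }
}
```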

Programming model

Couchbase provides client libraries for different programming languages, such as Java, .NET, PHP, Ruby, C, Python, and Node.js.

The following is the core API that Couchbase offers, in an abstract sense.
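
In the abstract, key-value access comes down to a handful of operations: get, insert, upsert, replace, and remove. The sketch below expresses them as a hypothetical Java interface with an in-memory implementation. The interface, its boolean return values, and the class names are assumptions for illustration; the real SDK calls differ in signatures, return types, and error handling.

```java
import java.util.HashMap;
import java.util.Map;

class CoreApiSketch {
    // Hypothetical interface; it only captures the semantics of each operation.
    interface KeyValueApi {
        String get(String key);                    // value, or null if the key is absent
        boolean insert(String key, String value);  // succeeds only if the key is new
        boolean replace(String key, String value); // succeeds only if the key already exists
        void upsert(String key, String value);     // insert or overwrite, never fails
        boolean remove(String key);                // succeeds only if the key existed
    }

    static final class InMemoryBucket implements KeyValueApi {
        private final Map<String, String> items = new HashMap<>();

        public String get(String key) {
            return items.get(key);
        }

        public boolean insert(String key, String value) {
            return items.putIfAbsent(key, value) == null;
        }

        public boolean replace(String key, String value) {
            return items.replace(key, value) != null;
        }

        public void upsert(String key, String value) {
            items.put(key, value);
        }

        public boolean remove(String key) {
            return items.remove(key) != null;
        }
    }

    public static void main(String[] args) {
        KeyValueApi bucket = new InMemoryBucket();
        System.out.println(bucket.insert("user::1", "{\"name\":\"Ada\"}"));
        System.out.println(bucket.insert("user::1", "{\"name\":\"Bob\"}"));  // key already taken
        System.out.println(bucket.replace("user::2", "{\"name\":\"Cy\"}")); // nothing to replace
        bucket.upsert("user::2", "{\"name\":\"Cy\"}");
        System.out.println(bucket.get("user::2"));
    }
}
```

Separating insert, replace, and upsert lets the caller state whether an existing document is expected, instead of silently overwriting it.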

Couchbase Java SDK

The code snippet below shows how the Java SDK may be used for some common operations:

Spring Data Couchbase

The Spring Data Couchbase project provides integration with the Couchbase Server database. Its key functional areas are a POJO-centric model for interacting with Couchbase Buckets and support for easily writing a Repository-style data access layer.

1. Data Model

First create an entity class representing the JSON document to persist.

2. Couchbase Repository

We declare a repository interface for the Person class by extending CrudRepository<Person, String> (entity type first, then ID type) and adding a derivable query method:

3. Service Layer

For our service layer, we define an interface and an implementation using the Spring Data repository abstraction. Here is our PersonService interface:

4. Service Implementation
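
The four layers above can be pictured with the following plain-Java sketch. It deliberately uses no Spring classes so that it runs on its own: the repository is a hand-written interface with an in-memory implementation standing in for what Spring Data would generate, and every name in it (the `Person` fields, `findByFirstName`, `RepositoryPersonService`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PersonLayersSketch {
    // 1. Data model: the entity that would be persisted as a JSON document.
    static final class Person {
        final String id;
        final String firstName;
        final String lastName;

        Person(String id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // 2. Repository: stand-in for the interface that would extend CrudRepository.
    interface PersonRepository {
        Person save(Person person);
        List<Person> findByFirstName(String firstName); // the "derivable" query method
    }

    // In-memory stand-in for the implementation Spring Data would provide at runtime.
    static final class InMemoryPersonRepository implements PersonRepository {
        private final Map<String, Person> documents = new LinkedHashMap<>();

        public Person save(Person person) {
            documents.put(person.id, person);
            return person;
        }

        public List<Person> findByFirstName(String firstName) {
            List<Person> matches = new ArrayList<>();
            for (Person person : documents.values()) {
                if (person.firstName.equals(firstName)) {
                    matches.add(person);
                }
            }
            return matches;
        }
    }

    // 3. Service layer: the interface the rest of the application codes against.
    interface PersonService {
        Person create(Person person);
        List<Person> findByFirstName(String firstName);
    }

    // 4. Service implementation: delegates to the repository abstraction.
    static final class RepositoryPersonService implements PersonService {
        private final PersonRepository repository;

        RepositoryPersonService(PersonRepository repository) {
            this.repository = repository;
        }

        public Person create(Person person) {
            return repository.save(person);
        }

        public List<Person> findByFirstName(String firstName) {
            return repository.findByFirstName(firstName);
        }
    }

    public static void main(String[] args) {
        PersonService service = new RepositoryPersonService(new InMemoryPersonRepository());
        service.create(new Person("person::1", "Ada", "Lovelace"));
        service.create(new Person("person::2", "Alan", "Turing"));
        System.out.println(service.findByFirstName("Ada").size());
    }
}
```

Because the service depends only on the repository interface, the storage behind it can be swapped without touching the service code.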