Structured, Semi-structured and Unstructured data

Big Data is characterized by huge volume, high velocity, and a wide variety of data. That data falls into three types: structured, semi-structured, and unstructured.

  1. Structured data is data whose elements are addressable for effective analysis. It is organised into a formatted repository, typically a database. Example: a relational database.
  2. Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyse. With some processing it can be stored in a relational database (which can be very hard for some kinds of semi-structured data), but semi-structured formats exist to ease storage. Example: XML data, JSON.
  3. Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing it. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Example: Word, PDF, text, media logs.
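To make the distinction concrete, the sketch below represents the same hypothetical customer in each of the three forms. The class name, field names, and sample values are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: one hypothetical customer in three shapes.
class DataShapesSketch {

    // Structured: a fixed set of typed fields, like a row in a relational table.
    static final class CustomerRow {
        final int id;
        final String name;
        final String city;

        CustomerRow(int id, String name, String city) {
            this.id = id;
            this.name = name;
            this.city = city;
        }
    }

    // Semi-structured: self-describing keys with optional and nested fields,
    // similar to a parsed JSON document.
    static Map<String, Object> customerDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("name", "Asha");
        doc.put("phones", List.of("555-0100", "555-0101")); // nested array
        return doc;
    }

    // Unstructured: free text with no data model; any structure
    // has to be inferred by the application.
    static String customerNote() {
        return "Asha called on Monday and asked about delivery to Chennai.";
    }

    public static void main(String[] args) {
        CustomerRow row = new CustomerRow(1, "Asha", "Chennai");
        System.out.println(row.id + ": " + row.name + " lives in " + row.city);
        System.out.println(customerDocument());
        System.out.println(customerNote());
    }
}
```

The row can be queried by column, the document can be navigated by key, and the note can only be searched as text.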

NoSQL (Not Only SQL database)

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.

Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an associated value.
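As a minimal sketch of that data model, the class below keeps opaque values in memory under unique keys. The class and method names are hypothetical, and a real key-value database adds persistence, distribution, and expiry on top of this idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal in-memory key-value store: each unique key maps to one opaque value.
class KeyValueStoreSketch {
    private final Map<String, byte[]> items = new HashMap<>();

    // Stores the value under the key, replacing any previous value.
    void put(String key, byte[] value) {
        items.put(key, value.clone());
    }

    // Looks a value up by its key; empty if the key is unknown.
    Optional<byte[]> get(String key) {
        byte[] value = items.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    // Removes the key; returns true if it was present.
    boolean delete(String key) {
        return items.remove(key) != null;
    }

    public static void main(String[] args) {
        KeyValueStoreSketch store = new KeyValueStoreSketch();
        store.put("session:42", "logged-in".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.get("session:42").isPresent());
        System.out.println(store.delete("session:42"));
        System.out.println(store.get("session:42").isPresent());
    }
}
```

The store never inspects the value, which is why this model is simple to scale: every operation touches exactly one key.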

Document databases, also called document stores, store semi-structured data and descriptions of that data in document format. They allow developers to create and update programs without needing to reference a master schema. Use of document databases has increased along with use of JavaScript and the JavaScript Object Notation (JSON).

Wide-column stores organize data tables as columns instead of as rows.

Graph data stores organize data as nodes, which are like records in a relational database, and edges, which represent connections between nodes.
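The node-and-edge model can be sketched in a few lines. The class below is an in-memory illustration only: the names `GraphSketch`, `addNode`, `addEdge`, and `neighbors` are hypothetical and do not correspond to any particular graph database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Tiny in-memory graph: nodes are named records, edges are labeled connections.
class GraphSketch {

    static final class Edge {
        final String from;
        final String label;
        final String to;

        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    private final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    // Adds a node with its properties (comparable to a record's columns).
    void addNode(String id, Map<String, String> properties) {
        nodes.put(id, properties);
    }

    // Connects two existing nodes with a labeled, directed edge.
    void addEdge(String from, String label, String to) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
            throw new IllegalArgumentException("both nodes must exist");
        }
        edges.add(new Edge(from, label, to));
    }

    // Returns the ids of nodes reached from 'from' over edges with the given label.
    List<String> neighbors(String from, String label) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(from) && e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GraphSketch g = new GraphSketch();
        g.addNode("alice", Map.of("city", "Bellevue"));
        g.addNode("bob", Map.of("city", "Chennai"));
        g.addEdge("alice", "FOLLOWS", "bob");
        System.out.println(g.neighbors("alice", "FOLLOWS"));
    }
}
```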


Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package that is optimized for interactive applications. Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Couchbase Inc. describes Couchbase as an Engagement Database, a new category of database that enables enterprises to continually create and reinvent the customer experience. Unlike traditional databases, the Engagement Database taps into dynamic data, at any scale and across any channel or device, to liberate data’s full potential at a time when the strategic use of data to create exceptional customer experiences has become a key competitive differentiator for businesses.

In the Engagement Database architecture, data is first cached in memory, then replicated for availability, and finally written to disk.
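The ordering of those three steps can be pictured with a toy model. The sketch below is not Couchbase's implementation: the queues, the maps standing in for a replica node and for disk, and all names are assumptions chosen to show that a write becomes readable from memory before it has been replicated or persisted.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of a memory-first write path; not Couchbase's actual implementation.
class WritePathSketch {
    private final Map<String, String> memory = new HashMap<>();   // in-memory cache
    private final Map<String, String> replica = new HashMap<>();  // stands in for another node
    private final Map<String, String> disk = new HashMap<>();     // stands in for persistent storage
    private final Queue<String> replicationQueue = new ArrayDeque<>();
    private final Queue<String> diskQueue = new ArrayDeque<>();

    // Step 1: the write lands in memory and is queued for the later steps.
    void write(String key, String value) {
        memory.put(key, value);
        replicationQueue.add(key);
        diskQueue.add(key);
    }

    // Step 2: copy queued items to the replica.
    void replicate() {
        String key;
        while ((key = replicationQueue.poll()) != null) {
            replica.put(key, memory.get(key));
        }
    }

    // Step 3: persist queued items to disk.
    void flushToDisk() {
        String key;
        while ((key = diskQueue.poll()) != null) {
            disk.put(key, memory.get(key));
        }
    }

    // Reads are served from memory.
    String read(String key) { return memory.get(key); }
    boolean isReplicated(String key) { return replica.containsKey(key); }
    boolean isPersisted(String key) { return disk.containsKey(key); }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch();
        node.write("u:1", "{\"name\":\"Arthur\"}");
        System.out.println("readable: " + (node.read("u:1") != null));
        System.out.println("persisted before flush: " + node.isPersisted("u:1"));
        node.replicate();
        node.flushToDisk();
        System.out.println("persisted after flush: " + node.isPersisted("u:1"));
    }
}
```

Deferring replication and persistence is what keeps write latency low in such a design, at the cost of a short window in which the newest data exists only in memory.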

Core features of Couchbase

Data: Couchbase Server stores data as items. Each item consists of a key, by which the item is referenced; and an associated value, which must be either binary or a JSON document.

Buckets, Memory, and Storage: Items are stored in named Buckets and are kept either only in memory, or both in memory and on disk.

Services: Services can be deployed to support different forms of data access. Details are given in the next section.

Clusters and Availability: A single node running Couchbase Server is considered a cluster of one node. As successive nodes are initialized, each can be configured to join the existing cluster.

Across the nodes of each cluster, Couchbase data is evenly distributed and replicated: nodes can be removed, and node-failure handled, without data-loss. Data can be selected for replication across clusters residing in different data centres, to ensure high availability.
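One common way to spread keys evenly is to hash each key to a fixed partition and assign partitions to nodes. The sketch below illustrates that idea only; the partition count, the round-robin placement, and the "replica on the next node" rule are assumptions for illustration, and Couchbase's own mapping is more involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Simplified hash partitioning; partition count and node mapping are assumptions.
class KeyDistributionSketch {
    static final int PARTITIONS = 1024; // assumed fixed number of partitions

    // Maps a key to a partition by hashing it.
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Assigns partitions to nodes round-robin, so each node owns a near-equal share.
    static int activeNodeFor(int partition, int nodeCount) {
        return partition % nodeCount;
    }

    // Places the replica on the next node, so with two or more nodes
    // a single node failure does not lose the partition.
    static int replicaNodeFor(int partition, int nodeCount) {
        return (activeNodeFor(partition, nodeCount) + 1) % nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 3;
        for (String key : new String[] {"u:king_arthur", "u:lancelot", "u:galahad"}) {
            int p = partitionFor(key);
            System.out.println(key + " -> partition " + p
                + ", active node " + activeNodeFor(p, nodes)
                + ", replica node " + replicaNodeFor(p, nodes));
        }
    }
}
```

Because the key-to-partition step never changes, adding or removing a node only requires moving whole partitions, not rehashing every key.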


Couchbase Server provides the following services:

  1. Data: Supports the storing, setting, and retrieving of data-items, specified by key.
  2. Query: Parses queries specified in the N1QL query-language, executes the queries, and returns results. The Query Service interacts with both the Data and Index services.
  3. Index: Creates indexes, for use by the Query and Analytics services.
  4. Search: Creates indexes specially purposed for Full Text Search. This supports language-aware searching, allowing users to search for, say, the word beauties, and additionally obtain results for beauty and beautiful.
  5. Analytics: Supports join, set, aggregation, and grouping operations, which are expected to be large, long-running, and highly consumptive of memory and CPU resources.
  6. Eventing: Supports near real-time handling of changes to data. Code can be executed both in response to document mutations and as scheduled by timers.


N1QL (pronounced "nickel") is used for manipulating the JSON data in Couchbase, just as SQL manipulates data in an RDBMS. It has SELECT, INSERT, UPDATE, DELETE, and MERGE statements to operate on JSON data.

The N1QL data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The N1QL data model is also a proper superset and generalization of the relational model.

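For example, given a document such as the following (the e-mail address and the entries in friends are illustrative placeholders):

        {
          "email": "user@example.org",
          "friends": [
            {"name": "rick"},
            {"name": "cate"}
          ]
        }

Like Query
        SELECT * FROM `bucket` WHERE email LIKE "%@example.org";

Array Query
        SELECT * FROM `bucket` WHERE ANY x IN friends SATISFIES x.name = "cate" END;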
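The ANY ... SATISFIES ... END predicate holds when at least one element of the array meets the condition. As a rough in-memory analogue in Java, and assuming each entry in friends is an object with a name field (an assumption, as is the helper name `anyFriendNamed`), the same check might look like this:

```java
import java.util.List;
import java.util.Map;

// In-memory analogue of: ANY x IN friends SATISFIES x.name = "cate" END
class AnySatisfiesSketch {

    // True when at least one element of doc.friends has the given name.
    @SuppressWarnings("unchecked")
    static boolean anyFriendNamed(Map<String, Object> doc, String name) {
        Object friends = doc.get("friends");
        if (!(friends instanceof List)) {
            return false; // a missing or non-array field is treated as no match here
        }
        return ((List<Map<String, Object>>) friends).stream()
            .anyMatch(friend -> name.equals(friend.get("name")));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
            "friends", List.of(Map.of("name", "rick"), Map.of("name", "cate")));
        System.out.println(anyFriendNamed(doc, "cate"));
        System.out.println(anyFriendNamed(doc, "bob"));
    }
}
```

In N1QL the database evaluates this predicate per document, so the application does not need to load and filter documents itself.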

Programming model

Couchbase provides client libraries for different programming languages, such as Java, .NET, PHP, Ruby, C, Python, and Node.js.

The following is the core API that Couchbase offers, in an abstract sense:

        # Get a document by key
        doc = get(key)
        # Modify a document; note that the whole document
        #   needs to be passed in
        set(key, doc)
        # Modify a document only if no one has modified it
        #   since my last read
        casVersion = doc.getCas()
        cas(key, casVersion, changedDoc)
        # Create a new document, with an expiration time
        #   after which the document will be deleted
        addIfNotExist(key, doc, timeToLive)
        # Delete a document
        delete(key)
        # When the value is an integer, increment the integer
        increment(key)
        # When the value is an integer, decrement the integer
        decrement(key)
        # When the value is an opaque byte array, append more
        #   data to the existing value
        append(key, newData)
        # Query the data
        results = query(viewName, queryParameters)
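The cas call in the listing above is a form of optimistic locking: a write succeeds only if the item still carries the version stamp the caller saw when reading it. The sketch below models that behaviour with an in-memory map. It does not use the Couchbase SDK, and the class, its methods, and the counter-based version stamp are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of compare-and-swap (CAS) semantics; not the Couchbase SDK.
class CasStoreSketch {

    // A stored value together with the version stamp it was written with.
    static final class Versioned {
        final String value;
        final long cas;

        Versioned(String value, long cas) {
            this.value = value;
            this.cas = cas;
        }
    }

    private final Map<String, Versioned> items = new HashMap<>();
    private long nextCas = 1;

    // Unconditional write: always succeeds and assigns a fresh CAS value.
    synchronized long set(String key, String value) {
        long cas = nextCas++;
        items.put(key, new Versioned(value, cas));
        return cas;
    }

    // Returns the current value and its CAS, or null if the key is absent.
    synchronized Versioned get(String key) {
        return items.get(key);
    }

    // Conditional write: succeeds only if nobody changed the item since
    // the caller read it, i.e. the stored CAS still equals expectedCas.
    synchronized boolean cas(String key, long expectedCas, String newValue) {
        Versioned current = items.get(key);
        if (current == null || current.cas != expectedCas) {
            return false;
        }
        items.put(key, new Versioned(newValue, nextCas++));
        return true;
    }

    public static void main(String[] args) {
        CasStoreSketch store = new CasStoreSketch();
        store.set("counter", "1");
        Versioned mine = store.get("counter");   // our read
        store.set("counter", "99");              // another writer gets in first
        System.out.println(store.cas("counter", mine.cas, "2"));    // stale CAS
        Versioned fresh = store.get("counter");  // re-read, then retry
        System.out.println(store.cas("counter", fresh.cas, "100"));
    }
}
```

A caller whose cas attempt is rejected typically re-reads the document, reapplies its change, and tries again, which avoids holding locks across the read-modify-write cycle.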

Couchbase Java SDK

The code snippet below shows how the Java SDK may be used for some common operations:

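        import com.couchbase.client.java.Bucket;
        import com.couchbase.client.java.Cluster;
        import com.couchbase.client.java.CouchbaseCluster;
        import com.couchbase.client.java.document.JsonDocument;
        import com.couchbase.client.java.document.json.JsonArray;
        import com.couchbase.client.java.document.json.JsonObject;
        import com.couchbase.client.java.query.N1qlQuery;
        import com.couchbase.client.java.query.N1qlQueryResult;
        import com.couchbase.client.java.query.N1qlQueryRow;

        public class Example {
            public static void main(String... args) throws Exception {
                // Initialize the Connection
                Cluster cluster = CouchbaseCluster.create("localhost");
                cluster.authenticate("username", "password");
                Bucket bucket = cluster.openBucket("bucketname");
                // Create a JSON Document
                JsonObject arthur = JsonObject.create()
                    .put("name", "Arthur")
                    .put("email", "")
                    .put("interests", JsonArray.from("Holy Grail", "African Swallows"));
                // Store the Document
                bucket.upsert(JsonDocument.create("u:king_arthur", arthur));
                // Load the Document and print it
                // Prints Content and Metadata of the stored Document
                System.out.println(bucket.get("u:king_arthur"));
                // Create a N1QL Primary Index (but ignore if it exists)
                bucket.bucketManager().createN1qlPrimaryIndex(true, false);
                // Perform a N1QL Query
                N1qlQueryResult result = bucket.query(
                    N1qlQuery.parameterized("SELECT name FROM `bucketname` WHERE $1 IN interests",
                    JsonArray.from("African Swallows"))
                );
                // Print each found Row
                for (N1qlQueryRow row : result) {
                    // Prints {"name":"Arthur"}
                    System.out.println(row);
                }
                // Release the connection when done
                cluster.disconnect();
            }
        }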

Spring Data Couchbase

The Spring Data Couchbase project provides integration with the Couchbase Server database. Its key functional areas are a POJO-centric model for interacting with Couchbase Buckets and support for easily writing a Repository-style data access layer.

1. Data Model

First create an entity class representing the JSON document to persist.

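        import org.joda.time.DateTime;
        import org.springframework.data.annotation.Id;
        import org.springframework.data.couchbase.core.mapping.Document;

        @Document
        public class Person {
            @Id
            private String id;
            private String firstName;
            private String lastName;
            private DateTime created;
            private DateTime updated;
            // standard getters and setters
        }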

2. Couchbase Repository

We declare a repository interface for the Person class by extending CrudRepository<Person, String> (from org.springframework.data.repository) and adding a derivable query method:

        public interface PersonRepository extends CrudRepository<Person, String> {
            List<Person> findByFirstName(String firstName);
        }

3. Service Layer

For our service layer, we define an interface and an implementation using the Spring Data repository abstraction. Here is our PersonService interface:

        public interface PersonService {
            Person findOne(String id);
            List<Person> findAll();
            List<Person> findByFirstName(String firstName);
            void create(Person person);
            void update(Person person);
            void delete(Person person);
        }

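4. Service Implementation

        // Imports (java.util.*, Spring's @Service and @Autowired) omitted for brevity.
        @Service
        public class PersonRepositoryService implements PersonService {
            @Autowired
            private PersonRepository repo;
            public Person findOne(String id) {
                return repo.findOne(id);
            }
            public List<Person> findAll() {
                // Copy the Iterable returned by the repository into a List
                List<Person> people = new ArrayList<>();
                Iterator<Person> it = repo.findAll().iterator();
                while (it.hasNext()) {
                    people.add(it.next());
                }
                return people;
            }
            public List<Person> findByFirstName(String firstName) {
                return repo.findByFirstName(firstName);
            }
            public void create(Person person) {
                repo.save(person);
            }
            public void update(Person person) {
                repo.save(person);
            }
            public void delete(Person person) {
                repo.delete(person);
            }
        }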

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 


626 120th Ave NE, B102, Bellevue,

WA, 98005.


Chennai One IT SEZ,

Module No:5-C, Phase ll, 2nd Floor, North Block, Pallavaram-Thoraipakkam 200 ft road, Thoraipakkam, Chennai – 600097

© 2020 CloudIQ Technologies. All rights reserved.
