AWS
Analytics

Analytics is the discovery, interpretation, and communication of meaningful patterns in data, and the process of applying those patterns toward effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making within an organization. Organizations may apply analytics to business data to describe, predict, and improve business performance.

 

Big data analytics is the complex process of examining large and varied data sets — or big data — to uncover information including hidden patterns, unknown correlations, market trends and customer preferences that can help organizations make informed business decisions.

 

Glue, Athena and QuickSight are three services under the Analytics group of services offered by AWS. Glue is used for ETL, Athena for interactive queries, and QuickSight for Business Intelligence (BI).

 

Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. We can create and run an ETL job with a few clicks in the AWS Management Console. We simply point AWS Glue to our data stored on AWS, and AWS Glue discovers our data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, our data is immediately searchable, queryable, and available for ETL.

 

In this blog we will look at two components of Glue: Crawlers and Jobs.

 

Glue Crawlers

Glue crawlers can scan data in all kinds of repositories, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog. From there it can be used to guide ETL operations.

 

Suppose we have a file named people.json in S3 with contents along the following lines:
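(The original post showed the file contents as a screenshot; the records below are a representative stand-in in JSON Lines format, one record per line.)

    {"name": "John", "age": 30, "city": "Milwaukee"}
    {"name": "Jane", "age": 25, "city": "Chicago"}
    {"name": "Bob", "age": 40, "city": "Madison"}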

 

Below are the steps to crawl this data and create a table in AWS Glue to store its metadata (a programmatic equivalent using boto3 is sketched after these steps):

  1. On the AWS Glue console, click “Crawlers” and then “Add Crawler”.
  2. Give your crawler a name and click Next.
  3. Select S3 as the data source and, under “Include path”, give the location of the JSON file on S3.
  4. Since we are going to crawl data from only one dataset, select No on the next screen and click Next.
  5. On the next screen, select an IAM role which has access to the S3 data store.
  6. Select “Run on demand” as the Frequency on the next screen.
  7. Select a database to store the crawler’s output. I chose a database named “saravanan”; if no database exists, add one using the link provided.
  8. Review all details in the next step and click Finish.
  9. On the next screen, click “Run it now” to run the crawler.
  10. The crawler runs for around a minute, and finally you will see the status change to Stopping / Ready with a “Tables added” count of 1.
  11. Now you can go to the Tables link and see that a table named “people_json” has been created under the “saravanan” database.
  12. Using the “View details” action and scrolling down, you can see the schema for the table, which Glue has automatically inferred and generated.
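The same crawler can also be created and run programmatically. Here is a minimal sketch using boto3; the bucket path, IAM role ARN and region are hypothetical placeholders:

    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    # Create a crawler that scans the JSON file and writes its metadata
    # to the "saravanan" database in the Glue Data Catalog.
    glue.create_crawler(
        Name="people-json-crawler",
        Role="arn:aws:iam::123456789012:role/GlueS3AccessRole",  # hypothetical role
        DatabaseName="saravanan",
        Targets={"S3Targets": [{"Path": "s3://my-bucket/people.json"}]},  # hypothetical path
    )

    # Start it on demand -- the API equivalent of "Run it now".
    glue.start_crawler(Name="people-json-crawler")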

 

Glue Jobs

The AWS Glue Jobs system provides managed infrastructure to orchestrate our ETL workflow. We can create jobs in AWS Glue that automate the scripts we use to extract, transform, and transfer data to different locations. Jobs can be scheduled and chained, or they can be triggered by events such as the arrival of new data.

 

To add a new job using the console:

  1. Open the AWS Glue console, and choose the Jobs tab.
  2. Choose Add job and follow the instructions in the Add job wizard. The job configured here copies data from the table we created earlier to Parquet files under a people-parquet prefix in the same S3 bucket; a sketch of what the generated script looks like follows.
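The wizard generates a PySpark script for the job. The original screenshots of that script are not reproduced here; a minimal sketch of what such a script typically looks like, assuming the database and table from earlier and a hypothetical bucket name:

    import sys
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Read the table the crawler created in the Data Catalog.
    people = glueContext.create_dynamic_frame.from_catalog(
        database="saravanan", table_name="people_json"
    )

    # Write it back out to S3 as Parquet (hypothetical bucket name).
    glueContext.write_dynamic_frame.from_options(
        frame=people,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/people-parquet/"},
        format="parquet",
    )

    job.commit()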




After the job completes, you can verify in S3 that the output Parquet files have been created.

 

DynamicFrame

Glue Jobs use a data structure named DynamicFrame. A DynamicFrame is similar to a Spark DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.

 

Instead of relying only on the Python script which Glue generates, we can write our own jobs using DynamicFrames and have Glue run them. The job sketched below selects specific fields from two Glue tables, renames some of the fields, joins the tables, and writes the joined table to S3 in Parquet format.
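The original script appears only as a screenshot, so the following is a hedged reconstruction of such a job; the two table names, the field names and the output path are hypothetical stand-ins:

    import sys
    from awsglue.transforms import Join
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Select only the fields we need from each cataloged table
    # (both tables and all field names are hypothetical).
    orders = glueContext.create_dynamic_frame.from_catalog(
        database="saravanan", table_name="orders"
    ).select_fields(["order_id", "customer_id", "amount"])

    customers = (
        glueContext.create_dynamic_frame.from_catalog(
            database="saravanan", table_name="customers"
        )
        .select_fields(["id", "name"])
        .rename_field("id", "customer_id")     # align the join key names
        .rename_field("name", "customer_name")
    )

    # Join on the common key and write the result to S3 as Parquet.
    joined = Join.apply(orders, customers, "customer_id", "customer_id")

    glueContext.write_dynamic_frame.from_options(
        frame=joined,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/joined-output/"},
        format="parquet",
    )

    job.commit()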

 

Athena

Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and we pay only for the queries that we run.

 

Athena is easy to use. We simply point Athena to our data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare our data for analysis. This makes it easy for anyone with SQL skills to quickly analyse large-scale datasets.

 

Athena is integrated out-of-the-box with the AWS Glue Data Catalog, allowing us to create a unified metadata repository across various services, crawl data sources to discover schemas, populate the Catalog with new and modified table and partition definitions, and maintain schema versioning.

 

Since Athena uses the same Data Catalog as Glue, we can query and view the properties of the people_json table which we created earlier using Glue.


We can also create a new table from data in an S3 bucket.



Unlike Glue, in Athena we have to explicitly give the data format (CSV, JSON, etc.) and specify the column names and types while creating the table.



We can also create and query tables manually using standard SQL; a sketch of the workflow follows.
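The original queries appear only as screenshots, so the sketch below drives the same workflow from Python with boto3; the bucket names and the CSV table are hypothetical:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    def run_query(sql):
        """Submit a query to Athena; results land in the given S3 location."""
        return athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": "saravanan"},
            ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
        )

    # Creating a table requires the format and columns to be spelled out,
    # unlike a Glue crawler, which infers them (hypothetical CSV data).
    run_query("""
        CREATE EXTERNAL TABLE IF NOT EXISTS people_csv (
            name STRING,
            age INT,
            city STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 's3://my-bucket/people-csv/'
    """)

    # Query the table the Glue crawler created earlier.
    run_query("SELECT name, age FROM people_json LIMIT 10")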


 

QuickSight

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for us to deliver insights to everyone in our organization.

 

QuickSight lets us create and publish interactive dashboards that can be accessed from browsers or mobile devices. We can embed dashboards into our applications, providing our customers with powerful self-service analytics.

 

QuickSight easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

 

Below are the steps to create a sample Analysis in QuickSight:

  1. Any Analysis in QuickSight requires data from a Data Set. First click on the “Manage data” link at top right to list the Data Sets we currently have.
  2. To create a new Data Set, click the “New data set” link.
  3. We can create a Data Set from any of the listed data sources: uploading a file, S3, an Athena table, etc.
  4. For our example, I am selecting Athena as the data source and naming it “Athena source”. Then we map this to a database / table in Athena.
  5. After we select the Athena table, QuickSight provides an option to import the data into SPICE. SPICE is Amazon QuickSight’s in-memory optimized calculation engine, designed specifically for fast, ad hoc data visualization. SPICE stores our data in a system architected for high availability, where it is saved until we choose to delete it.
  6. The Edit/Preview Data option allows us to select the columns to be included in the Data Set and rename them if required.
  7. Once we click the “Save & visualize” link, QuickSight starts creating an Analysis for us. For our exercise we will select the Table visual type from the list.
  8. Add Account Name and User_id by dragging them from the “Fields list” to “Group by”, and course_active to “Value”.
  9. Now we will add two parameters for Account Name and Learner id by clicking on Parameters at bottom left. While creating each parameter, use the option “Link to a data set field” for Values and link the parameter to the appropriate column in the Athena table.
  10. Once the parameters are added, create controls for them. When adding two parameters with controls, we have the option of showing only relevant values for the second parameter based on the values selected for the first; for this, select the “Show relevant values only” checkbox.
  11. Next, add two Custom filters for Account Name and Learner id. These filters should be mapped to the parameters we created earlier; for this, choose the Filter type “Custom filter” and select the “Use parameters” checkbox.
  12. Now, using the Visualize option, we can verify that our controls are working correctly.
  13. To share the dashboard with others, use the Share option at top right and publish the dashboard. We can search for users of the AWS account by email and publish to selected users.

AWS

Introduction to Alexa

Amazon’s Alexa is the voice-activated, interactive AI bot, or intelligent personal assistant in the cloud, that lets people speak with their Amazon Echo, Echo Dot and other Amazon smart home devices. Alexa is designed to respond to a number of commands and converse with people.

 

Alexa Skills are apps that give Alexa even more abilities; a skill can let her speak to more devices or websites. When the Alexa device is connected to the internet through Wi-Fi or Bluetooth, it wakes up when the user merely says “Alexa”. Alexa Skills radically expand the bot’s repertoire, allowing users to perform more actions with voice-activated control through Alexa.

 

Overview of Alexa Skill

The most important part of an Alexa skill is its interaction design. Alexa skills don’t have visual feedback like web or desktop applications; they guide the user through the skill using voice alone. Every reply from an Alexa skill needs to tell the user clearly what the next options are.

 

Functional Architecture

An Alexa skill is a small application that interacts with Alexa via an AWS Lambda function.

 


 

Designing the Alexa Skill

The most important aspect of the skill is its vocal interface. The skill should interact naturally with the user. The components of an Alexa Skill are:

 

  • Alexa requires a wake word, a phrase which alerts the device that it can expect a command immediately after. The default wake word is “Alexa”; it can also be set to “Amazon”, “Echo” or “Computer”.
  • The launch phrase is the word that tells Alexa to trigger a skill. Examples of launch phrases are “open”, “ask”, “start” and “launch”.
  • The invocation name is the name of the skill.
  • Intents are the goals that the user is trying to achieve by invoking the skill.
  • An utterance tells Alexa what the skill should do. Apart from static utterances such as “start” and “launch”, dynamic commands can be added; these dynamic parts are called slots.
  • Each intent can contain one or more slots. A slot is a variable that is parsed and exposed to the application code.

 

Sample utterances

Alexa has a built-in natural language processing engine. To map a verbal phrase to an intent, Alexa handles the complexity of natural language processing with the help of a manually curated file, SampleUtterances.txt.

 

The first word in each line of SampleUtterances.txt is the intent name. The application code reads the intent name and responds appropriately. Following the intent name is the phrase that the user says to achieve that intent. The variable parts of a phrase are defined as slots in the intent, and the application is free to react differently based on the presence or value of a slot. To give Alexa the best chance of understanding users, it is recommended to include as many sample utterances as possible; depending on the skill, there could be any number of ever-changing sample utterances.
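For illustration, a few lines from a hypothetical SampleUtterances.txt for a horoscope skill (intent name first, then the phrase, with {Sign} as a slot) might look like this:

    GetHoroscope what is the horoscope for {Sign}
    GetHoroscope tell me the horoscope for {Sign}
    GetHoroscope give me my {Sign} horoscope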

 

The hypothetical example below sums up the entire vocal interface.

 

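Staying with the hypothetical horoscope skill, a complete user request breaks down like this:

    User: “Alexa, ask Horoscope Reader what is the horoscope for Leo”

  • Wake word: “Alexa”
  • Launch phrase: “ask”
  • Invocation name: “Horoscope Reader”
  • Intent: GetHoroscope, matched via the sample utterance “what is the horoscope for {Sign}”
  • Slot: Sign = “Leo”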

 

Build and Publish a new skill

Building and publishing a new skill in Alexa comprises the steps below:

 

1. Create and Configure Skill:

Create an Alexa skill at https://developer.amazon.com/alexa-skills-kit. This opens the skill information page where we can specify the name of the skill and its invocation name.

 

2. Create Interaction Model:

An interaction model is a set of rules that defines the way the user interacts with your skill. As part of an interaction model, intents and utterances are defined. The intent schema should be in JSON format and should define an array of intents, each with a name and an optional list of dynamic parts, the slots. Alexa automatically trains itself with the provided interaction model.
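A minimal intent schema for the hypothetical horoscope skill might look like the following; the exact shape depends on the version of the Alexa Skills Kit in use:

    {
      "intents": [
        {
          "intent": "GetHoroscope",
          "slots": [
            { "name": "Sign", "type": "ZODIAC_SIGNS" }
          ]
        }
      ]
    }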

 

3. Coding the Backend system:

Once the interaction model has been designed, the Lambda function has to be coded and deployed.

For each intent, an input/output contract has to be implemented. The input is an IntentRequest, a representation of the user’s request that includes all the slot values.

 

The response from Alexa can take multiple forms:

 

  • Ask the user a question and wait for a response.
  • Give the details to the user and shut down.
  • Say nothing and shut down.

 

Alexa can either respond verbally, or the response can be displayed on the phone.
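As a concrete illustration, here is a minimal sketch of a Python Lambda handler for the hypothetical GetHoroscope intent; the response dictionary follows the Alexa Skills Kit JSON response format:

    def lambda_handler(event, context):
        """Entry point for the Alexa skill's Lambda function."""
        request = event["request"]

        if request["type"] == "IntentRequest" and request["intent"]["name"] == "GetHoroscope":
            # Slot values arrive inside the IntentRequest.
            sign = request["intent"]["slots"]["Sign"].get("value", "your sign")
            text = "Here is a hypothetical horoscope for %s." % sign
            end_session = True   # give the details and shut down
        else:
            text = "Which zodiac sign would you like the horoscope for?"
            end_session = False  # ask a question and wait for a response

        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": end_session,
            },
        }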

 

4. Deploying the Backend system:

The skill’s backend can be deployed as an AWS Lambda function with code written in Java, Node.js, Python or C#. The simplest approach would be to code it in Node.js.

 

5. Testing the Skill:

Testing of the skill can be done either through the test simulator available in the Developer Console or through a device connected to the development account.

 

6. Publishing the Skill:

To publish the skill, it has to be submitted by filling out the “Publishing Information” and “Privacy & Compliance” sections.

 


AWS, Azure

This is the fifth blog in our series helping you understand all about cloud, for when you are in a dilemma about choosing Azure or AWS, or both.

 

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on data analytics and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here.

 

This blog is intended to help you strategize your data analytics initiatives so that you can make the most informed decision possible by analyzing all the data you need in real time. Furthermore, we will also help you draw comparisons between Azure and AWS, the two leaders in cloud, and their capabilities in Big Data and Analytics as published in a handout released by Microsoft.

 

Beyond doubt, this is an era of data. Every touch point of your business generates volumes of data, and these data cannot simply be whisked away or cast aside, as valuable business insights can be unearthed from them with a little effort. Here’s where your data analytics infrastructure helps.

 

Consider the 2017 Planning Guide for Data and Analytics published by Gartner and written by analyst John Hagerty.

 

The Key Findings as per the report are as follows:

 

  • Data and analytics must drive modern business operations, not just reflect them. Technical professionals must holistically manage an end-to-end data and analytics architecture to acquire, organize, analyze and deliver insights to support that goal.
  • Analytics are now infused in places where they never existed before.
  • Executives will seek strategies to better manage and monetize data for internal and external business ecosystems.
  • Data gravity is rapidly shifting to the cloud, with IoT, data providers and cloud-native applications leading the way. It is no longer a question of “if” for using cloud for data and analytics; it’s “how.”

 

The last point emphasizes how prominent a role cloud plays when it comes to data analytics, and on the question of who leads there, Gartner in its latest Magic Quadrant names AWS and Azure as the top leaders. Now, if you are in doubt whether to go the Azure way or the AWS way, or both, here’s the comparison table showing their respective Big Data and Analytics capabilities.

 

 

 

Click here to read the entire guide published by the Microsoft Azure team.


AWS, Azure

This is our fourth blog in the series intended to help you embark on a cloud strategy, most importantly when you are in a dilemma about choosing AWS or Azure, the two prominent cloud players today.

 

If you missed our earlier blogs, click here:

1st Blog – Compute

2nd Blog- Storage

3rd Blog- CDN & Networking

 

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on the database aspect of cloud strategy.

If you would rather have a quick look at the database comparison table, click here.


Through this blog, let’s understand the database aspect of your cloud strategy. As per the guide, database services refer to options for storing data, whether it’s a managed relational SQL database that’s globally distributed or a multi-model NoSQL database designed for any scale.


When you decide on cloud, one of the critical decisions you face is which database to use: SQL or NoSQL. Though SQL has an impressive track record, NoSQL is not far behind, as it is gradually making notable gains and has many proponents. Once you have picked your database, the other big decision to make is which cloud vendor to choose among the many vendors.

 

Here’s where Gartner’s prediction is worth considering; the research company published a document that states:

“Public cloud services, such as Amazon Web Services (AWS), Microsoft Azure and IBM Cloud, are innovation juggernauts that offer highly operating-cost-competitive alternatives to traditional, on-premises hosting environments.

Cloud databases are now essential for emerging digital business use cases, next-generation applications and initiatives such as IoT. Gartner recommends that enterprises make cloud databases the preferred deployment model for all new business processes, workloads, and applications. As such, architects and tech professionals should start building a cloud-first data strategy now, if they haven’t done so already”


Reinforcing the trend, Gartner recently published a new Magic Quadrant for infrastructure as a service (IaaS) that, surprising nobody, has Amazon Web Services and Microsoft alone in the Leaders quadrant, with the remaining players outside it.

 

Now, the question really is, Azure or AWS for your cloud data? Or should it be both? Here’s a quick comparison table to guide you.

 

 

Click here to read the entire guide published by the Microsoft Azure team.


AWS, Azure

In line with our latest blog series highlighting how common cloud services are made available via Azure and Amazon Web Services (AWS), as published by Microsoft, this third blog in the series helps you understand the cloud networking and content delivery capabilities of both Azure and AWS.

 

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on cloud content delivery networking and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here.

 

When we talk about a cloud Content Delivery Network (CDN) and the related networking capabilities, we include all the hardware and software that allow you to easily provision private networks, connect your cloud application to your on-premises datacenters, and more.

 

According to Gartner, Content delivery networks (CDNs) are a type of distributed computing infrastructure, where devices (servers or appliances) reside in multiple points of presence on multi-hop packet-routing networks, such as the Internet, or on private WANs. A CDN can be used to distribute rich media downloads or streams, deliver software packages and updates, and provide services such as global load balancing, Secure Sockets Layer acceleration and dynamic application acceleration via WAN optimization techniques.

 

In simpler terms, these highly distributed server platforms are optimized to deliver content in a way that improves customer experience. Hence, it is important to decrease latency by keeping the data closer to the users, to protect it from security threats, and to ensure rapid, streamlined content delivery, including general web delivery, content purge, content caching and tracking history for as long as 90 days.

 

As per G2Crowd.com, most organizations use CDN services, such as web caching, request routing, and server load balancing, to reduce load times and improve website performance. Further, to qualify as a CDN provider, a service provider must:

 

  • Allow access to a geographically dispersed network of PoPs in multiple data centers
  • Help websites access this network to deliver content to website visitors
  • Offer services designed to improve website performance
  • Provide scalable Internet bandwidth allowances according to customer needs
  • Maintain data center(s) of servers to reduce the possibility of overloading individual instances

 

 

With this background, let’s look at the AWS vs Azure comparison chart in terms of Networking and Content Delivery Capabilities:

 

 

To read more about the Microsoft guide, which briefs all about cloud by drawing comparisons between Azure and AWS, click here.

 

You may also like to read our previous blogs in this series; if so, please click here:

http://cloudiqtech.com/azure-vs-aws-compute/
http://cloudiqtech.com/aws-vs-azure-cloud-storage/


AWS, Azure
Azure or AWS or Azure & AWS? What’s your cloud strategy for Storage?

This is our second blog in our latest series helping you understand all about cloud, especially when you are in doubt whether to go Azure or AWS, or both.

 

To read our first blog talking about Cloud strategy in general and Compute in particular, click here…

 

Moving on, in this blog let’s find out what Azure and AWS offer when it comes to storage capabilities for your cloud infrastructure.

 

Globally, CIOs are increasingly looking to cease running their own data centers and move to cloud, which is evident in the projection made by a leading researcher, MarketsandMarkets. It reported that the global cloud storage market would grow from $18.87 billion in 2015 to $65.41 billion by 2020, at a compound annual growth rate (CAGR) of 28.2 percent during the forecast period.

 

Reinforcing the fact, 451 Research’s Voice of the Enterprise survey last year stated that public cloud storage spending would double by the next year (2017). “IT managers are recognizing the need for storage transformation to meet the realities of the new digital economy, especially in terms of improved efficiency and agility in the face of relentless data growth,” said Simon Robinson, research vice president at 451 and research director of the new Voice of the Enterprise: Storage service. “It’s clear from our Q4 study that emerging options, especially public cloud storage and all-flash array technologies, will be increasingly important components in this transformation,” he added.

 

As we see, many companies are undoubtedly in for cloud storage. But the big question still prevails: whom to choose from the gamut of leading public cloud players? Should it be Azure alone for your cloud storage, or AWS, or a combination of both?

 

This needs a thorough understanding. To help you decide for good, we have decided to reproduce a guide published by Microsoft that briefs Azure’s capabilities in comparison to AWS when it comes to cloud strategy. We will look at the storage part in this blog, but before that, a little backgrounder on cloud storage.

 

When we talk about cloud storage device mechanisms, we include all logical units of data storage, from files, blocks and datasets to objects and their relative storage interfaces. These instances of virtual storage devices are designed specifically for cloud-based provisioning and can be scaled as needed. It is to be noted that different cloud service consumers utilize different technologies to interface with virtualized cloud storage devices.

 

 

For a more detailed understanding download the document here


AWS, Azure

Surprisingly, as per an article published by Gartner, “cloud computing is still perplexing to many CIOs even after a decade of cloud”. While cloud computing is a foundation for digital business, Gartner estimates that less than one-third of enterprises have a documented cloud strategy. This indeed comes as a surprise given the fact that cloud has evolved from a disruption to the indispensable tech of today and tomorrow, strategically adopted all along by many progressive companies.

 

In the same article Donna Scott, Vice President and distinguished analyst at Gartner states that “Cloud computing will become the dominant design style for new applications and for refactoring a large number of existing applications over the next 10-plus years”. She also added that “A cloud strategy clearly defines the business outcomes you seek, and how you are going to get there. Having a cloud strategy will enable you to apply its tenets quickly with fewer delays, thus speeding the arrival of your ultimate business outcomes.”

 

However, it is easier said than done. Many top businesses still have questions: How to make the most of cloud computing? What kinds of architectures and techniques need to be strategized to support the many flavors of evolving cloud computing? Private or public? Hybrid or public? Azure or AWS, or should it be a hybrid combo?

 

Through a series of blogs we intend to bring answers to these questions. As a first step, we would like to highlight and present a comparative cloud service map focusing on Azure and AWS, both leaders in public cloud platforms, as published by Microsoft.

 

The well-researched article draws detailed comparisons between Azure and AWS, showing how common cloud services across parameters such as Marketplace, Compute, Storage, Networking, Database, Analytics, Big Data, Intelligence, IoT, Mobile and Enterprise Integration are made available via Azure and Amazon Web Services (AWS).

 

It should be noted that, as prominent public cloud platform providers, Azure and AWS each offer businesses wide and comprehensive capabilities across the globe. Many organizations have chosen either one of them, or both, depending upon their needs, in order to gain more agility and flexibility while minimizing risk and maximizing the larger benefits of a multi-cloud environment.

 

For starters, let’s begin with Compute and the points one should consider and compare before deciding on the Azure or AWS approach, or a combination of both.

 


For a more detailed understanding download the document here


AWS, DevOps, Docker

A microservices-based architecture introduces agility and flexibility and supports a sustainable DevOps culture, ensuring closer collaboration within businesses; the news is that it’s actually happening for those who have embraced it.

 

True, monolithic app architectures have enabled businesses to benefit from IT all along, as a monolith is a single codebase, simple to develop, test and run. As they are also based on a logical, modular, hexagonal or layered architecture (a presentation layer responsible for handling HTTP requests and responding with either HTML or JSON/XML, a business logic layer, database access, and apps integration), they cover and tie together all processes, functions and gaps to an extent.

Despite these ground-level facts, monolithic software, which was instrumental in helping businesses embrace IT in their initial stages and which even exists today, is seeing problems. Increasingly complex business operating conditions are to blame.

 

So, how do businesses today address new pressures caused by digitization, continuous technology disruptions, increased customer awareness and interactions, and sudden regulatory interventions? The answer lies in the agility, flexibility and scalability of the underlying IT infrastructure, the pillars of rapid adaptability to change.

 

Monolithic apps, even though based on a well-designed three-tier architecture, in the long run lose fluidity and turn rigid. Irrespective of their modularity, modules are still dependent on each other, and any minimal change in one module requires regeneration and redeployment of all artifacts in each server pool touched across the distributed environment.

 

Besides, whenever there is a critical problem, the blame game starts amongst the UI developers, business logic experts, backend developers, database programmers, etc., as they are predominantly experts in their own domains but have little knowledge of the other processes. As the complexity of business operations sets in, the agility, flexibility and scalability of your software are severely tested in a monolithic environment.

 

Here’s where microservices play a huge role: the underlying architecture helps you break your software applications into independent, loosely coupled services that can be deployed and managed solely at that level and needn’t depend on other services.

 

For example, if your project needs you to design and manage inventory, sales, shipping, billing and UI shopping cart modules, you can break each service down into an independently deployable module. Each has its own database, and monitoring and maintenance of application servers are done independently, as the architecture allows you to decentralize the database, reducing complexity. Besides, it enables continuous delivery and deployment of large, complex applications, which means the technology also evolves along with the business.

 

The other important aspect is that microservices promote a culture wherein whoever develops a service is also responsible for managing it. This avoids the handover concept and the misunderstandings and conflicts that follow whenever there is a crisis.

In line with the DevOps concept, microservices enable easy collaboration between the development and operations teams as they embrace and work on a common toolset that establishes common terminology, as well as processes for requirements, dependencies, and problems. There is no denying the fact that DevOps and microservices work better when applied together.

 

Perhaps that’s the reason companies like Netflix and Amazon are embracing the concept of microservices in their products. For other businesses embracing it, a new environment emerges where agility, flexibility and closer collaboration between business and technology become a reality, providing the much-needed edge in these challenging times.


Allow access to S3 bucket only from VPC

Currently I am evaluating options to lock down permissions to my S3 buckets as part of security enhancements.
These are the steps I followed to lock down S3 bucket access to only my VPC.

Create VPC Endpoints
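The endpoint shown in the original screenshot can also be created programmatically. A minimal boto3 sketch, with hypothetical VPC ID, route table ID and region:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # A gateway endpoint for S3 routes S3 traffic through the VPC
    # instead of over the public internet (IDs are placeholders).
    ec2.create_vpc_endpoint(
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],
    )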




Attach the S3 Bucket Policy to Restrict Access
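The policy itself appears only as a screenshot in the original post; the following is a sketch of the standard pattern, denying all S3 actions unless the request arrives through the VPC endpoint (the bucket name and vpce ID are placeholders):

    {
      "Version": "2012-10-17",
      "Id": "Policy-Restrict-To-VPCE",
      "Statement": [
        {
          "Sid": "Access-to-specific-VPCE-only",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": [
            "arn:aws:s3:::my-bucket",
            "arn:aws:s3:::my-bucket/*"
          ],
          "Condition": {
            "StringNotEquals": {
              "aws:sourceVpce": "vpce-0123456789abcdef0"
            }
          }
        }
      ]
    }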


Access the Buckets Outside VPC

Once you have attached the policy, if you access the S3 files through the console while not on the VPC, you will receive an Access Denied error.
Access the Buckets from VPC

If you log into an EC2 instance hosted in the VPC, you will be able to access the S3 bucket.

SSH into your EC2 machine and verify its VPC through the instance metadata store.
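One way to do that check from the instance itself is to query the instance metadata service; a Python sketch, assuming the older IMDSv1 behavior (instances requiring IMDSv2 need a session token first):

    import urllib.request

    BASE = "http://169.254.169.254/latest/meta-data/network/interfaces/macs/"

    def fetch(url):
        # The metadata service is reachable only from inside the instance.
        return urllib.request.urlopen(url, timeout=2).read().decode().strip()

    mac = fetch(BASE).splitlines()[0]      # first network interface (trailing slash included)
    vpc_id = fetch(BASE + mac + "vpc-id")  # the VPC that interface lives in
    print("This instance is in", vpc_id)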

If you execute S3 commands to access the bucket, you will be able to access the S3 bucket without an Access Denied error.


AWS

Here is a look at some common queries that are useful when troubleshooting an Aurora database.

 

Number of Connections by Host
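The query itself is shown only as a screenshot in the original; below is a sketch of an equivalent, run from Python against the MySQL-compatible information_schema (the connection details are placeholders):

    import pymysql

    conn = pymysql.connect(
        host="my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",  # placeholder
        user="admin",
        password="secret",
        database="information_schema",
    )

    with conn.cursor() as cur:
        # Count current connections grouped by client host.
        cur.execute("""
            SELECT SUBSTRING_INDEX(host, ':', 1) AS host_name,
                   COUNT(*) AS connections
            FROM information_schema.processlist
            GROUP BY host_name
            ORDER BY connections DESC
        """)
        for host_name, connections in cur.fetchall():
            print(host_name, connections)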

 

Aurora Max Connections
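Similarly, a sketch of how one might check the configured connection ceiling and the number of connections currently open, reusing the connection from the previous snippet:

    with conn.cursor() as cur:
        # The configured ceiling, which Aurora derives from the instance class.
        cur.execute("SHOW VARIABLES LIKE 'max_connections'")
        print(cur.fetchone())

        # How many connections are open right now.
        cur.execute("SHOW STATUS LIKE 'Threads_connected'")
        print(cur.fetchone())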
