
AWS Business Intelligence Stack

2024-07-05

Introduction to AWS for Data Engineering and BI

Suppose you have several different data sources. You need to clean the data, store it, and put a business intelligence tool on top of it to gain insights. Many companies find themselves in this situation, but the resulting platform is often a mishmash of tools and services from different providers. Wouldn’t it be nice to have all your business intelligence components under one cloud provider? AWS has you covered.

Transforming Data: Flexible AWS Solutions

AWS offers many options for transforming data before it is stored and analyzed. You can write serverless functions in your preferred programming language (we usually use Python); you can run a small web-scraping application in a Docker container; or you can orchestrate a transformation through AWS Glue. There are alternatives for edge cases as well. In one case, a data source could only export via email, which was fine because the AWS data lake can receive email directly. Once the data arrives, a Lambda function is kicked off to transform and load it.
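The transform-and-load step described above can be sketched as a small Lambda handler. This is a minimal illustration, not our production code: it assumes the incoming data lands in S3 as a CSV file and that the function is triggered by an S3 event notification. The `clean/` prefix, the cleanup rule in `transform_rows`, and the optional `s3_client` parameter (added so the handler can be exercised without AWS) are all hypothetical.

```python
import csv
import io
import json
import urllib.parse

CLEAN_PREFIX = "clean/"  # hypothetical output prefix; the trigger should not watch it


def transform_rows(raw_csv: str) -> list:
    """Normalize header names, trim values, and drop rows with no data."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        normalized = {
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in row.items()
            if key is not None  # skip overflow columns that have no header
        }
        if any(normalized.values()):
            cleaned.append(normalized)
    return cleaned


def handler(event, context, s3_client=None):
    """Entry point for an S3-triggered Lambda: read CSV, clean it, write JSON."""
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        s3_client = boto3.client("s3")

    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = transform_rows(body.decode("utf-8"))

        out_key = CLEAN_PREFIX + key.rsplit("/", 1)[-1] + ".json"
        s3_client.put_object(
            Bucket=bucket, Key=out_key, Body=json.dumps(rows).encode("utf-8")
        )
        results.append({"source": key, "target": out_key, "rows": len(rows)})
    return results
```

Keeping the cleanup logic in a plain function such as `transform_rows` makes it easy to unit test locally and to reuse as more data sources are added.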

AWS has tried to lower the barrier to entry for data engineering on its platform by providing some “no-code” services, such as Glue Studio and DataBrew, but we found them to be geared toward beginners and cumbersome to add to the infrastructure code. Serverless functions, by contrast, give us more flexibility to grow the system as more data sources are added.

Storing Data Efficiently with AWS S3

At the storage level, S3 is terrific as a data lake. It’s inexpensive and durable. AWS Glue Data Catalog defines the schema and how data is stored in S3, and AWS Lake Formation manages permissions and security.
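To make the role of the Glue Data Catalog concrete, here is a rough sketch of a table definition that describes data stored in S3. It assumes the data is stored as Parquet, which is our choice for this example only. The table, column, bucket, and database names are hypothetical. The dictionary follows the `TableInput` structure accepted by the Glue `create_table` call in boto3.

```python
def build_table_input(name, s3_location, columns, partition_keys=()):
    """Build a Glue Data Catalog TableInput for Parquet files under an S3 prefix.

    `columns` and `partition_keys` are sequences of (name, glue_type) pairs.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


def register_table(glue_client, database, table_input):
    """Register the table; `glue_client` is expected to be boto3.client("glue")."""
    return glue_client.create_table(DatabaseName=database, TableInput=table_input)


# Hypothetical example values, for illustration only.
orders_table = build_table_input(
    "orders",
    "s3://example-data-lake/clean/orders/",
    columns=[("order_id", "bigint"), ("amount", "double")],
    partition_keys=[("order_date", "date")],
)
```

In practice, a definition like this would more likely live in the infrastructure code alongside the rest of the stack than be created by an ad hoc script.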

Evaluating AWS QuickSight for Business Intelligence

So far, it comes as no surprise that AWS provides flexible and efficient tools. For the business intelligence layer, however, we were a little skeptical about QuickSight. A few years ago we evaluated it for a project but decided the customer was better served by one of the established enterprise-level tools, such as Microsoft Power BI or Tableau. Recently we re-evaluated QuickSight for a new project and were impressed with the improvements.

QuickSight doesn’t join data as easily as other tools, but we found it to be more flexible for creating simple visualizations. In this case, the customer defined a set of use cases and we delivered a small set of dashboards to serve various departments in their organization. We also showed the customer how easy it was to create new visualizations, which they were able to do almost immediately.

An added benefit of using QuickSight is not having to manage access for third-party software. It’s also fully serverless, so there is no software for you to install or maintain.

Managing Cloud Infrastructure with Terraform and AWS CDK

With any cloud deployment, it’s critical to have your entire infrastructure in code. Our DevOps engineers prefer Terraform, but AWS CDK would work fine too. We even used AWS CodeCommit instead of GitHub and had no issues with it.

Conclusion

In a short amount of time, we were able to build a robust data ingestion and reporting platform using only AWS tools and services. It’s low-cost, easy to use, and flexible enough to grow with the customer.
