AWS Data & Analytics Architecture Best Practices

Author: Igor Royzis

“Best practice is a procedure that has been shown by research and experience to produce optimal results and that is established or proposed as a standard suitable for widespread adoption” – Merriam-Webster Dictionary

Data, Analytics, Web & Mobile on AWS

Here is what your architecture would look like on AWS if you needed to implement most of the data, analytics, web and mobile use cases.

Yes, it looks very busy and complex. The good news is that most organizations only need to implement part of this architecture for their specific use cases.

So let’s get right into it and cover some of the popular use cases and architecture best practices. As you go through each use case, notice that they all share the same data lake foundation. In other words, regardless of what kind of data you’re ingesting, your data lake structure stays the same. This provides a consistent approach to storing, organizing, securing and governing your data, and allows you to transform and analyze data from different sources and of different types using common technologies and even the same codebase.
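To make the "same structure regardless of source" idea concrete, here is a minimal sketch of a consistent S3 key convention for data lake zones. The zone names (raw, staged, curated) and the helper function are illustrative assumptions, not a prescribed AWS standard.

```python
from datetime import date

# Illustrative zone names -- an assumption, not an AWS-mandated layout.
ZONES = ("raw", "staged", "curated")

def lake_key(zone: str, source: str, dataset: str, day: date, filename: str) -> str:
    """Build a consistent S3 object key: zone/source/dataset/date partitions/file."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/{filename}"
    )

# Every source -- CSV drops, DMS extracts, application events -- lands
# under the same layout, so the same Glue/Athena code works for all of them.
key = lake_key("raw", "finance", "invoices", date(2024, 1, 15), "invoices.csv")
print(key)  # raw/finance/invoices/year=2024/month=01/day=15/invoices.csv
```

The `year=/month=/day=` style prefixes double as Hive-style partitions, which Athena and Glue can use to prune scans.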

Ingest, process and organize CSV files in near real-time on AWS

This is a straightforward and very popular use case for organizations that have many departments or lines of business with heavy use of spreadsheets. At some point these organizations realize that spending days or weeks creating, combing through and aggregating data from 20, 50 or 100 spreadsheets just to create end-of-month reports is very inefficient. This architecture allows you to ingest and organize various spreadsheets into an AWS data lake, transform and aggregate the data in near real time using Glue jobs, and explore the results with Athena SQL queries.
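As a sketch of the transformation step, the kind of rollup a Glue job would perform over ingested CSVs can be reduced to plain Python. The column names and grouping logic here are hypothetical; a real Glue job would express the same logic over DynamicFrames or Spark DataFrames.

```python
import csv
import io
from collections import defaultdict

def aggregate_by_department(csv_text: str) -> dict:
    """Sum the 'amount' column per 'department' -- the sort of end-of-month
    rollup a Glue job would compute across many ingested spreadsheets.
    Column names are hypothetical."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["department"]] += float(row["amount"])
    return dict(totals)

sample = "department,amount\nsales,100.0\nsales,50.0\nhr,25.0\n"
print(aggregate_by_department(sample))  # {'sales': 150.0, 'hr': 25.0}
```

In the actual architecture, an S3 event on the raw prefix triggers the Glue job, which writes the aggregate back to the curated zone as Parquet for Athena to query.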

Ongoing replication of small to medium-size Oracle or MS SQL Server databases to an AWS data lake

Another popular use case is establishing a data warehouse/mart and BI foundation on AWS. This architecture provides near-real-time replication of data from an on-premises database to an AWS data lake via DMS (Database Migration Service), ETL/ELT capability via Glue jobs, and data exploration using Athena SQL.
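For reference, DMS decides which tables to replicate via a table-mappings document. The sketch below builds a minimal one in Python; the schema name is a placeholder, and the rule fields follow the DMS table-mapping JSON format as commonly documented, so verify them against the DMS docs for your source engine.

```python
import json

def dms_table_mapping(schema: str, table_pattern: str = "%") -> str:
    """Build a minimal DMS table-mappings document that includes every
    table matching `table_pattern` in `schema` (placeholder values)."""
    mapping = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",
                "object-locator": {
                    "schema-name": schema,
                    "table-name": table_pattern,
                },
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(mapping, indent=2)

# This string would be passed as TableMappings when creating the
# replication task, e.g. via boto3's DMS client.
print(dms_table_mapping("dbo"))
```

With the task's target endpoint pointed at S3, DMS writes full-load and ongoing change files straight into the raw zone of the data lake.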

Process and organize events in near real-time

Many organizations have adopted event-based or event-sourced architectures for their applications. This use case is appropriate for organizations that need to store and organize events produced by applications in an AWS data lake in near real time.
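A near-real-time event pipeline typically lands each event in the lake under a key derived from its type and timestamp. The sketch below shows that routing logic in plain Python; the event shape (`id`, `type`, `occurred_at` fields) is an assumption, and in practice a Lambda or Kinesis consumer would decode the records before applying it.

```python
import json
from datetime import datetime

def event_lake_key(event_json: str) -> str:
    """Derive an S3 key for an event from its (assumed) 'type' and
    ISO-8601 'occurred_at' fields, date-partitioned for Athena."""
    event = json.loads(event_json)
    ts = datetime.fromisoformat(event["occurred_at"])
    return (
        f"raw/events/{event['type']}/"
        f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"
        f"{event['id']}.json"
    )

sample = json.dumps({"id": "e-123", "type": "order_placed",
                     "occurred_at": "2024-03-07T12:34:56"})
print(event_lake_key(sample))
# raw/events/order_placed/year=2024/month=03/day=07/e-123.json
```

Because events land under the same zone/partition convention as every other source, the same Glue jobs and Athena tables apply to them unchanged.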

Run ETL/ELT jobs and publish results to Redshift

Some organizations already have a way of ingesting data into Amazon S3, but need a proven way of transforming and loading (ETL), or loading and transforming (ELT), data into a Redshift data warehouse.
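The "load" step in both ETL and ELT against Redshift is usually a COPY from S3. As a sketch, the statement can be assembled like this; the table name, S3 path and IAM role are placeholders, and in practice you would execute it via the Redshift Data API or a Glue connection rather than print it.

```python
def redshift_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift COPY statement loading Parquet from S3.
    All identifiers here are placeholders."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET;"
    )

sql = redshift_copy_sql(
    "analytics.sales_fact",                              # hypothetical table
    "s3://my-data-lake/curated/sales/",                  # hypothetical bucket
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",   # hypothetical role
)
print(sql)
```

Loading Parquet from the curated zone (rather than raw CSV) keeps Redshift's COPY fast and lets Athena and Redshift share the same curated datasets.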

And now let’s put it all together into a typical medium-complexity data platform on AWS, with both internal and external data sources.


There are many more use cases that organizations are implementing on AWS while following these best practices. The key to a successful implementation is choosing the right architectural patterns and technologies for the job.