I am trying to put together a solution for the use case described below. We are an AWS Mobile Analytics customer generating more than 2 million events a day during peak hours, with about half a million users. This event data is exported to S3 on an hourly basis.
We now want to process this data and build our own tables on an hourly basis, while scaling as the user count grows. We need to be able to process at least 20-40 million events per hour. The data is stored as compressed files with one JSON object per line.
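For context, here is a minimal sketch of how one of these hourly exports could be read, assuming a gzip-compressed JSON-lines object in S3; the bucket and key names are made up for illustration and are not our real ones:

```python
import gzip
import io
import json

import boto3  # available by default in the AWS Lambda Python runtime

s3 = boto3.client("s3")

def parse_export(bucket, key):
    """Download one hourly export object and yield one event dict per line."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    # Buffer the object body so gzip can read it as a regular file object.
    buf = io.BytesIO(obj["Body"].read())
    with gzip.open(buf, mode="rt") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical bucket/key, just to show the call shape.
for event in parse_export("my-analytics-export-bucket",
                          "awsma/events/2016/05/01/10/part-00000.gz"):
    print(event.get("event_type"))
```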
- What is the best data store for creating these tables?
- What is the best architecture for processing the data? Is the last solution proposed at http://theburningmonk.com/2016/04/aws-lambda-use-recursive-function-to-process-sqs-messages-part-1/, using recursive Lambda functions, a good approach (a rough sketch of that pattern follows this list)? Please suggest.
- We need to store all our data on one platform that our business applications can use. Redshift seems slow for us with 300 GB of data on 4 dc1.large nodes.
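For reference, this is roughly how I understand the recursive Lambda pattern from the linked post: the function drains a batch of SQS messages, then asynchronously re-invokes itself while work remains. The queue URL and the `process` body below are placeholders, not the blog author's actual code:

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
lam = boto3.client("lambda")

# Hypothetical queue; in practice this would come from our own configuration.
QUEUE_URL = os.environ.get(
    "QUEUE_URL",
    "https://sqs.us-east-1.amazonaws.com/123456789012/events-to-process")

def handler(event, context):
    # Pull a batch of messages, each referencing one hourly export file in S3.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=5)
    messages = resp.get("Messages", [])

    for msg in messages:
        process(json.loads(msg["Body"]))          # per-file processing
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])

    # If there was work in the queue, invoke this same function again
    # asynchronously so it keeps draining without waiting for a schedule.
    if messages:
        lam.invoke(FunctionName=context.function_name,
                   InvocationType="Event",
                   Payload=json.dumps({"recursed": True}))

def process(body):
    # Placeholder: download the S3 object named in the message and load it
    # into whatever data store we end up choosing.
    pass
```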