Help choosing a processing architecture

I am trying to put together a solution for the use case described below. We are an AWS Mobile Analytics customer generating more than 2 million events a day at peak, with around half a million users. This event data is exported to S3 on an hourly basis.

Now we want to process this data and build our own tables on an hourly basis, scaling as the user count grows. We need to be able to process at least 20-40 million events per hour. The data is stored as compressed files with one JSON object per line.
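For reference, reading one of these files looks roughly like the sketch below. This is a minimal example assuming gzip compression; the bucket and key names are placeholders, not our real layout.

```python
import gzip
import io
import json

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

def iter_events(bucket, key):
    """Yield events from one hourly export file (one JSON object per line)."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    raw = io.BytesIO(obj["Body"].read())       # buffer the compressed body
    with gzip.open(raw, mode="rt") as lines:   # assumes gzip compression
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example: count the events in a single hourly file
# (placeholder bucket and key, not our real export layout)
count = sum(1 for _ in iter_events("analytics-export-bucket",
                                   "2016/06/01/00/part-0.gz"))
print(count)
```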

  1. What is the best data store for these tables?
  2. What is the best architecture for processing the data? Is the last solution proposed at Link:, which uses recursive Lambda functions, a good approach? Please advise. (My rough understanding of that pattern is sketched after this list.)
  3. We need to store all our data on one platform that our business applications can use. Redshift seems slow for us with 300 GB of data on 4 nodes (dc1.large).
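Regarding question 2, here is a minimal sketch of the recursive Lambda pattern as I understand it: each invocation processes a batch of S3 keys and, if work remains, asynchronously invokes itself with the rest so no single invocation hits the time limit. The batch size, event shape, and `process_one` helper are my own placeholders, not taken from the linked post.

```python
import json

import boto3

lambda_client = boto3.client("lambda")
BATCH_SIZE = 100  # keys handled per invocation (placeholder value)

def process_one(key):
    # Placeholder: download, parse, and load one export file (details elided).
    pass

def handler(event, context):
    """Process a batch of S3 keys, then recurse on the remainder."""
    keys = event["keys"]
    for key in keys[:BATCH_SIZE]:
        process_one(key)

    remaining = keys[BATCH_SIZE:]
    if remaining:
        # Fire-and-forget invocation of this same function with the
        # leftover keys, so each run stays within the Lambda timeout.
        lambda_client.invoke(
            FunctionName=context.function_name,
            InvocationType="Event",
            Payload=json.dumps({"keys": remaining}),
        )
```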