Comparing Data Science Platform Capabilities

big_data
datavisualization
data_mining
data_wrangling
spark


I want to understand and evaluate the cost of building a data science platform that has the capabilities listed below.

Data Ingestion - File uploads from the filesystem (FTP, SFTP)
    Cloud (S3)
    HDFS
    Oracle
    Plugin support
Data Versioning - Ability to manage versions of data
File Format Support - CSV
    Text
    JSON
    Excel
Automatic Schema Detection (sketch 1 below)
Data Wrangling - Visual, interactive & collaborative data cleaning and data imputation
Data Preparation - Apply data transformations visually (sketch 2 below)
    Variable type detection
    Encoding
    Data grouping and aggregation
Data Pipeline - Ability to visually create and manage data pipelines
    Automating & scheduling data pipelines
Machine Learning - Comparing models (sketch 3 below)
    Feature engineering
    Model versioning
Distributed Processing
Data Mining - Interactive & collaborative notebooks for data exploration
Data Visualization (sketch 4 below)
    Many built-in charts
    Ability to integrate JavaScript libraries (D3, Leaflet, etc.)
    Dashboards for executives
Design to Production (sketch 5 below)
    Expose your model as a REST API
    Running multiple versions of the same model for testing

Can someone guide me on what tools/frameworks we would need to add on top of Apache Spark and Zeppelin to get the expected results?
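
For concreteness, here are rough sketches (in Scala, since that is Spark's native API) of how I picture a few of these capabilities working. These are my assumptions, not code from an existing platform; every bucket name, connection string, table and column name below is a placeholder I made up.

Sketch 1 - ingestion and automatic schema detection. Spark's DataFrame reader can infer a schema for CSV and JSON out of the box, and Oracle is reachable through the generic JDBC source. Excel, as far as I know, needs a third-party data source plugin.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ingestion-sketch")
  .getOrCreate()

// CSV from S3 with automatic schema inference (bucket/path are made up).
val ordersDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("s3a://example-bucket/raw/orders.csv")

// JSON from HDFS: Spark infers the schema by default.
val eventsDf = spark.read.json("hdfs:///raw/events.json")

// Oracle via the generic JDBC source (connection details are placeholders).
val customersDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "SALES.CUSTOMERS")
  .option("user", "report_user")
  .option("password", sys.env("ORACLE_PW"))
  .load()

ordersDf.printSchema()  // inspect the inferred column names and types
```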
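
Sketch 2 - data preparation: grouping/aggregation via the DataFrame API and categorical encoding via MLlib's StringIndexer. I assume a visual front end would generate code like this behind the scenes.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.functions.{avg, count}

// Grouping and aggregation with the DataFrame API (columns are illustrative).
val summaryDf = ordersDf
  .groupBy("country")
  .agg(count("*").as("order_count"), avg("amount").as("avg_amount"))

// Categorical encoding: map a string column to numeric indices.
val indexer = new StringIndexer()
  .setInputCol("country")
  .setOutputCol("country_idx")
val encodedDf = indexer.fit(ordersDf).transform(ordersDf)
```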
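
Sketch 3 - the machine-learning layer: feature engineering with VectorAssembler, model comparison with CrossValidator over a parameter grid, and crude model versioning by saving each trained pipeline under a versioned path (the label column and the versioning scheme are my assumptions).

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Feature engineering: combine raw columns into a single feature vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("amount", "country_idx"))
  .setOutputCol("features")

val lr = new LogisticRegression().setLabelCol("churned")
val pipeline = new Pipeline().setStages(Array(assembler, lr))

// Model comparison: cross-validate over a small hyper-parameter grid.
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("churned"))
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

val cvModel = cv.fit(encodedDf)

// Crude model versioning: one saved pipeline per versioned path.
cvModel.write.overwrite().save("hdfs:///models/churn/v1")
```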
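
Sketch 4 - visualization: in a Zeppelin note, z.show(...) renders a DataFrame with the built-in chart switcher (table, bar, line, pie, scatter, area), and %sql paragraphs get the same charts. Anything beyond that (D3, Leaflet) would, I believe, have to go through Zeppelin's %angular or Helium display mechanisms.

```scala
// In a Zeppelin Spark paragraph: render the aggregate with the
// built-in chart toolbar instead of printing it to the console.
z.show(summaryDf)
```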
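
Sketch 5 - the design-to-production piece: a deliberately minimal HTTP wrapper (using only the JDK's built-in com.sun.net.httpserver, no web framework) that loads one saved pipeline version and scores one JSON record per request. Running v1 and v2 side by side on different ports is how I imagine testing multiple versions of the same model; a real deployment would presumably use a proper model-serving layer.

```scala
import java.net.InetSocketAddress
import com.sun.net.httpserver.HttpServer
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object ModelApi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("model-api").getOrCreate()
    import spark.implicits._

    // The model version is baked into the path, so a second instance of this
    // server can serve .../v2 on another port for side-by-side testing.
    val model = PipelineModel.load("hdfs:///models/churn/v1")

    val server = HttpServer.create(new InetSocketAddress(8080), 0)
    server.createContext("/predict", exchange => {
      // Expect one JSON record per request body with the model's input
      // columns, e.g. {"amount": 12.5, "country_idx": 3.0}.
      val body = scala.io.Source.fromInputStream(exchange.getRequestBody).mkString
      val inputDf = spark.read.json(Seq(body).toDS())
      val prediction = model.transform(inputDf)
        .select("prediction").first().getDouble(0)
      val response = s"""{"prediction": $prediction}"""
      exchange.sendResponseHeaders(200, response.getBytes.length)
      exchange.getResponseBody.write(response.getBytes)
      exchange.close()
    })
    server.start()
  }
}
```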