As ML models are developed to cover the whole of the data stored in a big data architecture, I have a few questions.
- How do we analyze the whole dataset, and which tools and techniques could be used?
Since Python/R are limited to what fits on a single standalone machine, how could we analyze the full dataset stored in Hadoop clusters?
- How do we develop and train ML models in a big data environment?
Should we build PySpark/Python/R models on a single standalone machine and deploy them to production?
Or is there another way to address this scenario?