Spark Cheatsheet


#1

Hi,

Can someone share a cheatsheet covering Spark data manipulation and machine learning concepts, with code?


#2

Hi Arihant,

Please go through this series of articles on PySpark.

  1. Article1 for Introduction to Apache Spark
  2. Article2 for Operations on RDD
  3. Article3 for DataFrame Manipulation

Best!
Ankit Gupta


#3

@Arihant,

If you want a live example of working code, you can look at my big data project on GitHub. Here is the link: https://github.com/aayushmnit/big-data-project. It uses the Amazon food review dataset, which is openly available on Kaggle, and includes my report and presentation along with the code. The code is written to run Spark locally, but we have also run it on AWS, which takes only minor changes, e.g. not importing libraries, not setting environment variables, importing data from S3 buckets, and writing the code as a .py file.
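To give a rough idea of what those local-versus-AWS changes can look like, here is a minimal sketch. It is not taken from the repository: the function names, the bucket name `example-bucket`, and the file name `Reviews.csv` are all made up for illustration, and it assumes the cluster on AWS already has its Spark master and environment configured.

```python
import os
import sys


def job_config(on_aws: bool) -> dict:
    """Return the settings that differ between a local run and an AWS run.

    All paths here are hypothetical placeholders, not the project's real ones.
    """
    if on_aws:
        # Assumption: the cluster supplies the master and environment,
        # and the data is read from an S3 bucket.
        return {
            "master": None,
            "input_path": "s3://example-bucket/Reviews.csv",
            "set_env": False,
        }
    # Local run: use all local cores, read from disk, set env vars ourselves.
    return {
        "master": "local[*]",
        "input_path": "data/Reviews.csv",
        "set_env": True,
    }


def build_session(cfg: dict, app_name: str = "food-reviews"):
    """Create a SparkSession according to cfg (requires pyspark installed)."""
    from pyspark.sql import SparkSession  # imported lazily, only when used

    if cfg["set_env"]:
        # Point the PySpark workers at the current Python interpreter.
        os.environ.setdefault("PYSPARK_PYTHON", sys.executable)

    builder = SparkSession.builder.appName(app_name)
    if cfg["master"]:
        builder = builder.master(cfg["master"])
    return builder.getOrCreate()


def load_reviews(spark, cfg: dict):
    """Read the reviews CSV into a DataFrame from the configured path."""
    return spark.read.csv(cfg["input_path"], header=True, inferSchema=True)
```

Locally you would call `build_session(job_config(False))`; in a .py file submitted to the cluster you would pass `job_config(True)` instead, and the rest of the code stays the same.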

Hope this helps.
Regards,
Aayush Agrawal