We recently formed a small data science team working on big-data projects related to e-commerce. As the supervisor, I am now thinking about how to keep our teamwork productive. Regarding toolsets, we have options like Google Cloud, AWS, GitLab, Trello, Slack, and Jira, where required.
I have the following specific concerns:
- How can we collaborate on the same piece of code? For example, I want to review the code DS1 has written, comment on or modify it, and also reuse it myself where needed. Is it better to use GitLab, Jupyter notebooks, Google Colab, or AWS SageMaker?
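To make the first concern concrete, here is a minimal sketch of the branch-based review flow I have in mind if we went the GitLab route (the repo, branch, and file names are hypothetical, and the demo uses a throwaway local repo rather than a real GitLab remote):

```shell
set -e
# Hypothetical demo: DS1 works on a feature branch that the supervisor
# would later review as a GitLab merge request.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git checkout -qb eda/churn-analysis            # DS1 starts a topic branch
echo "print('churn EDA')" > churn_eda.py       # placeholder analysis script
git add churn_eda.py
git -c user.name="DS1" -c user.email="ds1@example.com" \
    commit -qm "Add churn EDA script"
# In real use: git push -u origin eda/churn-analysis, then open a
# merge request so others can comment line-by-line before merging.
git log -1 --pretty=%s
```

The point of the branch/merge-request flow is that review comments attach to specific lines of code, which seems closer to what I want than passing notebooks around.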
- How can one DS quickly share their EDA results with the others, instead of exporting each figure as an image and posting it to the Slack channel? Should we use Jupyter/Google notebooks, Google Sheets/Data Studio/AWS QuickSight, Python code on GitLab, or is there a better idea?
- Our data is stored on our on-premises servers as CSV files. For prototyping, I batch-download a specific period of it to my local drive as raw data, then run my ETL pipeline on that local copy each time I want to try an EDA analysis or an ML algorithm. How should we change this pipeline to make it convenient for teamwork? Does each DS need their own personalized ETL pipeline, or is it better to maintain a shared archive of ETL code that each DS can call as needed? Should we store the results of specific ETL runs on the cloud, both to make them more accessible to others and to prevent different DSs from getting inconsistent results?
- Any other tools to improve the productivity of a data science team?