I recently started a new position as a data scientist at an e-commerce company. The company was founded about 4-5 years ago and is still new to many data-related areas; in fact, I’m their first data science hire. So I am responsible both for data analysis tasks and for bringing new technologies into the company. Their current data setup is as follows:
1. They use Elasticsearch (and Kibana) for reporting dashboards on daily purchases and user interactions on their e-commerce website.
2. They use an Oracle database to keep records of their daily turnover and lists of their current products, clients, and sellers.
3. They use a data warehouse with cockpit 10 to generate reports on different aspects of the business, including the Oracle data in item 2.
At the moment, I pull batches of data from their systems to do predictive analytics. In some cases, I use static snapshots of data such as monthly turnover, client values, and high-demand products, run my predictive analysis in Python (in VS Code), and present my findings in Google Data Studio or Google Sheets. In other cases, I do time-series analysis on offline batches of data extracted from Elasticsearch to build user recommendations and personalization.
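To make the offline batch step concrete, here is a minimal sketch of turning an exported batch of Elasticsearch hits into a daily time series, plus a naive moving-average baseline. It assumes each hit stores its document under `_source` with hypothetical `timestamp` (ISO 8601 with a UTC offset) and `event` fields; the sample data and the function names `daily_counts` and `moving_average` are illustrative only, not part of the company's actual schema.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical batch exported from an Elasticsearch search response.
# Each hit keeps its document under "_source"; the field names are assumptions.
SAMPLE_HITS = [
    {"_source": {"timestamp": "2021-03-01T09:12:00+00:00", "event": "purchase"}},
    {"_source": {"timestamp": "2021-03-01T17:40:00+00:00", "event": "purchase"}},
    {"_source": {"timestamp": "2021-03-01T18:05:00+00:00", "event": "view"}},
    {"_source": {"timestamp": "2021-03-03T11:30:00+00:00", "event": "purchase"}},
]


def daily_counts(hits, event_type):
    """Count events of one type per calendar day, filling missing days with 0."""
    counts = Counter()
    for hit in hits:
        doc = hit["_source"]
        if doc.get("event") == event_type:
            day = datetime.fromisoformat(doc["timestamp"]).date()
            counts[day] += 1
    if not counts:
        return []
    series = []
    day, end = min(counts), max(counts)
    while day <= end:
        series.append((day, counts[day]))  # Counter yields 0 for absent days
        day += timedelta(days=1)
    return series


def moving_average(values, window):
    """Trailing moving average: a naive baseline before fitting a real model."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


series = daily_counts(SAMPLE_HITS, "purchase")
print(series)
print(moving_average([count for _, count in series], window=2))
```

Zero-filling the missing days matters for time-series work: without it, a day with no purchases silently disappears instead of showing up as a dip.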
I would really like to use modern data science tools such as Apache Spark, BigQuery, AWS, Azure, or others where they genuinely fit. I think these tools could improve my productivity as a data scientist and enable more continuous analytics of their business interactions, rather than one-off analyses of offline batches. But honestly, I’m not sure where each tool is needed, or which parts of their current system should be replaced by or combined with these technologies to improve productivity in the areas described above.