Chapter 0: What is Data Science? (An Overview of the Data Science Workflow)



Hi there! I am Kai Jing, an NUS undergraduate currently pursuing Accountancy and Business Analytics. Two years ago, I started on my journey in Business Analytics, eventually branching out into Machine Learning and Data Science.

In this series of blog posts, I hope to give you a more complete understanding of Data Science, Machine Learning, and the end-to-end process of a typical project.

Fundamentally, Data Science is a field of study that uses statistical methods to analyze data and to understand the real-world implications of actions and decisions.

Machine Learning combines the statistical methods of Data Science with the theory of learning. The goal is to find the best way to model real-world problems, so that we understand those situations better and perform better on similar problems in the future.

Business Analytics is a specific application of Data Science and Machine Learning to problems in the world of business, such as credit card fraud analysis, stock market prediction, and pricing decisions.


[Source: KDnuggets]

A typical data science workflow consists mainly of the following stages:

1. Data Acquisition, Data Types and Choice of Data Analysis Platform

2. Data Wrangling, Data Manipulation and Data Cleaning

3. Feature Selection and Feature Engineering

4. Exploratory Data Analysis and Principal Component Analysis

5. Machine Learning (Supervised, Unsupervised and Reinforcement Learning)

6. Interpretations and Decision Analytics

The above-mentioned components can be neatly summarized in the diagram shown below:


[Source: Dataquest]

There are also three main themes in the analytics space of Data Science: Descriptive, Predictive, and Prescriptive Analytics.

Descriptive Analytics is most commonly applied during the Data Wrangling and Exploratory Data Analysis stages, where we are trying to understand the relationships between the features in our dataset. It also plays an important role during the Model Analysis and Interpretation stage, where it is important to understand the results from any data science models used and to communicate them effectively to stakeholders (if any).
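As a rough illustration of the descriptive step, the sketch below summarizes one feature and measures how strongly two features move together. The data (`ad_spend`, `sales`) and the helper names `describe` and `pearson` are made up for this example; in a real project you would more likely reach for a library such as pandas.

```python
from statistics import mean, stdev

# Hypothetical toy data: monthly advertising spend and sales (made up for illustration).
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [100.0, 115.0, 140.0, 170.0, 185.0, 230.0]


def describe(values):
    """Return a few basic summary statistics for one feature."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(ad_spend)
corr = pearson(ad_spend, sales)
print(summary)
print(f"correlation between ad_spend and sales: {corr:.3f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two features, which is the kind of observation that usually feeds into the later modelling stages.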

Predictive Analytics is applied most frequently during the Machine Learning stage, where we are trying to identify the relationships between the target feature (what we are trying to predict) and the predictor features (what we think explains our target feature). Modelling is also applied in this stage to help us better understand the problems we are trying to solve.
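To make the target/predictor idea concrete, here is a minimal sketch that fits a straight line with ordinary least squares and uses it to predict an unseen value. The numbers and the names `fit_line` and `predict` are assumptions for illustration only; this is one of the simplest possible models, not a recommendation for any particular problem.

```python
# Hypothetical toy data: one predictor feature and one target feature.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the target for a new predictor value."""
    return slope * x + intercept


slope, intercept = fit_line(predictor, target)
prediction = predict(6.0, slope, intercept)
print(f"slope={slope}, intercept={intercept}, prediction for x=6: {prediction}")
```

The fitted slope describes how the target changes with the predictor, and `predict` applies that learned relationship to data the model has not seen before.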

Prescriptive Analytics is usually more applicable in the stages after Machine Learning, where action plans are drafted and decisions are made based on the interpretations we have drawn from the predictions of our machine learning models. It is also commonly associated with optimization techniques ranging from Linear Programming to Robust Optimization.
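As a small taste of Linear Programming, the sketch below chooses how much of two products to make in order to maximize profit under resource limits. The problem itself (profits of 3 and 5 per unit, and the three capacity limits) is a hypothetical textbook-style example. It is solved by checking the corner points of the feasible region, which only works for tiny problems with a bounded, non-empty feasible region; a real project would typically use a dedicated solver instead.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c. All numbers are hypothetical.
constraints = [
    (1.0, 0.0, 4.0),    # x <= 4
    (0.0, 2.0, 12.0),   # 2y <= 12
    (3.0, 2.0, 18.0),   # 3x + 2y <= 18
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
profit_per_unit = (3.0, 5.0)  # objective: maximize 3x + 5y


def intersection(c1, c2):
    """Point where two constraint boundary lines cross, or None if they are parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return x, y


def is_feasible(point, tol=1e-9):
    """True if the point satisfies every constraint (within a small tolerance)."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def profit(point):
    x, y = point
    return profit_per_unit[0] * x + profit_per_unit[1] * y


# Collect the feasible corner points, then pick the most profitable one.
corners = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        corners.append(point)

best_plan = max(corners, key=profit)
best_profit = profit(best_plan)
print(f"best plan: x={best_plan[0]}, y={best_plan[1]}, profit={best_profit}")
```

Here the predictions or estimates from earlier stages would supply the profit and capacity numbers, and the optimization step turns them into a recommended action.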

This post only provides a very skeletal overview of the Data Science workflow. For a more detailed understanding of the Data Science process, here are some videos that might be useful for reference:




