Today, we are going to talk about one of the most important aspects of data science, which is problem formulation. Problem formulation is the process of defining the problem you want to solve using data science techniques. It involves identifying the problem, defining the scope, and setting the objectives of the project.
To better understand problem formulation, let’s consider an example in the spirit of the case studies in “Python for Data Analysis” by Wes McKinney: an analysis of the NYC bike share dataset. The problem statement for this case study is to understand the patterns of bike usage in New York City and to identify the factors that influence bike usage.
To formulate this problem, we first identify the problem statement: “to understand the patterns of bike usage in New York City.” Next, we define the scope, which here is limited to the bike share dataset for New York City. Finally, we set the objectives of the project: to identify the factors that influence bike usage, such as weather conditions, day of the week, and time of day.
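One way to keep these three parts from blurring together is to write them down as a small record before touching any data. The sketch below is only an illustration: the `ProblemFormulation` class and its `is_complete` check are hypothetical names made up for this example, not something taken from either book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemFormulation:
    """Hypothetical container for the three parts of a formulation."""

    statement: str
    scope: str
    objectives: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Usable only when the statement, the scope, and at least one
        # objective have all been filled in.
        return bool(self.statement.strip() and self.scope.strip() and self.objectives)


# The bike share example, written out part by part.
bike_share = ProblemFormulation(
    statement="Understand the patterns of bike usage in New York City.",
    scope="The NYC bike share dataset only.",
    objectives=[
        "Identify how usage varies with weather conditions.",
        "Identify how usage varies by day of the week.",
        "Identify how usage varies by time of day.",
    ],
)

print(bike_share.is_complete())
```

Writing the formulation this way makes gaps visible early: a record with an empty scope or no objectives fails the check, which is a cue to go back and finish the formulation before starting the analysis.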
Another example of problem formulation, in the spirit of the worked examples in “Data Science from Scratch” by Joel Grus, is ad click prediction. The problem statement for this case study is to predict whether a user will click on an ad, based on the user’s profile and other contextual data.
To formulate this problem, we first identify the problem statement: “to predict whether a user will click on an ad or not.” Next, we define the scope, which here is limited to the user’s profile and other contextual data. Finally, we set the objectives of the project: to build a predictive model that predicts clicks accurately from the available data. Because “accurately” only has meaning relative to a chosen metric and a baseline, the objective should name both.
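To make the objective concrete, it can help to fix the target, the metric, and a baseline before building any model. The following is a minimal sketch under stated assumptions: the records, the field names (`age_group`, `device`), and the choice of accuracy as the metric are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical toy records: (profile and context features, click label).
# The field names and values are made up for illustration only.
records = [
    ({"age_group": "18-24", "device": "mobile"}, 1),
    ({"age_group": "25-34", "device": "desktop"}, 0),
    ({"age_group": "18-24", "device": "desktop"}, 0),
    ({"age_group": "35-44", "device": "mobile"}, 0),
    ({"age_group": "25-34", "device": "mobile"}, 1),
    ({"age_group": "35-44", "device": "desktop"}, 0),
]

# Target: 1 if the user clicked on the ad, 0 otherwise.
labels = [clicked for _, clicked in records]


def majority_baseline(labels):
    """Return the most common label, i.e. what a no-model baseline predicts."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


baseline_label = majority_baseline(labels)
baseline_accuracy = accuracy([baseline_label] * len(labels), labels)
print(baseline_label, round(baseline_accuracy, 3))  # prints: 0 0.667
```

On this toy data, always predicting “no click” is already right two thirds of the time, so a model is only useful if it beats that figure. Stating the baseline as part of the objective keeps the project honest about what “accurate” means.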
In conclusion, problem formulation is a critical step in any data science project. It helps to define the problem you want to solve and sets the stage for the rest of the project. By identifying the problem statement, defining the scope, and setting the objectives of the project, you can ensure that your project is focused and that you are using the right data science techniques to solve the problem at hand.