Data Science and Machine Learning

AIH04 Launching A Data Science Project: Cleaning Is Half the Battle


9:30am - 10:45am

Level: Introductory

Kevin Feasel


Catallaxy Services, LLC

There's an old adage in software development: Garbage In, Garbage Out. This adage certainly applies to data science projects: if you simply throw raw data at models, you will end up with garbage results. In this session, we will build an understanding of just what it takes to implement a data science project whose results are not garbage. We will use the Microsoft Team Data Science Process as our model for project implementation, learning what each step of the process entails. To motivate this walkthrough, we will see what we can learn from a survey of data professionals' salaries.

You will learn:

  • The steps involved in launching a data science process
  • Insights on data cleansing and feature engineering strategies
  • About the different classes of algorithms available, as well as tips on when to use them