The Team Data Science Process lifecycle
The Team Data Science Process (TDSP) provides a lifecycle that your team can use to structure your data science projects. The lifecycle outlines the steps you can take to successfully complete a project.
You should use this lifecycle if you have a data science project that's part of an intelligent application. Intelligent applications deploy machine learning or AI models for predictive analytics. You can also use this process for exploratory data science projects and improvised analytics projects, but you might not need to implement every step of the lifecycle.
Your team can combine the task-based TDSP with other data science lifecycles, such as the cross-industry standard process for data mining (CRISP-DM), the knowledge discovery in databases (KDD) process, or your organization's own custom process.
Purpose and credibility
The purpose of TDSP is to streamline and standardize your approach to data science and AI projects. Microsoft has applied this structured methodology in hundreds of projects. Researchers studied TDSP and published their findings in peer-reviewed literature. The architectural framework of the TDSP is thoroughly tested and proven effective in many areas.
Five lifecycle stages
The TDSP lifecycle is composed of five major stages that your team performs iteratively:
- Business understanding
- Data acquisition and understanding
- Modeling
- Deployment
- Customer acceptance
[Figure: visual representation of the TDSP lifecycle]
The TDSP lifecycle is a sequence of steps that provide guidance for creating predictive models. Your team deploys these models in the production environment that you plan to use for building intelligent applications. The goal of this process lifecycle is to navigate a data science project toward a clear engagement endpoint. Because data science is an exercise in research and discovery, using a well-defined process to communicate tasks to your team increases the chance of carrying out a project successfully.
Each stage has its own article that outlines:
- Goals: The objectives of the stage.
- How to do it: An outline of the tasks you perform in the stage and guidance about how to complete them.
- Artifacts: The deliverables that you need to produce during the stage and resources that you can use to help you create them.
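The three headings above can be treated as a small record that a team fills in per stage. The following is a minimal sketch of that idea, not part of TDSP itself: the `Stage` class, its field names, and the sample goals, tasks, and artifact names are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One lifecycle stage, described by the three headings used in each stage article."""

    name: str
    goals: list[str]
    tasks: list[str]
    # Maps an artifact name to whether it has been delivered (hypothetical tracking scheme).
    artifacts: dict[str, bool] = field(default_factory=dict)

    def missing_artifacts(self) -> list[str]:
        """Return the names of artifacts that have not been delivered yet."""
        return [name for name, done in self.artifacts.items() if not done]


# Placeholder content for illustration only; replace it with your project's own entries.
stage = Stage(
    name="Business understanding",
    goals=["Agree on the question the model should answer"],
    tasks=["Define objectives", "Identify data sources"],
    artifacts={"charter document": True, "data dictionary": False},
)

print(stage.missing_artifacts())
```

A record like this might help a team see at a glance which deliverables are still open before it moves to the next stage or iterates on the current one.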
Peer-reviewed citations
Researchers have published peer-reviewed literature about the TDSP. Review the following material to investigate TDSP features and applications.
- Software Engineering for Machine Learning: A Case Study (pages 291–300)
- An Artificial Intelligence Life Cycle: From Conception to Production
- Management of Machine Learning Lifecycle Artifacts: A Survey (pages 18–35)
- Construction of a Quality Model for Machine Learning Systems (pages 307–335)
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
- Mark Tabladillo | Senior Cloud Solution Architect
Related resources
- For the first stage of the lifecycle, see Business understanding.
- What is the Team Data Science Process?
- Compare machine learning products and technologies