1  The CRISP-DM Process

CRISP-DM (Cross-Industry Standard Process for Data Mining) is one of the most widely used methodologies for planning and executing data mining and data science projects. It breaks a project into six phases that form an iterative life cycle: the order shown below is the typical path, but phases regularly feed back into earlier ones, and the whole loop often repeats as new insight emerges.

1.1 The Six Phases

┌─────────────────────┐      ┌─────────────────────┐
│ 1. Business         │ ───▶ │ 2. Data             │
│    Understanding    │ ◀─── │    Understanding    │
└─────────────────────┘      └──────────┬──────────┘
          ▲                             │
          │                             ▼
          │                  ┌─────────────────────┐
          │                  │ 3. Data Preparation │
          │                  └──────────┬──────────┘
          │                             │
          │                             ▼
          │                  ┌─────────────────────┐
          │                  │ 4. Modeling         │
          │                  └──────────┬──────────┘
          │                             │
          │                             ▼
┌─────────┴───────────┐      ┌─────────────────────┐
│ 6. Deployment       │ ◀─── │ 5. Evaluation       │
└─────────────────────┘      └─────────────────────┘
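
The diagram can also be read as a small directed graph. The sketch below encodes only the arrows drawn above, plus the loop from Modeling back to Data Preparation mentioned under the Modeling phase. The names `TRANSITIONS` and `can_move` are made up for this illustration and are not part of CRISP-DM or of this repository.

```python
# Illustrative only: phase names come from the diagram; the data structure
# and the helper function are assumptions made for this sketch.
TRANSITIONS = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modeling"},
    "Modeling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Deployment"},
    "Deployment": {"Business Understanding"},
}


def can_move(current: str, target: str) -> bool:
    """Return True if the encoded arrows allow moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


print(can_move("Modeling", "Data Preparation"))     # an allowed loop back
print(can_move("Data Preparation", "Deployment"))   # skips Modeling and Evaluation
```

In practice a team may step back to any earlier phase; the graph here is deliberately limited to what the diagram shows.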

1.1.1 1. Business Understanding

Focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan.

  • Determine business objectives and success criteria
  • Assess the situation (resources, risks, constraints, costs/benefits)
  • Define data mining goals
  • Produce a project plan

1.1.2 2. Data Understanding

Starts with initial data collection and proceeds to activities that build familiarity with the data, identify quality problems, and form first hypotheses.

  • Collect initial data
  • Describe data (format, volume, fields)
  • Explore data (summary stats, visualizations)
  • Verify data quality
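
As a rough illustration of the describe, explore, and verify tasks above, the following sketch uses only the Python standard library. The CSV content and its column names (`customer_id`, `age`, `monthly_spend`) are hypothetical and invented for the example; a real project would load its own initial data.

```python
import csv
import io
import statistics

# Hypothetical sample data, made up for this sketch (note the missing age).
RAW_CSV = """customer_id,age,monthly_spend
1,34,120.50
2,,89.90
3,45,230.00
4,29,15.25
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Describe: volume and fields.
fields = list(rows[0].keys())
print(f"{len(rows)} rows, fields: {fields}")

# Explore: summary statistics for one numeric column.
spend = [float(r["monthly_spend"]) for r in rows]
print("mean spend:", round(statistics.mean(spend), 2))
print("median spend:", round(statistics.median(spend), 2))

# Verify quality: count empty values per field.
missing = {f: sum(1 for r in rows if r[f] == "") for f in fields}
print("missing values:", missing)
```

The missing-value count is only one possible quality check; range checks, duplicate detection, and type validation would follow the same pattern.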

1.1.3 3. Data Preparation

Covers all activities to construct the final dataset that will be fed into the modeling tools. Typically the most time-consuming phase.

  • Select data (rows and columns)
  • Clean data (handle missing values, errors, outliers)
  • Construct data (derived attributes, feature engineering)
  • Integrate data (merge sources)
  • Format data for modeling
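
The tasks above can be sketched on a toy example. Everything here is an assumption for illustration: the two record lists, their field names, and the choice of median imputation, which is just one of several ways to handle a missing value.

```python
import statistics

# Hypothetical records from two sources, made up for this sketch.
customers = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 45, "region": "north"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 3, "amount": 230.0},
]

# Clean: fill a missing age with the median of the known ages.
median_age = statistics.median(c["age"] for c in customers if c["age"] is not None)
cleaned = [
    {**c, "age": c["age"] if c["age"] is not None else median_age}
    for c in customers
]

# Integrate and construct: total order amount per customer (a derived attribute).
totals = {}
for o in orders:
    totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]

# Select and format: keep only the columns the model needs, in a fixed order.
dataset = [
    (c["customer_id"], c["age"], totals.get(c["customer_id"], 0.0))
    for c in cleaned
]
for row in dataset:
    print(row)
```

Dropping `region` stands in for column selection; customers without orders get a total of 0.0 instead of being discarded, which is itself a modeling decision worth documenting.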

1.1.4 4. Modeling

Selects and applies modeling techniques and calibrates their parameters to optimal values. This phase often loops back to data preparation, because some techniques need the data in a specific form.

  • Select modeling techniques
  • Generate a test design (train/validation/test split)
  • Build models
  • Assess models (technical performance)
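
One possible test design is a reproducible random split of row indices. The sketch below is a minimal version under stated assumptions: the helper name `split_indices`, the 60/20/20 proportions, and the seed are arbitrary choices for illustration, not something CRISP-DM prescribes.

```python
import random


def split_indices(n, train=0.6, validation=0.2, seed=42):
    """Shuffle range(n) and split it into train/validation/test index lists.

    The default proportions and the seed are arbitrary values for this sketch.
    """
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test set")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_train = int(n * train)
    n_val = int(n * validation)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

A plain random split suits independent rows; time-ordered or grouped data usually calls for a split that respects that structure.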

1.1.5 5. Evaluation

Evaluates the model(s) against the business objectives, not just technical metrics, to confirm they achieve those objectives before deployment.

  • Evaluate results against business success criteria
  • Review the process
  • Determine next steps (proceed, iterate, or stop)
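
The three outcomes in the last task can be made explicit as a simple decision rule. This is a deliberately simplified sketch: the function name `next_step`, the single numeric success criterion, and the idea of an iteration budget are all assumptions for illustration.

```python
def next_step(metric: float, target: float, iterations_left: int) -> str:
    """Map an evaluation result to "proceed", "iterate", or "stop".

    `metric` and `target` are in the same business units (for example, a
    hypothetical churn reduction in percentage points); higher is better.
    """
    if metric >= target:
        return "proceed"   # success criterion met: move on to deployment
    if iterations_left > 0:
        return "iterate"   # criterion missed, but budget remains for another loop
    return "stop"          # criterion missed and no budget left


print(next_step(metric=5.2, target=5.0, iterations_left=2))
print(next_step(metric=3.1, target=5.0, iterations_left=2))
print(next_step(metric=3.1, target=5.0, iterations_left=0))
```

Real evaluations usually weigh several criteria and a process review, so a rule like this is a starting point for discussion, not a replacement for it.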

1.1.6 6. Deployment

Organizes and presents the knowledge so the customer can use it. Ranges from a report to a repeatable, production-grade scoring process.

  • Plan deployment
  • Plan monitoring and maintenance
  • Produce a final report
  • Review the project (lessons learned)

1.2 Key Characteristics

  • Iterative, not linear — phases feed back into one another; you frequently return to earlier phases as you learn.
  • Business-driven — the process begins and ends with business value, not algorithms.
  • Tool- and industry-agnostic — applies to any domain and tech stack.
  • Documentation-heavy — each phase produces deliverables that inform the next.

1.3 This Repository

This repository is organized as a Quarto book where each CRISP-DM phase is a chapter with its own working folder:

Phase                       Folder
1. Business Understanding   01_business_understanding/
2. Data Understanding       02_data_understanding/
3. Data Preparation         03_data_preparation/
4. Modeling                 04_modeling/
5. Evaluation               05_evaluation/
6. Deployment               06_deployment/
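
A quick way to check that a checkout has all six phase folders is sketched below. The folder names are taken from the listing above; the helper `missing_phase_folders` is hypothetical and not part of the repository, and the demo runs against a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path

# Folder names as listed above.
PHASE_FOLDERS = [
    "01_business_understanding",
    "02_data_understanding",
    "03_data_preparation",
    "04_modeling",
    "05_evaluation",
    "06_deployment",
]


def missing_phase_folders(root: Path) -> list[str]:
    """Return the phase folders that do not exist under `root`."""
    return [name for name in PHASE_FOLDERS if not (root / name).is_dir()]


# Demo against a temporary directory with only the first folder present.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "01_business_understanding").mkdir()
    print(missing_phase_folders(root))
```

To check an actual checkout, call `missing_phase_folders(Path("."))` from the repository root.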