01. Generate an EDA Report with Pandas Profiling

This script demonstrates how to generate a detailed Exploratory Data Analysis (EDA) report with a single line of code using the pandas-profiling library. It loads the standard Iris dataset, creates a comprehensive and interactive data profile, and saves the final report as a self-contained HTML file.

Note

The pandas-profiling library has been renamed to ydata-profiling. The old pandas_profiling import may still work for backward compatibility, but new projects should install ydata-profiling and import ydata_profiling instead.
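One way to cope with the rename is to try the new module name first and fall back to the old one. The sketch below assumes both packages expose ProfileReport at the top level; the helper name first_available is hypothetical and is not part of either library.

```python
import importlib


def first_available(candidates):
    """Return the first module in `candidates` that imports, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This name is not installed; try the next candidate.
            continue
    return None


# Prefer the new package name, fall back to the legacy one.
profiling = first_available(["ydata_profiling", "pandas_profiling"])

if profiling is None:
    print("Neither ydata-profiling nor pandas-profiling is installed.")
else:
    # Assumption: both packages expose ProfileReport at the top level.
    ProfileReport = profiling.ProfileReport
    print(f"Using {profiling.__name__}")
```

When only one of the two packages is ever installed, a plain try/except ImportError around the import statements does the same job with less code.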

# Libraries
import pandas as pd

# Specific
from pandas_profiling import ProfileReport
from sklearn.datasets import load_iris
from pathlib import Path

# Load data object
obj = load_iris(as_frame=True)

# Create report
profile = ProfileReport(obj.data,
    title="Pandas Profiling Report",
    explorative=True)

# Save to file
Path('./outputs').mkdir(parents=True, exist_ok=True)
profile.to_file("./outputs/profile01-report.html")

Out:

100%|###########################################################################################################################| 4/4 [00:00<00:00, 2000.14it/s]
Summarize dataset: 100%|#############################################################################################| 29/29 [00:01<00:00, 20.40it/s, Completed]
Generate report structure: 100%|##################################################################################################| 1/1 [00:00<00:00,  1.27it/s]
Render HTML: 100%|################################################################################################################| 1/1 [00:00<00:00,  1.25it/s]
Export report to file: 100%|#####################################################################################################| 1/1 [00:00<00:00, 249.74it/s]

Show

profile

Total running time of the script: ( 0 minutes 4.612 seconds)