01. Generate an EDA Report with Pandas Profiling
This script demonstrates how to generate a detailed Exploratory Data Analysis (EDA) report with a single line of code using the pandas-profiling library. It loads the standard Iris dataset, creates a comprehensive and interactive data profile, and saves the final report as a self-contained HTML file.
Note
The pandas-profiling library has been renamed to ydata-profiling. While the old import may still work for backward compatibility, it is recommended to install and import ydata-profiling in new projects.
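The rename can be handled with a small import fallback. The sketch below is one possible approach, not part of the original script: `load_profile_report` is a hypothetical helper name, and it assumes that both packages expose `ProfileReport` at the top level, as the imports used in this example suggest.

```python
import importlib


def load_profile_report(candidates=("ydata_profiling", "pandas_profiling")):
    """Return ProfileReport from the first importable package.

    Hypothetical helper: tries the new package name first, then the
    legacy one, and raises ImportError if neither is installed.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # package not installed, try the next candidate
        return getattr(module, "ProfileReport")
    raise ImportError(
        "Neither ydata-profiling nor pandas-profiling appears to be installed."
    )
```

With either package installed, `ProfileReport = load_profile_report()` could stand in for the direct import in the script that follows.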
# Libraries
import pandas as pd

# Specific
from pandas_profiling import ProfileReport
from sklearn.datasets import load_iris
from pathlib import Path

# Load data object
obj = load_iris(as_frame=True)

# Create report
profile = ProfileReport(obj.data,
                        title="Pandas Profiling Report",
                        explorative=True)

# Save to file
Path('./outputs').mkdir(parents=True, exist_ok=True)
profile.to_file("./outputs/profile01-report.html")
Out:
Upgrade to ydata-sdk
Improve your data and profiling with ydata-sdk, featuring data quality scoring, redundancy detection, outlier identification, text validation, and synthetic data generation.
Register at https://ydata.ai/register
Summarize dataset: 100%|##########| 29/29 [00:01<00:00, 20.40it/s, Completed]
Generate report structure: 100%|##########| 1/1 [00:00<00:00, 1.27it/s]
Render HTML: 100%|##########| 1/1 [00:00<00:00, 1.25it/s]
Export report to file: 100%|##########| 1/1 [00:00<00:00, 249.74it/s]
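The progress bars in this output are printed while the report is computed and written. A minimal sketch of a quieter variant follows, assuming that ydata-profiling is installed and that its `progress_bar` setting can be passed to `ProfileReport` as a keyword argument. The helper names `prepare_output` and `write_quiet_report` are hypothetical and not part of the library.

```python
from pathlib import Path


def prepare_output(path):
    """Create the parent directory of `path` if needed and return a Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out


def write_quiet_report(df, path, title="Pandas Profiling Report"):
    """Profile `df` and save the HTML report without progress bars.

    Assumption: ydata-profiling is installed; it is imported lazily so
    that `prepare_output` remains usable without it.
    """
    from ydata_profiling import ProfileReport

    out = prepare_output(path)
    profile = ProfileReport(df, title=title, explorative=True,
                            progress_bar=False)  # assumed setting name
    profile.to_file(out)
    return out
```

For the data used here, a call might look like `write_quiet_report(obj.data, "./outputs/profile01-report.html")`.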
Show the report
profile
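Evaluating `profile` as the last expression only renders the report in an environment that displays rich output, such as a Jupyter notebook. In a plain script nothing is shown. The sketch below is one way to cover both cases. It assumes that the report object provides `to_notebook_iframe()` and `to_file()`, and the notebook check is a best-effort heuristic. `in_notebook` and `show` are hypothetical helper names.

```python
def in_notebook():
    """Best-effort check for a Jupyter kernel; False in plain Python."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    # Heuristic: Jupyter kernels use a shell class with this name.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def show(profile, fallback_path="./outputs/profile01-report.html"):
    """Embed the report inline in a notebook, otherwise write it to disk.

    Returns None when embedded, or the path that was written.
    """
    if in_notebook():
        profile.to_notebook_iframe()
        return None
    profile.to_file(fallback_path)
    return fallback_path
```

The file written in the fallback case can then be opened in any browser.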
Total running time of the script: (0 minutes 4.612 seconds)