02. Summary statistics for dengue data

This script demonstrates a basic use of the tableone Python library to create a clinical summary table from a dengue patient dataset. It loads the data, selects key demographic and clinical variables (age, gender, haematocrit, and platelet count), and uses cvs_hos_split as the grouping variable to compare these characteristics between the cvs and hos groups. Finally, it generates and displays the formatted summary table.

# Libraries
import pandas as pd

# Specific
from pathlib import Path
from tableone import TableOne


# ------------------------
# Load data
# ------------------------
# Path to the dataset folder
path = Path('../../datasets/dengue-htd-dataset')
data = pd.read_csv(path / 'dengue.csv')

# Show the data and its columns
print(data)
print(data.columns)

# ------------------------
# Create tableone
# ------------------------
# Columns to summarise
columns = ['age', 'gender', 'haematocrit_percent', 'plt']

# Categorical columns
categorical = ['gender']

# Grouping variable
groupby = 'cvs_hos_split'

# Create the summary table
mytable = TableOne(data, columns=columns,
    categorical=categorical, groupby=groupby)

Out:

     Unnamed: 0  index   age  gender  haematocrit_percent    plt  outcome  missing  ... split_2 split_3 split_4 split_5 split_6 split_7 split_8 split_9
0             0      0  84.0       1                 42.0   20.0        0        0  ...   train   train   train   train   train   train   train   train
1             1      1  84.0       1                 36.0  107.0        0        0  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2             2      2  39.0       1                 32.0  145.0        0        0  ...   train   train   train   train   train   train   train   train
3             3      3  39.0       1                 35.5  120.0        0        0  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4             4      4  39.0       1                 29.0   93.0        0        0  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
..          ...    ...   ...     ...                  ...    ...      ...      ...  ...     ...     ...     ...     ...     ...     ...     ...     ...
520         520    520  30.0       1                 35.7   61.0        0        0  ...   train   train   train   train   train   train   train    test
521         521    521  30.0       1                 38.5   93.0        0        0  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
522         522    522  30.0       1                 42.1   61.0        0        0  ...   train   train   train   train   train   train   train    test
523         523    523  30.0       1                 39.3   63.0        0        0  ...   train   train   train   train   train   train   train    test
524         524    524  30.0       1                 37.8   78.0        0        0  ...   train   train   train   train   train   train   train    test

[525 rows x 19 columns]
Index(['Unnamed: 0', 'index', 'age', 'gender', 'haematocrit_percent', 'plt',
       'outcome', 'missing', 'cvs_hos_split', 'split_0', 'split_1', 'split_2',
       'split_3', 'split_4', 'split_5', 'split_6', 'split_7', 'split_8',
       'split_9'],
      dtype='object')
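
The `Unnamed: 0` column in the output above typically appears when a CSV was saved together with its index. The following is a minimal sketch of how `index_col=0` avoids it. It uses made-up CSV text rather than the actual `dengue.csv`, and it is only an assumption that the file was written this way.

```python
import io

import pandas as pd

# Hypothetical CSV text that mimics a file saved with its index
# (note the empty first header field). Values are illustrative only.
csv_text = ",age,plt\n0,84.0,20.0\n1,39.0,145.0\n"

# Default behaviour: the unnamed first column becomes 'Unnamed: 0'.
default = pd.read_csv(io.StringIO(csv_text))

# Using the first column as the index keeps it out of the columns.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```

Since the script passes an explicit `columns` list to `TableOne`, the extra column does not affect the summary table; dropping it only tidies the printed DataFrame.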

Let's see the table.

mytable.tableone
                                   Grouped by cvs_hos_split
                                   Missing  Overall        cvs            hos
n                                           525            393            132
age, mean (SD)                     0        37.3 (16.3)    37.4 (16.3)    36.9 (16.5)
gender, n (%)                   0           249 (47.4)     190 (48.3)     59 (44.7)
                                1           276 (52.6)     203 (51.7)     73 (55.3)
haematocrit_percent, mean (SD)     0        38.5 (8.6)     38.2 (8.6)     39.3 (8.7)
plt, mean (SD)                     0        106.5 (132.5)  110.8 (146.6)  93.8 (75.9)
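
To make the cell formats concrete, here is a rough pandas-only sketch of how "mean (SD)" and "n (%)" entries like those above can be computed per group. It runs on a tiny synthetic DataFrame, not the dengue data, and the helpers `mean_sd` and `n_pct` are hypothetical names introduced for illustration. It assumes the sample standard deviation (the pandas default, `ddof=1`) and is not meant to reproduce tableone's internals.

```python
import pandas as pd

# Synthetic stand-in for the dengue data (values are made up).
df = pd.DataFrame({
    'age': [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
    'gender': [0, 1, 1, 0, 1, 1],
    'cvs_hos_split': ['cvs', 'cvs', 'cvs', 'cvs', 'hos', 'hos'],
})


def mean_sd(series):
    """Format a continuous variable as 'mean (SD)' with one decimal."""
    return f"{series.mean():.1f} ({series.std():.1f})"


def n_pct(series):
    """Format a categorical variable as 'n (%)' for each observed level."""
    counts = series.value_counts().sort_index()
    total = counts.sum()
    return {int(level): f"{n} ({100 * n / total:.1f})"
            for level, n in counts.items()}


# One entry for the whole sample, then one per group.
groups = {'Overall': df}
groups.update({name: group for name, group in df.groupby('cvs_hos_split')})

summary = {
    name: {'age': mean_sd(g['age']), 'gender': n_pct(g['gender'])}
    for name, g in groups.items()
}

for name, cells in summary.items():
    print(name, cells)
```

Note that a level absent from a group (gender 0 in the synthetic `hos` group) simply does not appear in this sketch, whereas a full summary table would normally report it as zero.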


Let's show the raw HTML.

# HTML
html = mytable.to_html()

# Show
print(html)

Out:

<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th></th>
      <th colspan="4" halign="left">Grouped by cvs_hos_split</th>
    </tr>
    <tr>
      <th></th>
      <th></th>
      <th>Missing</th>
      <th>Overall</th>
      <th>cvs</th>
      <th>hos</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>n</th>
      <th></th>
      <td></td>
      <td>525</td>
      <td>393</td>
      <td>132</td>
    </tr>
    <tr>
      <th>age, mean (SD)</th>
      <th></th>
      <td>0</td>
      <td>37.3 (16.3)</td>
      <td>37.4 (16.3)</td>
      <td>36.9 (16.5)</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">gender, n (%)</th>
      <th>0</th>
      <td></td>
      <td>249 (47.4)</td>
      <td>190 (48.3)</td>
      <td>59 (44.7)</td>
    </tr>
    <tr>
      <th>1</th>
      <td></td>
      <td>276 (52.6)</td>
      <td>203 (51.7)</td>
      <td>73 (55.3)</td>
    </tr>
    <tr>
      <th>haematocrit_percent, mean (SD)</th>
      <th></th>
      <td>0</td>
      <td>38.5 (8.6)</td>
      <td>38.2 (8.6)</td>
      <td>39.3 (8.7)</td>
    </tr>
    <tr>
      <th>plt, mean (SD)</th>
      <th></th>
      <td>0</td>
      <td>106.5 (132.5)</td>
      <td>110.8 (146.6)</td>
      <td>93.8 (75.9)</td>
    </tr>
  </tbody>
</table>
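
Because `to_html()` returns a plain string, as the printed output shows, it can be written to a file for use in a report or web page. The sketch below illustrates this with a small pandas DataFrame standing in for the summary table (two cells copied from the output above). The output file name `tableone.html` and the temporary directory are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the summary table: a small DataFrame holding two
# cells copied from the output above.
table = pd.DataFrame(
    {'Overall': ['525', '37.3 (16.3)']},
    index=['n', 'age, mean (SD)'],
)

# Render to an HTML string, as done with the summary table above.
html = table.to_html()

# Write the string to a file (hypothetical name, temporary folder).
out = Path(tempfile.mkdtemp()) / 'tableone.html'
out.write_text(html, encoding='utf-8')

print(out)
```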

Total running time of the script: ( 0 minutes 0.135 seconds)
