02. Summary statistics for dengue data
This script demonstrates a basic use of the tableone Python library: building a clinical summary table ("Table 1") from a dengue patient dataset. It loads the data, selects key demographic and clinical variables (age, gender, haematocrit, and platelet count), marks gender as categorical, and uses cvs_hos_split as the grouping variable so that these characteristics can be compared across patient subgroups. Finally, it displays the formatted summary table and prints its raw HTML.
# Libraries
import pandas as pd

# Specific
from pathlib import Path
from tableone import TableOne


# ------------------------
# Load data
# ------------------------
path = Path('../../datasets/dengue-htd-dataset')
data = pd.read_csv(path / 'dengue.csv')

# Inspect the data and its column names
print(data)
print(data.columns)

# ------------------------
# Create tableone
# ------------------------
# Columns to summarize
columns = ['age', 'gender', 'haematocrit_percent', 'plt']

# Columns to treat as categorical
categorical = ['gender']

# Column that defines the groups to compare
groupby = 'cvs_hos_split'

# Build the summary table
mytable = TableOne(data, columns=columns,
                   categorical=categorical, groupby=groupby)
Out:
Unnamed: 0 index age gender haematocrit_percent plt outcome missing ... split_2 split_3 split_4 split_5 split_6 split_7 split_8 split_9
0 0 0 84.0 1 42.0 20.0 0 0 ... train train train train train train train train
1 1 1 84.0 1 36.0 107.0 0 0 ... NaN NaN NaN NaN NaN NaN NaN NaN
2 2 2 39.0 1 32.0 145.0 0 0 ... train train train train train train train train
3 3 3 39.0 1 35.5 120.0 0 0 ... NaN NaN NaN NaN NaN NaN NaN NaN
4 4 4 39.0 1 29.0 93.0 0 0 ... NaN NaN NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
520 520 520 30.0 1 35.7 61.0 0 0 ... train train train train train train train test
521 521 521 30.0 1 38.5 93.0 0 0 ... NaN NaN NaN NaN NaN NaN NaN NaN
522 522 522 30.0 1 42.1 61.0 0 0 ... train train train train train train train test
523 523 523 30.0 1 39.3 63.0 0 0 ... train train train train train train train test
524 524 524 30.0 1 37.8 78.0 0 0 ... train train train train train train train test
[525 rows x 19 columns]
Index(['Unnamed: 0', 'index', 'age', 'gender', 'haematocrit_percent', 'plt',
'outcome', 'missing', 'cvs_hos_split', 'split_0', 'split_1', 'split_2',
'split_3', 'split_4', 'split_5', 'split_6', 'split_7', 'split_8',
'split_9'],
dtype='object')
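To make the grouped summary above more concrete, here is a minimal sketch of comparable statistics computed with plain pandas. It runs on a small made-up DataFrame (the values are hypothetical; only the column names mirror the dengue dataset) and assumes the sample standard deviation that pandas uses by default. It is an illustration of the idea, not a description of how tableone is implemented.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the dengue example,
# but the values are invented purely for illustration.
toy = pd.DataFrame({
    'age': [30.0, 40.0, 50.0, 20.0, 60.0, 40.0],
    'gender': [0, 1, 1, 0, 0, 1],
    'cvs_hos_split': ['cvs', 'cvs', 'cvs', 'hos', 'hos', 'hos'],
})

# Continuous variable: mean and (sample) standard deviation per group.
age_summary = toy.groupby('cvs_hos_split')['age'].agg(['mean', 'std'])

# Categorical variable: counts of each level within each group.
gender_counts = pd.crosstab(toy['gender'], toy['cvs_hos_split'])

# Convert counts to column-wise percentages (each group sums to 100).
gender_pct = gender_counts.div(gender_counts.sum(axis=0), axis=1) * 100

print(age_summary.round(1))
print(gender_counts)
print(gender_pct.round(1))
```

The continuous rows correspond to the "mean (SD)" cells and the categorical rows to the "n (%)" cells of a summary table.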
Let's see the table
mytable.tableone
Let's show the raw HTML
Html
html = mytable.to_html()

# show
print(html)
Out:
<table border="1" class="dataframe">
<thead>
<tr>
<th></th>
<th></th>
<th colspan="4" halign="left">Grouped by cvs_hos_split</th>
</tr>
<tr>
<th></th>
<th></th>
<th>Missing</th>
<th>Overall</th>
<th>cvs</th>
<th>hos</th>
</tr>
</thead>
<tbody>
<tr>
<th>n</th>
<th></th>
<td></td>
<td>525</td>
<td>393</td>
<td>132</td>
</tr>
<tr>
<th>age, mean (SD)</th>
<th></th>
<td>0</td>
<td>37.3 (16.3)</td>
<td>37.4 (16.3)</td>
<td>36.9 (16.5)</td>
</tr>
<tr>
<th rowspan="2" valign="top">gender, n (%)</th>
<th>0</th>
<td></td>
<td>249 (47.4)</td>
<td>190 (48.3)</td>
<td>59 (44.7)</td>
</tr>
<tr>
<th>1</th>
<td></td>
<td>276 (52.6)</td>
<td>203 (51.7)</td>
<td>73 (55.3)</td>
</tr>
<tr>
<th>haematocrit_percent, mean (SD)</th>
<th></th>
<td>0</td>
<td>38.5 (8.6)</td>
<td>38.2 (8.6)</td>
<td>39.3 (8.7)</td>
</tr>
<tr>
<th>plt, mean (SD)</th>
<th></th>
<td>0</td>
<td>106.5 (132.5)</td>
<td>110.8 (146.6)</td>
<td>93.8 (75.9)</td>
</tr>
</tbody>
</table>
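As a quick sanity check on the "n (%)" cells in the table above, the percentages can be reproduced from the counts and group sizes it reports. The helper name `count_pct` below is hypothetical and introduced only for this sketch; the numbers are copied from the table.

```python
def count_pct(count, total, decimals=1):
    """Format a count and its share of `total` as 'n (%)'."""
    pct = 100 * count / total
    return f"{count} ({pct:.{decimals}f})"


# Group sizes (row "n") and counts for gender == 0, taken from the table above.
group_sizes = {'Overall': 525, 'cvs': 393, 'hos': 132}
gender_0 = {'Overall': 249, 'cvs': 190, 'hos': 59}

# Percentages are computed within each column, i.e. relative to the group size.
cells = {name: count_pct(gender_0[name], group_sizes[name])
         for name in group_sizes}
print(cells)
```

Each percentage is relative to its own column total, which is why the two gender rows within a column add up to 100.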