.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "_examples\ukvi-trips\plot_main01.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download__examples_ukvi-trips_plot_main01.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr__examples_ukvi-trips_plot_main01.py:


UKVI trips visualisation
------------------------

.. GENERATED FROM PYTHON SOURCE LINES 6-448

.. rst-class:: sphx-glr-script-out

.. code-block:: pytb

    Traceback (most recent call last):
      File "C:\Users\kelda\Desktop\repositories\github\python-spare-code\main\examples\ukvi-trips\plot_main01.py", line 290, in <module>
        images = [Image.open('page_6_table.png')]
      File "c:\users\kelda\desktop\repositories\virtualenvs\venv-py3790-psc\lib\site-packages\PIL\Image.py", line 3236, in open
        fp = builtins.open(filename, "rb")
    FileNotFoundError: [Errno 2] No such file or directory: 'page_6_table.png'


.. code-block:: default
    :lineno-start: 7

    import pdfplumber
    import re
    import pandas as pd
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    from matplotlib.dates import DateFormatter
    from pathlib import Path


    def extract_basic_travel_data(pdf_path, start_page, end_page):
        """Extract the data from a PDF file.

        Ensure that the format is appropriate and that the column headers
        match those included below. Otherwise modify as appropriate.

        Parameters
        ----------
        pdf_path: str
            The path to the file.
        start_page: int
            The start page where the table appears.
        end_page: int
            The end page where the table appears.

        Returns
        -------
        list of dict
            One dictionary per extracted table row.
        """
        # Define the headers for essential data
        headers = [
            "Departure Date/Time", "Arrival Date/Time", "Voyage Code",
            "In/Out", "Dep Port", "Arrival Port"
        ]
        travel_data = []

        # Regex pattern to capture essential information
        row_pattern = re.compile(
            r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2})\s+"  # Departure Date/Time
            r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2})\s+"  # Arrival Date/Time
            r"(\S+)\s+"                            # Voyage Code
            r"(Outbound|Inbound)\s+"               # In/Out
            r"(\S+)\s+"                            # Dep Port
            r"(\S+)"                               # Arrival Port
        )

        # Open the PDF file and iterate over specified pages
        with pdfplumber.open(pdf_path) as pdf:
            for page_num in range(start_page - 1, end_page):
                page = pdf.pages[page_num]
                text = page.extract_text()
                if not text:
                    continue

                # Match rows using the regex pattern
                matches = row_pattern.findall(text)
                if matches:
                    for match in matches:
                        travel_data.append(dict(zip(headers, match)))

        # Return
        return travel_data
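
    # Illustrative usage sketch (not part of the original pipeline; the path
    # and page range below simply mirror the values used in the main section
    # further down):
    #
    #   rows = extract_basic_travel_data('./data/1085721-final-bundle.pdf', 6, 8)
    #   print(pd.DataFrame(rows).head())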

    def combine_outbound_inbound(df):
        """Combine Outbound-Inbound rows into a single one.

        Parameters
        ----------
        df: pd.DataFrame
            The DataFrame with the data.

        Returns
        -------
        pd.DataFrame
        """
        # Convert date columns to datetime format
        df["Departure Date/Time"] = \
            pd.to_datetime(df["Departure Date/Time"], format="%d/%m/%Y %H:%M")
        df["Arrival Date/Time"] = \
            pd.to_datetime(df["Arrival Date/Time"], format="%d/%m/%Y %H:%M")

        # Sort the DataFrame by "Departure Date/Time"
        df = df.sort_values(by="Departure Date/Time").reset_index(drop=True)

        # Process the DataFrame
        result = []
        for i in range(0, len(df) - 1, 2):  # Step by 2 to handle consecutive rows
            outbound = df.iloc[i]
            inbound = df.iloc[i + 1]

            # Ensure the pair consists of an outbound followed by an inbound
            if outbound["In/Out"] == "Outbound" and inbound["In/Out"] == "Inbound":
                # Calculate the difference in days
                days_difference = (inbound["Arrival Date/Time"]
                                   - outbound["Departure Date/Time"]).days - 1

                # Create a combined row with desired columns
                combined_row = {
                    "Outbound Date": outbound["Departure Date/Time"],
                    "Inbound Date": inbound["Arrival Date/Time"],
                    "Outbound Ports": outbound["Dep Port"] + '-' + outbound["Arrival Port"],
                    "Inbound Ports": inbound["Dep Port"] + '-' + inbound["Arrival Port"],
                    "Days Difference": days_difference,
                    "Voyage Code": outbound["Voyage Code"]
                }
                result.append(combined_row)

        # Return
        return pd.DataFrame(result)


    def display(df, cmap=None):
        """Plot the voyages as horizontal bars on a timeline.

        Parameters
        ----------
        df: pd.DataFrame
            The pandas DataFrame.
        cmap: dict, optional
            Mapping from arrival airport code to a matplotlib colour.

        Returns
        -------
        None
        """
        # Set up plot
        fig, ax = plt.subplots(figsize=(16, 8))

        # For each row (voyage)
        for i, row in df.iterrows():
            if cmap is None:
                color = 'skyblue'
            else:
                color = cmap.get(row['Outbound Ports'].split('-')[1], 'skyblue')

            # Plot each voyage as a horizontal bar with text annotations
            ax.plot([row["Outbound Date"], row["Inbound Date"]], [i, i],
                    marker='o', color=color, lw=6)

            # Formatting outbound and inbound dates
            outbound_str = row["Outbound Date"].strftime("%d %b")  # Day and abbreviated month
            inbound_str = row["Inbound Date"].strftime("%d %b")    # Day and abbreviated month

            # Adjust the text position to be further right
            ax.text(row["Inbound Date"] + pd.Timedelta(days=10), i - 0.05,  # Increased offset to 10 days
                    f"{row['Outbound Ports']} ({outbound_str}) to {row['Inbound Ports']} ({inbound_str}) | {row['Days Difference']} days",
                    va='center', ha='left', fontsize=9, color="black")

        # Alternate month shading
        start_date = df["Outbound Date"].min().replace(day=1)
        end_date = df["Inbound Date"].max()
        current_date = start_date
        month = 0
        while current_date < end_date:
            next_month = (current_date + pd.DateOffset(months=1)).replace(day=1)
            ax.axvspan(current_date, next_month,
                       color='gray' if month % 2 == 0 else 'lightgray', alpha=0.2)
            current_date = next_month
            month += 1

        # Add vertical lines for each year
        years = pd.date_range(start=start_date, end=end_date + pd.DateOffset(years=1), freq='YE')
        for year in years:
            ax.axvline(year, color='black', linestyle='--', lw=1)  # Vertical line for each year
            ax.text(year - pd.Timedelta(days=90), len(df) + 0.5, year.year,
                    ha='left', va='center', fontsize=10, color='black')  # Year label

        # Setting the x-axis limits to include full years
        full_start_date = pd.Timestamp(year=start_date.year, month=1, day=1)
        full_end_date = pd.Timestamp(year=end_date.year + 1, month=1, day=1)  # Next January
        ax.set_xlim(full_start_date, full_end_date)

        # Set x-axis ticks to show full years from January to December
        ax.xaxis.set_major_locator(mdates.YearLocator())   # Major ticks at the beginning of each year
        ax.xaxis.set_minor_locator(mdates.MonthLocator())  # Minor ticks for each month
        ax.xaxis.set_major_formatter(DateFormatter("%Y"))  # Year as the format for major ticks

        # Formatting the plot
        ax.set_yticks(range(len(df)))
        ax.set_yticklabels(df["Voyage Code"])
        #ax.set_yticklabels(df['Days Difference'])
        ax.set_xlabel("Date")
        ax.set_title("Voyage Durations (total abroad %s days)" % df['Days Difference'].sum())

        # Set x-axis ticks to show abbreviated month names and year
        ax.xaxis.set_major_locator(mdates.MonthLocator())
        ax.xaxis.set_major_formatter(DateFormatter("%b %Y"))  # Month abbreviation and year
        plt.xticks(rotation=45)
        plt.grid(axis='x', linestyle='--', alpha=0.5)

        plt.tight_layout()
        plt.show()
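
    # Minimal usage sketch of the two helpers above; it simply mirrors the
    # final pipeline at the bottom of this script (``trips`` and ``COLORMAP``
    # are defined there), so it is illustrative rather than executed here:
    #
    #   df_cmb = combine_outbound_inbound(pd.DataFrame(trips))
    #   display(df_cmb, cmap=COLORMAP)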

    # ------------------------------------------------------
    # Main
    # ------------------------------------------------------
    # Include any missing entry. This could happen if the travel
    # was done by bus or train, as only flights have been recorded
    # in the system.
    MISSING = {
        'veronica': [
            {
                "Departure Date/Time": "22/04/2019 18:21",
                "Arrival Date/Time": "22/04/2019 20:21",
                "Voyage Code": "BUS001",
                "In/Out": "Inbound",
                "Dep Port": "AMS",
                "Arrival Port": "LDN"
            }
        ]
    }

    # Include the colors desired for each airport. For example
    # they could be colored by country.
    COLORMAP = {
        'FRA': 'black', 'BLQ': 'green', 'LHR': 'blue', 'LGW': 'blue',
        'RAK': 'skyblue', 'STN': 'blue', 'AOI': 'green', 'MXP': 'green',
        'LTN': 'blue', 'JNB': 'skyblue', 'AMS': 'skyblue', 'BGY': 'green',
        'ZRH': 'skyblue', 'ALC': 'yellow', 'TFS': 'yellow', 'BSL': 'skyblue',
        'MAD': 'yellow', 'VRN': 'green', 'ATH': 'skyblue', 'LPA': 'yellow',
        'FCO': 'green', 'FRFHN': 'black', 'LDN': 'blue'
    }

    # Define the PDF file path and page range to extract
    pdf_path = Path('./data/1085721-final-bundle.pdf')
    #pdf_path = Path('./data/775243-final-bundle.pdf')
    start_page = 6  # Page number where the tables start
    end_page = 8    # Page number where the tables end

    # Define the JSON file
    #pdf_path = Path('./data/bernard-2024.json')

    """"""

    from pdf2image import convert_from_path
    import pytesseract
    import re
    import json
    import numpy as np
    import cv2
    from PIL import Image

    # Convert the PDF into images (one per page)
    #images = convert_from_path(pdf_path, dpi=300)

    # Regex pattern to extract table rows
    row_pattern = re.compile(
        r"(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2})\s+(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2})\s+([\w\d]+)\s+(Outbound|Inbound)\s+([\w\d]+)\s+([\w\d]+)"
    )

    # Store results
    table_data = []

    # Note: 'page_6_table.png' must already exist on disk (it can be produced
    # with the commented pdf2image/save snippet further down); otherwise this
    # line raises the FileNotFoundError shown in the output above.
    images = [Image.open('page_6_table.png')]

    cont = 0
    # Process each page
    for page_index, page_image in enumerate(images):
        print("Page ", page_index)

        # Convert to OpenCV format and preprocess
        image_cv = np.array(page_image)
        gray = cv2.cvtColor(image_cv, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

        # Perform OCR
        raw_text = pytesseract.image_to_string(Image.fromarray(binary))

        # Parse rows using regex
        for row in raw_text.split("\n"):
            match = row_pattern.match(row)
            if match:
                table_data.append({
                    "Departure Date/Time": match.group(1),
                    "Arrival Date/Time": match.group(2),
                    "Voyage Code": match.group(3),
                    "In/Out": match.group(4),
                    "Dep Port": match.group(5),
                    "Arrival Port": match.group(6)
                })
                cont += 1

    print(cont)

    # Save to JSON
    output_path = "extracted_table_data.json"
    with open(output_path, "w") as json_file:
        json.dump(table_data, json_file, indent=4)

    print(f"Data has been saved to {output_path}")

    import sys
    sys.exit()
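
    # Hypothetical follow-up (never executed because of the ``sys.exit()``
    # above): the JSON dumped to ``output_path`` could be fed back into the
    # pipeline at the bottom of the script, e.g.
    #
    #   trips = pd.read_json("extracted_table_data.json")
    #   df_cmb = combine_outbound_inbound(pd.DataFrame(trips))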

    import pdfplumber

    # Path to your PDF file
    pdf_path = "data/1085721-final-bundle.pdf"

    table_settings = {
        "vertical_strategy": "lines",    # Try "text" or "lines"
        "horizontal_strategy": "lines",  # Try "text" or "lines"
        "snap_tolerance": 3,             # Adjust for alignment issues
    }

    # Open the PDF using pdfplumber
    with pdfplumber.open(pdf_path) as pdf:
        # Specify the page index (Page 6 is index 5 because pages are zero-indexed)
        page = pdf.pages[5]

        # Extract tables
        tables = page.extract_tables(table_settings)

        # Print extracted table data
        for table_index, table in enumerate(tables):
            print(f"Table {table_index + 1}:\n")
            for row in table:
                print(row)
            print("\n" + "=" * 50 + "\n")


    from pdf2image import convert_from_path
    import pytesseract
    from PIL import Image

    # Convert page 6 to an image
    #images = convert_from_path(pdf_path, dpi=300)
    #table_image = images[5]  # Page 6 is index 5

    # Save the image (optional, for debugging)
    #table_image.save("page_6_table.png")

    from PIL import Image

    # Open the image
    image = Image.open('page_6_table.png')

    # Use OCR to extract text
    text = pytesseract.image_to_string(image)
    print("Extracted Text via OCR:\n", text)

    # Clean OCR noise
    cleaned_text = text.replace("O", "0").replace("l", "1").replace("I", "1")

    # Define regex for parsing
    row_pattern = re.compile(
        r"(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2})\s+(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2})\s+([\w\d]+)\s+([\w\d]+)\s+([\w\d]+)\s+([\w\d]+)"
    )

    # Parse rows into structured data
    parsed_data = []
    for row in cleaned_text.split("\n"):
        match = row_pattern.match(row)
        if match:
            parsed_data.append(match.groups())

    # Create DataFrame
    headers = [
        "Departure Date/Time", "Arrival Date/Time", "Voyage Code",
        "In/Out", "Dep Port", "Arrival Port"
    ]
    df = pd.DataFrame(parsed_data, columns=headers)

    # Display DataFrame
    print("Extracted DataFrame:\n", df)

    import sys
    sys.exit()


    from pdf2image import convert_from_path
    import pytesseract
    #import easyocr

    # Initialize EasyOCR reader
    #reader = easyocr.Reader(['en'])  # Specify language(s)

    # Convert pages to images
    images = convert_from_path(pdf_path, dpi=300)

    """
    # Process each page
    for i, image in enumerate(images):
        print(f"Processing page {i + 1}...")
        text = reader.readtext(image, detail=0)  # detail=0 returns only text, no bounding boxes
        print("\n".join(text))
        print("\n" + "=" * 50 + "\n")
    """

    for i, image in enumerate(images):
        if i == 5:  # Page 6
            text = pytesseract.image_to_string(image)
            print(text)

    import sys
    sys.exit()


    # Load DataFrame
    if pdf_path.suffix == '.pdf':
        trips = extract_basic_travel_data(pdf_path, start_page, end_page)
    elif pdf_path.suffix == '.json':
        trips = pd.read_json(pdf_path)
    else:
        print('File extension <%s> not supported.' % pdf_path.suffix)

    # Convert to DataFrame
    df = pd.DataFrame(trips)

    # Append missing rows using concat
    df = pd.concat([df, pd.DataFrame(MISSING['veronica'])], ignore_index=True)

    # Combine consecutive outbound-inbound trips into one row.
    df_cmb = combine_outbound_inbound(df)

    # Show
    print(df_cmb)

    # Display
    display(df_cmb)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.201 seconds)


.. _sphx_glr_download__examples_ukvi-trips_plot_main01.py:

.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_main01.py <plot_main01.py>`

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_main01.ipynb <plot_main01.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_