Working with Humanitarian Exchange Language (HXL) Hashtags
Applying HXL to a global dataset of pandemic- and epidemic-prone disease outbreaks.
data management
Humanitarian Data Exchange
Humanitarian Exchange Language
disease outbreaks
epidemics
pandemics
Setting Up the Environment
We begin by loading the necessary R packages:
Step 1: Accessing the Dataset
We retrieve the latest dataset of disease outbreaks from GitHub:
# Define the API URL from GitHub containing the latest dataset update
url_api <- "https://api.github.com/repos/jatorresmunguia/disease_outbreak_news/contents/Last%20update"
# Retrieve the list of files in the "Last update" folder
file_list <- fromJSON(content(GET(url_api), as = "text"))$name
# Filter the most recent file that starts with "outbreaks"
last_file <- file_list[grepl(file_list, pattern = paste0("^outbreaks"))]
# Select the file with a .csv extension
rdata_file <- last_file[grepl(".csv$", last_file)]
# Construct the CSV file URL from GitHub and read the data into a dataframe
last_version <- read.csv(paste0("https://raw.githubusercontent.com/jatorresmunguia/disease_outbreak_news/refs/heads/main/Last%20update/", rdata_file), header = TRUE)
# Remove unnecessary ICD-11 classification columns
last_version <- last_version |> select(-c(icd11c1, icd11c2, icd11c3, icd11l1, icd11l2, icd11l3))
Step 2: Adding WHO and UNSD Regional Classifications
We integrate additional geographic classifications from WHO and UNSD.
WHO Regions
# Define the URL for the WHO regional classification dataset
url_who_class <- "https://cdn.who.int/media/docs/default-source/air-pollution-documents/air-quality-and-health/un-agencies-region-classification-for-country.xlsx?sfvrsn=289af35f_3"
# Create a temporary file to store the downloaded Excel file
temp_file <- tempfile(fileext = ".xlsx")
# Download the WHO classification file and save it to the temporary file
GET(url_who_class, write_disk(temp_file, overwrite = TRUE))
Response [https://cdn.who.int/media/docs/default-source/air-pollution-documents/air-quality-and-health/un-agencies-region-classification-for-country.xlsx?sfvrsn=289af35f_3]
Date: 2025-04-28 06:21
Status: 200
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 28.9 kB
<ON DISK> C:\Users\jator\AppData\Local\Temp\RtmpiGD1DX\file990423923f25.xlsx
# Read the WHO classification data from the downloaded Excel file
who_class <- read_excel(temp_file)
# Select relevant columns (ISO country code and WHO region name)
# Rename columns for consistency with the main dataset
last_version <- who_class |>
select(`ISO Country code`, `WHO Region name2`) |>
rename(iso3 = `ISO Country code`, who_region = `WHO Region name2`) |>
right_join(last_version, by = "iso3") # Merge with the main dataset based on the ISO3 country code
UNSD Regions
# Read the UNSD regional classification dataset from an Excel file
# The dataset is available at: https://unstats.un.org/unsd/methodology/m49/
unsd_class <- read_excel(path = "UNSD — Methodology.xlsx")
# Select relevant columns: ISO-alpha3 country code, region name, and sub-region name
# Rename columns to match the main dataset
last_version <- unsd_class |>
select(`ISO-alpha3 Code`, `Region Name`, `Sub-region Name`) |>
rename(
iso3 = `ISO-alpha3 Code`,
unsd_region = `Region Name`,
unsd_subregion = `Sub-region Name`
) |>
right_join(last_version, by = "iso3") # Merge with the main dataset based on the ISO3 country code
Step 3: Structuring the Dataset
To facilitate better organization, we:
- Order the dataset by year, country, and disease code.
- Assign a unique outbreak ID.
- Insert an empty row for HXL headers.
# Arrange the dataset in descending order by Year, then by ISO3 country code, and ICD-10 4-character code
last_version <- last_version |>
arrange(desc(Year), iso3, icd104c) |>
# Create a unique outbreak ID by concatenating Year, ISO3 country code, and ICD-10 4-character code
mutate(id_outbreak = paste0(Year, iso3, icd104c)) |>
# Add an empty row at the beginning of the dataset (for later HXL hashtag insertion)
add_row(.before = 1)
Step 5: Exporting the Final Dataset
Finally, we save the dataset as an Excel file:
write_xlsx(x = list(Data = last_version), path = "disease_outbreaks_HDX.xlsx")
Citation
BibTeX citation:
@online{torres munguía2025,
author = {Torres Munguía, Juan Armando},
title = {Working with {Humanitarian} {Exchange} {Language} {(HXL)}
{Hashtags}},
date = {2025-03-26},
langid = {en}
}
For attribution, please cite this work as:
Torres Munguía, Juan Armando. 2025. “Working with Humanitarian
Exchange Language (HXL) Hashtags.” March 26, 2025.