Using a tidy approach in R to analyze Mexico’s 2024 ENVIPE

Part 1: Estimating crime prevalence with the {srvyr} package

data analysis
official statistics
Mexico
household surveys
ENVIPE
crime
victimization
public safety
tidyverse
srvyr
microdata
Author

Juan Armando Torres Munguía

Published

January 27, 2025

Overview

In this article, I will guide you through the process of analyzing Mexico’s 2024 National Survey of Victimization and Perception of Public Safety (Encuesta Nacional de Victimización y Percepción sobre Seguridad Pública, ENVIPE) with the {tidyverse} and the {srvyr} package, which brings tidy syntax to survey analysis. Specifically, we will calculate the crime prevalence rate per 100,000 inhabitants by state, according to the victim’s sex (Tasa de prevalencia delictiva por entidad federativa por cada cien mil habitantes, según sexo de la víctima).

Official tables with this indicator can be found here.

By the end of this article, you will be able to use the {tidyverse} to:

  • Load the survey microdata from INEGI’s website.
  • Specify the ENVIPE survey design.
  • Calculate crime prevalence indicators.

About the ENVIPE

ENVIPE is an official survey that provides nationally and state-representative information on household crime victimization and perceptions of public safety in Mexico. Conducted annually since 2011 by the National Institute of Statistics and Geography (INEGI), ENVIPE collects data on the following crimes:

  • Total vehicle theft (car, pickup truck, or truck).
  • Theft of vehicle accessories, spare parts, or tools (car, pickup truck, or truck).
  • Wall graffiti, intentional vehicle damage, and other types of vandalism.
  • Burglary or attempted burglary.
  • Kidnapping.
  • Enforced disappearance.
  • Homicide.
  • Robbery or assault on the street or public transportation.
  • Theft in a manner different from the above.
  • Bank fraud.
  • Consumer fraud.
  • Extortion.
  • Threats.
  • Injuries.
  • Kidnapping.
  • Harassment or intimidation.
  • Groping.
  • Exhibitionism.
  • Attempted rape.
  • Rape.
  • Other crimes.

All survey-related information, including questionnaires, methodology, glossaries, summaries, and microdata, is fully available at INEGI’s website.

Set-up

First, install and load the required packages:

library(tidyverse) # Tidy data analysis framework
library(srvyr) # Tidy syntax for survey data analysis
library(archive) # Extract files from a ZIP archive
library(kableExtra) # Create HTML table
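
The library() calls above assume the packages are already installed. A minimal sketch for checking which ones are missing is shown below; the vector names `required` and `to_install` are illustrative, and the install line is left commented out so that nothing is installed without your consent.

```r
# Packages used in this article
required <- c("tidyverse", "srvyr", "archive", "kableExtra")

# Compare against the packages already installed in the current library paths
to_install <- setdiff(required, rownames(installed.packages()))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install) # uncomment to install from CRAN
}
```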

Loading survey microdata from INEGI

Microdata is available here in various formats (CSV, DBF, DTA, SAV, and RData). In this post, we will use the RData files.

load(archive_read(
  archive = "https://www.inegi.org.mx/contenidos/programas/envipe/2024/microdatos/bd_envipe_2024_RData.zip",
  file = "BD_ENVIPE_2024.RData"
))
Tip

Using the {archive} package, we can directly extract the required file from bd_envipe_2024_RData.zip.

This loads the following objects into the global environment:

  • THogar: This object contains the interview result, the row of the selected respondent, the household factor, and other household characteristics.

  • TMod_Vic: This object includes information on the crimes suffered by the selected respondent and their household during the reference year (2023). The information comes from the victimization module of the survey.

  • TPer_Vic1: This object includes information related to the perception of public security within the respondent’s geographic area, antisocial behaviors in their immediate surroundings, and changes in habits due to fear of becoming a crime victim in 2023. Additionally, it presents information regarding the performance of authorities and trust in them. In other words, this table covers Sections IV and V of the main questionnaire.

  • TPer_Vic2: This object consolidates information related to household and personal victimization, particularly focusing on high-impact crimes experienced by the household (such as kidnapping, enforced disappearance, or homicide) or by the selected respondent. This table includes Sections VI and VII of the main questionnaire.

  • TSDem: This object contains the socio-demographic characteristics of household members, corresponding to Section III of the main questionnaire.

  • TVivienda: This object contains the characteristics of the dwellings captured through the questionnaire cover page, as well as the number of residents and the number of households in each dwelling. In other words, it includes Sections I and II of the main questionnaire.
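
If you would rather not place these objects in the global environment, load() also accepts an `envir` argument. The sketch below illustrates the idea with a small stand-in file; the objects `THogar_demo` and `TSDem_demo`, their columns, and the name `envipe_env` are hypothetical and are not part of INEGI's data.

```r
# Build a small stand-in .RData file (hypothetical objects, not INEGI data)
demo_file <- tempfile(fileext = ".RData")
THogar_demo <- data.frame(ID_HOG = c("h1", "h2"))
TSDem_demo <- data.frame(ID_PER = c("p1", "p2", "p3"))
save(THogar_demo, TSDem_demo, file = demo_file)

# Load into a dedicated environment instead of the global one
envipe_env <- new.env()
loaded <- load(demo_file, envir = envipe_env)

# load() returns the names of the restored objects
loaded

# Rows and columns of each loaded data frame
sapply(mget(loaded, envir = envipe_env), dim)
```

The same pattern should work with the connection returned by archive_read(), although that combination is not tested here.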

Calculating crime prevalence rate per 100,000 inhabitants by state, according to the victim’s sex

To estimate the crime prevalence rate, we use information from TPer_Vic2 (screening questions on victimization from the main questionnaire) and TMod_Vic (specific victimization module applied to the respondents who reported being a victim of at least one of the crimes included in the survey during the reference period). TPer_Vic2 will serve to characterize the total population and TMod_Vic to describe the victims and the associated crimes.
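
To make the arithmetic behind the indicator concrete, here is a minimal base R sketch on made-up data. The data frames `respondents` and `crimes`, all weights, and the crime codes other than "03" are hypothetical. It only illustrates that a person who suffered several crimes counts once, and that the rate is the weighted share of victims scaled to 100,000.

```r
# Hypothetical toy population: 5 respondents with expansion weights
respondents <- data.frame(
  ID_PER  = c("p1", "p2", "p3", "p4", "p5"),
  FAC_ELE = c(100, 200, 150, 250, 300)
)

# Hypothetical victimization module: one row per crime, so p2 appears twice
crimes <- data.frame(
  ID_PER = c("p2", "p2", "p4"),
  BPCOD  = c("05", "09", "03")
)

# Victims of at least one crime other than vandalism (code "03")
victim_ids <- unique(crimes$ID_PER[crimes$BPCOD != "03"])
respondents$victimization <- as.integer(respondents$ID_PER %in% victim_ids)

# Weighted share of victims, scaled to a rate per 100,000 inhabitants
prevalence_rate <- with(
  respondents,
  sum(FAC_ELE * victimization) / sum(FAC_ELE) * 100000
)
prevalence_rate
#> [1] 20000
```

Only p2 counts as a victim here: p4 reported vandalism alone, which is excluded, so the rate is 200 / 1,000 × 100,000 = 20,000.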

1. Specifying the ENVIPE survey design

ENVIPE follows a probabilistic, three-stage, stratified, and clustered sampling design. The sampling unit is the household, while the unit of analysis is the population aged 18 and older in selected households. More details on the sampling design can be found here (in Spanish).

From the documentation on data structure (in Spanish), the key variables for specifying the survey design are:

  • ID_PER: Person identifier

  • EST_DIS: Stratum

  • UPM_DIS: Primary sampling unit

  • FAC_ELE: Weighting factor for public security perception and victimization estimates.

We specify the survey design as follows:

# Identify victims from TMod_Vic, excluding
# "Wall graffiti, intentional vehicle damage, and other vandalism" (BPCOD != "03")
id_victims <- TMod_Vic |>
  filter(BPCOD != "03") |>
  pull(ID_PER)

# Create victimization variable in TPer_Vic2
TPer_Vic2 <- TPer_Vic2 |>
  mutate(victimization = ifelse(ID_PER %in% id_victims, 1, 0))

# Convert FAC_ELE (weights) to numeric
TPer_Vic2 <- TPer_Vic2 |>
  mutate(FAC_ELE = as.numeric(FAC_ELE))

# Label SEXO variable
TPer_Vic2 <- TPer_Vic2 |>
  mutate(SEXO = recode(SEXO, "1" = "Man", "2" = "Woman"))

# Create survey object
envipe_design <- TPer_Vic2 |>
  as_survey_design(
    weights = FAC_ELE, # Weights (inverse probability)
    strata = EST_DIS, # Stratum
    ids = UPM_DIS, # Primary sampling unit
    nest = TRUE # Nesting within strata
  )
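
The role of `nest = TRUE` is easier to see on a toy example. In the sketch below, the data frame `toy` and its columns are hypothetical, and the PSU labels 1 and 2 are deliberately reused in both strata. Whether ENVIPE's UPM_DIS codes are actually reused across strata is not something this sketch checks.

```r
library(dplyr)
library(srvyr)

# Hypothetical toy sample: PSU labels 1 and 2 appear in both strata
toy <- tibble(
  stratum = c("A", "A", "A", "A", "B", "B", "B", "B"),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2),
  weight  = c(10, 10, 12, 12, 8, 8, 9, 9),
  victim  = c(1, 0, 0, 1, 0, 0, 1, 0)
)

# Without nest = TRUE, a PSU label that spans two strata is rejected;
# the error message is captured instead of stopping the script
no_nest <- tryCatch(
  as_survey_design(toy, ids = psu, strata = stratum, weights = weight),
  error = function(e) conditionMessage(e)
)
no_nest

# With nest = TRUE, PSU labels are interpreted within each stratum
toy_design <- toy |>
  as_survey_design(ids = psu, strata = stratum, weights = weight, nest = TRUE)

# Weighted proportion of victims with its standard error
toy_result <- toy_design |>
  summarize(prevalence = survey_mean(victim, vartype = "se"))
toy_result
```

The point estimate equals the weighted share of victims, (10 + 12 + 9) / 78, while the standard error reflects the strata and clusters declared in the design.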

2. Calculating point estimates and variability indicators

To compute proportions from survey data, we use survey_mean() from the {srvyr} package: because victimization is coded 0/1, its weighted mean is the proportion of victims. From ENVIPE’s documentation, we know that NOM_ENT represents the state, while SEXO corresponds to the respondent’s sex.

envipe_design |>
  # `interact` function is used to calculate the proportion
  # over the interaction of state and sex
  group_by(interact(NOM_ENT, SEXO)) |>
  summarize(
    Proportion = survey_mean(victimization,
      level = 0.90, # Confidence level
      vartype = c("se", "ci", "cv") # Variability indicators
    )
  ) |>
  ungroup() |>
  mutate(
    `Prevalence rate` = Proportion * 100000, # Rate per 100,000 people
    `Standard error` = Proportion_se * 100000,
    `Low 90% CI` = Proportion_low * 100000,
    `Upper 90% CI` = Proportion_upp * 100000,
    `Coefficient of variation %` = Proportion_cv * 100,
    State = str_to_title(NOM_ENT) # Capitalize the first letter of each word
  ) |>
  rename(Sex = SEXO) |>
  select(
    State, Sex, `Prevalence rate`, `Standard error`,
    `Low 90% CI`, `Upper 90% CI`, `Coefficient of variation %`
  ) |>
  kbl() |> # HTML table
  kable_styling() # Apply bootstrap theme to the table

The output table looks like this:

| State | Sex | Prevalence rate | Standard error | Low 90% CI | Upper 90% CI | Coefficient of variation % |
|---|---|---|---|---|---|---|
| Aguascalientes | Man | 33710.81 | 2172.7372 | 30136.71 | 37284.91 | 6.445224 |
| Aguascalientes | Woman | 32041.57 | 1831.7081 | 29028.45 | 35054.68 | 5.716662 |
| Baja California | Man | 23043.35 | 1604.9769 | 20403.20 | 25683.50 | 6.965032 |
| Baja California | Woman | 24480.64 | 1536.1771 | 21953.66 | 27007.61 | 6.275070 |
| Baja California Sur | Man | 19329.94 | 1244.1353 | 17283.36 | 21376.51 | 6.436314 |
| Baja California Sur | Woman | 23817.58 | 1466.2963 | 21405.56 | 26229.60 | 6.156361 |
| Campeche | Man | 21832.87 | 1455.8508 | 19438.03 | 24227.71 | 6.668161 |
| Campeche | Woman | 23213.74 | 1381.0933 | 20941.88 | 25485.61 | 5.949465 |
| Chiapas | Man | 16602.99 | 986.0052 | 14981.03 | 18224.94 | 5.938722 |
| Chiapas | Woman | 12074.36 | 978.7862 | 10464.28 | 13684.44 | 8.106318 |
| Chihuahua | Man | 22303.22 | 1234.5065 | 20272.49 | 24333.96 | 5.535103 |
| Chihuahua | Woman | 22804.31 | 1181.0327 | 20861.54 | 24747.08 | 5.178990 |
| Ciudad De Mexico | Man | 34562.66 | 1089.2714 | 32770.84 | 36354.49 | 3.151584 |
| Ciudad De Mexico | Woman | 30746.76 | 990.3528 | 29117.66 | 32375.87 | 3.220999 |
| Coahuila De Zaragoza | Man | 18179.70 | 1249.1422 | 16124.89 | 20234.51 | 6.871084 |
| Coahuila De Zaragoza | Woman | 20447.19 | 1238.7111 | 18409.54 | 22484.84 | 6.058099 |
| Colima | Man | 21106.85 | 1593.5970 | 18485.43 | 23728.28 | 7.550140 |
| Colima | Woman | 21295.28 | 1351.1455 | 19072.68 | 23517.88 | 6.344813 |
| Durango | Man | 15240.74 | 1133.1868 | 13376.67 | 17104.80 | 7.435250 |
| Durango | Woman | 17401.18 | 1166.0910 | 15482.99 | 19319.37 | 6.701219 |
| Guanajuato | Man | 20057.38 | 1428.0981 | 17708.20 | 22406.57 | 7.120063 |
| Guanajuato | Woman | 18930.15 | 1287.0811 | 16812.93 | 21047.37 | 6.799107 |
| Guerrero | Man | 16981.89 | 1494.6554 | 14523.22 | 19440.56 | 8.801466 |
| Guerrero | Woman | 14572.44 | 1238.6234 | 12534.93 | 16609.94 | 8.499769 |
| Hidalgo | Man | 22397.82 | 1338.0474 | 20196.77 | 24598.88 | 5.974006 |
| Hidalgo | Woman | 20215.02 | 1157.0294 | 18311.73 | 22118.30 | 5.723613 |
| Jalisco | Man | 24390.13 | 1548.7916 | 21842.40 | 26937.85 | 6.350076 |
| Jalisco | Woman | 25141.90 | 1438.5170 | 22775.58 | 27508.23 | 5.721592 |
| Mexico | Man | 32906.59 | 1632.3183 | 30221.47 | 35591.71 | 4.960460 |
| Mexico | Woman | 33023.78 | 1483.3471 | 30583.71 | 35463.85 | 4.491755 |
| Michoacan De Ocampo | Man | 14405.17 | 1031.1125 | 12709.01 | 16101.32 | 7.157934 |
| Michoacan De Ocampo | Woman | 15485.20 | 1036.3320 | 13780.46 | 17189.94 | 6.692401 |
| Morelos | Man | 26995.43 | 1517.6054 | 24499.00 | 29491.85 | 5.621713 |
| Morelos | Woman | 24869.74 | 1688.5969 | 22092.04 | 27647.44 | 6.789765 |
| Nayarit | Man | 16075.56 | 1516.6195 | 13580.76 | 18570.36 | 9.434318 |
| Nayarit | Woman | 18639.61 | 1597.1842 | 16012.28 | 21266.94 | 8.568766 |
| Nuevo Leon | Man | 23602.12 | 1261.9498 | 21526.25 | 25678.00 | 5.346764 |
| Nuevo Leon | Woman | 21440.72 | 1211.6372 | 19447.60 | 23433.83 | 5.651104 |
| Oaxaca | Man | 12940.56 | 1158.2366 | 11035.29 | 14845.83 | 8.950434 |
| Oaxaca | Woman | 13548.61 | 1017.7020 | 11874.51 | 15222.70 | 7.511489 |
| Puebla | Man | 26519.64 | 1342.0473 | 24312.00 | 28727.27 | 5.060579 |
| Puebla | Woman | 24176.45 | 1124.4656 | 22326.73 | 26026.17 | 4.651078 |
| Queretaro | Man | 30297.50 | 1620.6534 | 27631.57 | 32963.44 | 5.349132 |
| Queretaro | Woman | 25746.31 | 1260.3524 | 23673.06 | 27819.56 | 4.895273 |
| Quintana Roo | Man | 22592.04 | 1384.4250 | 20314.70 | 24869.39 | 6.127932 |
| Quintana Roo | Woman | 21775.64 | 1237.8095 | 19739.48 | 23811.81 | 5.684376 |
| San Luis Potosi | Man | 24019.02 | 1874.7438 | 20935.11 | 27102.93 | 7.805247 |
| San Luis Potosi | Woman | 23227.97 | 1491.3111 | 20774.80 | 25681.14 | 6.420324 |
| Sinaloa | Man | 21181.44 | 1122.0160 | 19335.76 | 23027.13 | 5.297165 |
| Sinaloa | Woman | 20562.22 | 1086.2790 | 18775.32 | 22349.12 | 5.282887 |
| Sonora | Man | 26864.66 | 1816.2677 | 23876.94 | 29852.37 | 6.760808 |
| Sonora | Woman | 25742.16 | 1902.0264 | 22613.37 | 28870.95 | 7.388759 |
| Tabasco | Man | 25881.72 | 1354.6212 | 23653.40 | 28110.04 | 5.233891 |
| Tabasco | Woman | 25522.45 | 1091.8488 | 23726.38 | 27318.51 | 4.277994 |
| Tamaulipas | Man | 17788.15 | 1075.8038 | 16018.47 | 19557.82 | 6.047869 |
| Tamaulipas | Woman | 18368.67 | 1017.1639 | 16695.46 | 20041.88 | 5.537493 |
| Tlaxcala | Man | 23968.33 | 1621.2198 | 21301.47 | 26635.20 | 6.764007 |
| Tlaxcala | Woman | 22295.82 | 1473.2391 | 19872.38 | 24719.26 | 6.607692 |
| Veracruz De Ignacio De La Llave | Man | 16365.05 | 1055.8247 | 14628.24 | 18101.86 | 6.451705 |
| Veracruz De Ignacio De La Llave | Woman | 14712.45 | 947.9121 | 13153.15 | 16271.74 | 6.442927 |
| Yucatan | Man | 20677.82 | 1239.0909 | 18639.55 | 22716.10 | 5.992366 |
| Yucatan | Woman | 17805.62 | 1141.8291 | 15927.34 | 19683.90 | 6.412745 |
| Zacatecas | Man | 17328.49 | 1346.8466 | 15112.96 | 19544.02 | 7.772440 |
| Zacatecas | Woman | 15690.25 | 1335.9463 | 13492.65 | 17887.85 | 8.514502 |
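
As a quick sanity check on how the columns relate, the coefficient of variation can be recomputed from the prevalence rate and its standard error. The sketch below copies three rows from the table above. The precision bands, with cut-offs at 15% and 30%, are an assumption of this sketch; check INEGI's own precision criteria before reusing them.

```r
# Three rows copied from the table above
estimates <- data.frame(
  state = c("Aguascalientes", "Chiapas", "Nayarit"),
  sex   = c("Man", "Woman", "Man"),
  rate  = c(33710.81, 12074.36, 16075.56),
  se    = c(2172.7372, 978.7862, 1516.6195)
)

# Coefficient of variation: standard error relative to the estimate, in percent
estimates$cv_pct <- estimates$se / estimates$rate * 100

# Hypothetical precision bands (the 15% and 30% cut-offs are an assumption)
estimates$precision <- cut(
  estimates$cv_pct,
  breaks = c(0, 15, 30, Inf),
  labels = c("high", "moderate", "low"),
  right = FALSE
)
estimates
```

The recomputed values agree with the "Coefficient of variation %" column up to rounding, and all three rows fall in the lowest band under the assumed cut-offs.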

Citation

BibTeX citation:
@online{torres munguía2025,
  author = {Torres Munguía, Juan Armando},
  title = {Using a Tidy Approach in {R} to Analyze {Mexico’s} 2024
    {ENVIPE}},
  date = {2025-01-27},
  langid = {en}
}
For attribution, please cite this work as:
Torres Munguía, Juan Armando. 2025. “Using a Tidy Approach in R to Analyze Mexico’s 2024 ENVIPE.” January 27, 2025.