Using a tidy approach in R to analyze Mexico’s 2024 ENVIPE
Part 1: Estimating crime prevalence with the {srvyr} package
Overview
In this article, I will guide you through analyzing Mexico’s 2024 National Survey of Victimization and Perception of Public Safety (Encuesta Nacional de Victimización y Percepción sobre Seguridad Pública, ENVIPE) with a tidy workflow built on the {tidyverse} and the {srvyr} package for survey analysis. Specifically, we will calculate the crime prevalence rate per 100,000 inhabitants by state, according to the victim’s sex (Tasa de prevalencia delictiva por entidad federativa por cada cien mil habitantes, según sexo de la víctima).
Official tables with this indicator can be found here.
By the end of this article, you will be able to use the {tidyverse} to:
- Load the survey microdata from INEGI’s website.
- Specify the ENVIPE survey design.
- Calculate crime prevalence indicators.
About the ENVIPE
ENVIPE is an official survey that provides nationally and state-level representative information on household crime victimization and perceptions of public safety in Mexico. Conducted annually since 2011 by the National Institute of Statistics and Geography (INEGI), ENVIPE collects data on the following crimes:
- Total vehicle theft (car, pickup truck, or truck).
- Theft of vehicle accessories, spare parts, or tools (car, pickup truck, or truck).
- Wall graffiti, intentional vehicle damage, and other types of vandalism.
- Burglary or attempted burglary.
- Kidnapping.
- Enforced disappearance.
- Homicide.
- Robbery or assault on the street or public transportation.
- Other forms of theft not listed above.
- Bank fraud.
- Consumer fraud.
- Extortion.
- Threats.
- Injuries.
- Kidnapping.
- Harassment or intimidation.
- Groping.
- Exhibitionism.
- Attempted rape.
- Rape.
- Other crimes.
All survey-related information, including questionnaires, methodology, glossaries, summaries, and microdata, is fully available at INEGI’s website.
Set-up
First, install and load the required packages:

```r
# install.packages(c("tidyverse", "srvyr", "archive", "kableExtra"))
library(tidyverse)  # Tidy data analysis framework
library(srvyr)      # Tidy syntax for survey data analysis
library(archive)    # Extract files from a ZIP archive
library(kableExtra) # Create HTML tables
```
Loading survey microdata from INEGI
Microdata is available here in various formats (CSV, DBF, DTA, SAV, and RData). In this post, we will use the RData files.
```r
# Read BD_ENVIPE_2024.RData straight out of the ZIP archive and load its objects
load(archive_read(
  archive = "https://www.inegi.org.mx/contenidos/programas/envipe/2024/microdatos/bd_envipe_2024_RData.zip",
  file = "BD_ENVIPE_2024.RData"
))
```

Using the {archive} package, we can directly extract the required file from bd_envipe_2024_RData.zip.
This loads the following objects into the global environment:
- THogar: The interview result, the row of the selected respondent, the household weighting factor, and other household characteristics.
- TMod_Vic: The crimes experienced by the selected respondent and their household during the reference year (2023). This information comes from the survey’s victimization module.
- TPer_Vic1: Information on the perception of public security in the respondent’s geographic area, antisocial behaviors in their immediate surroundings, and changes in habits due to fear of becoming a crime victim in 2023. It also covers the performance of, and trust in, authorities. In other words, this table covers Sections IV and V of the main questionnaire.
- TPer_Vic2: Information on household and personal victimization, focusing on high-impact crimes experienced by the household (such as kidnapping, enforced disappearance, or homicide) or by the selected respondent. This table covers Sections VI and VII of the main questionnaire.
- TSDem: The socio-demographic characteristics of household members, corresponding to Section III of the main questionnaire.
- TSVivienda: The characteristics of the dwellings captured on the questionnaire cover page, as well as the count of residents and the number of households. In other words, it covers Sections I and II of the main questionnaire.
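Because load() works by side effect, it is not always obvious what it created. A useful base R detail is that load() invisibly returns the names of the objects it restores, and its `envir` argument lets you load them into a dedicated environment instead of the global one. The sketch below demonstrates this on a temporary file holding tiny hypothetical stand-ins for TPer_Vic2 and TMod_Vic (toy values, not the real microdata).

```r
# Hypothetical stand-ins for two of the ENVIPE tables (toy values only)
TPer_Vic2 <- data.frame(ID_PER = c("p1", "p2"), FAC_ELE = c(1500, 900))
TMod_Vic  <- data.frame(ID_PER = "p1", BPCOD = "01")

# Save them to a temporary .RData file, mimicking BD_ENVIPE_2024.RData
rdata_path <- tempfile(fileext = ".RData")
save(TPer_Vic2, TMod_Vic, file = rdata_path)

# Load into a dedicated environment; load() invisibly returns the object names
envipe_env <- new.env()
loaded_names <- load(rdata_path, envir = envipe_env)
loaded_names

# Number of rows and columns of every loaded table
sapply(mget(loaded_names, envir = envipe_env), dim)
```

With the real archive, passing `envir = envipe_env` to the load() call shown earlier should behave the same way, since load() also accepts a connection; treat this as an untested variation.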
Calculating the crime prevalence rate per 100,000 inhabitants by state, according to the victim’s sex
To estimate the crime prevalence rate, we use information from TPer_Vic2 (the victimization screening questions of the main questionnaire) and TMod_Vic (the specific victimization module administered to respondents who reported being a victim of at least one of the crimes included in the survey during the reference period). TPer_Vic2 characterizes the total population, while TMod_Vic describes the victims and the associated crimes.
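To make the role of each table concrete, here is a minimal base R sketch on hypothetical toy data. It assumes, as the BPCOD filter used later suggests, that TMod_Vic can hold several rows per person (one per reported crime). A person counts as a victim if they report at least one crime other than code "03"; someone whose only report is vandalism is not counted.

```r
# Hypothetical toy versions of the two tables (values are made up)
TPer_Vic2 <- data.frame(
  ID_PER = c("p1", "p2", "p3", "p4"),
  SEXO   = c("1", "2", "2", "1")
)
# Assumed structure: one row per reported crime, so IDs can repeat
TMod_Vic <- data.frame(
  ID_PER = c("p1", "p1", "p3"),
  BPCOD  = c("01", "03", "03")  # "03" = graffiti/vandalism, excluded below
)

# Victims: people with at least one crime other than code "03"
id_victims <- unique(TMod_Vic$ID_PER[TMod_Vic$BPCOD != "03"])

# 1 if the person is among the victims, 0 otherwise (p3 reported only "03")
TPer_Vic2$victimization <- as.integer(TPer_Vic2$ID_PER %in% id_victims)
TPer_Vic2
```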
1. Specifying the ENVIPE survey design
ENVIPE follows a probabilistic, three-stage, stratified, and clustered sampling design. The sampling unit is the household, while the unit of analysis is the population aged 18 and older in selected households. More details on the sampling design can be found here (in Spanish).
From the documentation on data structure (in Spanish), the key variables for specifying the survey design are:
- ID_PER: Person identifier
- EST_DIS: Stratum
- UPM_DIS: Primary sampling unit
- FAC_ELE: Weighting factor for public security perception and victimization estimates.
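The `nest = TRUE` argument in the design specification matters when primary sampling unit codes are only unique within a stratum: without it, two different PSUs that happen to share a code in different strata would be treated as one cluster. Whether UPM_DIS codes actually repeat across strata in ENVIPE is something to check in the data; setting `nest = TRUE` does no harm if they do not. The base R sketch below uses hypothetical codes to show the difference.

```r
# Hypothetical design variables: PSU codes restart within each stratum
design_vars <- data.frame(
  EST_DIS = c("A", "A", "A", "B", "B", "B"),
  UPM_DIS = c("1", "1", "2", "1", "2", "2")
)

# Without nesting, PSU "1" in stratum A and PSU "1" in stratum B look identical
n_psu_naive <- length(unique(design_vars$UPM_DIS))

# With nesting, a PSU is identified by its (stratum, PSU) pair
n_psu_nested <- nrow(unique(design_vars[, c("EST_DIS", "UPM_DIS")]))

c(naive = n_psu_naive, nested = n_psu_nested)
```

Running the same comparison on TPer_Vic2 would show whether the two counts differ for the real data.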
We prepare the analysis variables and specify the survey design as follows:

```r
# Identify victims in TMod_Vic, excluding
# "Wall graffiti, intentional vehicle damage, and other vandalism" (BPCOD != "03")
id_victims <- TMod_Vic |>
  filter(BPCOD != "03") |>
  pull(ID_PER)

# Create a victimization indicator in TPer_Vic2 (1 = victim, 0 = not a victim)
TPer_Vic2 <- TPer_Vic2 |>
  mutate(victimization = ifelse(ID_PER %in% id_victims, 1, 0))

# Convert FAC_ELE (weights) to numeric
TPer_Vic2 <- TPer_Vic2 |>
  mutate(FAC_ELE = as.numeric(FAC_ELE))

# Label the SEXO variable
TPer_Vic2 <- TPer_Vic2 |>
  mutate(SEXO = recode(SEXO, "1" = "Man", "2" = "Woman"))

# Create the survey design object
envipe_design <- TPer_Vic2 |>
  as_survey_design(
    weights = FAC_ELE, # Weights (inverse selection probability)
    strata = EST_DIS,  # Stratum
    ids = UPM_DIS,     # Primary sampling unit
    nest = TRUE        # PSU codes are nested within strata
  )
```
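It may help to see what the point estimates will be before computing them. For a 0/1 indicator such as victimization, the estimate returned by survey_mean() is the weighted proportion of ones, sum(w * y) / sum(w), computed within each group; the strata and PSUs affect only the standard errors and confidence intervals. The base R sketch below reproduces point estimates only, overall and per state and sex cell, on hypothetical toy data (not ENVIPE values).

```r
# Hypothetical toy data: state, sex, victimization indicator, and weights
toy <- data.frame(
  NOM_ENT       = c("Aguascalientes", "Aguascalientes", "Aguascalientes",
                    "Colima", "Colima", "Colima"),
  SEXO          = c("Man", "Man", "Woman", "Man", "Woman", "Woman"),
  victimization = c(1, 0, 1, 0, 1, 0),
  FAC_ELE       = c(100, 300, 200, 150, 50, 150)
)

# Overall weighted proportion: sum(w * y) / sum(w)
p_all <- sum(toy$FAC_ELE * toy$victimization) / sum(toy$FAC_ELE)

# Same value with base R's weighted.mean()
stopifnot(isTRUE(all.equal(p_all, weighted.mean(toy$victimization, toy$FAC_ELE))))

# Point estimate for each state x sex cell, rescaled per 100,000 inhabitants
cells <- split(toy, list(toy$NOM_ENT, toy$SEXO), drop = TRUE)
rates <- sapply(cells, function(d) weighted.mean(d$victimization, d$FAC_ELE) * 100000)
rates
```

Each cell's value is what the Prevalence rate column holds for that state and sex; the design-based machinery is what adds valid standard errors and intervals.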
2. Calculating point estimates and variability indicators
To compute proportions from survey data, we use survey_mean() from the {srvyr} package. From ENVIPE’s documentation, we know that NOM_ENT represents the state, while SEXO corresponds to the respondent’s sex.
```r
envipe_design |>
  # `interact()` computes the proportion over the
  # interaction of state and sex
  group_by(interact(NOM_ENT, SEXO)) |>
  summarize(
    Proportion = survey_mean(victimization,
      level = 0.90,                 # Confidence level
      vartype = c("se", "ci", "cv") # Variability indicators
    )
  ) |>
  ungroup() |>
  mutate(
    `Prevalence rate` = Proportion * 100000, # Rate per 100,000 people
    `Standard error` = Proportion_se * 100000,
    `Low 90% CI` = Proportion_low * 100000,
    `Upper 90% CI` = Proportion_upp * 100000,
    `Coefficient of variation %` = Proportion_cv * 100,
    State = str_to_title(NOM_ENT) # Capitalize the first letter of each word
  ) |>
  rename(Sex = SEXO) |>
  select(
    State, Sex, `Prevalence rate`, `Standard error`,
    `Low 90% CI`, `Upper 90% CI`, `Coefficient of variation %`
  ) |>
  kbl() |>        # HTML table
  kable_styling() # Apply Bootstrap theme to the table
```
The output table looks like this:
State | Sex | Prevalence rate | Standard error | Low 90% CI | Upper 90% CI | Coefficient of variation % |
---|---|---|---|---|---|---|
Aguascalientes | Man | 33710.81 | 2172.7372 | 30136.71 | 37284.91 | 6.445224 |
Aguascalientes | Woman | 32041.57 | 1831.7081 | 29028.45 | 35054.68 | 5.716662 |
Baja California | Man | 23043.35 | 1604.9769 | 20403.20 | 25683.50 | 6.965032 |
Baja California | Woman | 24480.64 | 1536.1771 | 21953.66 | 27007.61 | 6.275070 |
Baja California Sur | Man | 19329.94 | 1244.1353 | 17283.36 | 21376.51 | 6.436314 |
Baja California Sur | Woman | 23817.58 | 1466.2963 | 21405.56 | 26229.60 | 6.156361 |
Campeche | Man | 21832.87 | 1455.8508 | 19438.03 | 24227.71 | 6.668161 |
Campeche | Woman | 23213.74 | 1381.0933 | 20941.88 | 25485.61 | 5.949465 |
Chiapas | Man | 16602.99 | 986.0052 | 14981.03 | 18224.94 | 5.938722 |
Chiapas | Woman | 12074.36 | 978.7862 | 10464.28 | 13684.44 | 8.106318 |
Chihuahua | Man | 22303.22 | 1234.5065 | 20272.49 | 24333.96 | 5.535103 |
Chihuahua | Woman | 22804.31 | 1181.0327 | 20861.54 | 24747.08 | 5.178990 |
Ciudad De Mexico | Man | 34562.66 | 1089.2714 | 32770.84 | 36354.49 | 3.151584 |
Ciudad De Mexico | Woman | 30746.76 | 990.3528 | 29117.66 | 32375.87 | 3.220999 |
Coahuila De Zaragoza | Man | 18179.70 | 1249.1422 | 16124.89 | 20234.51 | 6.871084 |
Coahuila De Zaragoza | Woman | 20447.19 | 1238.7111 | 18409.54 | 22484.84 | 6.058099 |
Colima | Man | 21106.85 | 1593.5970 | 18485.43 | 23728.28 | 7.550140 |
Colima | Woman | 21295.28 | 1351.1455 | 19072.68 | 23517.88 | 6.344813 |
Durango | Man | 15240.74 | 1133.1868 | 13376.67 | 17104.80 | 7.435250 |
Durango | Woman | 17401.18 | 1166.0910 | 15482.99 | 19319.37 | 6.701219 |
Guanajuato | Man | 20057.38 | 1428.0981 | 17708.20 | 22406.57 | 7.120063 |
Guanajuato | Woman | 18930.15 | 1287.0811 | 16812.93 | 21047.37 | 6.799107 |
Guerrero | Man | 16981.89 | 1494.6554 | 14523.22 | 19440.56 | 8.801466 |
Guerrero | Woman | 14572.44 | 1238.6234 | 12534.93 | 16609.94 | 8.499769 |
Hidalgo | Man | 22397.82 | 1338.0474 | 20196.77 | 24598.88 | 5.974006 |
Hidalgo | Woman | 20215.02 | 1157.0294 | 18311.73 | 22118.30 | 5.723613 |
Jalisco | Man | 24390.13 | 1548.7916 | 21842.40 | 26937.85 | 6.350076 |
Jalisco | Woman | 25141.90 | 1438.5170 | 22775.58 | 27508.23 | 5.721592 |
Mexico | Man | 32906.59 | 1632.3183 | 30221.47 | 35591.71 | 4.960460 |
Mexico | Woman | 33023.78 | 1483.3471 | 30583.71 | 35463.85 | 4.491755 |
Michoacan De Ocampo | Man | 14405.17 | 1031.1125 | 12709.01 | 16101.32 | 7.157934 |
Michoacan De Ocampo | Woman | 15485.20 | 1036.3320 | 13780.46 | 17189.94 | 6.692401 |
Morelos | Man | 26995.43 | 1517.6054 | 24499.00 | 29491.85 | 5.621713 |
Morelos | Woman | 24869.74 | 1688.5969 | 22092.04 | 27647.44 | 6.789765 |
Nayarit | Man | 16075.56 | 1516.6195 | 13580.76 | 18570.36 | 9.434318 |
Nayarit | Woman | 18639.61 | 1597.1842 | 16012.28 | 21266.94 | 8.568766 |
Nuevo Leon | Man | 23602.12 | 1261.9498 | 21526.25 | 25678.00 | 5.346764 |
Nuevo Leon | Woman | 21440.72 | 1211.6372 | 19447.60 | 23433.83 | 5.651104 |
Oaxaca | Man | 12940.56 | 1158.2366 | 11035.29 | 14845.83 | 8.950434 |
Oaxaca | Woman | 13548.61 | 1017.7020 | 11874.51 | 15222.70 | 7.511489 |
Puebla | Man | 26519.64 | 1342.0473 | 24312.00 | 28727.27 | 5.060579 |
Puebla | Woman | 24176.45 | 1124.4656 | 22326.73 | 26026.17 | 4.651078 |
Queretaro | Man | 30297.50 | 1620.6534 | 27631.57 | 32963.44 | 5.349132 |
Queretaro | Woman | 25746.31 | 1260.3524 | 23673.06 | 27819.56 | 4.895273 |
Quintana Roo | Man | 22592.04 | 1384.4250 | 20314.70 | 24869.39 | 6.127932 |
Quintana Roo | Woman | 21775.64 | 1237.8095 | 19739.48 | 23811.81 | 5.684376 |
San Luis Potosi | Man | 24019.02 | 1874.7438 | 20935.11 | 27102.93 | 7.805247 |
San Luis Potosi | Woman | 23227.97 | 1491.3111 | 20774.80 | 25681.14 | 6.420324 |
Sinaloa | Man | 21181.44 | 1122.0160 | 19335.76 | 23027.13 | 5.297165 |
Sinaloa | Woman | 20562.22 | 1086.2790 | 18775.32 | 22349.12 | 5.282887 |
Sonora | Man | 26864.66 | 1816.2677 | 23876.94 | 29852.37 | 6.760808 |
Sonora | Woman | 25742.16 | 1902.0264 | 22613.37 | 28870.95 | 7.388759 |
Tabasco | Man | 25881.72 | 1354.6212 | 23653.40 | 28110.04 | 5.233891 |
Tabasco | Woman | 25522.45 | 1091.8488 | 23726.38 | 27318.51 | 4.277994 |
Tamaulipas | Man | 17788.15 | 1075.8038 | 16018.47 | 19557.82 | 6.047869 |
Tamaulipas | Woman | 18368.67 | 1017.1639 | 16695.46 | 20041.88 | 5.537493 |
Tlaxcala | Man | 23968.33 | 1621.2198 | 21301.47 | 26635.20 | 6.764007 |
Tlaxcala | Woman | 22295.82 | 1473.2391 | 19872.38 | 24719.26 | 6.607692 |
Veracruz De Ignacio De La Llave | Man | 16365.05 | 1055.8247 | 14628.24 | 18101.86 | 6.451705 |
Veracruz De Ignacio De La Llave | Woman | 14712.45 | 947.9121 | 13153.15 | 16271.74 | 6.442927 |
Yucatan | Man | 20677.82 | 1239.0909 | 18639.55 | 22716.10 | 5.992366 |
Yucatan | Woman | 17805.62 | 1141.8291 | 15927.34 | 19683.90 | 6.412745 |
Zacatecas | Man | 17328.49 | 1346.8466 | 15112.96 | 19544.02 | 7.772440 |
Zacatecas | Woman | 15690.25 | 1335.9463 | 13492.65 | 17887.85 | 8.514502 |
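The columns of this table are linked by simple arithmetic, which can serve as a sanity check. The coefficient of variation is the standard error relative to the estimate, times 100. A normal-approximation 90% interval (estimate plus or minus qnorm(0.95) times the standard error) lands within about a quarter of a unit of the table's bounds; the table's interval is slightly wider, which is consistent with srvyr using a t quantile based on the design degrees of freedom by default (see the `df` argument of survey_mean()). The sketch below uses only the values from the Aguascalientes / Man row.

```r
# Values copied from the Aguascalientes / Man row of the table above
rate <- 33710.81   # Prevalence rate per 100,000
se   <- 2172.7372  # Standard error, on the same per-100,000 scale

# Coefficient of variation (%): standard error relative to the estimate
cv <- se / rate * 100
round(cv, 4)  # 6.4452, in line with the table's 6.445224

# Normal-approximation 90% interval: estimate +/- z * SE, with z = qnorm(0.95)
z <- qnorm(0.95)
ci_normal <- rate + c(-1, 1) * z * se
round(ci_normal, 2)

# Multiplier implied by the table's interval, for comparison with z
implied_mult <- (37284.91 - 30136.71) / (2 * se)
c(z = z, implied = implied_mult)
```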
Citation
```bibtex
@online{torres munguía2025,
  author = {Torres Munguía, Juan Armando},
  title = {Using a Tidy Approach in {R} to Analyze {Mexico’s} 2024
    {ENVIPE}},
  date = {2025-01-27},
  langid = {en}
}
```