
THE DATA CRITIQUE
Photo of the Los Angeles skyline with the Santa Monica Mountain range behind. Image taken from Californiathroughmylense.com.
The dataset we selected, Crime Data from 2020 to Present, is published by the City of Los Angeles and updated by the Los Angeles Police Department. It covers crime incidents reported in Los Angeles from 2020 through its most recent update in January 2026.
What information is included in the dataset?
This dataset contains information about crimes reported in Los Angeles from 2020 onward. Each row represents a single crime incident and records several aspects of it. Temporal fields give the date the crime was reported and the date and time it occurred. Geographic fields identify the area where it occurred, based on the LAPD’s designated geographic regions and their sub-areas (districts). The dataset classifies the crime, provides the modus operandi (activities associated with the suspect in the commission of the crime), and includes victim information such as age, sex, and descent. It also records the type of building, vehicle, or location where the crime occurred, as well as the weapon used. Finally, it includes the crime status, the crime code (how severe it is), additional crime codes (up to 4), the street address of the crime and its cross street, and the latitude and longitude of the crime.
For the sake of our analysis, we have decided to filter the dataset to 2021 and 2025. This felt vital for narrative purposes, as 2021 marks the first year of Biden’s administration, and 2025 marks the first year of Trump’s second term. We thought it would be interesting to compare how crime has varied and evolved across administrations to see if there is a correlation, while acknowledging that correlation does not imply causation.
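The year filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact procedure we followed: the column names `DATE OCC` and `Crm Cd Desc`, the `MM/DD/YYYY hh:mm:ss AM` date format, and the sample rows are all assumed for the example, and it reads the filter as keeping the two calendar years 2021 and 2025 based on the date of occurrence.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows. The column names and the date format are
# assumptions made for this sketch, not values copied from the real file.
SAMPLE_CSV = """DATE OCC,Crm Cd Desc
03/14/2020 12:00:00 AM,VEHICLE - STOLEN
01/05/2021 12:00:00 AM,BURGLARY
07/19/2021 12:00:00 AM,VEHICLE - STOLEN
11/02/2023 12:00:00 AM,BURGLARY
02/11/2025 12:00:00 AM,BURGLARY
"""

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp layout
KEEP_YEARS = {2021, 2025}             # the two years compared in the analysis


def occurrence_year(row, date_column="DATE OCC"):
    """Return the calendar year in which the incident occurred."""
    return datetime.strptime(row[date_column], DATE_FORMAT).year


def filter_by_year(rows, years=KEEP_YEARS):
    """Keep only the rows whose occurrence year is in `years`."""
    return [row for row in rows if occurrence_year(row) in years]


def count_by_year(rows):
    """Count incidents per occurrence year."""
    return Counter(occurrence_year(row) for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
filtered = filter_by_year(rows)
counts = count_by_year(filtered)

for year in sorted(counts):
    print(year, counts[year])
```

Filtering on the date of occurrence rather than the date reported is itself a choice: an incident that occurred in late 2020 but was reported in 2021 would be excluded here and included under the other rule.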
What information, events, or phenomena can the dataset illuminate?
Looking at our dataset, it is clear that it will illuminate patterns and changes in crime behaviors between 2021 and 2025. Specifically, it allows for comparison of crime types, geographical locations, and frequencies across years. This is vital for situating these crimes within broader societal contexts that may not be immediately apparent. For instance, the aftermath of the COVID-19 pandemic, changes in the police force, and evolving economic conditions could all play a significant role in crime trends and individual behaviors.
It is also important to situate this dataset within the 2020 nationwide uprisings following the murder of George Floyd, which fundamentally challenged how crime is defined, who gets policed, and what role law enforcement plays in public safety. Analyzing crime data without acknowledging that backdrop risks reproducing the very frameworks those protests were contesting. By analyzing the patterns in the data, we can dig deeper into how larger social issues and shifting environments may affect people’s minds, behaviors, and crime habits, but only if we remain critical of the ideological assumptions embedded in how “crime” is counted in the first place.
What can the dataset not reveal?
This brings us to what the dataset cannot reveal. While the dataset will be useful for showing how structural factors may influence collective behavior, it cannot directly explain individual motivations, intentions, or the circumstances of the people involved in each incident. These crimes are represented individually as one-off events rather than as outcomes of social experiences. This makes it difficult to fully understand the complexity of the human behavior behind them.
It is also worth noting that the dataset does not capture unreported crimes. Research consistently shows that many crimes, particularly sexual violence, domestic abuse, and crimes committed against undocumented immigrants, unhoused individuals, or those with prior records, go unreported due to fear of retaliation, distrust of law enforcement, or past experiences of police harm. In communities where policing has historically been a source of trauma rather than safety, the gap between actual harm and officially recorded crime is likely substantial. This means the dataset reflects not the full landscape of harm in Los Angeles, but only the harm that was reported to, and recorded by, the LAPD.
Additionally, the dataset uses crime severity codes to classify incidents, but it is worth asking: severity by whose standards? These classifications reflect legal categories and institutional priorities that have historically over-policed communities of color for low-level infractions while undercharging those with more power and resources for harms with far broader social consequences, such as environmental violations, wage theft, or financial fraud. Working within these categories means working within a system of classification that carries real ideological weight.
In terms of location, the address of each crime is rounded to the nearest hundred block, and cross-street information is partially omitted. Therefore, the dataset cannot reveal the precise house, business, or exact spot where an incident occurred. This lack of precision limits the analysis of place dynamics and more fine-tuned research into how crime manifests within individual neighborhoods.
There are also no police report narratives, body-camera summaries, witness statements, or contextual explanations. Without the qualitative context behind these reports, we cannot know why the event happened, what led up to it, or the personal circumstances of those involved. Moreover, the information on the suspect is extremely limited. As a result, we cannot identify the victims, suspects, or officers involved in the incidents. It is worth noting, though, that this omission is not purely an oversight. Including identifiable suspect data could compromise ongoing legal proceedings, infringe on the privacy and due process rights of individuals who have not been convicted, and put the safety of victims and their families at risk. The dataset therefore cannot be used to assess accountability or responsibility beyond basic facts, but that limitation also reflects real considerations around harm and privacy.
How was the data generated?
The data was generated from crime reports recorded by the Los Angeles Police Department between 2020 and the present. The reports were collected through the LAPD’s internal records management system, a vital part of law enforcement operations. The information was then standardized, stored in a CSV file, and officially published online through the Los Angeles data portal for public access. The original sources are the official LAPD crime reports. These sources are important because they are created for institutional and administrative purposes rather than for the sake of a narrative. As a result, the data reflect operational efficiency rather than contextual detail, which limits analysis of the human and social dimensions of crime in Los Angeles.
Who or what organization funded the creation of the dataset?
Based on our understanding, no external funder is listed for the creation of this dataset. All evidence indicates that the dataset was created, maintained, and published internally by the Los Angeles Police Department through its LAPD Open Data division. Because the LAPD is a publicly funded organization, its data is shaped by government policy on accountability and transparency, as well as by personal agendas. This matters for understanding what type of data is collected, why it is categorized the way it is, and, most importantly, what information is deliberately included or excluded from the dataset. It is also worth recognizing that what gets recorded, how it is categorized, and what gets left out are not neutral choices. They reflect an organization’s understanding of its own role, one that, especially in the aftermath of 2020, has been actively contested by communities, scholars, and activists.
What information is left out of the spreadsheet?
Overall, one of the most significant omissions of the dataset is the lack of demographic information about the suspects. The dataset includes characteristics such as victims’ age and sex, but it provides minimal details about the accused. This limits our ability to analyze broader social patterns and systemic factors related to the crime and the potential “why” behind it. Additionally, there is no information about what the suspect was charged with or any legal outcomes. Without this, it is impossible to assess whether similarly situated individuals received similar treatment, for example whether charges varied among those accused of crimes of the same severity, which would be essential to any meaningful analysis of systemic bias in prosecution.
The dataset also lacks information on whether the crime was committed by multiple people or just one. Without this information, it is challenging to understand the full circumstances in which the crime was committed.
Perhaps most significantly, certain categories of harm are not represented as “crime” in this dataset at all. White-collar crime, corporate negligence, and state violence, including excessive use of force by police, rarely appear in datasets like this, even when they cause profound and widespread harm. By working with this data, we are implicitly working within a definition of crime shaped by institutional power, and our analysis should be mindful of that framing rather than take it as given.