Curating Training Data for Reliable Large-Scale Visual Data Analysis: Lessons from Identifying Trash in Street View Imagery

Jackelyn Hwang, Nima Dahir, Mayuka Sarukkai, Gabby Wright

DSEID: DSEID-001-3994759
DOI: 10.1177/00491241231171945
Journal: Sociological Methods & Research
Publisher: SAGE Publications
Published: 2023-08
Status: metadata_only

Abstract

Visual data have dramatically increased in quantity in the digital age, presenting new opportunities for social science research. However, the extensive time and labor costs to process and analyze these data with existing approaches limit their use. Computer vision methods hold promise but often require large and nonexistent training data to identify sociologically relevant variables. We present a cost-efficient method for curating training data that utilizes simple tasks and pairwise comparisons to interpret and analyze visual data at scale using computer vision. We apply our approach to the detection of trash levels across space and over time in millions of street-level images in three physically distinct US cities. By comparing to ratings produced in a controlled setting and utilizing computational methods, we demonstrate generally high reliability in the method and identify sources that limit it. Altogether, this approach expands how visual data can be used at a large scale in sociology.

Metadata

Abstract source: crossref
Source URL: None
Access: closed_or_uncertain
Licence: unknown

Record history

When	Event	Field	Old	New
2026-06-18 19:37:53.011249+00:00	identifier_assigned	DSEID		DSEID-001-3994759