Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945–2019
Abstract
Sociologists are discussing the need for more formal ways to extract meaning from digital text archives. We focus attention on the seeded topic model, a semi-supervised extension to the standard topic model that allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while utilizing topic models’ functionality to identify associations in text based on word co-occurrences. The method estimates a concept’s shared interpretation (or framing) via its associations with other frequently co-occurring topics. In a case study, we extract longitudinal measures of media frames regarding immigration from a vast corpus of millions of Swedish newspaper articles from the period 1945–2019. We infer turning points that partition the immigration discourse into meaningful eras and locate Sweden’s era of multicultural ideals that established its reputation for tolerance.
Introduction
In recent years, an increasing number of sociologists have embraced machine learning algorithms to infer latent patterns in text data (e.g., DiMaggio, Nag, and Blei, 2013; Mohr et al., 2013; Rule, Cointet, and Bearman, 2015; Bail, 2016; Nelson, 2020, 2021a,b; Kozlowski, Taddy, and Evans, 2019; Goldenstein and Poschmann, 2019; Bail, Brown, and Mann, 2017; Karell and Freedman, 2019; Wu, Wang, and Evans, 2019; Bohr, 2020; Taylor and Stoltz, 2020; Stoltz and Taylor, 2021; Arseniev-Koehler et al., 2022; Bonikowski, Luo, and Stuhler, 2022; Boutyline, Arseniev-Koehler, and Cornell, 2023; Best and Arseniev-Koehler, 2023). One suite of algorithms, unsupervised topic models (Blei, Ng, and Jordan, 2003; Griffiths and Steyvers, 2004; Blei, 2012), infers linguistic themes based on word co-occurrences. Topic models have been found to resonate well with sociological ideas about how people create meaning and make sense of the social world by linking themes to other concepts and ideas (DiMaggio, Nag, and Blei, 2013; Mohr et al., 2013; Törnberg and Törnberg, 2016; Fligstein, Stuart Brundage, and Schultz, 2017; Nelson, 2020). This article addresses a central limitation of topic models: while they are suited to inductive research that identifies emergent themes from document collections, they fare poorly at identifying, in transparent and replicable ways, specific concepts predefined by the researcher. Topic models, and unsupervised methods more generally, rely on post hoc analysis to make sense of the output in light of sociological theory, opening up an old rift between inductive and deductive research within the discipline.
As computational text analysis has matured as a methodology in the sociological toolkit, calls have been made for an important next step: to move beyond the implementation of standard models and to strive to apply specialized models that are more transparent, replicable, theory-driven, and interpretable, and thus more attuned to the central demands of social science research (DiMaggio, 2015; Nelson, 2019; Mohr et al., 2020; Pääkkönen and Ylikoski, 2021; Nelson, 2021b; Grimmer, Roberts, and Stewart, 2022; Bonikowski and Nelson, 2022).
We contribute further to this debate and argue for the use of semi-supervised text analysis. We focus on the seeded (or constrained) topic model (Arora, Ge, and Moitra, 2012; Jagarlamudi, Daumé, and Udupa, 2012; Watanabe, Xuan-Hieu, and Watanabe, 2022), which combines the original model's unsupervised nature with sociological domain knowledge. In contrast to other topic model extensions commonly used by social scientists, such as the structural topic model (Roberts et al., 2014) that utilizes document-level covariates to interpret model results in light of theory, the seeded topic model creates an informative dimension reduction of the corpus. In practice, scholars often want to take advantage of the exploratory capabilities of topic models, while also hoping that the models will capture themes that are presumed a priori to exist in a corpus. Our proposed approach makes it possible to achieve both objectives by seeding certain topics, while letting other topics emerge inductively, combining the inductive power of topic models with some degree of researcher supervision. Here lies an important advantage over the deterministic use of dictionary approaches to measure predefined concepts: seeding helps crystallize topics of interest while allowing for imperfect knowledge of the topics before the model is run.
The seeding crystallizes topics around predefined words that describe themes of interest. We use the term "topics" to refer to model output, and we use "themes," "issues," and "frames" when referring to theoretical concepts. Seed words require researchers to be explicit about how a concept is operationalized, and seeding is one way to constrain the model to search for specific themes of interest. Seeding can also increase the robustness of computational text analysis to language change, an endemic challenge when analyzing text archives of historical timescales (Bearman, 2015; Rule, Cointet, and Bearman, 2015; Voyer et al., 2022; Bonikowski, Luo, and Stuhler, 2022). By identifying associations between a focal topic and other topics with which it frequently co-occurs, the model can detect widely shared interpretations (or frames) associated with the theme in question. These model features provide an attractive complement to the mixed-methods approaches (e.g., DiMaggio, 2015; Karell and Freedman, 2019; Nelson, 2019, 2020) that are currently being discussed as a way of bringing computational text analysis into sociological research.
One strength of the topic model approach is to allow for words' mixed memberships in topics. Our use of the seeded topic model, however, aims at measuring clearly defined and interpretable topics, which we will achieve by using seed words that we believe to have a single, very clear meaning. Seeding will work less well if one starts from polysemic words, i.e., words with multiple meanings, or if one tries to seed a polysemic topic altogether. While the words associated with the seeded words within a given topic are also allowed to emerge from the data, forced monosemy is a limitation of our approach that will hinder its applicability to certain use cases.
Seeded topic models have been around for a decade and have more recently become available in general-purpose programming languages such as R (Watanabe, Xuan-Hieu, and Watanabe, 2022) and Python (Anoop and Asharaf, 2017). However, heavy computational requirements and limitations in the scalability of off-the-shelf implementations (Lu et al., 2011; Jagarlamudi, Daumé, and Udupa, 2012; Fan, Doshi-Velez, and Miratrix, 2019; Eshima, Imai, and Sasaki, 2024; Watanabe and Baturo, 2024) have hampered their application in sociology. We discuss a scalable implementation for big text data (Magnusson et al., 2018) that removes previous bottlenecks and that we hope will make the algorithm attractive to a broader sociological audience. We illustrate the method using an important case study that measures the ways the media have framed immigration in a Swedish newspaper corpus spanning 75 years. The corpus, one of the most extensive ever analyzed in the social sciences, contains 30 million text blocks from more than 100,000 editions of the country's four national newspapers from the period 1945-2019.
Our study connects to a long tradition of sociological research studying newspaper discourses (e.g., Gamson and Modigliani, 1989; Marx Ferree, 2003; Koopmans and Olzak, 2004; Fiss and Hirsch, 2005; Janssen, Kuipers, and Verboord, 2008; Bail, 2012; Shor et al., 2015). Previous immigration-related research has relied on corpora comprising between a few thousand and 130,000 articles, which have typically been assembled using keyword searches, and which have spanned time frames of between 1 and 14 years (Helbling, 2014; Lawlor and Tolley, 2017; Greussing and Boomgaarden, 2017; Heidenreich et al., 2019; Czymara and van Klingeren, 2022). The largest studies to date have included 850,000 articles in six European languages (Eberl and Galyga, 2021) and 850,000 immigration-related headlines from UK newspapers (Bleich and van der Veen, 2021). Compared to past snapshot corpora, our data are vast and, in combination with a scalable algorithm, permit a fine-grained mapping of the newspaper discourse on immigration over 75 years.
Using the corpus described above, we map how shared interpretations of immigration have evolved over time. We operationalize interpretative media frames as associations between a focal topic and other topics, estimating the co-occurrence patterns of predefined themes (combining "immigration" with, e.g., "the economy," "culture," or "security"). Issues that frequently co-occur with the focal topic represent prominent logics for the topic's interpretation. Through the ways journalists curate and present the news flow, the media frames that we measure in this study establish a shared context of meaning-making (Scheufele, 1999; Fiss and Hirsch, 2005; Chong and Druckman, 2017; Lizardo, 2021), placing events, people, and ideas into a wider context of interpretability (Strauss and Quinn, 1997; DiMaggio, 1997; Cerulo, Leschziner, and Shepherd, 2021; Arseniev-Koehler and Foster, 2022).
Since we estimate changes in cultural associations and delineate periods during which associations measurably differed, our computational approach adds scale to the qualitative analysis of "turning points" in collective meaning-making (Sewell, 1996; Abbott, 1997, 2001; Wagner-Pacifici, 2017). It further lends a broader empirical foundation to the casing of timelines than the narrative accounts usually heralded in the historical social sciences (Ermakoff, 2019; Griffin, 1992; Bearman, Faris, and Moody, 1999).
In the following, we provide a brief primer on frames of interpretation and turning points in media discourse, and we introduce the Swedish case study in relation to earlier large-scale studies of newspaper content. We then turn to the method itself and describe its implementation as a means of estimating predefined topics and their relations to one another over time. We present results for the Swedish newspaper corpus that highlight the interpretability of model outputs. In the concluding section, we discuss our insights into the Swedish media coverage of immigration over the past 75 years, and we ponder the degree to which text measures, drawn for example from the mainstream media as in our case, provide social sensors that can help us learn about trends in contemporary societies.
Frames and Turning Points
Frames concern how information is conveyed in communication, and how specific interpretations are promoted by relating one concept to other concepts, thereby linking new information to existing ideas and previous experiences (Gamson and Modigliani, 1989; Entman, 1993; Scheufele, 2000; Rawlings and Childress, 2021). As such, frames are "interpretive packages" (Gamson and Modigliani, 1989) that evoke particular perspectives and problem definitions through which objects in the social world can be seen and understood (Weaver, 2007; Gamson, 1992; Benford and Snow, 2000). Immigration, for example, might be interpreted through, among others, a security frame or an economic frame. Individuals may have opposing opinions on immigration (e.g., "immigrants provide necessary labor" and "immigrants take our jobs"), but they can still agree to interpret immigration through a similar lens (e.g., the economy). Taken together, frames provide the cognitive contexts that speak to and activate the learned categories of individuals' cognition (Lizardo, 2017; Wood et al., 2018; Hunzaker and Valentino, 2019; Cerulo, Leschziner, and Shepherd, 2021), and they organize cognition at a higher order of abstraction than do opinions, attitudes, or values (DiMaggio, 1997; Goldberg, 2011; Mohr et al., 2020).
In our application, we focus on how immigration has been framed in national news media, exploring the interpretations of immigration formulated by journalists and editors. In line with the idea that an interpretative frame can be viewed as an associative pattern, we operationalize media frames as associations between a focal theme and other topics. Media frames that frequently co-occur with the focal issue represent prominent logics for the issue's interpretation. For example, one frame may connect immigration with issues of religion in order to highlight the cultural differences between natives and migrants, while another may connect immigration with party politics in order to promote a politicized perspective on immigration. The composition of salient frames at a certain time point aggregates into what we refer to as the shared interpretation of immigration communicated by the media.
In contests over sovereignty in interpretation (Swidler, 1986; Gamson, 1992; Benford and Snow, 2000), entrepreneurs of meaning (such as governments, political parties, advocacy groups, and media outlets themselves) are keen to obtain ownership of salient issues and to influence their shared interpretations (Andrews and Caren, 2010; Quinsaat, 2014; Tsur, Calacci, and Lazer, 2015; Farrell, 2016; Bail, Brown, and Mann, 2017). But how do publicly available interpretations change? Influential social science theorizing refers to "turning points" that constitute breaks with routine practices of meaning-making (Sewell, 1996; Abbott, 1997; Wagner-Pacifici, 2017). Turning points take shape in "unsettled times" (Swidler, 1986) or "periods of rupture" (Wagner-Pacifici, 2017) in which sequences of events occur that imply thresholds and shifts that are recognizable to contemporaries. In retrospect, we give names to these ruptures because they bring with them a series of occurrences that challenge established interpretations and "durably transforms previous structures and practices" (Sewell, 1996).
We use the concept of turning points that are grounded in, and operative on, publicly available interpretations to partition Sweden's immigration discourse into recognizable eras. We estimate annual salience shifts in the composition of dominant frames over time to identify breakpoints in the media's framing of immigration and to parse discursive periods during which meaning-making measurably differed.
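To make the idea of annual salience shifts concrete, the following is a minimal sketch under stated assumptions; it should not be read as the breakpoint procedure used in our analysis, which this section does not specify. It assumes that each year is summarized as a vector of frame shares, uses the total variation distance as an illustrative measure of compositional change, and flags years whose shift from the previous year exceeds a hypothetical threshold. The function names, the threshold, and the toy shares are all invented for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two share vectors over the same frames."""
    frames = set(p) | set(q)
    return 0.5 * sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in frames)


def candidate_breakpoints(shares_by_year, threshold=0.15):
    """Return (year, shift) pairs where the frame composition moved by more
    than `threshold` relative to the preceding year.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    years = sorted(shares_by_year)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        shift = total_variation(shares_by_year[prev], shares_by_year[curr])
        if shift > threshold:
            flagged.append((curr, shift))
    return flagged


# Toy frame shares (invented numbers, not results from the corpus).
toy_shares = {
    1970: {"economy": 0.60, "culture": 0.20, "security": 0.20},
    1971: {"economy": 0.58, "culture": 0.22, "security": 0.20},
    1972: {"economy": 0.30, "culture": 0.30, "security": 0.40},
}
breaks = candidate_breakpoints(toy_shares)
print(breaks)
```

In the toy data, only the move from 1971 to 1972 is large enough to be flagged. A formal change-point model would be a natural replacement for the fixed threshold.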
The Swedish Newspaper Corpus in Context
The Swedish Newspaper Corpus 1945-2019, digitized by the National Library of Sweden (Börjeson et al., 2023), contains 75 years of journalistic content from the country's four largest newspapers: Aftonbladet, Dagens Nyheter, Expressen, and Svenska Dagbladet. The corpus allows for a macroscopic analysis of the Swedish migration discourse as reflected in the mainstream media, dating back to the time when mass immigration to Sweden started. Sweden entered Europe's post-war reconstruction period as a neutral country without an influential colonial history and with an ethnically homogeneous population of 6.6 million. In the decades that followed, Sweden received labor migrants and, increasingly, refugees at an average annual rate of 0.6% of the population (Statistics Sweden, 2024). Figure 1A shows the number of immigrants arriving in Sweden during the observation period. Today, 20% of the 10.3 million Swedes are foreign born (Statistics Sweden, 2022).
The news articles we study represent a broad mixture of different formats and political orientations (see Table 1). Newspapers divide their content into multiple stand-alone sections, e.g., op-eds, domestic politics, world news, culture, sports, and TV listings. We restrict our analysis to the front sections of each newspaper. We believe these sections contribute most to meaning-making in newspapers. Using the front sections leaves us with 29.3 million documents and 1.6 billion words after removing rare words and documents shorter than 15 words. The corpus consists of text blocks, i.e., units of cohesive text identified in the segmentation procedure during digitization. The segmentation relies on a rule-based approach curated by the Swedish National Library (using the software Zissor with ABBYY as the optical character recognition engine); there are different segmentation rules for each newspaper that are updated when newspaper layouts change (Dannélls, Johansson, and Björk, 2019). We use each text block as a document. Previous research (Hurtado Bodell, Magnusson, and Mützel, 2022) has shown that an article is commonly captured by multiple text blocks and, importantly, that only 16% of text blocks contain content from more than one article. See Supplemental Material Section S1 in the Appendix for more details on corpus creation. By comparison with earlier computational studies of archival text that have described national conversations based on sets of political speeches (Rule, Cointet, and Bearman, 2015; Barron et al., 2018; Fuhse et al., 2020; Card et al., 2022), the extreme breadth of the newspaper archive (106,000 daily issues in total) permits us to focus on the national conversation about one particular issue, immigration, with high granularity. In relation to the newspaper corpora studied in prior immigration-related research, our data set is much larger.
As a comparison, Eberl and Galyga (2021) searched for manually selected keywords and found an increase in the attention focused on immigration during the period of the 2015 European "refugee crisis" in Germany, Hungary, Poland, Spain, Sweden, and the UK (102,000 articles, 2003-2017); with regard to Sweden, the study reported that most of the keywords found centered around security and welfare issues. Greussing and Boomgaarden (2017) used principal component analysis to analyze data based on 89 predefined immigration-related words in Austrian newspapers (10,000 articles, 2015) and found media frames focused on security and economic issues. A lexicon-based sentiment analysis of immigration-related headlines in 850,000 articles from UK newspapers (2001-2012) found negative connotations and problem frames, particularly in news reporting on Muslim immigrants (Bleich and van der Veen, 2021). Based on the original topic modeling framework, Heidenreich et al. (2019) explored framing during the "refugee crisis" in 24 newspapers from Germany, Hungary, Spain, Sweden, and the UK (130,000 articles, 2015-2016), and found a stronger humanitarian framing of immigration in Sweden than in the other European countries. Using a structural topic model, Czymara and van Klingeren (2022) showed that, in Germany, print reporting perpetuated a more diverse set of frames of the "refugee crisis" than online reporting (32,000 articles, 2015-2017).
While they have been innovative and carefully implemented, previous topic-model studies have relied exclusively on an inductive operationalization of meaningful frames that were detected as topics in articles identified as having a focus on immigration based on a keyword search. The inferred topics, and the sociological concepts they may represent, have been interpreted post hoc, after seeing the model outputs. In this article, we argue that this practice invites researchers to adapt the boundaries of theoretical constructs on the basis of model outputs rather than on what is suggested by theory. Because topics inferred by unsupervised topic models differ each time a model is estimated, this could create a situation in which the conceptualization of a theoretical construct changes with each model run. In our use case, a topic model may capture different aspects of the "immigration discourse" with each re-run. The use of seed words to anchor an immigration topic stabilizes inferences across model estimations. As we explain in the next section, the seeded topic model improves both replicability and interpretability and combines improvements in transparency with a more theoretically informed approach to detecting topics and topical associations.
Methods
For many in the social sciences, computational text analysis comes in two variants: supervised or unsupervised. Supervised methods rest on the researcher's access to labels for meaning structures in text data, such as categories and a coding scheme, and then extrapolate these labels to unseen texts (Nelson et al., 2021; Chen et al., 2018; Lichtenstein and Rucks-Ahidiana, 2021; Do, Ollion, and Shen, 2022). By contrast, unsupervised methods infer information about language patterns, such as co-occurrences of words in documents, without drawing on predefined categories or coding schemes. A growing number of studies are using unsupervised methods to describe the cultural meanings of sociological concepts such as class (Kozlowski, Taddy, and Evans, 2019), gender (Garg et al., 2018), race (Nelson, 2021b), stigma (Best and Arseniev-Koehler, 2023), and art (DiMaggio, Nag, and Blei, 2013). Unsupervised methods rely either on algorithms that trace the meaning of individual words (for word embedding models in recent sociological research, see Kozlowski, Taddy, and Evans, 2019; Nelson et al., 2021; Bonikowski, Luo, and Stuhler, 2022; Voyer et al., 2022; Best and Arseniev-Koehler, 2023) or on algorithms that identify thematic structures in ensembles of text (for topic models, see, e.g., DiMaggio, Nag, and Blei, 2013; Karell and Freedman, 2019; Bohr, 2020; Greve et al., 2022).
Topic models or, more specifically, models based on Latent Dirichlet Allocation (LDA, Blei, Ng, and Jordan, 2003) represent an important class of unsupervised methods that inductively detect themes by learning the topics that are present in a document and the words that best describe them. LDA represents a generative probabilistic process that treats each document as a bag of words in which each word (token) is drawn at random from a mixture of the topics present in the document. The model then assigns each word in a document to a topic, allowing the same word to belong to various topics to a differing degree. Each topic, in turn, is a low-entropy distribution over words that tend to co-occur. This graded membership property aligns closely with our analytical aim of determining which co-occurring topics are most relevant for describing the shared interpretation (or framing) of an issue.
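The generative process just described can be illustrated with a short simulation. This is a minimal sketch of the standard LDA generative story for a single document, not of model estimation: document-level topic proportions are drawn from a Dirichlet distribution, and each token is then generated by first drawing a topic and then drawing a word from that topic's word distribution. The toy vocabulary, the two topics, and all function names are assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, n_tokens, rng):
    """Generate one bag-of-words document under the LDA generative process.

    topic_word: list of K dicts mapping word -> probability within topic k.
    alpha:      Dirichlet prior over the document's topic proportions.
    Returns the topic proportions and a list of (word, topic) pairs.
    """
    theta = sample_dirichlet(alpha, rng)  # document-specific topic mixture
    tokens = []
    for _ in range(n_tokens):
        k = rng.choices(range(len(topic_word)), weights=theta)[0]
        words = list(topic_word[k])
        weights = [topic_word[k][w] for w in words]
        tokens.append((rng.choices(words, weights=weights)[0], k))
    return theta, tokens


# Two toy topics; "border" has positive probability in both, which
# illustrates the graded (mixed) membership of words in topics.
toy_topics = [
    {"asylum": 0.5, "refugee": 0.3, "border": 0.2},
    {"police": 0.5, "crime": 0.3, "border": 0.2},
]
theta, tokens = generate_document(toy_topics, [0.5, 0.5], 10, random.Random(1))
print(theta)
print(tokens)
```

Estimation runs this story in reverse: given only the observed words, the model infers the topic proportions and the topic-word distributions that are most likely to have generated them.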
As was mentioned above, unsupervised methods quantify what would otherwise be inaccessible, making the interpretive process that is always an important part of text analysis more transparent and systematic. However, unsupervised methods require post hoc operations to connect the model output to meaningful sociological concepts. Word embedding models, such as the one used by Kozlowski, Taddy, and Evans (2019), rely on vector algebra and focus on a set of manually selected keywords in order to identify interpretable dimensions of a concept. In applications that use LDA models, the standard practice employed to achieve interpretability involves qualitatively inspecting each inferred topic and making iterative decisions as to which topics are meaningful and relevant for inclusion in the final analysis (e.g., Törnberg and Törnberg, 2016; Karell and Freedman, 2019; Nelson, 2020; Czymara and van Klingeren, 2022). As a consequence, "sociologists using text as data must make a dizzying number of decisions about what information to extract and how to answer their research question" (Nelson, 2019: 139). While they are important as a result of their exploratory potential and for their links to existing qualitative methodologies, iterative mixed-method approaches such as "computational grounded theory" (Baumer et al., 2017; Nelson, 2020) or "computational hermeneutics" (Mohr, Wagner-Pacifici, and Breiger, 2015) remain reliant on making sense of the output after a model is learned (Goldenstein and Poschmann, 2019; Nelson, 2019; Pääkkönen and Ylikoski, 2021). Because the inductive finding of relevant sociological concepts places researchers at risk of also finding seemingly meaningful interpretations where none actually exist, calls have been made for the development and use of intrinsically interpretable models (Hurtado Bodell, Arvidsson, and Magnusson, 2019; Rudin, 2019; Madsen, Reddy, and Chandar, 2021).
Seeded Topic Model. We suggest an extension to the original topic model, the seeded topic model (Lu et al., 2011; Arora, Ge, and Moitra, 2012; Jagarlamudi, Daumé, and Udupa, 2012; Magnusson et al., 2018; Fan, Doshi-Velez, and Miratrix, 2019; Eshima, Imai, and Sasaki, 2024; Watanabe and Zhou, 2022; Watanabe and Baturo, 2024), as a middle ground between supervised and unsupervised approaches. The fully unsupervised nature of the original topic model does not guarantee that the topics identified will meaningfully reflect concepts of interest. By applying a simple extension of the original LDA framework, we aim to measure specific topics that we believe a priori to exist in a corpus. Seed words, a collection of words that the researcher believes represent topics of interest prior to seeing model outputs, guide the model toward the topics of interest. This extension makes the decisions that must be made during the topic definition procedure more transparent and reproducible.
Allowing researchers to seed topics on the basis of existing domain knowledge constitutes an important step toward a more deductive, insight-oriented approach to modeling that is both less reliant on post hoc interpretations of model outputs (as are required in the unsupervised approach) and not restricted to a priori manually annotated categories or manually selected keywords (as are required in the supervised approach). Instead, the seed words help form topics around predefined concepts, names, or ideas, while at the same time utilizing the functionality of LDA to find new associations in text data based on word co-occurrences.
It is important to note that there is a crucial difference between the seed word strategy used here and the use of keyword searches to identify meaningful topics and to identify documents that "belong" to or are most salient in relation to specific topics. Keyword search involves a deterministic procedure that requires detailed knowledge of the configuration of topics before models are run. Previous research shows that even domain experts perform poorly in identifying the keywords that are most relevant for capturing specific concepts (King, Lam, and Roberts, 2017). This results in biased text measures and differences in substantive conclusions. In contrast, seed words are only the starting point from which a model proceeds to learn which words go together. The unsupervised part of the algorithm will expand upon the original list of seed words in crystallizing topics of interest. We discuss the model and its implementation in detail in Supplemental Material Sections S2 and S4.
Previous contributions that have introduced seeded topic models using informative priors on preselected seed words (Lu et al., 2011; Jagarlamudi, Daumé, and Udupa, 2012; Fan, Doshi-Velez, and Miratrix, 2019; Eshima, Imai, and Sasaki, 2024; Watanabe and Baturo, 2024) relied on the standard collapsed Gibbs sampler as described in Griffiths and Steyvers (2004), limiting their applicability to large-scale data. By increasing scalability, and by using the model as a method for measuring sociological concepts, our implementation extends the existing methodological literature in important ways. Seeded topic models that are implemented via highly scalable parallelizable sampling (Magnusson et al., 2018) permit the extraction of predefined topics and their associations with other themes from massive text data. Even though we have used this highly specialized algorithm, the model estimation process based on our vast corpus took 4.5 days using a machine with 360 GB RAM and 32 cores. Without the specialized algorithm, our analysis would not have been possible. See Authors' Note for information about the code and data that reproduce our analysis.
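For readers unfamiliar with the standard collapsed Gibbs sampler, the sketch below shows its serial form for plain LDA with symmetric priors. It is meant only to illustrate why the approach scales poorly, since every token in the corpus is revisited one at a time in each sweep; it is not the parallel sampler of Magnusson et al. (2018), and it is not our implementation. The function names, the hyperparameter values, and the toy documents are assumptions made for illustration.

```python
import random
from collections import Counter


def initialize(docs, n_topics, rng):
    """Assign every token a random topic and build the count tables."""
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for word, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
    return z, doc_topic, topic_word, topic_total


def gibbs_sweep(docs, z, doc_topic, topic_word, topic_total,
                vocab_size, alpha, beta, rng):
    """One pass over all tokens, resampling each topic assignment from its
    full conditional given all other assignments."""
    n_topics = len(topic_total)
    for d, doc in enumerate(docs):
        for i, word in enumerate(doc):
            old = z[d][i]
            # Remove the current token from the counts.
            doc_topic[d][old] -= 1
            topic_word[old][word] -= 1
            topic_total[old] -= 1
            # p(z = k | rest) is proportional to
            # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_word[k][word] + beta)
                / (topic_total[k] + vocab_size * beta)
                for k in range(n_topics)
            ]
            new = rng.choices(range(n_topics), weights=weights)[0]
            z[d][i] = new
            doc_topic[d][new] += 1
            topic_word[new][word] += 1
            topic_total[new] += 1


# Toy corpus of two short documents (invented for illustration).
toy_docs = [["asylum", "refugee", "asylum"], ["budget", "tax", "budget", "asylum"]]
rng = random.Random(0)
z, doc_topic, topic_word, topic_total = initialize(toy_docs, 2, rng)
for _ in range(20):
    gibbs_sweep(toy_docs, z, doc_topic, topic_word, topic_total,
                vocab_size=4, alpha=0.1, beta=0.01, rng=rng)
print(doc_topic)
```

Because each update depends on counts that the previous update has just changed, the loop is inherently sequential, which is the bottleneck that parallel samplers are designed to remove.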
Seeding the Immigration Topic. Seeded topic models rely on Bayesian informative priors to decide which topics the algorithm should identify. In practice, informative priors are placed on the topic-word distribution such that a word used to guide the model has a zero probability of belonging to any other topic than the one for which it is a seed word. The seed words one uses to guide the model should be highly unlikely to occur in contexts outside the topic of interest (in our case, immigration). We use five types of words that are highly unlikely to be used in texts that do not relate to immigration: (i) names of immigration laws, (ii) titles of ministers responsible for immigration, (iii) names of agencies responsible for immigration, (iv) terms referring to related policy areas (e.g., integration policy), and (v) terms referring to different types of immigration (e.g., labor migration). Moving beyond the predefined seed words, the model learns other meaningful words that define the topic of interest. Among these, we find words that relate, for example, to race and ethnicity, such as names and slurs associated with minorities in Sweden (see Supplemental Material Section S7 for details). Our choice of seed words allows us to capture different dimensions of the immigration issue including, for example, discourses on different types of migrants such as refugees, asylum seekers, and labor migrants.
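The zero-probability constraint on seed words can be sketched as a prior over the topic-word distribution. The code below is a minimal illustration under stated assumptions, not our implementation: a seed word keeps positive prior mass only in the topic (or topics) it seeds and receives zero mass elsewhere, while all other words keep the same symmetric prior in every topic. The function name, the base prior value, and the toy vocabulary are hypothetical; the full seed list is given in the Supplemental Material and is not reproduced here.

```python
def build_seeded_prior(vocab, seeds_by_topic, n_topics, base=0.01):
    """Return a list of K dicts mapping word -> prior mass in topic k.

    seeds_by_topic: dict mapping a topic index to its list of seed words.
    A seed word gets prior mass `base` in the topics it seeds and 0.0 in all
    other topics; unseeded words get `base` everywhere. `base` is a
    hypothetical value chosen for illustration.
    """
    seeded_in = {}
    for k, words in seeds_by_topic.items():
        for w in words:
            seeded_in.setdefault(w, set()).add(k)

    prior = []
    for k in range(n_topics):
        row = {}
        for w in vocab:
            allowed = seeded_in.get(w)
            row[w] = base if allowed is None or k in allowed else 0.0
        prior.append(row)
    return prior


# Toy setup: topic 0 is the seeded immigration topic, topics 1 and 2 are free.
# The two agency-name tokens stand in for an older and a newer term that are
# seeded into the same topic.
toy_vocab = ["migrationsverket", "invandrarverk", "work", "police"]
toy_seeds = {0: ["migrationsverket", "invandrarverk"]}
prior = build_seeded_prior(toy_vocab, toy_seeds, n_topics=3)
for k, row in enumerate(prior):
    print(k, row)
```

Because unseeded words such as "work" retain positive prior mass in the seeded topic, the topic can still grow beyond its seed list during estimation.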
Seeding also allows the model to be infused with a priori knowledge of language change. Conceptually, actors, meanings, and contexts change over time, which implies that no single measure of discourse may be appropriate over long timescales. Lexical shifts and the changing meanings of social categorizations are critical challenges to the computational analysis of historical text (Bail, 2014; Rule, Cointet, and Bearman, 2015; Bonikowski, Luo, and Stuhler, 2022; Voyer et al., 2022). The word "immigrant," for example, was rarely used prior to the 1970s ("foreigner" was the term of the day), and concepts such as "family reunification" and "unaccompanied minor" first appeared in the 1970s and 1990s, respectively. We implement the semi-supervised seeded topic model using domain knowledge to guide the model estimation over language changes that introduce new words to discuss the same topic. Topic seeding is best equipped to handle this type of language change that, in a standard modeling approach, would lead to the splitting of a theme into various topics. A previous name of the current Migration Agency (Migrationsverket), for example, was Statens Invandrarverk. By placing a prior on multiple words to inform the model that they belong to the same topic, we allow the immigration topic to crystallize around both these names (see Supplemental Material Section S2 for details on the seeding procedure and S3 for a full list of the seed words employed).
We measure the salience of the immigration topic (Figure 1B) by calculating the proportion of words in all documents that are estimated to belong to the seeded immigration topic each week.
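As a rough sketch of this salience measure, assume that each document comes with a week label and one estimated topic assignment per token. The data layout, the week labels, and the function name `weekly_salience` are assumptions made for illustration.

```python
from collections import defaultdict


def weekly_salience(docs, topic=0):
    """Share of all tokens per week that are assigned to `topic`.

    docs: iterable of (week, assignments), where assignments is a list
    holding one topic id per token (hypothetical data layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for week, assignments in docs:
        totals[week] += len(assignments)
        hits[week] += sum(1 for z in assignments if z == topic)
    return {week: hits[week] / totals[week] for week in totals if totals[week] > 0}


# Made-up toy data: topic 0 plays the role of the seeded immigration topic.
docs = [
    ("1970-W01", [0, 5, 5, 7]),
    ("1970-W01", [3, 3, 3, 3]),
    ("1970-W02", [0, 0, 2, 9]),
]
salience = weekly_salience(docs)
```

Pooling tokens across all documents of a week, rather than averaging per-document shares, keeps long articles from being underweighted.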
Co-occurring Topics as Interpretative Frames. The seeding strategy also permits us to define a set of additional topics that meaningfully co-occur with immigration and that we wish to flesh out from the media discourse as potential interpretations of immigration. We operationalize prominent media frames via the focal topic's associations with other frequently co-occurring topics, and we interpret these relationships as culturally shared associations between concepts. This implies that we abstract away from word-level analyses, such as keyword in context, and instead, focus on how topics (rather than words) co-occur. In our analysis, it is not crucial whether the word "immigrant" is discussed alongside words such as "workplace" or "murder"; what matters instead is the association of the immigration topic with the economy topic and the crime topic, respectively.
We have predefined co-occurring topics on the basis of existing research on the common themes found in European news reporting on immigration (Korkut et al., 2013; Greussing and Boomgaarden, 2017; Eberl et al., 2018; Heidenreich et al., 2019) and research documenting Sweden's immigration history (Geddes and Scholten, 2016; Byström and Frohnert, 2017; Krzyżanowski, 2018; Andersson et al., 2010). Based on this research, we expect five dominant frames ("culture," "economy," "human rights," "politics," and "security") to co-occur with discussions of immigration. We capture each frame that represents a known interpretation of immigration by seeding several topics (Table 2). We seed multiple topics to capture each frame such that an interpretative frame can be viewed as a "supratopic" covering different dimensions of a related issue. For example, "crime," which constitutes part of the security frame, is a highly diverse issue that includes a focus on offenses such as burglary, narcotics, murder, and sexual assault, to name only a few. To capture the many different crime-related aspects, we seed four different topics using the same set of seed words (see Supplemental Material Sections S2 and S3 for details). By seeding different topics with the same words we allow the model to crystallize around particular dimensions of a broader theme of interest in separate topics without explicitly having to choose these dimensions a priori. For example, while we know that "crime" is a multi-dimensional theme in our corpus (e.g., news covering different types of crimes at different phases in an investigation will be defined by different vocabularies), we let the model inductively find which type and aspect of crime should form a particular topic. One seeded topic then becomes a drug topic, for example, one becomes a homicide topic, and so on, and these are then combined into the larger topic of crime.
This procedure allows the model to identify more specialized topics which, depending on the research question, can then be combined into a well-defined larger topic. We set the number of topics to 1,000, allowing for a combination of seeded and unseeded topics in the model.
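The aggregation of specialized topics into a frame-level supratopic can be sketched as follows. The topic labels, the token counts, and the helper `frame_shares` are hypothetical; the sketch only illustrates the idea of summing the token shares of the seeded topics that make up a frame.

```python
def frame_shares(topic_counts, frame_topics):
    """Sum token counts of specialized topics into frame-level shares.

    topic_counts: {topic label: number of tokens assigned to it}
    frame_topics: {frame name: list of topic labels forming the supratopic}
    """
    total = sum(topic_counts.values())
    if total == 0:
        raise ValueError("no tokens to aggregate")
    return {
        frame: sum(topic_counts.get(t, 0) for t in topics) / total
        for frame, topics in frame_topics.items()
    }


# Four crime topics seeded with the same words plus terrorism form "security".
frame_topics = {
    "security": ["crime_1", "crime_2", "crime_3", "crime_4", "terrorism"],
    "economy": ["labor_market", "public_finance"],
}
# Made-up token counts; "unseeded_17" stands for any unseeded topic.
topic_counts = {
    "crime_1": 10, "crime_2": 5, "crime_3": 0, "crime_4": 5,
    "terrorism": 10, "labor_market": 30, "public_finance": 20,
    "unseeded_17": 20,
}
shares = frame_shares(topic_counts, frame_topics)
```

Unseeded topics stay in the denominator, so frame shares need not sum to one.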
Unlike previous research, we quantify interpretative frames using co-occurrence frequencies for different topics that are inferred from the same topic model that simultaneously measures the focal topic of interest. We measure the importance of each frame (Figure 2) in terms of the proportion of words that belong to the respective seeded topics in immigration-rich documents printed in the newspapers (see Supplemental Material Section S5).
Note (Table 2): We capture each interpretative frame as a supratopic composed of several specialized seeded topics that frequently co-occur with the immigration topic.
Document Inclusion, Sensitivity, and Validation. The analysis includes all documents that we classified as "immigration-rich," that is, documents in which at least 2.5% of tokens were estimated to belong to the immigration topic (i.e., at least 25 times the a priori expected proportion, which is 1/1,000 or 0.1%, where 1,000 represents the number of topics used in the model). One could argue that a news item containing even a single token related to immigration should count as belonging to the immigration topic. However, immigration or immigrants are often mentioned only once in an article, for example, as one of many policy areas. To establish a useful threshold for document inclusion for the entire observation period, we have read samples of the material at different threshold values and evaluated when the topic of immigration indeed was central to the news items; we settled on what we considered a good trade-off between keeping a reasonable number of documents and keeping the analysis centered around the focal theme of interest. Our main results are robust to threshold choice (see Supplemental Material Section S6). We report on model diagnostics and sensitivity analyses in Supplemental Material Section S6, including (i) a test for model convergence as well as model re-runs (ii) using alternative numbers of topics (950 and 1,500), (iii) using each newspaper corpus separately, (iv) using alternative thresholds for document inclusion (1%, 4%, and 5%), and (v) using random subsets of 90%, 80%, and 70% of the original set of seed words.
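The inclusion rule can be written down directly. The sketch below assumes one estimated topic id per token and uses hypothetical names (`is_immigration_rich`, `N_TOPICS`, `THRESHOLD`); only the 2.5% threshold and the 1,000 topics come from the text.

```python
N_TOPICS = 1000
THRESHOLD = 0.025  # 2.5% of a document's tokens


def is_immigration_rich(assignments, topic=0, threshold=THRESHOLD):
    """True if at least `threshold` of the tokens are assigned to `topic`."""
    if not assignments:
        return False
    share = sum(1 for z in assignments if z == topic) / len(assignments)
    return share >= threshold


# The threshold relative to the uniform a priori share of 1/1,000.
ratio = THRESHOLD / (1 / N_TOPICS)

# Made-up documents: 2 of 40 tokens (5%) versus 1 of 100 tokens (1%).
doc_a = [0, 0] + [7] * 38
doc_b = [0] + [7] * 99
```

Here `doc_a` passes the rule and `doc_b` does not; re-running the same function with thresholds of 0.01, 0.04, and 0.05 would mirror the alternative thresholds mentioned above.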
In Supplemental Material Section S7, we report on validation strategies for topic definition that evaluate the degree to which a seeded topic captures the concept of interest. Those strategies include (i) a comparison of documents classified as being about immigration with a manual annotation of a sample of documents, (ii) an inspection of the tokens that the algorithm learned to belong to the topic, and (iii) an analysis of influential immigration-related events based on high temporal resolution data. The latter analysis tests whether the model picks up on immediate changes in newspapers' framing following such events. We focus on events for which clear theoretical expectations exist about their likely impact on the salience of a particular seeded frame. An Islamist terrorist attack, for example, may serve to re-frame Islam as a violent ideology, leading to revisions of the current security-related interpretations of immigration (Greenberg, Pyszczynski, and Solomon, 1986; Legewie, 2013; Schmidt-Catran and Czymara, 2020). In this case, we would expect the relative salience of the security-related frame to increase in the weeks following the attack, which would indicate valid topic seeding.
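The logic of the event-based check can be illustrated with a simple before/after comparison. This is a deliberately crude sketch and not the analysis reported in Section S7: the function `salience_shift`, the four-week window, and the weekly values are all assumptions.

```python
from statistics import mean


def salience_shift(series, event_index, window=4):
    """Mean frame salience in the `window` weeks from the event onward,
    minus the mean in the `window` weeks before it."""
    before = series[max(0, event_index - window):event_index]
    after = series[event_index:event_index + window]
    if not before or not after:
        raise ValueError("need observations on both sides of the event")
    return mean(after) - mean(before)


# Made-up weekly relative salience of a security frame; event in week 4.
security = [0.10, 0.12, 0.11, 0.09, 0.20, 0.22, 0.18, 0.20]
shift = salience_shift(security, event_index=4)
```

A positive shift for the frame most closely tied to the event, and no comparable shift for unrelated frames, would be in line with the expectation stated above.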
Parsing Discursive Eras. We use a Bayesian Gaussian change-point model (Barry and Hartigan, 1993; Erdman and Emerson, 2007) to detect shifts over time in the salience of single frames as well as in the relative composition of salient frames. We interpret salience shifts as breakpoints in the media's framing of immigration. The model assumes that a time series of frame salience can be partitioned into an unknown number of periods, with each period having a constant mean reflecting a "new probability regime" (Abbott, 2001). We estimate two kinds of specifications of the change-point model: (i) a univariate specification that tests for breakpoints in the salience of each of the five seeded frames separately, and (ii) a combined multivariate specification that tests for breakpoints in the relative composition of all five seeded frames. We are particularly interested in the multivariate model results. The composition of salient frames at a certain point in time aggregates into what we refer to as the shared interpretation of immigration communicated by the media. A shared interpretation describes a set of frames that are available to the public at a given point in time to make sense of an issue. The estimates of the change-point model provide an empirical foundation for the parsing of discursive periods (Rule, Cointet, and Bearman, 2015) in which meaning-making measurably differed.
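The piecewise-constant assumption can be written compactly. The notation below is ours and serves only as a sketch of the univariate case: y_t is the salience of a frame in year t, n is the number of years, the partition ρ splits the series into b blocks, and π_t is the posterior quantity of interest. In the multivariate specification, y_t would be the vector of all five frame saliences.

```latex
\begin{align}
  \rho &= (0 = t_0 < t_1 < \dots < t_b = n), \\
  y_t \mid \mu_k &\sim \mathcal{N}(\mu_k, \sigma^2)
    \quad \text{for } t_{k-1} < t \le t_k,\; k = 1, \dots, b, \\
  \pi_t &= \Pr(t \text{ is a change point} \mid y_1, \dots, y_n).
\end{align}
```

Both the number of blocks b and their boundaries are unknown and are averaged over in the posterior, so π_t summarizes the evidence for a break at year t across all candidate partitions.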
The model, regardless of its specification as univariate or multivariate, estimates the posterior probability that each year constitutes a change point, delimiting sharp differences in the means of the respective time series in adjacent periods. That is, the model estimates, for each of the 75 years included in the data, the probability that a significant shift has occurred in the way the newspapers frame immigration. We use a standard implementation of the model (Erdman and Emerson, 2007), and we set the model's hyperparameter γ to its default value 0.3, which reflects the absence of a priori knowledge as to how many change points the model should identify. We interpret years that have a multivariate posterior change-point probability equal to or larger than 50% as consequential turning points that mark the beginning of a new era of discourse. For most years, the estimated change-point probabilities are close to 0 (see Figure 2B). Our choice of a ≥50% threshold is non-exclusive and merely requires a change-point year to have a higher likelihood of representing a turning point than of not doing so.
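Given yearly posterior change-point probabilities, the ≥50% rule translates into eras as sketched below. The function `parse_eras`, the years, and the probabilities are made up for illustration; the sketch starts a new era at every year at or above the threshold and does not reproduce any additional judgment applied to adjacent high-probability years.

```python
def parse_eras(years, probs, threshold=0.5):
    """Split `years` into eras; a year with prob >= threshold starts a new era.

    The first year always opens the first era, so its probability is ignored.
    """
    eras = []
    start = years[0]
    for year, p in zip(years[1:], probs[1:]):
        if p >= threshold:
            eras.append((start, year - 1))
            start = year
    eras.append((start, years[-1]))
    return eras


# Hypothetical ten-year series with two years above the threshold.
years = list(range(1900, 1910))
probs = [0.0, 0.02, 0.01, 0.6, 0.05, 0.0, 0.0, 0.9, 0.1, 0.0]
eras = parse_eras(years, probs)
# eras == [(1900, 1902), (1903, 1906), (1907, 1909)]
```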
Results
Figure 1B traces the relative salience of the seeded immigration topic in Sweden's newspaper corpus from 1945 to 2019. The blue line represents the annual average salience of immigration and shows how important this issue was in the media. Prior to the first major peak in the number of immigrants in 1970, the level of media attention focused on immigration was low. On average, 0.05% of tokens in the newspapers referred to it. By contrast, from 2015 to 2019, the salience of immigration as a news issue reached 0.37%, a 7.4-fold increase vis-à-vis the first period. Both the actual number of immigrants arriving in Sweden (Figure 1A) and the importance of the immigration topic in newspaper coverage (Figure 1B) reached unprecedented heights in 2015. The year of the European "refugee crisis" represents a clear disruption in terms of the salience of immigration. Salience also spiked during 1969-1970, which were years of high labor migration, and during the armed conflicts in Iraq (1990-1991, 2003-2011) and Bosnia (1992-1995), which resulted in many refugees arriving in Sweden. The linear correlation between the annual number of newly arrived immigrants and the salience of the immigration topic is 0.82 for the entire period examined; this correlation increases to 0.93 from 2010 to 2019. These results show that the attention of the media shifts to immigration in periods of peak influx, particularly if immigrant numbers increase rapidly.
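The two summary statistics in this paragraph, the fold increase and the linear (Pearson) correlation, can be computed as sketched here. The arrival and salience series are invented toy values, not the study's data; only the shares 0.05% and 0.37% are taken from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Fold increase between the two salience levels reported in the text (in %).
fold = 0.37 / 0.05

# Made-up annual series: newly arrived immigrants and topic salience (in %).
arrivals = [20_000, 35_000, 30_000, 60_000, 110_000]
salience = [0.05, 0.09, 0.08, 0.15, 0.37]
r = pearson(arrivals, salience)
```

The coefficient captures only linear co-movement of the two annual series and says nothing about which one leads the other.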
Relative topic salience provides an important measure of what has been discussed at different times. It does not reveal, however, how issues have been covered and thought about. To answer the second question, we trace the salience of the different immigration frames shared by the media. Figure 2A maps the co-evolution of different interpretations of immigration, plotting the salience proportion of each of the five seeded frames over 75 years. Figure 2B provides estimates of likely change points for each frame (colored lines for univariate models) and in the composition of the different frames (black line for the multivariate model). When parsing discursive eras, our primary interest lies in the detection of measurable shifts in the composition of frames. From the multivariate change-point model, we infer seven recognizable eras (with an average length of 10.7±2.6 years) between which the media's interpretation of immigration measurably differed.
Period 1, 1945-1954. Immediately following the war, the media discourse portrayed immigration mainly from a humanitarian perspective (Figure 2A). As this association became less prominent, we find likely univariate change points in the humanitarian interpretation and, to a lesser degree, in the cultural interpretation of immigration during the late 1940s and early 1950s (Figure 2B).
Period 2, 1955-1964. We estimate the first turning point, with a 96% posterior probability in the multivariate model, as occurring in 1955. This year was characterized by a surge of labor migration to Sweden. At the end of the second period, in the mid-1960s, the association between immigration and the economy had caught up with the humanitarian perspective. Both inferred periods 1 and 2 of post-war immigration align with historical accounts that partition Sweden's immigration history on the basis of immigration flows and policy changes (Geddes and Scholten, 2016; Byström and Frohnert, 2017; Krzyżanowski, 2018; Kupsky, 2017; Andersson et al., 2010; Svanberg and Tydén, 1998).
Period 3, 1965-1973. Our model identifies a period of rupture in the mid-1960s, which coincides with the first discussions of multiculturalism (1964) and investigations into the costs of immigration for the expanding welfare state (1965). In the immediate aftermath of these discussions and investigations, the dominant interpretation of immigration became economic, and a cultural framing gained importance. These ruptures, with multivariate change-point probabilities of 95% in 1964 and 70% in 1966, mark the beginning of a long era of relative stability in the associative patterns. Rapid economic growth and the political hegemony of the Social Democratic party resulted in the roll-out of the welfare state, which was extended in 1968 to cover migrant workers, and a newly established migration board was tasked with overseeing their employability. Again, the inferred period is largely in alignment with the narrative presented by historical social science (Byström and Frohnert, 2017; Krzyżanowski, 2018).
Period 4, 1974-1985. We infer turning points in 1974 (70%) and 1986 (77%). Labor migration declined during the economic crises of the 1970s and was increasingly replaced by immigration involving non-European refugees. The univariate breakpoint for culture in 1984 coincides with the arrival of increasing numbers of non-Western refugees, discussions of legislation against ethnic discrimination, and increased efforts focused on integration, including family reunification (Byström and Frohnert, 2017; Andersson et al., 2010).
Period 5, 1986-1999. The year 1986 is when the Swedish Prime Minister, Olof Palme, was murdered. Spearheaded by Palme's governments (1969-1976, 1982-1986), immigration law had embraced multicultural ideals, affirming diversity and the protection of immigrants' cultural identities. Despite the turning point identified in 1986, the media framing of immigration remained remarkably stable across periods 4 and 5, and we interpret the interval 1974-1999 as representing Sweden's famed era of tolerance (Schierup and Ålund, 2011; Rydgren and van der Meiden, 2019), during which an inert mix of economic, humanitarian, and security-related frames shaped the interpretation of migration for almost a generation. This interpretation weathered economic downturns, peaks in immigration, and Sweden's accession to the EU in 1995, and remained dominant until the end of the 1990s, which is much longer than the historical narrative suggests (Dahlström, 2004; Byström and Frohnert, 2017; Svanberg and Tydén, 1998). At the same time, the turning points we identify in this era are disproportionately driven by an increase in a new, politically polarized understanding of immigration. Notably, this upward trend in the politicization of immigration precedes the electoral success of populist far-right parties and the decline in the Social Democratic consensus that have characterized Swedish policy debates in recent decades (Dahlström, 2004; Byström and Frohnert, 2017).
Period 6, 2000-2012. Our analysis identifies the year 2000 as a consequential turning point (84%) driven by politicization. This was a year of revisions to immigration law, as the EU started to harmonize its immigration policies in the lead-up to the Schengen agreement (2001), which led to an increase in the number of migrant workers arriving in Sweden from the eastern countries of the EU. We find that a further convergence of media frames and, ultimately, their gradual replacement by politics as the dominant lens through which immigration is viewed, coincided with the populist right Sweden Democrats' entry into parliament in 2010. The Sweden Democrats have since become the country's second-largest party in national elections. Several years are associated with non-zero change-point probabilities for specific frames, but none of these are particularly pronounced and we do not find them to be sufficiently consequential to register in the model as having altered the interpretation of immigration. Throughout this period, and despite the September 11 attacks and the subsequent US-led "war on terror," the association between Swedish immigration and security issues remained flat.
Period 7, 2013-today. The final turning point that we estimate to lie above the 50% threshold (51%) occurred in 2013. This disruption, which is less clear than those described above, marks the beginning of the most recent discursive era. This period included generous revisions of asylum law. At the same time, the consensual migration politics of past decades, which some have argued cemented an "opinion corridor" of views perceived as socially acceptable (Ekengren Oscarsson, 2013), were increasingly being criticized in society at large. This period reflects a further politicization of the immigration discourse, a surge in a security-related interpretation, and probably also the end of Sweden's "exceptionalism" (Schierup and Ålund, 2011; Rydgren and van der Meiden, 2019) as regards the country's tolerant approach to immigration. Our results indicate that this reinterpretation of immigration started well before the 2014 general election (in which the Sweden Democrats doubled their number of seats in parliament) and, most importantly, before the 2015 "refugee crisis." Neither of these years was sufficiently consequential to register in our change-point model. Strikingly, we instead see that the 2015 "refugee crisis," which many observers have classified as a watershed in European immigration history, was of little consequence for the ways in which the Swedish media have portrayed immigration.
In Supplemental Material Section S6, we report these results separately per newspaper. We find that the framing of immigration over time varies little between newspapers of different political orientations or between highbrow broadsheets (Dagens Nyheter, Svenska Dagbladet) and lowbrow tabloids (Aftonbladet, Expressen). These separate analyses closely reproduce the findings of the main analysis presented here.
Discussion
We have argued that the seeded (or constrained) topic model constitutes a promising semi-supervised method, combining both inductive and deductive reasoning, that provides a more replicable and transparent means of measuring meaning in digital text. Semi-supervised methods can improve transparency and replicability by decreasing the number of idiosyncratic decisions made during model implementation. Importantly, the seeded topic model permits a theoretical grounding of the topic definition procedure, because seed words require researchers to be explicit about how concepts are operationalized, and these constraints ensure that the model will identify the same concepts in each model run. This approach represents an advance in relation to concerns about whether computationally identified patterns can provide replicable and interpretable empirical evidence that is relevant to social science research. The seeding procedure allows researchers to tame the unsupervised nature of the topic model by guiding the model in its detection of topics, but without predetermining the full vocabulary associated with the topics identified. We have demonstrated the applicability of one specific algorithm to the task of identifying predefined, sociologically relevant concepts in texts and inferring the associations that exist between these concepts.
Model performance should be validated to ensure that the seeded topics represent the concepts of interest, and model validation still requires subjective interpretations of topic quality. To be sure, choosing seed words may be an iterative process, based on interpretations of model outputs and allowing previously unknown patterns to arise from the data. Such iterative processes are essential in most research that employs computational text analysis (Grimmer, Roberts, and Stewart, 2022), and as Mohr and colleagues have noted, "there can be no measurement of culture without interpretation" (Mohr et al., 2020: 4). Against this backdrop, we have taken important steps toward a more principled interpretation of topic models. First, identifying both a focal concept and its neighboring topics in a single estimation (instead of first identifying the relevant documents that contain the focal concept and then searching for other concepts within these documents) ensures that the analysis is less reliant on early operationalization decisions. One-step procedures are particularly important for producing reliable measures of meaning-making over long timescales, where two-step measures may be affected by language change.
Second, seeding facilitates diagnostics of model performance, something that is typically difficult in purely unsupervised settings (Chang et al., 2009; Ying, Montgomery, and Stewart, 2022). The semi-supervised nature of the model allows us to restrict validation efforts to the seeded topics. This is particularly important because there are currently no standards regarding how topic models should best be evaluated when used in sociological research. In the Appendix (Supplemental Material Section S7), we suggest various measures that will assist in inspecting the quality of seeded topics, and we found a high level of correspondence when we compared a manually coded sample of documents with documents inferred by the model to belong to a seeded topic. Additionally, we have checked the sensitivity of our results regarding the number of topics, seed word selection, and different thresholds for document inclusion (Supplemental Material Section S6).
In a supplementary analysis also reported in Supplemental Material Section S7, we provide suggestive evidence that unforeseen and widely recognized events have the capacity to measurably shift the salience of certain media frames. These results illustrate another validation strategy that tests whether the model picks up on shifts in the salience of the frame most closely related to the event in question. The results lend support to the validity of our semi-supervised inference of interpretative frames, and they provide pointers to the immediate response of newspapers to disruptive events. The event-focused analysis of high temporal resolution data also illustrates how, under certain assumptions, latent features of text data can be used as the outcome variable when estimating causal effects (Egami et al., 2022; Gencoglu and Gruber, 2020).
Of course, seeded topic models also have their own limitations. Current applications of the original topic model focus on discovering previously unknown patterns in text data (Grimmer, Roberts, and Stewart, 2022). The seeding of topics places bounds on an open discovery process. One solution (which we followed in our case study) involves allowing for a combination of seeded and unseeded topics in the model such that unexpected signals in the data can still be detected and explored. The applicability of the seeded topic model depends on how well researchers can operationalize a theoretical concept via one or more topics. A seeded topic model can easily identify some concepts, depending on the availability of unique words associated with the theme of interest. Other concepts are nearly impossible to pin down, however. For example, the model will struggle to capture a topic that is mostly defined by polysemic words, i.e., words with different possible meanings. To tackle issues with polysemy, researchers can seed multiple topics with the same words (as we did, for example, for the multifaceted crime topic) and thereby rely on the model to inductively capture their different meanings. While this may solve issues related to polysemy, it also decreases the replicability of the model. Therefore, finding non-polysemic words to crystallize interpretable topics of interest poses an important scope condition and, in some potential use cases, a roadblock to making full use of the seeded topic model. At the same time, however, vague and multifaceted themes that are difficult to identify using a seeded topic model may also present challenges to supervised methods that require human annotation.
Large language models (LLMs), which increasingly find their way into social science publications, also blur the line between supervised and unsupervised learning. LLMs have shown great capacity in a vast array of classification tasks (Do, Ollion, and Shen, 2022; Widmann and Wich, 2023; Bonikowski, Luo, and Stuhler, 2022; Chae and Davidson, 2023; Gilardi, Alizadeh, and Kubli, 2023; Törnberg, 2023), although current models' performance is still under debate (e.g., Ollion et al., 2024; Bail, 2024), especially in classification tasks that require cross-document reasoning as in topic modeling and when texts pertain to a particular place and time as in historical corpora (Ziems et al., 2024). The development of LLMs proceeds at an extremely fast pace. Decreasing costs will open them up for analyses of very large corpora, and ideas of identifying, in principled ways, concepts predefined by the researcher will hopefully guide some of the modeling advances. If researchers find ways to gain more control over labeling, replicability, and transparency (Grossmann et al., 2023), this transformative brand of text modeling will be in a good position to develop important alternatives to the seeded topic model.
We have applied the seeded topic model to a vast newspaper archive to learn how the issue of immigration has been framed in Swedish newspapers from 1945 to 2019. The storytelling of journalists (their use of interpretative frames to make news events understandable to their audiences) makes newspaper archives a treasure trove for the study of meaning-making over historical timescales. We have operationalized frames as themes that frequently co-occur with the issue of interest, and we have interpreted these relationships as culturally relevant associations between concepts. Hence, we have also studied newspaper coverage as a social sensor of discursive processes (Fiss and Hirsch, 2005; Gamson and Modigliani, 1989) in which broader interpretations of societal developments and events are generated, negotiated, and revised (Swidler, 1986; Bourdieu, 1991; Strauss and Quinn, 1997). Viewing text as a social sensor involves the use of large repositories of digital text to uncover latent observations about the social world and trends in contemporary societies in particular.
Some have argued that media content reflects elite discourses and that a media sensor can capture "common cultural patterns, but it cannot observe what is never articulated" (Bonikowski, 2016). We recognize that media-generated perceptions of current events do not equate to the perceptions of the whole population, especially not with regard to polarized "hot" topics and in the age of social media. We have not measured meaning at the individual level, and we have not delineated different "thought communities," although they no doubt exist, particularly in a politicized domain such as immigration. One example would be that different segments of society may have different groups in mind when they think about immigrants (Blinder, 2015; Eberl et al., 2018). Still, our case study has demonstrated that vast corpora of the type and scale studied here are likely to contain important evidence of the dominant interpretative frames (in the sense of "common cultural patterns") that have been used to make sense of societal issues at a certain point in time. We believe that using such sensors may have general implications for sociological research in light of the increasing availability of "found" online data (e.g., Keuschnigg, Lovsjö, and Hedström, 2018; Salganik, 2018; Jarvis, Keuschnigg, and Hedström, 2021).
We have highlighted the induction of different eras of meaning-making as a potential means of analyzing the output of seeded topic models, offering a refined empirical foundation for the parsing of "discursive periods" during which specific interpretations of an issue are widely shared. Historians often define "eras" of social change on the basis of policy shifts (Ermakoff, 2019), and, for immigration history, many have viewed key revisions of immigration law as turning points demarcating different eras (Andersson et al., 2010; Geddes and Scholten, 2016). However, historical narratives that partition the flow of events into coherent, meaningful sequences (Stone, 1979; Sewell, 1996) have been criticized for their lack of explanatory depth and, in particular, for involving a risk that spurious events will be identified as marking the beginning and end of posited periods (Popper, 1957; Griffin, 1992). Our study exemplifies that digital archives offer new opportunities for the identification of turning points and for delineating discursive periods on the basis of the ideas expressed by contemporaries (Bearman, 2015; Rule, Cointet, and Bearman, 2015; Garg et al., 2018).
Our measures of media framing are in close alignment with the type of immigration experienced in post-war Sweden until the mid-1970s. The inferred discursive periods match those implied by historical accounts that have partitioned Sweden's immigration history on the basis of policy changes (Andersson et al., 2010; Geddes and Scholten, 2016; Kupsky, 2017). We found that the texts from the late 1970s and early 1980s best describe the country's signature era of multiculturalism and tolerance toward immigration. Different frames achieved similar salience, indicating a new pluralism in how immigration has been discussed. Weathering economic downturns and peaks in immigration, this era lasted until the end of the 1990s, and thus much longer than historical accounts have suggested (Dahlström, 2004; Svanberg and Tydén, 1998). At the same time, we found that the media began framing immigration as a political issue as early as the mid-1970s, long before anti-immigration platforms started attracting larger audiences and before the parliamentary consensus on immigration eroded in the mid to late 1980s (Byström and Frohnert, 2017). As the political framing of immigration gained momentum, we were once again able to see a more unidimensional discussion of migration, now as a strongly politicized issue.
We have also found that seemingly obvious turning points (such as the economic downturns of the 1970s and 1990s, and the "refugee crisis" of 2015) had few consequences for the frames used by the news media to portray immigration in Sweden. However, the public might frame things differently from the mainstream media, and future research is therefore needed to examine how broader segments of society, e.g., the online public, react to highly publicized events.
To conclude, seeded topic modeling provides a means whereby researchers can rely on sociological knowledge when implementing and validating replicable models that make inferences beyond the words on the page. Semi-supervised approaches of this kind could become an important next step toward further improving the work of social scientists in their computational analysis of social data.
Figure 1 .
1Figure 2 .
Table 1.
| | Aftonbladet | Dagens Nyheter | Expressen | Svenska Dagbladet |
|---|---|---|---|---|
| Newspaper type | Tabloid | Broadsheet | Tabloid | Broadsheet |
| Political leaning | Left | Moderate | Moderate | Right |
| Founding year | 1830 | 1864 | 1944 | 1884 |
| Avg. daily paid circulation | 343,595 | 377,870 | 417,653 | 166,426 |
| # documents (in millions) | 7.20 | 6.86 | 7.89 | 7.36 |
| Tokens (in millions) | 338.5 | 427.9 | 338.8 | 455.7 |
| Avg. # tokens per doc | 47.3 | 44.0 | 61.1 | 44.1 |
| # Immigration-rich docs | 86,070 | 117,876 | 90,261 | 112,844 |

Note: Average daily paid circulation refers to 1945–2018, tokens refers to number of words, and we classified documents as immigration-rich if at least 2.5% of their tokens belong to the estimated immigration topic.
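The classification rule stated in the note to Table 1 can be illustrated with a short sketch. This is not the authors' code: it assumes that each token of a document has already been assigned a topic label by a fitted model, and the label `"immigration"` and both function names are hypothetical choices made for illustration.

```python
from typing import Sequence

# Hypothetical label for the seeded immigration topic (an assumption, not from the paper).
IMMIGRATION_TOPIC = "immigration"
# 2.5% of tokens, the cutoff given in the note to Table 1.
THRESHOLD = 0.025


def immigration_share(token_topics: Sequence[str], topic: str = IMMIGRATION_TOPIC) -> float:
    """Return the fraction of a document's tokens assigned to `topic`.

    An empty document yields 0.0 so that it can never pass the threshold.
    """
    if not token_topics:
        return 0.0
    return sum(t == topic for t in token_topics) / len(token_topics)


def is_immigration_rich(
    token_topics: Sequence[str],
    topic: str = IMMIGRATION_TOPIC,
    threshold: float = THRESHOLD,
) -> bool:
    """Apply the 'at least 2.5% of tokens' rule to a single document."""
    return immigration_share(token_topics, topic) >= threshold
```

Under these assumptions, a 100-token document with three tokens assigned to the immigration topic would count as immigration-rich, whereas one with only two such tokens would not.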
Table 2.
| Interpretative frame | Seeded topics |
|---|---|
| Culture | Diversity perspectives, language, national identity, religion |
| Economy | Labor market, public finance, health care, housing, education |
| Human rights | Discrimination, family, human rights, racism |
| Politics | Political parties, European Union |
| Security | Crime, terrorism |
References
- On the Concept of Turning Point A Abbott Comparative Social Research 16 1997
- Time Matters: On Theory and Method A Abbott 2001 University of Chicago Press Chicago, IL, USA
- Immigration, Housing and Segregation in the Nordic Welfare States R Andersson H Dhalmann E Holmqvist T M Kauppinen L Magnusson Turner H Skifter Andersen S Søholt M Vaattovaara K Vilkama T Wessel 2010 Department of Geosciences and Geography Finland Helsinki University
- Making the News K T Andrews N Caren American Sociological Review 75 6 2010 doi:10.1177/0003122410386689
- A Topic Modeling Guided Approach for Semantic Knowledge Discovery in e-Commerce V S Anoop S Asharaf International Journal of Interactive Multimedia and Artificial Intelligence 4 6 2017 doi:10.9781/ijimai.2017.03.014
- Learning Topic Models-Going Beyond SVD S Arora R Ge A Moitra Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science New Brunswick, New Jersey IEEE 2012
- Integrating Topic Modeling and Word Embedding to Characterize Violent Deaths A Arseniev-Koehler S D Cochran V M Mays K.-W Chang J G Foster Proceedings of the National Academy of Sciences 119 10 2022
- Machine Learning as a Model for Cultural Learning: Teaching an Algorithm What It Means to Be Fat A Arseniev-Koehler J G Foster Sociological Methods & Research 51 4 2022
- The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse About Islam Since the September 11th Attacks C A Bail American Sociological Review 77 6 2012
- The Cultural Environment: Measuring Culture With Big Data C A Bail Theory and Society 43 3-4 2014
- C A Bail Cultural Carrying Capacity: Organ Donation Advocacy, Discursive Framing, and Social Media Engagement 2016 165
- Can Generative AI Improve Social Science? C A Bail Proceedings of the National Academy of Sciences 121 21 2314021121 2024
- Channeling Hearts and Minds: Advocacy Organizations, Cognitive-Emotional Currents, and Public Conversation C A Bail T W Brown M Mann American Sociological Review 82 6 2017
- Individuals, Institutions, and Innovation in the Debates of the French Revolution A T J Barron J Huang R L Spang S Dedeo Proceedings of the National Academy of Sciences 115 18 2018
- A Bayesian Analysis for Change Point Problems D Barry J A Hartigan Journal of the American Statistical Association 88 421 1993
- Comparing Grounded Theory and Topic Modeling: Extreme Divergence or Unlikely Convergence? E Baumer D Mimno S Guha E Quan G K Gay Journal of the Association for Information Science and Technology 68 6 2017
- Big Data and Historical Social Science P Bearman Big Data & Society 2 2 2015
- Blocking the Future: New Solutions for Old Problems in Historical Social Science P Bearman R Faris J Moody Social Science History 23 4 1999
- Framing Processes and Social Movements: An Overview and Assessment R D Benford D A Snow Annual Review of Sociology 26 1 2000
- The Stigma of Diseases: Unequal Burden, Uneven Decline R K Best A Arseniev-Koehler American Sociological Review 88 5 2023
- Probabilistic Topic Models D M Blei Communications of the ACM 55 4 2012
- Latent Dirichlet Allocation D M Blei A Y Ng M I Jordan Journal of Machine Learning Research 3 2003
- Media Portrayals of Muslims: A Comparative Sentiment Analysis of American Newspapers, 1996-2015 E Bleich A M Van Der Veen Politics, Groups, and Identities 9 1 2021
- Imagined Immigration: The Impact of Different Meanings of 'Immigrants' in Public Opinion and Policy Debates in Britain S Blinder Political Studies 63 1 2015
- Reporting on Climate Change: A Computational Analysis of U.S. Newspapers and Sources of Bias, 1997-2017 J Bohr Global Environmental Change 61 2020
- Nationalism in Settled Times B Bonikowski Annual Review of Sociology 42 2016
- Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in US Presidential Campaigns (1952-2020) with Neural Language Models B Bonikowski Y Luo O Stuhler Sociological Methods & Research 51 4 2022
- From Ends to Means: The Promise of Computational Text Analysis for Theoretically Driven Sociological Research B Bonikowski L K Nelson Sociological Methods & Research 51 4 2022
- Transfiguring the Library as Digital Research Infrastructure: Making KBLab at the National Library of Sweden L Börjeson C Haffenden M Malmsten F Klingwall E Rende R Kurtz F Rekathati H Hägglöf J Sikora 2023. March 3, 2024
- Language and Symbolic Power P Bourdieu 1991 Harvard University Press Cambridge, MA
- A Boutyline A Arseniev-Koehler D J Cornell School, Studying, and Smarts: Gender Stereotypes and Education Across 80 Years of American Print Media 2023 102
- Invandringens Historia: Från "Folkhemmet" till Dagens Sverige M Byström P Frohnert 2017 Delegationen för migrationsstudier (Elanders Sverige AB) Stockholm, Sweden
- Computational Analysis of 140 Years of US Political Speeches Reveals More Positive but Increasingly Polarized Framing of Immigration D Card S Chang C Becker J Mendelsohn R Voigt L Boustan R Abramitzky D Jurafsky Proceedings of the National Academy of Sciences 119 31 2022
- Rethinking Culture and Cognition K A Cerulo V Leschziner H Shepherd Annual Review of Sociology 47 1 2021 doi:10.1146/annurev-soc-072320-095202
- Large Language Models for Text Classification: From Zero-Shot Learning to Fine-Tuning Y Chae T Davidson 2023. March 5, 2024
- Reading Tea Leaves: How Humans Interpret Topic Models J Chang S Gerrish C Wang J Boyd-Graber D Blei Advances in Neural Information Processing Systems 22 2009
- Using Machine Learning to Support Qualitative Coding in Social Science: Shifting the Focus to Ambiguity N.-C Chen M Drouhard R Kocielnik J Suh C R Aragon ACM Transactions on Interactive Intelligent Systems 8 2 2018
- Framing Theory D Chong J N Druckman Annual Review of Political Science 10 2007
- New Perspective? Comparing Frame Occurrence in Online and Traditional News Media Reporting on Europe's "Migration Crisis" C S Czymara M Van Klingeren Communications 47 1 2022
- Rhetoric, Practice and the Dynamics of Institutional Change: Immigrant Policy in Sweden, 1964-2000 C Dahlström Scandinavian Political Studies 27 3 2004
- Evaluation and Refinement of an Enhanced OCR Process for Mass Digitisation D Dannélls T Johansson L Björk in Digital Humanities in the Nordic Countries C Navarretta M Agirrezabal B Maegaard Copenhagen, Denmark University of Copenhagen 2364 2019
- Culture and Cognition P Dimaggio Annual Review of Sociology 23 1 1997
- Adapting Computational Text Analysis to Social Science (and Vice Versa) P Dimaggio Big Data & Society 2 2 2015
- Exploiting Affinities Between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of US Government Arts Funding P Dimaggio M Nag D Blei Poetics 41 6 2013
- The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy S Do E Ollion R Shen Sociological Methods & Research 53 3 2022
- Mapping Media Coverage of Migration Within and Into Europe J.-M Eberl S Galyga in Media and Public Attitudes Toward Migration in Europe J Strömbäck C E Meltzer J-M Eberl Oxfordshire, England, UK Routledge 2021
- J.-M Eberl C E Meltzer T Heidenreich B Herrero N Theorin F Lind R Berganza H G Boomgaarden C Schemer J Strömbäck The European Media Discourse on Immigration and its Affects: A Literature Review 2018 42
- How to Make Causal Inferences Using Texts N Egami C J Fong J Grimmer M E Roberts B M Stewart Science Advances 8 42 2022
- Väljare är inga dumbommar H Ekengren Oscarsson 2013. May 21, 2024
- Framing: Toward Clarification of a Fractured Paradigm R M Entman Journal of Communication 43 4 1993
- bcp: An R Package for Performing a Bayesian Analysis of Change Point Problems C Erdman J W Emerson Journal of Statistical Software 23 3 2007
- Causality and History: Modes of Causal Investigation in Historical Social Sciences I Ermakoff Annual Review of Sociology 45 2019
- Keyword-Assisted Topic Models S Eshima K Imai T Sasaki American Journal of Political Science 68 2 2024
- Assessing Topic Model Relevance: Evaluation and Informative Priors A Fan F Doshi-Velez L Miratrix Statistical Analysis and Data Mining: The ASA Data Science Journal 12 3 2019
- Corporate Funding and Ideological Polarization About Climate Change J Farrell Proceedings of the National Academy of Sciences 113 1 2016
- The Discourse of Globalization: Framing and Sensemaking of an Emerging Concept P C Fiss P M Hirsch American Sociological Review 70 1 2005
- Seeing Like the Fed: Culture, Cognition, and Framing in the Failure to Anticipate the Financial Crisis of 2008 N Fligstein J Stuart Brundage M Schultz American Sociological Review 82 5 2017
- Relating Social and Symbolic Relations in Quantitative Text Analysis: A Study of Parliamentary Discourse in the Weimar Republic J Fuhse O Stuhler J Riebling J L Martin Poetics 78 2020
- Talking Politics W A Gamson 1992 Cambridge University Press Cambridge, UK
- Media Discourse and Public Opinion on Nuclear Power: A Constructionist Approach W A Gamson A Modigliani American Journal of Sociology 95 1 1989 doi:10.1086/229213
- Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes N Garg L Schiebinger D Jurafsky J Zou Proceedings of the National Academy of Sciences 115 16 2018
- The Politics of Migration and Immigration in Europe A Geddes P Scholten 2016 Sage London, UK
- Causal Modeling of Twitter Activity During COVID-19 O Gencoglu M Gruber Computation 8 4 2020 doi:10.3390/computation8040085
- ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks F Gilardi M Alizadeh M Kubli Proceedings of the National Academy of Sciences 120 30 2023 doi:10.1073/pnas.2305016120
- Mapping Shared Understandings Using Relational Class Analysis: The Case of the Cultural Omnivore Reexamined A Goldberg American Journal of Sociology 116 5 2011
- Analyzing Meaning in Big Data: Performing a Map Analysis Using Grammatical Parsing and Topic Modeling J Goldenstein P Poschmann Sociological Methodology 49 1 2019
- The Causes and Consequences of a Need for Self-Esteem: A Terror Management Theory J Greenberg T Pyszczynski S Solomon in Public Self and Private Self R F Baumeister New York Springer 1986
- Shifting the Refugee Narrative? An Automated Frame Analysis of Europe's 2015 Refugee Crisis E Greussing H G Boomgaarden Journal of Ethnic and Migration Studies 43 11 2017
- Online Conspiracy Groups: Micro-Bloggers, Bots, and Coronavirus Conspiracy Talk on Twitter H R Greve H Rao P Vicinanza E Y Zhou American Sociological Review 87 6 2022
- Temporality, Events, and Explanation in Historical Sociology: An Introduction L J Griffin Sociological Methods & Research 20 4 1992
- Finding Scientific Topics T L Griffiths M Steyvers Proceedings of the National Academy of Sciences 101 1 2004
- J Grimmer M E Roberts B M Stewart Text as Data: A New Framework for Machine Learning and the Social Sciences Princeton, NJ Princeton University Press 2022
- AI and the Transformation of Social Science Research I Grossmann M Feinberg D Parker N Christakis P Tetlock W Cunningham Science 380 6650 2023 New York, NY
- Media Framing Dynamics of the 'European Refugee Crisis': A Comparative Topic Modelling Approach T Heidenreich F Lind J.-M Eberl H G Boomgaarden Journal of Refugee Studies 32 1 2019
- Framing Immigration in Western Europe M Helbling Journal of Ethnic and Migration Studies 40 1 2014
- Mapping Cultural Schemas: From Theory to Method M B F Hunzaker L Valentino American Sociological Review 84 5 2019
- Interpretable Word Embeddings via Informative Priors M Hurtado Bodell M Arvidsson M Magnusson Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) S Padó R Huang Hong Kong, China Association for Computational Linguistics 2019
- From Documents to Data: A Framework for Total Corpus Quality M Hurtado Bodell M Magnusson S Mützel Socius 8 2022
- Incorporating Lexical Priors Into Topic Models J Jagarlamudi H Daumé III R Udupa Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics W Daelemans Avignon, France Association for Computational Linguistics 2012
- Cultural Globalization and Arts Journalism: The International Orientation of Arts and Culture Coverage in Dutch, French, German, and US Newspapers, 1955 to 2005 S Janssen G Kuipers M Verboord American Sociological Review 73 5 2008
- Analytical Sociology Amidst a Computational Social Science Revolution B F Jarvis M Keuschnigg P Hedström Handbook of Computational Social Science U Engel A Quan-Haase S X Liu L Lyberg Oxfordshire, England, UK Routledge 2021
- Rhetorics of Radicalism D Karell M Freedman American Sociological Review 84 4 2019
- Analytical Sociology and Computational Social Science M Keuschnigg N Lovsjö P Hedström Journal of Computational Social Science 1 1 2018
- Computer-Assisted Keyword and Document Set Discovery from Unstructured Text G King P Lam M E Roberts American Journal of Political Science 61 4 2017
- Discursive Opportunities and the Evolution of Right-Wing Violence in Germany R Koopmans S Olzak American Journal of Sociology 110 1 2004
- U Korkut G Bucken-Knapp A Mcgarry J Hinnfors H Drake The Discourses and Politics of Migration in Europe New York Springer 2013
- The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings A C Kozlowski M Taddy J A Evans American Sociological Review 84 5 2019
- 'We Are a Small Country That Has Done Enormously Lot': The "Refugee Crisis" and the Hybrid Discourse of Politicizing Immigration in Sweden M Krzyżanowski Journal of Immigrant & Refugee Studies 16 1-2 2018
- History and Changes of Swedish Migration Policy A Kupsky Journal of Geography, Politics and Society 7 3 2017
- Deciding Who's Legitimate: News Media Framing of Immigrants and Refugees A Lawlor E Tolley International Journal of Communication 11 2017
- Terrorist Events and Attitudes Toward Immigrants: A Natural Experiment J Legewie American Journal of Sociology 118 5 2013
- Contextual Text Coding: A Mixed-Methods Approach for Large-Scale Textual Data M Lichtenstein Z Rucks-Ahidiana Sociological Methods & Research 52 2 2023
- Improving Cultural Analysis: Considering Personal Culture in its Declarative and Nondeclarative Modes O Lizardo American Sociological Review 82 1 2017
- Culture, Cognition, and Internalization O Lizardo Sociological Forum 36 S1 2021
- Multi-Aspect Sentiment Analysis with Topic Models B Lu M Ott C Cardie B Tsou H Wang D Cook IEEE 11th International Conference on Data Mining Workshops M Spiliopoulou Canada IEEE 2011
- Post-Hoc Interpretability for Neural NLP: A Survey A Madsen S Reddy S Chandar ACM Computing Surveys 55 8 2021
- Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models M Magnusson L Jonsson M Villani D Broman Journal of Computational and Graphical Statistics 27 2 2018
- Voices from the Far Right: A Text Analysis of Swedish Parliamentary Debates M Magnusson R Öhrvall K Barrling D Mimno 2018. May 21, 2024
- Resonance and Radicalism: Feminist Framing in the Abortion Debates of the United States and Germany M Marx Ferree American Journal of Sociology 109 2 2003
- J W Mohr C A Bail M Frye J C Lena O Lizardo T E Mcdonnell A Mische I Tavory F F Wherry Measuring Culture New York Columbia University Press 2020
- Toward a Computational Hermeneutics J W Mohr R Wagner-Pacifici R L Breiger Big Data & Society 2 2 2015
- Graphing the Grammar of Motives in National Security Strategies: Cultural Interpretation, Automated Text Analysis and the Drama of Global Politics J W Mohr R Wagner-Pacifici R L Breiger P Bogdanov Poetics 41 6 2013
- To Measure Meaning in Big Data, Don't Give Me a Map, Give Me Transparency and Reproducibility L K Nelson Sociological Methodology 49 1 2019
- Computational Grounded Theory: A Methodological Framework L K Nelson Sociological Methods & Research 49 1 2020
- Cycles of Conflict, a Century of Continuity: The Impact of Persistent Place-Based Political Logics on Social Movement Strategy L K Nelson American Journal of Sociology 127 1 2021
- Leveraging the Alignment Between Machine Learning and Intersectionality: Using Word Embeddings to Measure Intersectional Experiences of the Nineteenth Century US South L K Nelson Poetics 88 2021
- The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods L K Nelson D Burk M Knudsen L Mccall Sociological Methods & Research 50 1 2021
- The Dangers of Using Proprietary LLMs for Research É Ollion R Shen A Macanovic A Chatelain Nature Machine Intelligence 6 2024
- Humanistic Interpretation and Machine Learning J Pääkkönen P Ylikoski Synthese 199 1 2021
- The Poverty of Historicism K R Popper 1957 Routledge Oxfordshire, UK
- Competing News Frames and Hegemonic Discourses in the Construction of Contemporary Immigration and Immigrants in the United States S Quinsaat Mass Communication and Society 17 4 2014
- Schemas, Interactions, and Objects in Meaning-Making C M Rawlings C Childress Sociological Forum 36 2021
- Structural Topic Models for Open-Ended Survey Responses M E Roberts B M Stewart D Tingley C Lucas J Leder-Luis S K Gadarian B Albertson D G Rand American Journal of Political Science 58 4 2014
- Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead C Rudin Nature Machine Intelligence 1 5 2019 doi:10.1038/s42256-019-0048-x
- A Rule J.-P Cointet P S Bearman Lexical Shifts, Substantive Changes, and Continuity in State of the Union Discourse 2015 112
- The Radical Right and the End of Swedish Exceptionalism J Rydgren S Van Der Meiden European Political Science 18 2019
- Bit by Bit: Social Research in the Digital Age M J Salganik 2018 Princeton University Press Princeton, NJ
- Framing as a Theory of Media Effects D A Scheufele Journal of Communication 49 1 1999
- Agenda-Setting, Priming, and Framing Revisited: Another Look at Cognitive Effects of Political Communication D A Scheufele Mass Communication and Society 3 2-3 2000
- The End of Swedish Exceptionalism? Citizenship, Neoliberalism and the Politics of Exclusion C.-U Schierup A Ålund Race & Class 53 1 2011
- 'Did You Read About Berlin?' Terrorist Attacks, Online Media Reporting and Support for Refugees in Germany A Schmidt-Catran C S Czymara Soziale Welt 71 2-3 2020
- Historical Events as Transformations of Structures: Inventing Revolution at the Bastille W H Sewell Theory and Society 25 6 1996 doi:10.1007/bf00159818
- A Paper Ceiling: Explaining the Persistent Underrepresentation of Women in Printed News E Shor A Van De Rijt A Miltsov V Kulkarni S Skiena American Sociological Review 80 5 2015
- Utrikes Födda i Sverige Statistics Sweden 2022. May 21, 2024
- Population and Population Changes 1749-2023 Statistics Sweden 2024. May 21, 2024
- Cultural Cartography with Word Embeddings D S Stoltz M A Taylor Poetics 88 2021
- The Revival of Narrative: Reflections On a New Old History L Stone Past & Present 85 1979
- A Cognitive Theory of Cultural Meaning C Strauss N Quinn 1997 Cambridge University Press Cambridge, UK
- Tusen år av Invandring. En Svensk Kulturhistoria I Svanberg M Tydén 1998 Arena Stockholm, Sweden
- Culture in Action: Symbols and Strategies A Swidler American Sociological Review 51 2 1986
- Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts M A Taylor D S Stoltz Sociological Science 7 2020
- Chatgpt-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning P Törnberg 2023. March 5, 2024
- A Törnberg P Törnberg Muslims in Social Media Discourse: Combining Topic Modeling and Critical Discourse Analysis 2016 13
- A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns O Tsur D Calacci D Lazer Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing C Zong M Strube Beijing, China Association for Computational Linguistics 2015 1
- From Strange to Normal: Computational Approaches to Examining Immigrant Incorporation Through Shifts in the Mainstream A Voyer Z D Kline M Danton T Volkova Sociological Methods & Research 51 4 2022 doi:10.1177/00491241221122596
- What Is an Event? R Wagner-Pacifici 2017 University of Chicago Press Chicago, IL
- Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences K Watanabe A Baturo Social Science Computer Review 42 1 2024
- Package 'seededlda' K Watanabe P Xuan-Hieu M K Watanabe 2022. February 2, 2023
- Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches K Watanabe Y Zhou Social Science Computer Review 40 2 2022
- Thoughts on Agenda Setting, Framing, and Priming D H Weaver Journal of Communication 57 1 2007
- Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text T Widmann M Wich Political Analysis 31 4 2023
- Schemas and Frames M L Wood D S Stoltz J Van Ness M A Taylor Sociological Theory 36 3 2018
- Large Teams Develop and Small Teams Disrupt Science and Technology L Wu D Wang J A Evans Nature 566 7744 2019
- Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures L Ying J M Montgomery B M Stewart Political Analysis 30 4 2022
- Can Large Language Models Transform Computational Social Science? C Ziems W Held O Shaikh J Chen Z Zhang D Yang Computational Linguistics 50 1 2024 doi:10.1162/coli_a_00502