Literature Review

Text as Data Project: Literature Review

Kristina Becvar (https://kbec19.github.io/NYT-Analysis/), UMass DACSS Program (My Academic Blog Link: https://kristinabecvar.com)
04/15/2022

Liu & Huang (2022)

Abstract

“Climate change” and “global warming” are two popular terms that may often be used interchangeably in news media. This study proposes to give a corpus-assisted discourse study of the representations of climate change and global warming in the New York Times (2000–2019) in order to examine how they are actually used in the newspaper. The findings show both similarities and differences in their representations in terms of the associated topics/themes, the particular ways of framing, and the perspectivization strategy employed. It is argued that a corpus-assisted discourse study of a large sample of news articles presents a more accurate picture of the actual use of the two terms in news media.

Chan et al. (2021)

Abstract

We examined the validity of 37 sentiment scores based on dictionary-based methods using a large news corpus and demonstrated the risk of generating a spectrum of results with different levels of statistical significance by presenting an analysis of relationships between news sentiment and U.S. presidential approval. We summarize our findings into four best practices: 1) use a suitable sentiment dictionary; 2) do not assume that the validity and reliability of the dictionary is ‘built-in’; 3) check for the influence of content length and 4) do not use multiple dictionaries to test the same statistical hypothesis.
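
The third best practice above (checking for the influence of content length) can be illustrated with a minimal sketch. It is not the procedure used by Chan et al.; the two word lists are hypothetical toy lexicons invented for this example, and the function names are assumptions. It only shows why a raw dictionary count grows with text length while a length-normalized score does not.

```python
import re

# Hypothetical toy lexicons, for illustration only (not a published dictionary).
POSITIVE = {"gain", "growth", "strong", "improve"}
NEGATIVE = {"loss", "crisis", "weak", "decline"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(text):
    """Return (positive hits, negative hits, total tokens) for one text."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, len(tokens)


def raw_score(text):
    """Positive minus negative hits; grows with the length of the text."""
    pos, neg, _ = sentiment_counts(text)
    return pos - neg


def normalized_score(text):
    """Raw score divided by token count, so long texts are not inflated."""
    pos, neg, n = sentiment_counts(text)
    return (pos - neg) / n if n else 0.0


short_text = "Strong growth"
long_text = "Strong growth " * 5 + "and nothing else happened in the markets today"

print(raw_score(short_text), raw_score(long_text))
print(normalized_score(short_text), normalized_score(long_text))
```

In a real project the lexicon would be an established or validated dictionary, and the relationship between score and article length would be checked on the actual corpus rather than on toy strings.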

van Atteveldt et al. (2021)

Abstract

Sentiment is central to many studies of communication science, from negativity and polarization in political communication to analyzing product reviews and social media comments in other sub-fields. This study provides an exhaustive comparison of sentiment analysis methods, using a validation set of Dutch economic headlines to compare the performance of manual annotation, crowd coding, numerous dictionaries and machine learning using both traditional and deep learning algorithms. The three main conclusions of this article are that: (1) The best performance is still attained with trained human or crowd coding; (2) None of the used dictionaries come close to acceptable levels of validity; and (3) machine learning, especially deep learning, substantially outperforms dictionary-based methods but falls short of human performance. From these findings, we stress the importance of always validating automatic text analysis methods before usage. Moreover, we provide a recommended step-by-step approach for (automated) text analysis projects to ensure both efficiency and validity.

Burggraaff & Trilling (2020)

Abstract

We investigate how news values differ between online and print news articles. We hypothesize that print and online articles differ in terms of news values because of differences in the routines used to produce them. Based on a quantitative automated content analysis of N = 762,095 Dutch news items, we show that online news items are more likely to be follow-up items than print items, and that there are further differences regarding news values like references to persons, the power elite, negativity, and positivity. In order to conduct this large-scale analysis, we developed innovative methods to automatically code a wide range of news values. In particular, this article demonstrates how techniques such as sentiment analysis, named entity recognition, supervised machine learning, and automated queries of external databases can be combined and used to study journalistic content. Possible explanations for the difference found between online and offline news are discussed.

Boukes et al. (2020)

Abstract

This article scrutinizes the method of automated content analysis to measure the tone of news coverage. We compare a range of off-the-shelf sentiment analysis tools to manually coded economic news as well as examine the agreement between these dictionary approaches themselves. We assess the performance of five off-the-shelf sentiment analysis tools and two tailor-made dictionary-based approaches. The analyses result in five conclusions. First, there is little overlap between the off-the-shelf tools, causing wide divergence in terms of tone measurement. Second, there is no stronger overlap with manual coding for short texts (i.e., headlines) than for long texts (i.e., full articles). Third, an approach that combines individual dictionaries achieves a comparably good performance. Fourth, precision may increase to acceptable levels at higher levels of granularity. Fifth, performance of dictionary approaches depends more on the number of relevant keywords in the dictionary than on the number of valenced words as such; a small tailor-made lexicon was not inferior to large established dictionaries. Altogether, we conclude that off-the-shelf sentiment analysis tools are mostly unreliable and unsuitable for research purposes – at least in the context of Dutch economic news – and manual validation for the specific language, domain, and genre of the research project at hand is always warranted.

Song et al. (2020)

Abstract

Political communication has become one of the central arenas of innovation in the application of automated analysis approaches to ever-growing quantities of digitized texts. However, although researchers routinely and conveniently resort to certain forms of human coding to validate the results derived from automated procedures, in practice the actual “quality assurance” of such a “gold standard” often goes unchecked. Contemporary practices of validation via manual annotations are far from being acknowledged as best practices in the literature, and the reporting and interpretation of validation procedures differ greatly. We systematically assess the connection between the quality of human judgment in manual annotations and the relative performance evaluations of automated procedures against true standards by relying on large-scale Monte Carlo simulations. The results from the simulations confirm that there is a substantially greater risk of a researcher reaching an incorrect conclusion regarding the performance of automated procedures when the quality of manual annotations used for validation is not properly ensured. Our contribution should therefore be regarded as a call for the systematic application of high-quality manual validation materials in any political communication study, drawing on automated text analysis procedures.
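
The core idea in this abstract, that errors in the human "gold standard" distort the measured performance of an automated classifier, can be sketched with a small Monte Carlo simulation. This is a simplified illustration under stated assumptions, not the simulation design of Song et al.: labels are binary and balanced, classifier and annotator errors are independent, and all function and parameter names are hypothetical.

```python
import random


def simulate_apparent_accuracy(true_accuracy, annotator_error,
                               n_docs=1000, n_runs=100, seed=0):
    """Average accuracy of a classifier as measured against noisy human labels.

    true_accuracy:   probability the classifier matches the (unobserved) truth.
    annotator_error: probability the human "gold" label differs from the truth.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    run_means = []
    for _ in range(n_runs):
        agreements = 0
        for _ in range(n_docs):
            truth = rng.random() < 0.5
            # The classifier is right with probability true_accuracy.
            predicted = truth if rng.random() < true_accuracy else not truth
            # The annotator is wrong with probability annotator_error.
            gold = truth if rng.random() >= annotator_error else not truth
            agreements += predicted == gold
        run_means.append(agreements / n_docs)
    return sum(run_means) / n_runs


clean = simulate_apparent_accuracy(true_accuracy=0.9, annotator_error=0.0)
noisy = simulate_apparent_accuracy(true_accuracy=0.9, annotator_error=0.2)
print(f"measured accuracy with perfect annotations: {clean:.3f}")
print(f"measured accuracy with 20% annotation error: {noisy:.3f}")
```

Under these assumptions the measured accuracy should settle near a(1 - e) + (1 - a)e, where a is the true accuracy and e the annotator error rate, so a classifier that is right 90% of the time looks markedly worse when validated against annotations that are wrong 20% of the time.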

Rudkowsky et al. (2018)

Abstract

Moving beyond the dominant bag-of-words approach to sentiment analysis we introduce an alternative procedure based on distributed word embeddings. The strength of word embeddings is the ability to capture similarities in word meaning. We use word embeddings as part of a supervised machine learning procedure which estimates levels of negativity in parliamentary speeches. The procedure’s accuracy is evaluated with crowdcoded training sentences; its external validity through a study of patterns of negativity in Austrian parliamentary speeches. The results show the potential of the word embeddings approach for sentiment analysis in the social sciences.

Silva (2017)

Abstract

In this article, I engage with Edward Said’s Orientalism and various perspectives within the othering paradigm to analyze the emergence and transformation of radicalization discourses in the news media. Employing discourse analysis of 607 New York Times articles from 1969 to 2014, this article demonstrates that radicalization discourses are not new but are the result of complex socio-linguistic and historical developments that cannot be reduced to dominant contemporary understandings of the concept or to singular events or crises. The news articles were then compared to 850 government documents, speeches, and other official communications. The analysis of the data indicates that media conceptualizations of radicalization, which once denoted political and economic differences, have now shifted to overwhelmingly focus on Islam. As such, radicalization discourse now evokes the construct of radicalization as a symbolic marker of conflict between the West and the East. I also advance the established notion that the news media employ strategic discursive strategies that contribute to conceptual distinctions that are used to construct Muslims as an “alien other” to the West.

Gottlieb (2015)

Abstract

This article introduces a protest news framing cycle and presents the results of a longitudinal analysis of news attention and framing of protest movements. To identify the frame-changing dynamic occurring over time, a content analysis of the news coverage of Occupy Wall Street was conducted on 228 articles and 37 editorials in The New York Times from the start of the protest in September 2011 until long after the protest had subsided in July 2014. The article identifies longitudinal changes in news frames about the economic substance of the protest and the ensuing conflict between protesters and city officials during the occupation. Findings suggest that conflict had a significant impact on the number of news stories about the protest. Further, the results demonstrate how news framing opportunities changed as the movement reached different stages of the news attention cycle. As the movement grew, journalists focused on the movement’s economic grievances, including economic inequality, bank bailouts, and foreclosures. As the movement peaked, news attention shifted to the intensifying conflict between city officials and protesters.

Diakopoulos (2015)

Abstract

The journalistic curation of social media content from platforms like Facebook and YouTube or from commenting systems is underscored by an imperative for publishing accurate and quality content. This work explores the manifestation of editorial quality criteria in comments that have been curated and selected on the New York Times website as “NYT Picks.” The relationship between comment selection and comment relevance is examined through the analysis of 331,785 comments, including 12,542 editor’s selections. A robust association between editorial selection and article relevance or conversational relevance was found. The results are discussed in terms of their implications for reducing journalistic curatorial workload, or scaling the ability to examine more comments for editorial selection, as well as how end-user commenting experiences might be improved.

Grimmer & Stewart (2013)

Abstract

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods: they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

Kothari (2010)

Abstract

This multi-method study examines how the New York Times reported the Darfur conflict in the Sudan, which has led to an estimated 300,000 deaths and over 2.3 million people displaced by the fighting. Drawing on normative media theories and prior studies of Africa’s representation, the role of sources in the frame-building process was analyzed, together with the impact of news-making processes on journalists’ reporting about Darfur. The textual analysis largely supports results of prior studies on news framing of Africa. However, interviews with four New York Times journalists reveal that the individual biases and motives of the journalists and their sources significantly influenced the coverage. While the journalists participated in news-making processes distinguishable by journalist goal, source availability, and source credibility, their sources also provided information that reinforced certain media frames.

Kiousis (2004)

Abstract

Media salience—the key independent variable in agenda-setting research—has traditionally been explicated as a singular construct. Nevertheless, scholars have defined and measured it using a number of different conceptualizations and empirical indicators. To address this limitation in research, this study introduced a conceptual model of media salience, suggesting it is a multidimensional construct consisting of 3 core elements: attention, prominence, and valence. Furthermore, the model was tested through an exploratory factor analysis of The New York Times news coverage of 8 major political issues during the 2000 presidential election as a case study. The data revealed that 2 dimensions of media salience emerge: visibility and valence. Based on the factor analysis, 2 indices are created to measure the construct, which are intended for use in future investigations.

Peng (2004)

Abstract

This study examined the coverage of China in the New York Times and Los Angeles Times between 1992 and 2001. Across-time comparisons were made both within and between the two newspapers in terms of total number of stories, media frames used and favourability differences. Findings show that coverage of China has increased significantly over time, but the overall tone remained negative. Stories presented in political frames and ideological frames were more likely to be unfavourable. No significant differences were found between the two newspapers.

Althaus & Tewksbury (2002)

Abstract

This study examines whether readers of the paper and online versions of a national newspaper acquire different perceptions of the importance of political issues. Using data from a weeklong experiment in which subjects either read the print version of the New York Times, the online version of that paper, or received no special exposure, this study finds evidence that people exposed to the Times for 5 days adjusted their agendas in response to that exposure and that print readers modified their agendas differently than did online readers.

Pan & Kosicki (1993)

Abstract

In the American political process, news discourse concerning public policy issues is carefully constructed. This occurs in part because both politicians and interest groups take an increasingly proactive approach to amplify their views of what an issue is about. However, news media also play an active role in framing public policy issues. Thus, in this article, news discourse is conceived as a socio-cognitive process involving all three players: sources, journalists, and audience members operating in the universe of shared culture and on the basis of socially defined roles. Framing analysis is presented as a constructivist approach to examine news discourse with the primary focus on conceptualizing news texts into empirically operationalizable dimensions—syntactical, script, thematic, and rhetorical structures—so that evidence of the news media’s framing of issues in news texts may be gathered. This is considered an initial step toward analyzing the news discourse process as a whole. Finally, an extended empirical example is provided to illustrate the applications of this conceptual framework of news texts.