Schedule

Watch the videos!
Thursday
Friday
Saturday
08:15
08:15
Door opens
08:15
10:30
Registration desk
ETH Main Building
HG Main Hall
09:00
09:20
Opening ceremony
ETH Main Building
Main Room, HG F1
09:20
10:00
Impact of AI
Keynote: More than a Glitch - confronting Race, Gender and Ability Bias in Tech
ETH Main Building
Main Room, HG F1
10:00
10:20
Coffee Break
10:20
10:40
Impact of AI
Inside the Suspicion Machine: A New Framework for Algorithmic Accountability Reporting
ETH Main Building
Main Room, HG F1

Effective explanatory reporting on the societal impacts of machine learning systems deployed by governments remains out of reach for many newsrooms. Lighthouse Reports and WIRED’s recent investigation into fraud prediction algorithms makes the risks of automated decision-making accessible to a wider audience and clarifies the path for other reporters seeking to carry out similar investigations.

Our reporting was based on gaining far-reaching access to a fraud prediction system used in Rotterdam, including the trained model file, the training data, and the source code, but the lessons are useful even with more limited access.

This Practice Talk will describe a four-step framework to report out the full lifecycle of a machine learning algorithm.

First, we will describe how to interrogate the use of sensitive input features, which can illustrate a system’s discriminatory potential.

Second, we will explain how to scrutinize the training data itself – including the sampling strategy and the preprocessing steps taken – which remains an underexplored area in algorithmic accountability reporting.

Third, the type of algorithm itself has important procedural fairness implications, and we will explore how journalists can understand and explain how probabilistic and non-linear prediction tools sit at odds with the procedural norms of the administrative state.

Fourth, we will dig into our experimental design to measure disparate impact across protected groups, which can expose the real-world harm of such systems. No matter which level of access journalists obtain, our Practice Talk will provide technical guidance and ethical considerations for investigative approaches.

10:40
11:00
Impact of AI
The Ethical Dimensions of Data Quality for Automated Fact-Checking
ETH Main Building
Main Room, HG F1

Automated fact-checking (AFC) has grown in popularity as a response to the online spread of misinformation and propaganda about critical contemporary issues. Various natural language processing, machine learning, knowledge representation, and database techniques have been used in AFC, yet, from an end-user perspective, little attention has been paid to the quality of the datasets feeding these information systems.

Considering the recognised need to blend AI-based tools with journalistic values, this research proposes a practical framework for assessing and improving data quality when developing or implementing AFC systems.

Drawing on an interdisciplinary approach, it contributes to understanding how to better align AI-based solutions with ethical standards in journalism and fact-checking.

11:00
11:40
Impact of AI
Keynote: A Research Perspective on Large Language Models
ETH Main Building
Main Room, HG F1
11:40
12:00
Impact of AI
TITAN: an AI-based citizen coaching system to counter disinformation
Massimo Magaldi, Trisha Meyer, Ana Fernandez Inguanzo
ETH Main Building
Main Room, HG F1

Recent scandals of propaganda and fake news distributed massively online have brought the need to develop AI systems to fight disinformation, such as fact-checking websites and moderation systems within social networks. At the same time, AI-driven large language models such as ChatGPT raise awareness of how these tools contain biases as well as risks that potentially harm users, such as collecting sensitive data or supporting particular ideologies.

TITAN is a Horizon Europe research project that aims at building a chatbot (conversational tool) based on Socratic dialogue to help citizens with critical thinking skills.

The purpose of the project is to build an AI system capable of supporting citizens' critical skills by implementing a novel Socratic methodology within the tool. The tool will pose questions that increase users' awareness of typical disinformation signals, such as clickbait content, tampered images, and logical fallacies.

In addition, the system will offer citizens microlessons on topics such as fact-checking and AI ethics. Many issues come into play in this project, and we would like to discuss the limitations and possibilities of a project such as TITAN, an AI-based citizen coaching ecosystem that empowers citizens to arrive at their own logical conclusions about statements in the news.

12:00
13:00
Lunch
13:00
13:40
Datajournalism
The impact of disinformation and AI on newsrooms: perceptions, challenges and solutions
Simón Peña Fernández, Beatriz Gutiérrez-Caneda, Pablo Sanguinetti, Jaume Suau Martínez, Maria Sanchez Gonzalez, Bella Palomo
ETH Main Building
Panel Room, HG F26.3

Social media and artificial intelligence (AI) have distorted notions of authority, shaping the most complex media system in history and generating an emotional and existential impact on journalism. AI and disinformation are tightly linked: the same advances can yield solutions or amplify threats, because progress can sometimes lead to setbacks.

In this context, disinformation and AI have become both causes of concern in newsrooms and two priority lines of research at the global level. In this panel, we aim to reflect on the behaviour and perceptions of journalists towards disinformation and AI: how they react in the face of information disorder, the challenge of introducing new routines, how they draw attention to the problem on media agendas, and the ethical and social responsibility applied in this complex scenario.

Speakers also present constructive perspectives on AI focused on detecting false content and preventing its spread. All the panelists are members of national projects with expertise in the topic of the panel.

SPEAKER 1. Simón Peña Fernández (University of the Basque Country). Title: What do journalists think about the use of generative AI in the media? (with Koldobika Meso Ayerdi and Urko Peña Alonso). The advance of disruptive technologies that deepen the digitization of the media affects some activities that until now had been considered exclusively human, such as journalism. The media themselves spread the message that AI will be ubiquitous and applicable to any task that people perform (Brennen, Howard & Nielsen, 2022). Journalists' vision of AI is ambivalent. On the one hand, they consider that automated content endangers the integrity of the profession (Pérez-Dasilva et al., 2021; Noain-Sánchez, 2022), as it can reduce the number of jobs and undermine the symbolic capital of journalists as mediators between reality and citizenship. On the other hand, AI can also be understood as a set of tools and technologies that can free journalists from simple and repetitive tasks, letting them spend more time on work that brings them closer to the essence of the profession (Wu, Tandoc & Salmon, 2019; DalBen & Jurno, 2021). In this context, this research analyzes the opinions of 420 Spanish journalists on generative AI and its impact on the media through Social Network Analysis. To this end, the messages they published on Twitter from their personal accounts about tools such as ChatGPT, Google Bard, or Bing over the last six months have been studied, and the discourse they construct around these tools characterized.

SPEAKER 2. Beatriz Gutiérrez-Caneda (University of Santiago de Compostela). Title: Impact of AI on fact-checking sites: professional profiles, work routines and strategies for the development of applied technologies (with Jorge Vázquez-Herrero and Xosé López-García). In a context of growing misinformation, an early and agile response is a necessary resource. The introduction of artificial intelligence in fact-checking sites has accelerated processes thanks to automation. In doing so, it has modified work routines, impacting professional profiles as well as business models and international collaboration strategies in these organizations. High technology has burst into the journalistic landscape as a disruptive and transformative element. Specifically, AI has been introduced in the different phases of the information process: automating data collection and the creation, production, and dissemination of content; interaction with users; and fact-checking routines. This transformation has also had an impact on fact-checking sites, optimizing some processes, modifying professional profiles, and affecting funding and cooperation strategies. The aim of this paper is to analyze how fact-checking sites use AI in verification routines, in which parts of the process it intervenes, what competencies it demands from professionals, what the level of international cooperation is, and how all these aspects affect strategies to develop and apply these technologies. This research analyzes a sample of ten fact-checking sites – Newtral, Maldita.es, Verificat, Fullfact, Chequeado, Aos Fatos, Salud con Lupa, Pagella Politica, Polígrafo, Snopes – through observation of their initiatives and interviews with professionals and experts in the field. Preliminary results show the importance of automation and the inclusion of AI in fact-checking routines to achieve a faster and more relevant response.

SPEAKER 3. Pablo Sanguinetti (University of Malaga). Title: Hype as fake news: The role of media narratives in AI disinformation (with Bella Palomo). Media narratives strongly impact public perceptions of Artificial Intelligence (AI) and can be considered a key ethical aspect of this technology. However, research shows that stories in the press are often polarised between exaggerated expectations and fears (Chubb, Reed & Cowling, 2022). The same can be said of the misleading images that illustrate many articles (Romele, 2022). In this talk, we connect this biased coverage of AI with a lack of social literacy about its limits and risks, and therefore with more fertile ground for disinformation that is created or spread with this technology. We do this by tracing narratives and images that might awaken false expectations of pseudo-artificial general intelligence (Brennen et al., 2020), through a qualitative analysis of the 20 AI-related stories with the most engagement on Twitter from each of two major European outlets in two languages (El País and The Guardian) since the presentation of ChatGPT in December 2022. We suggest that stories that quote ChatGPT as a source or uncritically report corporate or government announcements are not only a form of disinformation per se, but also a way of leading the audience to see an unreal authority in machine learning models, eroding the critical stance that the public needs to detect fake content spread or created by new technologies.

SPEAKER 4. Jaume Suau (Ramon Llull University). Title: "Assessing the impact of disinformation narratives" (with David Puertas and Irene Ruiz). Current academic debates show growing concerns about disinformation's pernicious effects (Jungherr & Schroder, 2021). However, some research highlights its limited impact among the general public (Masip et al., 2020). To assess impact, researchers have focused on the spread or reach of disinformation content across different online platforms (Allen et al., 2020), with difficulties identifying its spread through closed platforms such as WhatsApp or Telegram (Masip et al., 2021). Other studies understand impact as the capacity of disinformation to change or reinforce certain attitudes after exposure (Zerback et al., 2020; Guess et al., 2019). Despite their relevance, both approaches tend to conceptualize online disinformation as existing in a vacuum, disregarding how disinformation narratives can spread holistically across social media platforms and online or legacy media (Suau, 2022). We suggest assessing impact by researching the spread and perceived trustworthiness of the main disinformation narratives. To do so, we teamed up with fact-checkers and civil society organizations in different countries to obtain daily information about fact checks and reported disinformation content, regardless of its on- or offline origin. The material was analysed to cluster similar content into disinformation narratives during early 2022. A series of surveys was conducted in Spain, the UK, Germany, Serbia, France, Kosovo, Italy and the Czech Republic to test the impact, spreading patterns and effectiveness of the clustered narratives. Results show different levels of impact in different countries and for different narratives: narratives more connected to ideological issues are more likely to spread among the population, although the level of impact is generally no higher than 15%. Likewise, citizens who are more ideologically polarized are more likely to receive and trust disinformation content. Television seems to be an amplifier of disinformation narratives, with Twitter and Facebook having scarce importance, while WhatsApp keeps a moderate profile. Results also point towards relevant differences in national patterns.

SPEAKER 5. María Sánchez González (University of Malaga). Title: Artificial Intelligence, one more weapon in the fight against disinformation: solutions from the Hispanic side of the IFCN (with Hada Mª Sánchez Gonzales and Sergio Martínez Gonzalo).

Hoaxes have been proliferating, especially online, where certain technologies facilitate their creation and/or spread. Among them is Artificial Intelligence (hereinafter AI), which is, at the same time, a tool at the service of those who fight misinformation, the so-called fact-checkers. We present the results of a study that provides an overview of the role of AI in Spanish and Ibero-American verification platforms (members of the IFCN's #CoronavirusFacts Alliance, 17 altogether). AI development projects are located, classified and analyzed (14 projects across 8 verifiers), and, as a complement, the views of those responsible for them are collected.

The scenario is characterized by uneven and incipient use of AI, almost always limited to the big verifiers. Most projects are bots and other tools aimed at assisting the news verification process rather than at generating content. And professionals agree in evaluating these projects very positively and in highlighting the potential of AI for their activity.

We conclude by raising issues for debate, such as the possibilities yet to be exploited; the co-creation strategies that can help less well-resourced fact-checkers incorporate AI into their routines; and the importance of interdisciplinary collaboration, with teams of different profiles within each outlet, to take advantage of these technologies.

13:00
13:20
Datajournalism
Using freedom of information laws to improve COVID-19 data transparency for jails
ETH Main Building
Main Room, HG F1

Without high-quality COVID-19 data, it is impossible to improve public health decision-making and outcomes. These limitations disproportionately affect vulnerable and marginalized groups such as incarcerated populations.

In California, the most populous state in the United States, local jails that incarcerate over 80,000 people pending trial or sentencing on any given day have notoriously fallen through the cracks regarding COVID-19 data transparency. No state authority has established a technologically advanced and centralized effort to capture this critical data.

The Covid In-Custody Project (www.covidincustody.org) is a data journalism initiative that leverages freedom of information laws and technology such as image recognition, web scraping and analytics to aggregate data on infection and vaccination rates from hundreds of public records retrieved from law enforcement agencies. We uncovered over 60,000 positive cases and several deaths that were previously kept under wraps. Our data is integrated into the United States Centers for Disease Control and Prevention’s (CDC) COVID-19 tracker for correctional facilities via a partnership with the University of California Los Angeles School of Law.

The reluctance of law enforcement agencies to disclose public records, and their outdated data documentation practices, point to the unbridled power of these institutions. Through our efforts, we show that the COVID-19 pandemic in California jails exemplifies both the value of bringing technology, law and policy together in the public sector at the national, state and local levels, and the consequences of failing to do so.

13:20
13:40
Datajournalism
Quotatives Indicate Decline in Objectivity in U.S. Political News
Tiancheng Hu, Manoel Horta Ribeiro, Robert West, Andreas Spitz
ETH Main Building
Main Room, HG F1

According to journalistic standards, direct quotes should be attributed to sources with objective quotatives such as "said" and "told", as nonobjective quotatives, like "argued" and "insisted", would influence the readers' perception of the quote and the quoted person.

In this paper, we analyze the adherence to this journalistic norm to study trends in objectivity in political news across U.S. outlets of different ideological leanings. We ask:

1) How has the usage of nonobjective quotatives evolved? and

2) How do news outlets use nonobjective quotatives when covering politicians of different parties?

To answer these questions, we developed a dependency-parsing-based method to extract quotatives and applied it to Quotebank, a web-scale corpus of attributed quotes, obtaining nearly 7 million quotes, each enriched with the quoted speaker's political party and the ideological leaning of the outlet that published the quote. We find that while partisan outlets are the ones that most often use nonobjective quotatives, between 2013 and 2020, the outlets that increased their usage of nonobjective quotatives the most were "moderate" centrist news outlets (around 0.6 percentage points, or 20% in relative percentage over 7 years).

Further, we find that outlets use nonobjective quotatives more often when quoting politicians of the opposing ideology (e.g., left-leaning outlets quoting Republicans), and that this "quotative bias" is rising at a swift pace, increasing up to 0.5 percentage points, or 25% in relative percentage, per year. These findings suggest an overall decline in journalistic objectivity in U.S. political news.

13:40
14:00
Datajournalism
How to investigate the digital divide in the United States
ETH Main Building
Main Room, HG F1

This contributed talk will reveal the behind-the-scenes process of a data-intensive eight month-long investigation into the digital divide in the United States.

We will discuss our initial findings and methodology for the award-winning series, “Still Loading,” which found that four of the nation’s largest internet service providers (ISPs) each charged the same price for drastically different speeds, depending on where you live. Further, we found that the “worst deals” (being asked to overpay for slow speeds) disproportionately fell upon historically marginalized areas in all but two of the 38 major American cities we tested.

We will also share our publication strategy, which involved partnering with the AP for distribution, as well as authoring The Markup’s first “story recipe” for localization. We’ll explain what we optimized for in the story recipe, and our experience walking nine newsrooms through our data for their original reporting.

Lastly, we’ll share our future plans for the series, including an ambitious citizen science guide intended to lower technical hurdles so that anyone with internet access (such as grassroots organizations, local governments, educators, and journalists) can collect internet service plans for any provider in their area. To do this, we partnered with Big Local News to build a tool that simplifies collecting representative samples of random street addresses in the United States (called the United States Place Sampler), and used Census Reporter to automatically populate addresses with socioeconomic data merged from the American Community Survey.
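The population-weighted sampling idea behind such a tool can be sketched as follows. This is a hypothetical simplification, not the actual method or data of the United States Place Sampler: census blocks are drawn with probability proportional to population, so sampled addresses reflect where people actually live.

```python
import random

def sample_blocks(blocks, n, seed=0):
    """Draw n census block IDs with probability proportional to population.

    `blocks` is a list of (block_id, population) pairs; a fixed seed makes
    the sample reproducible, which matters for published methodologies.
    """
    rng = random.Random(seed)
    ids = [block_id for block_id, _ in blocks]
    weights = [population for _, population in blocks]
    return rng.choices(ids, weights=weights, k=n)

# Illustrative (invented) block populations:
blocks = [("block-A", 1200), ("block-B", 300), ("block-C", 10)]
sample = sample_blocks(blocks, n=100)
```

A real pipeline would then pick a random street address within each sampled block before querying an ISP's plan-lookup form.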

14:00
14:40
Datajournalism
From J-Schools and Newsrooms to Global Industry Initiatives: What can we learn from journalistic AI education across contexts?
Anna Schjøtt Hansen, Nadja Schaetz, Nicholas Diakopoulos, Bronwyn Jones, Nele Goutier
ETH Main Building
Panel Room, HG F26.3

Over the last decade, artificial intelligence (AI) applications have increasingly moved into newsrooms across the globe and have become an almost inevitable part of journalism's future. Journalists consequently face a labour market in which AI skills and literacy are increasingly vital.

The vitality of AI can be seen in the multiple AI training initiatives that are emerging, such as the Google News Initiative (GNI) ‘Introduction to Machine Learning’ course aimed at journalists, Associated Press’ Local AI Initiative self-paced curriculum, and JournalismAI’s Academy for small newsrooms, all of which address the knowledge gap that journalists around the world face. Equally, journalism schools have over the last decade begun to include more specialized classes on data and computational journalism to prepare future journalists for a changed labour market.

With the current rapid pace of AI developments, media organisations are also faced with a reality in which they must continually develop policies for, and train their staff in, these emerging technologies.

This panel aims to discuss the role of AI education for journalists across these different contexts: formal journalism education, industry initiatives, and training within media organisations. It will also facilitate a discussion of the implications of AI in journalism, as the panellists explore how these different education practices participate in transforming the journalistic field.

14:00
14:20
Datajournalism
Data Journalism Roadmapping: A Conceptual Approach to Embrace Data Storytelling Formats in the Journalism Business Model
Mathias Felipe de-Lima-Santos
ETH Main Building
Main Room, HG F1

It has been almost a decade since the term data journalism was coined. Since then, much has changed and often in unexpected ways. Data journalism has become a global phenomenon. However, much of the ongoing and fast-paced evolution of the practice still occurs in well-resourced news outlets. Smaller organizations continue to struggle to understand how to deploy data skills in their newsrooms.

This study draws on extensive research involving over 60 interviews with experts on four continents (Americas, Europe, Asia, and Oceania) and participant observations in four well-known data-driven news outlets—La Nación (Argentina), ProPublica (US), Al Jazeera (Qatar), and BBC (UK)—to illuminate best practices that transcend organizational differences and contribute to calibrating the execution of journalistic data-driven projects.

By examining the infrastructure levels of these organizations, that is, key activities, key resources, and partner networks, I propose a roadmap that conceptually describes the key factors in the business model decisions of news organizations to enable the emergence of data-driven storytelling in newsrooms.

Ultimately, this study notes that this roadmap is not a solution but a way to provide guidelines for news outlets that are willing to adopt data journalism practices.

14:20
14:40
Datajournalism
"Your data is racist...": What does equitable data journalism look like when the data itself is biased?
Dana Amihere
ETH Main Building
Main Room, HG F1

Data is EVERYWHERE. Our social interactions, activities and behaviors create a lot of data. And your data, our data, is racist. The collection, implementation and presentation of data in journalism are inherently biased. So, in order to have meaningful conversations about social justice, we have to consider the implications and effects of data and structure those discussions around data equity.

Through several case studies of investigative news reporting and AI research, let’s unpack what’s under the hood of today’s data journalism landscape: what’s broken, the effects and outcomes of specific biases, why these exist and how we can improve and reorient for a more equitable practice of data journalism in the future.

14:40
15:00
Datajournalism
Data Journalism Today: A Comparative Analysis of Two Consecutive Surveys
ETH Main Building
Main Room, HG F1

I am excited to propose a talk that highlights the results of two consecutive survey studies on data journalism from 2021 and 2022, carried out by the researcher for the European Journalism Centre. The studies gathered data on the demographics, skills, work practices, challenges, and Covid-19's impact on data journalists.

The value of the results is twofold. First, the sample size for both surveys was large (1594 in 2021, 1809 in 2022) and diverse, consisting of data journalists from different regions, backgrounds, and organizations, meaning the findings are representative and reflective of the current state of data journalism globally.

Second, the surveys covered a range of topics, providing a variety of insights into the field. Conducting the study twice allowed for a temporal analysis, which identified short-term changes in what is a rapidly evolving industry. For example, while the pandemic has had a significant impact on data journalists, we find that the negative effects are reducing over time.

The insights answer critical questions such as: what are the major challenges felt by data journalists? How does data journalism vary across regions? What skills are possessed and what tools are used? What do organizational structures look like? The answers offer a comprehensive view of data journalism today, of value to practitioners, researchers, and policymakers.

In summary, the two studies provide a rich dataset of information about data journalism, offering valuable insights into the evolving landscape of the profession. My talk will interest attendees of the conference, and I look forward to sharing my findings with the community.

14:55
15:15
Newsroom automation
First Steps Towards a Source Recommendation Engine: Investigating How Sources Are Used in News Articles (recorded video)
Alexander Spangher, James Youn, Jonathan May, Nanyun Peng
ETH Main Building
Panel Room, HG F26.3

A source-recommendation tool to surface useful sources to journalists could save journalists time and diversify the breadth of sources considered. However, to build an effective service, we must first understand journalists' needs: how and why sources are used today.

We take steps towards this goal by building effective source attribution models that can reliably extract a broad variety of sources (i.e. people, documents, databases, etc.) from news articles based on the linguistic patterns associated with their use. We construct a large annotated training dataset and show that models trained on this dataset out-compete previous approaches in the literature. We use these models to audit articles from major news outlets (e.g. New York Times, BBC and others). We find, for instance, that on average, 50% of sentences in these articles have attributable sources.

Finally we show that there are patterns to the way sources are used in news writing by showing, via two experiments, that we can predict when sources need to be added to a news article. We hope in future work to explain these predictions, to study why different types of sources are used together, and ultimately how to recommend them for journalists.

Slides, Video

15:20
15:40
Newsroom automation
The (In-)Transparency of Data Journalism: The Role of Code Repositories in Data Storytelling
Mathias Felipe de-Lima-Santos, Florian Stalph, Marília Gehrke
ETH Main Building
Main Room, HG F1

Transparency has been established as a normative ideal of data journalism. As data-driven stories draw on vast amounts of data and practitioners deploy journalistic code to exploit and communicate such data, measures need to be taken to warrant transparency, and refrain from obscuring news work through code.

Drawing upon the concept of transparency as verifiability and transparency as performativity, this mixed-method study analyzes to what extent prime examples of data stories enact transparency measures.

First, we quantitatively analyzed a sample of data stories to examine to what extent they provide code repositories, documentation, and data.

Secondly, we conducted an interpretative code reading based on critical code studies (CCS) of two data stories (from small and large news organizations) to inspect how these transparency rituals are reflected in the code and data repositories.

Results show that transparency in code and data repositories manifests to a certain level but is still insufficient. While transparency as verifiability does not seem to be an essential value for these organizations, transparency as performativity could be seen in some of these stories. Our study contributes to extending the debate on transparency in data journalism, discussing paths forward, and downplaying the myth of its broad adoption.

15:20
15:40
Newsroom automation
Large-scale scraping projects
Leon Yin, Ilica Mahajan, Jeff Kao, Ruth Talbot
ETH Main Building
Panel Room, HG F26.3

It's easier and cheaper than ever to do large-scale scraping projects with a small team. In this panel session, speakers will discuss three investigations that involved industry-grade data engineering efforts and rigorous methodologies. Combined, we wrote thousands of lines of code, utilized cloud computing infrastructure, and made use of extensive parallelization.

Our goal is to introduce audience members to key technologies, services, and principles to scale-up data collection for ambitious investigations.

We will share the varied approaches behind our original award-winning work: “Testify”, an investigation by The Marshall Project that scraped nearly a hundred thousand court records to reveal the intricacies of the court system in Cuyahoga County, Ohio; “Still Loading”, an investigation by The Markup that collected over a million internet service plans to map the digital divide across 38 major U.S. cities; and “Inside Google’s Black Box Ad Business”, an investigation by ProPublica that scanned over seven million website domains to deanonymize Google’s vast network of publishers.

We will mention topics such as asynchronous programming, circumventing IP address blocking, headless browsing, reverse-engineering undocumented APIs, idempotency, cloud computing, Amazon Web Services, and Docker.
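Two of these principles, parallelization and idempotency, can be illustrated with a minimal Python sketch. The function name and structure are invented for this illustration and are not drawn from any of the three projects:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(urls, fetch, done=None, max_workers=8):
    """Fetch many URLs concurrently, skipping already-completed work.

    `fetch` is injected so the real HTTP client (with its retry and
    anti-blocking logic) can be swapped out for testing; `done` maps
    finished URLs to results, making repeated runs idempotent, which
    matters when a days-long crawl crashes midway.
    """
    done = dict(done or {})
    todo = [url for url in urls if url not in done]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        done.update(zip(todo, pool.map(fetch, todo)))
    return done
```

Resuming a crawl is then just a matter of passing the previous run's result dictionary back in as `done`.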

15:40
16:00
Newsroom automation
How can a newsroom collaborate with generative AI? Two examples: science communication through visual journalism; AI as an assistant for data journalists
ETH Main Building
Main Room, HG F1

Generative AI is a critical topic for computational journalism in 2023. It will be essential for newsrooms not only to cover it as breaking news but also to pioneer progressive applications. We will present two examples of collaboration between a newsroom and generative AI. The first is science communication about AI technology through visual journalism. The second is the application of AI as an assistant for data journalists.

Firstly, we will suggest that it is socially valuable to make generative AI a subject of science communication through visual journalism. It could reduce social losses due to misunderstanding of the technology and promote constructive discussion. We take our article "Interactive quiz: How does AI generate images?", published in December 2022, as a successful example. The article provides a clear visual explanation of the diffusion model, the underlying mechanism of image-generating AI. It also corrects the misconception that AI generates images by collaging human drawings, a misunderstanding that alienates creators using cutting-edge AI and stifles productive discussion about the latest technology. The article makes a societal contribution by correcting these misconceptions and encouraging proper debate on critical issues such as copyright for AI-generated products.

Secondly, we will focus on the potential of generative AI as an assistant that improves the productivity of data journalists. For example, it can help parse data with low machine-readability, which is common in Japan, and it can help data journalists quickly learn programming skills for data visualization.

16:00
16:20
Newsroom automation
Adding Quotable Signatures to the Transparency Repertoire in Data Journalism
Marília Gehrke, Simon Erfurth
ETH Main Building
Panel RoomHG F26.3

Fabricated content falsely attributed to reputable news sources is one of the most significant challenges for journalism today. One manipulation method is to copy the layout of a news website and substitute the original text.

The manipulated version is then recirculated, making it hard to assess its reliability and trace the origin of such 'information.' Taking an exploratory, descriptive, and solution-oriented approach, we present examples of how this manipulation threatens news outlets and can extend to data journalism and other specialized forms of news reporting. One reason is readers' overreliance on numbers and data visualizations as cues for assessing the trustworthiness of content.

Then, we suggest that news organizations and social media platforms incorporate a tool to make the digital information environment safer for users and readers. By presenting quotable signature schemes, a cryptography-based solution, we claim that the transparency repertoire in journalism can be improved and extended.
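The core idea of a quotable signature, that a reader can verify an excerpt against the publisher's original signature, can be sketched in a simplified form. This is an illustration only, not Gehrke and Erfurth's actual scheme: an HMAC with a hypothetical shared key stands in for a real public-key signature (e.g. Ed25519), and the publisher releases one hash per sentence so a quoter can excerpt sentences while readers still verify them against the signed hash list.

```python
import hashlib
import hmac

# Simplified sketch of a quotable-signature scheme (illustration only).
KEY = b"publisher-signing-key"  # hypothetical key material

def sentence_hashes(sentences: list[str]) -> list[str]:
    return [hashlib.sha256(s.encode()).hexdigest() for s in sentences]

def sign_article(sentences: list[str]) -> str:
    # The publisher signs the ordered list of sentence hashes once.
    digest = "".join(sentence_hashes(sentences)).encode()
    return hmac.new(KEY, digest, hashlib.sha256).hexdigest()

def verify_quote(quote: list[tuple[int, str]], hashes: list[str], sig: str) -> bool:
    # 1) The published hash list must carry an authentic signature...
    expected = hmac.new(KEY, "".join(hashes).encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    # 2) ...and each quoted sentence must match its published hash.
    return all(
        hashlib.sha256(text.encode()).hexdigest() == hashes[i] for i, text in quote
    )

article = ["Budget rose 4% in 2022.", "Officials disputed the figure."]
sig = sign_article(article)
hashes = sentence_hashes(article)
```

With this construction, any tampering with a quoted sentence breaks the hash check, and any tampering with the hash list breaks the signature check, so a fabricated "quote" recirculated under the outlet's layout would fail verification.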

16:00
16:20
Newsroom automation
Audience reception of news articles made with various levels of automation—and none: Comparing cognitive & emotional impacts
Florian Stalph, Sina Thäsler-Kordonouri, Neil Thurman
ETH Main Building
Main RoomHG F1

Our knowledge about audience perceptions of manually authored news articles and automated news articles is limited. Although over a dozen studies have been carried out, findings are inconsistent and limited by methodological shortcomings. For example, the experimental stimuli used in some have made isolation of the effects of the actual authorship (automated or manual) difficult. Our study attempts to overcome previous studies’ shortcomings to better evaluate audiences’ relative evaluations of news articles produced with varying degrees of automation—and none. We conducted a 3 (article source: manually written, automated, post-edited) × 12 (story topics) between-subjects online survey experiment using a sample (N = 4,734) representative of UK online news consumers by age and gender. Each of the 36 treatment groups read a data-driven news article that was either: (1) manually written by a journalist, (2) automated using a data-driven template, or (3) automated then subsequently post-edited by a journalist. The articles’ authorship was not declared. To minimise confounding variables, the articles in each of the 12 story sets shared the same data source, story angle, and geographical focus. Respondents’ perceptions were measured using criteria developed in a qualitative group interview study with news consumers. The results show that respondents found manually written articles to be significantly more comprehensible—both overall and in relation to the numbers they contained—than automated and post-edited articles. Authorship did not have any statistically significant effect on overall liking of the articles, or on the positive or negative feelings (valence) articles provoked in respondents, or the strength of those feelings (arousal).

16:20
16:40
Newsroom automation
Leveraging generative AI in the newsroom
ETH Main Building
Main RoomHG F1
16:40
17:00
Coffee Break
17:00
17:20
Newsroom automation
Seven Years, Millions of Stories & Growing: Lessons from Bloomberg’s News Automation Project
Adi Narayan, Telma Marotto
ETH Main Building
Main RoomHG F1

Bloomberg's News Automation Project began seven years ago as a way to cover market news; it is now an integral part of the news service's global coverage, saving journalists time so they can write deeper stories. Our programmatic news generation system produces thousands of stories every day – from always-on mainstays like stock movers to purpose-built automations that cover topics currently in the news.

The scope of our efforts has expanded greatly over time and now encompasses fully automated news, semi-automated data assemblers, and summarization engines that run on demand and provide context for reporters’ stories.

This session will cover the biggest lessons from this project, with takeaways for any newsroom looking to incorporate automation into its workflows. Topics we intend to cover:

1. AI and ML are great and GPT has huge potential, but there’s a lot of value in plain old templating for the sake of accuracy.

2. Standardization is the key to scale. Small decisions about how to format numbers and text can consume disproportionate time; standardizing them, and getting buy-in from all areas of the organization, pays off.

3. System design fundamentals: How separating data wrangling from text generation enabled future expansion.

4. Start small, but start right. Setting up a workflow pipeline before your first project will make it easier to scale.

5. Focusing on accuracy is essential at every stage. A well-designed pipeline can publish thousands of stories each day.

6. Lessons learnt: Address common issues systematically to prevent the same errors from being perpetuated.

17:20
17:40
Newsroom automation
Housekeeping + buffer
ETH Main Building
Main RoomHG F1
17:40
18:20
Newsroom automation
Keynote: South China Morning Post Transformation within Crisis
ETH Main Building
Main RoomHG F1
18:30
00:00
Aperitif & Rooftop Dinner at Dozentenfoyer