ON THE INVASION OF ACADEMIC IVORY TOWERS

Hybrid Workshop


LLMs and the Patterns of Human Language Use

DATE: 29–30 August 2024

LOCATION: Weizenbaum-Institut, Hardenbergstraße 32, 10623 Berlin & online

ORGANIZERS: Anna Strasser, Bettina Berendt, Christoph Durt, Sybille Krämer

supported & funded by



List of speakers and lecture titles (in alphabetical order):

1. ANNA STRASSER (DenkWerkstatt Berlin / LMU Munich):
What we can learn from developmental psychology for dealing with non-understanding LLMs

2. BETTINA BERENDT & DIMITRI STAUFER (TU Berlin / Weizenbaum Institute):
Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

3. CHRISTOPH DURT (University of Heidelberg) & TOBIAS HEY (Karlsruhe Institute of Technology):
LLMs in software engineering: trace link recovery & the problem of relevance

4. DAVID GUNKEL (Northern Illinois University):
Does Writing Have a Future? – Literary Theory for LLMs

5. ELENA ESPOSITO (University of Bologna / University of Bielefeld):
Communication with nonunderstandable machines

6. GEOFFREY ROCKWELL (University of Alberta):
ChatGPT: Chatbots can help us rediscover the rich history of dialogue

7. IWAN WILLIAMS & TIM BAYNE (Monash University):
The NLP Trilemma: Language, Thought, and the Nature of Communication

8. MIA BRANDTNER (LMU Munich):
Interpretative Gaps in LLMs – The Attempt to Bridge the Limits of Language with Multi-Modal LLMs

9. STEFANIA CENTRONE (TU Munich) & COSIMO PERINI BROGI (IMT Lucca):
Machine Translation, Problem Solving, Pattern Recognition: An Historical-Phenomenological Analysis

10. SYBILLE KRÄMER (Leuphana University of Lüneburg):
How should the generative power of LLMs be interpreted?

11. XYH TAMURA (University of the Philippines):
How does Esposito’s “Artificial Communication” compare with Gygi’s Japanese “Emergent Personhood”?

CALL FOR REGISTRATION

Limited number of in-person places: first come, first served.
Online participation via Zoom after registration.

Please fill out the registration form and indicate whether you plan to attend in person or virtually.

REGISTRATION FORM

If the registration button does not work, use this direct link to the form: https://forms.gle/FgUXChDj1EsvEM8x7

At the end of July, we will send notifications of successful registration. All registered participants will then have the opportunity to take part in a CALL FOR QUESTIONS (CFQ).

CALL FOR QUESTIONS

Why?

  • Especially at interdisciplinary events, representatives of one discipline are often not really aware of the questions that concern the other disciplines.
  • A CFQ lets the speakers get an idea of the audience's interests in advance.
  • Many of the submitted questions are also likely to be ideal starting points for the discussion.
  • The CFQ gives everyone the opportunity to develop questions at their own pace. Of course, this in no way rules out asking questions spontaneously.

How does the CFQ work?

  • Anyone planning to participate in the workshop (online or in person) can use the materials provided on the website to think about which questions are important to them.
  • Once you have received notification of successful registration, you can send one or more questions by e-mail to DenkWerkstatt Berlin.

SCHEDULE



ABSTRACTS
will be completed soon

Sybille Krämer: How should the generative power of LLMs be interpreted?
The debate about contemporary generative media goes hand in hand with a subtle anthropomorphism: people and software, especially text-producing chatbots, are understood on the model of human cognition and communication, a model that the machines either surpass or fall short of. The lecture attempts to avoid this category mistake by means of (i) a cultural-technical and (ii) a language-philosophical argument:
(i) In terms of cultural techniques, the written character of chatbot interaction must be emphasized. The digitized written material of a society embodies a cultural unconscious, which generative media forensically uncover as patterns and recombine into new patterns. The token-statistical approach of LLM-based algorithms thus forms a machine counterpart and alternative to interpretation and hermeneutics, one that (complex cryptological practices aside) is not accessible to humans.
(ii) In terms of the philosophy of language, interaction with chatbots does not have the character of speech acts or communication, which in social life are characterized by the fact that representing content is at the same time establishing a social relationship in the act of speaking together. Rather, it is a co-performance of humans and technology, whose efficiency rests on the constitutive alterity of technology.


David Gunkel: Does Writing Have a Future?—Literary Theory for LLMs
This paper argues that large language models (LLMs) and generative AI signify not the end of writing but the terminal limits of a particular conceptualization of writing that has been called logocentrism. Toward this end, the analysis will 1) review three fundamental elements of logocentric metaphysics and the long shadow that this way of thinking has cast over the conceptualization and critique of LLMs and generative AI; 2) release a deconstruction of this standard operating procedure that interrupts influential and often-unquestioned assumptions about authorship, truth, and semiology; and 3) formulate the terms and conditions of an alternative way to think and write about LLMs and generative AI that escapes the conceptual grasp of logocentrism and its hegemony. In doing so, the paper will argue that writing indeed has a future but only if we reconceptualize how we think about writing and write about thinking.

Bettina Berendt & Dimitri Staufer: Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification
Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblower Directive (WBD), are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named entities) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario, since it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use an LLM that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate the tool's effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we measure the protection against authorship attribution (AA) attacks and the utility loss statistically using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy, from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.
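To illustrate the two-stage idea described in the abstract, here is a minimal Python sketch, not the authors' actual tool: stage one replaces high-risk named entities with placeholders (here via spaCy's off-the-shelf NER), and stage two would pass the resulting disjointed text to a paraphrasing LLM to restore coherence and neutralize writing style. The risk-label set and the paraphrase stub are illustrative assumptions.

import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

# Entity types treated as high-risk for re-identification (illustrative choice).
HIGH_RISK_LABELS = {"PERSON", "ORG", "GPE", "DATE"}

def sanitize(text):
    # Stage 1: replace high-risk named entities with placeholders,
    # yielding a grammatically disjointed but de-identified text.
    doc = nlp(text)
    parts, last = [], 0
    for ent in doc.ents:
        if ent.label_ in HIGH_RISK_LABELS:
            parts.append(text[last:ent.start_char])
            parts.append("[" + ent.label_ + "]")
            last = ent.end_char
    parts.append(text[last:])
    return "".join(parts)

def make_coherent(text):
    # Stage 2 (stub): in the paper's pipeline, an LLM fine-tuned for
    # paraphrasing rewrites the sanitized text so that it reads coherently
    # and style-neutrally, protecting against authorship attribution.
    return text  # placeholder for the fine-tuned paraphrasing model

report = "On 3 May, Jane Doe told Acme Corp's board in Berlin about the fraud."
print(make_coherent(sanitize(report)))
# e.g. "On [DATE], [PERSON] told [ORG]'s board in [GPE] about the fraud."
# (exact spans depend on the NER model)

The key design point the abstract makes is that placeholder substitution alone (stage one) is insufficient for whistleblowers, because writing style itself re-identifies; hence the paraphrasing stage.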




LLMs and the Patterns of Human Language Use
Large Language Models (LLMs) such as ChatGPT and other generative AI systems are the subject of widespread discussions. They are often used to produce output that 'makes sense' in the context of a prompt, such as completing or modifying a text, answering questions, or generating an image or video from a description. However, little is yet known about the possibilities and implications of human-sounding machines entering human communication. The seemingly human-like output of LLMs masks a fundamental difference: LLMs model statistical patterns in huge text corpora, patterns that humans are not normally aware of. Humans do perceive patterns at various levels, but when we produce ordinary language, we do not explicitly compute statistical frequency distributions.
The workshop aims at an interdisciplinary, philosophical understanding of how LLMs process statistical patterns and of the possible function of such patterns in communicative exchange. In the controversy about the communicative potential of LLMs, we start from the thesis that LLMs do not understand meaning and investigate the extent to which they can nevertheless play a role in communication when people interact with them. To this end, concrete examples of LLM applications, such as the use of LLMs in software engineering and for whistleblower protection, will be explained by and discussed with their developers. This is important not only for a better understanding of the kinds of exchanges that are possible with LLMs, but also for the question of how far we can trust them and which uses are ethically acceptable.
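To make the contrast concrete, here is a minimal Python sketch of the kind of explicit statistical computation at issue, reduced to bigram counts over a toy corpus. Real LLMs learn neural representations over vast corpora rather than explicit count tables; the corpus below is purely illustrative.

from collections import Counter, defaultdict

# Toy corpus standing in for the huge text corpora LLMs are trained on.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each preceding token (bigram statistics).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_token_distribution(token):
    # The explicit statistical frequency distribution over continuations,
    # which human speakers never consciously compute.
    counts = follow_counts[token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

print(next_token_distribution("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_token_distribution("sat"))  # {'on': 1.0}

A human speaker produces "the cat sat on the mat" without ever representing such a distribution; an LLM's output, by contrast, is generated from a vastly more sophisticated, learned version of exactly this kind of frequency information.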


CALL FOR PAPERS

– Hybrid Workshop in Berlin: ‘LLMs and the Patterns of Human Language Use’ –
Deadline: 15 April 2024


There are some slots reserved for this CFP.

We are looking for international contributions from computer science or philosophy and invite you to submit an extended abstract. Given the interdisciplinary nature of the subject, we encourage joint presentations by researchers from different disciplines.

To address these issues, the following questions could serve as a starting point:
(1) Are there certain types of language games that can be modeled almost perfectly by LLMs, and are there others that resist computational modeling?
(2) What kinds of patterns in human language use, widely recognized as key features of cultural evolution, can be modeled by computations of statistical patterns?
(3) What is the relationship between patterns and rules?
(4) What role do patterns play for LLMs, and what role do they play for humans?
(5) Are there examples of successful human-human communication where understanding cannot be attributed to all participants?
(6) Given that the text production of LLMs is so radically different from that of humans, to what extent can communicative principles such as trust and reliability be applied to human-machine interaction?

  • Please send an abstract of 500 to at most 1,000 words as a PDF or Word document, plus a short biographical note, to: berlinerdenkwerkstatt@gmail.com
  • Please use the following subject line when submitting: Submission for LLMs in Berlin
  • Deadline: 15 April 2024

The workshop is funded by the DFG, and travel & accommodation costs can be reimbursed under the usual conditions. For environmental reasons, however, we also welcome remote participation, especially where it avoids transatlantic flights.


 THE ORGANIZERS

ANNA STRASSER (DenkWerkstatt Berlin / LMU Munich)

BETTINA BERENDT (TU Berlin / Weizenbaum Institute)

CHRISTOPH DURT (University of Heidelberg)

SYBILLE KRÄMER (Leuphana University of Lüneburg)


Longer description

LLMs based on generative AI are often used to produce output that makes sense in relation to a prompt, such as completing or modifying a text, or producing an image or video from a description. But the apparently human-like output masks a fundamental difference: LLMs model statistical patterns in huge corpora of text, patterns that humans are usually either unaware of or only tacitly aware of. Humans do experience patterns at various levels, often quite vividly, but when we produce ordinary language, we do not explicitly compute statistical patterns.
Rather, people make sense of language, although even within a discourse on the same topic the degree and manner of understanding can vary widely between people. However, meaningful exchange is still possible to some extent, even if the participants have very different understandings of the topic, and some may have no understanding at all. By exploiting statistical patterns in large corpora of text, LLMs produce text that is – to an astonishing degree – grammatical and meaningful to us, and one can expect further surprises. The relationship between meaningful language use and statistical patterns is an open question, and considering it in the context of LLMs promises new insights. This will be important not only for a better understanding of the kinds of exchanges possible with LLMs, but also for questions about how much we can trust them and what uses are ethical.
In the international and interdisciplinary workshop, we will discuss the ways in which, despite the fundamental difference in text production, LLMs can still participate in human language games. Are there certain types of language games that can be modeled almost perfectly by LLMs, and are there others that resist computational modeling? It is widely recognized that patterns are an important feature of human cultural development. What kinds of patterns in human language use can be modeled by computations on statistical patterns? What is the relationship between patterns and rules? What is the role of patterns for LLMs, and what is their role in experience and language? Since LLM text production is so radically different from that of humans, can communicative principles such as trust and reliability apply to human-machine interaction? We will discuss these and other questions, both in terms of the fundamental philosophical issues involved, and in terms of concrete new and future applications of LLMs. Philosophical insights on the topic will be brought into dialogue with the experience of computational scientists developing new applications.