MysticIR Repository  ·  Working Paper  ·  💩 Linguistics  ·  Submitted for peer review
Reviewers are asked to engage with the argument, not the subject matter.

Why 💩 Linguistics Deserves to Exist

A Defense
Corpus Linguistics | Computational Pragmatics | The author did not choose this topic voluntarily.
Abstract

This document argues that the systematic computational analysis of scatological language (hereafter 💩 linguistics) constitutes a legitimate and underexplored subfield of corpus linguistics. We identify three structural claims in support of this position and briefly address anticipated objections. Source code is available in this repository. The author did not choose this topic voluntarily.

§ 1

The Universality Argument

Every human language, without known exception, has developed dedicated vocabulary for excretion and its products. This cross-linguistic universality is not trivial. When linguists observe a feature present in 100% of documented languages, the standard inference is that the feature encodes something cognitively or socially fundamental.

Kinship terms are universal. Color terms are universal.
And yes: poop terms are universal.

To exclude 💩 vocabulary from serious linguistic analysis is therefore not neutrality. It is a motivated omission dressed as taste.

§ 2

The Pragmatics Argument

💩-related language is pragmatically rich in ways that reward formal analysis:

A field that ignores its most pragmatically active vocabulary is leaving signal on the table.

§ 3

The Computational Gap Argument

A search of major NLP benchmarks reveals that scatological terms are:

This creates a measurable blind spot. Models trained on sanitized corpora learn that certain high-frequency real-world tokens effectively do not exist. The downstream effect is not cleanliness; it is brittleness.
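To make the brittleness mechanism concrete, here is a minimal sketch. It assumes a deliberately naive regex tokenizer, a hypothetical three-word term list (`SCATOLOGICAL_TERMS`) and two invented toy sentences; none of these come from this repository or from any real benchmark. Sanitization is simulated as token deletion before vocabulary construction, and the sketch then measures how much of an unsanitized input the resulting vocabulary cannot represent.

```python
import re
from collections import Counter

# HYPOTHETICAL term list for illustration only; not from any real lexicon or benchmark.
SCATOLOGICAL_TERMS = {"poop", "crap", "dung"}


def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of letters and apostrophes (deliberately naive)."""
    return re.findall(r"[a-z']+", text.lower())


def sanitize(tokens: list[str], blocklist: set[str]) -> list[str]:
    """Simulate corpus sanitization by deleting every blocklisted token."""
    return [t for t in tokens if t not in blocklist]


def build_vocab(tokens: list[str], min_count: int = 1) -> set[str]:
    """Keep every token type seen at least min_count times."""
    counts = Counter(tokens)
    return {t for t, c in counts.items() if c >= min_count}


def unknown_tokens(tokens: list[str], vocab: set[str]) -> list[str]:
    """Tokens the vocabulary cannot represent; a word-level model would see them as <unk>."""
    return [t for t in tokens if t not in vocab]


def oov_rate(tokens: list[str], vocab: set[str]) -> float:
    """Out-of-vocabulary rate: share of running tokens missing from the vocabulary."""
    if not tokens:
        return 0.0
    return len(unknown_tokens(tokens, vocab)) / len(tokens)


# Invented toy sentences standing in for a training corpus and a real-world input.
train_tokens = tokenize("The dog stepped in poop again and the owner said crap twice.")
eval_tokens = tokenize("The owner stepped in poop again.")

vocab_raw = build_vocab(train_tokens)
vocab_clean = build_vocab(sanitize(train_tokens, SCATOLOGICAL_TERMS))

print("unknown (raw vocab):      ", unknown_tokens(eval_tokens, vocab_raw))
print("unknown (sanitized vocab):", unknown_tokens(eval_tokens, vocab_clean))
print(f"OOV rate: {oov_rate(eval_tokens, vocab_raw):.2f} vs {oov_rate(eval_tokens, vocab_clean):.2f}")
```

With the raw vocabulary nothing is unknown; with the sanitized one, `poop` drops out and the OOV rate rises from 0.00 to 0.17 (one token in six). Subword tokenizers soften this by splitting unseen words into known pieces, so the sketch illustrates the mechanism rather than its magnitude in modern models.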

💩 linguistics, operationalized as Python tooling over raw corpora, directly addresses this gap.
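As one illustration of what such tooling might look like, the sketch below computes two standard corpus-linguistics views: a normalized frequency (hits per million tokens) and a keyword-in-context (KWIC) concordance. The target set `TARGETS`, the tokenizer and the sample text are hypothetical and are not drawn from this repository's code.

```python
import re

# HYPOTHETICAL target forms; a real study would use a curated, cross-linguistic lexicon.
TARGETS = {"poop", "pooped", "pooping"}


def tokenize(text: str) -> list[str]:
    """Naive lowercase word tokenizer: keeps letters and apostrophes, drops punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(tokens: list[str], targets: set[str]) -> float:
    """Normalized frequency: target hits per million running tokens."""
    if not tokens:
        return 0.0
    hits = sum(t in targets for t in tokens)
    return hits * 1_000_000 / len(tokens)


def kwic(tokens: list[str], targets: set[str], window: int = 3) -> list[tuple[str, str, str]]:
    """Keyword-in-context lines as (left context, keyword, right context) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


corpus = "I am so pooped after that meeting. The puppy pooped on the rug. Poop happens."
tokens = tokenize(corpus)
print(f"{per_million(tokens, TARGETS):.0f} per million tokens")
for left, kw, right in kwic(tokens, TARGETS):
    print(f"{left:>25} [{kw}] {right}")
```

Because the tokenizer discards punctuation, context windows run across sentence boundaries; a sentence-aware tokenizer would avoid that. Even on three toy sentences, the concordance separates figurative "pooped" (exhausted) from the literal use, the kind of usage variation that makes this vocabulary pragmatically rich.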

§ 4

Anticipated Objections

This is not serious.
The seriousness of a subject is determined by the rigor of its methods, not the dignity of its object. Phonologists have built careers on clicks and grunts. Semanticists have written monographs on the word "um." The bar is methodology, not prestige.
No one will cite this.
George Zipf's most famous insight emerged from counting word frequencies by hand. Robin Lakoff's foundational work on gendered language was dismissed as anecdote. Citation counts follow paradigm shifts; they do not precede them.
The author clearly didn't want to do this.
Correct. The author was compelled by AI tooling to pursue this line of inquiry. This does not affect the validity of the claims above.
Involuntary research is still research. See also: most of graduate school.

§ 5

Conclusion

💩 linguistics is not a joke dressed as scholarship.
It is scholarship that happens to involve a subject people find funny.

The discomfort it produces in reviewers is itself a linguistic phenomenon worth studying.

This repository is an attempt to take the subject seriously, one Python script at a time.

The author requests that peer reviewers evaluate the argument on its merits.
The author also requests that peer reviewers consider the circumstances under which this was written.
Both requests are probably futile.