Nothing Sinister About Natural Language Processing: Are We Paranoid, or Is It Just US?

October 6, 2006

This post co-authored by the dynamic, unstoppable duo of Montag and Fehlleistungen: Fehlleistungen and Montag.

[UPDATED 7:07 pm, 10-6-06, polishing/clarification edits.]

The New York Times says Cornell and other Universities are…

…developing software that would let the government monitor negative opinions of the United States or its leaders in newspapers and other publications overseas. [NYT: Software Being Developed to Monitor Opinions of U.S.]

Interesting, that “overseas” bit. (When we talk about new bunker-busting nuclear weapons systems, we don’t bother saying they are for use overseas.) I highly doubt the program will be built to determine the origin of the text it analyzes, so it could be used on text from anywhere. Also, I wonder whether the program is designed to deal with the English language, and will thus rely on translations of foreign-language items, or whether it will account for different languages and their different semantic (or is it linguistic?) rules, etc.

The research is funded by the Homeland Security Department, and is based on…

…a database of hundreds of articles that is being used to train a computer to recognize, rank and interpret statements. [Ibid.]

More below on the prospect of a computer recognizing, ranking and interpreting statements made by human beans.

But first, some digging turns up a report published by the Cornell people entitled, Joint Extraction of Entities and Relations for Opinion Recognition, the abstract of which says:

We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions—along with the linking relation that exists between them. . . . we employ an integer linear programming approach to solve the joint opinion recognition task, and show that global, constraint-based inference can significantly boost the performance of both relation extraction and the extraction of opinion-related entities. [Cornell University, Department of Computer Science: Joint Extraction of Entities and Relations for Opinion Recognition, emphasis added.]

Zzzzzz. (Don’t even ask what the next two sentences say. I’ve already achieved a state of denial that I even read them myself.)

And having dug up the above, we find that the private sector is interested in what seems like the exact same technology, this time for examining the Blogosphere. Interested enough to fund a conference that will delve into it, among other things. The Cornell Professor quoted in the Times article is also involved in this International Conference on Weblogs and Social Media.

One “Area of Investigation” to be highlighted at the conference:

…the response to current events, including emotional and attitudinal dimensions as well as content and patterns of influence.

Two of the “Areas of Interest” include:

Sentiment analysis; polarity/opinion identification and extraction


Blogosphere vs. mediasphere; measuring the influence of blogs on the media.

Is there power buried in so much text? Can technology like what is described above disinter that power? Whether it is foreign news sources or blogs, the question remains of whether one can really uncover true thoughts — seeds of action — by combing through such massive amounts of texts and deriving from them trends, patterns, opinions, and so on.

Hey wait, this brings the issue into the field of literary criticism. It’s really a question of interpretation: can you look at narrative language (whether the “looking” is computerized or human) and decide what the author really means? (The so-called “intensity” of opinion in the NYT article.) In addition, this research raises the question of the connection between thought, its expression in language, and (potential) action. Just because I write: “I’m going to clean the refrigerator this afternoon no matter what, no questions asked, I’m going to wipe that thing so clean that you won’t even be able to look at it, we’re going to be able to eat our food directly off the shelves in that cold storage device, there will never have been a refrigerator cleaner than our refrigerator is going to be this afternoon following the vigorous scrubbing that I’m going to give it” and so on (a very “intense” statement of opinion, and even an overt promise of planned action), this has NO NECESSARY RELATIONSHIP with whether or not I’m actually going to clean my refrigerator this afternoon. I might fall asleep and take a long nap, I might get engrossed in a book, I might fall and break my leg, or I MIGHT HAVE JUST BEEN SAYING IT AND NEVER REALLY MEANT THAT I WAS GOING TO ACTUALLY DO IT. I might have “just been saying it” for any number of reasons, including: my wife really wanted me to clean the refrigerator and I had to get her off my back; I have obsessive-compulsive disorder and needed to clean the refrigerator before I mowed the lawn (and I really wanted to mow the lawn); I was kidding around – since our refrigerator is always spotless, my language was entirely ironic and not meant to be taken at face value; etc.

There is some agreement among certain literary critics these days that one interprets a text by assigning it meaning. There is no single, ultimate meaning that a book carries within it, some mystical core which the author intended and which can be transmitted intact over time and space to readers everywhere. When I read a book, I make it mean what I want it to mean, what I need it to mean, what my unique background, language skills, the corpus of my past reading, my philosophy of life, etc. allow me to take from the text, or even compel me to take from the text. To MAKE IT MEAN. So what I imagine is going to happen with this technology is that it will MAKE THE TEXT MEAN WHAT IT MUST MEAN, that is, it will interpret the text along the lines of how the programmers TOLD it to interpret the text. The computer’s “personal situation” or “context” or “background,” those things that will allow it to assign meaning to the various texts that it scans, are given to it by human programmers, who bring their own reading strategies to bear when they tell the computer “how to read” and “how to assign meaning” and “how to categorize texts on the basis of their meaning” and “how to derive the intent of the author from the text.” (For a good example of this, just recall the hubbub surrounding “random event generators”.)

So just as the man in the airport speaking Tamil MUST BE A TERRORIST AND SHOULD BE REPORTED (meaning was assigned to the man and what he was saying (language), his threat level was assessed (interpreted) and categorized (a threat) and action was taken based on this categorization), the computer is going to find just what it expects to find, and where it expects to find it, when it scans the texts given to it.

[Via: Sumo Merriment]
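
For the programmers in the audience, here is a toy sketch of that point. It is emphatically not the Cornell method (their paper describes integer linear programming, which this is nowhere near); it is just a word-counting scorer of the crudest kind. The two word lists, the weights, the threshold, and the function names `score` and `label` are all invented for illustration. The only thing it is meant to show is that the same sentence gets a different “meaning” depending on which list the programmer hands the machine.

```python
import re

# Two hypothetical word lists, each standing in for one programmer's idea of
# what a "negative opinion" looks like. Neither comes from any real system.
LEXICON_A = {"criticize": -1, "oppose": -1, "failure": -2, "praise": 1}
LEXICON_B = {"criticize": -1, "oppose": -3, "failure": -2, "praise": 1, "question": -2}

def score(text, lexicon):
    """Sum the weights of every lexicon word that appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def label(text, lexicon, threshold=-2):
    """Call the text 'negative' when its score is at or below the threshold."""
    return "negative" if score(text, lexicon) <= threshold else "not negative"

sample = "Many editors question the policy and oppose the plan."
print(label(sample, LEXICON_A))  # not negative
print(label(sample, LEXICON_B))  # negative
```

Same sentence, two verdicts, and neither list has any way of knowing whether the author was being ironic, or ever got around to cleaning the refrigerator.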

In closing, allow me to quote the DSM-IV definition of “paranoid personality disorder” and I’ll leave it to you to draw connections between the motivations behind massive data collection and analysis and the following:

  • 301.00 Paranoid Personality Disorder
A. A pervasive distrust and suspiciousness of others such that their motives are interpreted as malevolent, beginning by early adulthood and present in a variety of contexts, as indicated by four (or more) of the following:
  1. suspects, without sufficient basis, that others are exploiting, harming, or deceiving him or her
  2. is preoccupied with unjustified doubts about the loyalty or trustworthiness of friends or associates
  3. is reluctant to confide in others because of unwarranted fear that the information will be used maliciously against him or her
  4. reads hidden demeaning or threatening meanings into benign remarks or events
  5. persistently bears grudges, i.e., is unforgiving of insults, injuries, or slights
  6. perceives attacks on his or her character or reputation that are not apparent to others and is quick to react angrily or to counterattack
  7. has recurrent suspicions, without justification, regarding fidelity of spouse or sexual partner.
One Comment
  1. October 6, 2006 7:57 PM

    All I can say is…”Wow”…and…did you ever get back to the frige…?
