Plagiarizing From Myself

March 29th, 2010

PaperRater.com is, as they describe on their site, “a free resource, developed and maintained by linguistics professionals and graduate students … used by schools and universities in over 46 countries … combines the power of natural language processing (NLP), artificial intelligence (AI), machine learning, information retrieval (IR), computational linguistics, data mining, and advanced pattern matching (APM).”

But wait, there’s more: “Before we could offer PaperRater.com we had to overcome large challenges related to computational linguistic design and development, handling transliteration variation; ethnolinguistic identification; document classification and entity extraction; name parsing and regularization; duplicate document recognition, plagiarism detection, clustering, and prioritization; automatic entity extraction and entity resolution.”

Sounds pretty good, so I decided to give it a whirl by feeding the text of this blog post into their system to see what happened.

When I clicked on the results for “vocabulary words” I was told: “Excellent work! Your usage of sophisticated words is on par with other well-written papers!” Not bad, I suppose, since the blog post had “STFU” in the headline and used “suck” or “sucks” a total of five times.

But the disturbing part was that PaperRater called me out for plagiarism: “This paper is most likely plagiarized. The percentage of original content in this paper is too low.”

So then I tried PaperRater on a chunk of text from a Hemingway story, “A Clean, Well-Lighted Place.” Papa got nailed for plagiarism, too. More than that, Hemingway got nailed for his vocabulary: “Your percentage of sophisticated vocabulary words used is LESS than average.”

Somehow, that all makes me feel a little better.

I can’t help but wonder, though, if maybe there aren’t a few bugs left in the “computational linguistic design and development, handling transliteration variation; ethnolinguistic identification; document classification and entity extraction; name parsing and regularization; duplicate document recognition, plagiarism detection, clustering, and prioritization; automatic entity extraction and entity resolution” system.

One Response to “Plagiarizing From Myself”

  1. Seth Says:

    It said my stuff was plagiarized too. Total quack!
