Similar pages across the internet
Wednesday, January 9th, 2008
When I was four or five years old, my grandfather took me for a ride in his convertible with the top down. He drove fast, the wind rushed by, it was thrilling and fun. Afterwards, he asked me how I liked it. I said, “It was cool!”
He got excited for a moment, and said, “Do you mean ‘cool’ as in fun, or ‘cool’ the temperature?” He was excited because I rarely, if ever, used slang; I wasn’t usually around friends or family who did.
I was puzzled because I hadn’t ever heard anyone say ‘cool’ as in ‘fun’ before. It was out of context for me. “‘Cool’ the temperature,” I said.
Please, imagine you write something –
a thought, a blog, a profile on Facebook or LinkedIn, a webpage, a research paper, an entire book –
And as instantly as ads pop up when you read an email in your inbox, as instantly as search engine results, as instantly as viewing a blog or a profile on Facebook or a webpage –
in half a second –
you see the most similar thought anyone has written anywhere on the internet.
* * *
Let’s start with this blog, because…are you sure this can be done?
Rob Marsh, S.J., made a program, a WordPress Plugin called “Similar Posts.” It finds similar pages. You could say it searches entire pages of text to find the best matches for every other page.
Every time I write a new blog entry, Similar Posts updates. You can see the results at the bottom of any entry on this blog, where it says “Similar Posts” and shows the five most similar entries.
* * *
Why do I care?
Without doing anything, every time I write a thought, Similar Posts reminds me of my most similar thoughts.
Not similar by narrow topic, but –
similar by the general underlying meaning –
triggering similar combinations of distributed words across “similar” pages.
In March 2005, my life was changed when I searched a page of text I’d written –
not a search of a few words, but a search of a whole page. It referred me to Virginia Satir, who’d pioneered family therapy. Soon after, I lived in Galveston, TX, and read all of Ms. Satir’s books. They turned out to be essential in my helping people tell their stories about work in companies, and gave me insights into my own life. But enough about me. Here’s what some other people have said:
‘Ryan was looking for a new career. He wrote me a half–page letter asking for advice. When I searched the text of his letter, two of the four top results were links to art magazines. Ryan liked the content of the art magazines very much. But he said that what surprised him was that, before he had written to me, he had actually been in the process of starting his own online art magazine.’
-from an earlier entry on this blog
‘what you’re talking about is for the search engines to have a higher comprehension of what a set of words is really about. to understand the meaning, to deduce based on consistent subtleties, without conflicting elsewhere. it’s not easy, and is the holy grail so to speak of that industry…
‘“should I ask her” could be referring to someone thinking of starting a relationship, but it could also mean “if she’s pregnant” or “how old she is” - the goal here is simple: know all the potential endings for the phrase “should i ask her” and then analyze the surrounding text to see which of those potential endings matches.
‘they will be there… from what i’ve seen, the fact that they’re not there, is evidence of how hard this really is…’
-from a comment by ‘smitty’ at the bottom of another entry on this blog
‘When a key piece of data changes in the enterprise, one must first treat this new data like a query (i.e., what does this new data mean in relation to what the enterprise already knows). And if new data is not treated first like a query, one will never know if this new information matters unless someone asks. I often refer to this notion as Perpetual Analytics – a world where the “data finds the data and the relevance finds the user.”’
-from Jeff Jonas’ blog
‘Ideally, similarity or relatedness would be based on a post’s meaning….The Similar Posts plugin compares posts by comparing their words.’
-from the Similar Posts website
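For the curious, here is a minimal sketch of what “comparing posts by comparing their words” could look like. It is not the plugin’s actual code, just one common approach (word counts plus cosine similarity), and the function names and sample posts are invented for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (a plain bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, posts, top_n=5):
    """Rank every other post by word overlap with the post named `title`."""
    target = word_counts(posts[title])
    scores = [
        (cosine_similarity(target, word_counts(text)), other)
        for other, text in posts.items()
        if other != title
    ]
    scores.sort(reverse=True)
    return [other for _, other in scores[:top_n]]


# Hypothetical posts, invented for illustration only.
posts = {
    "search-1": "searching a whole page of text finds similar pages",
    "search-2": "similar pages share words across a whole page of text",
    "grandfather": "my grandfather drove his convertible fast with the top down",
}
print(most_similar("search-1", posts))
```

Nothing here “knows” what a post is about. Two posts rank as similar simply because they share many of the same words, which is the limitation the quote above admits.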
* * *
Similar Posts chooses better links on my blog than I can. It “knows” and “recommends” the order of a three-part article I wrote: first part 1, then part 2, then part 3. Even though it doesn’t “know” which comes first, the words in part 1 are most similar to the words in part 2, then to those in part 3. That’s kind of incredible when you think about it.
It also “knows,” Blockbuster movie-style or Amazon.com-style, if you like this page, you might love this other page. Similar Posts has surprised and delighted me constantly since I installed it on December 1st, a month ago. It reminds me of writings I’d forgotten I’d written.
Imagine the Google of similar webpages across the internet. Every page you read will link to pages which on a deep level relate.
When you search online, instead of typing one or two words, you’ll be able to write a page of what you want. You’ll be referred to your answer, maybe even your soulmate.
* * *
For all y’all techies out there, we don’t need “tags” or “semantic web” or “labels” or “categories.” Our natural language –
our words –
contain the answers to what we’re looking for.
A computer program can tell on its own when someone is writing about something abstract, say, something “blue” –
and it can differentiate between blue the color and blue as in “feeling down.”
Likewise, when the words “should I ask her” appear on a webpage –
it is very likely that the subject of the page will be related to a guy wondering about asking a girl out. Google’s current PageRank system helps get the most popular usages to the top of the search results.
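Here is a toy sketch of the “blue” example, under heavy assumptions: the cue words for each sense are hand-picked by me, and the program simply picks the sense whose cue words show up most often in the surrounding text. A real system would have to learn those cues from a great deal of writing.

```python
import re

# Hypothetical cue words for each sense of "blue", hand-picked for illustration.
SENSES = {
    "color": {"sky", "paint", "color", "shirt", "ocean", "bright"},
    "mood": {"feeling", "sad", "lonely", "down", "cry", "mood"},
}


def guess_sense(text, senses=SENSES):
    """Return the sense whose cue words overlap most with the text, or None."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    overlaps = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


print(guess_sense("She painted the shirt a bright blue like the sky"))
print(guess_sense("I have been feeling blue and lonely all week"))
```

The first sentence comes out as “color” and the second as “mood.” With no cue words at all, the sketch returns None rather than guessing.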
Similar Posts can be chunked down into Similar Sentences —
and meaning can be linked.
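As a rough sketch of what “Similar Sentences” might mean, the same word-matching idea can be applied one sentence at a time. Everything below is my own assumption: the naive sentence splitter, the Jaccard score (shared words divided by all distinct words), and the sample text.

```python
import re


def sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Shared words divided by total distinct words in two sentences."""
    wa = set(re.findall(r"[a-z']+", a.lower()))
    wb = set(re.findall(r"[a-z']+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def closest_sentence(query, text):
    """Return the sentence in `text` sharing the most words with `query`."""
    return max(sentences(text), key=lambda s: jaccard(query, s))


# Hypothetical page text, invented for illustration only.
page = "He drove fast with the top down. Should I ask her to the dance? It takes half an hour."
print(closest_sentence("should I ask her out", page))
```

Short sentences share few words, so matching at this level is much noisier than matching whole pages.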
Call it “contextual search,” “soulpage search,” or whatever ya wanna –
try it out. Start by installing Similar Posts on your own blog. It takes half an hour to install. Go ahead.
I doubledare ya.
And if you don’t believe me, look immediately below this line to see what Similar Posts “thought” were the most similar writings on this blog.
Similar Posts: