The forest for the trees
Saturday, September 8th, 2007
“The guy who invented the wheel was an idiot. The guy who invented the other three, he was a genius.” –Sid Caesar
My brother Eric once told me about a stage direction Shakespeare wrote: “Exit, pursued by a bear.” So when I saw the quote prominently featured on the webpage of a management consultant, I emailed Eric the link. He wrote back an email. “I have fallen in love with a poem that is kind of like that,” he wrote, and included the poem. And hey, I do really like the poem too.
When my Gmail account showed me Eric’s email, it showed a lot of Google AdWords ads for cats. Why? The word “cats” appears once in the whole poem. Any idiot human would know the email is not about cats. It’s about poems. The email is about certain emotions and a way of viewing the world.
Google sees the trees for the forest. But there’s a very simple way that Google and other contextual ad companies can see the bigger picture, know the forest for the trees, and match like Eric and I just did.
Here’s how computers can know the forest for the trees: Take the whole page. Search the combination of the most common words. The first link is usually the most relevant. That’s it.
I’ll show you the poem, and then how computers can quickly see the forest for the trees. Skip down if you ain’t such a poetry fan yourself.
“The Path to the White Moon
There were little farmhouses there they
Looked like farmhouses yes without very much land
And trees, too many trees and a mistake
Built into each thing rather charmingly
But once you have seen a thing you have to move on
You have to lie in the grass
And play with your hair, scratch yourself
And then the space of this behavior, the air,
Has suddenly doubled
And you have grown to fill the extra place
Looking back at the small, fallen shelter that was
If a stream winds through all this
Alongside an abandoned knitting mill it will not
Say where it has been
The time unfolds like music trapped on the page
Unable to tell the story again
Raging
Where the winters grew white we went outside
To look at things again, putting on more clothes
This too an attempt to define
How we were being in all the surroundings
Big ones sleepy ones
Underwear and hats speak to us
As though we were cats
Dependent and independent
There were shouted instructions
Grayed in the morning
Keep track of us
It gets to be so exciting but so big too
And we have ways to define but not the terms
Yet
We know what is coming, that we are moving
Dangerously and gracefully
Toward the resolution of time
Blurred but alive with many separate meanings
Inside this conversation”
–from John Ashbery’s book A Wave.
Here’s how computers can know the forest for the trees: Take the whole email. Search the combination of most common words. The first link is usually the most relevant. That’s it.
1) Take the whole email, or the main part of the email. (I sent myself an email with only the poem, and Google still showed me only ads about cats. Here are the ads. Skip down if you ain’t such a fan of cat ads.
Sponsored Links (feedback)
Train Away Bad Habits Fast. Expert Reveals Little Known Secrets
www.fortheloveofkitty.com
An amazing true story
of the love between my Calico cat and me and our 7 year journey
barbaraloveskitty-cats.net
Cat Declawing Alternative
Veterinarian developed safe & humane, alternative to declawing.
www.SoftPaws.com
Kittentanz Cattery
Tonkinese Kittens
Happy, healthy kittens-Guaranteed!
www.kittentanz.com
Expert Cat Sitting - NYC
Experienced cat sitters available for regular & last-minute needs.
www.TwoDogsAndAGoat.com
Furminator Cat Deshedder
Learn about this amazing tool. Your cat will look and feel great.
www.CatFaeries.com
Potomac Pixiebobs
Pixie-bob kittens available now! Potomac pixiebobs of Northern VA
www.pixiebobspotomac.com
Liberator Cat Collar
Stops cats from killing birds Only AUD$29.95
www.liberators.com.au
More about…
Cat Health Care »
Cat Information »
Cat Companion »
Cat Sitter »
Would you recommend these ads after reading the poem? Probably not.)
2) Find the most common words.
Here’s what a Google search looks like:
It’s not your typical search. But it’s the contextual search because “the” appears 15 times in the email, “to” appears 10 times… you get the idea. Because Google only handles 32 keywords at a time, I haven’t included all the words from the poem, but only the most frequent words.
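Step 2 can be sketched in a few lines of Python. This is only a rough sketch of the counting I describe, not anything Google runs: the function names are made up for illustration, the tokenizing (lowercase runs of letters) is my own simplifying assumption, and the limit of 32 terms is the one mentioned above.

```python
import re
from collections import Counter

MAX_QUERY_TERMS = 32  # the keyword limit mentioned in the post


def most_common_words(text, limit=MAX_QUERY_TERMS):
    """Return up to `limit` distinct words, most frequent first."""
    # Assumption: a "word" is any run of letters, compared in lowercase.
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]


def build_query(text, limit=MAX_QUERY_TERMS):
    """Join the most frequent words into one search string."""
    return " ".join(most_common_words(text, limit))


sample = "the cat and the hat and the bat"
print(build_query(sample, limit=2))  # the and
```

Note that nothing is filtered out: common words like “the” and “to” stay in the query, just as they do in the search above.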
3) The first link is usually the most relevant.
Via Negativa >> Blog Archive >> Festival of the Trees 1
So we have some mighty tangles here and there around Berkeley, …..
Thanks to all of you for the words and pictures. Nice to know so many care for trees as …
www.vianegativa.us/2006/07/01/festival-of-the-trees-1/ - 82k - Cached - Similar pages - Note this
I click on the “Cached” link to see the webpage with the keywords highlighted.
Is it relevant? Eric is deeply, intensely interested in trees. And I actually enjoy reading a lot of this webpage. It contrasts nicely with my reading of the poem. And hey! there’s even poems on this page.
It doesn’t take any “semantic web” (a fancy term for thinking computers are idiots) for a computer to recommend poems — because the poem is implied from the combination of words that make it a poem. You don’t need to write “poem” to know it’s a poem.
The guy who wrote the page is “Dave Bonta, a 41-year old writer….” His profile in the top-right of the page says he runs a literary blogzine and has two collections of poetry online. Hmm, sounds like something I’ll recommend to Eric.
4) If I wanted the computer to choose a single block of text to highlight, I would give it an algorithm to find the most closely clustered highlighted keywords. I would feature this paragraph or sentence and show it at the top of the page. This paragraph from that page has nine different keywords close together.
our neighborhood is like a park, it’s a horticultural wonder that has sprung up on the grasslands of the berkeley hills. in other words, it’s mostly artificial. but it’s older, and very much overgrown - a feature many newcomers do not like about berkeley - but that’s just it, we don’t ‘manicure’ or ’spray’ much. (Although that is changing as a more monied group moves in. Berkeley used to be more about idealists of many ilks.) We wanted things to be ‘organic’ and to ‘let nature be nature.’ So we have some mighty tangles here and there around Berkeley, some briars that have gone bananas, but also just a lot of very relaxed-looking plants. I love the plants of Berkeley.
Eric drove through Berkeley and loved it. And I’m interested in what happens when new groups move in to a neighborhood.
The sentence featured by the algorithm is the first sentence which has three keywords close together:
But my assumption is that people who like trees are, by and large, given to contemplation rather than hurried skimming and haphazard clicking on links.
I’d assume that’s true about most people who read John Ashbery’s poetry, too. It’s a good summary of this webpage.
And the featured photo is the one closest to the text shown above.
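Here is one way step 4 might look in Python. It is a sketch under assumptions of my own: instead of whole paragraphs or sentences it slides a fixed-size window of words across the page and keeps the window holding the most different keywords. The name `densest_window` and the default window size are invented for illustration.

```python
import re


def densest_window(text, keywords, window=12):
    """Find the `window`-word span containing the most distinct keywords.

    Returns (start_index, distinct_keyword_count, words_in_span).
    """
    words = re.findall(r"[a-z]+", text.lower())
    keyset = {k.lower() for k in keywords}
    best_start, best_count = 0, 0
    # Try every starting position; keep the first window with the highest count.
    for start in range(max(1, len(words) - window + 1)):
        found = {w for w in words[start:start + window] if w in keyset}
        if len(found) > best_count:
            best_start, best_count = start, len(found)
    return best_start, best_count, words[best_start:best_start + window]


page = "one two trees three four five trees grass air six seven"
print(densest_window(page, ["trees", "grass", "air"], window=3))
# (6, 3, ['trees', 'grass', 'air'])
```

Counting different keywords rather than total hits is the point: a span that repeats “trees” five times scores lower than one that mentions trees, grass and air once each.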
I’ve done hundreds of this kind of “forest for the trees” search. Some folks call the results “uncanny.” Usually the algorithm is better than I am at finding relevant webpages. You could do this kind of search on MySpace, Facebook and LinkedIn profiles too.
Obviously if people like ten of the same movies, they have more in common than if they only like one of the same movies. But search engines and social networks have insisted on being idiots, searching single keywords instead of combinations, and warning me about Maine Coon cats when I’m more interested in stuff that actually relates to the whole page of what I’m reading.
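The movie comparison is easy to put into code. A tiny sketch, with hypothetical names and made-up data, that ranks people by how many likes they share with me instead of matching on any single one:

```python
def shared_count(mine, theirs):
    """Number of items two people have in common."""
    return len(set(mine) & set(theirs))


def rank_by_overlap(mine, others):
    """Sort the names in `others` by shared items, most shared first."""
    return sorted(others, key=lambda name: shared_count(mine, others[name]), reverse=True)


# Made-up example data, for illustration only.
my_movies = {"a", "b", "c"}
profiles = {"x": {"a"}, "y": {"a", "b", "c"}, "z": {"q"}}
print(rank_by_overlap(my_movies, profiles))  # ['y', 'x', 'z']
```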
Google and the other ad companies have invented a wheel. Now let’s attach the other three.
September 10th, 2007 at 7:32 pm
i just wonder how that algorithm would pan out. for example, how could it tell if i’m telling a fictional story, pasting in an excerpt from a book, or talking about what happened to me last night at a party?
also, if i am writing an email about my exploits last night, should it show ads for “prose” ?
another way of looking at this, is from the advertising perspective. i’m currently reading a book on marketing, and he talks about examples from different niches. what about, when he’s talking about food marketing, i see ads for food. maybe i am in a more receptive place for that particular ad, even though the actual topic is “marketing” ?
but that last piece is just a value question. what you’re talking about is for the search engines to have a higher comprehension of what a set of words is really about. to understand the meaning, to deduce based on consistent subtleties, without conflicting elsewhere. it’s not easy, and is the holy grail so to speak of that industry…
“should I ask her” could be referring to someone thinking of starting a relationship, but it could also mean “if she’s pregnant” or “how old she is” - the goal here is simple:
know all the potential endings for the phrase “should i ask her” and then analyze the surrounding text to see which of those potential endings matches.
they will be there… from what i’ve seen, the fact that they’re not there, is evidence of how hard this really is…