Thursday, September 4, 2008

Semantic Faces

(Rafael Nadal pics from his official page rafaelnadal.com)

In an earlier post here, I boldly claimed that the semantic web movement was a fool's errand. Rather than relying on a preconceived ontology, I argued that web searching would be better facilitated by "smart search technologies that can look at new, uncategorized things and figure out what to do with them right now, on the fly."

Recently, Google's Picasa photo sharing site added face recognition software to help users find different pictures of the same person and then add name tags. The name tags are more reliable right now, but as face recognition software inevitably improves, I predict that they will be able to do away with tags altogether and rely wholly on the recognition of similarity in the pictures themselves. This is closer to the way the human cognitive system works. There will come a day when an algorithm can accurately match the two pictures of Rafael Nadal above, and that algorithm will be the future of search.

This cognitive model of searching is what I want to see applied to web search as well. Find matches based on on-the-fly analysis of content. No tags. No ontology (at least, not built into the page itself). Latent Semantic Analysis is one quasi-linguistic method of doing this, and it is already being applied quite profitably to the problem of matching advertisements with relevant web pages. LSA, with its somewhat crude bag-o'-words approach, has miles to go before it sleeps, but it's the right basic idea. Analyze content based on some salient metrics.
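To make the bag-o'-words idea concrete, here is a minimal sketch in Python. It is not full LSA: real LSA would also run a singular value decomposition over a large term-document matrix so that related words end up near each other, and this sketch skips that step and only compares raw word counts with cosine similarity. The page text, the ads, and the function names are all invented for illustration, and I make no claim that any actual ad-matching system works this way.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(page, ads):
    """Return the ad whose word counts are most similar to the page's."""
    page_vec = bag_of_words(page)
    return max(ads, key=lambda ad: cosine_similarity(page_vec, bag_of_words(ad)))


# Hypothetical page text and ads, made up for this example.
page = "Nadal wins another tennis match on clay with a new racket"
ads = [
    "Cheap flights to Paris this summer",
    "Tennis racket sale: clay court shoes and racket strings",
    "Learn linguistics online",
]

print(best_match(page, ads))
```

Nothing here was tagged or categorized in advance; the match falls out of the content alone. The crudeness shows too: an ad about "courts" and "serves" that never used the word "tennis" would score zero, which is exactly the gap the dimensionality reduction in LSA is meant to narrow.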

(Again, I admit I am no expert on the semantic web or search technologies, so my views are naive. If I am misunderstanding something, please feel free to educate me.)

1 comment:

dominique said...

Hi,

There is a huge underlying assumption in your analysis that people write to be understood.

I think that in the community web, people write to be differentiated by the people belonging to their community, hence the huge development of jargons and styles.

To me, that's a huge challenge for this aspect of the semantic web, i.e., communities.
