Monday, January 31, 2005

Automatic Meaning Discovery Using Google

Abstract: We propose a new method to extract semantic knowledge from the world-wide-web for both supervised and unsupervised learning using the Google search engine in an unconventional manner. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. We give evidence of elementary learning of the semantics of concepts, in contrast to most prior approaches. The method works as follows: The world-wide-web is the largest database on earth, and it induces a probability mass function, the Google distribution, via page counts for combinations of search queries. This distribution allows us to tap the latent semantic knowledge on the web.

And from the paper itself:
A comparison can be made with the Cyc project [14]. Cyc, a project of the commercial venture Cycorp, tries to create artificial common sense. Cyc’s knowledge base consists of hundreds of microtheories and hundreds of thousands of terms, as well as over a million hand-crafted assertions written in a formal language called CycL [20]. CycL is an enhanced variety of first-order predicate logic. This knowledge base was created over the course of decades by paid human experts. It is therefore of extremely high quality. Google, on the other hand, is almost completely unstructured, and offers only a primitive query capability that is not nearly flexible enough to represent formal deduction. But what it lacks in expressiveness Google makes up for in size; Google has already indexed more than eight billion pages and shows no signs of slowing down.
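The page-count idea in the abstract is easy to try by hand. As I understand the paper, the measure built on the Google distribution is the Normalized Google Distance (NGD), computed from the hit count for each term, the hit count for both terms together, and the total number of indexed pages. A minimal sketch follows; the counts are illustrative assumptions rather than live Google results, and the index size is just a figure consistent with the "more than eight billion pages" mentioned above.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw page counts.

    fx, fy -- hits for each term alone
    fxy    -- hits for pages containing both terms
    n      -- total number of indexed pages
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Illustrative counts (assumptions, not fetched from Google).
horse = 46700000
rider = 12200000
both = 2630000
indexed_pages = 8058044651

distance = ngd(horse, rider, both, indexed_pages)
print(round(distance, 3))
```

Smaller values mean the two terms tend to show up on the same pages; terms that never co-occur drift toward larger distances.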

Sunday, January 30, 2005

Researchers Map The Sexual Network Of An Entire High School

COLUMBUS, Ohio – For the first time, sociologists have mapped the romantic and sexual relationships of an entire high school over 18 months, providing evidence that these adolescent networks may be structured differently than researchers previously thought.


Monday, January 24, 2005

testing ElementTree

Testing cElementTree...

from SimpleXMLTreeBuilder import *
import cElementTree as ElementTree

def xml2elem(xmlData):
    # Feed the whole document to the builder and return the root element.
    p = TreeBuilder()
    p.feed(xmlData)
    return p.close()

if __name__ == "__main__":
    # Sample document; the markup is reconstructed from the output shown below.
    text = """<root> root content
<tag1>tag 1 text</tag1>
<tag2 id="2" att2="val2">content of tag2
<deeper>deeper</deeper>
</tag2>tag2 tail
</root>
"""
    elem = xml2elem(text)
    print 'elem.Tag->', elem.tag  # root
    print 'findall->', elem.findall('tag1')[0].text  # tag 1 text
    print 'interrogate second child...'
    print ' ', elem.getchildren()[1].tag  # tag2
    print ' ', elem.getchildren()[1].items()  # [('id', '2'), ('att2', 'val2')]
    print ' ', elem.getchildren()[1].getchildren()[0].tag  # next level gets... deeper
    print 'for child -> elems children but not grandchildren'
    for child in elem:
        print ' ', child.tag  # tag1 tag2 (but not deeper)

    # Write the sample document to the file that iterparse reads next.
    f = open('test.xml', 'w')
    f.write(text)
    f.close()

    print 'for elem in iterparse -> all levels...'
    for event, elem in ElementTree.iterparse('test.xml'):
        print ' ', elem.tag  # tag1 tag2 AND deeper

elem.Tag-> root
findall-> tag 1 text
interrogate second child...
tag2
[('id', '2'), ('att2', 'val2')]
deeper
for child -> elems children but not grandchildren
tag1
tag2
for elem in iterparse -> all levels...
tag1
deeper
tag2
root
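cElementTree was later folded into the Python standard library as xml.etree.ElementTree, so the same experiment can be repeated without extra installs. A minimal sketch for Python 3, assuming the sample document looks the way the printed output implies (the markup below is a reconstruction, not the original file):

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample document, rebuilt from the tags and attributes printed above.
XML = """<root> root content
<tag1>tag 1 text</tag1>
<tag2 id="2" att2="val2">content of tag2
<deeper>deeper</deeper>
</tag2>tag2 tail
</root>"""

root = ET.fromstring(XML)
children = list(root)  # list(elem) replaces the old getchildren()

print('root tag ->', root.tag)
print('findall ->', root.findall('tag1')[0].text)
print('second child ->', children[1].tag, dict(children[1].items()))

# Iterating an element visits direct children only.
direct = [child.tag for child in root]
print('direct children ->', direct)

# iterparse reports every element, at every depth, as its end tag is seen.
source = io.BytesIO(XML.encode('utf-8'))
all_tags = [el.tag for _event, el in ET.iterparse(source)]
print('iterparse ->', all_tags)
```

Because iterparse fires on end tags by default, inner elements are reported before the elements that contain them, which is why root comes last.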


Wednesday, January 19, 2005

Mary Ellen Bates - Tip of the Month

Mary Ellen Bates - Tip of the Month:
Good tips...
"...try Google's synonym search. Add a tilde (~) at the beginning of the words child and obese (~child ~obese), and Google retrieves web sites that use any of those synonyms."
Google Personalized
Personalized Google is still in beta, but it's an interesting tool. Once you go to the Google Labs page and select Personalized, you will be sent to a new search page that includes a link to [Create Profile]. You can specify the type of searching you typically do, ranging from biotech and pharmaceuticals to dentistry to classical music. Click [Save Preferences], and then type your search terms in the Google Personalized search box.


At the search results screen, you will now see something new -- a slider bar that lets you specify how much you want the search results sorted by those interests you specified. The default is minimal personalization; move the slider bar toward maximum, and you will see the search results change on the fly, as Google re-ranks the results based on your personal interests.


Specialized Searches
In addition to the well-known Google search tabs for searching the web, news and images, there are several specialized search tools for commonly searched subjects, including UncleSam for searching federal government information; University Search for searching within the sites of major colleges or universities;

Simplicity

dirtSimple.org:
"In my second professional programming job, I had a really interesting boss. When we had a design meeting, we would all sit around a whiteboard, and as Roger (my boss) threw out things we needed to accomplish, the other programmers and I would propose solutions, and Roger would say, 'Really? What if you just did X?', where X was some absurdly, ridiculously, jaw-droppingly simple thing.

Of course, X wouldn't always work; oftentimes one of us would find a hole in his idea. We'd all then try to fix the hole, but at some point the idea started to become too complicated for Roger's taste. 'How about Y?' Still ridiculously simple, and tantalizingly close to working.

Oftentimes, he 'cheated', by redefining the problem itself to make it a simpler problem to solve, or forcing the problem to fit some existing available solution. We would continue in this vein until the solution was so simple it hardly seemed like any work to actually implement, or it became absolutely clear that the problem would not yield to simplicity. In which case we simply packed it in for the day on trying to solve that problem, and we'd hit it again on another day."
The article continues, wonderfully, in this vein. I don't know that I'm as good as his boss, but I certainly recognize myself. It's reassuring to find there are others out there ...


Roger, by the way, was not a programmer by trade. He was a teacher, mostly of learning disabled children.

Saturday, January 15, 2005

infoSync World : Shake the Samsung SCH-310 to its senses

infoSync World : Shake the Samsung SCH-310 to its senses:

Airpen of the future?
"Samsung Electronics today unveiled the world's first mobile phone with built-in, 3-dimensional motion recognition technology. Dubbed the SCH-310, the handset calculates and ascertains movement in three dimensional space, and then carries out commands according to those calculations.

[Image caption: Samsung's new SCH-310 breaks new ground with its 3D motion recognition capabilities]

Previously, several handset makers have shown prototypes drawing on motion recognition technology, however typically such concepts have relied on two-dimensional recognition rather than full spatial tracking. To enable the 3D motion recognition technology found in the SCH-310, Samsung devised a moving algorithm, resulting in applications for 22 domestic and foreign patents.

Predicting a future where 3D motion recognition technology will become an important user interface, Samsung envisions the technology revolutionizing mobile phone designs and doing away with the need for complex keypads on handsets, portable audio players, digital cameras and other handheld products. In particular, the company sees the technology changing mobile gaming.

The SCH-S310 relies entirely on 3D motion recognition technology for its user interface, enabling users to dial by 'writing' digits in the air. Shaking the phone twice will conclude a call or delete spam messages, whilst drawing the letters 'O' or 'X' will cause the handset to respond with 'yes' or 'no' voice messages. Also housing MP3 playback functionality, moving the phone sharply to the right will cause songs to skip forward; to the left, and it skips backward.
"

Schneier on Security: Physical Access Control

Very nice description of security logic applicable to software (not all that powerful for fences, as commenters point out).

Schneier on Security: Physical Access Control: "In Los Angeles, the 'HOLLYWOOD' sign is protected by a fence and a locked gate. Because several different agencies need access to the sign for various purposes, the chain locking the gate is formed by several locks linked together. Each of the agencies has the key to its own lock, and not the key to any of the others. Of course, anyone who can open one of the locks can open the gate.

This is a nice example of a multiple-user access-control system. It's simple, and it works. You can also make it as complicated as you want, with different locks in parallel and in series."
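In software terms the daisy-chained locks are an any-of check, and putting several such gates in a row turns it into an all-of check over any-of checks. A rough sketch of that reading; the agency names and the second gate are made up for illustration:

```python
def chain_opens(keys_held, chain_locks):
    """Locks linked into one chain: opening any single lock opens the chain."""
    return any(lock in keys_held for lock in chain_locks)

def path_opens(keys_held, gates):
    """Gates one after another: every gate on the path has to open."""
    return all(chain_opens(keys_held, gate) for gate in gates)

# Hypothetical agencies; each holds only the key to its own lock.
sign_gate = {"parks", "fire", "police"}
inner_gate = {"fire", "police"}

print(chain_opens({"parks"}, sign_gate))               # parks can open the outer gate
print(path_opens({"parks"}, [sign_gate, inner_gate]))  # but cannot get past both gates
print(path_opens({"fire"}, [sign_gate, inner_gate]))   # fire has a lock on each chain
```

Revoking one agency's access is just removing its lock from the chain; nobody else's key changes.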

Wednesday, January 05, 2005

ZDNet AnchorDesk: Why IM is so much better than e-mail

ZDNet AnchorDesk: Why IM is so much better than e-mail:

Curiously enough, multichat has none of these virtues.
"People I need to reach aren't responsive to e-mail anymore; they seem to check it every few hours or so, probably dreading the onslaught of spam and tedious threads that await them.

IM restores that rapid-fire pungency e-mail used to have, an electronic version of someone sticking their head in your office door.

I IM'd three of my IM peeps to find out why they like it.

Brian: Hey, I'm writing about IM in my next column. What's the essence of it?

Rafe Needleman (with his characteristically analytical take): presence is a key part of it. i know if you're there when i send a message to you. that's something you can't do either with e-mail or phone

Brian: OK, OK, I'll admit it, IM is cool. But why?
Joni Blecher (with her usual larkish take): oh, easy--it's just like passing notes in class.

Brian: Brainstorming for my column: Why do you think IM is cool?
Stacy (my wife, as catty as a woman with five cats should be): you can be far more explicit than you can be on your work e-mail--i.e., bitching about your boss's awful new shoes w/o leaving a trace in the company's server logs. and one more a neat freak like you will appreciate: less crap to have to delete from your in- and out-box.
And this comment from a reader:
No latency
It's like a discussion, only you can't be interrupted.
And yet....

Monday, January 03, 2005

mod_pubsub

Why isn't this precisely what we need for Multichat?
mod_pubsub Project FAQ: "What is Repubsub?

Previous Mod-pubsub distributions (which are still available on Sourceforge) included a mod-perl server, a standalone Python server, client libraries in 10 programming languages, lots of sample code for the mod-perl server, random tools, badly-maintained docs, and the kitchen sink. However, we found that a large proportion of all the developers who wanted to use Mod-pubsub just wanted the Python server with the JavaScript client and were confused by the other stuff. So Repubsub is a minimal Mod-pubsub package comprised of the Python server and JavaScript client, rewritten for better performance and scalability.

Exactly how do Mod_pubsub's connections stay open?

Mod_pubsub's clients do not poll the server every so often. Instead, like Jabber, a Mod-pubsub server holds open the connection socket to its clients. The main difference between Mod_pubsub and Jabber is that we use standard HTTP as our wire format, whereas Jabber uses its own XML-based protocol. (Yes, we know that people think HTTP is a polling protocol... but it's not necessarily.)

Are there any other products like Mod_pubsub?

Similar projects include Pushlets, Jabber, and XMLHTTP. Unlike mod_pubsub, Pushlets can only maintain a few simultaneous connections, and require Java programming. With mod_pubsub, if you know how to use HTTP, you can do PubSub. Unlike Jabber, which uses its own XML-based protocol, we use standard HTTP. And Mod-pubsub is easier to work with, supports multiple users viewing the same data stream at the same moment, and is more flexible than XMLHTTP.
"
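The "server holds the connection open" idea can be sketched with nothing but the Python standard library. This is not mod_pubsub's code or API; the handler name, the event list and the timing are all assumptions for illustration. The server answers a single GET and keeps writing lines to the same response as events occur, and the client reads them as they arrive instead of polling.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

EVENTS = ["one", "two", "three"]  # hypothetical events to publish

class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # No Content-Length header: the body ends when the server closes
        # the socket, so the response can stay open as long as we like.
        self.end_headers()
        for event in EVENTS:
            self.wfile.write((event + "\n").encode("ascii"))
            self.wfile.flush()
            time.sleep(0.05)  # stand-in for waiting on the next real event

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def read_stream():
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), StreamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    received = []
    # One request, one long-lived response: lines show up as they are pushed.
    with urlopen("http://127.0.0.1:%d/" % port) as response:
        for line in response:
            received.append(line.decode("ascii").strip())
    server.shutdown()
    server.server_close()
    return received

received = read_stream()
print(received)
```

A real pub/sub server would block on a queue per subscriber rather than loop over a fixed list, but the wire-level trick is the same: plain HTTP, with a response that simply does not end yet.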

Form follows function. Architecture and technology follow form

Joel on Software - Don't Let Architecture Astronauts Scare You: "Your typical architecture astronaut will take a fact like 'Napster is a peer-to-peer service for downloading music' and ignore everything but the architecture, thinking it's interesting because it's peer to peer, completely missing the point that it's interesting because you can type the name of a song and listen to it right away.

All they'll talk about is peer-to-peer this, that, and the other thing. Suddenly you have peer-to-peer conferences, peer-to-peer venture capital funds, and even peer-to-peer backlash with the imbecile business journalists dripping with glee as they copy each other's stories: 'Peer To Peer: Dead!'

The Architecture Astronauts will say things like: 'Can you imagine a program like Napster where you can download anything, not just songs?'"

Sunday, January 02, 2005

Blogger Help : All about Blogger's post editor

Blogger Help : All about Blogger's post editor: "All about Blogger's post editor"

Shirky: Situated Software

Shirky: Situated Software: "We've been killing conversations about software with 'That won't scale' for so long we've forgotten that scaling problems aren't inherently fatal. The N-squared problem is only a problem if N is large, and in social situations, N is usually not large. A reading group works better with 5 members than 15; a seminar works better with 15 than 25, much less 50, and so on.

This in turn gives software form-fit to a particular group a number of desirable characteristics -- it's cheaper and faster to build, has fewer issues of scalability, and likelier uptake by its target users. It also has several obvious downsides, including less likelihood of use outside its original environment, greater brittleness if it is later called on to handle larger groups, and a potentially shorter lifespan.
"
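To put numbers on the N-squared remark: the number of possible pairwise conversations in a group of N people is N(N-1)/2. A quick check for the group sizes Shirky mentions:

```python
def pairwise_links(n):
    """Number of distinct pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2

sizes = [5, 15, 25, 50]
links = {n: pairwise_links(n) for n in sizes}

for n in sizes:
    print(n, "members ->", links[n], "possible pairs")
```

Going from a reading group of 5 to a lecture of 50 multiplies the membership by ten and the possible pairs by more than a hundred, which is the whole point: the problem only bites when N gets large.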