Sunday, March 15, 2015

Coffee Problem

The major task in marketing is categorization. For instance, once you have categorized groceries, you can compare the UPCs you are sent against the items you have already categorized and kick out the ones that are new. Assuming that you have categorized the current ones correctly and no one complains, only the new items are subject to human review. Automating the process using the item descriptions seems like a tempting application until you consider the coffee problem. What is coffee? It is a product, a flavor, a color, and an appliance, at the very least. Cross-referencing the UPC manufacturer code gives an indication, but you get the idea.
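The kick-out step can be sketched in a few lines of Python. This is only an illustration: the UPCs, the categories, and the function name `find_new_items` are made up, and the already-categorized items are assumed to live in a plain dictionary.

```python
def find_new_items(categorized, incoming):
    """Return the incoming UPCs that have no category yet, in arrival order."""
    seen = set()
    new_items = []
    for upc in incoming:
        # Skip anything already categorized, and report each new UPC only once.
        if upc not in categorized and upc not in seen:
            seen.add(upc)
            new_items.append(upc)
    return new_items


# Hypothetical data: UPC -> category assigned earlier by a human.
categorized = {
    "012345000019": "Beverages",
    "012345000026": "Small Appliances",
}
incoming = ["012345000019", "098765000013", "098765000013", "012345000026"]

# Only the unseen UPC is left over for human review.
needs_review = find_new_items(categorized, incoming)
print(needs_review)
```

Everything this returns still has to be categorized by someone, which is where the coffee problem starts.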
The general solution to the coffee problem, as mentioned above, is one that humans routinely use. We cheat. That is, we make use of other information, such as the UPC manufacturer code.
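Here is a rough sketch of that cheat. It assumes, as a simplification, that the first six digits of a 12-digit UPC identify the manufacturer. The prefix table, the list of meanings for "coffee", and the name `categorize` are all hypothetical.

```python
# Hypothetical table: manufacturer prefix -> what that manufacturer mostly sells.
MANUFACTURER_CATEGORY = {
    "012345": "Beverages",
    "055555": "Small Appliances",
    "077777": "Ice Cream",
}

# The description alone leaves several candidates:
# product, flavor, color, appliance.
COFFEE_MEANINGS = ["Beverages", "Ice Cream", "Paint", "Small Appliances"]


def categorize(upc, description):
    """Guess a category, or return None to send the item to human review."""
    candidates = COFFEE_MEANINGS if "coffee" in description.lower() else []
    hint = MANUFACTURER_CATEGORY.get(upc[:6])  # assumed 6-digit prefix
    # Accept the manufacturer hint only when it agrees with the description.
    if hint in candidates:
        return hint
    return None
```

With these made-up tables, "Coffee 12oz" from prefix 012345 comes back as Beverages, while the same word from an unknown manufacturer falls through to a human.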
Open Salon has its own version of the coffee problem: the spam issue. How do we distinguish between someone venting an opinion and someone generating gibberish? Perhaps the sheer number or length of posts would be useful. I believe at one point Salon knocked me off because I hit a spam filter. I feel confident that I can recognize spam when I see it, but given that it is automatically generated, I see little point in a manual process. Even worse, the spammers are malicious. Any screen or guarantee will be met with further ingenuity. It is obvious that this is an attack on the site. Whether the justification is search engine optimization or stealing server resources, it is an attack against the commons. What pointless, silly futility.
The slush pile is another example. How can a publisher winnow all the authors who want to publish? Again, spam filters sound like a good start. “Heaving bosoms”, “communist menace”, “limpid pools”: it seems fun at first. But how do we determine a positive result, that something is excellent or at least marketable? It’s difficult enough when people do it. Usually we follow successful authors. Bayes meets regression to the mean.
The internet itself is the perfect example. The rare joy I feel when I actually find something.  It seems like libraries and search engines are in the business of concealment rather than revelation.
On the other hand, think of the poor NSA.  What is subversive? What should they be tracking? “Blow up the Pentagon”, “flight training school”, “bushmaster”, what connects to what?
Confusion is the only refuge of freedom.

