DynamoDB on Amazon

19 Jan

Amazon introduces DynamoDB

Amazon has released DynamoDB, a NoSQL database similar to Cassandra. If you’ve been interested in trying out NoSQL or Amazon’s services, this is a great way to get your feet wet. In fact, you might as well dive in, because Amazon’s free usage tier includes DynamoDB. I plan on checking it out the next time I have a free evening. This InformationWeek article provides a good journalistic intro to DynamoDB.


A Guide to Setting Up a Rails App on Dreamhost

10 Jan

Here are my notes.  This is my first attempt at this, so I may make updates later as necessary.

Start by creating the rails app.  I’m making one named “food_app”:

$ rails new food_app --database=mysql --freeze

Then, on the Dreamhost MySQL Databases web panel, create dev, test, and prod databases.

  • Create a user/password for the rails app and use this for all databases
  • Remember the user/pass info and the server name and database name

Now edit the rails db config file:

$ emacs -nw food_app/config/database.yml

Update each section with the names of the databases you just created.  Also provide user and host information:

production:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: food_prod
  pool: 5
  username: ####
  password: ####
  host: ####
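
One way to sanity-check the file before moving on is a small Ruby script that flags blank settings. This is a minimal sketch, not part of Rails: the helper name `missing_db_settings`, the list of required keys, and the sample user and host values are all assumptions made for illustration. Note that YAML reads a literal `####` placeholder as a comment, so a placeholder left in the file shows up as a missing value.

```ruby
require "yaml"

# Return a hash of environment name => settings that are missing or blank.
# The list of required keys is an assumption, not something Rails enforces.
def missing_db_settings(yaml_text, required = %w[adapter database username password host])
  config = YAML.safe_load(yaml_text) || {}
  problems = {}
  config.each do |env, settings|
    settings ||= {}
    missing = required.select { |key| settings[key].to_s.strip.empty? }
    problems[env] = missing unless missing.empty?
  end
  problems
end

# Hypothetical sample: the password placeholder has not been filled in yet.
sample = <<~YAML
  production:
    adapter: mysql2
    database: food_prod
    username: food_user
    password: ####
    host: mysql.example.com
YAML

p missing_db_settings(sample)
```

For the sample above, the script reports `password` as missing for `production`.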

Now create a symbolic link named after the domain that points to the app folder.  If you already have a domain folder, rename it first.  Make sure you’ve enabled Passenger in the Dreamhost Panel (see: http://wiki.dreamhost.com/Getting_Started_with_Ruby_on_Rails).

$ ln -s food_app/ food.textidermy.com

Now move into the app folder for some more configurations:

$ cd food_app

The mysql2 gem on Dreamhost has problems with the current Rails version.  Make the following edit to the app’s Gemfile:

gem 'mysql2', '< 0.3'

Then bundle things:

$ /usr/lib/ruby/gems/1.8/bin/bundle

Now make a test object to see that things are working:

$ rails generate scaffold Post title:string body:text

Then, migrate it:

$ rake db:migrate

Check that the tables are created in the dev database.  Then start the rails server to see the development app:

$ rails server

Check it at the server address.

And that’s it!

 



Time-bound Computing

1 Dec

Day 38 - Time keeps on slipping... by brianjmatis, on Flickr

Lately I’ve been reviewing the computational tools used to analyze recommendation problems, categorization problems, and other data or text mining problems.  Many of these problems attempt to draw insight from data the way a human would if that human had reviewed and understood all of the available data.  This resemblance to human intelligence often results in these types of tasks being considered “artificial intelligence.”  However, the computational tools used to create artificial intelligence seem fundamentally different from the ones that create human intelligence.

I’d like to introduce two terms.  (I’d also like to mention that I haven’t looked for literature on this, and I’ll post updates as I find pre-existing work on this concept.)  Our terms are:

Accuracy-bound computing:
This captures the majority of current computing.  A program will run until it has completed (if it completes), and (ideally) it will return an accurate answer when it completes.  We see this paradigm everywhere. When we ask “3/8=” or “What is the capital of Malaysia?” we expect the exact correct answer. In fact, we expect computers to return a 100% correct answer, even if that means we have to wait for the answer.

Time-bound computing:
Imagine an alternate computing paradigm where computing doesn’t stop when an (accurate) answer is reached, but instead stops after a specific amount of time.  The program returns the best possible answer available after a specified amount of time even if the answer isn’t 100% correct.  If a program has more time available, it will return a more accurate answer.
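
The idea can be sketched with the “3/8=” question from above. This is a minimal illustration under stated assumptions, not an established algorithm from the literature: the function name `time_bound_divide` and its return format are hypothetical, it only handles positive integers, and it truncates digits rather than rounding them. It always returns some answer, then refines it one decimal digit at a time until the answer is exact or the time budget runs out.

```ruby
# Anytime long division for positive integers: always produce the integer part,
# then add one decimal digit per step until the answer is exact or time is up.
def time_bound_divide(numerator, denominator, budget_seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + budget_seconds
  whole, remainder = numerator.divmod(denominator)
  digits = []

  while remainder != 0 && Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    digit, remainder = (remainder * 10).divmod(denominator)
    digits << digit
  end

  answer = digits.empty? ? whole.to_s : "#{whole}.#{digits.join}"
  { answer: answer, exact: remainder.zero? }
end

quick = time_bound_divide(3, 8, 0)      # no time to refine: just the integer part
full  = time_bound_divide(3, 8, 1.0)    # plenty of time: the division terminates
third = time_bound_divide(1, 3, 0.01)   # never exact: more time only adds digits

puts quick[:answer]         # 0
puts full[:answer]          # 0.375
puts third[:answer].length  # varies with the machine and the budget
```

The `exact` flag is a crude signal of whether the program stopped because it finished or because it ran out of time.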

Consider two examples.

  1. With accuracy-bound computing the problem “3/8=?” returns “0.375” after an unknown amount of time, while with time-bound computing “3/8=?” may return “0.4” after 1 second, or “0.38” after 2 seconds, etc.
  2. Querying an accuracy-bound search engine with the phrase “What is the capital of Malaysia” will return the ten best results for the phrase, perhaps with the Wikipedia article for Kuala Lumpur as the first result. But we may need to wait several seconds for the results.  A time-bound search engine would also return the top ten results, but the results would differ depending on how long we let the search engine run.  Limiting it to 1 second, the Kuala Lumpur wiki article may be the 9th result, but if we wait 5 seconds we may find it to be the first result.
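
The search example can be sketched in the same spirit. Everything here is an assumption made for illustration: the function names, the crude word-overlap score, and the three sample documents are hypothetical, and the sketch models “less time” as “fewer documents examined,” which is only one of several ways a time-bound search engine could behave.

```ruby
# Score a document by how many distinct query words it contains
# (a deliberately crude relevance measure, used only for illustration).
def score(query, text)
  words = text.downcase.scan(/\w+/)
  query.downcase.scan(/\w+/).uniq.count { |term| words.include?(term) }
end

# Score as many documents as the time budget allows, then return the best
# `limit` titles seen so far. Documents not reached never appear in the result.
def time_bound_search(query, documents, budget_seconds, limit: 10)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + budget_seconds
  scored = []
  documents.each do |title, text|
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    scored << [score(query, text), title]
  end
  scored.sort_by { |points, _| -points }.first(limit).map { |_, title| title }
end

# Hypothetical corpus: title => text.
documents = {
  "Malaysia travel tips" => "What to pack for a trip to Malaysia",
  "Capital (economics)"  => "Capital is a factor of production",
  "Kuala Lumpur"         => "Kuala Lumpur is the capital of Malaysia"
}

puts time_bound_search("What is the capital of Malaysia", documents, 1.0)
```

With a generous budget the Kuala Lumpur entry ranks first. With a budget of zero the function returns an empty list, and with a budget in between the ranking covers only the documents reached before the deadline.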

Why is this distinction important?

I think that human intelligence can be better modeled as a time-bound computing problem.  Humans constantly make decisions in time-limited situations, where recommendations, categorizations, and decisions are made based on the best available information. For example, drivers are not at leisure to compute the optimal time to decelerate when they see brake lights on the freeway; they must make the best decision possible about how hard to brake in the limited time available. Often, we don’t need a perfect answer, just a usable one. Perhaps if we design computing systems from a time-bound computing perspective we could make artificial intelligence more human.

In future posts I’ll provide some examples of time-bound computing and discuss some of the problems or limitations with the approach. Here are a few of the areas I’m currently considering:

  • Does a time-bound computing approach need an “expected accuracy” metric to accompany any results returned by a program?
  • How often do we need “perfect” answers?  What types of problems can be solved with imperfect answers?
  • If you pipeline time-bound computing programs, how does error propagate through the system? Can the error propagation be managed or does it make the final results unusable?


Welcome

16 May

A taxidermy figure of a gorilla

Facebook posts, Tweets, blogs, resumes, restaurant reviews, movie ratings, social networks, not-so-social networks, government records, click streams, browser histories, bookmarks, … We publish countless streams of content, but what do they really say about us? Engineers, scientists, and marketers can harness and aggregate our content and data shadows to create specialized products, or at least a few targeted ads, but who are they really targeting?

A gap exists between the electronic content available about a person and the individual underlying person. A taxidermy figure, videos, and Jane Goodall’s accounts may describe a family of apes, or an ideal ape, but they are inadequate to capture the uniqueness of a single ape. A similar deficit exists when we try to model the greatest ape in the digital jungle of the twenty-first century.

This blog is about the struggle to capture the essence of a person by modeling the electronic data about that person.
