Thursday, July 22, 2010

simple overrides for quick mocking in perl

Here is a more concrete example (and one I use too frequently) of overriding for testing, similar to what Sawyer X describes at blogs.perl.org in "Simple symbol overriding for tests". Of course, my example uses inheritance and re-blessing rather than plain symbol overriding, but the result is basically the same. I like to use the snippet below to test mail-sending functionality by simply printing the resulting mail to stdout.

my $smtp = Net::SMTP->new($mailhost);
...
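if ($i_dont_want_to_send_mail) {
  # the package declaration only lasts until the end of this block
  package Mock::SMTP;
  our @ISA = qw(Net::SMTP);

  # the symbolic glob assignments below need strict refs turned off
  no strict qw(refs);

  # these methods become no-ops; they return true so callers that
  # check the result carry on as if the mail had been sent
  for (qw(mail to data dataend quit)) {
    *$_ = sub { 1 };
  }

  # datasend prints the message body to stdout instead of sending it
  for (qw(datasend)) {
    *$_ = sub { shift; print @_, "\n" };
  }

  # re-bless my Net::SMTP reference to a Mock reference
  # (falls back to an empty hash if the connection failed)
  $smtp = bless($smtp || {}, __PACKAGE__);
}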

Friday, July 16, 2010

screen for cluster management

After my parallel ssh roundup and hacking my own solution...

I just realized screen can do this too using "at" and "stuff".  Here's how:

  1. open a bunch of screens and ssh to various hosts
  2. get to a screen command prompt (Ctrl-a followed by a colon, ^C-a : )
  3. at the prompt, enter the command "at \#"; this tells screen to run the command that follows on all windows (yes, include the backslash, and don't press enter yet)
  4. continue with the command "stuff", which tells screen to quite literally stuff whatever follows it into the input of a screen terminal (don't press enter quite yet)
  5. follow stuff with the command you want to run; you'll need to put it in quotes if the command has arguments (still no enter)
  6. lastly, the line needs to end with a return so the shell(s) will run the command, so finish with "\012", which is octal for a newline character

Now, press enter and watch your command get run across all of your screen windows.  This is particularly cool for long-running commands, where you can switch back and forth between screen windows and check on the progress of many machines. It is also useful if you have a couple of quick maintenance tasks to run on a bunch of hosts.

So, putting it all together, the whole sequence looks like this:

^C-a : at \# stuff ls \012
^C-a : at \# stuff 'ls -alR' \012
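
If you would rather script this than type it at the prompt, something like the following might do. It is only a sketch: it assumes your screen accepts the same at and stuff commands through its -X option (which sends a command to a running session), and the function name screen_broadcast_args and its session parameter are made up for illustration. Since no screen prompt is involved, the "#" needs no backslash, and a real newline character stands in for "\012".

```python
def screen_broadcast_args(command, session=None):
    """Build the argv for asking screen to type `command` into every window.

    Hypothetical helper: `session` is an optional session name passed via -S.
    """
    argv = ["screen"]
    if session is not None:
        argv += ["-S", session]
    # "#" with no identifier selects all windows; the trailing newline
    # plays the role of \012 so each shell actually runs the command.
    argv += ["-X", "at", "#", "stuff", command + "\n"]
    return argv


# Example (not run here): subprocess.run(screen_broadcast_args("ls -alR"))
```

Passing the list straight to subprocess.run avoids a second round of shell quoting, which is the part I always get wrong.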

Monday, July 12, 2010

a survey of web visualization toolkits

I'm always on the lookout for new tools to create graphs, charts, and other visualizations. It is an obsession that dates back to my days slaving over gnuplot and R scripts (or even *gasp* metapost). Here are some more modern toolkits I've come across which lend themselves to visualizations for the web.

  • Processing has to be one of my favorites, probably because I've been using it the longest.
  • And if Processing isn't web-2.0-enough for you, there is always the Javascript implementation.
  • Protovis - any visualization toolkit that can be used to create an interactive version of Minard’s Napoleon has to get good marks.
  • Google has Chart Tools.
  • Degrafa has been used to create some beautiful visualizations.
  • Or there is the slightly higher-level Axiis if you prefer (which is built on top of Degrafa).
  • Should you be religiously opposed to flash, you might like dygraphs.
  • Prefuse is another Java toolkit with some great graph and chart types.
  • And Flare (also from the Prefuse guys) lets you post those interactive visualizations to the web in flash.
Ok everyone, which ones haven't I seen yet?

    Sunday, July 11, 2010

    broadcast ssh - my entry into the world of parallel or cluster ssh tools

    I threatened before when writing up my review of the parallel or cluster ssh tools out there that I would write my own. And after a quick review of paramiko, especially the demo scripts included, this turned out to be a pretty quick hack.

    The basics of my ssh client are this:
    • prompt for username and password - ssh keys not required (although I would like to add support for using them if available in the future)
    • interactive use - I often want to look at things, which leads me to want to look at other things; in other words, I don't want to be constrained by having a list of commands to execute up front
    • parallel - I have a bunch of commands and a bunch of machines, a simple loop executing each command on each machine isn't going to cut it
    Those last two together make things a little tricky. The interactive bit means you have two choices: you can be line oriented (prompt for a command and send it to each host), or you can be completely interactive and send each character as it is typed.

    I started with the former, but soon found that if you only send commands, you have practically no environment setup (that is all done in the shell, remember...).  So, despite the difficulties in dealing with the terminal, that is what this implementation does.

    A terminal-based solution has its own issues: the output from each host has to be buffered by line, otherwise the results of each command come back as interleaved garbage.  But in the end that wasn't too hard.
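
To give an idea of what the line buffering involves, here is a minimal sketch. It is not the code from bssh.py; the class name HostLineBuffer and the "host: line" output format are assumptions for illustration. Each host gets its own pending buffer, and text is only released once a full line has arrived, so output from different hosts can never be mixed within a line.

```python
from collections import defaultdict


class HostLineBuffer:
    """Collect raw output chunks per host and release only complete lines."""

    def __init__(self):
        # host -> text received so far that does not yet end in a newline
        self._pending = defaultdict(str)

    def feed(self, host, chunk):
        """Add a chunk read from `host`; return the lines it completes."""
        data = self._pending[host] + chunk
        *lines, rest = data.split("\n")
        self._pending[host] = rest
        # terminals usually send "\r\n", so drop the stray carriage return
        return [f"{host}: {line.rstrip(chr(13))}" for line in lines]

    def flush(self, host):
        """Return any trailing partial line, e.g. when a host disconnects."""
        rest = self._pending.pop(host, "")
        return [f"{host}: {rest}"] if rest else []


buf = HostLineBuffer()
for host, chunk in [("host1", "foo\nba"), ("host2", "x"), ("host1", "r\n")]:
    for line in buf.feed(host, chunk):
        print(line)
```

The reader loop would call feed() with whatever each connection hands back and print only what it returns.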

    Usage is simple: cut-n-paste the code, save it as bssh.py and run something like:
    ./bssh.py host1 host2 host3 host4
    Enter a username and password at the prompts, and away you go, just enter commands as you would with any other shell. You will of course need paramiko and its dependency pycrypto installed and available.

    dnode - javascript remote invocation

    dnode is just the latest in a series of cool things people are doing with javascript that I find amazing. processingjs is another (albeit not very new anymore).

    on lunch and bike sheds

    Today, Seth Godin hits on a pet peeve of mine: arguing about unimportant things.  Although he casts the problem as a discussion from the perspective of deciding on lunch, I like the programmer's version better.

    We all know the type: whether sitting in a code review or discussing a database schema, there is always one person willing to spend hours discussing minutiae.  Either it is the indentation style of a particular comment instead of the correctness of the algorithm, or it is the virtues of camel case names for database objects as opposed to the performance trade-off of complete normalization.

    Saturday, July 10, 2010

    operator brings semantics to firefox

    I just came across operator, a firefox add-on for recognizing microformats and I have to say I'm impressed. I think this is a pretty big step in the right direction for the semantic web.

    As far as the semantic web goes, I've always been kind of on-the-fence regarding the human annotation versus automated statistical approaches. I'm still not convinced either will win out. It seems unlikely that every web author will annotate all their pages -- and yet, I'm not sure machine learning approaches will ever be able to accurately annotate things automatically.

    But with all of the generated pages out there, having just a few of the big ones support microformats, and browser support for recognizing those pages is a really big step. For example, linkedin already returns profile pages in hresume format. With operator, you can extract names, phone numbers, addresses, and event information easily.

    Update: operator doesn't seem to work in firefox 4 beta, but I think we can forgive that for now.

    Friday, July 9, 2010

    normals - a processing experiment

    As part of a visualization I'm attempting to create, I was doing some experiments in processing. One of the experiments involved distributing points randomly on a sphere, and drawing the normal vectors for those points. Add in a trackball for rotating, panning, and zooming and it ended up being a pretty cool interactive toy by itself.
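
For the curious, the geometry is small enough to sketch. What follows is not the processing sketch itself; it is an illustration in Python that assumes a sphere centred at the origin and one common way of picking uniform points (normalizing three Gaussian samples). The function names are made up. On such a sphere the normal at a point is simply the direction from the centre to that point.

```python
import math
import random


def random_point_on_sphere(radius=1.0, rng=random):
    """Uniform random point on a sphere centred at the origin."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        length = math.sqrt(x * x + y * y + z * z)
        if length > 1e-12:  # avoid dividing by (almost) zero
            return (radius * x / length,
                    radius * y / length,
                    radius * z / length)


def unit_normal(point):
    """Outward unit normal at `point` on an origin-centred sphere."""
    x, y, z = point
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def normal_segment(point, length=0.2):
    """Start and end of a short line along the normal, ready to draw."""
    n = unit_normal(point)
    tip = tuple(p + length * c for p, c in zip(point, n))
    return point, tip
```

Drawing is then just one line per point, from the start of each segment to its tip.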

    my tracks for android

    my tracks for android... I've been threatening to write something like this for years... glad to see Google finally beat me to it. Now, I just need an android phone.

    Tuesday, July 6, 2010

    firefox 4 first impressions

    After only a few short hours online with the new beta release of firefox 4, I can already tell I'm going to like it. I have yet to find any issues, and it loads pages lightning fast. I highly recommend anyone thinking of testing the waters with this preview release to go for it.

    playing poker during class

    What should the punishment be for students playing poker during class?  Well, for my recent "Foundations of Object Oriented Programming" class, there was no punishment -- rather, playing poker was required.

    Monday, July 5, 2010

    plasmodium protein interaction network visualization

    I was recently finishing my write-up about using processing to create interactive art. In that article I wanted to mention the animation and physics processing libraries by Jeff Traer Bernstein and while looking up the references I was reminded of this visualization. It has always been one of my favorites, so I'm finally getting around to sharing it.

    parallel ssh tool roundup

    So I'm in the market for a good "parallel" ssh tool.  Basically, I want to ssh and type some commands like I always do, except that instead of getting the output of one command, I want the command to be run on a bunch of machines and I want responses from each.  I've used things like mpiexec in the past, but I was hoping for something more ad-hoc.  I just want to specify hosts on the command line (with no prior setup).  I really don't even want to require ssh keys or having the same password if I can avoid it.  Like I said, just ssh as usual, run some commands, get multiple results.

    Anyway, these are the tools I've come across (in no particular order).  I'll outline what I regard as the advantages and disadvantages of each below.
    1. pdsh - "a high-performance, parallel remote shell utility" from Lawrence Livermore National Laboratory
    2. pssh - an implementation of some parallel ssh tools in python
    3. dsh - dancer's / distributed shell
    4. pydsh - a python version of dancer's shell
    5. clusterssh - "a tool for making the same change on multiple servers"
    6. mussh - "a shell script ... to execute a command or script over ssh on multiple hosts"
    7. sshpt - "SSH Power Tool (sshpt) enables you to execute commands and upload files to many servers simultaneously via SSH"
    8. multixterm - part of the expect project
    9. clusterit - "a collection of clustering tools, to turn your ordinary everyday pile of UNIX workstations into a speedy parallel beast"
    10. dish - "The diligence shell 'dish' executes commands via ssh/rsh/telnet/mysql simultaneously on several systems"

    Sunday, July 4, 2010

    go python!

    Python 2.7 was released yesterday - and it looks like some cool python 3 features were back-ported.

    new required reading for programmers

    starting now I'm putting this paper on my required reading list for all programmers I hire:

    http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf

    thanks to the authors (obviously) and the guys at http://everythingsysadmin.com/ for pointing it out.

    Saturday, July 3, 2010

    dfs == stack overflow

    As usual, xkcd nails it!

    Watch out for those nasty, down-the-rabbit-hole depth first searches, they can be real productivity drains.