Sunday, July 26, 2009

recursive descent json parser in perl

Sure, there are real, fully featured, well tested json parsers out there, but for a brief moment I thought I might need to write my own (why is a different story altogether, but for now I'll just say that trimming dependencies was necessary).

Anyway, after a quick look at the spec (hint, its very simple) and a couple of hours coding, I can now present a reasonably functional (character escapes are missing, might be missing other features as well) recursive descent json parser implementation in perl. This is definitely not the best choice for adding json support to your next perl project, however it is a simple and practical example of recursive descent parsing that others might find useful.

Anecdotally, if eval'ing json input is not a concern (for security reason's it probably should be), you might just substitute arrows (=>) for colon's (:) while reading in json and eval it as an even simpler alternate solution...


#!/usr/bin/env perl
package json;

use 5.006;
use strict;
use warnings;

use Carp;

BEGIN {
for (qw(file buffer token line pos)) {
eval "sub $_ : lvalue { \$_[0]->{$_}=\$_[1] if \@_>1;\$_[0]->{$_} }";
croak $@ if $@;
}
}

sub new {
bless { file => *ARGV, line => 0, pos => 0 }, $_[0];
}

sub accept {
my ($self, $chrs) = @_;
for (split(//, $chrs)) {
return 0 unless $self->token eq $_;
$self->advance;
}
return 1;
}

sub expect {
my ($self, $chrs) = @_;
$self->accept($chrs) or confess $self->error;
$self;
}

sub error {
my ($self) = @_;
"unexpected token '", $self->token, "' at line ", $self->line, "\n";
}

sub advance {
my ($self, $ns) = @_;
for ($self->token = undef; not defined $self->token; $self->pos++) {
unless ($self->buffer and $self->pos < length($self->buffer)) {
defined($self->buffer = readline($self->file)) or return;
$self->line++;
$self->pos = 0;
}
$self->token = substr($self->buffer, $self->pos, 1)
if ($ns or substr($self->buffer, $self->pos, 1) !~ /[[:space:]]/);
}
}

sub object {
my ($self, $object, $key, $val) = @_;
$key = $self->expect('"')->string;
$val = $self->expect('"')->expect(':')->value;
$object->{$key} = $val;
$self->object($object) if $self->accept(',');
$object;
}

sub array {
my ($self, $array) = @_;
push @$array, $self->value;
$self->array($array) if $self->accept(',');
$array;
}

sub string {
my ($self, $str) = @_;
do {
$str .= $self->token;
$self->advance(1);
} while ($self->token ne '"');
$str;
}

sub digits {
my ($self, $d) = @_;
do {
$d .= $self->token;
$self->advance(1);
} while ($self->token =~ /[[:digit:]]/);
$d;
}

sub number {
my ($self, $n) = @_;
$n .= '-' if ($self->accept('-'));
$n .= $self->digits();
if ($self->accept('.')) {
$n .= '.';
$n .= $self->digits();
}
if ($self->accept('e') or $self->accept('E')) {
$n .= 'e';
if ($self->accept('+')) {
$n .= '+';
} elsif ($self->accept('-')) {
$n .= '-';
}
$n .= $self->digits();
}
$self->advance if $self->token =~ /[[:space:]]/;
$n+0;
}

sub value {
my ($self, $value) = @_;
$self->advance unless defined $self->token;
if ($self->accept('{')) {
$value = $self->object({});
$self->expect('}');
} elsif ($self->accept('[')) {
$value = $self->array([]);
$self->expect(']');
} elsif ($self->accept('"')) {
$value = $self->string;
$self->expect('"');
} elsif ($self->accept('null')) {
$value = undef;
} elsif ($self->accept('true')) {
$value = 1;
} elsif ($self->accept('false')) {
$value = 0;
} elsif ($self->token =~ /[[:digit:].-]/) {
$value = $self->number;
} else {
confess $self->error;
}
$value;
}

sub main {
use Data::Dumper;
print Data::Dumper->Dump([json->new->value], ['json']);
}

main unless caller;

1;

2 comments:

Randal L. Schwartz said...

The construct

[[:digit:]-.]

is an invalid regex. What should it be?

dillo said...

so it is, swapping the . and - works with my simple test case. I've fixed the code in the original post.