Marpa notes

This post gives a high-level overview of the Marpa parser, with links to various resources on using it.

1957: Noam Chomsky publishes Syntactic Structures, one of the most influential books of all time. The orthodoxy in 1957 is structural linguistics which argues, with Sherlock Holmes, that “it is a capital mistake to theorize in advance of the facts”. Structuralists start with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky’s approach will soon come to dominate linguistics.

What is Marpa

Marpa is an Earley parser. You can find a lot of details on how it works internally on the author’s blog: Ocean of Awareness.

How to use Marpa

Besides lots of snippet tutorials on the blog linked above, the Marpa docs are quite extensive and cover all of the options available. However, most users want some type of skeleton to get started with, and then start adding additional options.

This Perl code gives a tiny skeleton to get started with, into which other options can be hooked. In particular, take a look at the SLIF DSL documentation, which defines what can be passed to the grammar in the source string.

use v5.16;
use strict;
use warnings;

use Marpa::R2;
use Data::Dump 'dd';
use Try::Tiny;

# create Marpa grammar
my $grammar = Marpa::R2::Scanless::G->new({
  source => \q{
    :discard ~ ws

    Sentence ::= WORD+ action => ::array
    WORD ~ 'foo':i | 'bar':i | 'baz':i | 'qux':i

    ws ~ [\s]+
  },
});

# create recognizer to use the grammar and "drive" parsing
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

# get the input to send in from somewhere
my $input = '1) bargain bebar Foo bar: baz and qux, therefore qux (foo!) implies bar.';

# try and read the input
try { $recce->read(\$input) };
# have we gotten to the end of the input yet?
while ($recce->pos < length $input) {
  # if not:
  try   {       $recce->resume                    }  # restart at current position
  catch { try { $recce->resume($recce->pos + 1) } }; # advance the position by one character
  # if both fail, we go into a new iteration of the loop.
}

# process the parse to produce the value
dd $recce->value;

 

Ruby slippers technique

Once a grammar has been created, the recognizer is used to “drive” it over the input string. Unlike many other parsing tools, Marpa allows for arbitrary pausing during this process. One option that can be given to the recognizer is rejection, which defines whether Marpa throws an exception when it cannot parse, or produces an event and pauses. This provides a powerful interface for getting past messy input that does not fit the proper grammar.

The following code is more complicated, but demonstrates using this technique to supply pretend tokens wherever the input is missing something, fixing up bad input as it goes. The input string given is missing ; characters at various points where the grammar expects them. When this happens, Marpa pauses with a 'rejected event. Note the leading single quote. Once paused, the ruby_slippers function calls various recognizer methods to check on the current state of Marpa, to see what’s going on. If the next expected token is the ; and nothing else, the function tells the recognizer it just got one, and parsing resumes.

 

use v5.16;
use strict;
use warnings;

use Marpa::R2;
use Data::Dump 'dd';

my $grammar = Marpa::R2::Scanless::G->new({
  source => \q{
    :discard ~ ws

    Block         ::= Statement+ action => ::array
    Statement     ::= StatementBody (STATEMENT_TERMINATOR) action => ::first

    StatementBody ::= 'statement'       action => ::first
                    |   ('{') Block ('}') action => ::first

    STATEMENT_TERMINATOR ~ ';'

    #event ruby_slippers = predicted STATEMENT_TERMINATOR

    ws ~ [\s]+

  },
});

my $recce = Marpa::R2::Scanless::R->new({
  grammar => $grammar,
  trace_terminals => 1,
  trace_values => 1,
  rejection => "event"
});

my $input = q(
  statement;
  { statement }
  statement
  statement
);

# this is copy and pasted :D
for (
  $recce->read(\$input);
  $recce->pos < length $input;
  $recce->resume
) {
  for my $event ( @{ $recce->events() } ) {
    my ($name) = @{$event};
    if ($name eq "'rejected") {
      ruby_slippers($recce, \$input);
    }
  }
}

# we've exhausted all input, so check if we need a final terminator
ruby_slippers($recce, \$input);

# now give us the parse!
dd $recce->value;

sub ruby_slippers {
  my ($recce, $input) = @_;
  my %possible_tokens_by_length;
  my @expected = @{ $recce->terminals_expected };
  for my $token (@expected) {
    # only fake a terminator when it is the sole token Marpa expects
    if ($token eq 'STATEMENT_TERMINATOR' && @expected == 1) {
      push @{ $possible_tokens_by_length{0} }, [STATEMENT_TERMINATOR => ';'];
    }
  }

  my $max_length = 0;
  for (keys %possible_tokens_by_length) {
    $max_length = $_ if $_ > $max_length;
  }

  if (my $longest_tokens = $possible_tokens_by_length{$max_length}) {
    for my $lexeme (@$longest_tokens) {
      $recce->lexeme_alternative(@$lexeme);
    }
    $recce->lexeme_complete($recce->pos, $max_length);
  }
}

 

 

Useful links

Marpa resources

Finding a pattern in a chunk of text

Parsing timeline

Code examples originally from

Super fancy print all parse trees

Discard AND use whitespace

Testing framework

My coworkers have put together a testing framework which does some pretty fancy things. It’s able to record test runs against live systems, and then we can check in the results to use for future tests. That lets them work around not having development servers for a variety of different services. Record a run against prod, and save that for future tests.

However, to do this they need to send a custom data format into their wrapper, telling it what to do. The wrapper runs a series of functions to actually build and run the tests they want. Today I got to look at how this wrapper works, and they’ve built a state machine to implement recursive descent parsing! So often developers don’t realize they are creating a parser, and don’t think about the tooling as building a language for the problem space.

So instead, we’ve got the raw input data structure, acting as a quasi pre-parsed AST to recurse down. Of course, you have to check the contents of various keys each step of the way to decide which function to run next, and what input keys to grab. And if you don’t have the keys? Boom, return an error for what’s expected “next”. Almost like expecting a noun before a verb in a sentence, with both required.

So none of this code is easily modified, nor is the resulting language obvious. And the only benefit they really get is that they can skip lexing a string and just feed in a raw data structure that…they…wrote… manually. Sometimes this job makes me sad.
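To make the complaint concrete, here is a minimal sketch of what walking such a hand-written structure tends to look like. Every name in it (handle_test, handle_request, and the service, request, method and path keys) is made up for illustration and is not taken from the real wrapper. It only assumes that each handler checks the keys it needs and then hands off to the next one.

```perl
use strict;
use warnings;

# Hypothetical handlers: each one checks for the keys it needs, then
# descends into the next level of the hand-written data structure.
sub handle_test {
  my ($node) = @_;
  die "expected 'service'\n" unless exists $node->{service};
  die "expected 'request'\n" unless ref($node->{request}) eq 'HASH';
  return "$node->{service}: " . handle_request($node->{request});
}

sub handle_request {
  my ($node) = @_;
  die "expected 'method'\n" unless exists $node->{method};
  die "expected 'path'\n"   unless exists $node->{path};
  return "$node->{method} $node->{path}";
}

# A complete structure walks cleanly to the bottom.
my $ok = handle_test({
  service => 'billing',
  request => { method => 'GET', path => '/status' },
});

# A structure missing a key fails with what was expected "next".
my $err = eval {
  handle_test({ service => 'billing', request => { method => 'GET' } });
  1;
} ? '' : $@;

print "$ok\n";
print "error: $err";
```

The “grammar” here exists only as the order of the checks inside each function, which is why it is hard to see and hard to change.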

Ansible

Recently, I got the team at work to start using Ansible for managing a couple of servers. Sadly, they were in such a tearing hurry in the later weeks that they didn’t bother to use the tool. Ah well, here are the notes I put together for them after I spent an hour learning how to use it.

Ansible doesn’t require much to get started. Just install it on the system you plan to use for managing boxes. Note that each user *could* install their own copy locally, but would then not share useful info about the systems to be managed without some synchronization of a few files. These docs cover how to install Ansible in a variety of ways. Easiest is just using the Linux server’s package manager. Then you can immediately use ad hoc commands against multiple systems.

Of course, most times you want a written down set of repeatable instructions. Ansible uses playbooks for that. The intro quickly goes over the basics, and the examples allow you to poke through more nicely laid out playbooks. They have proper organization, linking to subcomponents, and management of multiple systems, such as in the LAMP example provided.

However, there isn’t always a need for a large and complex system such as that when all you want to do is set up user accounts. To create a playbook capable of that, follow these instructions.

  1. mkdir playing-with-users; cd playing-with-users
  2. touch site.yml hosts
  3. vim hosts and add content
  4. vim site.yml and add content
  5. ansible-playbook -i hosts site.yml

hosts

[servers]
112.128.133.4
134.126.191.4
112.124.133.5
134.226.711.5

site.yml

# This playbook configures the list of users for all servers
- hosts: servers
  remote_user: root

  vars:
    users:
      foo:
        shell: bash
      bar:
        shell: bash
      wipple:
        shell: fish

  tasks:
    - name: Install packages onto server
      package: name={{ item }} state=present
      with_items:
        - git
        - fish

    - name: create users on all nodes
      # Add each user with their configured shell, appending the groups 'admins' and 'developers' to the user's groups
      # see the user docs for options like ssh key config
      user: name={{ item.key }} shell=/bin/{{ item.value.shell }} groups=admins,developers append=yes
      # the template must be quoted, otherwise YAML reads the braces as a mapping
      with_dict: "{{ users }}"

The core modules like the user or package modules are idempotent, and can be run as many times as you want. They will only make the changes necessary on the system. So you can run this playbook, then add more users and run it again. Only the new user will be configured.

Links

User module docs

List of modules available

Complex systems

Any system that actually does useful work tends towards the complex. There are far more edge cases and complexities to real life than a cursory look shows. This is one reason that you shouldn’t roll your own libraries for everything. Not only do you have an infinitely regressing problem (do you build compilers for your custom language?), but you also simply don’t have the details.

Until you muck about in a problem space, you don’t know what you don’t know. You already have your own problem to solve. The problem may include writing your own library to support it. If so, that will become readily apparent. Don’t prematurely optimize complexity. You’ll have more than enough to deal with soon.

What is authority?

On Twitter a couple days ago, there was a link to an important article for Christians: Husbands, beat your wives. Why is that such an important thing to read though? Seems that it would be common sense for any right-thinking Christian. After all, you love your wife as Christ loved the church, right? This blog post is an insidious piece of heresy, but it seems good. Before going over why, let’s sum up the important points of the article.

The author starts off with that touchstone Ephesians 5:22-23, which is always a nice place to start. Husband and wife are one flesh, so hitting your wife is akin to hitting yourself. Obviously a bad thing. As he states:

The gist of Paul’s point is this: the husband’s authority has a context to it, and is complementary to the role that the wife plays. By “complementary” I don’t mean it the way that Liberal Christians use it, but rather I mean that the husband and wife, being “one flesh” in marriage (Gen 2:24; Matt 19:5-6; Eph 5:31), unify with each other by their roles. The wives are to submit to the husbands as Christ to the church, yes; but the authority of the husband is within the confines of Christ “nourishing and cherishing” the church (Eph 5:29).

So he’s stated that the husband has one duty, and the wife has another. So far seems legit. But then we get to this bit:

The apostles makes it quite clear what the context of the husband’s authority over the wife is:within the authority of Christ’s salvific role over the church. Men do not rule over a woman like Saddam Hussein ruled over Iraq; men rule over a woman for the purpose of loving them, nourishing them, and cherishing them as if they were their own body because, in the context of marriage, they are (Eph 5:28-30).

Suddenly we’ve got another figure in this relationship. There’s a man, a woman, and the church. That’s really odd considering marriage has existed since God created the world (Genesis 2:24). What were all of these marriages doing before?

Mr. Gloucester then goes on to take apart an argument where wives can be disciplined like children. This is important, since it’s another place where we start to see a glimpse of his concept of authority in a marriage:

The problem is that whenever scripture covers relationships within the home, husband and wife are seen as one unit, and one separate from children and servants. This is clearly seen in the Epistle to the Ephesians, where Paul treats the relationship between husband and wife (Eph 5:22-33) separately from children to parents (Eph 6:1-3), fathers to children (Eph 6:4), and slaves to masters (Eph 6:5-9). The same is seen in the Epistle to the Colossians, where he touches on the same issue (Col 3:18-25). Again, all authority is seen within a specific context, just as love would be understood differently depending on the the relationship of the two people (eg., the love between husband and wife operates differently than love between father and daughter). In the same manner, the authority that a policeman has over me is different than the authority my boss has over me, and the authority I have over my wife. In regards to the topic at hand, the husband has authority over the wife in the context of Christ and his work on the cross (cf. Eph 5:25-28), while the father has authority over his children with the responsibility of raising them in the Lord’s training and instruction (Eph 6:4).

So in this new creature created by marriage, the brain only has limited control over one of its limbs. All of this is needed, because Mr. Gloucester needs to define the shape of authority. But why? Don’t we have an easy example of the pattern in Christ?

Now that he’s done with Scripture, the article moves on to a set of dilemmas.

Dilemma #1: The vagueness of the rule.

Dilemma #2: The removal of accountability

Dilemma #3: Blaming the victim.

While these are perhaps interesting issues surrounding a husband abusing his wife within a marriage, they’re all red herrings. None of them matter to the core issue, does the man have the right to punish? In conclusion, Mr. Gloucester asserts that a husband has no right to discipline his wife. Oh he gives a sop, but it’s about leading and guidance and listening to church councilors.

Now! We’ve gotten through the core bits, and we can get on to the explanation of heresy. By defanging men, Mr. Gloucester is denying the symbol of marriage. How so? It all comes back to authority. What is it? Well, let’s go to the source of all authority. The classic is of course Isaiah 45:9, but for our purposes the book of Job is even better.

In Job, God decides that He will allow Job’s property to be stolen and destroyed, his children and servants killed, and his body afflicted. In Job 40:8

Will you also annul my judgment? will you condemn me, that you may be righteous?

Does the one underneath authority have any right to contend with the one over them? God is quite clear here. But Christ is nice I hear you say. This doesn’t apply because it’s not a marriage relationship. And Christ loves the church. He would never be mean like that bad Old Testament God. Well friends, God is the same yesterday, today, and tomorrow. And also, Christ IS God. So why should Christ be different? Indeed, we see Him to be the same in Revelation 2:5

Remember therefore from where you are fallen, and repent, and do the first works; or else I will come unto you quickly, and will remove your lampstand out of its place, except you repent.

What’s this? Christ (the husband) is holding the church (the wife) accountable and laying out a punishment He deems fits. The next couple chapters all follow the same theme. If you have been given authority, it is within your right to punish. But what about Mr. Gloucester’s examples of other authorities? How they were limited?

There’s a very simple question we need to ask for that. Who does the limiting in every single case? The authority from above! God has put a hierarchy of authority in place. In the marriage relationship, what is this chain? It’s Christ -> Husband -> Wife. So the wife and others have no place to demand that this punishment or that punishment is wrong.

Fundamentally, these passages and many more give the lie to the claim of authority without power. In Genesis, God states a Husband shall rule over his wife. Throughout the Old Testament, God punishes His people the Israelites when they stray. And in the New Testament, our example Christ warns of punishment to the church. Authority without power is mere responsibility.

By denying the God given authority of a husband, Mr. Gloucester denies God’s authority over him. He claims that he knows better, and contends with his Creator. But this heresy is wrapped within the feel good message of husbands not abusing wives. What husband wants to? Flesh of my Flesh and all that. It’s an exceptional edge case blown up to allow the stripping of authority to slip by.

It’s a frightening relationship, but marriage is a symbol of Christ and the church! As God says in Job, He is an absolute ruler. Why would we think this would change in marriage? Thank God we know He is merciful and kind, far beyond what we deserve! A husband cannot be merciful, kind, and loving about withholding punishment if he never had the right to give it in the first place. But can a husband be indiscriminate? Is he a tyrant? God defined the hierarchy of authority, and placed a husband above a wife. But didn’t He also place the husband below Christ? Yes, and the same as those churches were warned, husbands are charged to care for and love what God has placed beneath them. The same as the parable of the talents, God will ask us men what we have done with the things He has given us.

I don’t know about Mr. Gloucester, but that causes me great fear. Husbands can’t abdicate from the position of authority God has given them just because they don’t like holding their subordinates’ feet to the fire, nor can they claim they have no one to answer to as they beat their wife. God will ask not only why they abused their wives, but also why they allowed their wives to run rampant. They have the power God has given them, and the question is whether or not they will hang themselves with it.

Joining a project

One of the primary reasons it takes at least three months to ramp up on a project is simply due to communication. The vast majority of the meta information about the project lives in the heads of the developers. Even if they have all the code cleanly laid out, in one repository, with lots of comments, it will still be hard to sync mentally.

  • Why was X done instead of Y?
  • Where are we going?
  • Where did we come from?
  • What are the standards?

Anything beyond a trivial code base has these issues. Since this information only lives in the devs’ heads, it takes time for them to remember, and then to tell you. So you accumulate bits until all is clear.

A functional local build

So I’ve got the Python Marpa bindings pulled out and working locally. Packaging this up is not quite as obvious as I would have expected. You’d think there’d be instructions for how to build a C lib/install a needed library as part of setup.py or something.

Found where Marpa hid the built library! Now I just need to figure out what bits I need for cffi. I can have the tarball in the repo, unpack it on the target system, make, then tell setup.py to install the shared lib.
