Role Based Access Control in One Line

Role based access control (RBAC) is a standard architecture for implementing authorization systems. The key feature is that permissions to access resources are not given directly to users but rather to roles. Users are then given sets of roles for a session. For example, in a publishing organization, there might be an Author role and an Editor role. Both roles may be permitted to create new articles and modify them, but only the Editor may be permitted to push the final article to the website. Creating this layer of abstraction makes maintaining permissions easier. When a new hire needs to be put into the system, only a small number of easily understood roles need to be added to their user account rather than a long list of individual permissions. When policies change and new permissions need to be given to a lot of users, one can simply assign those permissions to the correct roles, then all of the users with the roles will get those new permissions.

Hierarchical RBAC takes this one step further. Roles can inherit a subset of their permissions from other roles. In the publishing example above, one could make the Editor role inherit from the Author role because an Editor is allowed to do anything an Author could do. In the RBAC implementation, we just need to make sure that the Editor role has explicit permissions for just the things that are special to Editors instead of maintaining duplicated sets of permissions. The hierarchy of roles will form a directed graph, with varying amounts of complexity depending on the policies that it is modelling.

In order to authorize a user to access a resource in a hierarchical RBAC system, the system needs to walk up the connected graph of roles that are assigned to the user. Only if one of the roles in that graph has the appropriate permission is the user authorized to access the resource. Formulated this way, hierarchical RBAC is just a graph traversal problem.

It seemed to me that this would make a neat use case for a graph database. They usually come equipped with some kind of query language that represents graph traversals clearly and succinctly. Gremlin is a standard one that works across many Java-based graph databases. Let’s say that we make nodes in the database for Users, Roles, and Resources. Nodes and edges can have attributes attached to them; let’s say that Users have a username attribute and Roles have a name attribute while a Resource has a url attribute. Permissions associate Roles to Resources, so we will make an edge to each Resource that a Role is permitted to access. Edges have labels that help distinguish the kinds of relationships that are being modelled. We will call this one permission. Edges can also have data, and we do want to distinguish different kinds of access like 'read' and 'write', so we will also add an action attribute to these edges. To associate Users to Roles and child Roles to parent Roles we will add has_role edges. It’s a bit of a cheat to use the same kind of edge for both cases because it makes the query marginally more elegant.

Here is the Gremlin query that checks if a given User can access a Resource with a particular action:

g.v(user_id).out('has_role').loop(1){true}{true}.outE('permission').has('action', action).inV().has('url', resource_url).count() == 1

That’s it. Let’s break it down to see what is going on. Gremlin queries have a pipeline model. Each method in that chain adds a new processing step to the final pipeline. Each processing step will accept an iterator of items from the previous step and will generate any number of items for the next step in the pipeline.

g is just the graph object itself. It’s not really part of the pipeline, just the API entry point.

.v(user_id) starts the pipeline by looking for the node (or vertex in Gremlin terms) with the given unique user_id, just a number that the graph database assigned when it created the node. It will yield just that one node for the next step in the pipeline.

.out('has_role') iterates over every outgoing edge with the label has_role and yield the node at the other side of it.

.loop(1){true}{true} is more complicated. This instructs the pipeline to take the sequence of inputs and put them back in the pipeline 1 step back, in this case the .out() step. The first {true} is a “while” condition; while that expression is true, it will keep looping. The effect I am trying to achieve is a complete traversal of the graph of Role nodes connected by has_role edges. I just use true to allow it to go on until the subgraph of has_role-connected nodes is exhausted. The second {true} is a condition for passing the intermediate nodes further down the pipeline. In this case, I want all of the Role nodes that are ultimately connected to the User to be checked for permissions, not just the final nodes, so I use an unqualified true here.

.outE('permission') will yield all permission edges from each of the Role nodes that are being fed into it.

.has('action', action) will filter the iterator of permission edges to only pass on those that have an action attribute that matches the requested one.

.inV() yield the Resource node (vertex) that the permission edge is going into.

.has('url', resource_url) will filter out those permitted Resources that do not match the query.

.count() will count the number of elements that make it to the end of the pipeline. Assuming we kept the Resource nodes unique, this will either be 0 or 1. If it is 1, then the User is authorized to perform the action on the Resource.

Supposing you wanted to just list all of the Resources that a User can perform an action to, we can just omit the last couple of steps:

g.v(user_id).out('has_role').loop(1){true}{true}.outE('permission').has('action', action).inV()

We can reverse this pipeline to see what Users can access a Resource:

g.v(resource_id).inE('permission').has('action', action).outV().in('has_role').loop(1){true}{it.object.username != null}

The new part in this query is that the final loop() should only omit the final User nodes, those that have a non-null username attribute, at the end of the chain, not the intermediate Roles. it is a special iterator object that is passed to these loop expressions and it.object is the current object being processed.

So there you go, hierarchical RBAC implemented in one-liners.


The Utility of Not Voting

I’ve started listening to Penn’s Sunday School podcast on my way to work in the morning. I enjoyed his previous radio show, and I’m enjoying this one too. Recently, he had Bill Nye on the show. The topic of climate change came up, as you might expect with The Science Guy. Bill made the point that if you care about climate change, this upcoming election is really important. One party (which he did not name in the show, but I will: the Republicans) actively oppose responsible approaches to climate change and the other doesn’t so much. He went further to state that voting for a third party is just throwing your vote away.

Penn on the other hand advocates not voting at all. “The only way you can waste your vote is to vote.” Voting for the lesser of two evils just perpetuates the evil. It occurred to me that this doesn’t hold up even to Penn’s self-admittedly nutty standards of argument. A quick web search shows that for presidential elections, the US turnout has been only about 50-60% for the past fifty years. Splitting up the voters roughly evenly between the two major parties, we find that a plurality of voting-age Americans has taken Penn’s position for at least half a century.

And nothing has changed. There is no sign that the two evils have weakened any by this lack of participation by Penn and those who act like him.

I think it's fairly easy to see why. (Well, in all honesty, I should say that it’s easy to speculate why. If any of my poli-sci readers can point me to more rigorous studies, I’d love to read them.) Firmly removing yourself from the voting pool like this is not something that the two major parties are really concerned with. Sometimes, they like to help you out. In the game theoretic view of the election, a candidate really only needs to concern themselves with pleasing voters that can be persuaded to vote for them. Otherwise, they would prefer that you not vote at all or “throw your vote” away on a non-viable third party candidate to deny the viable competitor a vote.

Non-voters just become one less advertising dollar candidates have to spend and one less opinion they have to pay attention to. Non-voters are just helping them out and getting nothing in return.

And with that reminder, I now go submit my absentee ballot request. Thank you, Texas, for letting me send it by email.


Roll Me Up

I play role playing games. Sometimes, I get to be on the opposite side of the screen and run them for my friends as the game master. When I do, I have a fondness for random tables. They are concentrated packets of possibility, ready to seed your creativity. Naturally, diligent geeks wrote up programs and web apps to do the dice rolling for you and greatly expand what could be done with the venerable random table. Abulafia is a growing collection of these random generators and allows users to edit and contribute more tables as a wiki.

I play RPGs online through a virtual tabletop since my roleplaying group is scattered geographically. I’m not even on the same continent any more! Since we are all on our computers anyways, it is tempting to use these online generators when improvising at the (virtual) table as well as in prep. Unfortunately, they all require just a little too much interaction to find the generator that you want to use and get the random results. Sometimes, you need a dragon name while planning your next Big Bad, but sometimes, you just need some quick tavern chatter for color. I wanted a site where I could just type what I wanted and get the random results immediately.

So I wrote one.

I exported the generator tables from Abulafia, lightly curated them, and fixed some typos and broken cross-references between tables. These generators all have a simple structure, some text that contains references to tables. The tables are just weighted lists of texts that may have other table references that need to be recursively expanded. Piece of cake, thanks to re and random.

Making usable search engine required a little more thought. I wanted to return results as you type, so for a good user experience, the “random” results needed to be stable. Each time you load the site, you will get a new random seed. As you type in your search terms, this random seed will start the random number generator used to generate the results. If a generator continues to match what you are typing, the results will not change as you type each character, but generators may disappear if they do not match the search query. Each match has a link that will take you to a URL that includes this random seed, so you can bookmark results that you liked.

I would like to highlight some of the tools that I used to make the web part of the site. My design skills are rather minimal, but Twitter Bootstrap and other free resources really helped get something looking reasonable in a very short time.

If this is the kind of thing that speaks to you, please try out the site and let me know how you like it.


grinning like a fool

Back in 2007 I wrote a small grep-replacement tool called grin. I wanted a quick tool for searching through source code. grep is a great tool, but it is geared towards a slightly different use case. Or rather 15 other use cases. It‘s a very flexible and general tool. I wanted something that was configured just for me, that my fingers could just type out without thinking and have it do what I meant it to.

The first thing on my list was recursive directory searching by default rather than reading from stdin. Next, I wanted to eliminate binary files. I am a Python programmer, so grepping my code for an identifier will usually show both the .py source file and its compiled .pyc file in the list.

The last major improvement was to avoid certain specially-named directories from the recursion. This was in the bad old days when I was using SVN regularly. For those who don‘t remember or were lucky enough to skip non-distributed version control entirely, SVN makes a .svn/ directory in each of your directories. Under this directory, SVN stores a plaintext copy of each of your files. The default grep will recurse into this directory and search through all of these copies in addition to your real files. This made the default grep pretty useless for tasks like checking if I had successfully changed all uses of a renamed method, for example. The old unmodified versions were helpfully kept there by SVN.

I also wanted grin to be a useful set of components to be used as a library for other searching tasks. The first thing I wrote was a small class to recurse through directories applying the configured rules. It classifies files into groups like text, binary, and symbolic link for the rest of grin to use to determine whether it should be searched or skipped. I quickly realized that I could easily repurpose this component for finding files instead of searching through them. I exposed this as the grind script which applies the same recursion rules and options as grin but applies a glob pattern to the filename and prints the matching paths similar to find.

One of the neater things I could do with this flexibility was to preprocess files before passing them through the core regex search. I wrote to configure the file recognizer to only pass through Python files. It will tokenize the Python sources and classify each token as a comment, a string, or other Python code. It then reassembles a text stream where only the requested token types are present and the others are replaced in-place with spaces. This lets you find, for example, all mentions of an identifier in actual code but not surrounding comments or in strings.

This is an example of looking for the string "grep" in the grin source code, restricting the results to just the Python code and not the strings or comments:

$ -i --python-code grep
  187 : class GrepText(object):
  291 :     def do_grep(self, fp):
  321 :             (block_line_count, block_context) = self.do_grep_block(block,
  342 :     def do_grep_block(self, block, line_num_offset):
  476 :     def grep_a_file(self, filename, opener=open):
  501 :             unique_context = self.do_grep(f)
 1028 :         g = GrepText(regex, args)
 1031 :             report = g.grep_a_file(filename, opener=openers[kind])

We can see a single, lonely comment:

$ -i --comments grep
  985 :     # something we want to grep.

But several places in the docstrings:

$ -i --strings grep
  188 :     """ Grep a single file for a regex by iterating over the lines in a file.
  292 :         """ Do a full grep.
  343 :         """ Grep a single block of file content.
  420 :             The name of the file being grepped, if one exists. If not provided,
  477 :         """ Grep a single file that actually exists on the file system.
  491 :             The grep results as text.
  634 :                 It should should be grepped for the pattern and the matching
  637 :                 The file is binary and should be either ignored or grepped
  641 :                 The file is gzip-compressed and should be grepped while
  924 :     """ Generate the filenames to grep.

In short, grin is grep configured the way I like it. It‘s not a speed demon compared to grep, it‘s not as general as grep, but it doesn‘t make me think, a desirable quality in a tool that I reach for dozens of times a day.


About Me

Only missing a monocle.

Hello, world!

My name is Robert Kern. I am a Python developer by day and, too often, a Python developer by night.

I thought I would start off by telling you how I got into Python and programming in general. My father is a programmer, and I would (try to) read his books as a kid. I tried to learn C when I was 12 or so. We even had a C compiler installed on our DOS PC (amusingly, it is still for sale on this newfangled Internet thing). I learned enough to do simple loops, to printf() things, and a bit of file I/O. I learned to please the compiler with cargo cult semicolons. I had read about C++ and modeling objects with classes.

But nothing really took hold, to be honest. At the time, I was the kind of person that needed a real, compelling task to motivate me to truly learn a skill. As a lazy teenager, I was not a self-starter. Fast-forwarding a bit through the fallow years to the end of high-school, I obtained an internship with the structural biochemistry group at a nearby National Cancer Institute lab. We were doing bioinformatic sequence analyses on parts of the HIV genome which had hundreds of publicly available sequences by that time (hundreds, I say!). There were tools that did much of this work, and they could be strung together with baling wire and shell scripts, but I felt I needed something more.

Fortunately, Dad had run across this new language called Python that had extensions for scientific computing. Finally, I had a job to do, and the right tools to finish it! I fell in love in a day of frantic coding. I could replace expensive, inscrutable, unmodifiable software in with just a few scripts that I could understand thoroughly. I was powerful, and it was Python that made me mighty. I promptly subscribed to comp.lang.python on USENET (because I kick it old school) and learned an astonishing amount about software development from my new peers more or less by osmosis.

I tried to be a helpful, if sometimes overly terse, contributor to the newsgroup and the Matrix-SIG. When SciPy came around and bundled together some of the numerical libraries I had been using, I pitched in there too and acquired a minor reputation as a useful curmudgeon.

I can think of worse things to be known as.


First Post

Welcome to my blog!

I have finally reached the point where I must stop fiddling with the blog theme and blog engine and start writing posts. Bother.