Role Based Access Control in One Line


Role-based access control (RBAC) is a standard architecture for implementing authorization systems. The key feature is that permissions to access resources are not given directly to users but rather to roles. Users are then given sets of roles for a session. For example, in a publishing organization, there might be an Author role and an Editor role. Both roles may be permitted to create new articles and modify them, but only the Editor may be permitted to push the final article to the website. Creating this layer of abstraction makes maintaining permissions easier. When a new hire needs to be put into the system, only a small number of easily understood roles need to be added to their user account rather than a long list of individual permissions. When policies change and new permissions need to be given to a lot of users, one can simply assign those permissions to the correct roles, and every user with those roles gets the new permissions.
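As a minimal sketch of the idea in plain Python (the usernames, role names, and permission names here are hypothetical, chosen to match the publishing example), permissions hang off roles and users only ever receive roles:

```python
# Hypothetical role and permission names, based on the publishing example.
ROLE_PERMISSIONS = {
    "Author": {"create_article", "modify_article"},
    "Editor": {"create_article", "modify_article", "publish_article"},
}

# Users are assigned roles, never individual permissions.
USER_ROLES = {
    "alice": {"Author"},
    "bob": {"Editor"},
}


def is_permitted(username, permission):
    """Return True if any role assigned to the user grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS[role]
        for role in USER_ROLES.get(username, set())
    )


# A policy change is made once on the role and reaches every user holding it.
ROLE_PERMISSIONS["Author"].add("upload_image")
```

Note that in this flat version the Editor role has to repeat everything the Author role can do.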

Hierarchical RBAC takes this one step further. Roles can inherit some of their permissions from other roles. In the publishing example above, one could make the Editor role inherit from the Author role because an Editor is allowed to do anything an Author can do. In the RBAC implementation, the Editor role then needs explicit permissions only for the things that are special to Editors, instead of a duplicated copy of the Author permissions. The hierarchy of roles forms a directed graph, whose complexity depends on the policies that it is modelling.

In order to authorize a user to access a resource in a hierarchical RBAC system, the system needs to walk up the connected graph of roles that are assigned to the user. Only if one of the roles in that graph has the appropriate permission is the user authorized to access the resource. Formulated this way, hierarchical RBAC is just a graph traversal problem.
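The walk up the role graph can be sketched in plain Python before bringing in a database. The dictionaries and names below are hypothetical illustration data, and the visited set is a precaution I have added so that a cycle in the role graph cannot loop forever:

```python
# Each role lists only the permissions special to it (hypothetical names).
ROLE_PERMISSIONS = {
    "Author": {"create_article", "modify_article"},
    "Editor": {"publish_article"},
}
# Editor inherits from Author, so the edge points from child to parent.
ROLE_PARENTS = {"Editor": {"Author"}, "Author": set()}
USER_ROLES = {"alice": {"Author"}, "bob": {"Editor"}}


def reachable_roles(username):
    """Walk up the role graph from the user's directly assigned roles."""
    seen = set()
    stack = list(USER_ROLES.get(username, ()))
    while stack:
        role = stack.pop()
        if role in seen:
            continue  # already handled: shared ancestor or cycle
        seen.add(role)
        stack.extend(ROLE_PARENTS.get(role, ()))
    return seen


def is_authorized(username, permission):
    """True if any role reachable from the user grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, ())
        for role in reachable_roles(username)
    )
```

Here bob holds only the Editor role but is still authorized for modify_article, because the walk reaches Author.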

It seemed to me that this would make a neat use case for a graph database. They usually come equipped with some kind of query language that represents graph traversals clearly and succinctly. Gremlin is a standard one that works across many Java-based graph databases. Let’s say that we make nodes in the database for Users, Roles, and Resources. Nodes and edges can have attributes attached to them; let’s say that Users have a username attribute and Roles have a name attribute while a Resource has a url attribute. Permissions associate Roles with Resources, so we will make an edge from a Role to each Resource that it is permitted to access. Edges have labels that help distinguish the kinds of relationships that are being modelled. We will call this one permission. Edges can also have data, and we do want to distinguish different kinds of access like 'read' and 'write', so we will also add an action attribute to these edges. To associate Users with Roles and child Roles with parent Roles we will add has_role edges. It’s a bit of a cheat to use the same kind of edge for both cases, but it makes the query marginally more elegant.
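To make the data model concrete, here is a rough in-memory picture of it in Python. This is not how a graph database stores anything; the Node and Edge classes, the ids, and the sample values are all assumptions for illustration. Only the attribute names (username, name, url, action) and edge labels (has_role, permission) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "User", "Role" or "Resource"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str                     # "has_role" or "permission"
    out_node: int                  # id of the node the edge leaves
    in_node: int                   # id of the node the edge enters
    props: dict = field(default_factory=dict)


# Hypothetical sample data; ids are arbitrary integers chosen for this sketch.
nodes = {
    1: Node("User", {"username": "bob"}),
    2: Node("Role", {"name": "Editor"}),
    3: Node("Role", {"name": "Author"}),
    4: Node("Resource", {"url": "/articles/42"}),
}
edges = [
    Edge("has_role", 1, 2),                         # bob has the Editor role
    Edge("has_role", 2, 3),                         # Editor inherits from Author
    Edge("permission", 3, 4, {"action": "read"}),   # Author may read
    Edge("permission", 2, 4, {"action": "write"}),  # Editor may write
]
```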

Here is the Gremlin query that checks if a given User can access a Resource with a particular action:

g.v(user_id).out('has_role').loop(1){true}{true}.outE('permission').has('action', action).inV().has('url', resource_url).count() == 1

That’s it. Let’s break it down to see what is going on. Gremlin queries have a pipeline model. Each method in that chain adds a new processing step to the final pipeline. Each processing step will accept an iterator of items from the previous step and will generate any number of items for the next step in the pipeline.

g is just the graph object itself. It’s not really part of the pipeline, just the API entry point.

.v(user_id) starts the pipeline by looking for the node (or vertex in Gremlin terms) with the given unique user_id, just a number that the graph database assigned when it created the node. It will yield just that one node for the next step in the pipeline.

.out('has_role') iterates over every outgoing edge with the label has_role and yields the node at the other side of it.

.loop(1){true}{true} is more complicated. This instructs the pipeline to take the sequence of inputs and put them back in the pipeline 1 step back, in this case the .out() step. The first {true} is a “while” condition; while that expression is true, it will keep looping. The effect I am trying to achieve is a complete traversal of the graph of Role nodes connected by has_role edges. I just use true to allow it to go on until the subgraph of has_role-connected nodes is exhausted. The second {true} is a condition for passing the intermediate nodes further down the pipeline. In this case, I want all of the Role nodes that are ultimately connected to the User to be checked for permissions, not just the final nodes, so I use an unqualified true here.

.outE('permission') will yield all permission edges from each of the Role nodes that are being fed into it.

.has('action', action) will filter the iterator of permission edges to only pass on those that have an action attribute that matches the requested one.

.inV() yields the Resource node (vertex) that the permission edge is going into.

.has('url', resource_url) will filter out those permitted Resources that do not match the query.

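.count() will count the number of elements that make it to the end of the pipeline. Assuming we kept the Resource nodes unique, and that only one permission edge for the action reaches the Resource along only one chain of roles, this will either be 0 or 1. If it is 1, then the User is authorized to perform the action on the Resource. If several of the User's roles grant the same action, or a role can be reached along more than one path, the count can be higher than 1, so testing for a count greater than 0 is the more robust check.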
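The step-by-step breakdown above can be mimicked with Python generators, one function per pipeline step. This is a model of the traversal and not Gremlin itself. The sample graph, the ids, and the function names are hypothetical. Unlike the query as written, the loop here remembers visited nodes so that a cycle of has_role edges terminates, and the final test is for at least one hit.

```python
# Hypothetical sample graph: edges are (out_node, label, in_node, props) tuples.
NODES = {
    "u1": {"username": "bob"},
    "r1": {"name": "Editor"},
    "r2": {"name": "Author"},
    "res1": {"url": "/articles/42"},
}
EDGES = [
    ("u1", "has_role", "r1", {}),
    ("r1", "has_role", "r2", {}),
    ("r2", "permission", "res1", {"action": "write"}),
]


def out(nodes, label):
    """Like .out(label): follow outgoing edges and yield the far node."""
    for node in nodes:
        for src, lbl, dst, _ in EDGES:
            if src == node and lbl == label:
                yield dst


def out_loop(nodes, label):
    """Like .out(label).loop(1){true}{true}: emit every node reached."""
    frontier = list(out(nodes, label))
    seen = set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue  # added precaution, not present in the Gremlin query
        seen.add(node)
        yield node
        frontier.extend(out([node], label))


def out_e(nodes, label):
    """Like .outE(label): yield the outgoing edges themselves."""
    for node in nodes:
        for edge in EDGES:
            if edge[0] == node and edge[1] == label:
                yield edge


def can_access(user_id, action, resource_url):
    roles = out_loop([user_id], "has_role")
    perms = out_e(roles, "permission")
    matching = (e for e in perms if e[3].get("action") == action)  # .has('action', ...)
    resources = (e[2] for e in matching)                           # .inV()
    hits = (r for r in resources if NODES[r].get("url") == resource_url)  # .has('url', ...)
    return sum(1 for _ in hits) > 0                                # .count()
```

As in the Gremlin pipeline, nothing is computed until the last step pulls items through the chain of generators.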

If we just want to list all of the Resources that a User can perform an action on, we can omit the last two steps:

g.v(user_id).out('has_role').loop(1){true}{true}.outE('permission').has('action', action).inV()

We can reverse this pipeline to see what Users can access a Resource:

g.v(resource_id).inE('permission').has('action', action).outV().in('has_role').loop(1){true}{it.object.username != null}

The new part in this query is that the final loop() should only emit the final User nodes, those that have a non-null username attribute, at the end of the chain, not the intermediate Roles. it is the implicit argument that is passed to these loop closures, and it.object is the current object being processed.
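The reversed traversal can be modelled in plain Python the same way. Again this is a sketch and not Gremlin: the sample graph and the function name users_with_access are hypothetical, and the visited set is my addition to keep cycles from looping forever. The emit condition is the same idea as in the query: keep a node only if it has a username.

```python
# Hypothetical sample graph: (out_node, label, in_node, props) edge tuples.
NODES = {
    "u1": {"username": "bob"},
    "u2": {"username": "alice"},
    "r1": {"name": "Editor"},
    "r2": {"name": "Author"},
    "res1": {"url": "/articles/42"},
}
EDGES = [
    ("u1", "has_role", "r1", {}),
    ("u2", "has_role", "r2", {}),
    ("r1", "has_role", "r2", {}),
    ("r2", "permission", "res1", {"action": "read"}),
    ("r1", "permission", "res1", {"action": "write"}),
]


def users_with_access(resource_id, action):
    """Walk has_role edges backwards from the roles permitted on a resource."""
    # Roughly .inE('permission').has('action', action).outV()
    frontier = [
        src for src, label, dst, props in EDGES
        if dst == resource_id
        and label == "permission"
        and props.get("action") == action
    ]
    seen, usernames = set(), set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        # Emit condition: only nodes with a username attribute are Users.
        if NODES[node].get("username") is not None:
            usernames.add(NODES[node]["username"])
        # Roughly .in('has_role'): step to the nodes that hold this role.
        frontier.extend(
            src for src, label, dst, _ in EDGES
            if dst == node and label == "has_role"
        )
    return usernames
```

With this data, alice and bob can both read the article (bob through Editor inheriting from Author), but only bob can write it.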

So there you go, hierarchical RBAC implemented in one-liners.
