Posts for Thursday, May 28, 2015

less portage rsync output

Ever wondered how to silence rsync when doing emerge --sync (or eix-sync)? Sure, it's nice to get lots of information. But it's like with compiling packages: the first few years it's amazing to watch the terminal while it compiles the latest stuff. After a while, however, these things become a bit boring.
While emerge has --quiet-build, rsync by default outputs every single file that gets transferred or deleted. Luckily, recent versions of rsync, which have already gone stable, support new ways of progress output, and since I already use them in other scripts I decided to modify my portage rsync settings a bit:

PORTAGE_RSYNC_EXTRA_OPTS="--info=progress2,name0,del0"

The output looks similar to this:


Neat, isn't it?
By the way, in order for this to work correctly, the remote rsync server needs to run a recent version of rsync as well.
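If you want to try the effect before committing the change to make.conf, the variable can also be set for a single run. This is a minimal sketch and assumes portage picks the variable up from the environment, which it normally does for this kind of configuration variable:

PORTAGE_RSYNC_EXTRA_OPTS="--info=progress2,name0,del0" emerge --sync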

Posts for Monday, May 25, 2015

avatar

PostgreSQL with central authentication and authorization

I have been running a PostgreSQL cluster for a while as the primary backend for many services. The database system is very robust, well supported by the community and very powerful. In this post, I’m going to show how I use central authentication and authorization with PostgreSQL.

Centralized management is an important principle whenever deployments become very dispersed. For authentication and authorization, a highly available LDAP is one of the more powerful components in any architecture. It isn't the only method though – it is also possible to use a distributed approach where the master data is centrally managed, but the actual data is distributed to the various systems that need it. Such a distributed approach allows for high availability without the need for a highly available central infrastructure (user ids, group membership and passwords are distributed to the servers rather than queried centrally). Here, I'm going to focus on a mixture of both methods: central authentication for password verification, and distributed authorization.

PostgreSQL uses in-database credentials by default

By default, PostgreSQL uses in-database credentials for the authentication and authorization. When a CREATE ROLE (or CREATE USER) command is issued with a password, it is stored in the pg_catalog.pg_authid table:

postgres# select rolname, rolpassword from pg_catalog.pg_authid;
    rolname     |             rolpassword             
----------------+-------------------------------------
 postgres_admin | 
 dmvsl          | 
 johan          | 
 hdc_owner      | 
 hdc_reader     | 
 hdc_readwrite  | 
 hadoop         | 
 swift          | 
 sean           | 
 hdpreport      | 
 postgres       | md5c127bc9fc185daf0e06e785876e38484

Authorizations are also stored in the database (and unless I’m mistaken, this cannot be moved outside):

postgres# \l db_hadoop
                                   List of databases
   Name    |   Owner   | Encoding |  Collate   |   Ctype    |     Access privileges     
-----------+-----------+----------+------------+------------+---------------------------
 db_hadoop | hdc_owner | UTF8     | en_US.utf8 | en_US.utf8 | hdc_owner=CTc/hdc_owner  +
           |           |          |            |            | hdc_reader=c/hdc_owner   +
           |           |          |            |            | hdc_readwrite=c/hdc_owner

Furthermore, PostgreSQL has some additional access controls through its pg_hba.conf file, in which the access towards the PostgreSQL service itself can be governed based on context information (such as originating IP address, target database, etc.).

For more information about the standard setups for PostgreSQL, definitely go through the official PostgreSQL documentation as it is well documented and kept up-to-date.

Now, for central management, in-database settings become more difficult to handle.

Using PAM for authentication

The first step in moving the management of authentication and authorization outside the database is to look at a way to authenticate users (password verification) outside the database. I tend not to use a distributed password approach (where a central component is responsible for changing passwords on multiple targets), instead relying on a highly available LDAP setup, but with local caching (to catch short-lived network hiccups) and local passwords for last-hope accounts (such as root and admin accounts).

PostgreSQL can be configured to interact directly with an LDAP, but I like to use Linux PAM whenever I can. For my systems it is a standard way of managing the authentication of many services, so the same goes for PostgreSQL. And with the sys-auth/pam_ldap package, integrating multiple services with LDAP is a breeze. So the first step is to have PostgreSQL use PAM for authentication. This is handled through its pg_hba.conf file:

# TYPE  DATABASE        USER    ADDRESS         METHOD          [OPTIONS]
local   all             all                     md5
host    all             all     all             pam             pamservice=postgresql

This will have PostgreSQL use the postgresql PAM service for authentication. The PAM configuration is thus in /etc/pam.d/postgresql. In it, we can either directly use the LDAP PAM modules, or use the SSSD modules and have SSSD work with LDAP.
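For illustration, a minimal sketch of what /etc/pam.d/postgresql could look like when using SSSD (pam_sss); substitute pam_ldap.so if you interact with the LDAP directly. The exact stack depends on your distribution's PAM layout, so treat this as an assumption rather than a drop-in file:

# /etc/pam.d/postgresql (sketch)
auth     required   pam_sss.so
account  required   pam_sss.so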

Yet this isn't sufficient. We still need to tell PostgreSQL which users can be authenticated – the users need to be defined in the database (just without password credentials, because those are handled externally now). This is done together with the authorization handling.

Users and group membership

Every service on the systems I maintain has dedicated groups in which, for instance, its administrators are listed. For instance, for the PostgreSQL services:

# getent group gpgsqladmin
gpgsqladmin:x:413843:swift,dmvsl

A local batch job (run through cron) queries this group (which I call the masterlist), as well as which users in PostgreSQL are assigned the postgres_admin role (a superuser role like postgres that is used as the intermediate role to assign to administrators of a PostgreSQL service), known as the slavelist. The deltas are then used to add or remove users.

# Note: membersToAdd / membersToRemove / _psql are custom functions
#       so do not vainly search for them on your system ;-)
for member in $(membersToAdd ${masterlist} ${slavelist}) ; do
  _psql "CREATE USER ${member} LOGIN INHERIT;" postgres
  _psql "GRANT postgres_admin TO ${member};" postgres
done

for member in $(membersToRemove ${masterlist} ${slavelist}) ; do
  _psql "REVOKE postgres_admin FROM ${member};" postgres
  _psql "DROP USER ${member};" postgres
done
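For the curious, here is a hedged sketch of what such helpers could look like. The names come from the snippet above, but the implementations below are assumptions for illustration only, taking each list as a single whitespace-separated string of user names:

# Sketch: users present in the masterlist but not in the slavelist, and vice versa
membersToAdd()    { comm -23 <(tr -s ' \t' '\n' <<< "$1" | sort) <(tr -s ' \t' '\n' <<< "$2" | sort); }
membersToRemove() { comm -13 <(tr -s ' \t' '\n' <<< "$1" | sort) <(tr -s ' \t' '\n' <<< "$2" | sort); }
# Sketch: run a single SQL statement against the given database
_psql()           { psql -U postgres -d "$2" -c "$1"; }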

The postgres_admin role is created whenever I create a PostgreSQL instance. Likewise, a number of roles are created for each database as well. For instance, for the db_hadoop database, the hdc_owner, hdc_reader and hdc_readwrite roles are created with the right set of privileges. Users are then granted one of these roles if they belong to the right group in the LDAP. For instance:

# getent group gpgsqlhdc_own
gpgsqlhdc_own:x:413850:hadoop,johan,christov,sean

With this simple approach, granting users access to a database is a matter of adding the user to the right group (like gpgsqlhdc_ro for read-only access to the Hadoop-related database(s)) and either waiting for the cron job to pick it up or manually running the authorization synchronization. By standardizing on infrastructural roles (admin, auditor) and data roles (owner, rw, ro), managing multiple databases is a breeze.

Posts for Monday, May 18, 2015

avatar

Testing with permissive domains

When testing out new technologies or new setups, not having (proper) SELinux policies can be a nuisance. Not only is the number of SELinux policies available through the standard repositories limited, some of these policies are not even written with the same level of confinement that an administrator might expect. Or perhaps the technology to be tested is used in a completely different manner.

Without proper policies, any attempt to start such a daemon or application might (or will) cause permission violations. In many cases, developers or users then tend to disable SELinux enforcement so that they can continue playing with the new technology. And why not? After all, policy development is to be done after the technology is understood.

But putting the entire system in permissive mode is overkill. It is much easier to start with a very simple policy and then mark the domain as a permissive domain. What happens is that the software, after transitioning into the "simple" domain, is no longer subject to SELinux enforcement, whereas the rest of the system remains in SELinux enforcing mode.

For instance, create a minuscule policy like so:

policy_module(testdom, 1.0)

type testdom_t;
type testdom_exec_t;
init_daemon_domain(testdom_t, testdom_exec_t)
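Building and loading such a module typically goes along these lines – a sketch only, as the location of the refpolicy Makefile differs per distribution and policy type (strict versus targeted, for example):

~# make -f /usr/share/selinux/strict/include/Makefile testdom.pp
~# semodule -i testdom.pp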

Mark the executable for the daemon as testdom_exec_t (after building and loading the minuscule policy):

~# chcon -t testdom_exec_t /opt/something/bin/daemond

Finally, tell SELinux that testdom_t is to be seen as a permissive domain:

~# semanage permissive -a testdom_t

When finished, don’t forget to remove the permissive bit (semanage permissive -d testdom_t) and unload/remove the SELinux policy module.
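That cleanup could look like this (note that semodule -r takes the module name rather than the .pp file):

~# semanage permissive -d testdom_t
~# semodule -r testdom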

And that’s it. If the daemon is now started (through a standard init script) it will run as testdom_t and everything it does will be logged, but not enforced by SELinux. That might even help in understanding the application better.

Posts for Sunday, May 10, 2015

avatar

Audit buffering and rate limiting

Be it because of SELinux experiments or general audit experiments, sometimes you'll come across a message similar to the following:

audit: audit_backlog=321 > audit_backlog_limit=320
audit: audit_lost=44395 audit_rate_limit=0 audit_backlog_limit=320
audit: backlog limit exceeded

The message shows up when certain audit events could not be logged through the audit subsystem. Depending on the system configuration, they might be ignored, sent through the kernel logging infrastructure, or even cause the system to panic. And if the messages are sent to the kernel log they might show up there, but even that log has its limitations, which can lead to output similar to the following:

__ratelimit: 53 callbacks suppressed

In this post, I want to give some pointers on configuring the audit subsystem and help you understand these messages.

There is auditd and kauditd

If you take a look at the audit processes running on the system, you’ll notice that (assuming Linux auditing is used of course) two processes are running:

# ps -ef | grep audit
root      1483     1  0 10:11 ?        00:00:00 /sbin/auditd
root      1486     2  0 10:11 ?        00:00:00 [kauditd]

The /sbin/auditd daemon is the user-space audit daemon. It registers itself with the Linux kernel audit subsystem (through the audit netlink system), which responds with spawning the kauditd kernel thread/process. The fact that the process is a kernel-level one is why the kauditd is surrounded by brackets in the ps output.

Once this is done, audit messages are communicated through the netlink socket to the user-space audit daemon. For the detail-oriented people amongst you, look for the kauditd_send_skb() method in the kernel/audit.c file. Now, generated audit event messages are not directly relayed to the audit daemon – they are first queued in a sort-of backlog, which is where the backlog-related messages above come from.

Audit backlog queue

In the kernel-level audit subsystem, a socket buffer queue is used to hold audit events. Whenever a new audit event is received, it is logged and prepared to be added to this queue. Adding to this queue can be controlled through a few parameters.

The first parameter is the backlog limit. Be it through a kernel boot parameter (audit_backlog_limit=N) or through a message relayed by the user-space audit daemon (auditctl -b N), this limit ensures that the queue cannot grow beyond a certain size (expressed as a number of messages). If an audit event is logged that would grow the queue beyond this limit, a failure occurs and is handled according to the system configuration (more on that later).

The second parameter is the rate limit. When more audit events are logged within a second than set through this parameter (which can be controlled through a message relayed by the user-space audit system, using auditctl -r N), those audit events are not added to the queue. Instead, a failure occurs and is handled according to the system configuration.

Only when the limits are not reached is the message added to the queue, allowing the user-space audit daemon to consume those events and log them according to the audit configuration. There are some good resources on audit configuration available on the Internet. I find this SuSE chapter worth reading, but many others exist as well.

There is a useful command related to the subject of the audit backlog queue. It queries the audit subsystem for its current status:

# auditctl -s
AUDIT_STATUS: enabled=1 flag=1 pid=1483 rate_limit=0 backlog_limit=8192 lost=3 backlog=0

The command displays not only the audit state (enabled or not) but also the settings for rate limits (on the audit backlog) and backlog limit. It also shows how many events are currently still waiting in the backlog queue (which is zero in our case, so the audit user-space daemon has properly consumed and logged the audit events).
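To adjust those limits on a running system, the same auditctl tool can be used. The values below are examples only; add the corresponding rules to your audit rules file if they should survive a reboot:

# auditctl -b 8192
# auditctl -r 0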

Failure handling

If an audit event cannot be logged, then this failure needs to be handled. The Linux audit subsystem can be configured to either silently discard the message, switch to the kernel log subsystem, or panic. This is configured through the audit user-space (auditctl -f [0..2]), but is usually left at the default (which is 1: switch to the kernel log subsystem).

Before that happens, a message is displayed which reveals the cause of the failure handling:

audit: audit_backlog=321 > audit_backlog_limit=320
audit: audit_lost=44395 audit_rate_limit=0 audit_backlog_limit=320
audit: backlog limit exceeded

In this case, the backlog queue was set to contain at most 320 entries (which is low for a production system) and more messages were being added (in certain cases the Linux kernel allows a few more entries than configured, for performance and consistency reasons). The number of events already lost is displayed, as well as the current limitation settings. The message "backlog limit exceeded" can be "rate limit exceeded" if that was the limit that was triggered.

Now, if the system is not configured to silently discard the message, or to panic the system, then the "dropped" messages are sent to the kernel log subsystem. These calls, however, are also governed by a configurable limitation: a rate limit which can be set through sysctl:

# sysctl -a | grep kernel.printk_rate
kernel.printk_ratelimit = 5
kernel.printk_ratelimit_burst = 10

In the above example, the system allows one message every 5 seconds, but does allow a burst of up to 10 messages at once. When the rate limitation kicks in, the kernel will log (at most one per second) the number of suppressed events:

[40676.545099] __ratelimit: 246 callbacks suppressed

Although this limit is kernel-wide, not all kernel log events are governed through it. It is the caller subsystem (in our case, the audit subsystem) which is responsible for having its events governed through this rate limitation or not.
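If those fallback messages matter to you, the rate limiting can be relaxed through the same sysctl interface. The values below are illustrative only; persist them through /etc/sysctl.conf or sysctl.d if needed:

# sysctl -w kernel.printk_ratelimit=5
# sysctl -w kernel.printk_ratelimit_burst=50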

Finishing up

Before waving goodbye, I would like to point out that the backlog queue is an in-memory queue (and not on disk, Red Hat), just in case that wasn't obvious. Increasing the queue size can therefore result in more kernel memory consumption. Apparently, a practical size estimate is around 9000 bytes per message. On production systems, it is advised not to make this setting too low. I personally set it to 8192.

Lost audit events can make troubleshooting difficult, which is certainly the case when dealing with new or experimental SELinux policies. They can also mean missing security-relevant events – it is the audit subsystem, after all. So tune it properly, and enjoy the power of Linux's audit subsystem.

Posts for Sunday, May 3, 2015

How to conference

(This text evolved over a few weeks in short bursts of work in my free time. Sadly I don’t really have the time to properly fix it, edit it and make it nice right now, I just need to get it out. So I apologize for the bad quality of writing/text.)

I have had the privilege to attend quite a few conferences in my life, some as speaker/presenter, some as participant. After the first few times, where these events can wash over you like a tsunami, I started realizing how different these things we call conferences really are and how wide the spectrum of conference-like events actually is.

In this short brainfart I am writing mostly from the perspective of a presenter, because that is an area where I think I can actually add a few things that maybe have not been said before by others. In the last few months I have seen quite a few interesting reflections on how to improve the concept and implementation of "conference", especially when it comes to accessibility and respect. I encourage you, for example, to check out this post by the conc.at organizers on how little it actually cost them to make conc.at a safer space for the participants, and this ttw15-inspired reflection on conferences as a technology by Sky Croeser. There is a lot more of that out there.

So let me just quickly state the obvious before I dive into these notes: If you provide any sort of public space (as a conference does), you not only need an anti-harassment policy but also clear mechanisms for victims to report problems and have them taken care of. Many conferences these days have a policy in place but no real mechanics to enforce it, which is ridiculous in this day and age. You also need to think about whether people with disabilities can actually enter and navigate your space and have access to bathrooms. Offering some sort of drinkable water supply and, if possible, snacks to people with limited mobility on site is also something often overlooked (it can be hard to "just jump over to the next store" to buy water if you can't walk that far). And there is always the question of offering some kind of child care to enable parents to participate. The list of these issues is long and I get that not every conference has the resources to tackle all of them, but you should think about them when organizing one.

So let's talk about presenting. I enjoy presenting at conferences. It feels like I contribute as well as have a good reason to attend (apart from my own indulgence). It also forces me to boil an idea or a thought down into a presentable piece, which helps me think on. I tend not to do the same talk multiple times because it bores me, and I find it a waste of time and of limited space for ideas to redo a session that already has a publicly available recording. I also try to finish a rough version of the session a few weeks, or at least one week, in advance to have time to properly prepare the session1.

The usual process to land a presentation at a conference is (if you are not some big shot invited speaker) to submit a session proposal to the Call for Participation. After a while you get a notice of acceptance or rejection and if accepted you write your paper and prepare your presentation. Finally you attend the conference and give your talk. Sounds simple but there are a few things that go wrong a lot or at least don’t work ideally.

Now it's obvious that different conferences have very different amounts of resources to commit to preparing the program/schedule: Some conferences are very expensive, with goodie bags for attendees and a lot of flashy tech and whatever. Some run on almost no budget, with all required time and work provided by volunteers2. From my experience, the level of professionalism in how presenters are prepared does in no way correlate with the amount of money available. But since not every conference can provide everything I write down here, I'll separate this list into two blocks: Mandatory and Desirable.

Mandatory

These are the things that I find completely necessary in order to deliver a presentation that actually fits into the conference.

Write a clear and somewhat specific CfP

Writing a CfP is hard: You don't want to narrow the scope too much, in order to get new and exciting ideas and projects in, while still being clear enough about what your conference is about and trying to achieve. I feel that some CfPs err on the side of being less specific, which makes it very hard to write a proposal that concisely describes a session. Add direct questions you'd like to see answered or discussed at the conference. Also provide examples of what you do not care about. Help potential submitters understand what you really want: It avoids a lot of wasted work for conference organizers as well as submitters. Obviously you can always add a "catchall", but having precise questions helps to see whether that weird and spacey idea really fits or not.

Be transparent about the criteria for acceptance

At some point you will have to make a decision on whom to accept and whom not to. Those decisions are hard, and more often than not you will have to reject proposals that you'd have liked to have because there just isn't enough room. But you also will have to decide based on whatever you consider to be quality. And proposal quality is not an objective metric. Some people look at the person presenting and whether they have experience or a big name, others don't. Some value implementation or hard data a lot, others like to see a certain familiarity with certain technical terms. Be transparent about what constitutes a "good" proposal for your team.

Deadlines aren’t enough, also commit to a schedule

Every CfP comes with a deadline. Get your proposal in in time or you're out. That's fair. Where some conferences have a harder time is committing to a schedule for when notifications of acceptance will be sent to the presenters. "At the end of May" or "a few weeks later" might sound specific enough while giving the organizers room to take extra time to evaluate proposals, but it can be very problematic for presenters: You have to keep your schedule open during the conference, you might have to book accommodation, organize a place for your cat or your kids to stay. The more embedded you are in a dependency network of jobs and family etc., the more damaging and problematic "just waiting" can actually be. As an organization, try to commit to a plan of when people will get their responses, so presenters know how long to keep their schedule open. Deadlines sometimes can't be held and that's fine – be transparent and postpone acceptances/rejections for a week or two, but keep submitters in the loop.

Brief your presenters

The weeks before the actual conference are always full of all kinds of things that need to happen before the event can take place. Which is why, after having sent out acceptances/rejections, there is often some radio silence between the conference and the presenters. There's usually another email before the event telling presenters what kind of equipment to expect and other technical and organizational details (schedule, file format for presentation slides etc.), but these often lack a briefing about what kind of social environment to expect. What kind of people will probably be there, what age, which level of expertise, which backgrounds, which experiences will they probably have? What kind of language should the presentation use, should jargon be avoided or used? What's the "culture" of the scene attending the conference3? It's hard to guess, especially when a presenter is attending an event for the first time. What would you like your presenters to do? Speak for the whole time slot (maybe with a bunch of questions afterwards)? Generate an actual debate/discussion? Tell presenters what to expect and what you expect, to help them build a better presentation for that specific situation.

Desirable

Certain things are desirable but can’t always be done by every conference. Especially smaller, volunteer-run conferences can’t always do this but I still thought I’d mention them.

Feedback for Proposals

Building a schedule and deciding what to put in it is hard, especially saying "no", but that's just part of the job. There are many reasons for a proposal being rejected, not all of them very pleasant for the submitter or the organization. Still, I'd wish to get some sort of specific feedback instead of a blanket statement like "yeah, we had so much good stuff, we'd love to have taken it all but you just weren't lucky". There are ways to help submitters put forward a better presentation proposal the next time without being insulting: "Your proposal wasn't clear and specific enough, we couldn't understand what your point was", "We felt your proposal was good but didn't fit into this conference because of $reason. Maybe try $otherConf?", "We already filled our quota of white dudes. Your presentation proposal was fine, the others by white males were just better for this conference, so better luck next time!". Help people understand your reasoning, not by sending out 2 pages of comments on a 500-word proposal, but by giving an actual reason that's not just a feel-good marketing term. Still: If possible stay polite; especially if it's people's first time, very negative feedback can be devastating.

Describe dress code

This one might sound weird but it's been a problem for me a few times already: Tell presenters about the dress code at your conference. Yeah, "there's none". Sure. That just means that there is no explicit but an implicit code. Especially when you haven't been at an event before, it's sometimes really hard to estimate what kind of clothing is expected, and being overdressed can be a problem just as much as being underdressed. Give presenters a description of the spectrum of acceptable clothing, especially when you are not sure they have the same background as the other people at the event.

Drop the motto if it doesn’t mean anything

Oh, conference mottos, how I hate thee. At some point many conferences, even those not run by marketing people, adopted the policy of having a different motto each year. These mottos are supposed to represent the vibe, the general topic or impetus of a conference for that specific year. But mostly they don't mean anything. Which you realize when looking at the program, which really has nothing to do with it.

Trying to write a good proposal for a conference is hard, so I try to understand the conference as well as possible before deciding how to present a proposal or an idea. So if there is a motto, I try to find something connected to it to put forward my idea. That actually makes it a lot harder to present your session because you have to look at it through the motto lens. And it really sucks when you see afterwards that all that work was for nothing. If you can't drop the motto, state somewhere in the CfP that it's not really relevant for the conference, to save everybody some time and work.


These are all things that I'd like more conferences to do, strictly speaking from a presenter's perspective. But conferences as a technology, as a thing, have many more stakeholders, and getting all their needs met, especially with limited resources, is far from easy. I have a few thoughts on the general format dominant at conferences (an hour of someone lecturing), which is far from the silver bullet it's often made out to be, but that might be for a different post.

Also: These are just a bunch of notes I wrote down and lack some editing etc. but sadly I am very pressed for time right now so I apologize again for this somewhat random stream of thoughts.

Photo by Cydcor

  1. Having the presentation look as if the presenter(s) are seeing it for the first time and "improvising" actually infuriates me as an attendee. It's the most obvious signal of disrespect for your audience's time you can send.
  2. We could go very deep into the whole self-exploitation angle of the conference circus here – especially when it comes to some hacker/tech/internet conferences – but we'll leave that for another time.
  3. At ttw15 for example some people read their pre-written presentation from paper. On the conferences in the tech context I usually go to, that would be a huge no-no, but certain domains seem to do it that way.


Posts for Thursday, April 30, 2015

avatar

Use change management when you are using SELinux to its fullest

If you are using SELinux on production systems (by which I mean systems that offer services to customers or other parties beyond you, yourself and your ego), please consider proper change management if you don't do so already. SELinux is a very sensitive security subsystem – not in the sense that it easily fails, but because it is very fine-grained and as such can easily stop applications from running when their behavior changes just a tiny bit.

Sensitivity of SELinux

SELinux is a wonderful security measure for Linux systems that can prevent successful exploitation of vulnerabilities or misconfigurations. Of course, it is not the sole security measure that systems should take. Proper secure configuration of services, least privilege accounts, kernel-level mitigations such as grSecurity and more are other measures that certainly need to be taken if you really find system security to be a worthy goal to attain. But I’m not going to talk about those others right now. What I am going to focus on is SELinux, and how sensitive it is to changes.

An important functionality of SELinux to understand is that it segregates the security control system itself (the SELinux subsystem) from its configuration (the policy). The security control system itself is relatively small, and focuses on enforcement of the policy and logging (either because the policy asks to log something, or because something is prevented, or because an error occurred). The most difficult part of handling SELinux on a system is not enabling or interacting with it. No, it is its policy.

The policy is also what makes SELinux so darn sensitive to small system changes (or to behavior that is either not normal or at least not allowed by the existing policy). Let me explain with a small situation that I recently had.

Case in point: Switching an IP address

A case that beautifully shows how sensitive SELinux can be is an IP address change. My systems all obtain their IP address (at least for IPv4) from a DHCP system. This is of course acceptable behavior as otherwise my systems would never be able to boot up successfully anyway. The SELinux policy that I run also allows this without any hindrance. So that was not a problem.

Yet recently I had to switch an IP address for a system in production. All the services I run are set up in a dual-active mode, so I started the change by draining the services to the second system, shutting down the service and then (after reconfiguring the DHCP system to provide a different IP address) reloading the network configuration. And then it happened – the DHCP client just stalled.

As the change failed, I updated the DHCP system again to deliver the old IP address and then reloaded the network configuration on the client. Again, it failed. Dumbstruck, I looked at the AVC denials and lo and behold, I noticed a dig process running in a DHCP-client-related domain trying to do UDP binds, which the policy (at that time) did not allow. But why now, all of a sudden? After all, this system had been running happily for more than a year already (with occasional reboots for kernel updates).

I won't bore you with the investigation. It boils down to the fact that the DHCP client detected a change compared to previous startups, and was configured to run a few hooks as additional steps in the IP lease setup. As these hooks had never run before, the policy had never been challenged with this behavior. And since the address change had already occurred, a revert to the previous situation didn't work either (as the previous state information was already deleted).

I was able to revert the client (which is a virtual guest in KVM) to the situation right before the change (thank you, savevm and loadvm functionality) so that I could work on the policy first in a non-production environment, making the next change attempt successful.

Change management

The previous situation might be "solved" by temporarily putting the DHCP client domain in permissive mode just for the change, and then switching it back. But that is ignoring the issue, and unless you have perfect operational documentation that you always read before making system or configuration changes, I doubt that you'll remember this for the next time.
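For reference, that quick-and-dirty workaround would look something like the following – a sketch, and note that dhcpc_t is an assumption based on the reference policy's name for the DHCP client domain; your policy may use a different one:

~# semanage permissive -a dhcpc_t    # before the change
~# semanage permissive -d dhcpc_t    # immediately afterwards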

The case is also a good example of the sensitivity of SELinux. It is not just when software is being upgraded. Every change (be it in configuration, behavior or operational activity) might result in a situation that is new for the loaded SELinux policy. As the default action in SELinux is to deny everything, this will result in unexpected results on the system. Sometimes very visible (no IP address obtained), sometimes hidden behind some weird behavior (hostname correctly set but not the domain name), or perhaps not even noticed until much later. Compare it to firewall rule configurations: you might be able to easily confirm that standard flows still pass through, but how certain are you that fallback flows or one-in-a-month connection setups are not suddenly prevented from happening?

A somewhat better solution than just temporarily disabling SELinux access controls for a domain is to look into proper change management. Whenever a change has to be done, make sure that you

  • can easily revert the change back to the previous situation (backups!)
  • have tested the change on a non-vital (preproduction) system first

These two principles are pretty vital when you are serious about using SELinux in production. I'm not talking about a system that hardly has any fine-grained policies, where most of the system's services run in "unconfined" domains (although that's still better than not running SELinux at all), but about systems where you are truly trying to put a least-privilege policy in place for all processes and services.

Being able to revert a change allows you to quickly get a service up and running again so that customers are not affected by the change (and potential issues) for a long time. First fix the service, then fix the problem. If you are an engineer like me, you might rather focus on the problem (and a permanent, correct solution) first. But that's wrong – always first make sure that the customers are not affected by it. Revert and put the service back up, and then investigate so that the next change attempt does not go wrong anymore.

Having a multi-master setup might give some more leeway for investigating issues (as the service itself is not disrupted), so in the case mentioned above I would probably have tried fixing the issue immediately anyway if it hadn't been policy-based. But most users do not have truly multi-master service setups.

Being able to test (and retest) changes in non-production also allows you to focus on automation (so that changes can be done faster and in a repeated, predictable and qualitative manner), regression testing as well as change accumulation testing.

You don’t have time for that?

Be honest with yourself. If you support services for others (be it in a paid-for manner or because you support an organization in your free time), you'll quickly learn that service availability is one of the most important quality aspects of what you do. No matter what mess is behind it, most users don't see all that. All they see is the service itself (and its performance / features). If a change you wanted to make made a service unavailable for hours, users will notice. And if the change wasn't communicated up front, or it is the n-th time that this downtime occurs, they will start asking questions you'd rather not hear.

Using a non-production environment is not that much of an issue if the infrastructure you work with supports bare-metal restores or snapshot/cloning (in the case of VMs). After doing those a couple of times, you'll easily find that you can create a non-production environment from the production one. Or you can go for a permanent non-production environment (although you'll need to take care that this environment is at all times representative of the production systems).

And regarding qualitative changes, I really recommend using a configuration management system. I recently switched from Puppet to SaltStack and have yet to use the latter to its fullest extent (most of what I do is still scripted), but it is growing on me and I'm pretty convinced that I'll have the majority of my change management scripts replaced by SaltStack-based configurations by the end of this year. And that'll allow me to automate changes and thus provide a more qualitative service offering.

With SELinux, of course.

Posts for Monday, April 27, 2015

avatar

Moving closer to 2.4 stabilization

The SELinux userspace project released version 2.4 in February this year, after release candidates had been tested for half a year. Since its release, we at the Gentoo Hardened project have been working hard to integrate it within Gentoo. This effort has been made a bit more difficult by the migration of the policy store from one location to another while at the same time switching to HLL- and CIL-based builds.

Lately, 2.4 itself has been pretty stable, and we’re focusing on the proper migration from 2.3 to 2.4. The SELinux policy has been adjusted to allow the migrations to work, and a few final fixes are being tested so that we can safely transition our stable users from 2.3 to 2.4. Hopefully we’ll be able to stabilize the userspace this month or beginning of next month.

avatar

Practical fault detection on timeseries part 2: first macros and templates

In the previous fault detection article, we saw how we can cover a lot of ground in fault detection with simple methods and technology that is available today. It included an example of a simple but effective approach to finding sudden spikes (peaks and drops) within fluctuating time series. This post explains the continuation of that work and provides you the means to implement this yourself with minimal effort. I'm sharing with you:
  • Bosun macros which detect our most common not-trivially-detectable symptoms of problems
  • Bosun notification template which provides a decent amount of information
  • Grafana and Graph-Explorer dashboards and integration for further troubleshooting
We reuse this stuff for a variety of cases where the data behaves similarly and I suspect that you will be able to apply this to a bunch of your monitoring targets as well.

Target use case

As in the previous article, we focus on the specific category of timeseries metrics driven by user activity. Those series are expected to fluctuate in at least some kind of (usually daily) pattern, but are expected to have a certain smoothness to them. Think web requests per second or uploads per minute. There are a few characteristics that are considered faulty or at least worth our attention (illustrations omitted):
  • Looks good: consistent pattern, consistent smoothness.
  • Sudden deviation (spike): almost always something broke or choked. Could also be pointing up (peaks and valleys).
  • Increased erraticness: sometimes natural, often the result of performance issues.
  • Lower values than usual (in the third cycle): often caused by changes in code or config, sometimes innocent. But best to alert the operator in any case [*]

[*] Note that some regular patterns can look like this as well. For example, weekend traffic lower than weekdays, etc. We see this a lot.
The illustrations don't portray this, for simplicity, but the alerting logic below supports it just fine by comparing to the same day last week instead of yesterday, etc.

Introducing the new approach

The previous article demonstrated using Graphite to compute the standard deviation. This let us alert on the erraticness of the series in general and, as a particularly interesting side effect, on spikes up and down. The new approach is more refined and concrete, leveraging some of Bosun's and Grafana's strengths. We can't always detect the last case above via erraticness checking (a lower volume may be introduced gradually, not via a sudden drop), so now we monitor for that as well, covering all the cases above. We use
  • Bosun macros which encapsulate all the querying and processing
  • Bosun template for notifications
  • A generic Grafana dashboard which aids in troubleshooting
We can then leverage this for various use cases, as long as the expectations of the data are as outlined above. We use this for web traffic, volume of log messages, uploads, telemetry traffic, etc. For each case we simply define the graphite queries and some parameters and leverage the existing Bosun and Grafana configuration mentioned above.

The best way to introduce this is probably by showing what a notification looks like:


(image redacted to hide confidential information; the numbers are not accurate and are for demonstration purposes only)

As you can tell by the sections, we look at some global data (for example "all web traffic", "all log messages", etc.), and also at data segregated by a particular dimension (for example web traffic by country, log messages by key, etc.).
To cover all the problematic cases outlined above, we do 3 different checks (note: everything is parametrized so you can tune it, see further down):

  • Global volume: comparing the median value of the last 60 minutes or so against the corresponding 60 minutes last week and expressing it as a "strength ratio". Anything below a given threshold such as 0.8 is alerted on
  • Global erraticness. To find all forms of erraticness (increased deviation), we use a refined formula. See details below. A graph of the input data is included so you can visually verify the series
  • On the segregated data: compare current (hour or so) median against median derived from the corresponding hours during the past few weeks, and only allow a certain amount of standard deviations difference
If any one, or several, of these conditions are in warning or critical state, we get a single alert that gives us all the information we need.
Note the various links to GE (Graph-Explorer) and Grafana for timeshifts. The Graph-Explorer links are just standard GEQL queries. I usually use these if I want to easily manage what I'm viewing (compare against other countries, adjust the time interval, etc.) because that's what GE is really good at. The timeshift view is a Grafana dashboard that takes a Graphite expression as a template variable, and can hence be set via a GET parameter by using the url
http://grafana/#/dashboard/db/templatetimeshift?var-patt=expression
It shows the most recent week as red dots, and the weeks before that as timeshifts in various shades of blue representing the age of the data (darker is older).


This allows us to easily spot when traffic becomes too low, overly erratic, etc., as this example shows:


Getting started

Note: I won't explain the details of the Bosun configuration. Familiarity with Bosun is assumed. The Bosun documentation is pretty complete.

Gist with bosun macro, template, example use, and Grafana dashboard definition. Load the bosun stuff in your bosun.conf and import the dashboard in Grafana.

The pieces fit together like so:
  • The alert is where we define the graphite queries, the name of the dimension segregated by (used in template), how long the periods are, what the various thresholds are and the expressions to be fed into Grafana and Graph-Explorer.
    It also lets you set an importance which controls the sorting of the segregated entries in the notification (see screenshot). By default it is based on the historical median of the values but you could override this. For example for a particular alert we maintain a lookup table with custom importance values.
  • The macros are split in two:
    1. dm-load loads all the initial data based on your queries and computes a bunch of the numbers.
    2. dm-logic does some final computations and evaluates the warning and critical state expressions.
    They are split so that your alerting rule can leverage the returned tags from the queries in dm-load to use a lookup table to set the importance variable or other thresholds, such as s_min_med_diff on a case-by-case basis, before calling dm-logic.
    We warn if one or more segregated items didn't meet their median requirements, and if erraticness exceeds its threshold (note that the latter can be disabled).
    Critical is when more than the specified number of segregated items didn't meet their median requirements, the global volume didn't meet the strength ratio, or if erraticness is enabled and above the critical threshold.
  • The template is evaluated and generates the notification like shown above
  • Links to Grafana (timeshift) and GE are generated in the notification to make it easy to start troubleshooting

Erraticness formula refinements

You may notice that the formula has changed to

(deviation-now * median-historical) /
((deviation-historical * median-now) + 0.01)

A small worked example follows the list below.
  • Current deviation is compared to an automatically chosen historical deviation value (so no more need to manually set this)
  • Accounts for differences in volume: for example, if traffic at some point is much higher, we can also expect the deviation to be higher. With the previous formula we had cases where the numbers in the past were very low, so the deviation back then was naturally low and not a reasonable standard to hold against periods with higher traffic, resulting in trigger-happy alerting with false positives. Now we give a fair weight to the deviation ratio by making it inversely proportional to the median ratio
  • The + 0.01 is to avoid division by zero
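To make this concrete with a small worked example (numbers invented for illustration): if the historical median is 100 with a historical deviation of 5, and the current median is 200 with a current deviation of 10, the score is (10 * 100) / ((5 * 200) + 0.01) ≈ 1.0. Traffic that doubles while its deviation doubles proportionally is therefore not flagged; the score only rises when the current deviation grows faster than the volume does.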

Still far from perfect

While this has been very helpful to us, I want to highlight a few things that could be improved.

In conclusion

I know many people are struggling with poor alerting rules (static thresholds?).
As I explained in the previous article, I firmly believe that the commonly cited solutions (anomaly detection via machine learning) are a very difficult endeavor and that results can be achieved much more quickly and simply.
While this only focuses on one class of timeseries (it won't work on disk space metrics, for example), I found this class to be in the most dire need of better fault detection. Hopefully this is useful to you. Good luck and let me know how it goes!

Posts for Tuesday, April 7, 2015

Smartwatches and the “veil of the analogue”

When I was about 5 my father gave me my first watch. It was very cool, had a colorful watch band and had a football player1 on its watch face. And it was a manual watch that I had to wind up every week, which I found utterly fascinating: Just a few turns of the small wheel and it "magically" worked for a long time (when you're 4 or 5, a week is long!).

I don't know where that watch ended up or how long I used it. I went through some more or less cheap-ass digital watches during my time in school. Ugly, plasticky devices that had one thing going for them: They were digital. Precise. You got the exact time with a short glance and not just a rough estimation. Which felt "better" in a weird way, like the future as it was shown in 70s SciFi movies, with clear and precise, sometimes curved lines. It felt right in a way. A little bit of the promise of a more developed future. Star Trek on my wrist. In some small and maybe not even meaningful way it felt powerful. As if that quartz-driven precision would somehow make the constant flow of time more manageable.

When I went to study computer science I always had some form of cheap watch on my arm to tell the time, since I refused to get a cell phone for the longest time (for a bunch of mostly stupid and pretentious reasons). But when a software development gig that a friend and I worked on finished with a bonus payment, I got a "real" watch.


I wore that thing for a few years and loved it. Not just because it reminded me of Gir from Invader Zim (even though that was a big part of it). The fallback to an analog watch felt like a nice contrast to the digital spaces I spent more and more time in. It had two watch faces, both so small that you could hardly read a precise time on them. It was a strangely out-of-time anchor to the past while I dove head-on into what I perceived to be "the future".

A cell phone came, then a feature phone and finally smartphones, and at some point I stopped replacing the batteries in my watch. I carried my phone around with me anyway, so why add another device to the mix that had only that one feature? It felt pointless, especially with me being sort of the prototypical nerd back then, explicitly not "caring" about "looks and style" and all that stuff while sticking very closely to the codices and fashions of the subculture I identified with back then. If you gather from this that I was probably just another insecure dude in the beginning of his twenties, a little too full of his own bullshit, you would probably be correct.

But about 6 months ago things changed and I got another watch. A "smart" one even (we'll come back to that word later). Here's some kind of "review", or a summary of the things I learned from wearing one for those few months. But don't be afraid, this isn't a techbloggy text. Nobody cares about pixel counts and the megahertz of whatever-core processors. Given the state of the art, the answer to most inquiries about features and performance is usually "enough" or "more than enough". It's also not about the different software platforms and whether Google's is better than Apple's (which very few people have even used for more than a few minutes, if at all), because given the relative newness of the form factor and class of device it will look very different in a few years anyway. And for most interesting questions about the devices, the technical configurations just don't matter, regardless of what the producer's advertising is trying to tell you.

I’ll write about how a smartwatch can change your relationship to your phone, your interactions with people and how smartwatches cast a light on where tech might be in a few years. Not from a business, startup, crowdfunding perspective but as a few loosely connected thoughts on humans and their personal devices as sociotechnical systems.

I should also add a short disclaimer. I understand that me posting about jewelry and rather expensive devices could be read as classist. I live a very privileged life with – at least at the moment – enough income to fund frivolities such as a way-more-expensive-than-necessary smartwatch, a smartphone etc., but I do believe that my experience with these devices can help in understanding and modeling different uses of technology, their social context and their meaning. These devices will grow cheaper, including more and more people (at least in certain parts of the world). But I am aware of the slightly messed-up position I write this from.

Let’s dive in (finally!).

So. What is a smartwatch, really?

Wearables are a very hot topic right now2. Many wearables are … well, let's not say stupid, let's call them very narrowly focused. Step counters are very popular these days, for example, turning your daily movement into a few numbers to store and compare and optimize. Some of these things try to measure heart rates and similar low-hanging fruit as well.

On the other end of the spectrum we have our smartphones and tablet computers, or maybe even laptops, which we carry around each day. Many might not consider their phone a wearable because it resides in your pocket or satchel, but in the end it is more than just some object you schlep around all day – for many if not most of us it is an integral piece of our mental exoskeleton. Just ask people whose phone needs a repair longer than a few hours.

Smartwatches are somewhere between those two extremes. Many modern examples of this class of devices include a few of the sensors typically associated with dumb wearables: a heart rate monitor or a pedometer (fancy talk for step counter), for example. But smartwatches can do more: they can install apps and provide features that make them feel very capable … unless you forget your phone.

Because in the end a smartwatch is just a different view into your smartphone. A smaller screen attached to a somewhat more convenient location on your body. Sure, there are great apps for smartwatches. I got one that gives my hand movements Jedi force powers by turning certain movements into configurable actions. Another app is just a very simple recorder for audio memos. There are calculators, dice-rolling apps and many more, but their usefulness is usually very limited. No, let's again say focused. And that is a good thing.

Without a phone connected, my watch falls back to one of its dumbest and surprisingly most useful features: It shows me the time and my agenda.

You can imagine the sort of look my wife gave me when I proclaimed this fundamental understanding of my new plaything. "So the killer feature of that smart device is showing the time?" she asked jokingly. But it's a little more complex. My smartwatch allows me to check certain things (time, agenda, certain notifications from apps I authorized) without picking up my phone, which can – and all too often does – pull your attention in like a black hole. You just wanted to check the time, but there's this notification and someone retweeted you and what's going on on Facebook … what was I supposed to do here?

I'm not a critic of our networked culture in general, not one of the neo-luddites trying to frame being "offline" as somehow the better or more fulfilling mode of life. But the simplicity that the small form factor and screen size of smartwatches enforce, the reduction to very simple interactions, can help you stay focused when you want to.

Most apps on my watch are mere extensions of the apps running on my phone. And that's actually precisely what I want, not the drawback it's sometimes made out to be. I get a message pushed to my wrist and can react to it with a few prepared response templates or by using voice recognition (with all the funny problems that come with it). But again: I can stay focused on whatever I am doing now (such as riding my bike in traffic) while still being able to tell the person I am meeting that I'll be there in 5 minutes. The app I use to record my running shows certain stats on my wrist, and I can switch to the next podcast or music track in my queue while keeping my attention on the road.

I don't know when I last printed a map screenshot or route description. When smartphones and navigation became widely available, there was no longer the need to reduce a place you were going to to a handful of predefined rails you created for yourself in order not to get lost. You drop out of the train or plane and the unknown city opens up to you like the inviting and fascinating place it probably is. You can just start walking, since you know that even if you don't speak the language perfectly you'll find your way. My smartwatch does that while allowing me to keep my phone in my pocket, making me look less like a tourist or target. When the little black thing on my arm vibrates, I check it to see where to turn; apart from that I just keep walking.

Sure, there will be apps coming that use these watches in more creative and useful ways. That thrive not in spite of but because of the tiny form factor. But that's mostly a bonus, and if it doesn't happen I'd be fine as well. Because the watch as a simplified, ultra-reduced, ultra-focused remote for my mobile digital brain is feature enough. Where digital watches used to give me an intangible feeling of control over time, the smart-ish watch actually helps me feel augmented by my devices in a way that doesn't try to capture as much of my attention as smartphones tend to do. The watch is not a smaller smartphone but your phone's little helper. The small and agile Robin to the somewhat clunky Batman in your pocket3.

Acceptance

Any new technology has to carve out its niche and fight for acceptance. And some don't, and die for a plethora of reasons (MiniDisc, I always liked you). There are many reasons why people, mostly "experts" of some sort, don't believe that smartwatches will gain any traction.

“You have to recharge them every night, my watch runs for weeks, months, years!” Yeah. And on Tuesday it’s darker than at night. Oh, we weren’t doing the whole wrong comparison thing? Damn. Just as people learned to charge their phones every night they’ll get used to throwing their watch on a charger at night. My watch gets charged wirelessly with a Qi standard charger that sets you back about 10 bucks. It’s a non-issue.

“But it doesn’t do a lot without a phone! It needs its own camera, internet connection, coffee maker and washing machine!” Nope. Simplicity and reduction is what makes that class of devices interesting and useful. I don’t need a half-assed smartphone on my arm when I have a good one in my pocket. I need something that helps me use my actual device better. Another device means all kinds of annoyances. Just think about synchronization of data.

I am in the lucky position not to have to deal with tech writers and pundits in all of the facets of my life. What I learned from interacting with non-techy people and the watch is actually not that surprising if you think about it: A smart watch is a lot less irritating and invasive than a smart phone.

There are friends where I know I can just look at my phone while we hang out and they’ll not consider it an insult or affront. They might enjoy the break from talking, might want to check a few things themselves or just relax for a second without having to entertain me or the others in the room. But not everybody feels that way (and why should they, it’s not like submerging yourself in the digital is the only right way to live). In those situations the look at the watch is a mostly established and accepted practice – unless you check your watch every minute.

Some tech people tend to ignore the social. They might try to press it into services and data but often seem to overlook any sort of information a social act transports apart from the obvious. In pre-digital worlds checking your watch every few minutes was sometimes considered rude or would be read as a signal to leave your hosts etc. But where the glance at the watch is merely the acknowledgement of the existence of time and a world outside of the current situation, getting out your smartphone puts the outside world into focus making the people you share a physical space with just a blur in the corner of your vision.

Of course it’s your right to check your phone whenever you want, just as people can be insulted or at least irritated by it. Because a smartwatch can serve as a proxy to your digital identity and network from within your physical location and context, it can help you communicate that you value the moment without feeling disconnected. Especially since neither being very digitally connected nor valuing physical meetings more highly is “better”, having this sort of reduced stub of the digital that close on you can serve as a good compromise for these situations.

A smartwatch is accepted because it is a watch. And we as a culture know watches. Sure, some very techy, clunky, funky looking devices break that “veil of the analogue” by screaming “I AM TECHNOLOGY, FEAR ME” through their design. But the more simple versions that avoid the plasticky look of Casio watches on LSD are often overlooked and not even perceived as technology (and therefore as an irritation or even danger) by people who are sceptical of technology. That’s the problem devices such as Google’s Glass project have, which also have very interesting and potentially beneficial use cases but look so undeniably alien that everyone expects a laser gun to appear. And that’s where smart watches embed themselves into existing social norms and practices: by looking like the past and not screaming FUTURE all too loudly.

Body Area Network and the Future

What does this mean for the Future(tm)? The ideas of the Body Area Network and the Personal Area Network already exist: We are more and more turning into digital cyborgs4, creating our own personal “cloud” and network of data and services along the axes of and around our physical bodies.

Right now Smartphones seem to be some sort of Hub we carry around. The little monolith containing our data, internet access and main mobile interface to our digital self. Other devices connect to the hub, exchange data and use the services it provides (such as Internet connectivity or a camera). But looking at things like Google’s project Ara a different idea emerges.

Ara is a modular smartphone platform that allows you to add, remove and change the hardware modules of your phone at runtime. While it’s mostly framed as a way for people to buy their devices in parts, upgrading them when their personal financial situation allows it, the modular approach also has different trajectories influencing how our BANs and PANs might look in a few years.

Changing a phone can be annoying and/or time consuming. The backup software might have failed or forgotten something valuable. Maybe an app isn’t available on the new system or the newer version is incompatible with the data structure the old version left in your backup. We suffer through it because many of us rely on our personal information hubs making us potentially more capable (or at least giving us the feeling of being that).

Understanding smart watches as reduced, minimal, simplified interfaces to our data, looking at wearables as very specific data gathering or displaying devices, it seems to make sense to centralize your data on one device that your other devices just connect to. These days we work around that issue with tools such as Dropbox and other similar cloud sync services trying to keep all our devices up to date and sometimes failing horribly. But what if every new device just integrates into your BAN/PAN, connects to your data store and either contributes to it or gives you a different view on it? In that world wearables could become even “dumber” while still appearing to the user very “smart” (and we know that to the user, the interface is the product).

The smartphones that we use are built with healthy people in mind, people with nimble fingers and good eyesight. Smart watches illustrate quite effectively that the idea of the one device for every situation has overstayed its welcome somewhat. That different social or even personal circumstances require or benefit from different styles and types of interfaces. Making it easier for people to find the right interfaces for their needs, for the situations they find themselves in, will be the challenge of the next few years. Watches might not always look like something we’d call a watch today. Maybe they’ll evolve into gloves, or just rings. Maybe the piercing some wear in their upper lip will contain an antenna to amplify the connectivity of the BAN/PAN.

Where Ara tries to make phones more modular, wearables – when done right – show that we can benefit a lot from modularizing the mobile access to our digital self. This will create new subtle but powerful signals: leaving certain types of interfaces at home or disabled on the table to communicate an ephemeral quality of a situation, or only using interfaces focused on the shared experience of the self and the other when being with another person, creating a new kind of intimacy.

Comedown

But right now it’s just a watch. With some extras. Useful extras, though. You wouldn’t believe how often the app projecting the video from my smartphone camera onto my wrist has been useful for finding something that has fallen behind the furniture. But none of them really, honestly legitimizes the price of the device.

But the price will fall and new wearables will pop up. If you have the opportunity, try them out for a while. Not by fiddling around on a tiny display playing around with flashy but ultimately useless apps but by integrating them into your day for a few weeks. Don’t believe any review written with less than a few weeks of actual use.

  1. Football in the meaning most of the world uses it. The one where you play by kicking a ball around into goals.
  2. It’s so hot that my bullshit-o-meter has reached a new peak while reading the term “wearable clothing” somewhere recently.
  3. that sounded dirtier than it was supposed to
  4. we have always been cyborgs, beings combined from biology and culture and technology so that isn’t actually surprising


Posts for Saturday, April 4, 2015

Paludis 2.4.0 Released

Paludis 2.4.0 has been released:

  • Bug fixes.
  • We now use Ruby 2.2, unless --with-ruby-version is specified.

Filed under: paludis releases

Posts for Sunday, March 29, 2015

A wise choice? Github as infrastructure

So more and more projects are using github as infrastructure. One of the biggest cases I’ve seen is the Go programming language, which allows you to specify “imports” directly hosted on code sharing sites like github and “go get” them all before compilation, but lots of other projects are adopting it too, like Vim’s Vundle plugin manager, which also allows fetching and updating of plugins directly from github. Also I wouldn’t be surprised if one or more other languages’ package managers from pip to npm do this too. I know it’s pretty easy and now cool to do this but…
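For example, fetching a GitHub-hosted Go dependency before compilation is nothing more than this (the repository path is just a placeholder):

go get github.com/someuser/somelib

and from that moment on your build and deploy depend on github.com being reachable.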

It isn’t actually infrastructure grade. And that is highlighted well in events like this week’s, when they are suffering continual outages from a massive DDoS attack that some news sources suspect might be nation-state based.

How much fun is your ops team having deploying your new service when half its dependencies are being pulled directly from github, which is unavailable? Bit of a strange blocker, hm?

Posts for Wednesday, March 25, 2015

HowTo: Permanently redirect a request with parameter consideration in Tengine/NginX

Well, this one gave me a super hard time. I looked everywhere and found nothing. There is a lot of misinformation.

As usual, the Nginx and Funtoo communities helped me. Thanks to:

  • MTecknology in #nginx @ Freenode
  • Tracerneo in #funtoo @ Freenode

So, how do we do this? Easy, we use a map:

    # get ready for long redirects
    map_hash_bucket_size 256;
    map_hash_max_size 4092;

    # create the map
    map $request_uri $newuri {
        default 0;

        /index.php?test=1 /yes;
        /index.php?test=2 https://google.com/;
    }

    server {
        listen *;
        server_name test.php.g02.org;
        root /srv/www/php/test/public;

        # permanent redirect
        if ($newuri) {
            return 301 $newuri;
        }


        index index.php index.html;
        autoindex on;

        include include.d/php.conf;

        access_log /var/log/tengine/php-access.log;
        error_log /var/log/tengine/php-error.log;
    }

So, basically, you want to use $request_uri in order to catch the URI with its parameters. I wasted all day figuring out why $uri didn’t have this. It turns out it discards the parameters… anyway.
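If you want to sanity-check the map above, a quick request against the vhost should show the redirect (host name and URI taken from the config; this assumes the name resolves to your server):

    curl -sI 'http://test.php.g02.org/index.php?test=2' | grep -i -E '^(HTTP|Location)'
    # expect something like: HTTP/1.1 301 Moved Permanently
    #                        Location: https://google.com/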

This one was a hard one to find. Please, share and improve!


Posts for Friday, March 6, 2015

avatar

Trying out Pelican, part one

One of the goals I’ve set myself to do this year (not as a new year resolution though, I *really* want to accomplish this ;-) is to move my blog from WordPress to a statically built website. And Pelican looks to be a good solution to do so. It’s based on Python, which is readily available and supported on Gentoo, and is quite readable. Also, it looks to be very active in development and support. And also: it supports taking data from an existing WordPress installation, so that none of the posts are lost (with some rounding error that’s inherent to such migrations of course).

Before getting Pelican ready (which is available through Gentoo btw) I also needed to install pandoc, and that became more troublesome than expected. While installing pandoc I got hit by its massive amount of dependencies towards dev-haskell/* packages, and many of those packages really failed to install. It does some internal dependency checking and fails, informing me to run haskell-updater. Sadly, multiple re-runs of said command did not resolve the issue. In fact, it wasn’t until I hit a forum post about the same issue that a first step to a working solution was found.

It turns out that the ~arch versions of the haskell packages work better. So I enabled dev-haskell/* in my package.accept_keywords file. And then I started updating the packages… which also failed. Then I ran haskell-updater multiple times, but that also failed. After a while, I had to run the following set of commands (in random order) just to get everything to build fine:

~# emerge -u $(qlist -IC dev-haskell) --keep-going
~# for n in $(qlist -IC dev-haskell); do emerge -u $n; done

It took quite some reruns, but it finally got through. I never thought I had this many Haskell-related packages installed on my system (89 packages here to be exact), as I never intended to do any Haskell development since I left the university. Still, I finally got pandoc to work. So, on to the migration of my WordPress site… I thought.
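For reference, the dev-haskell keyword unmask mentioned a few paragraphs up boils down to a single wildcard line in package.accept_keywords (the ~amd64 architecture shown is an assumption):

dev-haskell/* ~amd64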

This is a good time to ask for stabilization requests (I’ll look into it myself as well of course) but also to see if you can help out our arch testing teams to support the stabilization requests on Gentoo! We need you!

I started with the official docs on importing. Looks promising, but it didn’t turn out too well for me. Importing was okay, but then immediately building the site again resulted in issues about wrong arguments (file names being interpreted as an argument name or function when an underscore was used) and interpretation of code inside the posts. Then I found Jason Antman’s converting wordpress posts to pelican markdown post to inform me I had to try using markdown instead of restructured text. And lo and behold – that’s much better.
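For the record, the WordPress import with markdown output boils down to something like the following (flags from memory – double-check pelican-import --help; file and directory names are placeholders):

pelican-import --wpfile -m markdown -o content wordpress-export.xml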

The first builds look promising. Of all the posts that I made on WordPress, only one gives a build failure. The next thing to investigate is theming, as well as seeing how good the migration goes (the absence of errors doesn’t by itself mean the migration is successful, of course) so that I know how much manual labor I have to take into consideration when I finally switch (right now, I’m still running WordPress).

Posts for Tuesday, March 3, 2015

Rejected session proposals for republica #rp5

I submitted two talk proposals to this year’s re:publica conference, which both got rejected. If you have another conference where you think they might fit in, drop me an email.


The Ethics of (Not-)Sharing

Like this, share this, tweet this. Our web tech is focussed on making things explode: More clicks, more ads, more attention. But recent events have shown that sharing isn’t always good or even wanted. But how to decide when sharing is cool and when it isn’t? This talk will dive into those questions and explain why context matters and why “RT≠Endorsement” is bullshit. Sharing is a political act.
The digital economy, or at least big parts of it, have been characterized as the attention economy: People and companies exchanging goods and services for other people’s attention (usually to translate said attention into money through ads).

In an attention economy the act of sharing is the key to ongoing growth: You need more reach, more followers and likes for your product, so getting people to share is paramount in order to raise sales or subscriptions.

But given how the platforms for social interactions used by most people are built the same applies to people and their relationships and posts. Facebook, Google, Twitter, Tumblr, no matter what platform you turn to, the share button is god. Sharing means not just caring but is very often one of the most prominently placed functions on any given site.

And who wouldn’t? Sharing is nice, it’s what we teach our kids. Sharing as a method to spread culture, to give access to resources, to make someone’s voice heard? Who could have anything against sharing? Sharing has become so big and important, that the term itself has been used to whitewash a new kind of business model that is really not about sharing at all, the “Sharing Economy” of Uber and Airbnb.

In light of the recent terror videos of beheadings the question of when it’s OK to share something has come back into the public attention: Are we just doing the terrorists’ work by sharing their propaganda? When Apple’s iCloud was hacked and naked pictures of female celebrities were published, didn’t all people sharing them participate in the sexual assault that it was?

The answers to those very extreme situations might be simple for many. But in our digital lives we are often confronted with the question of whether sharing a certain piece of information is OK, is fair or right.

In this session I want to argue that sharing isn’t just about content but also about context. When sharing something we are not just taking on some responsibility of our social circle, the people around us, but we are also saying something by who we share on what topic at what time and so on. I’ll show some experiments or rules that people have published on their sharing and look at the consequences. The session will finish with the first draft of an ethics of (not)sharing. A set of rules governing what to share and what to leave alone.


In #Cryptowars the social contract is the first casualty

The cryptowars started again this year and the netizens seem to agree that regulation seems stupid: How can governments believe to regulate or ban math? In this talk we’ll examine this position and what it means for a democratic government. How can local laws be enforced in an interconnected, global world? Should they at all? Are the cypherpunks and cryptolibertarians right? This talk will argue the opposite from the basis of democratic legitimization and policy.
The year began with governments trying to regulate cryptography, the so-called cryptowars 2.0. Cameron in the UK, de Maizière in Germany and a bunch of people in the EU were looking into cryptography and how to get access to people’s communications in order to enforce the law. The Internet, hackers and cypherpunks at the forefront, wasn’t happy.

But apart from the big number of technical issues with legislation on cryptography there are bigger questions at hand. Questions regarding democracy and politics in a digital age. Questions that we as a digital community will have to start having good answers to soon.

We’ve enjoyed the exceptionalism of the Digital for many years now. Copyright law was mostly just an annoyance that we circumvented with VPNs and filesharing. We could buy drugs online that our local laws prohibited and no content was out of our reach, regardless of what our governments said. For some this was the definition of freedom.

Then came the politicians. Trying to regulate and enforce, breaking the Internet and being (in our opinion) stupid and clueless while doing so.

But while the Internet allows us (and corporations) to break or evade many laws, we have to face the fact that the laws given are part of our democratically legitimized social contract. That rules and their enforcement are traditionally the price we pay for a better, fairer society.

Do governments have the duty to fight back on cryptography? What kind of restrictions to our almost limitless freedom online should we accept? How can a local democracy work in a globalized digital world? Or is the Internet free of such chains? Are the cryptolibertarians right?

These and more questions I’ll address in this session. Europe as a young and still malleable system could be the prototype of a digital democracy of the future. Let’s talk about how that could and should work.


 


Conspiracies everywhere

So Google decided to stop making full disk encryption the default for Android for now (encryption is still easily available in the settings).
UPDATE: It was pointed out to me that Google’s current Nexus line devices (Nexus 6 and 9) do come with encrypted storage out of the box, it’s just not default for legacy devices making the EFF comment even more wrong.

It took about 7 centiseconds for the usual conspiracy nuts to crawl out of the woodwork. Here an example from the EFF:

“We know that there’s been significant government pressure, the Department of Justice has been bringing all the formal and informal pressure it can bear on Google to do exactly what they did today,” Nate Cardozo, a staff attorney at the Electronic Frontier Foundation, told me.

In the real world the situation is a lot simpler, a lot less convoluted: Android phones sadly often come with cheap flash storage and only a few devices use modern file systems. Full disk encryption on many Android devices (such as my own Nexus 5) is slow as molasses. So Google disabled the default to make phones running its operating system not look like old Pentium machines trying to run Windows 8.

It’s easy to see conspiracies everywhere. It’s also not great for your mental health.


Posts for Sunday, February 15, 2015

avatar

CIL and attributes

I keep on struggling to remember this, so let’s make a blog post out of it ;-)

When the SELinux policy is being built, recent userspace (2.4 and higher) will convert the policy into CIL language, and then build the binary policy. When the policy supports type attributes, these are of course also made available in the CIL code. For instance the admindomain attribute from the userdomain module:

...
(typeattribute admindomain)
(typeattribute userdomain)
(typeattribute unpriv_userdomain)
(typeattribute user_home_content_type)

Interfaces provided by the module are also applied. You won’t find the interface CIL code in /var/lib/selinux/mcs/active/modules though; the code at that location is already “expanded” and filled in. So for the sysadm_t domain we have:

# Equivalent of
# gen_require(`
#   attribute admindomain;
#   attribute userdomain;
# ')
# typeattribute sysadm_t admindomain;
# typeattribute sysadm_t userdomain;

(typeattributeset cil_gen_require admindomain)
(typeattributeset admindomain (sysadm_t ))
(typeattributeset cil_gen_require userdomain)
(typeattributeset userdomain (sysadm_t ))
...

However, when checking which domains use the admindomain attribute, notice the following:

~# seinfo -aadmindomain -x
ERROR: Provided attribute (admindomain) is not a valid attribute name.

But don’t panic – this has a reason: as long as there is no SELinux rule applied towards the admindomain attribute, then the SELinux policy compiler will drop the attribute from the final policy. This can be confirmed by adding a single, cosmetic rule, like so:

## allow admindomain admindomain:process sigchld;

~# seinfo -aadmindomain -x
   admindomain
      sysadm_t

So there you go. That does mean that anything that previously used the attribute assignment for decisions (like “for each domain assigned the userdomain attribute, do something”) will need to make sure that the attribute is really used in a policy rule.
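For completeness, such a cosmetic rule can also be shipped as a tiny CIL module of its own – a minimal sketch (module and file name are made up):

~# cat > keep_admindomain.cil <<'EOF'
; cosmetic rule so the compiler keeps the admindomain attribute in the binary policy
(allow admindomain admindomain (process (sigchld)))
EOF
~# semodule -i keep_admindomain.cil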

Posts for Saturday, February 14, 2015

I ♥ Free Software 2015

“Romeo, oh, Romeo!” exclaims the 3D-printed robot Juliet to her 3D-printed Romeo.

It is that time of the year again – the day we display our affection to our significant other …and the Free Software we like best.

Usually I sing praise to the underdogs that I use, the projects rarely anyone knows about, small odd things that make my everyday life nicer.

This year though I will point out some communities, that I am (more or less) active in, that impressed me the most in the past year.

  • KDE – this desktop needs no introduction and neither should its community. But ever so often we have to praise things that we take for granted. KDE is one of the largest and nicest FS communities I have ever come across. After meeting a few known faces and some new ones at FOSDEM, I am very much looking forward to going to Akademy again this year!
  • Mageia – as far as GNU/Linux distros go, many would benefit by taking Mageia as a good example on how to include your community and how to develop your infrastructure to be inclusive towards newcomers.
  • Mer, Nemo Mobile – note: Jolla is a company (and commercial product with some proprietary bits), most of its Sailfish OS’s infrastructure is FS and Jolla tries very hard to co-operate with its community and as a rule develops in upstream. This is also the reason why the communities of the mentioned projects are very intertwined. The co-operation in this wider community is very active and while not there yet, Mer and Nemo Mobile (with Glacier UI coming soon) are making me very optimistic that a modern Free Software mobile OS is just around the corner.
  • Last, but not least, I must mention three1 communities that are not FS projects by themselves, but were instrumental in educating me (and many others) about FS and digital freedoms in general – Thank you, LUGOS for introducing me to FS way back in the ’90s and all the help in those early days! Thank you, Cyberpipe for all the things I learnt in your hackerspace! And thank you, FSFE for being the beacon of light for Free Software throughout Europe (and beyond)!

hook out → closing my laptop and running back to my lovely Andreja, whom I thank for bearing with me


  1. Historically Cyberpipe was founded as part of Zavod K6/4, but in 2013 Cyberpipe merged with one of its founders – LUGOS – thus merging for good the two communities that were already intertwined before.

Posts for Sunday, February 8, 2015

avatar

Have dhcpcd wait before backgrounding

Many of my systems use DHCP for obtaining IP addresses. Even though they all receive a static IP address, it allows me to have them moved over (migrations), use TFTP boot, cloning (in case of quick testing), etc. But one of the things that was making my efforts somewhat more difficult was that the dhcpcd service continued (the dhcpcd daemon immediately went into the background) even though no IP address was received yet. Subsequent service scripts that required a working network connection then failed to start.

The solution is to configure dhcpcd to wait for an IP address. This is done through the -w option, or the waitip instruction in the dhcpcd.conf file. With that in place, the service script now waits until an IP address is assigned.
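A minimal sketch of the dhcpcd.conf variant (equivalent to passing -w on the command line):

# /etc/dhcpcd.conf
waitip    # do not fork into the background until an address has been obtained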

Posts for Saturday, February 7, 2015

I’d like my kernel vanilla, please

Yep, vanilla is the flavor of the kernel for me. I like using vanilla in #funtoo. It is nice and it is simple. No patches. No security watch-cha-ma-call-it or anything like that. Just me and that good ‘ol penguin; which deals with my hardware, networking and you-name-it systems.

I like tailoring my kernel to my needs. Ran the glorious:

make localmodconfig

With all my stuff plugged in and turned on. Also, I took the time to browse the interesting parts of my kernel, checking out the help and all to see if I want those features or not. Especially on my networking section!

Anyway, that hard work is only done a few times (yep, I missed a lot of things the first time). It is fun and, after a while, you end up with a slim kernel that works fine for you.

All this said, I just wanna say: thank you, bitches! To the genkernel-next team. They’re doing great work while enabling me to use btrfs and virtio on my kernel by simplifying the insertion of these modules into my initrd. All I do when I get a kernel src upgrade is:

genkernel --virtio --btrfs --busybox --oldconfig --menuconfig --kernel-config=/etc/kernels/kernel-config-x86_64-3.18.<revision -minus-1> all
boot-update

or, what I just did to install 3.18.6:

genkernel --virtio --btrfs --busybox --oldconfig --menuconfig --kernel-config=/etc/kernels/kernel-config-x86_64-3.18.5 all
boot-update

Funtoo stores my kernel configs in /etc/kernels. This is convenient and genkernel helps me re-build my kernel, taking care of the old configuration and giving me the menuconfig to decide if I wanna tweak it some more or not.

Quite honestly, I don’t think --oldconfig is doing much here. It doesn’t ever ask me what I wanna do with the new stuff. It is supposed to have sane defaults. Maybe I am missing something. If anybody wants to clarify this, I am all eyes.
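For what it’s worth, what --oldconfig roughly corresponds to can be reproduced by hand against the stored config (a sketch; paths taken from above):

cd /usr/src/linux
cp /etc/kernels/kernel-config-x86_64-3.18.5 .config
make oldconfig       # prompts for symbols that are new in this kernel version
# make olddefconfig  # same, but silently accepts the defaults for new symbols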

Oh well, I hope you got an idea of how to maintain your own vanilla kernel config with genkernel-next and Funtoo.

Posts for Friday, January 30, 2015

avatar

Things I should’ve done earlier.

On Linux, there are things that you know are better but you don’t switch because you’re comfortable where you are. Here’s a list of the things I’ve changed the past year that I really should’ve done earlier.

  • screen -> tmux
  • apache -> nginx
  • dropbox -> owncloud
  • bash -> zsh
  • bootstrapping vim-spf -> my own tailored and clean dotfiles
  • phing -> make
  • sahi -> selenium
  • ! mpd -> mpd (oh why did I ever leave you)
  • ! mutt -> mutt (everything else is severely broken)
  • a lot of virtualbox instances -> crossbrowsertesting.com (much less hassle, with support for selenium too!)

… would be interested to know what else I could be missing out on! :)


Posts for Thursday, January 29, 2015

Cryptography and the Black or White Fallacy

Cryptography is the topic du jour in many areas of the Internet. Not the analysis of algorithms or the ongoing quest to find some reasonably strong kind of crypto that people without a degree in computer science and black magic are able and willing to use, but in the form of the hashtag #cryptowars.

The first crypto wars were fought when the government tried to outlaw certain encryption technologies, or at least implementations thereof with a certain strength. Hackers and coders found ways to circumvent the regulation and got the technology out of the US and into the hands of the open source community. Since those days cryptography has been widely adopted to secure websites, business transactions and – for about 7 people on this planet and Harvey the invisible bunny – emails.

But there is a storm coming:

Governments are publicly wondering whether to ask platform providers to keep encryption keys around so that the police can access certain communication given proper authorization (that idea is usually called key escrow). Now obviously that is not something everyone will like or support. And that’s cool, we call it democracy. It’s people debating, presenting ideas, evaluating options and finally coming up with a democratically legitimized consensus or at least a resolution.

There are very good arguments for that kind of potential access (for example enforcement of the social contract/law, consistency with the application of norms in the physical world) as well as against it (for example the right to communicate without interference or the technical difficulty and danger of a key escrow system). For the proponents of such a regulation the argument is simple: Security, Anti-terror, Protection. Bob’s your uncle. For the opposition it’s harder.

I read many texts in the last few days about how key escrow would “ban encryption”. Which we can just discard as somewhat dishonest given the way the proposed legislation is roughly described. The other train of thought seems to be that key escrow would “break” encryption. And I also find that argument somewhat strange.

If you are a purist, the argument is true: If encryption has to perfectly protect something against everyone, key escrow would “break” it. But I wonder what kind of hardware these purists run their encryption on, what kind of operating systems. How could anyone ever be sure that the processors and millions of lines of code making up the software that we use to run our computers can be trusted? How easy would it be for Intel or AMD or whatever chip manufacturer you can think of to implement backdoors? And we know how buggy operating systems are. Even if we consider them to be written in the best of faith.

Encryption that has left the wonderful and perfect world of theory and pure algorithms is always about pragmatism. Key lengths for example are always a trade-off between the performance penalty they cause and the security they provide given the technology of the day. In a few years computers will have gotten faster, which would make your keys short enough to be broken – but since your computer has gotten faster as well, you can use longer keys and maybe even more complex encryption algorithms.

So why, if deploying encryption is always about compromise, is key escrow automatically considered to “break” all encryption? Why wouldn’t people trust the web anymore? Why would they suddenly be the target of criminals and theft as some disciples of the church of crypto are preaching?

In most cases not the whole world is your enemy. At least I hope so, for your sake. Every situation, every facet of life has different threat models. How do threat models work? When I ride my bike to work I could fall due to a bad road, ice, some driver could hit me with their car. I address those threats in the way I drive or prepare: I always have my bike’s light on to be seen, I avoid certain roads and I keep an eye on the car traffic around me. I don’t consider the dangers of a whale falling down on me, aliens abducting me or the CIA trying to kill me. Some people might (and might have to given they annoyed the CIA or aliens), but for me, those are no threats I spend any mental capacities on.

My laptop’s harddrive is encrypted. The reason is not that it would protect its data against the CIA/NSA/AlienSecurityAgency. Because they’d just lock me up till I give them the key. Or punch me till I do. Or make me listen to Nickelback. No, I encrypt my drive so that in case my laptop gets stolen the thief might have gotten decent hardware but no access to my accounts and certain pieces of information. Actually, in my personal digital threat modeling, governments really didn’t influence my decision much.

In many cases we use encryption not to hide anything from the government. HTTPS makes sense for online stores not because the government could see what I buy (because given reasonable ground for suspicion they could get a court order and check my mail before I get it which no encryption helps against) but because sending around your credit card data in the clear is not a great idea(tm) if you want to be the only person using that credit card to buy stuff.

There are reasonable situations where encryption is used as a defense against governments and their agencies. But in those cases it’s some form of open source end-to-end cryptography anyways, something you cannot outlaw (as the crypto wars of old have proven). On the other hand, in many situations encryption is mostly used to protect us from certain asshats who would love to change our Facebook profile picture to a penis or a frog or a frog’s penis1 or who’d like us to pay for their new laptop and Xbox. And they wouldn’t get access to any reasonably secure implementation of key escrow.

The idea that any “impurity”, any interference with cryptography, “breaks” it is a typical black or white fallacy. Two options are presented for people to choose from: A) Cryptography deployed perfectly as it is in its ideal form and B) Cryptography is “broken”. But we know from our everyday life that that is – excuse my language – bullshit. Because every form of encryption we use is a compromise in some way, shape or form.

I have to extend trust to the makers of my hardware and software, to the people who might have physical access to my laptop at some point and to the fact that nobody sneaks into my home at night to install weird keyloggers on my machine. All that trust I extend does not “break” the encryption on my harddrive. You could argue that it weakens it against certain adversaries (for example a potentially evil Intel having a backdoor in my machine) but for my personal threat model those aspects are mostly irrelevant or without options. I don’t have the option to completely build my own computer and all the required software on it. Because I’ve got shit to do, pictures of monkeys to look at etc.

Personally I haven’t fully come to a conclusion on whether key escrow is a reasonable, good way to deal with the problem of enforcement of certain laws. And if it is, which situations it should apply to, and who that burden should be placed on. But one thing is obvious: All those articles about the “death of crypto” or the “destruction of crypto” or the “war against crypto” seem to be blown massively out of proportion, forfeiting the chance to make the case for certain liberties or against certain regulation, in a style of communication reminding me of right-wing politicians using terrorist attacks to legitimize massive violations of human rights. Which is ironically exactly the kind of argument that those writing all these “crypto is under fire!!11″ articles usually complain about.

Photo by Origami48616

  1. I don’t know if frogs have penises


avatar

Practical fault detection & alerting. You don't need to be a data scientist


As we try to retain visibility into our increasingly complicated applications and infrastructure, we're building out more advanced monitoring systems. Specifically, a lot of work is being done on alerting via fault and anomaly detection. This post covers some common notions around these new approaches, debunks some of the myths that ask for over-complicated solutions, and provides some practical pointers that any programmer or sysadmin can implement that don't require becoming a data scientist.

It's not all about math

I've seen smart people who are good programmers decide to tackle anomaly detection on their timeseries metrics. (anomaly detection is about building algorithms which spot "unusual" values in data, via statistical frameworks). This is a good reason to brush up on statistics, so you can apply some of those concepts. But ironically, in doing so, they often seem to think that they are now only allowed to implement algebraic mathematical formulas. No more if/else, only standard deviations of numbers. No more for loops, only moving averages. And so on.
When going from thresholds to something (anything) more advanced, suddenly people only want to work with mathematical formulas. Meanwhile we have entire Turing-complete programming languages available, which allow us to execute any logic, as simple or as rich as we can imagine. Using only math massively reduces our options in implementing an algorithm.

For example I've seen several presentations in which authors demonstrate how they try to fine-tune moving average algorithms to get a robust base signal to check against, but one which is also not affected too much by previous outliers (which raise the moving average and might mask subsequent spikes).
(figure from A Deep Dive into Monitoring with Skyline)

But you can't optimize both, because a mathematical formula at any given point can't make the distinction between past data that represents "good times" versus "faulty times".
However: we wrap the output of any such algorithm with some code that decides what is a fault (or "anomaly" as labeled here) and alerts against it, so why would we hold ourselves back in feeding this useful information back into the algorithm?
I.e. assist the math with logic by writing some code to make it work better for us: In this example, we could modify the code to just retain the old moving average from before the time-frame we consider to be faulty. That way, when the anomaly passes, we resume "where we left off". For timeseries that exhibit seasonality and a trend, we need to do a bit more, but the idea stays the same. Restricting ourselves to only math and statistics cripples our ability to detect actual faults (problems).

Another example: During his Monitorama talk, Noah Kantrowitz made the interesting and thought provoking observation that Nagios flap detection is basically a low-pass filter. A few people suggested re-implementing flap detection as a low-pass filter. This seems backwards to me because reducing the problem to a pure mathematical formula loses information. The current code has the high-resolution view of above/below threshold and can visualize as such. Why throw that away and limit your visibility?

Unsupervised machine learning... let's not get ahead of ourselves.

Etsy's Kale has ambitious goals: you configure a set of algorithms, and those algorithms get applied to all of your timeseries. Out of that should come insights into what's going wrong. The premise is that the found anomalies are relevant and indicative of faults that require our attention.
I have quite a variety amongst my metrics. For example diskspace metrics exhibit a sawtooth pattern (due to constant growth and periodic cleanup), crontabs cause (by definition) periodic spikes in activity, user activity causes a fairly smooth graph which is characterized by its daily pattern and often some seasonality and a long-term trend.



Because they look different, the anomalies and faults in them look different too. In fact, within each category there are multiple problematic scenarios. (e.g. user activity based timeseries should not suddenly drop, but also not be significantly lower than other days, even if the signal stays smooth and follows the daily rhythm)

I have a hard time believing that running the same algorithms on all of that data, and doing minimal configuration on them, will produce meaningful results. At least I expect a very low signal/noise ratio. Unfortunately, of the people who I've asked about their experiences with Kale/Skyline, the only cases where it's been useful are where Skyline's input has been restricted to a certain category of metrics - it's up to you to do this filtering (perhaps via carbon-relay rules), potentially running multiple Skyline instances - and sufficient time has been spent hand-selecting the appropriate algorithms to match the data. This reduces the utility.
"Minimal configuration" sounds great but this doesn't seem to work.
Instead, something like Bosun (see further down) where you can visualize your series, experiment with algorithms and see the results in place on current and historical data, to manage alerting rules seems more practical.

Some companies (all proprietary) take it a step further and pay tens of engineers to work on algorithms that inspect all of your series, classify them into categories, "learn" them and automatically configure algorithms that will do anomaly detection, so it can alert anytime something looks unusual (though not necessarily faulty). This probably works fairly well, but has a high cost, still can't know everything there is to know about your timeseries, is of no help if your timeseries is behaving faultily from the start, and still alerts on anomalous, but irrelevant, outliers.

I'm suggesting we don't need to make it that fancy and we can do much better by injecting some domain knowledge into our monitoring system:
  • using minimal work of classifying metrics via metric metadata or rules that parse metric IDs, we can automatically infer knowledge of how the series is supposed to behave (e.g. assume that disk_mb_used looks like sawtooth, frontend_requests_per_s daily seasonal, etc) and apply fitting processing accordingly (see the toy sketch right after this list).
    Any sysadmin or programmer can do this; it's a bit of work but should make a hands-off automatic system such as Kale more accurate.
    Of course, adopting metrics 2.0 will help with this as well. Another problem with machine learning is that it would have to infer how metrics relate to each other, whereas with metric metadata this can easily be derived (e.g.: what are the metrics for different machines in the same cluster, etc)
  • hooking into service/configuration management: you probably already have a service, tool, or file that knows what your infrastructure looks like and which services run where. We know where user-facing apps run, where crontabs run, where we store log files, where and when we run cleanup jobs. We know in what ratios traffic is balanced across which nodes, and so on. Alerting systems can leverage this information to apply better suited fault detection rules. And you don't need a large machine learning infrastructure for it. (as an aside: I have a lot more ideas on cloud-monitoring integration)
  • Many scientists are working on algorithms that find cause and effect when different series exhibit anomalies, so they can send more useful alerts. But again here, a simple model of the infrastructure gives you service dependencies in a much easier way.
  • hook into your event tracking. If you have something like anthracite that lists upcoming press releases, then your monitoring system knows not to alert if suddenly traffic is a bit higher. In fact, you might want to alert if your announcement did not create a sudden increase in traffic. If you have a large scale infrastructure, you might go as far as tagging upcoming maintenance windows with metadata so the monitoring knows which services or hosts will be affected (and which shouldn't).
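As a toy illustration of the first bullet – rules that parse metric IDs to guess the expected shape – even a dumb shell function goes a long way (metric names and categories are made up):

classify_metric() {
    case "$1" in
        *disk*used*)      echo sawtooth ;;        # steady growth, drops at cleanup
        *requests_per_s*) echo daily_seasonal ;;  # follows user activity
        *cron*|*backup*)  echo periodic_spike ;;  # fires on a schedule
        *)                echo unknown ;;
    esac
}
classify_metric "servers.web1.disk_mb_used"    # -> sawtooth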

Anomaly detection is useful if you don't know what you're looking for, or providing an extra watching eye on your log data. Which is why it's commonly used for detecting fraud in security logs and such. For operational metrics of which admins know what they mean, should and should not look like, and how they relate to each other, we can build more simple and more effective solutions.

The trap of complex event processing... no need to abandon familiar tools

On your quest into better alerting, you soon read and hear about real-time stream processing, and CEP (complex event processing) systems. It's not hard to be convinced on their merits: who wouldn't want real-time as-soon-as-the-data-arrives-you-can-execute-logic-and-fire-alerts?
They also come with a fairly extensive and flexible language that lets you program or compose monitoring rules using your domain knowledge. I believe I've heard of storm for monitoring, but Riemann is the best known of these tools that focus on open source monitoring. It is a nice, powerful tool and probably the easiest of the CEP tools to adopt. It can also produce very useful dashboards. However, these tools come with their own API or language, and programming against real-time streams is quite a paradigm shift which can be hard to justify. And while their architecture and domain specificity works well for large scale situations, these benefits aren't worth it for most (medium and small) shops I know: it's a lot easier (albeit less efficient) to just query a datastore over and over and program in the language you're used to. With a decent timeseries store (or one written to hold the most recent data in memory such as carbonmem) this is not an issue, and the difference in timeliness of alerts becomes negligible!

An example: finding spikes

Like many places, we were stuck with static thresholds, which don't cover some common failure scenarios. So I started asking myself some questions:

which behavioral categories of timeseries do we have, what kind of issues can arise in each category,
what does that look like in the data, and what's the simplest way I can detect each scenario?

Our most important data falls within the user-driven category from above where various timeseries from across the stack are driven by, and reflect user activity. And within this category, the most common problem (at least in my experience) is spikes in the data. I.e. a sudden drop in requests/s or a sudden spike in response time. As it turned out, this is much easier to detect than one might think:

In this example I just track the standard deviation of a moving window of 10 points. Standard deviation is simply a measure of how much numerical values differ from each other. Any sudden spike bumps the standard deviation. We can then simply set a threshold on the deviation. Fairly trivial to set up, but has been highly effective for us.

You do need to manually declare what is an acceptable standard deviation value to be compared against, which you will typically deduce from previous data. This can be annoying until you build an interface to speed up, or a tool to automate this step.
In fact, it's trivial to collect previous deviation data (e.g. from the same time of the day, yesterday, or the same time of the week, last week) and automatically use that to guide a threshold. (Bosun - see the following section - has "band" and "graphiteBand" functions to assist with this). This is susceptible to previous outliers, but you can easily select multiple previous timeframes to minimize this issue in practice.
it-telemetry thread

So without requiring fancy anomaly detection, machine learning, advanced math, or event processing, we are able to reliably detect faults using simple, familiar tools. Some basic statistical concepts (standard deviation, moving average, etc) are a must, but nothing that requires getting a PhD. In this case I've been using Graphite's stdev function and Graph-Explorer's alerting feature to manage these kinds of rules, but it doesn't allow for a very practical iterative workflow, so the non-trivial rules will be going into Bosun.
BTW, you can also use a script to query Graphite from a Nagios check and do your logic
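To make that concrete, here is a minimal sketch of such a check: pull the moving standard deviation from Graphite's render API and compare it against a hand-picked threshold (host, metric name and threshold are placeholders):

#!/bin/sh
# fetch the last 15 minutes of stdev(series,10) in Graphite's raw format
OUT=$(curl -s "http://graphite.example.com/render?target=stdev(app.web.requests_per_s,10)&from=-15min&rawData=true")
# raw format is "name,start,end,step|v1,v2,...": grab the last numeric value
LAST=$(echo "$OUT" | tr ',|' '\n' | grep -E '^[0-9.]+$' | tail -n 1)
if [ -n "$LAST" ] && [ "$(echo "$LAST > 50" | bc -l)" = "1" ]; then
    echo "CRITICAL: moving stdev is $LAST"; exit 2
fi
echo "OK: moving stdev is ${LAST:-unknown}"; exit 0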


Workflow is key. A closer look at bosun

One of the reasons we've been chasing self-learning algorithms is that we have lost faith in the feasibility of a more direct approach. We can no longer imagine building and maintaining alerting rules because we have no system that provides powerful alerting, helps us keep oversight and streamlines the process of maintaining and iteratively developing alerting.
I recently discovered bosun, an alerting frontend ("IDE") by Stack Exchange, presented at Lisa14. I highly recommend watching the video. They have identified various issues that made alerting a pain, and built a solution that makes human-controlled alerting much more doable. We've been using it for a month now with good results (I also gave it support for Graphite). I'll explain its merits, and it'll also become apparent how this ties into some of the things I brought up above:
  • in each rule you can query any data you need from any of your datasources (currently graphite, openTSDB, and elasticsearch). You can call various functions, boolean logic, and math. Although it doesn't expose you a full programming language, the bosun language as it stands is fairly complete, and can be extended to cover new needs. You choose your own alerting granularity (it can automatically instantiate alerts for every host/service/$your_dimension/... it finds within your metrics, but you can also trivially aggregate across dimensions, or both). This makes it easy to create advanced alerts that cover a lot of ground, making sure you don't get overloaded by multiple smaller alerts. And you can incorporate data of other entities within the system, to simply make better alerting decisions.
  • you can define your own templates for alert emails, which can contain any html code. You can trivially plot graphs, build tables, use colors and so on. Clear, context-rich alerts which contain all information you need!
  • As alluded to, the bosun authors spent a lot of time thinking about, and solving the workflow of alerting. As you work on advanced fault detection and alerting rules you need to be able to see the value of all data (including intermediate computations) and visualize it. Over time, you will iteratively adjust the rules to become better and more precise. Bosun supports all of this. You can execute your rules on historical data and see exactly how the rule performs over time, by displaying the status in a timeline view and providing intermediate values. And finally, you can see how the alert emails will be rendered as you work on the rule and the templates
The examples section gives you an idea of the things you can do.
I haven't seen anything solve a pragmatic alerting workflow like bosun (hence their name "alerting IDE"), and its ability to not hold you back as you work on your alerts is refreshing. Furthermore, the built-in processing functions are very complementary to Graphite: Graphite has a decent API which works well at aggregating and transforming one or more series into one new series; the bosun language is great at reducing series to single numbers, providing boolean logic, and so on, which you need to declare alerting expressions. This makes them a great combination.
Of course Bosun isn't perfect either. Plenty of things can be done to make it (and alerting in general) better. But it does exemplify many of my points, and it's a nice leap forward in our monitoring toolkit.

Conclusion

Many of us aren't ready for some of the new technologies, and some of the technology isn't - and perhaps never will be - ready for us. As an end-user investigating your options, it's easy to get lured in a direction that promotes over-complication and stimulates your inner scientist but just isn't realistic.
Taking a step back, it becomes apparent we can setup automated fault detection. But instead of using machine learning, use metadata, instead of trying to come up with all-encompassing holy grail of math, use several rules of code that you prototype and iterate over, then reuse for similar cases. Instead of requiring a paradigm shift, use a language you're familiar with. Especially by polishing up the workflow, we can make many "manual" tasks much easier and quicker. I believe we can keep polishing up the workflow, distilling common patterns into macros or functions that can be reused, leveraging metric metadata and other sources of truth to configure fault detection, and perhaps even introducing "metrics coverage", akin to "code coverage": verify how much, and which of the metrics are adequately represented in alerting rules, so we can easily spot which metrics have yet to be included in alerting rules. I think there's a lot of things we can do to make fault detection work better for us, but we have to look in the right direction.

PS: leveraging metrics 2.0 for anomaly detection

In my last metrics 2.0 talk, at LISA14 I explored a few ideas on leveraging metrics 2.0 metadata for alerting and fault detection, such as automatically discovering error metrics across the stack, getting high level insights via tags, correlation, etc. If you're interested, it's in the video from 24:55 until 29:40

Posts for Tuesday, January 27, 2015

StrongSwan VPN (and ufw)

I make ample use of SSH tunnels. They are easy, which is the primary reason. But sometimes you need something a little more powerful – like for a phone, so all your traffic can’t be snooped out of the air around you, or so that all your traffic, not just SOCKS-proxy-aware apps, can be sent over it. For that reason I decided to delve into VPN software over the weekend. After a pretty rushed survey I ended up going with StrongSwan. OpenVPN brings back nothing but memories of complexity and OpenSwan seemed a bit abandoned, so I had to pick one of its descendants, and StrongSwan seemed a bit more popular than LibreSwan. Unscientific and rushed, like I said.

So there are several scripts floating around that will just auto set it up for you, but where’s the fun (and the understanding that allows tweaking) in that. So I found two guides and smashed them together to give me what I wanted:

strongSwan 5: How to create your own private VPN is the much more comprehensive one, but also set up a cert style login system. I wanted passwords initially.

strongSwan 5 based IPSec VPN, Ubuntu 14.04 LTS and PSK/XAUTH has a few more details on a password based setup.

Additional notes: I pretty much ended up doing the first one straight through, except for creating client certs. Also, the XAUTH / IKEv1 setup of the password tutorial seems incompatible with the Android StrongSwan client, so I used EAP / IKEv2, pretty much straight out of the first one. Also it seems like you still need to install the CA cert and vpnHost cert on the phone, unless I was missing something.

Also, as an aside, and a curve ball to make things more difficult, this was done on a new server I am playing with. Ever since I’d played with OpenBSD’s pf, I’ve been ruined for iptables. It’s just not as nice. So I’d been hearing about ufw from the Ubuntu community for a while and was curious if it was nicer and better. I figured after several years maybe it was mature enough to use on a server. I think maybe I misunderstood its point. Uncomplicated maybe meant not-featureful. Sure, for unblocking ports for an app it’s cute and fast, and even for straight unblocking a port its syntax is a bit clearer I guess? But as I delved into it I realized I might have made a mistake. It’s built on top of the same system iptables uses, but creates all new tables, so iptables isn’t really compatible with it. The real problem however is that the ufw command has no way to set up NAT masquerading. None. The interface cannot do that. Whoops. There is a hacky workaround I found at OpenVPN – forward all client traffic through tunnel using UFW, which involves editing config files in pretty much iptables-style code. Not uncomplicated or easier or less messy like I’d been hoping for.
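For reference, that workaround amounts to hand-writing a nat table at the top of /etc/ufw/before.rules in plain iptables-restore syntax, plus setting DEFAULT_FORWARD_POLICY="ACCEPT" in /etc/default/ufw. A sketch (the VPN client subnet and outgoing interface are assumptions):

# /etc/ufw/before.rules -- add above the existing *filter section
*nat
:POSTROUTING ACCEPT [0:0]
# masquerade traffic coming from the VPN client pool
-A POSTROUTING -s 10.10.10.0/24 -o eth0 -j MASQUERADE
COMMIT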

So I'm a little unimpressed with ufw (but I learned a bunch about it, so that's good and I guess what I was going for), and I had to add "remove ufw and replace it with iptables on that server" to my todo list. But after a Sunday's messing around I was able to get my phone to work over the VPN to my server and on to the internet. So, a productive time.

Posts for Wednesday, January 21, 2015


Old Gentoo system? Not a problem…

If you have a very old Gentoo system that you want to upgrade, you might have some issues with software that is too old and a Portage that can't just upgrade to a recent state. Although many methods exist to work around it, one that I have found to be very useful is to have access to old Portage snapshots. It often allows the administrator to upgrade the system in stages (say, in 6-month blocks), perhaps not the entire world but at least the system set.

Finding old snapshots might be difficult though, so at one point I decided to create a list of old snapshots, two months apart, together with their GPG signatures (so people can verify that the snapshots were not tampered with by me in an attempt to create a Gentoo botnet). I haven't needed it myself in a while, but I still try to update the list every two months, which I just did with the snapshot of January 20th this year.
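In case it isn't obvious how to use such a snapshot, the process is roughly as follows; the URL and the date below are placeholders for illustration, not the actual location of the list:

# hypothetical example -- URL and date are placeholders
wget https://example.org/snapshots/portage-20090620.tar.bz2
wget https://example.org/snapshots/portage-20090620.tar.bz2.gpgsig
gpg --verify portage-20090620.tar.bz2.gpgsig portage-20090620.tar.bz2
mv /usr/portage /usr/portage.old            # keep the current tree around, just in case
tar -xjf portage-20090620.tar.bz2 -C /usr/  # unpacks into /usr/portage
emerge --update --deep system               # upgrade the system set for this stage

Repeat with a newer snapshot once the system set is consistent again, until the system is recent enough to switch back to a regular emerge --sync.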

I hope it at least helps a few other admins out there.

Posts for Wednesday, January 14, 2015

Digital dualism, libertarians and the law – cypherpunks against Cameron edition

The sociologist Nathan Jurgenson coined the term “digital dualism” in 2011. Digital dualism is the idea that the digital sphere is something separate from the physical sphere, that those two “spaces” are distinct and have very different rulesets and properties, different “natural laws”.

Jurgenson defined this term in light of an avalanche of articles explaining the emptiness and non-realness of digital experiences. Articles celebrating the "Offline" as the truer, realer and – yes – better space. But the mirror image to those offline-enthusiasts also exists. Digital dualism permeates the Internet positivists probably as much as it does most Internet sceptics. Take one of the fundamental, central documents that so much of the ideology of leading digital activists and organisations can be traced back to: The Declaration of the Independence of Cyberspace. Digital dualism is at the core of that eloquent piece of writing propping up "cyberspace" as the new utopia, the (quote) "new home of Mind".

I had to think of that digital dualism fallacy, as Jurgenson calls it, when Great Britain's Prime Minister David Cameron's position on digital communication went public. Actually – I started to think about it when the reactions to Mr. Cameron's plans emerged.

BoingBoing's Cory Doctorow immediately warned that Cameron's proposal would "endanger every Briton and destroy the IT industry"; the British Guardian summarized that Cameron wanted to "ban encryption", a statement repeated by security guru Bruce Schneier. So what did Mr. Cameron propose?

In a public speech, about 4 minutes long, Cameron argued that in the light of terrorist attacks such as the recent attacks in Paris, the British government needed to implement steps to make it harder for terrorists to communicate without police forces listening in. The quote most news agencies went with was:

In our country, do we want to allow a means of communication between people which […] we cannot read?

Sounds grave and … well … evil. A big brother style government peeking into even the most private conversations of its citizens.

But the part left out (as indicated by the […]) adds some nuance. Cameron actually says (go to 1:50 in the video):

In our country, do we want to allow a means of communication between people which even in extremis with a signed warrant by the home secretary personally we cannot read?

He also goes into more detail, illustrating a process he wants to establish for digital communication analogous to the legal process we (as in liberal democracies) have already established for other, physical means of communication.

Most liberal democracies have similar processes for when the police need to, or at least want to, investigate some private individual's communication, such as their mail or the conversations within their own apartments or houses. The police need to make their case to a judge, explaining the precise and current danger to the public's or some individual's safety, or present enough evidence to implicate the suspect in a crime of significant gravity. Then, and only then, can the judge (or a similar entity) decide that the given situation warrants infringing upon the suspect's human rights. With that warrant or court order the police may now go and read a person's mail to the degree the judge allowed them to.

Cameron wants something similar for digital communication, meaning that the police can read pieces of it with a warrant or court order. And here we have to look at encryption: encryption makes communication mostly impossible to read unless you have the relevant keys to unlock it. But there are different ways to implement encryption that might look very similar yet make a big difference in cases like this.

The platform provider – for example WhatsApp or Google with their GMail service – could encrypt the data for its users. That would mean that the key to lock or unlock the data would reside with the platform provider who would make sure that nobody apart from themselves or the parties communicating could read it. In the best-practice case of so-called end-to-end encryption, only the two parties communicating have the keys to open the encrypted data. Not even the platform provider could read the message.

If we look at physical mail, the content of a letter is protected with a nifty technology called an "envelope". An envelope is a paper bag that makes the actual contents of the letter unreadable; only the source and target addresses as well as the weight and size of the content can be seen. Physically, envelopes are not too impressive, you can easily tear them open and look at what's in them, but they've got two things going for them. First of all, you can usually see when an envelope has been opened. But secondly, and a lot more powerfully, the law protects the letter inside. Opening someone else's mail is a crime, even for police detectives (unless they have the court order we spoke about earlier). But if the content is written in some clever code or secret language, the police are still out of luck, even with a court order.

From my understanding of Cameron's argument, supported by his choice of examples, what he is going for is something called key escrow. This means that a platform provider has to keep the encryption keys necessary to decrypt communication going over their servers available for a while. Only when an authorized party asks for them with proper legitimisation (a court order) does the platform provider hand over the keys for the specific conversations requested. This would actually work very similarly to how the process for access to one's mail works today. (Britain already has a so-called key disclosure law, RIPA, which forces suspects to hand over their own personal encryption keys when presented with a court order. This serves a slightly different use case though, because forcing someone to hand over their keys automatically informs them of their status as a suspect, making surveillance aimed at detecting networks of criminals harder.)

Key escrow is highly problematic, as anyone slightly tech-savvy can probably guess. The recent hacks on Sony have shown us that even global corporations with significant IT staff and budget have a hard time keeping their own servers and infrastructure secure from unauthorized access. Forcing companies to store all those encryption keys on their servers would paint an even bigger target on them than there already is: gaining access to those servers would not only give crackers a lot of data about people but also access to their communication, and potentially even the opportunity for impersonation, with all of its consequences. And even if we consider companies trustworthy and doing all they can to run secure servers and services, bugs happen. Every piece of software more complex than "Hello World" has bugs, some small, some big. And if they can give attackers access to the keys to all castles, they will be found, if only by trial and error or pure luck. People are persistent like that.

Tech people know that, but Mr. Cameron might actually not. And as a politician his position is actually very consistent and consequent. It's his job to make sure that the democratically legitimized laws and rules of the country he governs are enforced, and that the rights these laws give its citizens and all people are defended. That is what being elected the prime minister of the UK means. Public and personal security are, just as a reasonable expectation of privacy, a big part of those rights, of those basic human rights. Mr. Cameron seems to see the safety and security of the people in Britain in danger and applies and adapts a well-established process to the digital sphere and the communication therein, homogenizing the situation between the physical and the digital spheres. He is in fact actively reducing or negating digital dualism while implicitly valuing the Internet and the social processes in it as real and equal to those in the physical sphere. From this perspective his plan (not the potentially dangerous and flawed implementations) is actually very forward-thinking and progressive.

But laws are more than just ideas or plans; each law can only be evaluated in the context of its implementation. A law giving every human being the explicit right to ride to work on a unicorn is worthless as long as unicorns don't exist. And who would take care of all the unicorn waste anyway? And as we already analysed, key escrow and similar ways of giving governments central access to encryption keys are very, very problematic. So even if we might agree that his idea about the police having potential access to selected communication with a court order is reasonable, the added risks of key escrow would make his proposal more dangerous and harmful than beneficial. But agree the cypherpunks do not.

Cypherpunks are a subculture of activists “advocating widespread use of strong cryptography as a route to social and political change” (quote Wikipedia). Their ideology can be characterized as deeply libertarian, focused on the individual and its freedom from oppression and restriction. To them privacy and anonymity are key to the digital age. Quoting the Cypherpunk Manifesto:

Privacy is necessary for an open society in the electronic age. […]

We cannot expect governments, corporations, or other large, faceless organizations to grant us privacy […]

We must defend our own privacy if we expect to have any. We must come together and create systems which allow anonymous transactions to take place. People have been defending their own privacy for centuries with whispers, darkness, envelopes, closed doors, secret handshakes, and couriers. The technologies of the past did not allow for strong privacy, but electronic technologies do.

Famous cypherpunks include Wikileaks' Julian Assange, Jacob Applebaum, who worked on the anonymisation software Tor and on Snowden's leaked documents, as well as the EFF's Jillian C. York. If there was an actual cypherpunk club, its member list would be a who-is-who of the digital civil rights scene. The cypherpunk movement is also where most of the most fundamental critique of Cameron's plans came from; its figureheads pushed the idea of the government banning encryption.

Cypherpunks generally subscribe to digital dualism as well. The quote from their manifesto makes it explicit, mirroring the idea of the exceptionalism of the Internet and the digital sphere: "The technologies of the past did not allow for strong privacy, but electronic technologies do." In their belief the Internet is a new and different thing, something that will allow all their libertarian ideas of free and unrestricted societies to flourish. Governments don't sit all too well with that idea.

Where the anti-Internet digital dualists argue for the superiority of the physical, the space where governments rule in their respective areas, mostly conceptualizing the digital sphere as a toy, a plaything or maybe an inferior medium, the pro-Internet digital dualists of the cypherpunk clan feel that the Internet has transcended, superseded the physical. That in this space, for its inhabitants, new rules – only new rules – apply. Governments aren't welcome in this world of bits and heroes carrying the weapons of freedom forged from code.

To these self-proclaimed warriors of digital freedom every attempt by governments to regulate the Internet, to enforce their laws in whatever limited way possible, is an attack, a declaration of war, an insult to what the Internet means and is. And they do have good arguments.

The Internet has a different structure than the physical world. Where in the physical world distances matter a lot in defining who belongs together, and where borders are sometimes actually hard to cross, the Internet knows very little distance. We feel that our friends on the other side of the globe might have a different schedule, might have finished dinner before we even had breakfast, but they are still as close to us as our next-door neighbor. Messages travel to any point on this globe fast enough for us not to perceive a significant difference between a message to a friend in Perth and one in Madrid.

Which government is supposed to regulate the conversation some Chinese, some Argentinian and some Icelandic people are having? Whose laws should apply? Does the strictest law apply or the most liberal one? Can a person break the laws of a country without ever having stepped into it, without ever having had the plan to visit that place? And how far is that country potentially allowed to go to punish these transgressions? Most of these questions haven't been answered sufficiently and convincingly.

The approach of treating the Internet as this whole new thing beyond the reach of the governments of the physical world of stone and iron seems to solve these – very hard – problems quite elegantly. By leaving the building. But certain things don't seem to align with our liberal and democratic ideas. Something's rotten in the state of cypherpunkia.

Our liberal democracies are founded on the principle of equality before the law. The law has to treat each and every one the same. No matter how rich you are, who your family is or what color your toenails are: the rules are the rules. There is actually quite an outrage when that principle is transgressed, when privileged people go free where minorities are punished harshly. The last months, with their numerous dead people of color killed by policemen in the US, have illustrated the dangerous, even deadly consequences of a society applying rules and the power of the enforcement entities unequally. Equality before the law is key to any democracy.

Here’s where pro-Internet digital dualism is problematic. It claims a different, more liberal ruleset for skilled, tech-savvy people. For those able to set up, maintain and use the digital tools and technologies securely. For the digital elite. The high priests of the new digital world.

The main argument against Cameron’s plans seems not to be that the government should never look at any person’s communication but that it shouldn’t be allowed to look at the digital communication that a certain group of people has access to and adopted as their primary means of communication. It’s not challenging the idea of what a government is allowed to do, it’s trying to protect a privilege.

Even with the many cases of the abuse of power by the police, or by certain individuals within that structure using their access to spy on their exes or neighbors or whoever, there still seems to be a democratic majority supporting a certain level of access by the government or police to private communication in order to protect other goods such as public safety. And where many journalists and critics push for stronger checks and better processes to control the power of the police and its officers, I don't see many people arguing for a total restriction.

This debate about government access illustrates what can happen when libertarian criticism of the actions of certain governments or government agencies of democratic states capsizes and becomes contempt for the idea of democracy itself and its processes.

Democracy is not about efficiency, it's about distributing, legitimizing and checking power as fairly as possible. The processes that liberal democracies have established to give the democratically legitimized government access to an individual's communication or data in order to protect a public or common good are neither impenetrable nor efficient. It's about trade-offs and checks and balances, trying to protect the system against manipulation from within while still getting anything done. It's not perfect, especially not in the implementations that exist, but it does allow people to participate equally, whether they like hacking code or not.

When digital activists argue against government activities that are properly secured by saying "the requirement of a court order is meaningless because they are trivial to get", they might mean to point at some explicit flaw in a certain process. But often they also express their implicit distrust towards all government processes, forgetting or ignoring that governments in democratic countries are the legitimized representation of the power of the people.

Digital dualism is a dangerous but powerful fallacy. Where it has created a breeding ground for texts about the horrors of the Internet and the falsehood of all social interaction in this transnational digital sphere, it has also created an environment where the idea of government, and with it often the idea of democracy, has been put up for debate, to be replaced with … well … not much. Software that skilled people can use to defend themselves against other skilled people who might have even better software.

Cryptography is a very useful tool for the individual. It allows us to protect communication and data, and it makes so much of the Internet possible in the first place. Without encryption we couldn't order anything online, do our banking, or send emails or tweets or Facebook updates without someone hacking in; we couldn't store our data on cloud services as backups. We couldn't trust the Internet at all.

But we are more than individuals. We are connected into social structures that sometimes have to deal with people working against them or against the rules those social systems agreed upon. Technology, even one as powerful as cryptography, does not protect and strengthen the social systems that we live in, the societies and communities that we rely on, that make us human and define our cultures.

The fight against government spying (and that is what this aggressive battle against Cameron's suggestion stems from: the fear that any system like that would be used by governments and spy agencies to collect even more data) mustn't make us forget what defines our cultures, our commons and our communities.

We talk a lot about communities online and recently even about codes of conduct and how to enforce them. Big discussions have emerged online on how to combat harassment, how to sanction asocial behavior and how to protect those who might not be able to protect themselves. In a way the Internet is having a conversation with itself trying to define its own rules.

But we mustn't stop there. You might think that coming up with rules on how to act online and ways to enforce them is hard, but the actual challenge is to find a way to reintegrate all we do online with the offline world. Because they are not separate: together they form the world.

The question isn't how to keep the governments out of the Internet. The real question is how we can finally overcome the deeply rooted digital dualism to create a world that is worth living in for people who love tech as well as people who might not care. The net is no longer the cyber-utopia of a few hackers. It's potentially part of everybody's life and reality.

What does the democracy of the future look like? How should different national laws apply in this transnational space? How do human rights translate into the digital sphere and where do we need to draw the lines for government regulation and intervention? Those are hard questions that we have to talk about. Not just hackers and techies with each other, but everyone. And I am sure that at the end of that debate a key escrow system such as the one Mr. Cameron seemingly proposed wouldn't be what we agree on. But to find that out we have to start the discussion.

Photo by dullhunk

