Posts for Wednesday, September 24, 2014


After SELinux System Administration, now the SELinux Cookbook

Almost an entire year ago (just a few days apart) I announced my first published book, called SELinux System Administration. The book covers SELinux administration commands and focuses on Linux administrators who need to interact with SELinux-enabled systems.

An important part of SELinux was only covered very briefly in the book: policy development. So in the spring of this year, Packt approached me and asked if I was interested in authoring a second book for them, called SELinux Cookbook. This book focuses on policy development and tuning of SELinux to fit the needs of the administrator or engineer, and as such is a logical follow-up to the previous book. Of course, given my affinity with the wonderful Gentoo Linux distribution, it is mentioned in the book (and is even the reference platform), even though the book itself is checked against Red Hat Enterprise Linux and Fedora as well, ensuring that every recipe in the book works on all distributions. Luckily (or perhaps not surprisingly) the approach is quite distribution-agnostic.

Today, I got word that the SELinux Cookbook is now officially published. The book uses a recipe-based approach to SELinux development and tuning, so it is quickly hands-on. It gives my view on SELinux policy development while keeping the methods and processes aligned with the upstream policy development project (the reference policy).

It’s been a pleasure (but also somewhat of a pain, as this is done in free time, which is scarce already) to author the book. Unlike the first book, where I struggled a bit to keep the page count to the requested amount, this book was not limited. Also, I think the various stages of the book development contributed well to the final result (something that I overlooked a bit the first time, so this time I re-re-reviewed changes over and over again: after the first editorial reviews, then after the content reviews, then after the language reviews, then after the code reviews).

You’ll see me blog a bit more about the book later (as the marketing phase is now starting) but for me, this is a major milestone which allowed me to write down more of my SELinux knowledge and experience. I hope it is as good a read for you as I intended it to be.


InfluxDB as a graphite backend, part 2

The Graphite + InfluxDB series continues.

  • In part 1, "On Graphite, Whisper and InfluxDB" I described the problems of Graphite's whisper and ceres, why I disagree with common graphite clustering advice as being the right path forward, what a great timeseries storage system would mean to me, why InfluxDB - despite being the youngest project - is my main interest right now, and introduced my approach for combining both and leveraging their respective strengths: InfluxDB as an ingestion and storage backend (and at some point, realtime processing and pub-sub) and graphite for its renowned data processing-on-retrieval functionality. Furthermore, I introduced some tooling: carbon-relay-ng to easily route streams of carbon data (metrics datapoints) to storage backends, allowing me to send production data to Carbon+whisper as well as InfluxDB in parallel, and graphite-api, the simpler Graphite API server, with graphite-influxdb to fetch data from InfluxDB.
  • Not Graphite related, but I wrote influx-cli which I introduced here. It makes it easy to interface with InfluxDB and measure the duration of operations, which will become useful for this article.
  • In the Graphite & Influxdb intermezzo I shared a script to import whisper data into InfluxDB and noted some write performance issues I was seeing, but the better part of the article described the various improvements done to carbon-relay-ng, which is becoming an increasingly versatile and useful tool.
  • In part 2, which you are reading now, I'm going to describe recent progress, share more info about my setup, testing results, state of affairs, and ideas for future work.

Progress made

  • InfluxDB saw two major releases:
    • 0.7 (and followups), which was mostly about some needed features and bug fixes
    • 0.8 was all about bringing some major refactorings into the hands of early adopters/testers: support for multiple storage engines, configurable shard spaces, rollups and retention schemes. There was some other useful stuff like speed and robustness improvements for the graphite input plugin (by yours truly) and various things like regex filtering for 'list series'. Note that a bunch of older bugs remained open throughout this release (most notably the broken derivative aggregator), and a bunch of new ones appeared. Maybe this is why the release was mostly in the dark. In this context, it's not so bad, because we let graphite-api do all the processing, but if you want to query InfluxDB directly you might hit some roadblocks.
    • An older fix, but worth mentioning: series names can now also contain any character, which means you can easily use metrics2.0 identifiers. This is a welcome relief after having struggled with Graphite's restrictions on metric keys.
  • graphite-api received various bug fixes and support for templating, statsd instrumentation and caching.
    Much of this was driven by graphite-influxdb: the caching allows us to cache metadata and the statsd integration gives us insights into the performance of the steps it goes through of building a graph (getting metadata from InfluxDB, querying InfluxDB, interacting with cache, post processing data, etc).
  • the progress on InfluxDB and graphite-api in turn enabled graphite-influxdb to become faster and simpler (note: graphite-influxdb requires InfluxDB 0.8). Furthermore you can now configure series resolutions (different retentions per series are not supported yet but are on the roadmap, see State of affairs and what's coming), and of course it also got a bunch of bugfixes.
Because of all these improvements, all involved components are now ready for serious use.

Putting it all together, with docker

Docker probably needs no introduction: it's a nifty tool to build an environment with given software installed, and it makes it easy to deploy that environment and run it in isolation. graphite-api-influxdb-docker is a very creatively named project that generates the - also very creatively named - docker image graphite-api-influxdb, which contains graphite-api and graphite-influxdb, making it easy to hook in a customized configuration and get it up and running quickly. This is the recommended way to set this up, and this is what we run in production.

The setup

  • a server running InfluxDB and graphite-api with graphite-influxdb via the docker approach described above:
    dell PowerEdge R610
    24 x Intel(R) Xeon(R) X5660  @ 2.80GHz
    96GB RAM
    perc raid h700
    6x600GB seagate 10k rpm drives in raid10 = 1.6 TB, Adaptive Read Ahead, Write Back, 64 kB blocks, no read caching
    no sharding/shard spaces, compiled from git just before 0.8, using LevelDB (not rocksdb, which is now the default)
    LevelDB max-open-files = 10000 (lsof shows about 30k open files total for the InfluxDB process), LRU 4096m, everything else is default I think.
  • a server running graphite-web, carbon, and whisper:
    dell PowerEdge R710
    16 x Intel(R) Xeon(R) E5640  @ 2.67GHz
    96GB RAM
    perc raid h700
    8x150GB seagate 15k rpm in raid5 = 952 GB, Read Ahead, Write Back, 64 kB blocks, no read caching
    MAX_UPDATES_PER_SECOND = 1000  # to sequentialize writes
  • a relay server running carbon-relay-ng that sends the same production load into both. (about 2500 metrics/s, or 150k minutely)
As you can tell, on both machines RAM is vastly overprovisioned, and they have lots of cpu available (the difference in cores should be negligible), but the difference in RAID level is important to note: RAID 5 comes with a write penalty. Even though the whisper machine has more, and faster, disks, it probably has a disadvantage for writes. Maybe. I haven't done raid stuff in a long time, and I haven't measured it.
Clearly you'll need to take the results with a grain of salt, as unfortunately I do not have 2 systems available with the same configuration and their baseline (raw) performance is unknown.
Note: no InfluxDB clustering, see State of affairs and what's coming.

The empirical validation & migration

Once everything was set up and I could confidently send 100% of traffic to InfluxDB via carbon-relay-ng, it was trivial to run our dashboards with a flag deciding which server to go to. This way I have literally been running our graphite dashboards next to each other, allowing us to compare both stacks on:
  • visual differences: after a bunch of work and bug fixing, we got to a point where both dashboards looked almost exactly the same. (note that graphite-api's implementation of certain functions can behave slightly differently, see for example this divideSeries bug)
  • speed differences by simply refreshing both pages and watching the PNGs load, with some assistance from firebug's network requests profiler. The difference here was big: graphs served up by graphite-api + InfluxDB loaded considerably faster. A page with 40 graphs or so would load in a few seconds instead of 20-30 seconds (on both first, as well as subsequent hits). This is for our default, 6-hour timeframe views. When cranking the timeframes up to a couple of weeks, graphite-api + InfluxDB was still faster.
Soon enough my colleagues started asking to make graphite-api + InfluxDB the default, as it was much faster in all common cases. I flipped the switch and everybody has been happy.

When loading a page with many dashboards, the InfluxDB machine will occasionally spike up to 500% cpu, though I rarely get to see any iowait (!), even after syncing the block cache (I just realized it'll probably still use the cache for reads after sync?).
The carbon/whisper machine, on the other hand, is always fighting iowait, which could be caused by the raid 5 write amplification but the random io due to the whisper format probably has more to do with it. Via the MAX_UPDATES_PER_SECOND I've tried to linearize writes, with mixed success. But I've never gone too deep into it. Comparing write performance would be unfair in these circumstances, so I am only comparing reads in these tests. Despite the different storage setups, the Linux block cache should make things fair for reads. Whisper's iowait will handicap the reads, but I always did successive runs with fully loaded PNGs to make sure the block cache was warm for reads.

A "slightly more professional" benchmark

I could have stopped here, but the validation above was not very scientific. I wanted to do a somewhat more formal benchmark, to measure read speeds (though I did not have much time so it had to be quick and easy).
I wanted to compare InfluxDB vs whisper, and specifically how performance scales as you play with parameters such as number of series, points per series, and time range fetched (i.e. amount of points). I posted the benchmark on the InfluxDB mailing list. Look there for all information. I just want to reiterate the conclusion here: I was surprised. Because of the results above, I had assumed that InfluxDB would perform reads noticeably quicker than whisper but this is not the case. (maybe because whisper reads are nicely sequential - it's mostly writes that suffer from the whisper format)
This very much contrasts with my earlier findings where the graphite-api+InfluxDB powered dashboards clearly take the lead. I have yet to figure out why this is. Maybe it has something to do with the performance of graphite-web vs graphite-api itself, gunicorn vs apache, worker configuration, or maybe InfluxDB only starts outperforming whisper as concurrency increases. Some more investigation is definitely needed!

Future benchmarks

The benchmark above was very simple to execute, as it only requires influx-cli and whisper-fetch (so you can easily check for yourself), but clearly there is a need to test more realistic scenarios with concurrent reads, and doing some write benchmarks would be nice too.
We should also look into cpu and memory usage. I have had the luxury of being able to completely ignore memory usage, but others seem to notice excessive InfluxDB memory usage.
I would also like to see storage efficiency tests. Last time I checked, using LevelDB I was pretty close to 24B per record (which makes sense because time, seq_no and value are all 64bit values, and each record has those 3 fields). (this was with snappy enabled, so it didn't seem to give much benefit). With whisper, I have files where the file size in Bytes divided by total records comes down to 114, for others 31. I haven't looked much into it but it looks like at least InfluxDB is more storage efficient. Also, whisper explicitly encodes None values of course, with InfluxDB those are implied (and require no space)
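
The 24B figure can be checked with a bit of shell arithmetic. The sketch below is only a back-of-the-envelope estimate: it combines the record layout described here with the 150k points per minute quoted in the setup section, and it assumes every point is stored as one full record with no compression or per-series overhead.

```shell
# Sizes and rates quoted in the text; the extrapolation to a day is an assumption.
fields_per_record=3          # time, seq_no, value
bytes_per_field=8            # each field is a 64-bit value
bytes_per_record=$(( fields_per_record * bytes_per_field ))

points_per_minute=150000     # production load sent through the relay
minutes_per_day=1440
bytes_per_day=$(( bytes_per_record * points_per_minute * minutes_per_day ))

echo "bytes per record: $bytes_per_record"
echo "raw bytes per day: $bytes_per_day"
```

This works out to 24 bytes per record and roughly 5.2 GB of raw records per day under those assumptions.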

conclusion: many tests and benchmarks should happen, but I don't really have time to conduct them. Hopefully other people in the community will take this on.

State of affairs and what's coming

  • InfluxDB typically performs pretty well, but not in all cases. More validation is needed. It wouldn't surprise me at this point if tools like hbase/Cassandra/riak clearly outperform InfluxDB, as long as we keep in mind that InfluxDB is a young project. A year, or two, from now, it'll probably perform much better. (and then again, it's not all about raw performance. InfluxDB has other strengths)
  • A long time goal which is now a reality: You can use any Graphite dashboard on top of InfluxDB, as long as the data is stored in a graphite-compatible format. Again, the easiest to get running is via graphite-api-influxdb-docker. There are two issues to be mentioned, though:
    • With the 0.8 release out the door, the shard spaces/rollups/retention intervals feature will start stabilizing, so we can start supporting multiple retention intervals per metric
    • Because InfluxDB clustering is undergoing major changes, and because clustering is not a high priority for me, I haven't needed to worry about this. I'll probably only start looking at clustering somewhere in 2015 because I have more pressing issues.
  • Once the new clustering system and the storage subsystem have matured (sounds like a v1.0 ~ v1.2 to me) we'll get more speed improvements and robustness. Most of the integration work is done, it's just a matter of doing smaller improvements, bug fixes and waiting for InfluxDB to become better. Maintaining this stack aside, I personally will start focusing more on:
    • per-second resolution in our data feeds, and potentially storage
    • realtime (but basic) anomaly detection, realtime graphs for some key timeseries. Adrian Cockcroft had an inspirational piece in his Monitorama keynote about how alerts from timeseries should trigger within seconds.
    • Mozilla's awesome heka project (this heka video is great), which should help a lot with the above. Also looking at Etsy's kale stack for anomaly detection
    • metrics 2.0 and making sure metrics 2.0 works well with InfluxDB. Up to now I find the series / columns as a data model too limiting and arbitrary, it could be so much more powerful, ditto for the query language.
  • Can we do anything else to make InfluxDB (+graphite) faster? Yes!
    • Long term, of course, InfluxDB should have powerful enough processing functions and query syntax, so that we don't even need a graphite layer anymore.
    • A storage engine optimized for fixed intervals would probably help: we could have the timestamps implicit instead of explicit, and maybe make the sequence number field optional. Each of these fields currently consumes 1/3 of the record... The sequence number field is not only useless in the Graphite use case, I've also rarely seen people make use of this in other use cases. Not storing the values as 64bit floats would help too. Finally we could have InfluxDB fill in None values without it doing "group by" (timeframe consolidation)
    • Then of course, there are projects to replace graphite-web/graphite-api with a Go codebase: graphite-ng and carbonapi. The latter is more production ready, but depends on some custom tooling and io using protobufs. But it performs an order of magnitude better than the python api server! I haven't touched graphite-ng in a while, but hopefully at some point I can take it up again
  • Another thing to keep in mind when switching to graphite-api + InfluxDB: you lose the graphite composer. I have a few people relying on this, so I can either patch it to talk to graphite-api (meh), separate it out (meh) or replace it with a nicer dashboard like tessera, grafana or descartes. (or Graph-Explorer, but it can be a bit too much of a paradigm shift).
  • some more InfluxDB stuff I'm looking forward to:
    • binary protocol and result streaming (faster communication and responses!) (the latter might not get implemented though)
    • "list series" speed improvements (if metadata querying gets fast enough, we won't need ES anymore for metrics2.0 index)
    • InfluxDB instrumentation so we actually start getting an idea of what's going on in the system, a lot of the testing and troubleshooting is still in the dark.
  • Tracking exceptions in graphite-api is much harder than it should be. Currently there's no way to display exceptions to the user (in the http response) or to even log them. So sometimes you'll get http 500 responses and don't know why. You can use the sentry integration which works all right, but is clunky. Hopefully this will be addressed soon.


The graphite-influxdb stack works and is ready for general consumption. It's easy to install and operate, and performs well. It is expected that InfluxDB will over time mature and ultimately meet all my requirements of the ideal backend. It definitely has a long way to go. More benchmarks and tests are needed. Keep in mind that we're not doing large volumes of metrics. For small/medium shops this solution should work well, but on larger scales you will definitely run into issues. You might conclude that InfluxDB is not for you (yet) (there are alternative projects, after all).

Finally, a closing thought:
Having graphs and dashboards that look nice and load fast is a good thing to have, but keep in mind that graphs and dashboards should be a last resort. It's a solution if all else fails. The fewer graphs you need, the better you're doing.
How can you avoid needing graphs? Automatic alerting on your data.

I see graphs as a temporary measure: they provide headroom while you develop an understanding of the operational behavior of your infrastructure, conceive a model of it, and implement the alerting you need to do troubleshooting and capacity planning. Of course, this process consumes more resources (time and otherwise), and these expenses are not always justifiable, but I think this is the ideal case we should be working towards.

Either way, good luck and have fun!

Posts for Saturday, September 20, 2014


Graphite & Influxdb intermezzo: migrating old data and a more powerful carbon relay

Migrating data from whisper into InfluxDB

"How do i migrate whisper data to influxdb" is a question that comes up regularly, and I've always replied it should be easy to write a tool to do this. I personally had no need for this, until a recent small influxdb outage where I wanted to sync data from our backup server (running graphite + whisper) to influxdb, so I wrote a script:
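# whisper dir without trailing slash.
wsp_dir=/path/to/whisper   # placeholder: set this to your whisper storage directory
start=$(date -d 'sep 17 6am' +%s)
end=$(date -d 'sep 17 12pm' +%s)
db=graphite                # placeholder: name of the target InfluxDB database
pipe_path=$(mktemp -u)
mkfifo "$pipe_path"
# background reader: influx-cli consumes the insert statements written to the fifo
function influx_updater() {
    influx-cli -db "$db" -async < "$pipe_path"
}
influx_updater &
while read -r wsp; do
  series=$(basename "${wsp//\//.}" .wsp)
  echo "updating $series ..."
  # whisper-fetch.py ships with whisper; drop empty (None) points, turn the rest into inserts
  whisper-fetch.py --from=$start --until=$end "$wsp_dir/$wsp.wsp" | grep -v 'None$' | awk '{print "insert into \"'$series'\" values ("$1"000,1,"$2")"}' > "$pipe_path"
done < <(find "$wsp_dir" -name '*.wsp' | sed -e "s#$wsp_dir/##" -e "s/\.wsp$//")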

It relies on the recently introduced asynchronous inserts feature of influx-cli - which commits inserts in batches to improve the speed - and the whisper-fetch tool.
You could probably also write a Go program using the unofficial whisper-go bindings and the influxdb Go client library. But I wanted to keep it simple. Especially when I found out that whisper-fetch is not a bottleneck: starting whisper-fetch, and reading out - in my case - 360 datapoints of a file always takes about 50ms, whereas InfluxDB at first only needed a few ms to flush hundreds of records, but that soon increased to seconds.
Maybe it's a bug in my code, I didn't test this much, because I didn't need to; but people keep asking for a tool so here you go. Try it out and maybe you can fix a bug somewhere. Something about the write performance here must be wrong.
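
To make it easier to see what the script feeds into influx-cli, here is a small standalone sketch of the same filter step. It assumes whisper-fetch prints one timestamp and value per line; the series name and the two sample datapoints are made up for illustration, and `to_inserts` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: turn "timestamp value" lines on stdin into insert statements.
# $1 is the series name. Columns written: time in ms, sequence number, value.
to_inserts() {
  grep -v 'None$' | awk -v series="$1" \
    '{print "insert into \"" series "\" values (" $1 "000,1," $2 ")"}'
}

# Two made-up datapoints: one real value and one empty (None) point.
printf '1410933600\t12.000000\n1410933660\tNone\n' | to_inserts servers.web1.load
```

The None line is dropped and the remaining point comes out as `insert into "servers.web1.load" values (1410933600000,1,12.000000)`. Passing the series name with `awk -v` instead of splicing it into the awk program avoids quoting trouble if a name contains unusual characters.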

A more powerful carbon-relay-ng

carbon-relay-ng received a bunch of love and has been a great help in my graphite+influxdb experiments.

Here's what changed:
  • First I made it so that you can adjust routes at runtime while data is flowing through, via a telnet interface.
  • Then Paul O'Connor built an embedded web interface to manage your routes in an easier and prettier way.
  • The relay now also emits performance metrics via statsd (I want to make this better by using go-metrics which will hopefully get expvar support at some point - any takers?).
  • Last but not least, I borrowed the diskqueue code from NSQ so now we can also spool to disk to bridge downtime of endpoints and re-fill them when they come back up
Besides our metrics storage, I also plan to put our anomaly detection (currently playing with heka and kale) and carbon-tagger behind the relay, centralizing all routing logic, making things more robust, and simplifying our system design. The spooling should also help to deploy to our metrics gateways at other datacenters, to bridge outages of datacenter interconnects.

I used to think of carbon-relay-ng as the python carbon-relay but on steroids. Now it reminds me more of something like nsqd but with an ability to make packet routing decisions by introspecting the carbon protocol, or perhaps Kafka but much simpler, single-node (no HA), and optimized for the domain of carbon streams.
I'd like the HA stuff though, which is why I spend some of my spare time figuring out the intricacies of the increasingly popular raft consensus algorithm. It seems opportune to have a simpler Kafka-like thing, in Go, using raft, for carbon streams. (note: InfluxDB might introduce such a component, so I'm also waiting a bit to see what they come up with)

Reminder: notably missing from carbon-relay-ng is round robin and sharding. I believe sharding/round robin/etc should be part of a broader HA design of the storage system, as I explained in On Graphite, Whisper and InfluxDB. That said, both should be fairly easy to implement in carbon-relay-ng, and I'm willing to assist those who want to contribute it.

Posts for Saturday, August 30, 2014


Showing return code in PS1

If you do daily management on Unix/Linux systems, then checking the return code of a command is something you’ll do often. If you do SELinux development, you might not even notice that a command has failed without checking its return code, as policies might prevent the application from showing any output.

To make sure I don’t miss out on application failures, I wanted to add the return code of the last executed command to my PS1 (i.e. the prompt displayed on my terminal).
I wasn’t able to add it to the prompt easily – in fact, I had to use a bash feature called the prompt command.

When the PROMPT_COMMAND variable is defined, then bash will execute its content (which I declare as a function) to generate the prompt. Inside the function, I obtain the return code of the last command ($?) and then add it to the PS1 variable. This results in the following code snippet inside my ~/.bashrc:

export PROMPT_COMMAND=__gen_ps1

function __gen_ps1() {
  # Capture the return code first, before any other command overwrites $?
  local EXITCODE="$?";
  # Enable colors for ls, etc.  Prefer ~/.dir_colors #64489
  if type -P dircolors >/dev/null ; then
    if [[ -f ~/.dir_colors ]] ; then
      eval $(dircolors -b ~/.dir_colors)
    elif [[ -f /etc/DIR_COLORS ]] ; then
      eval $(dircolors -b /etc/DIR_COLORS)
    fi
  fi

  # Red host name for root, green user@host for regular users
  if [[ ${EUID} == 0 ]] ; then
    PS1="RC=${EXITCODE} \[\033[01;31m\]\h\[\033[01;34m\] \W \$\[\033[00m\] "
  else
    PS1="RC=${EXITCODE} \[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] "
  fi
}

With it, my prompt now nicely shows the return code of the last executed command. Neat.

Edit: Sean Patrick Santos showed me my utter failure in that this can be accomplished with the PS1 variable immediately, without using the overhead of the PROMPT_COMMAND. Just make sure to properly escape the $ sign which I of course forgot in my late-night experiments :-(.
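
A minimal sketch of that simpler variant, assuming the same colors as my snippet: the whole trick is to keep `$?` literal in PS1 (here by using single quotes), so that bash expands it each time it draws the prompt instead of once when ~/.bashrc is read.

```shell
# Single quotes keep $? literal, so bash re-evaluates it every time the prompt is drawn.
# With double quotes, $? would be expanded only once, when ~/.bashrc is sourced.
if [ "${EUID}" = 0 ] ; then
  PS1='RC=$? \[\033[01;31m\]\h\[\033[01;34m\] \W \$\[\033[00m\] '
else
  PS1='RC=$? \[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] '
fi
```

Inside double quotes the same effect needs a backslash in front of the dollar sign (`RC=\$?`), which is the escaping I forgot.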

OpenSSH + 2 and 3 factor auth

Posts for Friday, August 29, 2014


Gentoo Hardened August meeting

Another month has passed, so we had another online meeting to discuss the progress within Gentoo Hardened.

Lead elections

The yearly lead elections within Gentoo Hardened were up again. Zorry (Magnus Granberg) was re-elected as project lead so he doesn’t need to update his LinkedIn profile yet ;-)


blueness (Anthony G. Basile) has been working on the uclibc stages for some time. Due to the configurable nature of these setups, many /etc/portage files were provided as part of the stages, which shouldn’t happen. Work is under way to update this accordingly.

For the musl setup, blueness is also rebuilding the stages to use a symbolic link to the dynamic linker (/lib/ as recommended by the musl maintainers.

Kernel and grsecurity with PaX

A bug has been submitted which shows that large binary files (in the bug, a chrome binary with debug information is shown to be more than 2 GB in size) cannot be pax-mark’ed, with paxctl informing the user that the file is too big. The problem occurs when the PaX marks are in ELF (as the application mmaps the binary) – users of extended attributes based PaX markings do not have this problem. blueness is working on making things a bit more intelligent, and to fix this.


SELinux

I have been making a few changes to the SELinux setup:

  • The live ebuilds (those with version 9999 which use the repository policy rather than snapshots of the policies) are now being used as “master” in case of releases: the ebuilds can just be copied to the right version to support the releases. The release script inside the repository is adjusted to reflect this as well.
  • The SELinux eclass now supports two variables, SELINUX_GIT_REPO and SELINUX_GIT_BRANCH, which allow users to use their own repository, and developers to work in specific branches together. By setting the right value in the users’ make.conf, switching policy repositories or branches is now a breeze.
  • Another change in the SELinux eclass is that, after the installation of SELinux policies, we will check the reverse dependencies of the policy package and relabel the files of these packages. This allows us to only have RDEPEND dependencies towards the SELinux policy packages (if the application itself does not otherwise link with libselinux), making the dependency tree within the package manager more correct. We still need to update these packages to drop the DEPEND dependency, which is something we will focus on in the next few months.
  • In order to support improved cooperation between SELinux developers in the Gentoo Hardened team – perfinion (Jason Zaman) is in the queue for becoming a new developer in our midst – a coding style for SELinux policies is being drafted up. This is of course based on the coding style of the reference policy, but with some Gentoo specific improvements and more clarifications.
  • perfinion has been working on improving the SELinux support in OpenRC (release 0.13 and higher), making some of the additions that we had to make in the past – such as the selinux_gentoo init script – obsolete.
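
As an illustration of the two eclass variables mentioned above, a make.conf fragment could look like the sketch below. The variable names come from the eclass; the repository URL and branch name are hypothetical placeholders, and I am assuming here that the variables take a git URL and a branch name and are picked up by the live (9999) policy ebuilds.

```shell
# /etc/portage/make.conf (fragment)
# Both values are hypothetical placeholders, not real repositories or branches.
SELINUX_GIT_REPO="https://example.com/my-selinux-policy.git"
SELINUX_GIT_BRANCH="my-feature-branch"
```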

The meeting also discussed a few bugs in more detail, but if you really want to know, just hang on and wait for the IRC logs ;-) Other usual sections (system integrity and profiles) did not have any notable topics to describe.

Posts for Monday, August 25, 2014

flashing android mobiles on gentoo

This is just a quick tip in case you ever want to flash a mobile phone on gentoo.

If you look at the cyanogenmod howto [1] (in my case for a nexus s) you'll see that you need the tools "adb" and "fastboot", which usually come with the android sdk. Naturally the howto suggests that you install this sdk, which isn't even available on gentoo.
However, if you don't want java and all its other dependencies on your computer (which are required for the sdk), there is a package which installs only those two needed tools. It's called dev-util/android-tools - and it's in portage :)

This is all you need:
* dev-util/android-tools
Available versions: (~)0_p20130123
Description: Android platform tools (adb and fastboot)


Posts for Tuesday, August 19, 2014


Switching to new laptop

I’m slowly but surely starting to switch to a new laptop. The old one hasn’t completely died (yet) but given that I had to force its CPU frequency at the lowest Hz or the CPU would burn (and the system would suddenly shut down due to heat issues), and that the connection between the battery and laptop fails (so even a new battery didn’t help out) so I couldn’t use it as a laptop… well, let’s say the new laptop is welcome ;-)

Building Gentoo isn’t an issue (having only a few hours per day to work on it is) and while I’m at it, I’m also experimenting with EFI (currently still without secure boot, but with EFI) and such. Considering that the Gentoo Handbook needs quite a few updates (and I’m thinking of doing more than just small updates) knowing how EFI works is a Good Thing ™.

For those interested – the EFI stub kernel instructions in the article on the wiki, and also in Greg’s wonderful post on booting a self-signed Linux kernel (which I will do later) work pretty well. I didn’t try out the “Adding more kernels” section in it, as I need to be able to (sometimes) edit the boot options (which isn’t easy to accomplish with EFI stub-supporting kernels afaics). So I installed Gummiboot (and created a wiki article on it).

Lots of things still planned, so little time. But at least building chromium is now a bit faster – instead of 5 hours and 16 minutes, I can now enjoy the newer versions after a little less than 40 minutes.

Posts for Sunday, August 10, 2014

jumping directly into found results in menuconfig

For those who still use menuconfig for configuring their kernel - there's a neat trick which lets you jump directly into a found result.

For example you would like to add a new driver. Usually you go into menuconfig and start searching for it with the "/" shortcut. What you probably do not know: after you have found your module - say you searched for the "NetXen Multi port Gigabit Ethernet NIC" by just searching for "xen" - you can go directly to the particular config via its number shortcut:
[Screenshot: search result for "xen"]

Notice this line:

The "(5)" is the shortcut. Just press the number 5 on your keyboard and you'll jump directly into the QLogic devices config.
For every found entry there is a number shortcut which lets you jump directly into the given config. If you go back with <esc><esc>, you also go back to the search result.

I think not many people know this trick and I hope someone can use it for further kernel builds ;)

Posts for Saturday, August 9, 2014


Some changes under the hood

In between conferences, technical writing jobs and traveling, we made a few changes under the hood for SELinux in Gentoo.

First of all, new policies are bumped and also stabilized (2.20130411-r3 is now stable, 2.20130411-r5 is ~arch). These have a few updates (merges from upstream), and r5 also has preliminary support for tmpfiles (at least the OpenRC implementation of it), which is made part of the selinux-base-policy package.

The ebuilds to support new policy releases now are relatively simple copies of the live ebuilds (which always contain the latest policies) so that bumping (either by me or other developers) is easy enough. There’s also a release script in our policy repository which tags the right git commit (the point at which the release is made), creates the necessary patches, uploads them, etc.

One of the changes made is to “drop” the BASEPOL variable. In the past, BASEPOL was a variable inside the ebuilds that pointed to the right patchset (and base policy) as we initially supported policy modules of different base releases. However, that was a mistake and we quickly moved to bumping all policies with every release, but kept the BASEPOL variable in it. Now, BASEPOL is “just” the ${PVR} value of the ebuild so no longer needs to be provided. In the future, I’ll probably remove BASEPOL from the internal eclass and the selinux-base* packages as well.

A more important change to the eclass is support for the SELINUX_GIT_REPO and SELINUX_GIT_BRANCH variables (for live ebuilds, i.e. those with the 9999 version). If set, then they pull from the mentioned repository (and branch) instead of the default hardened-refpolicy.git repository. This allows for developers to do some testing on a different branch easily, or for other users to use their own policy repository while still enjoying the SELinux integration support in Gentoo through the sec-policy/* packages.
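As a rough sketch, the corresponding make.conf lines might look like the following. Only the two variable names come from the eclass; the repository URL and the branch name are placeholders made up for illustration.

```shell
# /etc/portage/make.conf (fragment)
# Hypothetical repository URL and branch, shown for illustration only.
SELINUX_GIT_REPO="https://git.example.org/my-refpolicy.git"
SELINUX_GIT_BRANCH="testing"
```

With these set, the live (9999) sec-policy/* ebuilds would be expected to pull from that repository and branch on their next rebuild.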

Finally, I wrote up a first attempt at our coding style, heavily based on the coding style from the reference policy of course (as our policy is still following this upstream project). This should allow the team to work better together and to decide on namings autonomously (instead of hours of discussing and settling for something as silly as an interface or boolean name ;-)

Posts for Friday, August 8, 2014

The Jamendo experiment – “week” 1

As forecast in a previous blog post, this is the first "weekly" report from my Jamendo experiment. In the first part I will talk a bit about the player that I use (Amarok), after that will be a short report on where I get my music fix now and how it fares and in the end I will introduce some artists and albums that I found on Jamendo and like.

Amarok 2.0.2 sadly has a bug that makes it lack some Jamendo albums. This makes searching and playing Jamendo albums directly from Amarok a bit less than perfect and forces me to still use Firefox (and Adobe Flash) to browse music on Jamendo. Otherwise Amarok with its version 2.x has become an amazing application or even platform, if you will, not only for playing and organising, but also for discovering new music. You can even mix in the same playlist your local collection with tracks from web services and even streams.

Most of the music I got directly from Jamendo, a bit less I listened online from Magnatune and the rest was streams from Last.FM (mostly from my recommendations). As far as music on Jamendo and Magnatune – both offer almost exclusively CC licensed music – I honestly found it equally as good, if not better, than what conservative record labels and stations offer. This could in part be because of my music taste, but even so, I am rather picky with music. As far as the quality of the sound is concerned, being able to download music in Ogg/Vorbis (quality 7) made me smile and my ears as well. If only I had a better set of headphones!

Now here's the list of artists that I absolutely must share:

Jimmy the Hideous Penguin – Jimmy Penguin is by far my absolute favorite artist right now! His experimental scratching style over piano music is just godly to my ears – the disrhythmia that his scratching brings over the standard hip hop beats, piano and/or electronica is just genius! The first album that made me fall in love was Jimmy Penguin's New Ideas – it starts with six tracks called ff1 to ff6 with already the first one (ff1) showing a nice melange of broken sampling layered with a melody and even over that lies some well placed scratching. The whole album is amazing! From the previously mentioned ff* tracks, apart from ff1 I would especially like to put ff3 and ff4 into the limelight. The ff6 (A Long Way to Go) and Polish Jazz Thing bear some jazz elements as well, while Fucking ABBA feels like flirting with R&B/UK garage. On the other hand the album Split Decisions has more electronic elements in it and feels a bit more meditative, if you will. The last of his albums that I looked at was Summer Time, which I have not listened to thoroughly enough, but so far I like it a lot and it's nice to see Jimmy Penguin take on even more styles, as the track Jimmy Didn't Name It has some unmistakable Asian influences.

No Hair on Head – very enjoyable lounge/chillout electronica. Walking on Light is the artist's first album and is a collection of some of his tracks that he made in the past 5 years. It's great to see that outside mainstream artists are still trying to make albums that make sense – consistent style, but still diverse enough – and this album is just such. The first track Please! is not a bad start into the album, Inducio is also a nice lively track, but what I think could be hits are the tracks Anywhere You Want and Fiesta en Bogotá – the first one starts rather standard, but then develops into a very nice pop-ish, almost house-like summery electronic song with tongue-in-cheek lyrics; the latter features an accordion and to me feels somehow like driving through Provence or Karst (although Bogotá actually lies in Colombia).

Electronoid – great breakbeat! If you like Daft Punk's album Homework or less popular tracks by the Chemical Brothers, you will most probably enjoy Electronoid (album) as well.

Morning Boy – great mix of post punk with pop-ish elements. On their album For us, the drifters. For them, the Bench, the song Maryland reminds me of Dinosaur Jr., while Whatever reminds me of Joan of Arc with added pop. Although All Your Sorrows is probably the track I like best so far – it just bursts with positive attitude while still being somewhat mellow.

Bilk (archived) – fast German pop punk with female vocals that borders on the Neue Deutsche Welle music movement from the 80's. Their album Ich will hier raus (archived) is not bad and might even compare to more known contemporary artists like Wir sind Helden. Update: Sadly they removed themselves from Jamendo, they have their own website now, but unfortunately there is no licensing info available about the music.

Ben Othman – so far I have listened to two of his albums – namely Lounge Café Tunis "Intellectuel" and Lounge Café Tunis "Sahria" – they consist of good lounge/chillout music with at times very present Arabic influences.

Silence – this seems like a very popular artist, but so far I only managed to skim through the album L'autre endroit. It seems like a decent mix of trip-hop with occasional electric guitars and other instruments. Sometimes it bears elements of IDM and/or dark or industrial influences. I feel it is too early for me to judge if it conforms to my taste, but it looks like an artist to keep an eye on.

Project Divinity – enjoyable, very calm ambient new age music. The mellowness and openness of the album Divinity is very easy on the ears and cannot be anything else than calming.

SoLaRis – decent goatrance, sometimes wading even into the dark psytrance waters.

Team9 – after listening to some of their tracks on Jamendo, I decided to download their full album We Don't Disco (for free, under CC-BY-SA license) from their (archived) homepage. Team9 is more known for their inventive remixes of better known artists' songs, but their own work is at least equally amazing! They describe themselves as "melodic, ambient and twisted" and compare themselves to "Vangelis and Jean Michel Jarre taking Royksopp and Fad Gadget out the back of the kebab shop for a smoke" – both descriptions suit them very well. The whole album is great, maybe the title track We Don't Disco Like We Used To and the track Aesthetic Athletics stand out a bit more because they feel a bit more oldskool and disco-ish than the rest of them, but quality-wise the rest of the tracks are just as amazing!

As you can see, listening only to free (as in speech, not only as in beer) music is not only possible, but quite enjoyable! There is a real alternative out there! Tons of great artists out there are just waiting to be listened to – that ultimately is what music is all about!

hook out → going to bed…

Posts for Wednesday, August 6, 2014

How to write your Pelican-powered blog using ownCloud and WebDAV

Originally this HowTo was part of my last post – a lengthy piece about how I migrated my blog to Pelican. As this specific modification might be more interesting than reading the whole thing, I decided to fork and extend it.

What and why?

What I was trying to do is to be able to add, edit and delete content from Pelican from anywhere, so whenever inspiration strikes I can simply take out my phone or open up a web browser and create a rough draft. Basically a make-shift mobile and desktop blogging app.

I decided that the easiest way to do this is by accessing my content via WebDAV, through the ownCloud instance that runs on the same server.

Why not Git and hooks?

The answer is quite simple: because I do not need it and it adds another layer of complication.

I know many use Git and its hooks to keep track of changes as well as for backups and for pushing from remote machines onto the server. And that is a very fine way of running it, especially if there are several users committing to it.

But for the following reasons, I do not need it:

  • I already include this page with its MarkDown sources, settings and the HTML output in my standard RSnapshot backup scheme of this server, so no need for that;
  • I want to sometimes draft my posts on my mobile and Git and Vim on a touch-screen are just annoying to use [1];
  • this is a personal blog, so the distributed VCS side of Git is just an overhead really;
  • there is no added benefit to sharing the MarkDown sources on-line, if all the HTML sources are public anyway.

Setting up the server

Pairing up Pelican and ownCloud

In ownCloud it is very easy to mount external storage, and a folder local to the server is still considered “external” as it is outside of ownCloud. Needless to say, there is a nice GUI for that.

Once you open up the Admin page in ownCloud, you will see the External Storage settings. For security reasons only admins can mount a local folder, so if you aren’t one, you will not see Local as an option and you will have to ask your friendly ownCloud sysAdmin to add the folder from his Admin page for you.

If that is not an option, on a GNU/Linux server there is an easy, yet hackish solution as well: just link Pelican’s content folder into your ownCloud user’s file system – e.g:

ln -s /var/www/ /var/www/owncloud/htdocs/data/hook/files/Blog

In order to have the files writeable over WebDAV, they need to have write permission from the user that PHP and web-server are running under – e.g.:

chown -R nginx:nginx /var/www/owncloud/htdocs/data/hook/files/Blog/

Automating page generation and ownership

To have pages constantly automatically generated, there is an option to call pelican --autoreload and I did consider turning it into an init script, but decided against it for two reasons:

  • it consumes too much CPU power just to check for changes;
  • as on my poor ARM server a full (re-)generation of this blog takes about 6 minutes [2], I did not want to hammer my system every time I save a minor change.

What I did instead was to create an fcronjob to (re-)generate the website every night at 3 in the morning (and send a mail to root’s default address), under the condition that blog posts have either been changed in content or added since yesterday:

%nightly,mail * 3 cd /var/www/ && posts=(content/**/*.markdown(Nm-1)); if (( $#posts )) LC_ALL="en_GB.utf8" make html

Update: the above command is changed to use Zsh; for the old sh version, use:

%nightly,mail * 3 cd /var/www/ && [[ `find content -iname "*.markdown" -mtime -1` != "" ]] && LC_ALL="en_GB.utf8" make html
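The "changed since yesterday" condition from the job above can also be pulled out into a small helper, which makes it easier to try by hand. This is only a sketch: the function name posts_changed is made up here, and it assumes a find that supports -iname (GNU and BSD find both do).

```shell
# Succeeds (exit status 0) when at least one *.markdown file below the
# directory given as $1 was modified within the last 24 hours.
posts_changed() {
    find "$1" -type f -iname '*.markdown' -mtime -1 | grep -q .
}

# Possible use from the nightly job (the content path is an assumption):
# posts_changed content && LC_ALL="en_GB.utf8" make html
```

The pipeline's exit status is that of grep, so the helper fails both when nothing changed and when the directory does not exist.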

In order to have the file permissions on the content directory always correct for ownCloud (see above), I changed the Makefile a bit. The relevant changes can be seen below:

    chown -R nginx:nginx $(INPUTDIR)

    [ ! -d $(OUTPUTDIR) ] || rm -rf $(OUTPUTDIR)

    chown -R nginx:nginx $(INPUTDIR)

E-mail draft reminder

Not directly relevant, but still useful.

In order not to leave any drafts unattended, I have also set up an FCron job to send me an e-mail with a list of all unfinished drafts to my private address.

It is a very easy hack really, but I find it quite useful to keep track of things – find the said fcronjob below:

%midweekly,mailto( * * cd /var/www/ && ack "Status: draft"
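For systems without ack, roughly the same list can be produced with grep. The sketch below is an assumption-laden variant rather than the job used here: the helper name list_drafts is invented, it relies on GNU grep's --include option, and it assumes drafts carry a "Status: draft" metadata line at the start of a line.

```shell
# Print the paths of all *.markdown files below $1 that are marked as drafts.
list_drafts() {
    grep -rl --include='*.markdown' '^Status: draft' "$1"
}

# Possible use (the content path is an assumption):
# list_drafts /var/www/content
```

Unlike the ack call, this prints only file names, which keeps the reminder mail short.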

Client software


As a mobile client I plan to use ownNotes, because it runs on my Nokia N91 and supports MarkDown highlighting out-of-the-box.

All I needed to do in ownNotes is to provide it with my ownCloud log-in credentials and state Blog as the "Remote Folder Name" in the preferences.

But before I can really make use of ownNotes, I have to wait for it to start properly managing file-name extensions.

ownCloud web interface

Since ownCloud includes a webGUI text editor with MarkDown highlighting out of the box, I sometimes use that as well.

An added bonus is that the Activity feed of ownCloud keeps a log of when which file changed or was added.

It does not seem possible yet to collaboratively edit files other than ODT in ownCloud’s webGUI, but I imagine that might be the case in the future.

Kate via WebDAV

In many other desktop environments it is child’s play to add a WebDAV remote folder — just adding a link to the file manager should be enough, e.g.: webdavs://

KDE’s Dolphin makes it easier for you, because all you have to do is select Remote → Add remote folder and if you already have a connection to your ownCloud with some other service (e.g. Zanshin and KOrganizer for WebCal), it will suggest all the details to you, if you choose Recent connection.

Once you have the remote folder added, you can use it transparently all over KDE. So when you open up Kate, you can simply navigate the remote WebDAV folders, open up the files, edit and save them as if they were local files. It really is as easy as that! ☺

Note: I probably could have also used the more efficient KIO FISH, but I have not bothered with setting up a more complex permission set-up for such a small task. For security reasons it is not possible to log in via SSH using the same user the web server runs under.

SSH and Vim

Of course, it is also possible to ssh to the web server, su to the correct user, edit the files with Vim and let FCron and the Makefile make sure the ownership is set appropriately.

hook out → back to studying Arbitration law

  1. Yes, I am well aware you can run Vim and Git on MeeGo Harmattan and I do use it. But Vim on a touch-screen keyboard is not very fun to use for brainstorming. 

  2. At the time of writing this blog includes 343 articles and 2 pages, which took Pelican 440 seconds to generate on my poor little ARM server (on a normal load). 

Posts for Tuesday, August 5, 2014

kmscon - next generation virtual terminals

KMSCON is a simple terminal emulator based on linux kernel mode setting (KMS). It can replace the in-kernel VT implementation with a userspace console. It's a pretty new project and still very experimental.
Even though gentoo provides an ebuild, it's rather rudimentary and it's better to use the live ebuild from [1] plus the libtsm package, which is needed for kmscon, from [2]. Personally I've added those ebuilds into my private overlay.

Don't forget to unmask/keyword the live ebuild:
# emerge -av =sys-apps/kmscon-9999

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild R *] sys-apps/kmscon-9999::local USE="drm fbdev gles2 optimizations pango unicode -debug -doc -multiseat -pixman -static-libs -systemd" 0 kB

Total: 1 package (1 reinstall), Size of downloads: 0 kB
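Keywording a live ebuild usually means adding an entry with the special `**` keyword to Portage's package.accept_keywords. The helper below is a sketch of that step; the function name keyword_live_ebuild is made up, and it assumes package.accept_keywords is a plain file (on many systems it is a directory, in which case a file inside it should be passed as the second argument).

```shell
# Append an entry that accepts a live (keyword-less) ebuild.
# $1: package atom
# $2: keywords file (defaults to Portage's usual location, assumed here
#     to be a plain file rather than a directory)
keyword_live_ebuild() {
    printf '%s **\n' "$1" >> "${2:-/etc/portage/package.accept_keywords}"
}

# Possible use (as root):
# keyword_live_ebuild '=sys-apps/kmscon-9999'
```

If libtsm is also only available as a live ebuild in the overlay, it would presumably need a similar entry.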

After successfully emerging kmscon it's pretty simple to start a new vt with (as root):
# kmscon --vt=8 --xkb-layout=de --hwaccel

This starts kmscon on vt8 with hardware-accel on and a german keyboard layout.

If you're feeling experimental you can add (or replace) an additional virtual terminal in your inittab. A line like the following should suffice to start kmscon every time you boot your system.
c11:2345:respawn:/usr/bin/kmscon --vt=8 --xkb-layout=de --hwaccel

I've tested it with my amd cards (r600g and radeonsi) and it worked with some minor output corruptions. However, in certain cases it already works faster than agetty, for example when printing dmesg output. So far it looks really promising; sadly, development seems to be really slow. You'll find the git repository here [3]


Posts for Friday, August 1, 2014


Gentoo Hardened July meeting

I failed to show up myself (I fell asleep – kids are fun, but deplete your energy source quickly), but that shouldn’t prevent me from making a nice write-up of the meeting.


GCC 4.9 gives some issues with kernel compilations and other components. Lately, breakage has been reported with GCC 4.9.1 compiling MySQL or with debugging symbols. So for hardened, we’ll wait this one out until the bugs are fixed.

For GCC 4.10, the --enable-default-pie patch has been sent upstream. If that is accepted, the SSP one will be sent as well.

In uclibc land, stages are being developed for PPC. This is the final architecture that is often used in embedded worlds that needed support for it in Gentoo, and that’s now being finalized. Go blueness!


A libpcre upgrade broke relabeling operations on SELinux enabled systems. A fix for this has been made part of libselinux, but a little too late, so some users will be affected by the problem. It’s easily worked around (removing the *.bin files in the contexts/files/ directory of the SELinux configuration) and hopefully will never occur again.
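The workaround can be sketched as a small helper. The function name remove_compiled_contexts is invented, and the example path /etc/selinux/strict is an assumption about where the policy configuration lives; the argument should point at the configuration directory of the policy type actually in use.

```shell
# Delete the compiled file-context caches (*.bin) of one policy store.
# $1: SELinux configuration directory of the policy type,
#     e.g. /etc/selinux/strict (an assumed example path)
remove_compiled_contexts() {
    find "$1/contexts/files" -maxdepth 1 -type f -name '*.bin' -print -delete
}

# Possible use (as root):
# remove_compiled_contexts /etc/selinux/strict
```

Only the *.bin files are removed; the plain-text context definitions next to them stay in place.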

The 2.3 userland has finally been stabilized (we had a few dependencies that we were waiting for – and we were a dependency ourselves for other packages as well).

Finally, some discussion is taking place (not that there’s much feedback on it, but every documented step is a good step imo) on the SELinux policy within Gentoo (and the principles behind it that we’ll follow).

Kernel and grsecurity / PaX

Due to some security issues, the Linux kernel sources have been stabilized more rapidly than usual, which left little time for broad validation and regression testing. Updates and fixes have been applied since and new stabilizations occurred. Hopefully we’re now at the right, stable set again.

The C-based install-xattr application (which is performance-wise a big improvement over the Python-based one) is working well in “lab environments” (some developers are using it exclusively). It is included in the Portage repository (if I understand the chat excerpts correctly) but as such not available for broader usage yet.

An update against elfix has been made as well, as there was a dependency mismatch when building with USE=-ptpax. This will be corrected in elfix-0.9.

Finally, blueness is also working on a GLEP (Gentoo Linux Enhancement Proposal) to export VDB information (especially NEEDED.ELF.2) as this is important for ELF/library graph information (as used by revdep-pax, migrate-pax, etc.). Although Portage already does this, this is not part of the PMS and as such other package managers might not do this (such as Paludis).


Updates on the profiles have been made to properly include multilib related variables and other metadata. For some profiles, this went as easy as expected (nice stacking), but other profiles have inheritance troubles making it much harder to include the necessary information. Although some talks have arisen on the gentoo-dev mailinglist about refactoring how Gentoo handles profiles, not much more has been done than just talking :-( But I’m sure we haven’t heard the last of this yet.


Blueness has added information on EMULTRAMP in the kernel configuration, especially noting to the user that it is needed for Python support in Gentoo Hardened. It is also in the PaX Quickstart document, although this document is becoming a very large one and users might overlook it.

Posts for Thursday, July 31, 2014

The right tool for the job

Every subculture, even most smaller groups establish practices that are typical for said subculture or group. They often emerge within the foundations of the group itself or the background of an influential part of the members. A group of historians will probably tackle problems in a different way than engineers would for example: Where the historians might look for similarities in structure between the current issue and the past, engineers would try to divide the problem up into smaller and smaller units of work, assign them and hope that by assembling all the parts a solution will be created. Obviously the previous example was slightly exaggerated and simplified but you catch my drift. The people or the “culture” a group emerged from influence massively the set of tools the group has to interact with the world.

These tools exist on many levels. They can be physical objects like with a group of mechanics bringing actual tools from their workshops into the group. There are digital tools such as publication software or networked democracy/liquid democracy tools. The tools can be intellectual: Specific methods to process information or analyze things. Social tools can help organize and communicate. The list goes on and on.

Today I want to talk about the intellectual or procedural tools of a certain subculture [1] that I do have my run-ins with: The hackers. Not the “let’s break shit and steal money like they do in cheesy movies” type but the “we are fighting for digital civil liberties and free software and crypto for everyone and shit” type. The type that can probably best be defined as: People unwilling to always follow the instructions that things come with, especially technical things.

While the myth of the evil hackers destroying everything still is very powerful especially within mainstream media, that subculture has – even given all the problems and issues raging through that scene [2] – gotten kind of a tough job these days. Because we as a society are overwhelmed by our own technical progress.

So we’ve kinda stumbled on this nice thing that some scientists developed to share information and we realized: Wow, I can copy all kinds of music and movies, I can share information and publish my own creative works! And others found that thing interesting as well, bolted some – not always [3] beautifully designed – interfaces and technologies onto that “Internet” thing and used it to sell books and clothes and drugs and bitcoins to a global customer base.

Obviously I simplified things again a little. But there’s no denying that the Internet changed many many aspects of our life with shopping only being one of them. Global companies could suddenly move or spread data (and themselves) to different locations in zero-time circumventing in many cases at least parts of the legal system that was supposed to protect the people against their actions. Established social rules such as copyright or privacy came under pressure. And then there was the intelligence community. What a field day they had!

All the things that used to be hard to gather, that could only be acquired through deploying agents and time and money, conversations and social graphs and “metadata” could be gathered, stored and queried. Globally. All the time. The legal system supposed to protect the people actually gave them the leverage to store all data they could get their hands on. All for the good of the people and their security.

So here we are with this hot and flaming mess and we need someone, anyone to fix it. To make things ok. So we ask the hackers because they actually know, understand and – more often than many want to admit – build the technology causing problems now. And they tried to come up with solutions.

The hacker subculture is largely and dominantly shaped by a related group of people: Security specialists. To be able to assess and test the security of a technical system or an algorithm you really need to understand it and its environment at a level of detail that eludes many people. The problems the security community have to deal with are cognitively hard and complex, the systems and their interactions and interdependencies growing each day. The fact that those security holes or exploits can also be worth a lot of money to someone with … let’s say flexible ethics also informed the competitiveness of that scene.

So certain methods or MOs developed. One very prominent one that has influenced the hacker culture a lot is the “break shit in a funny way” MO. It goes like this: You have something that people (usually the people selling it) claim to be secure. Let’s say a voting machine or an iris scanner on a new smartphone. In come the hackers. They prod the system, poke it with sticks and tools until they get the voting machine to play pong and the iris scanner to project My Little Pony episodes. They break shit.

This leads to (if you are somewhat tech savvy) very entertaining talks at hacker conferences where the ways to break it are shown. Some jokes at the expense of the developers are thrown in and it usually ends with a patch, a technical solution that at least mitigates the worst problems. Hilarity ensues.

But herein lies the problem. The issues we have with our political system, with the changes that tech brought to the social sphere, are not easily decomposed into modules, broken and fixed with some technological patch. Showing that the NSA listens to your stuff, and how they do it, is all fine and dandy, but the technical patch, the bazillion crypto tools that are released every day, doesn’t address the issues at hand: the political questions, the social questions.

That’s not the fault of the hacker scene really. They did their job, analyzed what happened and sometimes could even provide fixes. But building new social or legal concepts really isn’t in their toolbox. When forced, they have to fall back on things such as “whistleblowing” as a catchall, which really is no replacement for political theory. Obviously there are hackers who are also political, but that is not inherent to the subculture, not something that belongs to it.

In Germany we can see that every day in the politically … random … actions of the Pirate Party, which recruited many of its members from said hacker culture (or related subcultures). They think in systems and patches, talk about “a new operating system for democracy”. Even the wording, the framing shows that they don’t think in political terms but in their established technical phrases. Which again isn’t their fault, it’s what every subculture does.

Hackers can do a lot for our societies. They can help officials or NGOs to better understand technology and maybe even its consequences. They just might not in general be the right people to talk to when it comes to building legal or social solutions.

The different subcultures in a society all contribute different special skill sets and knowledge to the discourse. It’s about bringing all the right people and groups to the table in every phase of the debate. That doesn’t mean that people should be excluded but that certain groups or subcultures should maybe take the lead when it comes to the domains they know a lot about.

Use the right tool for the job.

Header image by: Ivan David Gomez Arce

  1. if it actually is a subculture which we could debate but let’s do that another time
  2. I’m not gonna get into it here, it’s a topic for another text that I’m probably not going to write
  3. as in never


Posts for Friday, July 25, 2014

On whistleblowing

As some might know, I spent the last week in New York attending the HOPE conference, which was, by the way, one of the friendlier and more diverse conferences I have been to and which I enjoyed a lot, not just because of its awe-inspiring location.

It was not surprising that the session program would put big emphasis on whistleblowing. Edward Snowden’s leaks have pretty much defined the last year when it came to tech-related news. HOPE contextualized those leaks by framing Snowden with the famous US whistleblowers Thomas Drake and Daniel Ellsberg who both have had immense impact with their leaks. Drake had leaked information on NSA programs violating many US laws, Ellsberg had released the “Pentagon papers” proving that the public had been lied to by different US governments when it came to the Vietnam war. Ellsberg, Drake, Snowden. 3 whistleblowers, 3 stories of personal sacrifice and courage[1]. 3 stories about heroes.

All of them emphasized how important better infrastructure for leaks was. How important it was that the hacker community would provide better tools and tutorials that help keep informers anonymous and protected. How central it was to make OpSec (operations security) easier for journalists and potential whistleblowers. Snowden especially voiced how well he understood people not leaking anything when faced with the complete destruction of their lives as they know them.

And the community did actually try to deliver. SecureDrop was presented as a somewhat simpler way for journalists to supply a drop site for hot documents and the Minilock project is supposed to make the encryption of files much easier and less error-prone.

But in between the celebration of the courage of individuals and tools helping such individuals something was missing.

Maybe it was the massive presence of Snowden or maybe the constant flow of new details about his leaks but in our focus on and fascination for the whistleblower(s) and their work we as a community have somewhat forgotten to think about politics and policies, about what it actually is that “we” want.

Whistleblowing can be important, can change the world actually. But it is not politics. Whistleblowing can be the emergency brake for political processes and structures. But sadly nothing more.

Just creating some sort of transparency (and one could argue that Snowden’s leak has not really created even that, since just a selected elite of journalists is allowed to access the treasure chest) doesn’t really change anything. Look at the Snowden leaks: one year full of articles and columns and angry petitions. But nothing changed. In spite of transparency things are mostly going on as they did before. In fact, certain governments such as the German one have talked about actually raising the budget for (counter)intelligence. The position of us as human beings in this cyberphysical world has actually gotten worse.

Simple solutions are really charming. We need a few courageous people. And we can build some tech to lower the courage threshold, tools protecting anonymity. Problem solved, back to the playground. We’ve replaced political theory, structures, activism and debate with one magic word: Whistleblowing. But that’s not how it works.

What happens after the leak? Why do we think that a political system that has created and legitimized the surveillance and intelligence state time and time again would autocorrect itself just because we drop some documents into the world? Daniel Ellsberg called it “telling the truth with documents”. But just telling some truth isn’t enough.

It’s time to stop hiding behind the hope for whistleblowers and their truth. To stop dreaming of a world that would soon be perfect if “the truth” is just out there. That’s how conspiracy nuts think.

“Truth” can be a resource to create ideas and policy from. To create action. But that doesn’t happen automagically and it’s not a job we can just outsource to the media because they know all that weird social and political stuff. Supporting the works of whistleblowers is important and I was happy to see so many initiatives, but they can get us at most a few steps forward on our way to fixing the issues of our time.

Header image by: Kate Ter Haar

  1. I have written about the problem I have with the way Snowden is framed (not him as a person or with his actions) here


Posts for Sunday, July 13, 2014


Anonymous edits in Hellenic Wikipedia from Hellenic Parliament IPs

Inspired by another project called “Anonymous Wikipedia edits from the Norwegian parliament and government offices” I decided to create something similar for the Hellenic Parliament.

I downloaded the XML dumps (elwiki-20140702-pages-meta-history.xml.7z) for the elwiki. The compressed file is less than 600 MB, but uncompressing it yields a 73 GB XML file which contains the full history of edits. Then I modified a parser I found on this blog to extract the data I wanted: Page Title, Timestamp and IP.

Then it was easy to create a list that contains all the edits that have been made from Hellenic Parliament IPs throughout the history of Hellenic Wikipedia:
The list
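
The extraction step could look roughly like the following Python sketch. It is not the parser mentioned above: it assumes the usual layout of a MediaWiki XML export (a page with a title, followed by revisions that each carry a timestamp and, for anonymous edits, an ip element inside contributor). The function name, the sample data and the address range 192.0.2.0/24 (a range reserved for documentation) are placeholders, not the real parliament range.

```python
import io
import ipaddress
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace: '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def anonymous_edits(source, network):
    """Yield (title, timestamp, ip) for anonymous edits made from `network`."""
    net = ipaddress.ip_network(network)
    title = None
    # iterparse streams the file, so the full dump never has to fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            timestamp = ip = None
            for child in elem.iter():
                if local(child.tag) == "timestamp":
                    timestamp = child.text
                elif local(child.tag) == "ip":
                    ip = child.text
            if ip:
                try:
                    if ipaddress.ip_address(ip) in net:
                        yield title, timestamp, ip
                except ValueError:
                    pass  # not a parseable IP address, skip it
            elem.clear()  # drop the revision text we no longer need
        elif name == "page":
            elem.clear()


# Tiny hand-written sample standing in for the real dump (hypothetical data).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example page</title>
    <revision>
      <timestamp>2014-02-17T10:00:00Z</timestamp>
      <contributor><ip>192.0.2.15</ip></contributor>
    </revision>
    <revision>
      <timestamp>2014-02-18T11:00:00Z</timestamp>
      <contributor><username>SomeUser</username></contributor>
    </revision>
    <revision>
      <timestamp>2014-02-19T12:00:00Z</timestamp>
      <contributor><ip>198.51.100.7</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

edits = list(anonymous_edits(io.BytesIO(SAMPLE.encode("utf-8")), "192.0.2.0/24"))
print(edits)
```

Only the first revision in the sample is reported: the second was made by a logged-in user and the third comes from an address outside the chosen range.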

Interesting edits

  1. Former Prime Minister “Κωνσταντίνος Σημίτης”
    An IP from inside the Hellenic Parliament tried to remove the following text at least 3 times on 17-18/02/2014. This is a link to the first edit: Diff 1.

    For the period 1996-2001, 5.2 trillion drachmas were spent on armaments. The expenditure of the 2nd ΕΜΠΑΕ (2001-2006) is estimated to have reached 6 to 7 trillion drachmas.<ref name="enet_01_08_01">[ ''The cost of armaments''], newspaper ”Ελευθεροτυπία”, published [[1 August]] [[2001]].</ref>Following the arrest and guilt of Γ. Καντάς, there are suspicions about his involvement in the scandal of kickbacks from German companies in armaments procurement, which is being investigated by the Bremen Public Prosecutor’s Office.

  2. Former MP “Δημήτρης Κωνσταντάρας”
    Someone modified his biography twice. Diff Links: Diff 1 Diff 2.
  3. Former football player “Δημήτρης Σαραβάκος”
    In the following edit someone updated this player’s bio adding that he ‘currently plays in porn films’. Diff link. The same editor seems to have removed that reference later, diff link.
  4. Former MP “Θεόδωρος Ρουσόπουλος”
    Someone wanted to update this MP’s bio and remove a reference to a scandal. Diff link.
  5. The movie “Ραντεβού με μια άγνωστη”
    Someone added a claim that the nude scenes probably do not show the actress named “Έλενα Ναθαναήλ”. Diff link.
  6. The soap opera “Χίλιες και Μία Νύχτες (σειρά)”
    Someone created the first version of the article on this soap opera. Diff Link.
  7. Politician “Γιάννης Λαγουδάκος”
    Someone edited his bio so it seemed that he would run for MP with the political party called “Ανεξάρτητοι Έλληνες”. Diff Link
  8. University professor “Γεώργιος Γαρδίκας”
    Someone edited his profile and added a link for amateur football team “Αγιαξ Αιγάλεω”. Diff Link.
  9. Politician “Λευτέρης Αυγενάκης”
    Someone wanted to fix his bio and upload a file, so he/she added a link from the local computer “C:\Documents and Settings\user2\Local Settings\Temp\ΑΥΓΕΝΑΚΗΣ”. Diff link.
  10. MP “Κώστας Μαρκόπουλος”
    Someone wanted to fix his bio regarding his return to the “Νέα Δημοκρατία” political party. Diff Link.
  11. (Golden Dawn) MP “Νίκος Μιχαλολιάκος”
    Someone was trying to “fix” his bio removing some accusations. Diff Link.
  12. (Golden Dawn) MP “Ηλίας Κασιδιάρης”
    Someone was trying to fix his bio and remove various accusations and incidents. Diff Link 1, Diff Link 2, Diff Link 3.

Who made the edits?
The IP range of the Hellenic Parliament is used not only by MPs but also by people working in the parliament. Don’t rush to any conclusions…
Oh, and the IP is probably a proxy inside the Parliament.

Threat Model
Not that it matters a lot for MPs and politicians in general, but it’s quite interesting that if someone “anonymously” edits a Wikipedia article, Wikimedia stores the IP of the editor and provides it to anyone who wants to download the wiki archives. If the IP range is known, or someone has the legal authority within a country to force an ISP to reveal the owner of an IP, it is quite easy to identify the actual person behind an “anonymous” edit. But if someone creates an account to edit Wikipedia articles, Wikimedia does not publish the IPs of its users; the account database is private. To get the IP of a user, one would need to take Wikimedia to court to force it to reveal that account’s IP address. Since the edit history of every Wikipedia article is available for anyone to download, one is actually “more anonymous to the public” when logging in or creating a (new) account every time before editing an article than when editing the same article without an account. Unless someone is afraid that Wikimedia will leak or disclose their account’s IPs.
So depending on their threat model, people can choose whether they want to create (new) account(s) before editing an article or not :)

Similar Projects

  • Parliament WikiEdits
  • congress-edits
  • Riksdagen redigerar
  • Stortinget redigerer
  • AussieParl WikiEdits
  • anon
  • Bonus
    Anonymous edit from “Synaspismos Political Party” (ΣΥΡΙΖΑ) address range for “Δημοκρατική Αριστερά” political party article, changing its youth party blog link to the PASOK youth party blog link. Diff Link

    Posts for Wednesday, July 9, 2014


    Segmentation fault when emerging packages after libpcre upgrade?

    SELinux users might be facing failures when emerge is merging a package to the file system, with an error that looks like so:

    >>> Setting SELinux security labels
    /usr/lib64/portage/bin/ line 1112: 23719 Segmentation fault      /usr/sbin/setfiles "${file_contexts_path}" -r "${D}" "${D}"
     * ERROR: dev-libs/libpcre-8.35::gentoo failed:
     *   Failed to set SELinux security labels.

    This has been reported as bug 516608 and, after some investigation, the cause has been found. First the quick workaround:

    ~# cd /etc/selinux/strict/contexts/files
    ~# rm *.bin

    And do the same for the other SELinux policy stores on the system (targeted, mcs, mls, …).

    Now, what is happening… Inside the mentioned directory, binary files exist such as file_contexts.bin. These files contain the compiled regular expressions of the non-binary files (like file_contexts). By using the precompiled versions, regular expression matching by the SELinux utilities is a lot faster. Not that it is massively slow otherwise, but it is a nice speed improvement nonetheless.

    However, when pcre updates occur, then the basic structures that pcre uses internally might change. For instance, a number might switch from a signed integer to an unsigned integer. As pcre is meant to be used within the same application run, most applications do not have any issues with such changes. However, the SELinux utilities effectively serialize these structures and later read them back in. If the new pcre uses a changed structure, then the read-in structures are incompatible and even corrupt.

    Hence the segmentation faults.

    To resolve this, Stephen Smalley created a patch that includes PCRE version checking. This patch is now included in sys-libs/libselinux version 2.3-r1. The package also recompiles the existing *.bin files so that the older binary files are no longer on the system. But there is a significant chance that this update will not trickle down to the users in time, so the workaround might be needed.
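
The idea behind such a version check can be sketched in a few lines of Python. This is only an illustration, not the libselinux patch and not the real file_contexts.bin layout: the header format, the function names and the version strings are all made up for the example. The point is that the cache records which engine version produced it and is ignored as soon as that version no longer matches.

```python
def write_cache(engine_version: str, payload: bytes) -> bytes:
    """Prefix the serialized data with the version of the engine that built it."""
    header = engine_version.encode("ascii")
    return len(header).to_bytes(4, "little") + header + payload


def read_cache(blob: bytes, engine_version: str):
    """Return the payload, or None when the cache was built by another version."""
    size = int.from_bytes(blob[:4], "little")
    stored = blob[4:4 + size].decode("ascii")
    if stored != engine_version:
        # Stale cache: the caller should fall back to the text file
        # and recompile instead of reading incompatible structures.
        return None
    return blob[4 + size:]


# Hypothetical version numbers, used for illustration only.
blob = write_cache("8.34", b"compiled-regexes")
print(read_cache(blob, "8.34"))
print(read_cache(blob, "8.35"))
```

With a check like this, an upgrade of the regular expression library costs one recompilation instead of a crash.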

    I considered updating the pcre ebuilds as well with this workaround, but considering that libselinux is most likely to be stabilized faster than any libpcre bump I let it go.

    At least we have a solution for future upgrades; sorry for the noise.

    Edit: libselinux-2.2.2-r5 also has the fix included.

    Posts for Wednesday, July 2, 2014


    Multilib in Gentoo

    One of the areas in Gentoo that is seeing lots of active development is its ongoing effort to have proper multilib support throughout the tree. In the past, this support was provided through special emulation packages, but those have the (serious) downside that they are often outdated, sometimes even having security issues.

    But this active development is not because we all just started looking in the same direction. No, it’s thanks to a few developers who have put their shoulders to the wheel, directing the development workload where needed and pressing other developers to help in this endeavor. And pushing is more than just creating bug reports and telling developers to do something.

    It is also about communicating, giving feedback and patiently helping developers when they have questions.

    I can only hope that other activities within Gentoo with a potentially broad impact are handled this way as well. Kudos to all involved, as well as to all developers who have undoubtedly put numerous hours of development effort into making their ebuilds multilib-capable (I know I had to put lots of effort into it, but I find it worthwhile and a big learning opportunity).

    Posts for Monday, June 30, 2014


    D-Bus and SELinux

    After a post about D-Bus comes the inevitable related post about SELinux with D-Bus.

    Some users might not know that D-Bus is an SELinux-aware application. That means it has SELinux-specific code in it, which makes the behavior of D-Bus depend on the SELinux policy (and it might not necessarily honor the “permissive” flag). This code is used as an additional authentication control within D-Bus.

    Inside the SELinux policy, a dbus permission class is supported, even though the Linux kernel doesn’t do anything with this class. The class is purely for D-Bus, and it is D-Bus that checks the permission (although work is under way to implement D-Bus in the kernel (kdbus)). The class supports two permission checks:

    • acquire_svc which tells which domain(s) are allowed to “own” a service (whose label might, thanks to the SELinux support, differ from that of the domain itself)
    • send_msg which tells which domain(s) can send messages to a service domain

    Inside the D-Bus security configuration (the busconfig XML file, remember) a service configuration might tell D-Bus that the service itself is labeled differently from the process that owned the service. The default is that the service inherits the label from the domain, so when dnsmasq_t registers a service on the system bus, then this service also inherits the dnsmasq_t label.

    The permissions needed for the sysadm_t user domain to send messages to the dnsmasq service, and for the dnsmasq service to register itself as a service, are:

    allow dnsmasq_t self:dbus { acquire_svc send_msg };
    allow sysadm_t dnsmasq_t:dbus send_msg;
    allow dnsmasq_t sysadm_t:dbus send_msg;

    For the sysadm_t domain, the two rules are needed as we usually not only want to send a message to a D-Bus service, but also receive a reply (which is also handled through a send_msg permission but in the inverse direction).

    However, with the following XML snippet inside its service configuration file, owning a certain resource is checked against a different label:

      <associate own=""
                 context="system_u:object_r:dnsmasq_dbus_t:s0" />

    With this, the rules would become as follows:

    allow dnsmasq_t dnsmasq_dbus_t:dbus acquire_svc;
    allow dnsmasq_t self:dbus send_msg;
    allow sysadm_t dnsmasq_t:dbus send_msg;
    allow dnsmasq_t sysadm_t:dbus send_msg;

    Note that only the access for acquiring a service based on a name (i.e. owning a service) is checked based on the different label. Sending and receiving messages is still handled by the domains of the processes (actually the labels of the connections, but these are always the process domains).
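
To make the effect of these rules easier to follow, here is a toy model in Python. It is not how D-Bus or libselinux evaluate policy; it simply stores the second rule set from above as tuples and mimics the two checks described in this post. The helper names and the user_t example are assumptions made for the illustration.

```python
# (source domain, target label, class, permission), taken from the rules above;
# "self" is written out as the domain itself.
RULES = {
    ("dnsmasq_t", "dnsmasq_dbus_t", "dbus", "acquire_svc"),
    ("dnsmasq_t", "dnsmasq_t", "dbus", "send_msg"),
    ("sysadm_t", "dnsmasq_t", "dbus", "send_msg"),
    ("dnsmasq_t", "sysadm_t", "dbus", "send_msg"),
}


def allowed(source, target, perm, rules=RULES):
    """True when an allow rule for the dbus class covers this access."""
    return (source, target, "dbus", perm) in rules


def can_own(domain, service_label=None):
    """Owning a name is checked against the associated label, if there is one.

    Without an <associate> entry the service inherits the label of the domain.
    """
    return allowed(domain, service_label or domain, "acquire_svc")


def can_call(client, service_domain):
    """A method call needs send_msg in both directions (request and reply)."""
    return (allowed(client, service_domain, "send_msg")
            and allowed(service_domain, client, "send_msg"))


print(can_own("dnsmasq_t", "dnsmasq_dbus_t"))  # owner with the associated label
print(can_own("sysadm_t", "dnsmasq_dbus_t"))   # another domain trying to own it
print(can_call("sysadm_t", "dnsmasq_t"))       # request plus reply
print(can_call("user_t", "dnsmasq_t"))         # no rules for this domain
```

Only the owning check looks at dnsmasq_dbus_t; message passing is still decided purely on the process domains.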

    I am not aware of any policy implementation that uses a different label for owning services, and the implementation is more suited to “force” D-Bus to only allow services with a correct label. This ensures that other domains that might have enough privileges to interact with D-Bus and own a service cannot own these particular services. After all, other services don’t usually have the privileges (policy-wise) to acquire_svc a service with a different label than their own label.

    Posts for Sunday, June 29, 2014


    D-Bus, quick recap

    I’ve never fully investigated the what and how of D-Bus. I know it is some sort of IPC, but higher level than the POSIX IPC methods. After some reading, I think I am starting to understand how it works and how administrators can work with it. So a quick write-down is in order so I don’t forget in the future.

    There is one system bus and, for each X session of a user, also a session bus.

    A bus is governed by a dbus-daemon process. A bus itself has objects on it, which are represented through path-like constructs (like /org/freedesktop/ConsoleKit). These objects are provided by a service (application). Applications “own” such services, and identify these through a namespace-like value (such as org.freedesktop.ConsoleKit).
    Applications can send signals to the bus, or messages through methods exposed by the service. If methods are invoked (i.e. messages sent) then the application must specify the interface (such as org.freedesktop.ConsoleKit.Manager.Stop).

    Administrators can monitor the bus through dbus-monitor, or send messages through dbus-send. For instance, the following command invokes the org.freedesktop.ConsoleKit.Manager.Stop method provided by the object at /org/freedesktop/ConsoleKit owned by the service/application at org.freedesktop.ConsoleKit:

    ~$ dbus-send --system --print-reply 

    What I found most interesting however was to query the busses. You can do this with dbus-send although it is much easier to use tools such as d-feet or qdbus.

    To list current services on the system bus:

    ~# qdbus --system

    The numbers are generated by D-Bus itself, the namespace-like strings are taken by the objects. To see what is provided by a particular service:

    ~# qdbus --system org.freedesktop.PolicyKit1

    The methods made available through one of these:

    ~# qdbus --system org.freedesktop.PolicyKit1 /org/freedesktop/PolicyKit1/Authority
    method QDBusVariant org.freedesktop.DBus.Properties.Get(QString interface_name, QString property_name)
    method QVariantMap org.freedesktop.DBus.Properties.GetAll(QString interface_name)
    property read uint org.freedesktop.PolicyKit1.Authority.BackendFeatures
    property read QString org.freedesktop.PolicyKit1.Authority.BackendName
    property read QString org.freedesktop.PolicyKit1.Authority.BackendVersion
    method void org.freedesktop.PolicyKit1.Authority.AuthenticationAgentResponse(QString cookie, QDBusRawType::(sa{sv} identity)
    method void org.freedesktop.PolicyKit1.Authority.CancelCheckAuthorization(QString cancellation_id)
    signal void org.freedesktop.PolicyKit1.Authority.Changed()

    Access to methods and interfaces is governed through XML files in /etc/dbus-1/system.d (or session.d depending on the bus). Let’s look at /etc/dbus-1/system.d/dnsmasq.conf as an example:

    <!DOCTYPE busconfig PUBLIC
     "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
            <policy user="root">
                    <allow own=""/>
                    <allow send_destination=""/>
            <policy context="default">
                    <deny own=""/>
                    <deny send_destination=""/>

    The configuration mentions that only the root Linux user can ‘assign’ a service/application to the name, and root can send messages to this same service/application name. The default is that no-one can own and send to this service/application name. As a result, only the Linux root user can interact with this object.
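
The outcome described above can be reproduced with a small Python model. This is a deliberately simplified sketch and not the real dbus-daemon logic: it only handles allow and deny rules for own and send_destination, applies the default context first and the per-user policy afterwards, lets the last matching rule win, and starts from "deny". The name org.example.Service is a placeholder and not the name used in dnsmasq.conf.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration with the same shape as the example above.
CONFIG = """
<busconfig>
  <policy user="root">
    <allow own="org.example.Service"/>
    <allow send_destination="org.example.Service"/>
  </policy>
  <policy context="default">
    <deny own="org.example.Service"/>
    <deny send_destination="org.example.Service"/>
  </policy>
</busconfig>
"""


def decide(config, user, attribute, name):
    """Return True if `user` may `attribute` (own / send_destination) `name`."""
    policies = ET.fromstring(config).findall("policy")
    # Default policies are applied first, user-specific ones afterwards,
    # regardless of the order in which they appear in the file.
    ordered = [p for p in policies if p.get("context") == "default"]
    ordered += [p for p in policies if p.get("user") == user]
    verdict = False  # simplification: nothing is allowed unless a rule says so
    for policy in ordered:
        for rule in policy:
            if rule.get(attribute) == name:
                verdict = rule.tag == "allow"  # last matching rule wins
    return verdict


print(decide(CONFIG, "root", "own", "org.example.Service"))
print(decide(CONFIG, "nobody", "own", "org.example.Service"))
print(decide(CONFIG, "nobody", "send_destination", "org.example.Service"))
```

The user-specific policy overrides the default even though it comes first in the file, which is why root ends up being allowed.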

    D-Bus also supports starting of services when a method is invoked (instead of running this service immediately). This is configured through *.service files inside /usr/share/dbus-1/system-services/.

    We, the lab rats

    The algorithm constructing your Facebook feed is one of the most important aspects of Facebook’s business. Making sure that you see all the things you are interested in while skipping the stuff you don’t care about is key to keeping you engaged and interested in the service. On the other hand Facebook needs to understand how you react to certain types of content to support its actual business (making money from ads or “boosted” posts).

    So it’s no surprise that Facebook is changing and tweaking the algorithm every day. And every new iteration will be released to a small group of the population to check how it changes people’s behavior and engagement, to see if it’s a better implementation than the algorithm used before. Human behavior boiled down to a bunch of numbers.

    The kind and amount of data that Facebook sits on is every social scientist’s dream: Social connection, interactions, engagement metrics, and deeply personal content all wrapped up in one neat structured package with a bow on top. And Facebook is basically the only entity with full access: There is no real open set of similar data points to study and understand human behavior from.

    So the obvious happened. Facebook and some scientists worked together to study human behavior. To put it in a nutshell: they picked almost 700,000 Facebook users and changed the way their feed worked. Some got more “negative” posts, some more “positive” posts and the scientists measured how that changed people’s behavior (by seeing how their language changed in their own posts). Result: the mood of the things you read does change your own behavior and feelings congruently. Read positive stuff and feel better, read negative stuff and feel worse. This is news because so far we only knew this from direct social interaction, not from interaction mediated through the Internet (the result might not surprise people who believe that the Internet and the social interactions in it are real though).

    Many people have criticized this study and for very different reasons, some valid, some not.

    The study is called scientifically unethical: the subjects didn’t know that their behavior was monitored and that they were in an experiment. It is obviously often necessary to leave people somewhat in the dark about the actual goal of an experiment in order to make sure that the results remain untainted, but it’s the scientific standard to tell people that they are in an experiment, and to tell them what was going on after the experiment concludes. (Usually, with experiments that change people’s behavior so deeply, you would even consider adding psychological exit counselling for participants.) This critique is fully legitimate and it’s something the scientists will have to answer for. Not Facebook, because they tweak their algorithm each day with people’s consent (EULA etc.), but that’s nothing the scientists can fall back on. What happened is a certain breach of trust: the deal is that Facebook can play with the algorithm as much as they want so long as they try to provide me with more relevance. They changed their end of the bargain (not even with bad intentions, but they did it non-transparently), which slightly taints people’s (and my) relationship to the company.

    From a purely scientific standpoint the study is somewhat problematic. Not because of their approach, which looks solid after reading their paper, but because no one but them can reproduce their results. It’s closed source science so it cannot really be peer reviewed. Strictly speaking we can only consider their paper an idea because their data could basically be made up (not that I want to imply that, but we can’t check anything). Not good science, though sadly that is how many studies are.

    Most of the blame lands on the scientists. They should have known that their approach was wrong. The potential data was seductive but they would have had to force Facebook to do this more transparently. The best way would have been an opt-in: “The scientists want to study human interaction so they ask to get access to certain statistics of your feed. They will at no point be able to read all your posts or your complete feed. Do you want to participate? [Yes] [No]”. A message to the people who were part of the study after it concluded, with a way to remove their data set from the study as a sort of punishment for breaking trust, would have been the least they could have done.

    Whenever you work with people and change their lives you run risks. What happens if one of the people whose feed you worsen is depressed? What will that do to him or her? The scientists must have thought about that but decided not to care. There are many words we could find for that kind of behavior: Disgusting. Assholeish. Sociopathic.

    It’s no surprise that Facebook didn’t catch this issue because tweaking their feed is what they do all day. And despite all their rhetoric their users aren’t the center of their attention. We could put the bad capitalism stamp of disapproval on this thought and move on but it does show something Facebook needs to learn: users might not pay but without them Facebook is nothing. There is a lot of lock-in but when the trust in Facebook’s sincerity gets damaged too much, you open yourself up to competition and people leaving. There is still quite some trust, as the growth of users and interaction in spite of all the bad “oh noez, Facebook will destroy your privacy and life and kill baby seals!” press shows. But that’s not a given.

    Companies sitting on these huge amounts of social data have not only their shareholders to look out for but also their users. They need to establish ways for users to participate and to keep the companies honest. Build structures to get feedback from users or form groups representing users and their interests to them. That’s the actual thing Facebook can and should learn from this.

    For a small scientific step almost everybody lost: the scientists showed an alarming lack of awareness and ethics, Facebook an impressive lack of understanding of how important trust is, and the people using Facebook lost because their days might have been ruined for an experiment. Doesn’t look like a good exchange to me. But we shouldn’t let this put a black mark on the study of social behavior online.

    Studying how people interact is important to better understand what we do and how and why we do it. Because we want systems to be built in a way that suits us, helps us lead better, more fulfilling lives. We want technology to enrich our worlds. And for that we need to understand how we perceive and interact with them.

    In a perfect world we’d have a set of data that is open and that can be analyzed. Sadly we don’t so we’ll have to work with the companies having access to that kind of data. But as scientists we need to make sure that – no matter how great the insights we generate might be – we treat people with the dignity they deserve. That we respect their rights. That we stay honest and transparent.

    I’d love to say that we need to develop these rules because that would take some of the blame from the scientists involved, making the scientific community look less psychopathic. Sadly these rules and best practices have existed for ages now. And it’s alarming to see how many people involved in this project didn’t know them or respect them. That is the main lesson from this case: we need to take way better care of teaching scientists the ethics of science. Not just how to calculate and process data but how to treat others.

    Title image by: MoneyBlogNewz


    Posts for Friday, June 27, 2014

    The body as a source of data

    The quantified self is starting to penetrate beyond the tiny bubble of science enthusiasts and nerds. More health-related devices are starting to connect to the cloud (think scales and soon smart watches or heartrate monitors and similar wearables). Modern smartphones have built-in step counters or use GPS data to track movement and infer, from the path and the speed, the mode of transportation as well as the amount of calories probably spent. Apple’s new HealthKit as well as Google’s new GoogleFit APIs are pushing the gathering of data about one’s own body into the spotlight and potentially to a more mainstream demographic.

    Quantifying oneself isn’t always perceived in a positive light. Where one group sees ways to better understand their own body and how it influences their feelings and lives, others interpret the projection of body functions down to digital data as a mechanization of a natural thing, as something diminishing the human being, as humans kneeling under the force of capitalism and its implied necessity to optimize one’s employability and “worth”, and finally as a dangerous tool giving companies too much access to data about us and how we live and feel. What if our health insurance knew how little we sleep, how little we exercise and what bad dietary habits we entertain?

    Obviously there are holistic ways to think about one’s own body. You can watch yourself in the mirror for 5 minutes every morning to see if everything is OK. You can meditate and try to “listen into your body”. But seeing how many negative influences on one’s personal long-term health cannot really be felt until it is too late, a data-centric approach seems to be a reasonable path towards detecting dangerous (or simply unpleasant) patterns and habits.

    The reason why metrics in engineering are based on numbers is that this model of the world makes the comparison of two states simple: “I used to have a foo of 4, now my foo is 12.” Regardless of what that means, it’s easy to see that foo has increased, which can be translated into actions if necessary (“eat less stuff containing foo”). Even projecting feelings onto numbers can yield very useful results: “After sleeping for 5 hours my happiness throughout the day seems to average around 3, after sleeping 7 hours it averages around 5” can give a person useful input when deciding whether to sleep more or not, regardless of what exactly a happiness of “3” or “5” means in comparison to others.

    A human body is a complex machine. Chemical reactions and electric currents happen throughout it at a mindblowing speed. And every kind of data set, no matter how great the instrument used to collect it, only represents a tiny fraction of a perspective of a part of what constitutes a living body. Even if you aggregate all the data about a human being we can monitor and record these days, all you have is just a bunch of data. Good enough to mine for certain patterns suggesting certain traits or illnesses or properties but never enough to say that you actually know what makes a person tick.

    But all that data can be helpful to people for very specific questions. Tracking food intake and physical activity can help a person control their weight if they want to. Correlating sleep and performance can help people figure out what kind of schedule they should sleep on to feel as good as possible. And sometimes these numbers can just help you measure your own progress, for example whether you managed to beat your 10k record.

    With all the devices and data monitors we surround ourselves with, gathering huge amounts of data becomes trivial. And everyone can store that data on their own hard drives and develop and implement algorithms to analyse and use this source of information. Why do we need the companies who will just use the data to send us advertising in exchange for hosting our data?

    It comes back to the question whether telling people to host their own services and data is cynical. As I already wrote I do believe it is. Companies with defined standard APIs can help individuals who don’t have the skills or the money to pay people with said skills to learn more about their bodies and how they influence their lives. They can help make that mass of data manageable, queryable, actionable. Simply usable. That doesn’t mean that there isn’t a better way. That an open platform to aggregate one’s digital body representation wouldn’t be better. But we don’t have that, especially not for mainstream consumption.

    Given these thoughts I find recent comments on the dangers and evils of using one of the big companies to handle the aggregation of the data about your body somewhat classist. Because I believe that you should be able to understand your body better even if you can’t code or think of algorithms (or pay others to do that for you individually). The slippery slope argument that if the data exists somewhere it will very soon be used to trample on your rights and ruin your day doesn’t only rob certain people of the chance to improve their life or gain new insights, it actually enforces a pattern where people with fewer resources tend to get the short end of the stick when it comes to health and life expectancy.

    It’s always easy to tell people not to use some data-based product because of dangers for their privacy or something similar. It’s especially easy when you already have whatever that service is supposed to do for you. “Don’t use Facebook” is only a half-earnest argument if you (because of other social or political networks) do not need this kind of networking to participate in a debate or connect to others. It’s a deeply paternalist point of view and carries a certain lack of empathy.

    Companies aren’t usually all that great, just as the capitalist system we live in isn’t great. “The market is why we can’t have nice things” as Mike Rugnetta put it in this week’s Idea Channel. But at least with companies you know their angle (Hint: It’s their bottom line). You know that they want to make money, and that their offering a service “for free” usually means that you pay with attention (through ads). There’s no evil conspiracy, no man with a cat on his lap saying “No Mr. Bond, I want you to DIE!”.

    But given that a company lets you access and export all the data you pour into their service, I can only urge you to consider whether the benefit their service can give you isn’t worth that handful of ads. Companies aren’t evil demons with magic powers. They are sociopathic and greedy, but that’s it.

    The belief that a company “just knows too much” if they gather data about your body in one place overestimates the truth that data carries. They don’t own your soul, nor can they now cast spells on you. Data you emit isn’t just a liability, something you need to keep locked up and avoid. It can also be your own tool, your light in the darkness.

    Header image by: SMI Eye Tracking


    Posts for Tuesday, June 24, 2014

    “The Open-Source Everything Revolution” and the boxology syndrome

    Yesterday @kunstreich pointed me to a rather interesting article in the Guardian, published under the ambitious title “The open source revolution is coming and it will conquer the 1% – ex CIA spy“. We’ll pause for a second while you read the article.

    For those unwilling to read it or with a limited amount of time available, here’s my executive summary. Robert David Steele, who worked for the CIA for quite a while, at some point wanted to introduce more Open Source practices into the intelligence community. He realized that the whole secret tech and process thing didn’t scale and that gathering all those secret and protected pieces of information was mostly not worth the effort when there’s so much data out there in the open. He also figured out that our current western societies aren’t doing so well: The distribution of wealth and power is messed up and companies have, with help from governments, created a system where they privatize the commons and every kind of possible profit while having the public pay for most of the losses. Steele, who’s obviously a very well educated person, now wants to make everything open. Open source software, open governments, open data, “open society”[1] in order to fix our society and ensure a better future:


    Open Source Everything (from the Guardian)

    Steele’s vision sounds charming: When there is total knowledge and awareness, problems can be easily detected and fixed. Omniscience as the tool to a perfect world. This actually fits quite well into the intelligence agency mindset: “We need all the information to make sure nothing bad will happen. Just give us all the data and you will be safe.” And Steele does not want to abolish intelligence agencies, he wants to make them transparent and open (the question remains whether they could then still be considered intelligence agencies by our common definition).

    But there are quite a few problems with Steele’s revolutionary manifesto. It basically suffers from “Boxology Syndrome”.

    The boxology syndrome is a déformation professionnelle that many people in IT and modelling suffer from. It’s characterized by the belief that every complex problem and system can be sufficiently described by a bunch of boxes and connecting lines. It happens in IT because the object-oriented design approach teaches exactly that kind of thinking: Find the relevant terms and items, make them classes (boxes) and see how they connect. Now you’ve modeled the domain and the problem solution. That was easy!

    But life tends to be messy and confusing. The world doesn’t seem to like living in boxes, just as people don’t.

    Open source software is brilliant. I love how my Linux systems[2] work transparently and allow me to change how they work according to my needs. I love how I can dive into existing apps and libraries to pick pieces I want to use for other projects, how I can patch and mix things to better serve my needs. But I am the minority.

    Image by: velkr0

    Steele uses the word “open” as a silver bullet for … well … everything. He rehashes the ideas from David Brin’s “The Transparent Society” but seems to be working very hard to not use the word transparent. In many cases that seems to be what he is actually going for, but it feels like he is avoiding the connotations attached to the word when it comes to people and societies: In a somewhat obvious attempt to openwash, he reframes Brin’s ideas by attaching the generally positively connotated word “open”.

    But open data and open source software do not magically make everyone capable of seizing these newfound opportunities. Some people have the skills, the resources, the time and the interest to get something out of it, some people can pay people with the skills to do what they want to get done. And many, many people are just left alone, possibly swimming in a digital ocean way too deep and vast to see any kind of ground or land. Steele ignores the privilege of the educated and skilled few or somewhat naively hopes that, out of generosity, they’ll cover the needs of those unable to serve their own. Which could totally happen, but do we really want to bet the future on the selflessness and generosity of everyone?

    Transparency is not a one-size-fits-all solution. We have different levels of transparency we require from the government or companies we interact with or that person serving your dinner. Some entities might offer more information than required (which is especially true for people who can legally demand very little transparency from each other but share a lot of information for their own personal goals and interests).

    Steele’s ideas – which are really seductive in their simplicity – don’t scale. Because he ignores the differences in power, resources and influence between social entities. And because he assumes that – just because you know everything – you will make the “best” decision.

    There is a lot of social value in having access to a lot of data. But data, algorithms and code are just a small part of what can create good decisions for society. There hardly ever is the one best solution. We have to talk and exchange positions and haggle to find an accepted and legitimized solution.

    Boxes and lines just don’t cut it.

    Title image by: Simona

    1. whatever that is supposed to mean
    2. I don’t own any computer with proprietary operating systems except for my gaming consoles


    Posts for Sunday, June 22, 2014


    Chroots for SELinux enabled applications

    Today I had to prepare a chroot jail (thank you grsecurity for the neat additional chroot protection features) for a SELinux-enabled application. As a result, “just” making a chroot was insufficient: the application needed access to /sys/fs/selinux. Of course, granting access to /sys is not something I like to see for a chroot jail.

    Luckily, all other accesses are not needed, so I was able to create a static /sys/fs/selinux directory structure in the chroot, and then just mount the SELinux file system on that:

    ~# mount -t selinuxfs none /var/chroot/sys/fs/selinux
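
    For future reference, the two steps above (create the empty directory skeleton, then mount the SELinux file system on it) could be wrapped in a small helper. This is only a sketch: the run and setup_selinuxfs names and the DRY_RUN switch are made up for illustration, and by default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: helper names and the DRY_RUN switch are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create only the /sys/fs/selinux skeleton inside the chroot (not a full /sys)
# and mount the SELinux pseudo file system on it. The mount needs root.
setup_selinuxfs() {
    chroot_dir="$1"
    run mkdir -p "$chroot_dir/sys/fs/selinux"
    run mount -t selinuxfs none "$chroot_dir/sys/fs/selinux"
}
```

    Calling setup_selinuxfs /var/chroot prints the two commands; with DRY_RUN=0 set, and run as root, it executes them.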

    In hindsight, I probably could just have created a /selinux location as that location, although deprecated, is still checked by the SELinux libraries.

    Anyway, there was a second requirement: access to /etc/selinux. Luckily it was purely for read operations, so I was first contemplating copying the data and doing a chmod -R a-w /var/chroot/etc/selinux, but then considered a bind-mount:

    ~# mount -o bind,ro /etc/selinux /var/chroot/etc/selinux

    Alas, bad luck: the read-only flag is ignored during the mount, and the bind-mount is still read-write. A simple article informed me about the solution: I need to do a remount afterwards to enable the read-only state:

    ~# mount -o remount,ro /var/chroot/etc/selinux
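
    The bind-then-remount sequence, plus a check that the result really is read-only, could be sketched as below. The function names and the DRY_RUN switch are assumptions made for illustration. One deliberate difference from the command above: the sketch passes bind along with remount,ro, which the mount(8) manual page calls for on systems where /etc/mtab is not a regular file, so that only this mount point and not the whole underlying file system is switched to read-only.

```shell
#!/bin/sh
# Sketch only: function names and the DRY_RUN switch are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Bind-mount src onto dst, then remount so the read-only flag is applied.
bind_mount_ro() {
    src="$1"
    dst="$2"
    run mount -o bind "$src" "$dst"
    run mount -o remount,bind,ro "$dst"
}

# Succeed if dst is listed with the "ro" option in a mounts table.
# The table defaults to /proc/mounts; the second argument exists for testing.
is_mounted_ro() {
    awk -v d="$1" '
        $2 == d {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { exit(ro ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

    After a real run (DRY_RUN=0, as root), is_mounted_ro /var/chroot/etc/selinux should succeed, while the original /etc/selinux location stays writable.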

    Great! And because my brain isn’t what it used to be, I’m just writing a quick blog post for future reference ;-)

    Planet Larry is not officially affiliated with Gentoo Linux. Original artwork and logos copyright Gentoo Foundation. Yadda, yadda, yadda.