Posts for Saturday, August 29, 2015


Doing away with interfaces

CIL is SELinux's Common Intermediate Language, which brings a whole new set of possibilities to policy development. I hardly know CIL but am (slowly) learning. Of course, the best way to learn is to try and do lots of things with it, but real-life work and time-to-market for now force me to stick with the M4-based reference policy.

Still, I do try out some things here and there, and one of the things I wanted to look into was how CIL policies would deal with interfaces.

Recap on interfaces

With the M4-based reference policy, interfaces are M4 macros that expand into standard SELinux rules. They are used by the reference policy to isolate module-specific code and to provide "public" calls.

Policy modules are not allowed (by convention) to reference types or domains that are not defined by the same module. If they want to interact with those other modules, they need to call the relevant interface(s):

# module "ntp"
# domtrans: when executing an ntpd_exec_t binary, the resulting process 
#           runs in ntpd_t
interface(`ntp_domtrans',`
  domtrans_pattern($1, ntpd_exec_t, ntpd_t)
')

# module "hal"
ntp_domtrans(hald_t)

In the above example, the purpose is to have hald_t be able to execute binaries labeled as ntpd_exec_t and have the resulting process run as the ntpd_t domain.

The following would not be allowed inside the hal module:

domtrans_pattern(hald_t, ntpd_exec_t, ntpd_t)

This would imply that hald_t, ntpd_exec_t and ntpd_t are all defined by the same module, which is not the case.

Interfaces in CIL

It seems that CIL will not use interface files. Perhaps some convention surrounding them will be created - to know that, we'll have to wait until a "cilrefpolicy" comes to be. Functionally, however, they are no longer necessary.

Consider the myhttp_client_packet_t declaration from a previous post. In it, we wanted to allow mozilla_t to send and receive these packets. The example didn't use an interface-like construction for this, so let's see how this would be dealt with.

First, the module is slightly adjusted to create a macro called myhttp_sendrecv_client_packet:

(macro myhttp_sendrecv_client_packet ((type domain))
  (typeattributeset cil_gen_require domain)
  (allow domain myhttp_client_packet_t (packet (send recv)))
)

Another module would then call this:

(call myhttp_sendrecv_client_packet (mozilla_t))

That's it. When the policy modules are both loaded, then the mozilla_t domain is able to send and receive myhttp_client_packet_t labeled packets.
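To try this out, both snippets can be loaded directly with semodule, just like the CIL example further down this blog. The file names below are purely illustrative (any name ending in .cil will do):

~# semodule -i myhttp.cil myhttp-mozilla.cil
~# seinfo -t | grep myhttp

If both modules load without errors, a sesearch on mozilla_t should show the new packet permission.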

There's more: namespaces

But it doesn't end there. Whereas the reference policy has a single namespace for its interfaces, CIL supports namespaces, which allows for an almost object-like approach to policy development.

The above myhttp_client_packet_t definition could be written as follows:

(block myhttp
  ; MyHTTP client packet
  (type client_packet_t)
  (roletype object_r client_packet_t)
  (typeattributeset client_packet_type (client_packet_t))
  (typeattributeset packet_type (client_packet_t))

  (macro sendrecv_client_packet ((type domain))
    (typeattributeset cil_gen_require domain)
    (allow domain client_packet_t (packet (send recv)))
  )
)

The other module looks as follows:

(block mozilla
  (typeattributeset cil_gen_require mozilla_t)
  (call myhttp.sendrecv_client_packet (mozilla_t))
)

The result is similar, but not fully the same. The packet type is no longer called myhttp_client_packet_t but myhttp.client_packet_t. In other words, a period (.) separates the namespace (myhttp) from the type (client_packet_t) and from the macro (sendrecv_client_packet):

~$ sesearch -s mozilla_t -c packet -p send -Ad
  ...
  allow mozilla_t myhttp.client_packet_t : packet { send recv };

And it looks like namespace support goes even further than that, but I still need to learn more about it first.

Still, I find this a good evolution. With CIL, interfaces are no longer separate from the module definition: everything is inside the CIL file. I secretly hope that tools such as seinfo will support querying macros as well.

Posts for Tuesday, August 25, 2015


Slowly converting from GuideXML to HTML

Gentoo has removed its support for the older GuideXML format in favor of the Gentoo Wiki and a new content management system for the main site (or is it static pages? I don't have the faintest idea, to be honest). I still have a few GuideXML pages in my development space, which I am going to move to HTML pretty soon.

In order to do so, I make use of the guidexml2wiki stylesheet I developed. But instead of migrating the pages to wiki syntax, I want to end up with HTML.

So what I do is first convert the file from GuideXML to MediaWiki with xsltproc.

Next, I use pandoc to convert this to restructured text. The idea is that the main pages on my devpage are now restructured text based. I was hoping to use markdown, but the conversion from markdown to HTML is not what I hoped it would be.

The restructured text is then converted to HTML using rst2html.py. In the end, I use the following function (for the one-time conversion):

# Convert GuideXML to reStructuredText and then to HTML
gxml2html() {
  basefile="${1%%.xml}";

  # Convert to MediaWiki syntax
  xsltproc ~/dev-cvs/gentoo/xml/htdocs/xsl/guidexml2wiki.xsl "$1" > "${basefile}.mediawiki"

  if [ -f "${basefile}.mediawiki" ] ; then
    # Convert to restructured text
    pandoc -f mediawiki -t rst -s -S -o "${basefile}.rst" "${basefile}.mediawiki";
  fi

  if [ -f "${basefile}.rst" ] ; then
    # Use your own stylesheet links (use full https URLs for this)
    rst2html.py --stylesheet=link-to-bootstrap.min.css,link-to-tyrian.min.css --link-stylesheet "${basefile}.rst" "${basefile}.html"
  fi
}
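For example, a one-off conversion of a hypothetical guide.xml then looks like this (file names purely for illustration):

~$ gxml2html guide.xml
~$ ls guide.*
guide.html  guide.mediawiki  guide.rst  guide.xml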

Is it perfect? No, but it works.

Posts for Saturday, August 22, 2015


Making the case for multi-instance support

With the high attention that technologies such as Docker, Rocket and the like are getting (I recommend looking at Bocker by Peter Wilmott as well ;-), I still find it important that technologies are capable of supporting a multi-instance environment.

Being able to run multiple instances makes for great consolidation. The system can be optimized for the technology, access to the system limited to the admins of said technology while still providing isolation between instances. For some technologies, running on commodity hardware just doesn't cut it (not all software is written for such hardware platforms) and consolidation allows for reducing (hardware/licensing) costs.

Examples of multi-instance technologies

A first example that I'm pretty familiar with is multi-instance database deployments: Oracle DBs, SQL Servers, PostgreSQLs, etc. The consolidation of databases while still keeping multiple instances around (instead of consolidating into a single instance) is mainly done for operational reasons (changes should not influence other databases/schemas) or technical reasons (different requirements in parameters, locales, etc.)

Other examples are web servers (for web hosting companies), which next to virtual host support (which is still part of a single instance) could benefit from multi-instance deployments for security reasons (vulnerabilities might be better contained that way) as well as performance tuning. The same goes for web application servers (such as Tomcat deployments).

But even other technologies like mail servers can benefit from multiple instance deployments. Postfix has a nice guide on multi-instance deployments and also covers some of the use cases for it.

Advantages of multi-instance setups

The primary objective that most organizations have when dealing with multiple instances is consolidation to reduce cost. Especially expensive, proprietary software that is licensed per CPU gains a lot from consolidation (and don't think a CPU is a CPU: each vendor has its own core weight table to get the most money out of its customers).

But beyond cost savings, multi-instance deployments also provide for resource sharing. A high-end server can be used to host the multiple instances, with for instance SSD disks (or even flash cards), more memory, high-end CPUs, high-speed network connectivity and more. This improves performance considerably, because most multi-instance technologies don't need all resources continuously.

Another advantage, if properly designed, is that multi-instance capable software can often leverage the multi-instance deployment for fast changes. A database might be easily patched (to remove vulnerabilities) by creating a second codebase deployment, patching that codebase, and then migrating the database from one instance to the other. Although this often still requires downtime, that downtime can be made considerably shorter, and rolling back such changes is very easy.

A last advantage that I see is security. Instances can run as different runtime accounts, with different SELinux contexts, bound to different interfaces or chrooted into different locations. This is not an advantage compared to dedicated systems of course, but rather compared to full consolidation (everything in a single instance).

Don't always focus on multi-instance setups though

Multi-instance setups aren't a silver bullet. Some technologies are simply much better off with a single instance per operating system. Personally, I find that such technologies should know better: if they are really designed to be suboptimal in multi-instance deployments, then that is a design error.

But when the advantages of multiple instances do not exist (no license cost, hardware cost is low, etc.) then organizations might focus on single-instance deployments, because

  • multi-instance deployments might require more users to access the system (especially when it is multi-tenant)
  • operational activities might impact other instances (for instance updating kernel parameters for one instance requires a reboot which affects other instances)
  • the software might not be properly "multi-instance aware" and as such starts fighting for resources with its own sibling instances

Given that properly designed architectures are well capable of using virtualization (and, in the future, containerization), moving towards single-instance deployments becomes more and more interesting.

What should multi-instance software consider?

Software should, imo, always consider multi-instance deployments. Even when the administrator decides to stick with a single instance, all it takes is that the software ends up in a "single instance" setup (it is much easier to support multiple instances and deploy a single one than to support single instances and deploy multiple ones).

The first thing software should take into account is that it might (and will) run under different runtime accounts - service accounts if you wish. That means that the software should be well aware that file locations are separate, and that these locations will have different access control settings on them (if not just a different owner).

So instead of using /etc/foo as the mandatory location, consider supporting /etc/foo/instance1, /etc/foo/instance2 if full directories are needed, or just have /etc/foo1.conf and /etc/foo2.conf. I prefer the directory approach, because it makes management much easier. It then also makes sense that the log location is /var/log/foo/instance1, the data files are at /var/lib/foo/instance1, etc.

The second is that, if a service is network-facing (which most of them are), it must be able to either use multihomed systems easily (bind to different interfaces) or use different ports. The latter is a challenge I often come across with software - the way to configure the software to deal with multiple deployments and multiple ports is often a lengthy trial-and-error setup.

What is so difficult about using a base port setting and documenting how the other ports are derived from it? Neo4J needs 3 ports for its enterprise services (transactions, cluster management and online backup), but they all need to be explicitly configured if you want a multi-instance deployment. What if one could just set baseport = 5001, with the software automatically selecting 5002 and 5003 as the other ports (or 6001 and 7001)? If the software needs another port in the future, there is no need to update the configuration (assuming the administrator leaves sufficient room).
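A minimal sketch of what such a derivation could look like in a (hypothetical) startup script, reusing the Neo4J example:

# Hypothetical sketch: derive the three service ports from a single base port
baseport=5001
tx_port=${baseport}               # transactions
cluster_port=$((baseport + 1))    # cluster management
backup_port=$((baseport + 2))     # online backup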

Also consider the service scripts (/etc/init.d) or similar (depending on the init system used). Don't provide a single one which only deals with one instance. Instead, consider supporting symlinked service scripts which automatically obtain the right configuration from its name.

For instance, a service script called pgsql-inst1 which is a symlink to /etc/init.d/postgresql could then look for its configuration in /var/lib/postgresql/pgsql-inst1 (or /etc/postgresql/pgsql-inst1).
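A rough sketch of how such a symlink-aware service script could derive its instance settings (hypothetical paths and commands, not an actual init script):

#!/bin/sh
# Hypothetical sketch: /etc/init.d/pgsql-inst1 is a symlink to this script,
# so the instance name can be derived from how it was invoked.
instance=$(basename "$0")                    # e.g. pgsql-inst1
conf_dir="/etc/postgresql/${instance}"
data_dir="/var/lib/postgresql/${instance}"

case "$1" in
  start) pg_ctl -D "${data_dir}" -o "-c config_file=${conf_dir}/postgresql.conf" start ;;
  stop)  pg_ctl -D "${data_dir}" stop ;;
esac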

Just like supporting .d directories, I consider multi-instance support an important non-functional requirement for software.

Posts for Wednesday, August 19, 2015


Switching OpenSSH to ed25519 keys

With Mike's news item on OpenSSH's deprecation of the DSA algorithm for public key authentication, I started switching the few keys I still had using DSA to the suggested Ed25519 algorithm. Of course, I wouldn't be a security-interested party if I did not do some additional investigation into the DSA versus Ed25519 discussion.

The issue with DSA

You might find DSA a bit slower than RSA:

~$ openssl speed rsa1024 rsa2048 dsa1024 dsa2048
...
                  sign    verify    sign/s verify/s
rsa 1024 bits 0.000127s 0.000009s   7874.0 111147.6
rsa 2048 bits 0.000959s 0.000029s   1042.9  33956.0
                  sign    verify    sign/s verify/s
dsa 1024 bits 0.000098s 0.000103s  10213.9   9702.8
dsa 2048 bits 0.000293s 0.000339s   3407.9   2947.0

As you can see, RSA clearly outperforms DSA in verification, while DSA outperforms RSA in signing. But as far as OpenSSH is concerned, this speed difference should not be noticeable on the vast majority of OpenSSH servers.

So no, it is not the speed that is the issue, but the security state of the DSS standard.

The OpenSSH developers find that ssh-dss (DSA) is too weak, a position echoed by various other sources. Considering the impact of these keys, it is important that they rely on state-of-the-art cryptography.

Instead, they suggest switching to elliptic curve cryptography based algorithms, with Ed25519 and Curve25519 coming out on top.

Switch to RSA or Ed25519?

Given that RSA is still considered very secure, one of the questions is of course whether Ed25519 is the right choice here or not. I don't consider myself an expert in cryptography, but I do like to validate things through academic and (hopefully) reputable sources of information (not that I don't trust the OpenSSH and OpenSSL folks, but more out of a broader interest in the subject).

Ed25519 should be written fully as Ed25519-SHA-512 and is a signature algorithm. It uses elliptic curve cryptography as explained on the EdDSA wikipedia page. An often cited paper is Fast and compact elliptic-curve cryptography by Mike Hamburg, which talks about the performance improvements, but the main paper is called High-speed high-security signatures which introduces the Ed25519 implementation.

Of the references I was able to (quickly) go through (not all papers are publicly reachable) none showed any concerns about the secure state of the algorithm.

The (simple) process of switching

Switching to Ed25519 is simple. First, generate the (new) SSH key (below just an example run):

~$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/testuser/.ssh/id_ed25519): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/testuser/.ssh/id_ed25519.
Your public key has been saved in /home/testuser/.ssh/id_ed25519.pub.
The key fingerprint is:
SHA256:RDaEw3tNAKBGMJ2S4wmN+6P3yDYIE+v90Hfzz/0r73M testuser@testserver
The key's randomart image is:
+--[ED25519 256]--+
|o*...o.+*.       |
|*o+.  +o ..      |
|o++    o.o       |
|o+    ... .      |
| +     .S        |
|+ o .            |
|o+.o . . o       |
|oo+o. . . o ....E|
| oooo.     ..o+=*|
+----[SHA256]-----+

Then, make sure that the ~/.ssh/authorized_keys file contains the public key (as generated as id_ed25519.pub). Don't remove the other keys yet until the communication is validated. For me, all I had to do was to update the file in the Salt repository and have the master push the changes to all nodes (starting with non-production first of course).
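If you don't manage the authorized_keys files centrally, ssh-copy-id can append the new public key for you (reusing the testuser@testserver example from this post):

~$ ssh-copy-id -i ~/.ssh/id_ed25519.pub testuser@testserver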

Next, try to log on to the system using the Ed25519 key:

~$ ssh -i ~/.ssh/id_ed25519 testuser@testserver

Make sure that your SSH agent is not running, as it might otherwise fall back to another key if the Ed25519 one does not work. You can validate whether the connection used Ed25519 through the auth.log file:

~$ sudo tail -f auth.log
Aug 17 21:20:48 localhost sshd[13962]: Accepted publickey for root from \
  192.168.100.1 port 43152 ssh2: ED25519 SHA256:-------redacted----------------

If this communication succeeds, then you can remove the old key from the ~/.ssh/authorized_keys files.

On the client level, you might want to hide ~/.ssh/id_dsa from the SSH agent:

# Obsolete - keychain ~/.ssh/id_dsa
keychain ~/.ssh/id_ed25519

If a server update was forgotten, then the authentication will fail and, depending on the configuration, either fall back to regular authentication or fail immediately. This gives you a nice heads-up to update the server, while keeping the old key handy just in case: just refer to the old id_dsa key during authentication and fix up the server.

Posts for Sunday, August 16, 2015


Updates on my Pelican adventure

It's been a few weeks since I switched my blog to Pelican, a static site generator built with Python. A number of adjustments have been made since, which I'll happily talk about.

The full article view on index page

One of the features I wanted was to have my latest blog post fully readable from the front page (called the index page within Pelican). Sadly, I could not find a plugin or setting that would do this, but I did find a plugin that I can use to work around it: the summary plugin.

Enabling the plugin was a breeze. Extract the plugin sources into the plugin/ folder, and enable it in pelicanconf.py:

PLUGINS = [..., 'summary']

With this plug-in, articles can use inline comments to tell the system at which point the summary of the article stops. Usually, the summary (which is displayed on index pages) is the first paragraph (or set of paragraphs). What I do now is manually set the summary to the entire blog post for the latest post, and adjust it later when a new post comes up.

It might be some manual labour, but it fits nicely and doesn't hack around in the code too much.

Commenting with Disqus

I had some remarks that the Disqus integration is not as intuitive as expected. Some readers had difficulties finding out how to comment as a guest (without the need to log on through popular social media or through Disqus itself).

Agreed, it is not easy to see at first sight that people need to start typing their name in the Or sign up with disqus field before they can select I'd rather post as guest. As I don't have any way of controlling the format and rendered code of Disqus, I updated the theme a bit to add two paragraphs on commenting. The first paragraph explains how to comment as a guest.

The second paragraph for now informs readers that non-verified comments are put in the moderation queue. Once I get a feeling of how the spam and bots act on the commenting system, I will adjust the filters and also allow guest comments to be readily accessible (no moderation queue). Give it a few more weeks to get myself settled and I'll adjust it.

If the performance of the site is slowed down by the Disqus javascripts: both Firefox (excuse me, Aurora) and Chromium only experience this at the initial load. Afterwards, the scripts are properly cached and load relatively fast (a quick test shows all pages I tried load in less than 2 seconds - WordPress was at 4). And if you're not interested in commenting, you can even use NoScript or similar plugins to disallow any remote javascript.

Still, I will continue to look at how to make commenting easier. I recently allowed unmoderated comments (unless a number of keywords are added, and comments with links are also put in the moderation queue). If someone knows of another comment-like system that I could integrate I'm happy to hear about it as well.

Search

My issue with Tipue Search has been fixed by reverting a change in tipue_search.py (the plugin) where the URL was assigned to the loc key instead of url. It is probably a mismatch between the plugin and the theme (the change of the key was done in May in Tipue Search itself).

With this minor issue fixed, the search capabilities are back on track on my blog. Enabling it was a matter of:

PLUGINS = [..., 'tipue_search']
DIRECT_TEMPLATES = (..., 'search')

Tags and categories

WordPress supports multiple categories, but Pelican does not. So I went through the various posts that had multiple categories and decided on a single one. While doing so, I also reduced the categories to a small set:

  • Databases
  • Documentation
  • Free Software
  • Gentoo
  • Misc
  • Security
  • SELinux

I will try to properly tag all posts so that, if someone is interested in a very particular topic, such as PostgreSQL, he can reach those posts through the tag.

Posts for Thursday, August 13, 2015


Finding a good compression utility

I recently came across a wiki page written by Herman Brule which gives a quick benchmark of a couple of compression methods/algorithms. It gave me the idea of writing a quick script that tests out a wide number of compression utilities available in Gentoo (usually through the app-arch category), along with a number of options (in case multiple options are possible).

The currently supported packages are:

app-arch/bloscpack      app-arch/bzip2          app-arch/freeze
app-arch/gzip           app-arch/lha            app-arch/lrzip
app-arch/lz4            app-arch/lzip           app-arch/lzma
app-arch/lzop           app-arch/mscompress     app-arch/p7zip
app-arch/pigz           app-arch/pixz           app-arch/plzip
app-arch/pxz            app-arch/rar            app-arch/rzip
app-arch/xar            app-arch/xz-utils       app-arch/zopfli
app-arch/zpaq

The script should keep the best compression information: duration, compression ratio, compression command, as well as the compressed file itself.

Finding the "best" compression

It is not my intention to find the most optimal compression, as that would require heuristic optimizations (which has triggered my interest in seeking such software, or writing it myself) while trying out various optimization parameters.

No, what I want is to find the "best" compression for a given file, with "best" being either

  • most reduced size (which I call compression delta in my script)
  • best reduction obtained per time unit (which I call the efficiency)

For me personally, I think I would use it for the various raw image files that I have through the photography hobby. Those image files are difficult to compress (the Nikon DS3200 I use is an entry-level camera which applies lossy compression already for its raw files) but their total size is considerable, and it would allow me to better use the storage I have available both on my laptop (which is SSD-only) as well as backup server.

But next to the best compression ratio, the efficiency is also an important metric as it shows how efficient the algorithm works in a certain time aspect. If one compression method yields 80% reduction in 5 minutes, and another one yields 80,5% in 45 minutes, then I might want to prefer the first one even though that is not the best compression at all.

Although the script could be used to get the best compression (without resorting to an optimization algorithm for the compression commands) for each file, this is definitely not the main use case. A single run can take hours for files that are otherwise compressed in a handful of seconds. But it can show the best algorithms for a particular file type (for instance, do a few runs on a couple of raw image files and see which method is most successful).

Another use case I'm currently looking into is how much improvement I can get when multiple files (all raw image files) are first grouped in a single archive (.tar). Theoretically, this should improve the compression, but by how much?
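Testing that is straightforward with the script itself; a quick sketch (with a hypothetical archive name) would be:

~$ tar -cf rawimages.tar *.nef
~$ sw_comprbest -i rawimages.tar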

How the script works

The script does not contain much intelligence. It iterates over a wide set of compression commands that I tested out, checks the final compressed file size, and if it is better than a previous one it keeps this compressed file (and its statistics).
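Conceptually, the selection loop boils down to something like the following sketch (heavily simplified, with just three hard-coded commands and without the duration/efficiency bookkeeping that the real script does):

#!/bin/sh
# Simplified sketch of the selection logic - not the actual sw_comprbest script.
infile=$1
orig_size=$(stat -c %s "${infile}")
best_delta=0

for cmd in "gzip -9" "bzip2 -9" "xz -6"; do
  # Compress to a candidate file and compute the compression delta
  ${cmd} -c "${infile}" > "${infile}.candidate"
  new_size=$(stat -c %s "${infile}.candidate")
  delta=$(echo "scale=5; (${orig_size} - ${new_size}) / ${orig_size}" | bc)
  # Keep the candidate only if it beats the current best result
  if [ "$(echo "${delta} > ${best_delta}" | bc)" -eq 1 ]; then
    best_delta=${delta}
    best_cmd=${cmd}
    mv "${infile}.candidate" "${infile}.best"
  fi
done
rm -f "${infile}.candidate"
echo "Best result: ${best_cmd} (compression delta ${best_delta})"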

I tried to group some of the compressions together based on the algorithm used, but as I don't really know the details of the algorithms (it's based on manual pages and internet sites) and some of them combine multiple algorithms, it is more of a high-level selection than anything else.

The script can also only run the compressions of a single application (which I use when I'm fine-tuning the parameter runs).

A run shows something like the following:

Original file (test.nef) size 20958430 bytes
      package name                                                 command      duration                   size compr.Δ effic.:
      ------------                                                 -------      --------                   ---- ------- -------
app-arch/bloscpack                                               blpk -n 4           0.1               20947097 0.00054 0.00416
app-arch/bloscpack                                               blpk -n 8           0.1               20947097 0.00054 0.00492
app-arch/bloscpack                                              blpk -n 16           0.1               20947097 0.00054 0.00492
    app-arch/bzip2                                                   bzip2           2.0               19285616 0.07982 0.03991
    app-arch/bzip2                                                bzip2 -1           2.0               19881886 0.05137 0.02543
    app-arch/bzip2                                                bzip2 -2           1.9               19673083 0.06133 0.03211
...
    app-arch/p7zip                                      7za -tzip -mm=PPMd           5.9               19002882 0.09331 0.01592
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=24           5.7               19002882 0.09331 0.01640
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=25           6.4               18871933 0.09955 0.01551
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=26           7.7               18771632 0.10434 0.01364
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=27           9.0               18652402 0.11003 0.01224
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=28          10.0               18521291 0.11628 0.01161
    app-arch/p7zip                                       7za -t7z -m0=PPMd           5.7               18999088 0.09349 0.01634
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=24           5.8               18999088 0.09349 0.01617
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=25           6.5               18868478 0.09972 0.01534
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=26           7.5               18770031 0.10442 0.01387
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=27           8.6               18651294 0.11008 0.01282
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=28          10.6               18518330 0.11643 0.01100
      app-arch/rar                                                     rar           0.9               20249470 0.03383 0.03980
      app-arch/rar                                                 rar -m0           0.0               20958497 -0.00000        -0.00008
      app-arch/rar                                                 rar -m1           0.2               20243598 0.03411 0.14829
      app-arch/rar                                                 rar -m2           0.8               20252266 0.03369 0.04433
      app-arch/rar                                                 rar -m3           0.8               20249470 0.03383 0.04027
      app-arch/rar                                                 rar -m4           0.9               20248859 0.03386 0.03983
      app-arch/rar                                                 rar -m5           0.8               20248577 0.03387 0.04181
    app-arch/lrzip                                                lrzip -z          13.1               19769417 0.05673 0.00432
     app-arch/zpaq                                                    zpaq           0.2               20970029 -0.00055        -0.00252
The best compression was found with 7za -t7z -m0=PPMd:mem=28.
The compression delta obtained was 0.11643 within 10.58 seconds.
This file is now available as test.nef.7z.

In the above example, the test file was around 20 MByte. The best compression command that the script found was:

~$ 7za -t7z -m0=PPMd:mem=28 a test.nef.7z test.nef

The resulting file (test.nef.7z) is 18 MByte, a reduction of 11,64%. The compression command took almost 11 seconds to do its thing, which gave an efficiency rating of 0,011, which is definitely not a fast one.

Some other algorithms don't do bad either with a better efficiency. For instance:

   app-arch/pbzip2                                                  pbzip2           0.6               19287402 0.07973 0.13071

In this case, the pbzip2 command got almost 8% reduction in less than a second, which is considerably more efficient than the 11-seconds long 7za run.

Want to try it out yourself?

I've pushed the script to my github location. Do a quick review of the code first (to see that I did not include anything malicious) and then execute it to see how it works:

~$ sw_comprbest -h
Usage: sw_comprbest --infile=<inputfile> [--family=<family>[,...]] [--command=<cmd>]
       sw_comprbest -i <inputfile> [-f <family>[,...]] [-c <cmd>]

Supported families: blosc bwt deflate lzma ppmd zpaq. These can be provided comma-separated.
Command is an additional filter - only the tests that use this base command are run.

The output shows
  - The package (in Gentoo) that the command belongs to
  - The command run
  - The duration (in seconds)
  - The size (in bytes) of the resulting file
  - The compression delta (percentage) showing how much is reduced (higher is better)
  - The efficiency ratio showing how much reduction (percentage) per second (higher is better)

When the command supports multithreading, we use the number of available cores on the system (as told by /proc/cpuinfo).

For instance, to try it out against a PDF file:

~$ sw_comprbest -i MEA6-Sven_Vermeulen-Research_Summary.pdf
Original file (MEA6-Sven_Vermeulen-Research_Summary.pdf) size 117763 bytes
...
The best compression was found with zopfli --deflate.
The compression delta obtained was 0.00982 within 0.19 seconds.
This file is now available as MEA6-Sven_Vermeulen-Research_Summary.pdf.deflate.

So in this case, the resulting file is hardly better compressed - the PDF itself is already compressed. Let's try it against the uncompressed PDF:

~$ pdftk MEA6-Sven_Vermeulen-Research_Summary.pdf output test.pdf uncompress
~$ sw_comprbest -i test.pdf
Original file (test.pdf) size 144670 bytes
...
The best compression was found with lrzip -z.
The compression delta obtained was 0.27739 within 0.18 seconds.
This file is now available as test.pdf.lrz.

This is somewhat better:

~$ ls -l MEA6-Sven_Vermeulen-Research_Summary.pdf* test.pdf*
-rw-r--r--. 1 swift swift 117763 Aug  7 14:32 MEA6-Sven_Vermeulen-Research_Summary.pdf
-rw-r--r--. 1 swift swift 116606 Aug  7 14:32 MEA6-Sven_Vermeulen-Research_Summary.pdf.deflate
-rw-r--r--. 1 swift swift 144670 Aug  7 14:34 test.pdf
-rw-r--r--. 1 swift swift 104540 Aug  7 14:35 test.pdf.lrz

The resulting file is 11,22% reduced from the original one.

Posts for Tuesday, August 11, 2015


Why we do confine Firefox

If you follow the SELinux development community a bit, you will know Dan Walsh, a Red Hat security engineer. Today he blogged about CVE-2015-4495 and SELinux, or why doesn't SELinux confine Firefox. He should've asked why the reference policy or the Red Hat/Fedora policy does not confine Firefox, because SELinux is, as I've mentioned before, not the same as its policy.

In effect, Gentoo's SELinux policy does confine Firefox by default. One of the principles we focus on in Gentoo Hardened is to develop desktop policies in order to reduce exposure and information leakage of user documents. We might not have the manpower to confine all desktop applications, but I do think it is worthwhile to at least attempt to do this, even though what Dan Walsh mentioned is also correct: desktops are notoriously difficult to use a mandatory access control system on.

How Gentoo wants to support more confined desktop applications

What Gentoo Hardened tries to do is to support the XDG Base Directory Specification for several documentation types. Downloads are marked as xdg_downloads_home_t, pictures are marked as xdg_pictures_home_t, etc.

With those types defined, we grant the regular user domains full access to those types, but start removing access to user content from applications. Rules such as the following are commented out or removed from the policies:

# userdom_manage_user_home_content_dirs(mozilla_t)
# userdom_manage_user_home_content_files(mozilla_t)

Instead, we add in a call to a template we have defined ourselves:

userdom_user_content_access_template(mozilla, { mozilla_t mozilla_plugin_t })

This call makes access to user content optional through SELinux booleans. For instance, for the mozilla_t domain (which is used for Firefox), the following booleans are created:

# Read generic (user_home_t) user content
mozilla_read_generic_user_content       ->      true

# Read all user content
mozilla_read_all_user_content           ->      false

# Manage generic (user_home_t) user content
mozilla_manage_generic_user_content     ->      false

# Manage all user content
mozilla_manage_all_user_content         ->      false

As you can see, the default setting is that Firefox can read user content, but only non-specific types. So ssh_home_t, which is used for the SSH related files, is not readable by Firefox with our policy by default.

By changing these booleans, the policy is fine-tuned to the requirements of the administrator. On my systems, mozilla_read_generic_user_content is switched off.
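Toggling these booleans is a regular setsebool operation; to reproduce my setup, something like the following suffices:

~# setsebool -P mozilla_read_generic_user_content off
~# getsebool -a | grep mozilla_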

You might ask how we can then still support a browser if it cannot access user content to upload or download. Well, as mentioned before, we support the XDG types. The browser is allowed to manage xdg_download_home_t files and directories. For the majority of cases, this is sufficient. I also don't mind copying over files to the ~/Downloads directory just for uploading files. But I am well aware that this is not what the majority of users would want, which is why the default is as it is.

There is much more work to be done, sadly

As said earlier, the default policy will allow reading of user files if those files are not typed specifically. Types that are protected by our policy (but not by the standard reference policy) include the SSH related files at ~/.ssh and the GnuPG files at ~/.gnupg. Even other configuration files, such as my Mutt configuration (~/.muttrc), which contains a password for an IMAP server I connect to, are not reachable.

However, it is still far from perfect. One of the reasons is that many desktop applications have not been "converted" yet to our desktop policy approach. Chromium is already converted, and policies we've added, such as the one for Skype, also do not allow direct access unless the user explicitly enables it. But Evolution, for instance, isn't yet.

Converting desktop policies to a more strict setup requires lots of testing, which translates to many human resources. Within Gentoo, only a few developers and contributors are working on policies, and considering that this is not a change that is already part of the (upstream) reference policy, some contributors do not want to put much focus on it either. But without having done the work, it will not be easy (nor probably acceptable) to upstream this (the XDG patch has been submitted a few times already but wasn't deemed ready yet then).

Having a more restrictive policy isn't the end

As the blog post of Dan rightly mentioned, there are still quite some other ways of accessing information that we might want to protect. An application might not have access to user files, but can be able to communicate (for instance through DBus) with an application that does, and through that instruct it to pass on the data.

Plugins might require permissions which do not match with the principles set up earlier. When we tried out Google Talk (needed for proper Google Hangouts support) we noticed that it requires many, many more privileges. Luckily, we were able to write down and develop a policy for the Google Talk plugin (googletalk_plugin_t) so it is still properly confined. But this is just a single plugin, and I'm sure that more plugins exist which will have similar requirements. Which leads to more policy development.

But having workarounds does not make the effort we do worthless. Being able to work around a firewall through application data does not make the firewall useless, it is just one of the many security layers. The same is true with SELinux policies.

I am glad that we at least try to confine desktop applications more, and that Gentoo Hardened users who use SELinux are at least somewhat more protected from the vulnerability (even with the default case) and that our investment for this is sound.

Posts for Sunday, August 9, 2015


Can SELinux substitute DAC?

A nice twitter discussion with Erling Hellenäs caught my full attention later when I was heading home: Can SELinux substitute DAC? I know it can't and doesn't in the current implementation, but why not and what would be needed?

SELinux is implemented through the Linux Security Modules framework, which allows different security systems to be implemented and integrated in the Linux kernel. Through LSM, various security-sensitive operations can be secured further through additional access checks. This design was chosen to keep LSM as minimally invasive as possible.

The LSM design

The basic LSM design paper, called Linux Security Modules: General Security Support for the Linux Kernel and presented in 2002, is still one of the better references for learning and understanding LSM. It shows that there was a wish-list from the community where LSM hooks could override DAC checks, and that this has been partially implemented through permissive hooks (not to be mistaken for SELinux' permissive mode).

However, this definitely is partially implemented because there are quite a few restrictions. One of them is that, if a request is made towards a resource and the UIDs match (see page 3, figure 2 of the paper) then the LSM hook is not consulted. When they don't match, a permissive LSM hook can be implemented. Support for permissive hooks is implemented for capabilities, a powerful DAC control that Linux supports and which is implemented through LSM as well. I have blogged about this nice feature a while ago.

These restrictions are also why some other security-conscious developers, such as grsecurity's team and RSBAC do not use the LSM system. Well, it's not only through these restrictions of course - other reasons play a role in them as well. But knowing what LSM can (and cannot) do also shows what SELinux can and cannot do.

The LSM design itself is already a reason why SELinux cannot substitute DAC controls. But perhaps we could disable DAC completely and thus only rely on SELinux?

Disabling DAC in Linux would be an excessive workload

The discretionary access controls in the Linux kernel are not easy to remove. They are often part of the code itself (just grep through the source code for -EPERM). Some subsystems which use a common standard approach (such as VFS operations) can rely on well integrated security controls, but these too often allow the operation if DAC allows it, and only consult the LSM hooks otherwise.
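To get a feeling for how widespread these hard-coded checks are, a quick grep through a kernel source tree is enough (the /usr/src/linux location is just the usual Gentoo convention):

~$ cd /usr/src/linux
~$ grep -rl -- '-EPERM' fs/ kernel/ | wc -l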

VFS operations are the most known ones, but DAC controls go beyond file access. It also entails reading program memory, sending signals to applications, accessing hardware and more. But let's focus on the easier controls (as in, easier to use examples for), such as sharing files between users, restricting access to personal documents and authorizing operations in applications based on the user id (for instance, the owner can modify while other users can only read the file).

We could "work around" the Linux DAC controls by running everything as a single user (the root user) and having all files and resources be fully accessible by this user. But the problem with that is that SELinux would not be able to take over controls either, because you will need some user-based access controls, and within SELinux this implies that a mapping is done from a user to a SELinux user. Also, access controls based on the user id would no longer work, and unless the application is made SELinux-aware it would lack any authorization system (or would need to implement it itself).

With DAC Linux also provides quite some "freedom" which is well established in the Linux (and Unix) environment: a simple security model where the user and group membership versus the owner-privileges, group-privileges and "rest"-privileges are validated. Note that SELinux does not really know what a "group" is. It knows SELinux users, roles, types and sensitivities.

So, suppose we would keep multi-user support in Linux but completely remove the DAC controls and rely solely on LSM (and SELinux). Is this something reusable?

Using SELinux for DAC-alike rules

Consider the use case of two users. One user wants another user to read a few of his files. With DAC controls, he can "open up" the necessary resources (files and directories) through extended access control lists so that the other user can access it. No need to involve administrators.

With a MAC(-only) system, updates on the MAC policy usually require the security administrator to write additional policy rules to allow something. With SELinux (and without DAC) it would require the users to be somewhat isolated from each other (otherwise the users can just access everything from each other), which SELinux can do through User Based Access Control, but the target resource itself should be labeled with a type that is not managed through the UBAC control. Which means that the users will need the privilege to change labels to this type (which is possible!), assuming such a type is already made available for them. Users can't create new types themselves.

UBAC is by default disabled in many distributions, because it has some nasty side-effects that need to be taken into consideration. Just recently one of these came up on the refpolicy mailinglist. But even with UBAC enabled (I have it enabled on most of my systems, but considering that I only have a couple of users to manage and am administrator on these systems to quickly "update" rules when necessary) it does not provide equal functionality as DAC controls.

As mentioned before, SELinux does not know group membership. In order to create something group-like, we will probably need to consider roles. But in SELinux, roles are used to define what types are transitionable towards - it is not a membership approach. A type which is usable by two roles (for instance, the mozilla_t type which is allowed for staff_r and user_r) does not care about the role. This is unlike group membership.

Also, roles only focus on transitionable types (known as domains). It does not care about accessible resources (regular file types for instance). In order to allow one person to read a certain file type but not another, SELinux will need to control that one person can read this file through a particular domain while the other user can't. And given that domains are part of the SELinux policy, any situation that the policy has not thought about before will not be easily adaptable.

So, we can't do it?

Well, I'm pretty sure that a very extensive policy and set of rules can be made for SELinux which would make a number of DAC permissions obsolete, and that we could theoretically remove DAC from the Linux kernel.

End users would require extensive training to work with this system, and it would not be reusable across other systems in different environments, because the policy would be too specific to the system (unlike the current reference policy based ones, which are quite reusable across many distributions).

Furthermore, the effort to create these policies would be extremely high, whereas the DAC permissions are very simple to implement, and have been proven to be well suitable for many secured systems.

So no, unless you do massive engineering, I do not believe it is possible to substitute DAC with SELinux-only controls.

Posts for Friday, August 7, 2015


Filtering network access per application

Iptables (and its successor nftables) is a powerful packet filtering system in the Linux kernel, able to create advanced firewall capabilities. One of the features that it cannot provide is per-application filtering. Together with SELinux, however, it is possible to implement this on a per-domain basis.

SELinux does not know applications, but it knows domains. If we ensure that each application runs in its own domain, then we can leverage the firewall capabilities with SELinux to only allow those domains access that we need.

SELinux network control: packet types

The basic network control we need to enable is SELinux' packet types. Most default policies will grant application domains the right set of packet types:

~# sesearch -s mozilla_t -c packet -A
Found 13 semantic av rules:
   allow mozilla_t ipp_client_packet_t : packet { send recv } ; 
   allow mozilla_t soundd_client_packet_t : packet { send recv } ; 
   allow nsswitch_domain dns_client_packet_t : packet { send recv } ; 
   allow mozilla_t speech_client_packet_t : packet { send recv } ; 
   allow mozilla_t ftp_client_packet_t : packet { send recv } ; 
   allow mozilla_t http_client_packet_t : packet { send recv } ; 
   allow mozilla_t tor_client_packet_t : packet { send recv } ; 
   allow mozilla_t squid_client_packet_t : packet { send recv } ; 
   allow mozilla_t http_cache_client_packet_t : packet { send recv } ; 
 DT allow mozilla_t server_packet_type : packet recv ; [ mozilla_bind_all_unreserved_ports ]
 DT allow mozilla_t server_packet_type : packet send ; [ mozilla_bind_all_unreserved_ports ]
 DT allow nsswitch_domain ldap_client_packet_t : packet recv ; [ authlogin_nsswitch_use_ldap ]
 DT allow nsswitch_domain ldap_client_packet_t : packet send ; [ authlogin_nsswitch_use_ldap ]

As we can see, the mozilla_t domain is able to send and receive packets of type ipp_client_packet_t, soundd_client_packet_t, dns_client_packet_t, speech_client_packet_t, ftp_client_packet_t, http_client_packet_t, tor_client_packet_t, squid_client_packet_t and http_cache_client_packet_t. If the SELinux booleans mentioned at the end are enabled, additional packet types are allowed as well.

But even with this default policy in place, SELinux is not being consulted for filtering. To accomplish this, iptables will need to be told to label the incoming and outgoing packets. This is the SECMARK functionality that I've blogged about earlier.

Enabling SECMARK filtering through iptables

To enable SECMARK filtering, we use the iptables command and tell it to label SSH incoming and outgoing packets as ssh_server_packet_t:

~# iptables -t mangle -A INPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore
~# iptables -t mangle -A INPUT -p tcp --dport 22 -j SECMARK --selctx system_u:object_r:ssh_server_packet_t:s0
~# iptables -t mangle -A OUTPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore
~# iptables -t mangle -A OUTPUT -p tcp --sport 22 -j SECMARK --selctx system_u:object_r:ssh_server_packet_t:s0

But be warned: the moment iptables starts with its SECMARK support, all packets will be labeled. Those that are not explicitly labeled through one of the above commands will be labeled with the unlabeled_t type, and most domains are not allowed any access to unlabeled_t.

There are two things we can do to improve this situation:

  1. Define the necessary SECMARK rules for all supported ports (which is something that secmarkgen does), and/or
  2. Allow unlabeled_t for all domains.

To allow the latter, we can load a SELinux rule like the following:

(allow domain unlabeled_t (packet (send recv)))

This will allow all domains to send and receive packets of the unlabeled_t type. Although this is something that might be security-sensitive, it might be a good idea to allow at start, together with proper auditing (you can use (auditallow ...) to audit all granted packet communication) so that the right set of packet types can be enabled. This way, administrators can iteratively improve the SECMARK rules and finally remove the unlabeled_t privilege from the domain attribute.

To list the current SECMARK rules, list the firewall rules for the mangle table:

~# iptables -t mangle -nvL

Only granting one application network access

These two together allow for creating a firewall that only allows a single domain access to a particular target.

For instance, suppose that we only want the mozilla_t domain to connect to the company proxy (10.15.10.5). We can't enable the http_client_packet_t for this connection, as all other web browsers and other HTTP-aware applications will have policy rules enabled to send and receive that packet type. Instead, we are going to create a new packet type to use.

;; Definition of myhttp_client_packet_t
(type myhttp_client_packet_t)
(roletype object_r myhttp_client_packet_t)
(typeattributeset client_packet_type (myhttp_client_packet_t))
(typeattributeset packet_type (myhttp_client_packet_t))

;; Grant the use to mozilla_t
(typeattributeset cil_gen_require mozilla_t)
(allow mozilla_t myhttp_client_packet_t (packet (send recv)))

Putting the above in a myhttppacket.cil file and loading it allows the type to be used:

~# semodule -i myhttppacket.cil

Now, the myhttp_client_packet_t type can be used in iptables rules. Also, only the mozilla_t domain is allowed to send and receive these packets, effectively creating an application-based firewall, as all we now need to do is to mark the outgoing packets towards the proxy as myhttp_client_packet_t:

~# iptables -t mangle -A OUTPUT -p tcp --dport 80 -d 10.15.10.5 -j SECMARK --selctx system_u:object_r:myhttp_client_packet_t:s0
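To double-check that no other domain was granted rights on the new packet type, sesearch can be queried on the target type (only the mozilla_t rule we loaded, plus whatever applies through the client_packet_type attribute, should show up):

~# sesearch -t myhttp_client_packet_t -c packet -A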

This shows that it is possible to create such firewall rules with SELinux. It is however not an out-of-the-box solution, requiring thought and development of both firewall rules and SELinux code constructions. Still, with some advanced scripting experience this will lead to a powerful addition to a hardened system.

Posts for Wednesday, August 5, 2015


My application base: Obnam

It is often said, yet too often forgotten: take backups (and verify that they work). Taking backups is not purely for companies and organizations. Individuals should also take backups to ensure that, in case of errors or calamities, the all-important files are readily recoverable.

For backing up files and directories, I personally use obnam, after playing around with Bacula and attic. Bacula is more meant for large distributed environments (although I also tend to use obnam for my server infrastructure) and was too complex for my taste. The choice between obnam and attic is even more personally-oriented.

I found attic to be faster, but with a small supporting community. Obnam was slower, but seems to have a more active community which I find important for infrastructure that is meant to live quite long (you don't want to switch backup solutions every year). I also found it pretty easy to work with, and to restore files back, and Gentoo provides the app-backup/obnam package.

I think both are decent solutions, so I had to make one choice and ended up with obnam. So, how does it work?

Configuring what to backup

The basic configuration file for obnam is /etc/obnam.conf. Inside this file, I tell which directories need to be backed up, as well as which subdirectories or files (through expressions) can be left alone. For instance, I don't want obnam to backup ISO files as those have been downloaded anyway.

[config]
repository = /srv/backup
root = /root, /etc, /var/lib/portage, /srv/virt/gentoo, /home
exclude = \.img$, \.iso$, /home/[^/]*/Development/Centralized/.*
exclude-caches = yes

keep = 8h,14d,10w,12m,10y

The root parameter tells obnam which directories (and subdirectories) to back up. With exclude a particular set of files or directories can be excluded, for instance because these contain downloaded resources (and as such do not need to be inside the backup archives).

Obnam also supports the CACHEDIR.TAG specification, which I use for the various cache directories. With the use of these cache tag files I do not need to update the obnam.conf file with every new cache directory (or software build directory).
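Creating such a tag file is a one-liner; the signature line is the one mandated by the CACHEDIR.TAG specification (the directory below is just a hypothetical build location):

~$ printf 'Signature: 8a477f597d28d172789f06886806bc55\n' > ~/build/CACHEDIR.TAG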

The last parameter in the configuration that I want to focus on is the keep parameter. Every time obnam takes a backup, it creates what it calls a new generation. When the backup storage becomes too big, administrators can run obnam forget to drop generations. The keep parameter informs obnam which generations can be removed and which ones can be kept.

In my case, I want to keep one backup per hour for the last 8 hours (I normally take one backup per day, but during some development sprees or photo manipulations I back up multiple times), one per day for the last two weeks, one per week for the last 10 weeks, one per month for the last 12 months and one per year for the last 10 years.

Obnam will clean up only when obnam forget is executed. As storage is cheap, and the performance of obnam is sufficient for me, I do not need to call this very often.

Backing up and restoring files

My backup strategy is to backup to an external disk, and then synchronize this disk with a personal backup server somewhere else. This backup server runs no other software beyond OpenSSH (to allow secure transfer of the backups) and both the backup server disks and the external disk is LUKS encrypted. Considering that I don't have government secrets I opted not to encrypt the backup files themselves, but Obnam does support that (through GnuPG).

All backup enabled systems use cron jobs which execute obnam backup to take the backup, and use rsync to synchronize the finished backup with the backup server. If I need to restore a file, I use obnam ls to see which file(s) I need to restore (add in a --generation= to list the files of a different backup generation than the last one).

Then, the command to restore is:

~# obnam restore --to=/var/restore /home/swift/Images/Processing/*.NCF

Or I can restore immediately to the directory again:

~# obnam restore --to=/home/swift/Images/Processing /home/swift/Images/Processing/*.NCF

To support multiple clients, obnam by default identifies each client through the hostname. It is possible to use different names, but hostnames tend to be a common best practice which I don't deviate from either. Obnam is able to share blocks between clients (it is not mandatory, but supported nonetheless).
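For completeness, the nightly cron job mentioned earlier boils down to something like the following sketch (hypothetical schedule, user and host; the repository path matches the obnam.conf above):

# Hypothetical crontab entry: back up at 23:00, then push to the off-site server
0 23 * * *  obnam backup && rsync -az --delete /srv/backup/ backup@backupserver:/srv/backup/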

Posts for Sunday, August 2, 2015


Don't confuse SELinux with its policy

With the increased attention that SELinux is getting thanks to its inclusion in recent Android releases, more and more people are understanding that SELinux is not a singular security solution. Many administrators are still disabling SELinux on their servers because it does not play well with their day-to-day operations. But the Android inclusion shows that SELinux itself is not the culprit for this: it is the policy.

Policy versus enforcement

SELinux has conceptually segregated the enforcement from the rules/policy. There is an in-kernel enforcement (the SELinux subsystem) which is configured through an administrator-provided policy (the SELinux rules). As long as SELinux was being used on servers, chances are very high that the policy that is being used is based on the SELinux Reference Policy as this is, as far as I know, the only policy implementation for Linux systems that is widely usable.

The reference policy project aims to provide a well designed, broadly usable yet still secure set of rules. And through this goal, it has to play ball with all possible use cases that the various software titles require. Given the open ecosystem of the free software world, and the Linux based ones in particular, managing such a policy is not for beginners. New policy development requires insight into the technology for which the policy is created, as well as knowledge of how the reference policy works.

Compare this to the Android environment. Applications have to follow more rigid guidelines before they are accepted on Android systems. Communication between applications and services is governed through Intents and Activities which are managed by the Binder application. Interactions with the user are based on well defined interfaces. Heck, the Android OS even holds a number of permissions that applications have to subscribe to before they can use them.

Such an environment is much easier to create policies for, because it allows policies to be created almost on-the-fly, with the application permissions being mapped to predefined SELinux rules. Because the freedom of implementation is limited (in order to create a manageable environment which is used by millions of devices around the world), policies can be made stricter and still enjoy the static nature of the environment: no continuous updates to existing policies, something that Linux distributions have to do on an almost daily basis.

Aiming for a policy development ecosystem

Having SELinux active on Android shows that one should not confuse SELinux with its policies. SELinux is a nice security subsystem in the Linux kernel, and can be used and tuned to cover whatever use case is given to it. The slow adoption of SELinux by Linux distributions might be attributed to its lack of policy diversification, which results in few ecosystems where additional (and perhaps innovative) policies could be developed.

It is however a huge advantage that a reference policy exists, so that distributions can enjoy a working policy without having to put resources into their own policy development and maintenance. Perhaps we should try to further enhance the existing policies while supporting new policy ecosystems and development initiatives.

The maturation of the CIL language in the SELinux userland libraries and tools might be a good catalyst for this. At one point, policies will need to be migrated to CIL (although this can happen gradually, as the userland utilities can deal with CIL and other languages such as the legacy .pp files simultaneously), and there are a few developers considering a renewal of the reference policy. This would make use of the new benefits of the CIL language and implementation: some restrictions that were applicable to the legacy format no longer hold in CIL, such as rules that were previously only allowed in the base policy and can now be made part of the modules as well.

But next to renewing existing policies, there is plenty of room left for innovative policy ideas and developments. The SELinux language is very versatile, and just as with programming languages we notice that only a small set of constructs is actually used. Some applications might even benefit from using SELinux as their decision and enforcement system (something that SEPostgreSQL has tried).

The SELinux Notebook by Richard Haines is an excellent resource for developers that want to work more closely with the SELinux language constructs. Just skimming through this resource also shows how very open SELinux itself is, and that most users' experience with SELinux is based on a single policy implementation. This is a prime reason why having a more open policy ecosystem makes perfect sense.

If you don't like a particular car, do you ditch driving at all? No, you try out another car. Let's create other cars in the SELinux world as well.


Switching to Pelican

Nothing beats a few hours of flying to get things moving on stuff. Being offline for a few hours with a good workstation helps to not be disturbed by external actions (air pockets notwithstanding).

Early this year, I expressed my intentions to move to Pelican from WordPress. I wasn't actually unhappy with WordPress, but the security concerns I had were a bit too much for a blog as simple as mine. Running a PHP-enabled site with a database for something that I can easily handle through a static site - well, I had to try.

Today I finally moved the blog, imported all past articles as well as comments. For the commenting, I now use disqus which integrates nicely with Pelican and has a fluid feel to it. I wanted to use the Tipue Search plug-in as well for searching through the blog, but I had to put that on hold as I couldn't get the results of a search to display nicely (all I got were links to "undefined"). But I'll work on this.

Configuring Pelican

Pelican configuration is done through pelicanconf.py and publishconf.py. The former contains all definitions and settings for the site which are also useful when previewing changes. The latter contains additional (or overruled) settings related to publication.

In order to keep the same links as before (to keep web crawlers happy, as well as links to the blog from other sites and even the comments themselves) I did have to update some variables, but the Internet was strong on this one and I had few problems finding the right settings:

# Link structure of the site
ARTICLE_URL = u'{date:%Y}/{date:%m}/{slug}/'
ARTICLE_SAVE_AS = u'{date:%Y}/{date:%m}/{slug}/index.html'
CATEGORY_URL = u'category/{slug}'
CATEGORY_SAVE_AS = u'category/{slug}/index.html'
TAG_URL = u'tag/{slug}/'
TAG_SAVE_AS = u'tag/{slug}/index.html'

The next challenge was (and still is - I will have to verify this soon by checking the blog aggregation sites I am usually aggregated on) the RSS and Atom feeds. From the access logs of my previous blog, I believe that most of the aggregation sites use the /feed/, /feed/atom and /category/*/feed links.

Now, I would like to move the aggregations to XML files, so that the RSS feed is available at /feed/rss.xml and the Atom feed at /feed/atom.xml, but then the existing aggregations would most likely fail because they currently don't use these URLs. To fix this, I am now trying to generate the XML files as I would like them to be, and create symbolic links afterwards from index.html to the right XML file.
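
A rough sketch of that symlink step, run after pelican has written the site into output/ (paths are examples and assume the legacy feed URLs served RSS):

~$ ln -sf rss.xml output/feed/index.html
~$ for d in output/category/*/feed; do ln -sf rss.xml "$d/index.html"; done

The old /feed/atom URL could be handled similarly with a symlink named atom pointing at atom.xml, although that depends on how the web server treats extension-less files.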

The RSS/ATOM settings I am currently using are as follows:

CATEGORY_FEED_ATOM = 'category/%s/feed/atom.xml'
CATEGORY_FEED_RSS = 'category/%s/feed/rss.xml'
FEED_ATOM = 'feed/atom.xml'
FEED_ALL_ATOM = 'feed/all.atom.xml'
FEED_RSS = 'feed/rss.xml'
FEED_ALL_RSS = 'feed/all.rss.xml'
TAG_FEED_ATOM = 'tag/%s/feed/atom.xml'
TAG_FEED_RSS = 'tag/%s/feed/rss.xml'
TRANSLATION_FEED_ATOM = None
AUTHOR_FEED_ATOM = None
AUTHOR_FEED_RSS = None

Hopefully, the existing aggregations still work, and I can then start asking the planets to move to the XML URL itself. Some tracking on the access logs should allow me to see how well this is going.

Next steps

The first thing to make sure is happening correctly is the blog aggregation and the comment system. Then, a few tweaks are still in the pipeline.

One is to optimize the front page a bit. Right now, all articles are summarized, and I would like to have the last (or last few) article(s) fully expanded whereas the rest is summarized. If that isn't possible, I'll probably switch to fully expanded articles (which is a matter of setting a single variable).

Next, I really want the search functionality to work again. Enabling the Tipue search worked almost flawlessly - search worked as it should, and the resulting search entries are all correct. The problem is that the URLs that the entries point to (which is what users will click on) all point to an invalid ("undefined") URL.

Finally, I want the printer-friendly version to be without the social links in the top right. This is theme-oriented, and I'm happily using pelican-bootstrap3 right now, so I don't expect this to be much of a hassle. But considering that my blog is mainly technology oriented for now (although I am planning on expanding that), being able to have the articles saved as PDF or printed in a nice format is an important use case for me.

Posts for Wednesday, July 15, 2015

The Web We Should Transcend

There’s a very popular article being shared right now titled “The Web We Have To Save” written by Hossein Derakhshan who was incarcerated for 6 years by the Iranian Government for his writing, his blog. His well-written and beautifully illustrated article is a warning, a call to arms for each and every Internet user to bring back that old web. The web before the “Stream”, before social media, images and videos. The web in its best form according to Derakhshan. But I fear it’s not my web.

Derakhshan’s article rubs me the wrong way for a bunch of different reasons which I’ll try to outline later but maybe this sentence sums it up best. He writes:

Everybody I linked to would face a sudden and serious jump in traffic: I could empower or embarrass anyone I wanted.

Maybe it’s just a clunky way to phrase things, maybe it’s not his intention to sound like one of the people that communities and platforms such as Reddit suffer from. But he does. I’m all for using one’s standing and position to empower others, especially those with maybe less access, people with less socioeconomic standing, people who usually do not get a voice to speak to the public and be heard. But considering the opportunity to embarrass anyone at whim one of the glorious things about the Internet is seriously tone deaf given the amount of aggressive threats and harassment against a multitude of people.

It’s that libertarian, that sadly very male understanding of the web: The Wild West of ideas and trolls and technology where people shoot not with a revolver but duel each other with words and SWAT teams send each other’s way. A world where the old gatekeepers and authorities made of flesh and iron have been replaced by new gatekeepers and authorities sitting in their own homes while trying to become just as powerful as the old entities of power.

Because that’s his actual complaint.

People used to carefully read my posts and leave lots of relevant comments, and even many of those who strongly disagreed with me still came to read. Other blogs linked to mine to discuss what I was saying. I felt like a king.

It is about power and reach. About being heard within the small elite of people who did invest time and money to access the Internet. To create reputation and relevance in that community. In a way to “win”. But when you obey that very market/capitalist mindset, you can’t plateau, you need to grow. You need to become a gatekeeper.

Those days, I used to keep a list of all blogs in Persian and, for a while, I was the first person any new blogger in Iran would contact, so they could get on the list. That’s why they called me “the blogfather” in my mid-twenties — it was a silly nickname, but at least it hinted at how much I cared.

I remember the time of these lists of links. Carefully curated by individuals with power in small or sometimes bigger communities. Where Derakhshan very vocally complains about algorithms forming and directing attention, him or people like him doing the same is apparently the utopia we lost.

If you print the article and hold it to your ear you can hear a silent reading of the Declaration of the Independence of Cyberspace. The so-called home of the mind where it was supposed to be all about ideas and text and high-level philosophical debate. Again a starkly male point of view. But the web got richer. Video is important these days and many creative people experiment with how to not just put forward ideas there but represent themselves and their lives. Instagram is huge and awesome because it (or platforms like it) can depict life in forms that were hidden in the past. Our public common consciousness gets richer with every How to do your Nails tutorial, every kid's pictures of him or her hanging out with their friends. But that's not "right", we learn:

Nearly every social network now treats a link as just the same as it treats any other object — the same as a photo, or a piece of text — instead of seeing it as a way to make that text richer.

Yes, links are treated as objects people share and comment on, just like videos, pictures, podcasts, soundtracks, GIFs and whatever. Why should that one way of representing culture be the best one? The one we should prefer over everything else? Is it because that's the way men used to communicate before kids and women entered, sometimes not liking that made-up world of supposedly objective and true words and links?

Hm. So what is the world today like online? Channeling the recently deceased German publicist Frank Schirrmacher, Derakhshan takes all his old-man-yelling-at-clouds anger about the modern world and hides it behind a mostly undefined term, in this case: the Stream.

The Stream now dominates the way people receive information on the web. Fewer users are directly checking dedicated webpages, instead getting fed by a never-ending flow of information that’s picked for them by complex –and secretive — algorithms.

Obviously not every stream is algorithmically filtered: Twitter and Instagram, for example, just show you the stuff of the people you manually chose to follow. But let's not nitpick here. The complaint is that we are no longer doing it right. We are not reading things or liking them for their inherent value, their objective content, but:

In many apps, the votes we cast — the likes, the plusses, the stars, the hearts — are actually more related to cute avatars and celebrity status than to the substance of what’s posted. A most brilliant paragraph by some ordinary-looking person can be left outside the Stream, while the silly ramblings of a celebrity gain instant Internet presence.

But is the Internet that’s not showing me cute cat GIFs any better? What kind of weird chauvinism towards any culture he doesn’t like or understand or value is that? My web has twerking videos, cat GIFs, philosophical essays and funny rants as well as whatever the hell people like, whatever makes them happy to make or read or view or share (unless it harasses people then GTFO with your crap).


There is some meat to what Derakhshan writes and I encourage you to read his essay. He makes some valid points about centralization in the way certain Platforms like Facebook try to AOLify the Internet, how certain platforms and services use data about their readers or consumers unfairly and unethically and how it’s getting harder and harder to port data from one platform to another. He has some very very good arguments. But sadly he doesn’t get to the point and instead focuses on attacking things that “kids these days” do that do not support his understanding and vision of the web as a bunch of small kingdoms of digital quasi-warlords.

Dear Hossein Derakhshan.
Thanks for your article, I did enjoy reading it even if I don't agree with everything. The club of people who don't enjoy how our mostly unleashed and rabid form of capitalism ruins everything from human relationships to technology to funny cat videos meets every day down at the docks, in the old factory building with the huge red star painted on its wall. Hope to see you there, soon.
Cheers

The Web Derakhshan wants to save is not the digital Garden of Eden. It sounds like what Ayn Rand and some technocrats would come up with for a western movie. And the days of John Wayne are past. Get over it.

Title Image by sciencefreak (Pixabay)



Loading CIL modules directly

In a previous post I used the secilc binary to load an additional test policy. Little did I know (and that's actually embarrassing because it was one of the things I complained about) that you can just use the CIL policy as modules directly.

With this I mean that a CIL policy as mentioned in the previous post can be loaded like a prebuilt .pp module:

~# semodule -i test.cil
~# semodule -l | grep test
test

That's all there is to it. Loading the module resulted in the test port being immediately declared and available:

~# semanage port -l | grep test
test_port_t                    tcp      1440

In hindsight, it makes sense that it is this easy. After all, support for the old-style policy language is implemented by converting it into CIL when semodule is called, so it makes sense that a module written directly in CIL is immediately ready to be taken up.
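
Removing the module again should work just like with any other module:

~# semodule -r test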

Posts for Saturday, July 11, 2015

Protecting the Cause

People ain’t perfect. We forget and neglect, actively ignore, don’t care enough, don’t live up to what we’d like to be or know we could be or should be. Now don’t get me wrong, I’m no misanthropist. I believe that people in general are nice and try their best given their own perception of the world in spite of sometimes things ending badly.

In the last weeks I’ve had a bunch of very different reasons to think about a general problem that has infected many human rights/civil liberties activist circles, organizations, subcultures or structures. Maybe even infection isn’t the right word. Maybe it’s more of a problem of our individual views on the world with are … well … too individualistic. But I’m getting ahead of myself.1

I kept on coming back to a song by the British HipHop artist Scroobius Pip called “Waiting for the beat to kick in”2. The song describes the artist’s dream of walking through New York (“but not New York in real life the New York you see in old films”) and meeting different characters telling him their idea of how to live your life.

(YouTube video: https://www.youtube.com/watch?v=i5e5FUvRzNQ)

So after meeting a bunch of positive, inspiring people the protagonist’s walk ends with meeting someone named “Walter Nepp” who wants to “bring him back down to earth“:

You may think yourself in general to be a nice guy,
But I’m telling you now – that right there is a lie,
Even the nicest of guys has some nasty within ’em,
You don’t have to be backlit to be the villain,
Whether it be greed lust or just plain vindictiveness,
There’s a level of malevolence inside all of us

I’ve been coming back to those lines over and over again in the last weeks and not just for personal reasons thinking about my own actions. Which is something I need to do. A lot. Sometimes I talk and write faster than I think which lets me end up in situations where I am not just wrong due to ignorance but actually harmful to people, causes or values I claim to support. These things don’t happen. I cause them through my ignorance, stupidity, carelessness or just not giving enough of a shit. I might behave racist, sexist or any other number of -ists in spite of me being an OK guy3. Everybody does these things because nobody is perfect – though some do it way more frequently or more hurtfully than others. Which is no and can be no excuse: The fact that we all behave hurtfully doesn’t make that behavior OK or normal just frequent. A chronic head- and heartache in our collective consciousness.

But I can consider myself lucky. My friends, peers and contacts keep me honest by calling me out on my bullshit. Which is more of a gift than many probably realize. I am living a privileged life and, as much as you try, sometimes it is hard to break the veil that position casts on the world. To make a long story short: I need people telling me when I mess up, to comprehend not just the situation at hand but also to fix my thinking and perception as well as possible. Rinse and repeat. It's a process.

I am not especially smart so many people do a lot better than I do. And many of them invest tons of their energy, time and life into making the world a better place. They work – often even without payment – in NGOs and activist structures fixing the world. Changing public perception and/or policy. Fighting the good fights on more fronts than I can count (and my Elementary School report card said that I wasn’t half bad at counting!).

As humans relying on the rights or legislations these groups and individuals fight for, we are very lucky to have all that on our side. Networks of networks of activists helping each other (and us) out. Cool. But sadly, very rarely are things as simple as they might seem.

In the last months I keep hearing very similar stories from the inside of many of the (digital) human rights/civil liberties NGOs. Stories contradicting the proposed mission statement. Stories of discrimination, sexism, harassment and communicative violence. Structural failure in organisations completely screwing up providing a safe environment for many trying to contribute. These stories usually have very visible, prominent men within those organisations at their core. But nothing changes.

Obviously there is not one singular cause for the deafening silence the stories drown in.

People might not want to speak up against their (past) employer to protect their present or future, which is very understandable. And without witnesses and proof there is no story. It would be easy for me to ask for victims and witnesses to blow the whistle and speak up, but what would I do in a similar situation? How can we expect people to destroy their actual lives for some lofty idea of truth, transparency and ethical righteousness?

But I believe there is something that we can change even from the fringes or even outside: We can stop to let people use – explicitly or implicitly – the cause as their shield.

I have seen this mechanic numerous times: some civil rights dude fucks up, but instead of calling him out people keep schtum so as not to hurt the cause. Few have mastered that mechanic as well as Julian Assange, who basically finds new ways to break with the norms of almost everything in any given week, but he's not alone, just probably more intense than the rest. You can see these things in probably any one of the heroes of the scene, and as I showed earlier: we all fuck up, why shouldn't they? But with the cause so intimately intertwined with their personas, their public identity, any of their personal lapses or transgressions could cause damage to that cause.

Sadly people don’t just shut up but actually target those not shutting up. When questions came up about Assange’s antisemite friends and peers it was quickly labeled a smear campaign against whatever Wikileaks’ cause is4. And it’s not just a Wikileaks thing.

There is a lot to learn from this issue, about the lack of sustainability in chaining a cause or structure to one or very few public superstars. But mostly it’s about understanding that – especially as an NGO/structure fighting for human rights or similar things – you have to be better than most of the people and organisations you stand against.

Recently Wikileaks published a bunch of data from an Italian Cyber(war|security)5 company who seemingly even sold 0-days and exploits to oppressive regimes. Very cool, let’s see what journalists can dig up as stories about who builds these tools used to take other human beings’ rights and sometimes lives. Who sells them. And what we can do. But then I see people – fierce and vocal supporters of privacy as human rights – just randomly digging out personal information about people working for the German secret service from some list published without context (that also included all kinds of random addresses which did not all appear in the emails also leaked). When the data of millions of US government employees leaked people from similar groups were just too happy and quick in dragging out information about individuals and contextualizing them – often without proper research.

Sure. You can use any data you can get your hands on. But you are damaging your own cause, creating contradictions and becoming less credible. And as an activist/NGO credibility is your main asset. So you need to be called out on it. Even if your opponents might pick up that criticism and use it against you. I know it sucks but as an NGO/activist you don’t have the luxury of not caring how you do what you want to do. For example: If you claim privacy to be a human right (which is a popular argument in the civil rights scene) you just can’t republish (and thereby recontextualize) individual’s data on your Twitter feed or whatever. Because that would mean that you either don’t actually believe that privacy is a human right because you don’t grant it to people in certain jobs or that you just don’t respect their rights which would lead to the question why people should listen to you arguing for them.

Fighting for ethics and standards means that you have to live them as well: You can’t fight for transparency while being intransparent, can’t fight for right to privacy while just throwing “evil people”‘s data out there just cause they deserve it for some reason.

And that is where we as a networked subculture fail so often. We don’t call people out for that kind of crap. Because we like them. Because they were or are our friends or peers. Because we are afraid of how it might endanger the cause or the campaign or whatever. Sadly two failures (by the original fucker-up and by us not calling them out on it) don’t make one success. But helping people see their mistakes can make future successes possible.

Maybe it’s just one of these nights, sitting alone in front of a computer thinking, writing instead of anything else but I am just soo tired of hearing digital martyrs and saviors being glorified while seeing them discriminating against people for their gender or religion or acting according to different rules than what they demand for everyone else. I’m tired of all the secret stories being exchanged in the dark between so many members of our communities that they have basically become public knowledge and still nothing getting better.

‘Ethical behavior is doing the right thing when no one else is watching- even when doing the wrong thing is legal.’ Aldo Leopold

Photo by CJS*64 A man with a camera

  1. Excuse my strange stream of consciousness kind of writing here, I’m basically writing while thinking here.
  2. which incidentally shares some sort of stream of consciousness lyrics style with this way clunkier text
  3. Obviously my own tainted opinion 😉
  4. and let’s not even start with the accusations against Assange in Sweden
  5. as if there was a difference really



Restricting even root access to a folder

In a comment Robert asked how to use SELinux to prevent even root access to a directory. The trivial solution would be not to assign an administrative role to the root account (which is definitely possible, but you want some way to gain administrative access otherwise ;-)

Restricting root is one of the commonly referred features of a MAC (Mandatory Access Control) system. With a well designed user management and sudo environment, it is fairly trivial - but if you need to start from the premise that a user has direct root access, it requires some thought to implement it correctly. The main "issue" is not that it is difficult to implement policy-wise, but that most users will start from a pre-existing policy (such as the reference policy) and build on top of that.

The use of a pre-existing policy means that some roles are already identified and privileges are already granted to users - often these higher privileged roles are assigned to the Linux root user so as not to confuse users. But that does mean that restricting root's access to a folder requires some additional countermeasures.

The policy

But first things first. Let's look at a simple policy for restricting access to /etc/private:

policy_module(myprivate, 1.0)

type etc_private_t;
fs_associate(etc_private_t)

This simple policy introduces a type (etc_private_t) which is allowed to be used for files (it associates with a file system). Do not use the files_type() interface as this would assign a set of attributes that many user roles get read access on.

Now, it is not sufficient to have the type available. If we want to assign it to a file or directory, someone or something needs to have the privileges to change the security context of that file or directory to this type. If we would just load this policy and try to do this from a privileged account, it would fail:

~# chcon -t etc_private_t /etc/private
chcon: failed to change context of '/etc/private' to 'system_u:object_r:etc_private_t:s0': Permission denied

With the following rule, the sysadm_t domain (which I use for system administration) is allowed to change the context to etc_private_t:

allow sysadm_t etc_private_t:{dir file} relabelto;

With this in place, the administrator can label resources as etc_private_t without having read access to these resources afterwards. Also, as long as there are no relabelfrom privileges assigned, the administrator cannot revert the context back to a type that he has read access to.
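
One caveat: chcon only changes the label on the files as they are right now, so a full file system relabel would reset it. To make the mapping persistent it can also be recorded in the file context configuration - a sketch, and note that whether a later relabel operation succeeds still depends on the relabelto privileges of the domain performing it:

~# semanage fcontext -a -t etc_private_t "/etc/private(/.*)?"
~# restorecon -RF /etc/private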

The countermeasures

But this policy is not sufficient. One way that administrators can easily access the resources is to disable SELinux controls (as in, put the system in permissive mode):

~# cat /etc/private/README
cat: /etc/private/README: Permission denied
~# setenforce 0
~# cat /etc/private/README
Hello World!

To prevent this, enable the secure_mode_policyload SELinux boolean:

~# setsebool secure_mode_policyload on

This will prevent any policy and SELinux state manipulation... including switching to permissive mode, but also loading additional SELinux policies or changing booleans. Definitely experiment with this setting without persisting it (i.e. do not use -P in the above command yet) to make sure it is manageable for you.
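
The usual boolean tooling applies here: getsebool shows the current state, and setsebool -P persists the change. Keep in mind that once the boolean is switched on, further boolean changes (including switching it back off, or persisting it afterwards) are blocked until a reboot:

~# getsebool secure_mode_policyload
secure_mode_policyload --> on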

Still, this isn't sufficient. Don't forget that the administrator is otherwise a full administrator - if he cannot access the /etc/private location directly, then he might be able to access it indirectly:

  • If the resource is on a non-critical file system, he can unmount the file system and remount it with a context= mount option. This will override the file-level contexts. Bind-mounting does not seem to allow overriding the context.
  • If the resource is on a file system that cannot be unmounted, the administrator can still reboot the system in a mode where he can access the file system regardless of SELinux controls (either through editing /etc/selinux/config or by booting with enforcing=0, etc.).
  • The administrator can still directly access the block device files on which the resources are stored. Specialized tools allow extracting files and directories without actually (re)mounting the device.

A more extensive list of methods to potentially gain access to such resources is iterated in Limiting file access with SELinux alone.

This set of methods for gaining access is due to the administrative role already assigned by the existing policy. To further mitigate these risks with SELinux (although SELinux will never completely mitigate all risks) the roles assigned to the users need to be carefully revisited. If you grant people administrative access, but you don't want them to be able to reboot the system, (re)mount file systems, access block devices, etc. then create a user role that does not have these privileges at all.

Creating such user roles does not require leaving behind the policy that is already active. Additional user domains can be created and granted to Linux accounts (including root). But in my experience, when you need to allow a user to log on as the "root" account directly, you probably need him to have true administrative privileges. Otherwise you'd work with personal accounts and a well-designed /etc/sudoers file.

Posts for Thursday, July 9, 2015

Let’s kill “cyborg”

I love a good definition: Very few things in life wield extreme power as elegantly as definitions do. Because every teeny, tiny definition worth its salt grabs the whole universe, all things, all ideas, all stories, feelings, objects and laws and separates them into distinct pieces, into two different sets: The set of things conforming to the definition and the set of things that don’t. If you’ve read enough definitions, super hero comics are basically boring.

Definitions structure our world. They establish borders between different things and allow us to communicate more precisely and clearly. They help us to understand each other better. Obviously many of them are in a constant development, are evolving and changing, adapting to our always fluid experience of our shared world.

And just as they change, sometimes definitions and the concepts they describe just stop being useful. Today I want to encourage you and me to put one definition and concept to rest that has outlived its usefulness: Let’s kill the term “cyborg”.

The term cyborg has been with us since the 1960s and has influenced more than just cheesy science fiction movies: Talking about cyborgs was a way to describe a future where human beings and machines would meld in order to describe our actual current lifestyles. Because for better or worse: We have been hybrid beings of part nature, part technology of sorts for many many decades now.

Still, the idea of “natural humans” put in contrast to machine-augmented humans was useful to elaborate the requirements that we as a society would need to postulate in order to integrate technology into ourselves, our bodies and our mental exoskeletons in a humane way. It has been a good conversation. But lately it really hasn’t been.

“Cyborg” has mostly been a word to single out and alienate “freaks”. It refers to body hackers who do things to their bodies that the mainstream doesn't really understand but just loves to watch, like one of those strangely popular torture porn movies such as Saw. It refers to people with disabilities in a way that does not include them in whatever the mainstream view of society is, or that helps make a case for designing and engineering things in a more accessible way, but as these weird “others”. I can't count the number of times that, for example, Neil Harbisson has been talking at conferences about his perception augmentation allowing him to hear colors, with the gist of the media reception being mostly: look how weird!

Instead of helping us to understand ourselves and our decision to intertwine our lives, worlds and bodies with all kinds of different technologies “Cyborg” just creates distance these days. It doesn’t build bridges for fruitful debates but in fact tears them down to look at freaks on the other side of the river without them coming closer.

We are all cyborgs and we have been for quite a while. But when everybody is a cyborg really nobody is. The distinction from whatever “norm” that the word, idea and definition provided is no longer helpful but actually hurtful for so many debates that we have to have in the next few years.

Thank you “cyborg”, it’s been an interesting ride. Rest in peace.

Photo by JD Hancock


Posts for Sunday, July 5, 2015


Intermediate policies

When developing SELinux policies for new software (or existing ones whose policies I don't agree with) it is often more difficult to finish the policies so that they are broadly usable. When dealing with personal policies, having them "just work" is often sufficient. To make the policies reusable for distributions (or for the upstream project), a number of things are necessary:

  • Try structuring the policy using the style as suggested by refpolicy or Gentoo
  • Add the role interfaces that are most likely to be used or required, or which are implemented differently in the current draft
  • Refactor some of the policies to use refpolicy/Gentoo style interfaces
  • Remove the comments from the policies (as refpolicy does not want too verbose policies)
  • Change or update the file context definitions for default installations (rather than the custom installations I use)

This often takes quite some effort. Some of these changes (such as the style updates and commenting) are even counterproductive for me personally (in the sense that I don't gain any value from doing so and would have to start maintaining two different policy files for the same policy), and necessary only for upstreaming policies. As a result, I often finish with policies that I just leave for me personally or somewhere on a public repository (like these Neo4J and Ceph policies), without any activities already scheduled to attempt to upstream those.

But not contributing the policies to a broader public means that the effort is not known, and other contributors might be struggling with creating policies for their favorite (or necessary) technologies. So for the majority of policies that I write, I still hope to eventually push them out. But I noticed that these last few steps for upstreaming (the ones mentioned above) might only take a few hours of work, yet take me over 6 months (or more) to accomplish (as I often find other stuff more interesting to do).

I don't know yet how to change the process to make it more interesting to use. However, I do have a couple of wishes that might make it easier for me, and perhaps others, to contribute:

  • Instead of reacting to contribution suggestions, work on a common repository together. Just like with a wiki, where we don't aim for a 100% correct and well-designed document from the start, we should use the strength of the community to continuously improve policies (and to allow multiple people to work on the same policy). Right now, policies are a one-man publication with a number of people commenting on the suggested changes and asking that one person to refactor or update the change himself.
  • Document the style guide properly, but don't disallow contributions if they do not adhere to the style guide completely. Instead, merge and update. On successful wikis there are even people that update styles without content updates, and their help is greatly appreciated by the community.
  • If a naming convention is to be followed (which is the case with policies) make it clear. Too often the name of an interface is something that takes a few days of discussion. That's not productive for policy development.
  • Find a way to truly create a "core" part of the policy and a modular/serviceable approach to handle additional policies. The idea of the contrib/ repository was like that, but failed to live up to its expectations: the number of people who have commit access to the contrib is almost the same as to the core, a few exceptions notwithstanding, and whenever policies are added to contrib they often require changes on the core as well. Perhaps even support overlay-type approaches to policies so that intermediate policies can be "staged" and tested by a larger audience before they are vetted into the upstream reference policy.
  • Settle on how to deal with networking controls. My suggestion would be to immediately support the TCP/UDP ports as assigned by IANA (or another set of sources) so that additional policies do not need to wait for the base policy to support the ports. Or find and support a way for contributions to declare the port types themselves (we probably need to focus on CIL for this).
  • Document "best practices" on policy development where certain types of policies are documented in more detail. For instance, desktop application profiles, networked daemons, user roles, etc. These best practices should not be mandatory and should in fact support a broad set of privilege isolation. With the latter, I mean that there are policies who cover an entire category of systems (init systems, web servers), a single software package or even the sub-commands and sub-daemons of that package. It would surprise me if this can't be supported better out-of-the-box (as in, through a well thought-through base policy framework and styleguide).

I believe that this might create a more active community surrounding policy development.

Posts for Thursday, July 2, 2015


Moved blog to hugo, fastly and comma

  • I noticed what a disservice I was doing my readers when I started monitoring my site using litmus. A dynamic website in python on a cheap linode… What do you expect? So I now serve through fastly and use a static site generator.

  • pyblosxom was decent while it lasted. It can generate sites statically, but the project never got a lot of traction and is slowly fading out. There were a bit too many moving parts, so …

  • I now use the hugo static site generator, which is powerful, quite complete and gaining momentum. Fast and simple to use (see the short sketch after this list).

  • Should also keep an eye on the caddy webserver since it has some nice things such as git integration which should work well with hugo.

  • Trying to get disqus going was frustrating. Self hosted options like talkatv and isso were too complex, and kaiju is just not there yet and also pretty complex. I wrote comma which is a simple comment server in Go. Everything I need in 100 lines of Go and 50 lines of javascript! Let me know if you see anything funky.

  • pyblosxom-to-hugo.py migrated all content.
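
For those who haven't used it, the hugo workflow mentioned above boils down to a couple of commands (a minimal sketch; the content path is just an example):

~$ hugo new post/first-post.md   # create a new piece of content
~$ hugo server                   # preview locally while writing
~$ hugo                          # render the static site into public/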

Digitally organic

There’s a new kid in music streaming town: Apple has entered this market of different services that all provide basically the same 30ish million tracks of music for a monthly fee. And there was much rejoicing™.

Some services (like for example Google’s Play Music and obviously Apple’s) derive a market advantage from their tight integration into the respective ecosystems: you already use $company’s mail, calendar and mobile operating system, why not just throw in the music subscription, right? But still, as a service trying to win over new clients (and maybe even migrate them to your other offerings) you need a unique selling proposition, something that you have and nobody else has. Something that ideally is even hard to clone and copy.

Apple decided to opt for the curation road: Apple’s service provides its customers with playlists or radio stations whose content was artfully crafted, chosen and curated by actual human beings, not algorithms. The selection of DJs is sorta problematic, creating a sort of Audible Gentrification as David Banks called it, but the idea itself resonates with the dominating digital zeitgeist.

On Twitter I casually remarked

but I had the feeling some people didn’t understand it in the way I had intended it so I’ll try to explain it in a little more detail here.

The “organic” movement (for lack of a better term) consists of people who are willing to pay a premium for food items (and by extension other things) if they are produced organically: no genetically modified crops, no or very few “chemicals” (as in artificial fertilizer or pesticide) and local production if possible. Many of the aspects embodied in that ideology are smart: why drive milk thousands of kilometers to supermarkets instead of buying local milk? Being more careful and conscious about how to fertilize soil and how to combat dangers to the harvest is also very important to help our environment not collapse. And nobody wants barely tested genetic modifications in our food chain (or any other significant untested changes for that matter).

But the original movement's focus on “nature” is something else. It’s a conservative perspective in a way, trying to bring back the “good old days” (a broken sentiment that culminates in extremes such as the Paleo Diet). A way to live better by moving back in time while distancing oneself from the industrialized mainstream. This ideology branches out into many other areas and brought us the renaissance of the artisan.

While mainstream culture celebrates or at least quietly supports the technologically driven “progress”, certain people – more often than not people with a good income/social position – have shifted to conscious consumption of things manually crafted. “Realer” things made by people, not machines and industrial complexes. Things imbued with a vibe from a better, simpler, more honest time without all the complexities and nuances of today’s connected, industrialized lifestyle.

It’s no surprise that Apple would plug into this sentiment: the company whose products signify and embody the digital 1% like nothing else sets itself apart from the competition by using artists, craftsmen and -women to generate their playlists. With algorithms becoming increasingly better and running them becoming cheaper, having people choose your music is a weird throwback. At first sight.

At second glance it becomes clear how “people not algorithms” can easily become a way for upper classes to distinguish themselves from lower classes: if you can afford it, you have people choose your tunes; if you’re poor you have to live with whatever algorithms come up with. Recommendation algorithms are to the industrialized mainstream as curators are to coffee artisans (and whatever else needs to be hipstered up to be sold for a premium).

The focus on the artisan is only very superficially a critique of the industrialized way: drinking your hand-made, artfully crafted coffee is only half the fun without your laptop or tablet computer, right? It’s a way of eating one’s cake and having it, too: appear critical and reflected without losing the benefits of a globally networked industrial system.

We will see more of these “people, not algorithm” services in the future. Services that will probably cost a premium and provide a way for people to show their economic privilege without looking like a rich bastard but a conscious consumer.

P.S.: I do believe in conscious consumption. I buy local and fair trade if at all possible. This article is about framing. About how products are advertised and to whom.

Photo by asboluv


Posts for Tuesday, June 30, 2015

removing old backups

Ever had to think about how to remove old backups? Say you want to keep the last 30 backups. Whenever I had to think about a solution to this I thought about something with "find -mtime". However, this only works when backups are made consistently on a daily basis.

But what happens if a backup fails or an external server doesn't have a connection to the storage? In my case my laptop only sporadically creates backups. If my laptop were turned off for 30 days, all of my previous backups would be deleted with "find -mtime".

Until now I had a huge script which checks for such cases. Just stupid...

Today I found THE solution!
A really easy and nice one-liner to always keep the last 30 backups. It's just so nice :D

Attention: Don't simply copy/paste this one-liner - it can remove files without asking!

find /backup/folder -type f -printf '%T@ %p\n' | sort -k1 -n | head -n-30 | cut -d' ' -f2 | xargs -r rm

I think I don't really have to explain it. It keeps the last 30 backups - it doesn't matter how old they are, they are always the newest ones. In case you have multiple backup sets, make sure to keep them in separate directories or filter them with "find -name "*keyword*"". And before using this one-liner I strongly suggest removing the last part (xargs -r rm) to see what would be removed.
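
For reference, the dry-run variant is simply the same pipeline without the removal step; it only prints the files that would be deleted:

find /backup/folder -type f -printf '%T@ %p\n' | sort -k1 -n | head -n-30 | cut -d' ' -f2

Note that paths containing spaces are not handled safely by this pipeline (cut -f2 truncates them), so only use it on backup file names without spaces.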

Hope someone can find it useful. I've searched for hours to find something like this and never found anything (probably because I searched with the wrong keywords...).

Posts for Saturday, June 13, 2015


Where does CIL play in the SELinux system?

SELinux policy developers already have a number of file formats to work with. Currently, policy code is written in a set of three files:

  • The .te file contains the SELinux policy code (type enforcement rules)
  • The .if file contains functions which turn a set of arguments into blocks of SELinux policy code (interfaces). These functions are called by other interface files or type enforcement files
  • The .fc file contains mappings of file path expressions towards labels (file contexts)

These files are compiled into loadable modules (or a base module) which are then transformed to an active policy. But this is not a single-step approach.

Transforming policy code into policy file

For the Linux kernel SELinux subsystem, only a single file matters - the policy.## file (for instance policy.29). The suffix denotes the binary format version used; higher numbers mean that additional SELinux features are supported, which require a different binary format for the SELinux code in the Linux kernel.

With the 2.4 userspace, the transformation of the initial files as mentioned above towards a policy file is done as follows:

SELinux transformation diagram

When a developer builds a policy module, first checkmodule is used to build a .mod intermediate file. This file contains the type enforcement rules with the expanded rules of the various interface files. Next, semodule_package is called which transforms this intermediate file, together with the file context file, into a .pp file.
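
For a hand-built module this boils down to the following two commands (using a hypothetical module called mymodule):

~$ checkmodule -M -m -o mymodule.mod mymodule.te
~$ semodule_package -o mymodule.pp -m mymodule.mod -f mymodule.fc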

This .pp file is, in the 2.4 userspace, called a "high level language" file. There is little high-level about it, but the idea is that such high-level language files are then transformed into .cil files (CIL stands for Common Intermediate Language). If at any moment other frameworks come around, they could create high-level languages themselves and provide a transformation engine to convert these HLL files into CIL files.

For the current .pp files, this transformation is supported through the /usr/libexec/selinux/hll/pp binary which, given a .pp file, outputs CIL code.
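
As the binary writes the CIL code to standard output, the conversion can also be done by hand along these lines (a sketch; mymodule is again hypothetical):

~$ /usr/libexec/selinux/hll/pp mymodule.pp > mymodule.cil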

Finally, all CIL files (together) are compiled into a binary policy.29 file. All the steps coming from a .pp file towards the final binary file are handled by the semodule command. For instance, if an administrator loads an additional .pp file, its (generated) CIL code is added to the other active CIL code and together, a new policy binary file is created.

Adding some CIL code

The SELinux userspace development repository contains a secilc command which can compile CIL code into a binary policy file. As such, it can perform the (very) last step of the file conversions above. However, it is not integrated in the way SELinux policy modules are: if additional code is added this way, the administrator cannot simply "play" with it as he would with policy modules.

Still, that shouldn't prohibit us from playing around with it to experiment with the CIL language construct. Consider the following CIL SELinux policy code:

; Declare a test_port_t type
(type test_port_t)
; Assign the type to the object_r role
(roletype object_r test_port_t)

; Assign the right set of attributes to the port
(typeattributeset defined_port_type test_port_t)
(typeattributeset port_type test_port_t)

; Declare tcp:1440 as test_port_t
(portcon tcp 1440 (system_u object_r test_port_t ((s0) (s0))))

The code declares a port type (test_port_t) and uses it for the TCP port 1440.

In order to use this code, we have to build a policy file which includes all currently active CIL code, together with the test code:

~$ secilc -c 29 /var/lib/selinux/mcs/active/modules/400/*/cil testport.cil

The result is a policy.29 file (the command forces version 29, as the current Linux kernel used on this system does not support version 30), which can now be copied to /etc/selinux/mcs/policy. Then, after having copied the file, load the new policy using load_policy.

And lo and behold, the port type is now available:

~# semanage port -l | grep 1440
test_port_t           tcp      1440

To verify that it really is available and not just parsed by the userspace, let's connect to it and hope for a nice denial message:

~$ ssh -p 1440 localhost
ssh: connect to host localhost port 1440: Permission denied

~$ sudo ausearch -ts recent
time->Thu Jun 11 19:35:45 2015
type=PROCTITLE msg=audit(1434044145.829:296): proctitle=737368002D700031343430006C6F63616C686F7374
type=SOCKADDR msg=audit(1434044145.829:296): saddr=0A0005A0000000000000000000000000000000000000000100000000
type=SYSCALL msg=audit(1434044145.829:296): arch=c000003e syscall=42 success=no exit=-13 a0=3 a1=6d4d1ce050 a2=1c a3=0 items=0 ppid=2005 pid=18045 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001 egid=1001 sgid=1001 fsgid=1001 tty=pts0 ses=1 comm="ssh" exe="/usr/bin/ssh" subj=staff_u:staff_r:ssh_t:s0 key=(null)
type=AVC msg=audit(1434044145.829:296): avc:  denied  { name_connect } for  pid=18045 comm="ssh" dest=1440 scontext=staff_u:staff_r:ssh_t:s0 tcontext=system_u:object_r:test_port_t:s0 tclass=tcp_socket permissive=0

Posts for Wednesday, June 10, 2015


Live SELinux userspace ebuilds

In between courses, I pushed out live ebuilds for the SELinux userspace applications: libselinux, policycoreutils, libsemanage, libsepol, sepolgen, checkpolicy and secilc. These live ebuilds (with Gentoo version 9999) pull in the current development code of the SELinux userspace so that developers and contributors can already work with in-progress code developments as well as see how they work on a Gentoo platform.

That being said, I do not recommend the live ebuilds for anyone except developers and contributors in development zones (definitely not on production). One of the reasons is that these ebuilds do not apply the Gentoo-specific patches. I would also like to remove the Gentoo-specific manipulations that we do, such as small Makefile adjustments, but let's start with just ignoring the Gentoo patches.

Dropping the patches makes sure that we track the upstream libraries and userspace closely, and allows developers to try and send out patches to the SELinux project to fix Gentoo-related build problems. But as not all packages can be deployed successfully on a Gentoo system without them, some patches need to be applied anyway. For this, users can drop the necessary patches inside /etc/portage/patches, as all userspace ebuilds use the epatch_user method.
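
As a quick example (the patch file name is made up), a user patch for libselinux would be dropped in the matching category/package subdirectory:

~# mkdir -p /etc/portage/patches/sys-libs/libselinux
~# cp fix-build.patch /etc/portage/patches/sys-libs/libselinux/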

Finally, observant users will notice that "secilc" is also provided. This is a new package, which is probably going to have an official release with a new userspace release. It allows for building CIL-based SELinux policy code, and was one of the drivers for me to create the live ebuilds as I'm experimenting with the CIL constructions. So expect more on that later.

Posts for Monday, June 1, 2015

testing is fun (binpkg-multi-instance)

Since version 2.2.19 (now 2.2.20), portage implements a feature called binpkg-multi-instance. This is a feature I had been looking for for quite some time. In the last days I had some time and decided to test it.
The feature gives portage the ability to keep multiple builds (with different USE settings) of a single package version.
Until now, if you created a binary package, portage could only keep exactly one binary version of any package. If you built the package again with different USE settings and created a binary package, the prior version would be gone.

Now this is probably not something many people were looking for, but I was one of those who were really excited about it. When the feature hit git I was already tempted to test it directly from git head.

So why is that so exciting for me? Well, because some time ago I set up a nice test system where this feature helps a lot in keeping compile times to a minimum.

Background:

The idea was simple: how to test many different setups and keep compile times to a minimum?
I wanted a base system which I could clone anytime I want, so that I could install and test various package combinations and reuse already compiled packages as much as possible. A virtual machine with snapshots did come to mind. However, I had dedicated hardware which had nothing to do, and thus there was no need for virtualization. Another candidate was btrfs with its snapshot features. The problem here: what if I want to test another filesystem? ;)

Logically, I decided to go with lvm.

The boot partition is shared with every system. Every other partition is on lvm. Every root partition is unique; only /home, /usr/portage and /usr/src are on separate lvm partitions, as those can be shared as well.
First I created a "base" gentoo system: basically a stage3 system with some additional programs and a few important settings.
EMERGE_DEFAULT_OPTS is one of the most important settings in this case. In my case it looks like the following:

EMERGE_DEFAULT_OPTS="--binpkg-changed-deps=y --binpkg-respect-use=y --buildpkg --buildpkg-exclude \"virtual/* sys-kernel/*-sources\" -k --jobs=3"

It tells portage to always use binary packages if possible and, except for kernel sources and virtual packages, to always create binary packages. Since this setting is part of my base system, it's in every clone of it (as long as I don't change anything by hand).

And that's where binpkg-multi-instance comes into play. Since every system accesses the same binary package store, but each system might have different USE settings for a particular package, every package variant now only has to be built once!
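For completeness: the feature itself is switched on through portage's FEATURES variable. A minimal make.conf fragment on top of the EMERGE_DEFAULT_OPTS above could look like this (the PKGDIR path is just an assumption for the shared store):

# /etc/portage/make.conf
FEATURES="binpkg-multi-instance"
PKGDIR="/usr/portage/packages"   # shared binary package store; adjust to your layout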

Compiling is really fun right now, because quite often it looks similar to this: [screenshot of the emerge output, mostly binary package merges]

Conclusion:

Sure, the whole setup is a bit more complex, and while it works really well there are a few things worth mentioning. For example, the kernel(s) need a few features enabled in every system (like LVM snapshot support, plus openrc and systemd support, since I want to test both). Also, since /home is shared between all systems, testing various desktop environments (like KDE, GNOME, LXQt, ...) could mess up their configurations. And having different arches (x86 and amd64) needs adjustments to the base configuration (but that works too!).

Besides that, I also wrote a small script which does most of the work. It clones and installs (GRUB) any system at any moment, even with a different filesystem if desired (plus it can also encrypt a cloned system).
For example, basically all I have to do is:
./sysed -q
This clones the currently running system with the same filesystem and size and creates a GRUB entry called "${lvm_name}_testing".
The script can also backup, restore, delete and edit my LVM systems.
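I won't reproduce the script here, but a stripped-down sketch of the cloning idea (the volume group, size and GRUB handling are assumptions, not how sysed actually works) could look like this:

#!/bin/sh
# hypothetical sketch of a clone step - not the actual sysed script
SRC=$(findmnt -n -o SOURCE /)            # e.g. /dev/vg0/gentoo_amd64_kde
NAME=$(basename "$SRC")_testing
lvcreate -L 10G -n "$NAME" vg0           # new volume, same size as the source
dd if="$SRC" of="/dev/vg0/$NAME" bs=4M   # raw copy of the root filesystem
# after adjusting the clone's fstab, add a custom GRUB entry for /dev/vg0/$NAME
# (e.g. via /etc/grub.d/40_custom) and regenerate the config
grub-mkconfig -o /boot/grub/grub.cfg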

I'm using this script quite often, as it makes cloning a whole system really simple (about ~2 minutes). So far I already have 14 amd64 and 2 x86 systems. Below is a list of my systems (from lvs).

  gentoo_amd64_acp            vg0  -wi-a----- 10.00g
  gentoo_amd64_base           vg0  -wi-ao---- 10.00g
  gentoo_amd64_base_selinux   vg0  -wi-a----- 10.00g
  gentoo_amd64_base_systemd   vg0  -wi-a----- 10.00g
  gentoo_amd64_cinnamon       vg0  -wi-a----- 10.00g
  gentoo_amd64_enlightenment  vg0  -wi-a----- 10.00g
  gentoo_amd64_gnome          vg0  -wi-a----- 10.00g
  gentoo_amd64_kde            vg0  -wi-a----- 10.00g
  gentoo_amd64_kde_testing    vg0  -wi-a----- 10.00g
  gentoo_amd64_lxqt           vg0  -wi-a----- 10.00g
  gentoo_amd64_mate           vg0  -wi-a----- 10.00g
  gentoo_amd64_qtile_systemd  vg0  -wi-a----- 10.00g
  gentoo_amd64_sec            vg0  -wi-a----- 10.00g
  gentoo_amd64_secure         vg0  -wi-a----- 10.00g
  gentoo_x86_base             vg0  -wi-a----- 10.00g
  gentoo_x86_kde              vg0  -wi-a----- 10.00g

binpkg-multi-instance had a big effect here, especially when trying things like abi_x86_32 or SELinux. From now on I won't have to compile any package a second time, as long as I've already built it once!

Big thanks to the Gentoo portage team!

Posts for Thursday, May 28, 2015

less portage rsync output

Ever wondered how to silence rsync when doing emerge --sync (or eix-sync)? Sure, it's nice to get lots of information, but it's like with compiling packages: the first few years it's amazing to look at the terminal while it's compiling the latest stuff. After a while, though, these things become a bit boring.
While emerge has --quiet-build, rsync by default outputs every single file that gets transferred or deleted. Luckily, recent versions of rsync, which have already gone stable, support new ways of reporting progress, and since I already use them in other scripts I decided to modify my portage rsync settings a bit:

PORTAGE_RSYNC_EXTRA_OPTS="--info=progress2,name0,del0"

The output then looks similar to this: [screenshot of the condensed rsync progress output]

Neat, isn't it?
BTW, in order for this to work correctly, the remote rsync server needs to run a recent version of rsync as well.
