Posts for Thursday, July 8, 2010

Portage post-GLEP-55 - Moving Forward

GLEP 55 support has now officially been removed from Portage-2.2_rc, and now I'm going to focus on working with Zac Medico to improve the much-in-need-of-love Portage codebase.

Zac and I already have some projects we are working on that should reduce the size of the on-disk Portage tree by an order of magnitude and be a great benefit for many users. Other spiffy improvements are also in the works.

I realize that there is at least one extremely vocal proponent of GLEP 55 who thinks that it is the best and only reasonable solution for the EAPI problem. I respect your opinion but think you are wrong in this case. Rather than engage in marginally productive debate, I'd rather just move forward with improvements to Portage.

If we can implement a solution to the EAPI issue, we can also implement solutions for other related areas of Portage as well. I don't see any problem overcoming the logistical and technical hurdles you have raised thus far.

Dell: the aftermath

In a previous post I outlined the ways in which Dell's customer service sucks. I finally got my computer yesterday, a Studio XPS 9000. Here are my first impressions.

The bad

  1. This computer weighs so much I almost hurt my back lifting it. I thought computers were supposed to be getting smaller and lighter?

  2. The HD indicator light is tiny and on the top of the case. I can't see it with my computer under my desk.

    The optical drive is behind one of those stupid plastic flap door things. So there isn't even an indicator light for the DVD drive. I'm seriously considering taking a screwdriver to the case to fix this.

  3. It didn't come with a Windows install disk or a driver disk. It only has a recovery partition on the HD.

    I found an order form which I think will get me my disks. In the mail. Seriously, Dell? Seriously? Why not come to my house and kick me in the balls while you're at it?

    The recovery partition doesn't help you worth a crap if you want to do things like repartition your drive to put Linux on it. Windows 7's sucky partition shrinking app wouldn't shrink it lower than 500GB.

  4. The Dell recovery program is called DataSafe or something, and when you use it, it tries to upsell you like crazy to get a "pro" version that has a bunch of useless backup features. Uggggghhhhh.

  5. The side of the case is white. In 10 years it's either going to be yellow with age, or scuffed up beyond hope. Kind of ugly, but I don't care much. The front of the case looks OK though. Black with red highlights. About as good as I could expect.

  6. It came pre-installed with some crappy "Dell Dock" knockoff of Apple's Dock. Worthless and instantly uninstalled.

    This thing caused the desktop icons to be hidden by default. Who would possibly want to do this in Windows? I can imagine everyone and their grandmother being awfully confused by the missing icons.

    When quitting this dock, it said "Undo is not possible". I love a program that has no going back once you quit.

  7. I wanted to find drivers for the wireless card that came with the Dell. So I went to Dell's support site and typed in the tag number on my computer. It gave me a link to drivers for the wrong card. I had to google all over the place to find the right ones. Way to go.

    Dell's website is a labyrinth full of outdated information and dead pages in general.

  8. The instructions I got with the computer reference Vista. I don't have Vista.

  9. There's a "Windows inside" logo on the case. It will be removed shortly. They leave an awful lot of glue behind.

The good?

  1. The i7 is about as fast as I had hoped. It only took a half-dozen cores and 12 GB of RAM to let me watch full-screen Flash videos on YouTube. I feel so modern.

  2. The inside of the case is OK. There are a lot of hard drive bays and lots of extra screws. It should be easy to expand if I need to.

  3. It came with bloatware and crapware, but actually far less than I was dreading. And most of it was trying to sell you Dell crap.

    In the olden days you'd get a hundred links to AOL and other 3rd-party crap. I saw a link to Skype and the obligatory nag to buy an anti-virus subscription (fat chance), but not much else.

  4. Dell delivered the computer 2 days past the original estimated delivery date. So in spite of all the bullcrap and phone-jockeying I had to go through for billing, I can't complain about how fast it got here. Two days late isn't bad.

    I've heard rumors that these computers are built in Malaysia, and mine was definitely shipped from the US (per the Purolator tracking site). So I'm surprised they can get these things delivered as fast as they can given that it was shipped halfway around the world first, and had to go through Customs at least once coming from the US to Canada.

    Purolator was the only shipping option for Canada. I would've preferred to rush it. But maybe that's not possible given that it's coming from the US. Oh well.

  5. It runs pretty quiet, given how huge the fans are. We'll see how hot it gets once I start putting some load on it.

  6. It came with a DVI to VGA adapter and a DVI to HDMI adapter. I thought that was a nice touch, though it could be that they come standard with any nVidia card nowadays.

  7. Works OK with Linux. It took 20 minutes to set up. (Not counting wiping the Windows partition and re-installing on a smaller partition from my own copy, minus the crapware. That took over an hour.) Sound, video and wireless work out of the box in Linux. All 12 GB of RAM are usable, given a 64-bit OS. (I discovered this the fun way, by unthinkingly trying a 32-bit OS first.)

  8. It didn't burst into flames (yet).

  9. It has a peanut tray on the top. Or MP3-player tray, I guess. But I really want to put peanuts in there.

Brian, you're stupid!

So why did I get a Dell? Because I had good experience with them in the past, at home and at work. Given, that experience was 5 years in the past, and a lot can change. And I'm new to Canada, and relatively unaware of what options exist here.

The other (main) reason was that they were far, far cheaper than going through newegg.ca to get the same hardware. But I guess you get what you pay for. Caveat emptor.

I wouldn't recommend Dell to anyone else, given how chaotic the whole buying process was. Too much uncertainty, too much room for mistakes.

Dave asked in my previous post why I didn't just build a computer myself, like I had in the past. I said I didn't have time, but what I meant wasn't build time, which should be an hour or two max. I meant research time. Trying to match up compatible hardware, trying to find the best prices on all the components, checking for Linux compatibility: this takes forever and a half. I don't have hours / days to dork around with this any more.

On the other hand I can just google "xps 9000 linux" and see instantly what problems people had. I can be semi-confident that the hardware would all be compatible. And that did work out OK.

And the last reason I got Dell is that unfortunately I need Windows for work and gaming. Blarg. Paying the Windows tax to Dell is bad enough, let alone buying one off the shelf for $6,000 or however much they cost nowadays.

Sorry for the downtime

This isn't the kind of email I like to see when I wake up:

Our backend monitoring system has detected an error on the host where your Linode resides which could lead to a failure condition. In order to protect your Linode, we have scheduled an emergency migration to a different host which will commence shortly. Please note that there is currently no issue with your Linode - this is a proactive measure we are taking to avoid an issue in the future.

We apologize for any inconvenience this may cause. Should you have any questions or concerns, please do not hesitate to reply to this ticket.

The server rebooted and my sites didn't come back up, so there was some downtime. Sorry about that.

Paludis 0.48.1 Released

Paludis 0.48.1 has been released:

  • We now work around broken TLS support in certain monkey-patched GCC versions.
  • Various minor bug fixes.
  • Working compiler support for ‘extern template’ is now required.
  • The PALUDIS_IGNORE_HOOKS_NAMED environment variable can be used to skip executing hooks from specific files.

Filed under: paludis releases Tagged: paludis

Why GLEP 55 is a Good Idea and thus a Bad Thing

As it seems that ill-thought-out GLEP 55 bashing is back in fashion again, I thought I’d explain why GLEP 55 is a good idea, and thus a bad thing for Gentoo.

In short, GLEP 55 proposes moving the EAPI from being ordinary ebuild metadata to being encoded as part of the filename (e.g. .ebuild-4). The advantages of this are:

  • It allows global scope changes that affect metadata generation to be made. Right now, it is impossible for EAPIs to add new global scope functions, and it is impossible for EAPIs to change the behaviour of existing functions, since all ebuilds must be able to be sourced and have at least the EAPI part of their metadata generated by a package manager that does not support newer EAPIs.
  • It allows changes to the versioning rules. New EAPIs cannot introduce new package version formats or fix arbitrary limitations with existing formats, since as soon as an older package manager encounters what it thinks is an ill-formatted version, it will produce a noisy, user-visible warning. With GLEP 55, instead these new versions will be invisible.
  • It allows ebuilds to use newer bash features in ebuilds without breaking older package managers.
  • It makes a package’s EAPI consistently defined throughout the ebuild. Right now it’s legal to set EAPI deep inside a nested chain of eclasses (so long as doing so doesn’t break metadata invariance rules), which means any code encountered before EAPI is set will have a false impression of what the eventual EAPI will be.
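To make the filename idea concrete, here is a minimal Python sketch of how a package manager might read the EAPI from a GLEP 55-style name without sourcing anything. The function names, the SUPPORTED_EAPIS set and the choice to treat a plain .ebuild file as EAPI "0" are assumptions made for illustration; this is not how Portage or Paludis actually implement it.

```python
import re

# Hypothetical pattern: "<stem>.ebuild" or "<stem>.ebuild-<eapi>".
_EBUILD_NAME = re.compile(
    r"^(?P<stem>.+)\.ebuild(?:-(?P<eapi>[A-Za-z0-9+_.-]+))?$"
)

# Assumed set of EAPIs this pretend package manager understands.
SUPPORTED_EAPIS = {"0", "1", "2", "3", "4"}


def eapi_from_filename(filename):
    """Return (stem, eapi) for an ebuild filename, or None if it is not one."""
    match = _EBUILD_NAME.match(filename)
    if match is None:
        return None
    # Assumption: no suffix means the oldest EAPI.
    return match.group("stem"), match.group("eapi") or "0"


def visible_ebuilds(filenames):
    """Keep only ebuilds whose EAPI is supported; everything else is skipped."""
    result = []
    for name in filenames:
        parsed = eapi_from_filename(name)
        if parsed is not None and parsed[1] in SUPPORTED_EAPIS:
            result.append(name)
    return result
```

The point of the sketch is that an ebuild with an unknown EAPI is dropped before its version string or contents are ever looked at, which is why such ebuilds would be invisible rather than producing warnings.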

Because it’s currently impossible to add per-package eclasses (due to the first item), ebuilds are doing all sorts of stupid things to get the same effect. Because the second item means it’s impossible to add sane versioning for SCM packages, maintainers are making do with lots of 1.2.9999 versions that don’t quite work with dependencies properly. Due to a combination of the first two, I had to write the hideous abomination that is the versionator eclass because it’s impossible to let Portage support versions like 1.23-alpha1 directly, and impossible to add package manager provided version parsing functions that can be used in global scope.
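As a rough illustration of the kind of version mangling this forces on ebuilds, the following Python sketch rewrites an upstream-style suffix such as 1.23-alpha1 into the underscore form that ebuild versions use. The function name is hypothetical and the real versionator eclass is written in bash and does far more; this only shows the shape of the transformation.

```python
import re

# Suffixes written upstream with a hyphen; ebuild versions spell them with "_".
_SUFFIX = re.compile(r"-(alpha|beta|pre|rc)(\d*)$")


def upstream_to_ebuild_version(upstream):
    """Rewrite e.g. '1.23-alpha1' as '1.23_alpha1'; leave other versions alone."""
    return _SUFFIX.sub(r"_\1\2", upstream)
```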

The third is a continual source of problems, as developers frequently accidentally use newer bash features, thus screwing up the upgrade path for anyone who hasn’t updated their box for a few months.

As for the fourth, it’s highly discouraged from a QA perspective these days, but people have set EAPI via an eclass in the past. This one’s more an illustration of icky design than anything else.

Because GLEP 55′s author once tried using a package manager that isn’t Portage, and is thus irrevocably tainted, some alternative ways of handling these deficiencies are being proposed.

Alternative One: Repository Capabilities

First, it is suggested that repositories specify, at the repository level, their requirements (presumably via layout.conf). The implications of this are:

  • We can’t start using anything new until we’re sure that everyone is using a compliant package manager. That means yet another two year wait before any of this goes anywhere.
  • New capabilities can’t be used in the main tree for at least a year after their introduction, because users must be able to upgrade their package manager on a box they haven’t touched for a while.

This means that the cost of introducing a change is extremely high, and so developers will be encouraged to continue using bad solutions rather than fixing things properly.

Also, if repository capabilities replace rather than supplement EAPIs, as some have suggested, then it becomes impossible to change a repository’s capabilities without rewriting every single ebuild to the new standard. For example, one could not turn on a strict-s-checking feature without first checking every single existing package and fixing any that rely upon the older “S doesn’t have to exist” behaviour. Certain developers have proposed branching the tree once a year and rewriting everything to do this; quite how they plan to find the manpower to pull this off has yet to be answered.

Thus, assuming a mass rewrite is out of the question, repository capabilities must be combined with another solution to be of practical use, which brings us to:

Alternative Two: Magic Markers

Second, it is suggested that ebuilds include some kind of magic marker to allow their EAPI to be determined without going through the usual sourcing process. This does absolutely nothing to fix the version formats problem, and so must be combined with repository capabilities anyway. On top of that, none of the proposed magic marker methods are particularly pleasant.

Magic via Fixed EAPI Strings and Parsing Not Using Bash

The first kind of magic marker proposal is to require that ebuilds specify the EAPI string in a particular, fixed format. Proposals are typically along the lines of “it must be specified within the first N lines, and it must be exactly in the form EAPI=X”. So, depending upon the exact wording chosen, none of the following would be legal:

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

EAPI='4'

(Single quotes might not be legal)

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

EAPI="4"

(Nor double quotes)

# Copyright 2010 Joe Bloggs
# Derived from foo-1.23.ebuild from Gentoo, which is:
#     Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

EAPI=4

(Too many comment lines at the start)

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

    EAPI=4

(Indenting isn’t allowed)

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

inherit eclass-that-sets-eapi

(Eclasses can set every metadata variable except EAPI)

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

SLOT=1
EAPI=4

(EAPI must be first)

To make matters worse, this exception would only apply to the EAPI variable, even though in every other way EAPI would remain a normal metadata variable. This restriction would not necessarily be enforceable by the package manager either, so sometimes screwups would lead to weird results rather than errors.

Also, existing ebuilds don’t follow this format — all of the above examples have occurred in real code. Thus, before switching to this, the entire tree and every single overlay everywhere would have to be fixed.
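For a sense of how brittle such a rule is, here is a small Python sketch of one possible strict parser. The limit of five lines and the exact rules (only comments or blank lines before the assignment, no quotes, no indentation) are assumptions picked to match the examples above; no actual proposal is being quoted.

```python
import re

MAX_LINES = 5  # hypothetical value for "the first N lines"
_STRICT_EAPI = re.compile(r"^EAPI=([A-Za-z0-9+_.-]+)$")


def strict_eapi(ebuild_text, max_lines=MAX_LINES):
    """Return the EAPI if the ebuild follows the assumed strict format, else None."""
    for line in ebuild_text.splitlines()[:max_lines]:
        # Blank lines and comments may precede the assignment.
        if not line.strip() or line.startswith("#"):
            continue
        # The first real line must be exactly EAPI=X, unquoted and unindented.
        match = _STRICT_EAPI.match(line)
        return match.group(1) if match else None
    # Nothing but comments in the first max_lines lines.
    return None
```

Under these assumed rules, only an unquoted, unindented EAPI=4 directly after the standard three-line header is accepted; every one of the variants listed above comes back as None.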

Magic via Special Comment Strings

The second magic marker variant is to shove a special comment line at the start of every single ebuild containing EAPI information. Again, this would require updating every single existing ebuild, and would also require yet another huge wait before it becomes usable.

This sort of resembles what’s done in certain fixed binary formats. However, ebuilds aren’t fixed binary formats; ebuilds are scripts. In addition, fixed binary formats that do this were designed from the start with different versions in mind; the ebuild format has instead been gradually grown over time, often by people who actively oppose any kind of standardisation.

Magic via Extended Attributes

The third magic marker variant is to store the marker in extended attributes, rather than in the file itself. This means:

  • It becomes impossible to store ebuilds anywhere that doesn’t support extended attributes, such as in most version control systems, on bugzilla and on pastebins.
  • The EAPI is effectively invisible except to people using obscure tools.

And, again, a huge wait.

Alternative Three: eapi Function

Third, make eapi a function. This will allow early exit for unsupported EAPIs, so global scope changes can be made so long as they occur after the eapi function is called. Again, this means a huge wait, and again, this requires an additional proposal to fix version formats.

Why none of this matters anyway

In other words, the choices come down to:

  • Put the EAPI in the filename. Be able to use new features straight away, including in the main repository.
  • Introduce repository capability support to package managers. Wait a year or more. Every time you want to use a new capability, release package manager support for it, then wait a year before you can use the new feature. In addition, either rewrite the entire tree every time you change capabilities, or also introduce a second solution.
  • Introduce some kind of magic marker. Rewrite every existing ebuild, including the ones in overlays, to use this marker. Wait a year. Still be unable to change version format rules, unless a second solution is also introduced.
  • Move eapi to be a function. Wait a year. Still be unable to change version format rules, etc.

Or, of course, the usual Gentoo solution can be applied: do nothing. Continue to be unable to support per-package eclasses, and continue adding extensive convoluted hacks into ebuilds to try sourcing things in FILESDIR to get the same result. Continue to use silly 9999 version numbers, which can’t be used in dependencies correctly, to support version branches of scm packages. Continue to arbitrarily not support certain upstream version formats just because they use -alpha instead of _alpha, and require ebuilds to carry on doing transformations to deal with this. Continue to reject proposals based upon the author rather than technical merit.

It’s obvious that doing nothing is the way best suited to Gentoo’s needs. Gentoo’s main priority is to avoid making any change that is in any way controversial or that could in any way be associated with the wrong people. Delivering a better product to users or making things easier for developers is irrelevant. Thus, maintaining the status quo whilst vaguely mentioning alternatives that can’t and won’t ever happen is the way forward. If things get desperate, and it starts to become impossible to fire any developer who asks for better tools, then at that point one of the “wait several years” solutions can be pulled out as proof that “something is being done”.

At this point, all that switching to a solution that would enable changes to be made would do is illustrate that Gentoo can’t deliver even the simplest of the changes developers need. By repeatedly hovering around proposals that will require long waits before their introduction, Gentoo has a plausible-looking excuse to mask an almost complete lack of progress in Portage; if GLEP 55 were to be introduced, developers might start asking why they still don’t have per-package eclasses, less arbitrarily restricted version numbers, package manager provided version parsing helpers or the ability to use newer bash features in ebuilds.


Filed under: gentoo Tagged: gentoo, glep 55

Posts for Tuesday, July 6, 2010

Hello again... and Why GLEP 55 Is A Bad Idea

Uh... hello there. I haven't posted to my blog lately. So what has been up in Funtoo land? Not much? Actually, quite a lot - tons - I've just been neglecting my blog. I'll try to catch you up in upcoming blog posts.

One very recent thing that I wanted to blog about is that I am going to be more active in helping to define the future of Portage. So I thought I'd start by telling you why GLEP 55 is a bad idea.

GLEP 55, currently in draft form, is somewhat supported in the current Portage 2.2_rc source code. It specifies a new ebuild file name extension that isn't just ".ebuild" but ".ebuild-[number]", where [number] specifies the EAPI version, which is the revision of the ebuild API that is supported by the ebuild.

The rationale is that Portage currently can't figure out the EAPI of an ebuild until it sources the ebuild using bash, yet it needs to know the EAPI *before* sourcing, because sourcing an ebuild written for an EAPI it doesn't support can lead to problems. So the current mechanism of sourcing the ebuild to find out its EAPI won't do. So far, this is true.

Another rationale for GLEP 55 is GLEP 54, which proposes the "-scm" extension to ebuilds. This extension is presumably going to be phased in in a future EAPI. The GLEP 55 proposal claims that Portage is going to need to figure out the EAPI of the ebuild even before it can try to grok the *version* specified in the ebuild filename, and therefore, the EAPI must be in the filename too...! Wow! Really? This certainly isn't true.

But before I go into more detail, let me digress for a moment and speak about a general observation I have regarding file formats. Did you know that there are actually many different versions of .jpg and .png files, but they all use the same file extension? This is true of many other different file types as well. Why is this? It appears that people have figured out a way to solve this problem in a more elegant way than GLEP 55 proposes. I'd like to also have Portage apply an elegant solution to this problem so that we do not need to uglify .ebuild filenames.

Here's another observation I have. UNIX already has a solution to this problem - used for shell scripts. You've probably noticed that the beginning of a shell script looks like this:

#!/bin/sh

...and likewise, the beginning of a Python script will often look like this:

#!/usr/bin/python

The GLEP 55 proposal suggests that using a mechanism like this to identify the proper interpreter for a file actually "hurts performance." Really? Well, in a general sense it does not, but I suppose if you think that you need to figure out the EAPI before you can properly parse the version string contained in the filename (which isn't true,) then maybe you might convince yourself that there are performance problems.

However, you don't need to figure out the EAPI before you parse the version string.

Let's take a look at a more elegant solution, shall we?

First, GLEP 54. The -scm extension should not be phased in via EAPI but as a Portage Repository-specific capability. In other words, a Portage repository-specific configuration file can define what capabilities are enabled in that particular Portage repository, and if GLEP 54 is used in the repository, a flag will be switched (in a file such as layout.conf, which contains repo-specific settings) to indicate that ebuilds can contain -scm extensions. Using this approach, we view the versions found in ebuild filenames, and the format that they use, as a *repository-specific* capability. GLEP 55 should really have focused on adding Portage Repository capabilities, which is a needed feature in Portage, and this would have paved the way for GLEP 54 without adding wacky ebuild filenames.

If we move the -scm extension in GLEP 54 so that it is a repository-specific capability, we can now figure out how to grok the version of the ebuild without having to know its EAPI. This solves the potential performance problem.
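Here is a rough Python sketch of what checking such a flag might look like. The "capabilities" key and the "scm-versions" value are invented for this example; layout.conf does not define them, and the parser only handles simple "key = value" lines.

```python
def parse_layout_conf(text):
    """Parse simple 'key = value' lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def repo_allows_scm_versions(layout_text):
    """True if the hypothetical 'capabilities' key lists 'scm-versions'."""
    capabilities = parse_layout_conf(layout_text).get("capabilities", "")
    return "scm-versions" in capabilities.split()
```

With something like this, the version parser could be chosen once per repository, before any individual ebuild is opened.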

Now we can revisit the time-tested approach of storing the EAPI inside the file itself, using the first few bytes of the file, something like:

#?ebb4?

This could mean "*EB*uild in *B*ash format, EAPI *4*." I am using a "?" instead of the more common "!" so that the following characters are not misinterpreted to be the path to an interpreter. The initial "#" is pretty much universally supported as a comment string. I bet you could probably think up an even classier-looking initial few bytes, or even a better, more feature-rich format for the first few bytes of the file that could be used for other things.
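A reader for a marker like this could be very small. The Python sketch below assumes one particular grammar ("#?eb", one lowercase letter for the format, digits for the EAPI, then "?"); that grammar and the function name are guesses made for illustration, since the format above is only a suggestion.

```python
import re

# Assumed grammar: "#?" + "eb" + one format letter + numeric EAPI + "?"
_MAGIC = re.compile(r"^#\?eb(?P<fmt>[a-z])(?P<eapi>[0-9]+)\?$")


def read_magic(first_line):
    """Return (format_letter, eapi) from a marker line, or None if absent."""
    match = _MAGIC.match(first_line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group("fmt"), match.group("eapi")
```

Only the first line of the file has to be read, so the EAPI is known before bash ever sees the ebuild.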

So there we go. No need to add the EAPI extension to each ".ebuild" file.

The only reason to add any additional data to a ".ebuild" filename is so that two files can exist independently on disk, next to each other. Since we're never going to need to have EAPI 3 and EAPI 4 versions of the same ebuild sitting alongside one another on disk, or in a git repo, there is no need to put EAPI data in the filename.

So that's why GLEP 55 is a bad idea. Portage Repository capabilities should be implemented instead.

Posts for Monday, July 5, 2010

In which border-radius is abused

I threw together a new blog layout today. I scaled back the level of cows a bit (just a bit, don't worry!) Criticism / feedback welcome. (IE-related feedback should be dropped off in the circular file by my desk.)

In what is surely a prelude to the future of the internets, I'm abusing border-radius pretty heavily in this layout. border-radius is likely to become the new marquee HTML tag or text-shadow CSS attribute: Maybe an OK idea at first, but then everyone uses it so much it makes your eyes bleed.

So I figured I'd best get my border-radiusing in early while it's still cool. IE8 users, you still get pointy corners. Sucks to be you.

Also, if you have any ideas for features I should implement for cow-blog, please let me know. I've been crawling the internet looking at blogs for ideas of things to implement and features to steal, but I'm running out of ideas. It does everything I want now, but I'm not a reader.

Posts for Sunday, July 4, 2010

Neglecting my blog

For a variety of reasons, I haven't blogged in a loooong time. So what have I been doing?

Firstly, Abbey gave birth to Theo Ray Marples on 3rd April 2010. Secondly, I have joined a  World of Warcraft  guild that raids  Icecrown Citadel on a regular basis (8/12 progression). These two facts have taken up a lot of my free time I normally devote to working on my OSS projects.

But do not fret - the few bug reports I've received have all been addressed and I have been releasing new versions when required. However, since I no longer use Gentoo I will not be maintaining  OpenRC anymore. That will need a new home. A big thanks to all the people who helped me work on it over the years :)

I may start blogging about  Twisted Fire, my Warlock which I play in Warcraft. I may also find the time to work on IPv6 support for  dhcpcd, which is hopefully being updated in  Debian from the ancient  dhcpcd-3 it currently uses.

Off to Toronto July 14-28, Archcon

As mentioned earlier, I'll be at Archcon in Toronto in a few weeks.
It's a very small conference, and the first of its kind. At the last FrOSCon we played with the idea of holding an informal Arch conference in Europe, but those were just ideas. Dusty and Ricardo beat us with an actual implementation.
This is great, and one of the milestones in Arch Linux history. Which is why I want to be there and help make it better.

Archcon schedule. I'll be talking about AIF and uzbl.

My schedule:

  • wed. July 14: departure
  • July 14 - 21: hanging around in downtown Toronto. Not sure yet where I'll stay. probably some hostel.
  • July 21 - 28: bed and breakfast pretty close to the conference location, outside the city center
  • July 22 - 23 are the conference talks days, July 24-25 are the conference tourist days, where we'll do some touristy stuff with the conference participants
  • wed. July 28: flying back

There are many things to do in and around Toronto. We still need to do the planning, but Niagara Falls is definitely on the list.

Finally, big thanks to Elysium Digital, a technology oriented consulting company for legal matters. Apparently they use Arch Linux for many things: on the desktop, in the server room, and in the forensic lab. They emailed me to propose sponsoring my plane tickets, and they did.

Posts for Saturday, July 3, 2010

Intro to programming with Falcon

I’ve spent a lot of my free time lately tinkering with Falcon. I’ve written a very small, and quite unusable, amount of code for the language, and I’ve written a couple of scripts using it as well. Most of those don’t work either. But the ones that do tend to work very well and I enjoy the time I spend with it. So to help me remember and to potentially help someone else learn I’m going to start writing my findings with it here.

Getting Started

Getting started for me always requires setting up my editor for use with whatever language I’m using. I try to use Vim as much as possible, so that generally means I have little to nothing to do. Languages like C, Ruby, Python, etc. are so finely tuned within Vim these days there is almost nothing left to set up. Falcon, on the other hand, is still very new, and when I first started playing with it there were no files for Vim. I did end up finding a very basic syntax and indent file for Vim in the Falcon SVN, but they were by no means good enough. So I’ve spent a lot of time making the language work in Vim. In fact I have probably spent twice as much time working on the Vim config files as I have programming in the language itself! To get started with Vim I suggest you check out my Github repo. The syntax file is fairly feature rich at this point. It does just about everything I want it to do. The indent file on the other hand is still pretty bad. I mean it works, but it could do a lot better. If anyone is aware of other comparable Vim scripts for Falcon, I’d love to compare notes!

Hello World

The hello world is, of course, the default program for every language. In Falcon it’s as simple as ever, and of course, like most languages, we can spend all day figuring out different ways to write it. I’ll keep it simple though.

#!/usr/bin/env falcon

> "Hello World!"

Finished!

Enjoy the Penguins!


Posts for Friday, July 2, 2010

Dell sucks

Why did I order a computer from Dell? I guess I had a good opinion from 6 years ago when I last bought something from them.

Let's count the ways in which their customer service has failed me. (And my computer isn't even here yet.)

  1. As documented, their website couldn't process my credit card without a phone call.

  2. After a week of my computer being "in production", I started getting more phone calls from an unidentified phone number that Google told me was Dell. Fearing another billing problem, I called back. And I was told "Thanks for calling, but our order tracking system is down. And we're all going home. Call back tomorrow morning.".

    If only Dell had some means to acquire reliable computer systems on which to build their order tracking database.

  3. I called the next day and was told my order was fine. I was also told (per script, I'm certain) that I could check my order status on Dell's website. Which of course I knew. I know it costs the company money every time someone calls, and they try to strongly discourage calls for that reason, but their script made it sound like I was an imbecile.

    I found it quite condescending. I dislike how these canned scripts pander to the lowest common denominator of customer. They should be happy to take my call. I just spent upwards of a thousand dollars on their crap.

  4. Turns out the phone calls I was getting were from someone trying to give me "free internet from Shaw or Telus for 3 months", and I was eligible because I bought a Dell computer. So I was being telemarketed before my computer even got here.

    I said I already had internet service, and they said "Oh, too bad, it's for new customers only." I do not appreciate this.

  5. I got an email saying my order shipped. Joy! 20 minutes later I got an email saying my order was delayed, and if it didn't ship in 5 days I should call. What?

    It really did ship though, I have a tracking number. Why the contradictory emails?

All of my phone dealings with Dell were via some offshored far-eastern country, judging by the accents of the phone reps. I have nothing against this in principle; I'm not a xenophobe. But the phone connection is always so static-filled and laggy that it really puts a damper on communication.

My computer isn't here yet, and I just hope to God it works and doesn't break in a month. I kind of wish this article had come out a week earlier.

That'll teach me for trying to save time, I guess. Next time I'll build my own system from scratch. Dell goes onto my List of Companies Not to Buy From in the Future (LCNBFF), along with Westinghouse and oh so many others.

Posts for Thursday, July 1, 2010

Kate syntax highlighting for Linux New Media articles

Since I'm occasionally writing articles for Linux New Media and they have this bogus syntax they expect you to follow when writing articles, I decided to put an end to my suffering.

Most of the time I use Kate for writing articles (I use Vim mainly for administration). Therefore the logical solution was to wrap those syntax rules into a highlighting file for Kate. Now I can finally make heads or tails of the whole mess!

So, in case you write for any of their magazines[1], you can now download my Kate syntax highlighting file and enjoy the goodness :]

BTW, if you want to write your own syntax highlighting for Kate, check out its online documentation and/or this article. Taking a peek at the already existing XML files in /usr/share/apps/katepart/syntax/ and ~/.kde4/share/apps/katepart/syntax/ might be a good idea as well.

Someday I may even write the same highlighting for Vim.

hook out >> sipping tea and writing an article about FOSS solutions to cloud computing for Linux Magazine


[1] Linux New Media publishes a large number of GNU/Linux magazines all over the world, among them Linux Magazine, Linux Pro Magazine, EasyLinux, Linux User and Linux Technical Review.

Posts for Wednesday, June 30, 2010

I am an edge case

I am an alien. An American who emigrated to Canada. This has resulted in a lot of fun and a bit of pain as I've managed to break the systems of many of the businesses I deal with.

As a programmer I can appreciate the importance (and sometimes difficulty) of handling edge cases. It's been an interesting experience living as an edge case myself.

H&R Block(heads)

Taxes are confusing enough when you don't have a wife from another country. The friendly folk at H&R Block had no idea how to handle my situation. Their computers demanded a Social Security Number for my wife, which she doesn't have, because she's Canadian. So they left it blank. (Leaving it blank was the consensus opinion of everyone at H&R Block, including the managers.)

To even be able to leave it blank, they had to print the forms, because the computer refused to submit my taxes electronically with a blank SSN. This should've been a red flag to me, looking back.

The correct thing to do was for her to get an ITIN from the US, which from what I know is like an SSN that doesn't get you SS benefits or allow you to work in the US. It's just for tracking. But they didn't tell me that.

Moral of this story: That's the last time I'll ever use H&R Block. If I want to do my taxes wrong, I can do that myself for free.

IRS

The IRS refused to believe that my wife existed without a number to assign her, so they rejected my tax return and threatened to charge me tons of penalties. So I drove down to the friendly neighborhood IRS office. Surely they'd know how to fix this, right?

Wrong. The fellow at the IRS was familiar with filing taxes for Mexicans living in the US, but not for a "non-resident alien Canadian spouse" like mine.

The ITIN docs say that I need to submit a notarized copy of my wife's passport, to get her an ITIN. But notarized by whom? By a notary in Canada or the US? The IRS agent spent at least an hour reading his enormous IRS manual, looking up treaties and international law, trying to figure this out. Eventually he found a footnote scribbled into the margin of his book that said a Canadian notary was OK. So that's what I gave him.

He filled out the new tax forms himself, stapled on a photocopy of the page from his own IRS manual saying the Canada-notarized passport was OK, stamped everything all official-like, mailed it away himself, and... a month later it was rejected again. I needed to get a US notary to do it, or my wife had to drive to the freaking capital of her province to get it super-notarized or something. Nine months later and 3 more trips to the IRS office, I finally got it worked out.

As a US citizen, living in Canada, working for a US company, being paid by a Canadian payroll company, I live in constant fear of doing my taxes next spring.

Moral of this story: Even the IRS doesn't know their own tax code. Thanks again, US Government.

No credit

I have good credit in the US. I was able to get a car loan without a co-signer 6 years ago. I have credit cards.

In Canada, I don't exist. My credit score here is zero, as you might expect. I tried to buy a car, and they just flat-out wouldn't let me without a co-signer for a loan. I guess I can't blame them. (Except that I have money, they have something I want to buy, and we both ended up losing out.)

I went to the bank to open a checking (er, "chequing") account, and all hell broke loose. They wanted my business, but how could they justify giving an account to someone with "no credit"? Eventually the bank manager managed to run a US credit check on me (at least she said she did) and they let me open an account.

But I can't get a credit card here. Not even a $500-limit high-interest card like they give to kids in college. Not even after I had the bank write a letter of recommendation vouching for me. I have to wait a year and keep getting paychecks before I show up in the system. ("Paycheques"?) I'm still looking for other options in the meantime.

Moral of this story: Well, no moral. Sucks to be me, I guess.

Shopping

I tried to buy a computer online recently (from Dell, which I'm starting to regret). Not having a Canadian credit card, I used my US card. This is OK, my card works up here, with a small foreign transaction fee of 1-3%.

But the website wouldn't take a US address as my billing address. I had to give a Canadian address. "Province" was a drop-down, mandatory field.

No matter, I'll just go to my bank's website and change my billing address to one in Canada. I checked with my bank before I moved, and they said it was no problem to keep the account even if I moved to another country. "We have lots of international customers!", said the teller.

Well, my bank's website won't let me specify a non-US billing address. "State" is a mandatory drop-down. Which is awesome. I emailed the bank and asked them to change my address for me, and they did. Now when I go to the "change my address" form on their website, half the fields are filled in and half are just blank. So if I ever use the form again, something will probably break.

Once I got on the phone with Dell, they were more than happy to take my US credit card as payment. Their online form just couldn't handle it.

Moral of this story: Text fields, not drop-downs.

And so on

When I fly across the border, I have to fill out a customs declaration form. There's a field asking "country of residence". Well, technically I am a resident of two countries. So I pick "US" when I fly down to the States, "Canada" when I fly back.

I tried to get a Costco membership in Canada, and they wanted a driver license. My license is from Oregon. Hilarity ensued, and they decided I didn't need one after all. Any time I need to give a driver license for any purpose, I end up breaking something.

Note that in every situation I've described, what I was trying to do was valid, and after some hassle, everything usually worked out OK. Computers just got in the way and slowed the process way down. And I'm not that much of an edge case. 250,000 people immigrate to Canada every year.

I suppose it might be better to optimize for the common cases, force people to pick their province from a drop-down. And then deal with the edge cases like mine manually later. Every text field is another opportunity for users to type in gibberish and chaos. But I wonder if the programmers actually thought about it this much, or if they were just being lazy.
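The trade-off described above can be sketched in code. This is only an illustration, not how any of the sites mentioned actually work: the `KNOWN_REGIONS` table and the `validate_region` function are hypothetical names, and the region lists are deliberately incomplete. The common case is checked against a fixed list (the drop-down), while a country without a list falls back to free text instead of rejecting the user.

```python
# Hypothetical lookup table; a real form would carry the full lists.
KNOWN_REGIONS = {
    "CA": {"AB", "BC", "ON", "QC"},
    "US": {"CA", "NY", "OR", "WA"},
}


def validate_region(country, region):
    """Return a cleaned region, or raise ValueError if it is unusable."""
    region = region.strip()
    if not region:
        raise ValueError("region is required")
    known = KNOWN_REGIONS.get(country.upper())
    if known is None:
        # Edge case: no list for this country, so keep what the user typed.
        return region
    code = region.upper()
    if code not in known:
        raise ValueError(f"unknown region {region!r} for {country}")
    return code


print(validate_region("us", " or "))    # OR
print(validate_region("DE", "Bayern"))  # Bayern
```

The point of the design is that the strict list depends on the country the user actually chose, rather than assuming everyone lives in one country.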

I'm not really complaining. I don't expect the world to change to accommodate me. It's been more funny than annoying. But I do find it interesting to see the flaws in computer systems exposed. I get a certain sick satisfaction out of seeing people write "invalid" values into fields that I know are going to break someone's database down the line.

no more crashes after resuming from pm-suspend

motivation

hp elite book 8530w, source hp.com

i own a hp elitebook 8530w which is a good notebook if used with linux. it does have a nvidia card:

01:00.0 VGA compatible controller: nVidia Corporation G96M [Quadro FX 770M] (rev a1), according to lspci.

i use it with Linux ebooK 2.6.34-rc5 #4 SMP PREEMPT Wed Jun 30 10:48:22 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz GenuineIntel GNU/Linux.

ever since i bought this laptop it often got stuck at resume, using pm-suspend (sys-power/pm-utils).

it’s probably a x11-drivers/nvidia-drivers issue, but i’m not certain: after a broken resume the screen stays totally black (no backlight) and the computer isn’t accessible over the network either (can’t log in via ssh). so today i wanted to fix that by installing the nouveau driver. that failed, however, but it seems that i fixed the resume issue anyway.

HINT: that means i can use the proprietary driver, as it is not a problem anymore.

HINT: that also fixed the problem that after the first pm-suspend (or hibernate-ram) the consoles alt+f1 … to … alt+f12 weren’t accessible, as they were blanked out completely. i might experiment with nouveau soon, but for now let’s blog how i fixed the resume issue:

updating pm-utils

my old version: sys-power/pm-utils-1.2.5 -> pm-suspend
my current version: sys-power/pm-utils-1.3.0-r2
emerge pm-utils

[ebuild  N    ] sys-power/pm-quirks-20100316
[ebuild  NS   ] app-text/docbook-xml-dtd-4.5-r1 [4.1.2-r6, 4.2-r1, 4.3-r1, 4.4-r2]
[ebuild     U ] sys-power/pm-utils-1.3.0-r2 [1.2.5]

used with x11-drivers/nvidia-drivers-195.36.24
after updating pm-utils there was no crash after resuming. which felt like a miracle! however, the problem seems to be odd. after updating the crash was gone but the backlight was very dim. in fact it was the lowest setting possible. and the other problem was that i could not find any tool to fix that. searching in wikis and blogs for ages i remembered how i did it last time using my thinkpad (i had similar problems there).
there is a kernel interface for controlling the screen brightness using acpi.

how to increase the brightness?

ideal solution: using the Fn+F9 and Fn+F10 keys
ok, first check if these keys are mapped via acpi:
as root, i type:
“acpi_listen”
fn+f9    video/brightnessdown BRTDN 00000087 00000000
fn+f10   video/brightnessup BRTUP 00000086 00000000
fn+f11   –nothing–
so fn+f11 seems to be connected directly (i don’t know how that exactly works) but the other two keys work fine with acpi_listen. so we can use these!
NOTE: i had the ‘ambient light sensor’ disabled in the BIOS but this didn’t help either. fn+f11 will switch to using the ‘ambient light sensor’ but somehow i usually think it’s still too dim most of the time. it is far better on the ‘mac book pro’.
maybe nvidia-settings? NO! it does not control that.
even restarting X didn’t fix it. and neither did restarting the computer!?

what next? -> the acpi interface

the acpi interface was incomplete as there was no /proc/acpi/video/
after endless googling i realized that hp-wmi and wmi (CONFIG_HP_WMI=m) support was already included in my kernel as modules and even loaded but:
CONFIG_ACPI_VIDEO=m
was missing!
so i added it, reinstalled a new kernel with the proper modules and after a reboot the video interface was there:
/proc/acpi/video/DGFX/
so experimenting with it:
cat /proc/acpi/video/DGFX/LCD/brightness
levels:  0 5 10 15 20 25 30 33 36 40 43 46 50 55 60 65 70 75 80 83 86 90 93 96 100
current: 30
now, let’s increase that value to the maximum:
echo 100 > /proc/acpi/video/DGFX/LCD/brightness
and i’m finally there! i can actually see what i’m writing! ;-)

how to integrate this into the system?

we will combine the acpi with the /proc interface to set a proper brightness level. a nice reference for  doing so can be found at [1].

i’ve written a script in /etc/acpi/events/default first but it didn’t work!
NOTE: i did not restart acpid, this was probably the problem
anyway: i integrated the events into /etc/acpi/default.sh instead. this is better as one can check that for syntax errors and one can experiment with it, example:
first do a:
acpi_listen
and then create an acpi event, for instance by pressing fn+f9 (on my elitebook), to find out what the generated event is called.
for instance: pressing fn+f3 here generates a “button/sleep SBTN 00000080 00000000”
note this down, or copy it with the mouse. then invoke the default.sh script using it as parameter, leave the cryptic numbers aside. example:
./default.sh button/sleep
if this manual invocation works (no syntax errors in default.sh reported) you can also start using the acpi event by pressing fn+f3 (instead of manually invoking the script).

the /etc/acpi/events/brightness script

#!/bin/sh
# step the LCD brightness up or down by 20, fading one value at a time
B=/proc/acpi/video/DGFX/LCD/brightness

# current level, taken from the "current: NN" line
X=$(awk '/current/ {print $2}' "$B")

if [ "$1" = "up" ]; then
    Y=$((X + 20))
    # keep the target inside the 0..100 range listed in "levels:"
    [ "$Y" -gt 100 ] && Y=100
    for i in $(seq "$X" "$Y"); do
        echo "$i" > "$B"
    done
fi

if [ "$1" = "down" ]; then
    Y=$((X - 20))
    [ "$Y" -lt 0 ] && Y=0
    for i in $(seq "$X" -1 "$Y"); do
        echo "$i" > "$B"
    done
fi
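The script above moves in fixed steps of 20, although the kernel only lists certain values under "levels:". As an alternative sketch (Python is used purely for illustration, and `parse_brightness` and `step` are made-up helper names, not part of any ACPI tool), one could parse the file and move to the neighbouring supported level instead. The sample text is the output quoted earlier in this post.

```python
def parse_brightness(text):
    """Parse the 'levels:' and 'current:' lines of the brightness file."""
    levels, current = [], None
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "levels":
            levels = [int(v) for v in rest.split()]
        elif key.strip() == "current":
            current = int(rest)
    return levels, current


def step(levels, current, direction):
    """Return the neighbouring supported level, clamped at both ends."""
    ordered = sorted(levels)
    if direction == "up":
        higher = [v for v in ordered if v > current]
        return higher[0] if higher else ordered[-1]
    lower = [v for v in ordered if v < current]
    return lower[-1] if lower else ordered[0]


SAMPLE = """levels:  0 5 10 15 20 25 30 33 36 40 43 46 50 55 60 65 70 75 80 83 86 90 93 96 100
current: 30
"""
levels, current = parse_brightness(SAMPLE)
print(step(levels, current, "up"))    # 33
print(step(levels, current, "down"))  # 25
```

On a real system the text would come from reading /proc/acpi/video/DGFX/LCD/brightness and the result would be written back to the same file as root.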

the /etc/acpi/default.sh script

#!/bin/sh
# /etc/acpi/default.sh
# Default acpi script that takes an entry for all actions
set $*
group=${1%%/*}
action=${1#*/}
device=$2
id=$3
value=$4
log_unhandled() {
logger "ACPI event unhandled: $*"
}
case "$group" in
video)
case "$action" in
brightnessup)
/etc/acpi/events/brightness up
;;
brightnessdown)
/etc/acpi/events/brightness down
;;
esac
;;
button)
case "$action" in
power)
/sbin/init 0
;;
# if your laptop doesnt turn on/off the display via hardware
# switch and instead just generates an acpi event, you can force
# X to turn off the display via dpms.  note you will have to run
# 'xhost +local:0' so root can access the X DISPLAY.
#lid)
#       xset dpms force off
#       ;;
sleep)
/usr/sbin/pm-suspend
;;
*)      log_unhandled $* ;;
esac
;;
ac_adapter)
case "$value" in
# Add code here to handle when the system is unplugged
# (maybe change cpu scaling to powersave mode).  For
# multicore systems, make sure you set powersave mode
# for each core!
#*0)
#       cpufreq-set -g powersave
#       ;;
# Add code here to handle when the system is plugged in
# (maybe change cpu scaling to performance mode).  For
# multicore systems, make sure you set performance mode
# for each core!
#*1)
#       cpufreq-set -g performance
#       ;;
*)      log_unhandled $* ;;
esac
;;
*)      log_unhandled $* ;;
esac
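To make the dispatch logic of default.sh easier to follow, here is a small sketch of the same idea in Python. It is only an illustration: `parse_event`, `command_for` and the `HANDLERS` table are hypothetical names, and the table only repeats the commands used in the script above. An event line as printed by acpi_listen is split into group, action, device, id and value, then looked up.

```python
def parse_event(line):
    """Split an acpi_listen line the way default.sh splits its arguments."""
    fields = line.split()
    name = fields[0]
    if "/" in name:
        group, _, action = name.partition("/")
    else:
        # Without a slash, ${1%%/*} and ${1#*/} both leave $1 unchanged.
        group = action = name
    # Missing trailing fields become empty strings, like unset $2..$4.
    device, ident, value = (fields[1:4] + ["", "", ""])[:3]
    return {"group": group, "action": action,
            "device": device, "id": ident, "value": value}


# (group, action) -> command, mirroring the case branches above.
HANDLERS = {
    ("video", "brightnessup"): ["/etc/acpi/events/brightness", "up"],
    ("video", "brightnessdown"): ["/etc/acpi/events/brightness", "down"],
    ("button", "sleep"): ["/usr/sbin/pm-suspend"],
}


def command_for(event):
    """Return the command for an event, or None if it is unhandled."""
    return HANDLERS.get((event["group"], event["action"]))


event = parse_event("video/brightnessup BRTUP 00000086 00000000")
print(command_for(event))  # ['/etc/acpi/events/brightness', 'up']
```

The sketch only returns the command instead of running it, which makes it easy to try out event names copied from acpi_listen without touching the system.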

links


Ricers and logic

The first post on this blog was called “Why Gentoo” and outlined my reasons for using that distribution. Today I was talking to Flameeyes in the context of his post on the recent libpng-1.4 stabilization. The problem with Gentoo is, was, and probably always will be its image and parts of its user base.

On Gentoo you compile every package according to your own setup, allowing you to easily disable features in software you don’t want or need. This is killer feature one. The other killer feature is that it’s bloody simple to bring new software into Gentoo with ebuilds: They are trivial to write and can handle basically every kind of package. Awesome.

But recently we see some talk about how compiling stuff with Gentoo is so much faster than all the other distros. And that really pisses me off because it is retarded.

It reminds me of the “portage” vs. “paludis” discussions. For those with less knowledge about Gentoo’s past: When some people decided that the package manager “portage” was too slow (portage is written in Python), they started using paludis (a package manager written in C++). The same people would then brag that searching for a package would take 5 seconds less than with portage (or similar numbers). The point is: paludis takes ages to compile, so how often do you have to search in order to have a net benefit?

Let’s do some math. You have two programs, Prog_A and Prog_B. Prog_A takes 10 seconds for a job, Prog_B only takes 5 seconds. On the other hand, Prog_B takes 30 minutes (1800 seconds) longer to compile than Prog_A. Since we save 5 seconds on every execution of Prog_B, we only come out ahead if we run Prog_B more than 360 times. “Oh yeah”, you say, “I probably use a lot of software that often!” but think about updates! Every update means compiling again, so the equation only holds water if you use Prog_B more than 360 times between updates.
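The arithmetic can be checked with a few lines of Python (used here only for illustration; the function names are made up). The numbers are the ones from the example: 10 seconds versus 5 seconds per run, and 30 minutes of extra compile time.

```python
def break_even(slow=10, fast=5, extra_compile=30 * 60):
    """Number of runs at which the extra compile time is paid back."""
    return extra_compile / (slow - fast)


def net_saving(runs, slow=10, fast=5, extra_compile=30 * 60):
    """Seconds saved (negative means lost) after `runs` executions."""
    return runs * (slow - fast) - extra_compile


print(break_even())     # 360.0
print(net_saving(30))   # -1650
print(net_saving(400))  # 200
```

So after 30 runs you are still more than 27 minutes in the red, and every recompile resets the counter.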

Now look at some programs you might really run a lot. Mozilla Firefox (including its runtime Xulrunner) takes quite a lot of time to compile, and do you really think, considering all the network lagginess in its domain, that whatever you compile makes such a big difference speed-wise? Do you compile OpenOffice yourself and spend hours upon hours on that? How fast do you type? I mean seriously?

Gentoo is not about speed. It’s a meta-distribution that allows you to build your own custom system for exactly your use case. And that’s not about setting retarded CXXFLAGS in order to have some package manager run in fewer milliseconds.

Command-line warriors, part one

This is a post about some things you might have used a graphical tool for in the past, but which can be done just as well using command-line tools. Since I keep finding these little gems I hope I can continue this kind of post in the future. I’ll try to categorize the tips and I’ll only post the functionality I recently discovered and use myself, so this is not a reference for any of the programs mentioned.

Photography
You accidentally selected “Delete All” on your digital camera and now your photos are gone? You even fear they might have been overwritten by photos taken after the accident? Fear not, the same thing happened to me, and I was able to restore all the photos from the event plus photos going back as far as three months from the CF card in my 400d using the PhotoRec software.

Cameras today often have an orientation sensor, so pictures taken in portrait mode will automatically be rotated. But this usually happens in the viewer, while the orientation is simply an EXIF tag. Of course this takes up precious computing time and not every viewer supports it. jhead can not only display EXIF information in a consistent manner but can also rotate pictures losslessly and clear the orientation tag afterwards. The call is jhead -autorot *.JPG
If you want to rename your images using EXIF-information, exiftool is the way to go. The command exiftool '-FileName<Party_ ${CreateDate}_${filename}' -d %Y%m%d *.jpg would prefix all the photos with “Party” and the create-date, while keeping the original filename as the suffix. exiftool can shift dates too, if you ever forget to adjust for daylight saving.

When uploading the images you’ll often want to resize them. The ImageMagick collection can do just that, and many other things (like sharpening etc.). If you mogrify -resize 1600x1600 your photos will be resized to 1600px maximum edge length, while keeping the ratio. But be careful, mogrify overwrites the pictures in place!
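To see what "1600px maximum edge length, while keeping the ratio" means in numbers, here is a rough sketch of the calculation in Python. This is not ImageMagick's own code; `fit_within` is a made-up helper, and the exact rounding ImageMagick applies may differ by a pixel. The sample size of 3888x2592 is the full resolution of a 400d.

```python
def fit_within(width, height, box=1600):
    """Scale (width, height) so the longer edge equals `box`, keeping the ratio."""
    scale = box / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3888, 2592))  # (1600, 1067)
print(fit_within(2592, 3888))  # (1067, 1600)
```

Landscape and portrait shots end up with the same longest edge, which is usually what you want for uploads.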

Music
Thanks to services like video2mp3.net you can download a lot of music from YouTube. But sometimes there is silence at the end or the beginning of a song, and a whole collection of songs might have differing volume levels. To cut an mp3 you can use mp3splt, for example to drop everything after 03:30 minutes:
mp3splt file.mp3 00.00 03.30
The sound levels can be normalized, losslessly, using mp3gain:
mp3gain -rk *

Misc
When pretty-printing flat text files I use a2ps to format them nicely and get a PostScript document. a2ps however does not support UTF-8, while the only characters I care about (German umlauts) can be represented using Latin1 just fine. So one can use iconv to convert them on the fly like this:
iconv --from-code=UTF-8 --to-code=LATIN1 textfile.txt | a2ps --font 9 -E -B -r --column=1 > out.ps
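For the curious, the conversion step itself is tiny. The following Python sketch shows what the re-encoding does to the bytes; `utf8_to_latin1` is a made-up name. One difference is an assumption worth stating: this sketch replaces characters that Latin1 cannot represent with "?", whereas iconv as called above stops with an error on such characters.

```python
def utf8_to_latin1(data):
    """Re-encode UTF-8 bytes as Latin-1; unmappable characters become '?'."""
    return data.decode("utf-8").encode("latin-1", errors="replace")


sample = "Grüße aus München".encode("utf-8")
print(len(sample))                  # 20 bytes in UTF-8
print(len(utf8_to_latin1(sample)))  # 17 bytes in Latin-1
```

Each umlaut and the ß shrink from two bytes to one, which is exactly the single-byte form a2ps expects.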

To export images from a PDF file I discovered pdfimages, which is part of the xpdf suite. Use it like this:
pdfimages -j foo.pdf bar

Well, let’s learn to cook.

Never having (really) cooked a meal in my life (no, macaroni and cheese doesn’t count), I decided to spend this holiday stocking up on asian and western recipes to aid me in my upcoming university life.

That quote comes from the recently initiated "Learning to cook" project on WIPUP. The full details as well as the stuff I’ve learned to cook is on WIPUP itself, but this project gave me an idea. Why hasn’t anybody made a cooking show run not by professional chefs, but instead by amateurs and university students? Given what has taken place in my kitchen over the past few days I’m quite sure it’s quality comedy material.

From "wait, aren’t we meant to defrost that first?" to "does the recipe mention all these leftover ingredients?", I must say it’s really been fun learning how to cook. Recipes and your own experiences welcome in comments. Here’s a picture of that awesome lasagna.

Well, back to work!

Related posts:

  1. Exams over!
  2. Rapid Fire
  3. Countdown to KDE 4.4 and the new KDE website: 3 days left

Posts for Tuesday, June 29, 2010

Goodbye Tokyo Cabinet, hello PostgreSQL?

The first version of this blog used MySQL; then I switched to Tokyo Cabinet. But now I've switched back to an RDBMS, this time PostgreSQL. Here's why.

Why did I switch to TC to begin with?

  1. There weren't any good ORM-type libraries for Clojure at the time (over a year ago). So there was a bit of an impedance mismatch trying to query and work with my data. In the DB I have separate tables for posts, comments, tags, categories. But 90% of the time I want to fetch a post and end up in Clojure with all related tags, comments etc. (A JOIN won't work here without a lot of work to un-mangle the results; you usually need multiple queries.)

    With TC I could store anything in the DB, so I just dumped a post into the DB as serialized hash-maps with all the comments, tags, and categories as sub-keys. So querying was easy. (Or was it? More below.)

  2. MySQL was really slow. This is largely because my queries were terrible, as I tried to solve the problem from #1 via brute force. TC on the other hand is fast.

  3. I tried to solve #2 by using a Clojure ref as a cache. But tying the STM to a database's transaction system is as far as I know difficult or impossible right now (per many threads on the mailing list). I had a lot of potential race conditions, which (as far as I know) never bit me, but probably would've eventually. I had to deal with keeping the cache up-to-date as comments were posted and posts were added and deleted and renamed. Remember:

"There are only two hard problems in Computer Science: cache invalidation and naming things." --Phil Karlton

So why did I stop using TC?

  1. I have no idea how to use a key/value store database properly. TC will take anything you dump into it, which is both a strength and a weakness.

    There's a lot of crap you have to do by hand that a proper database does for you. Consider checking for null values, for example; I ended up with a lot of nils in my data because my validations weren't 100% foolproof, or because I imported data via code that didn't run the validations and I never noticed. Or enforcing uniqueness of values; I had tag objects in the database with the same key but different values (due to capitalization differences), which screwed up a lot of stuff.

    On the other hand, there's a lot of information about how to use an RDBMS properly, and I have a lot of experience with it already. Constraints are easy to set up. Columns have types, which is nice. (Strange that I gravitate toward statically-typed databases while I gravitate toward dynamically-typed programming languages.)

  2. I have to compile and install Tokyo Cabinet by hand on my Linux distro. It's probably not worth it for distro maintainers to maintain a package that so few people use. MySQL and PostgreSQL have lots of people working on keeping them running OK on most Linux distros.

  3. Some kinds of queries were still awkward in TC. "Give me post X" was great: I'd also get all the tags, categories etc. for free. But then how do you query to get all tags across all posts? Or all comments? Fetch all the posts and iterate over them, collecting their tags, then uniquify the resulting list? Not so pretty, and not so fast. So I was back to caching again, which still gave me nightmares about race conditions and dirty data.

So now why am I using an RDBMS again?

  1. An RDBMS is exactly what I really need, if I could just query the thing concisely and get it to run fast. Thankfully there are some ORM-like libraries for Clojure in the works nowadays, already usable for a hobby project like this blog. There are clj-record, Carte, ClojureQL, and my own Oyako, and possibly others in the works.

  2. For my tiny blog's database, Oyako gives me slightly slower performance than TC, but along the same order of magnitude, which is good enough.

  3. Via Oyako I can (fairly concisely) fetch posts and get the associated tags, comments etc. But I can also easily fetch all tags, or all comments, since they're in their own tables. The "relational" part of RDBMS does come in handy sometimes.
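The two kinds of query discussed above can be sketched as follows. This is not Oyako and not the blog's real schema: Python's built-in sqlite3 stands in for PostgreSQL, and the table layout, sample rows and function names are assumptions made up for the illustration. The first function shows the "un-mangling" of flat JOIN rows into nested data per post; the second shows how cheap "all tags" becomes once tags live in their own table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY,
                        post_id INTEGER NOT NULL REFERENCES posts(id),
                        name TEXT NOT NULL);
    INSERT INTO posts VALUES (1, 'Hello'), (2, 'Gaka');
    INSERT INTO tags (post_id, name) VALUES
        (1, 'clojure'), (1, 'blog'), (2, 'clojure'), (2, 'css');
""")


def posts_with_tags(conn):
    """One JOIN, then regroup the flat rows into nested dicts per post."""
    posts = {}
    rows = conn.execute("""
        SELECT p.id, p.title, t.name
        FROM posts p LEFT JOIN tags t ON t.post_id = p.id
        ORDER BY p.id, t.id
    """)
    for post_id, title, tag in rows:
        post = posts.setdefault(post_id, {"title": title, "tags": []})
        if tag is not None:
            post["tags"].append(tag)
    return posts


def all_tags(conn):
    """Every distinct tag, straight from its own table."""
    return [name for (name,) in
            conn.execute("SELECT DISTINCT name FROM tags ORDER BY name")]


print(posts_with_tags(db)[1])  # {'title': 'Hello', 'tags': ['clojure', 'blog']}
print(all_tags(db))            # ['blog', 'clojure', 'css']
```

With a key/value store holding whole posts, the second query would mean loading every post and collecting the tags by hand.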

Summary version

I switched to TC to begin with because I was using SQL wrong, and it was too slow and clumsy. Once I figured out how to use SQL correctly, it was a no-brainer to go back.

Introducing Gaka

The CSS for my blog is now being generated via gaka, a CSS-generating library I wrote this afternoon. It's extremely simple, but it got the job done for me. I turned around 600 lines of CSS into around 250 lines of Clojure without much effort. It looks like this:

user> (require '(gaka [core :as gaka]))
nil
user> (def rules [:div#foo
                  :margin "0px"
                  [:span.bar
                   :color "black"
                   :font-weight "bold"
                   [:a:hover
                    :text-decoration "none"]]])
#'user/rules
user> (println (gaka/css rules))
div#foo {
  margin: 0px;}

  div#foo span.bar {
    color: black;
    font-weight: bold;}

    div#foo span.bar a:hover {
      text-decoration: none;}

Gaka is partly inspired by Sass, which I found very pleasant to work with recently. And it's partly inspired by Hiccup, which is a delicious way to generate HTML in Clojure.

There's more info and more examples on github.

Posts for Monday, June 28, 2010

spamassassin / procmail with vpopmail accounts

copied from http://spamassassin.apache.org/logo/

motivation

i’ve had lots of trouble with mail the last two years with lots of spam still passing spamassassin. but that wasn’t so bad since we already had greylisting [1] running. that however still meant 8 spam mails per day with subjects as:

  • Hot news for js! 70% off all June!
  • Totaler Ausverkauf von Top Zeitmesser
  • Yes, js, today -80% to all prices. Office accessible free
  • This crysis never ends
  • VIAGRA ® Official Site -95%

however, that changed after i started to collect the spam to train the bayesian filter [2] used by spamassassin. this posting is about how i’ve done it!

in contrast: after using bayes i receive one spam mail every 4th day.

i still have to monitor the spam, which is filed in a folder called ‘spam’ in one of my ‘imap’ folders, but since it can be identified as spam with 100% certainty most of the time, it’s simply a copy’n’paste operation of all files in ‘spam’ to ‘spam_train’. the ‘spam_train’ folder is used to train spamassassin. of course there are false positives and false negatives as well, but handling those is very easy.

is my setup special in any way?

no i don’t think so at all. however: most sites do encourage admins not to use bayes filters per vpopmail account. and because of that and other factors there is not much documentation of how to do so. since i had some issues with how to call spamassassin from procmail it did not work for a very long time. recently i bought my nokia n900 mobile phone and using mail on this device is very nice. receiving a spam mail per hour means a notification on the n900 which IS annoying if the notification mostly contains spam slogans.

my .procmailrc

# qmail Lazydog procmailrc file
SHELL="/bin/bash"
VHOME="/var/vpopmail/domains/lastlog.de/js"
VERBOSE="no"
LOGFILE="/var/vpopmail/domains/lastlog.de/js/procmail.log"
:0fw
| spamassassin --siteconfigpath=/var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf
:0:
* ^X-Spam-Flag: YES
/var/vpopmail/domains/lastlog.de/js/.maildir/.spam/new
:0:
* ^(From|Cc|To).*news@aktuell.conrad.de
/dev/null
# (other rules below)

my local.cf file, the only config file i changed:

# cat local.cf

required_score          3.2

rewrite_header subject _SCORE(0)_ |

use_bayes 1

use_dcc 1

use_pyzor 1

use_razor2 1

bayes_auto_learn 1

bayes_path /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes

bayes_file_mode 0777

report_safe 1

#add_header all Flag _YESNOCAPS_

ok_languages            en de

ok_locales              en de

# Google SafeBrowsing Plugin

loadplugin Mail::SpamAssassin::Plugin::GoogleSafeBrowsing

#body GOOGLE_SAFEBROWSING eval:check_google_safebrowsing_blocklists()

google_safebrowsing_apikey ABQIAA#VJ#J#VAQFbnTQ4uqKBRgArBR3gWufhSOf_c-vzV4UEN0steDDKD

google_safebrowsing_dir /var/cache/spamassassin

# scores for each url hit in message body

google_safebrowsing_blocklist goog-black-hash 0.3

google_safebrowsing_blocklist goog-malware-hash 0.5

how to experiment with spam

no blog or documentation talks about this but i think it’s very important. say you have a spam mail and you want to test your spamassassin configuration do this:

ls -la /var/vpopmail/domains/lastlog.de/js/.maildir/cur

-rw-rwx---+ 1 vpopmail vpopmail     2630 26. Dez 2009  msg.zQJ68:2,RS*
-rw-rwx---+ 1 vpopmail vpopmail    21905 16. Dez 2009  msg._zr78:2,S*
-rw-rwx---+ 1 vpopmail vpopmail     7433  9. Mär 2009  msg.zssm8:2,S*
-rw-rwx---+ 1 vpopmail vpopmail     2332  6. Mai 17:27 msg.ZuFm8:2,S*
-rw-rwx---+ 1 vpopmail vpopmail     3418 21. Nov 2008  msg.ZvFh8:2,S*
-rw-rwx---+ 1 vpopmail vpopmail     3237 23. Feb 13:07 msg.zZcg9:2,S*

let’s pick one of these messages: ‘msg.zf8y8:2,S’

now, let’s see what spamassassin thinks about this message with:

cat msg.zf8y8:2,S | spamassassin -t -D -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf >mail 2>debug

-t -D are only important for debugging

also have a look at:

tail -n 200 /var/vpopmail/domains/lastlog.de/js/procmail.log

creating the configuration for vpopmail usage of spamassassin

the message is processed by spamassassin and written to a file called ‘mail’ and the debug output going to stderr is written to a file called ‘debug’. this helped me a lot to verify that spamassassin was reading the right config files.

cat msg.zf8y8:2,S | spamassassin -t -D --siteconfigpath=/var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf >mail 2>debug

watch for things like this:

[7582] dbg: conf: finish parsing

[7582] dbg: plugin: Mail::SpamAssassin::Plugin::GoogleSafeBrowsing=HASH(0x14e3a10) implements ‘finish_parsing_end’, priority 0

[7582] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_toks

[7582] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_seen

[7582] dbg: bayes: found bayes db version 3

[7582] dbg: bayes: DB journal sync: last sync: 0

[7582] dbg: config: score set 3 chosen.

[7582] dbg: message: main message type: multipart/mixed

check: no loaded plugin implements 'check_main': cannot scan! at /usr/lib64/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line 164.

it seems that if i use --siteconfigpath, spamassassin expects all config files to be in ‘/var/vpopmail/domains/lastlog.de/js/.spamassassin/’, but usually they are in ‘/etc/spamassassin/’. so when using --siteconfigpath, make sure you copy all files from ‘/etc/spamassassin/*’ to your own configuration directory ‘/var/vpopmail/domains/lastlog.de/js/.spamassassin/’.
WARNING: do not overwrite your customized files, as there might be a local.cf in both paths!
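
a minimal sketch of that copy step, assuming a plain POSIX shell. the function name copy_missing_configs is made up for illustration; it copies only files that do not exist in the target directory yet, so an already customized local.cf stays untouched.

```shell
# copy_missing_configs SRC DST  (hypothetical helper, not part of spamassassin)
# Copies regular files from SRC into DST, skipping every name that already
# exists in DST so that customized files are never overwritten.
copy_missing_configs() {
    src=$1
    dst=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and unmatched globs
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            echo "keeping existing $dst/$name"
        else
            cp -p "$f" "$dst/$name" && echo "copied $name"
        fi
    done
}

# usage with the paths from the text:
# copy_missing_configs /etc/spamassassin /var/vpopmail/domains/lastlog.de/js/.spamassassin
```
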
to check if the right bayes path is used, look at the debug file we created:

cat debug | grep bayes
[20792] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_toks
[20792] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_seen
[20792] dbg: bayes: found bayes db version 3
[20792] dbg: bayes: DB journal sync: last sync: 0
in this case everything is right.
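
the same check can be scripted. this is only a sketch: bayes_paths_ok is a hypothetical helper name, and it relies on the "tie-ing to DB file" wording of the debug lines shown above.

```shell
# bayes_paths_ok DEBUGFILE EXPECTED_DIR  (hypothetical helper)
# Succeeds only if the debug log contains "tie-ing to DB file" lines and
# every one of them points below EXPECTED_DIR.
bayes_paths_ok() {
    debugfile=$1
    expected=$2
    # all bayes DB lines in the log (|| true: grep exits 1 when the count is 0)
    total=$(grep -c 'bayes: tie-ing to DB file' "$debugfile" || true)
    # the subset of those lines that mentions the expected directory
    good=$(grep 'bayes: tie-ing to DB file' "$debugfile" | grep -c -F "$expected/" || true)
    [ "$total" -gt 0 ] && [ "$total" -eq "$good" ]
}

# usage:
# bayes_paths_ok debug /var/vpopmail/domains/lastlog.de/js/.spamassassin/db && echo "bayes path ok"
```
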
also check whether the mail is classified as spam; just have a look at the ‘mail’ file:

Subject: 16.6 | VIAGRA ® Official Site -95%
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
bonker.serverkommune.de
X-Spam-Level: ****************
X-Spam-Status: Yes, score=16.6 required=5.0 tests=AWL,BAYES_99,
HTML_IMAGE_ONLY_08,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,MISSING_DATE,
MISSING_MID,MISSING_SUBJECT,NO_RELAYS,URIBL_AB_SURBL,URIBL_BLACK,
URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_WS_SURBL shortcircuit=no
autolearn=unavailable version=3.2.1
MIME-Version: 1.0
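
if you test many messages, pulling the verdict out of the ‘mail’ file by hand gets tedious. a small sketch, assuming the X-Spam-Status header format shown above; spam_verdict is a hypothetical helper name.

```shell
# spam_verdict MAILFILE  (hypothetical helper)
# Prints "<Yes|No> <score> <required>" taken from the first
# X-Spam-Status header found in MAILFILE.
spam_verdict() {
    sed -n 's/^X-Spam-Status: \([A-Za-z]*\), score=\([-0-9.]*\) required=\([-0-9.]*\).*/\1 \2 \3/p' "$1" | head -n 1
}

# usage:
# spam_verdict mail
```
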

initial bayes, sa-learn

since i’ve been training the db incorrectly for some time, i decided to relearn everything:

cd /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/

ls -la

-rw-rw----+ 1 joachim users    7824 21. Jun 15:17 bayes_journal

-rw-rwx---+ 1 joachim users  643072 21. Jun 15:09 bayes_seen*

-rw-rwx---+ 1 joachim users 5177344 21. Jun 15:09 bayes_toks*

just remove all 3 files (or create backups if in doubt) and then use sa-learn:

cat spamcommand

echo "=== <SPAM Train> ===="

echo "/var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur"

nice sa-learn -D 5 -u vpopmail --dbpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/db --siteconfigpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/user_prefs -C /var/vpopmail/domains/lastlog.de/js/.spamassassin --spam --dir /var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur

echo "=== </SPAM Train> ==="

echo ""

echo "=== <HAM Train> ==="

# learn ham from the inbox and two more folders
for i in cur .js@dune2.de/cur .notice/cur; do

echo "learning from: /var/vpopmail/domains/lastlog.de/js/.maildir/${i}"

nice sa-learn -D 5 -u vpopmail --dbpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/db --siteconfigpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/user_prefs -C /var/vpopmail/domains/lastlog.de/js/.spamassassin --ham --dir "/var/vpopmail/domains/lastlog.de/js/.maildir/${i}"

done

echo "=== </HAM Train> ==="

i execute this script every night using a cronjob, and so far it is working great! handling false positives and false negatives is easy as well: you only have to copy the misclassified mails into the right folder and the next cronjob should relearn them correctly. i’m not sure about this, but so far it is working.

./spamcommand

==== <SPAM Train> ===

/var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur

[7913] info: archive-iterator: skipping large message

Learned tokens from 2515 message(s) (2531 message(s) examined)

=== </SPAM Train> ===

=== <HAM Train> ===

[22465] info: archive-iterator: skipping large message

Learned tokens from 1100 message(s) (1171 message(s) examined)

[19782] info: archive-iterator: skipping large message

….

Learned tokens from 191 message(s) (199 message(s) examined)

=== </HAM Train> ===

./spamcommand  291,01s user 12,00s system 12% cpu 40:05,37 total

the first run of this script took about 40 minutes

rights management

since there might be multiple users accessing the bayes dbs you have to check permissions. most likely these users are: vpopmail and your login name.
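
a rough way to spot permission problems, assuming both users share the db through a common group (ACLs, the ‘+’ in the listings above, are not covered by this check). list_not_group_rw is a hypothetical helper name.

```shell
# list_not_group_rw DBDIR  (hypothetical helper)
# Prints every bayes_* file below DBDIR that lacks group read or group
# write permission (-perm -060 matches files that have both bits set).
list_not_group_rw() {
    find "$1" -type f -name 'bayes_*' ! -perm -060 -print
}

# usage:
# list_not_group_rw /var/vpopmail/domains/lastlog.de/js/.spamassassin/db
```

an empty result means every bayes file is readable and writable for the group.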

final words

remove -t and -D once you have finished debugging, as they will flood your logs. also have a look at [3], [4] and [5].

links

[1] http://de.wikipedia.org/wiki/Greylisting

[2] http://wiki.apache.org/spamassassin/BayesFaq

[3] http://wiki.apache.org/spamassassin/BayesInSpamAssassin

[4] http://standbytux.blogspot.com/2005/07/using-spamassassin-and-procmail-to.html

[5] http://faisal.com/docs/salearn


Posts for Wednesday, June 23, 2010

New blog engine up and running

Well, my new blog is up and running. Sorry for the temporary lack of cows in my layout. I'm dogfood-testing the blog engine in a fairly vanilla state until I work out some of the bugs. This layout is based upon barecity, a minimalist WordPress theme that I adapted easily enough to my blog.

As a bonus, I applied a dirty hack to my RSS feed that I think should help avoid screwing up people's RSS readers with duplicate entries.

I'll write again soon with some info about the blog engine and some things I learned writing it.

(As mentioned previously, here's the code.)

Managing ZIP-based file formats in git

git rocks, everybody knows that, but this morning I ranted about the problems with putting binary files into it, because doing so removes the option to actually see the differences. Splitbrain came to the comments and pointed out that Microsoft’s new file formats (like pptx, docx and such) are just zipped packages of XML files, so at least some diffing should be possible. Remembering what I wrote about versioning PDF files in git, I thought it should be a quick hack, and in fact it really is:

Open your ~/.gitconfig file (create it if it does not exist yet) and add the following stanza:

[diff "zip"]
textconv = unzip -c -a

This uses “unzip -c -a FILENAME” to convert your zip file into ASCII text (unzip -c extracts to STDOUT, -a converts text files). Next, create or modify the file REPOSITORY/.gitattributes and add the following:

*.pptx diff=zip

which tells git to use the zip diff driver from the config for files matching the given mask (in this case everything ending with .pptx). Now git diff automatically unzips the files and diffs the ASCII output, which is a little better than just “binary files differ”. On the other hand, due to the convoluted mess that the XML inside pptx files is, it doesn’t help a lot there, but for ZIP files containing text (for example source code archives) this is actually quite handy.
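
Both steps can also be done from the command line. The following is only a sketch in a throwaway repository (the temporary directory and the file name slides.pptx are made up for the demo); it sets the driver per repository, so use git config --global instead to get the ~/.gitconfig stanza described above.

```shell
# Create a scratch repository so nothing in your real config is touched.
repo=$(mktemp -d)
git init -q "$repo"

# Same effect as the [diff "zip"] stanza, but stored in the repo's own config.
git -C "$repo" config diff.zip.textconv "unzip -c -a"

# Route all .pptx files through the "zip" diff driver.
echo '*.pptx diff=zip' >> "$repo/.gitattributes"

# Ask git which diff driver it would pick for a .pptx file.
attr=$(git -C "$repo" check-attr diff -- slides.pptx)
echo "$attr"
```

git check-attr is a quick way to confirm that the mask in .gitattributes matches before you rely on the diff output.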

speed up firefox (reloaded)

I’ve posted before about this little hack on speeding up firefox. The key is that you actually move the entire .mozilla folder from disk to memory.

First you have to mount /tmp in memory (some Linux distributions may do this by default) by adding this line to /etc/fstab and rebooting:

none /tmp tmpfs size=512M,nr_inodes=200k,mode=01777 0 0
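
To find out whether /tmp is already in memory, you can look it up in the mount table. A small sketch, assuming a Linux system with /proc/mounts; fstype_of is a hypothetical helper name.

```shell
# fstype_of MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Prints the filesystem type of the last entry for MOUNTPOINT in a
# /proc/mounts-style file (fields: device mountpoint fstype options ...).
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' "${2:-/proc/mounts}"
}

# usage:
# [ "$(fstype_of /tmp)" = tmpfs ] && echo "/tmp is already in memory"
```
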

Then it’s safe to do:

cp -R /home/user/.mozilla /home/user/.mozilla_save
mv /home/user/.mozilla /tmp/mozilla
ln -s /tmp/mozilla /home/user/.mozilla

I updated the script so I can use it as a system init script:

#!/bin/bash

start() {
    mkdir -p /tmp/mozilla
    rsync -avi --delete /home/user/.mozilla_save/ /home/user/.mozilla/
}

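stop() {
    # Size of the in-memory profile as reported by du (1K blocks on GNU).
    # Only sync back when the number has more than 4 digits, presumably so
    # that an empty /tmp/mozilla never wipes the saved copy.
    size=`du -xs /home/user/.mozilla/ | awk '{print $1}'`
    digits=`expr length $size`
    if [ $digits -gt 4 ]; then
        rsync -avi --delete /home/user/.mozilla/ /home/user/.mozilla_save/
    else
        echo 'no!'
        exit 0
    fi
}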

case "$1" in
start)
     start
     ;;
stop)
     stop
     ;;
*)
     echo $"Usage: $0 {start|stop}"
     exit 1;;
esac

So I placed this script in /etc/init.d/ as ffsync.sh, made it executable and created a link inside /etc/rc5.d/:

ln -s /etc/init.d/ffsync.sh /etc/rc5.d/S99ffsync

rc5 because the default runlevel on Fedora is 5. You can see yours from /etc/inittab.

So every time my system boots the above script runs with the start option (executing the start function) and every time it halts/reboots it runs with the stop option (stop function).

In order to be sure that no data loss will occur in the unlikely event of sudden shutdown, I have a cronjob that saves the mozilla folder every 15 minutes.

*/15 * * * * /etc/init.d/ffsync.sh stop
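
Because the cron job calls stop every 15 minutes, the size check in stop() is what keeps an empty in-memory profile from overwriting the saved copy. As a variation (not the check used in the script above), the same guard can be written as a plain numeric comparison; big_enough is a hypothetical helper name and the 10000 KB threshold is an assumption mirroring the digit count.

```shell
# big_enough DIR MIN_KB  (hypothetical helper)
# Succeeds when DIR occupies at least MIN_KB kilobytes according to du -sk.
big_enough() {
    size_kb=$(du -sk "$1" | awk '{print $1}')
    [ "$size_kb" -ge "$2" ]
}

# usage inside stop():
# if big_enough /home/user/.mozilla/ 10000; then
#     rsync -avi --delete /home/user/.mozilla/ /home/user/.mozilla_save/
# fi
```
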

Trust me. With the above hack you’ll see a significant difference in firefox’s speed, especially if you’re using firefox’s awesome address bar to search through your browser’s history.

Breaking links is easy to do

I apologize in advance to everyone who subscribes to my blog's RSS feed, but this week your RSS reader is probably going to suddenly find 25 "new" posts from me.

My blog currently uses /blog/title as the URL scheme, with similar URLs for categories and tags etc. Soon, I'm probably going to change it to /blog/123/title, as part of the impending release of version 0.2 of my blog engine. (The code-in-progress is in a branch on github, for the daring and foolish among you.)

This way, I can change the title of a post without breaking everything. I have heretofore lacked this ability. It's easy to code, you just tell Compojure to ignore everything after the number in a route. Something like this:

(defroutes foo
  (GET ["/blog/:id:etc" :id #"\d+" :etc #"(/[^/]*)?"] [id]
    (pages/post-page id)))

It's only a few lines of code to change, but the ramifications are widespread. It'll instantly break every link to my blog, for example. At least it's pretty easy to set up a bunch of redirects in Compojure to avoid that. I think this'll work:

;; prefix lists must be quoted when calling require directly
(require '(blog [db :as db]
                [link :as link])
         '(oyako [core :as oyako])
         '(ring.util [response :as response]))

(defn redirect-post [name]
  (when-let [post (oyako/fetch-one db/posts :url name)]
    (response/redirect (link/url post))))

(defroutes redirect-routes
  (GET ["/blog/:name" :name #"[^/]+$"] [name]
    (redirect-post name)))

(Oyako here is the experimental ORM-like library I'm using to interface with PostgreSQL nowadays, having ditched Tokyo Cabinet.)

Changing my URL scheme is also going to mess up RSS though, because I (foolishly) used post URLs as the GUIDs in my RSS feed up to this point. This problem I don't know how to avoid. I might reduce the number of posts included in my feed temporarily, to limit the damage.

Excellent Post About Free Software Purists

Edit: Almost forgot to mention: I heard about this blog post from flameeyes.

There was a really, really good blog post that actually clarified to me what was for so long a nuisance in discussions with Free Software purists (and I couldn’t reason out why at the time). Read this quote:

A long time ago, Larry Wall gave a “State of the Onion” speech. Larry is the guy behind Perl, the programming language, and every year he gave a talk about what he believes Perl is all about and where it’ll be going over the following twelve months. Larry’s not really a programmer; he’s a linguist who does a lot of programming. So his view on what a computer language should be is uniquely interesting, and should be read. Anyway, he spoke of freedom, and characterised the ends of the spectrum on the subject as “Bill” and “Richard”, those two being Bill Gates (at the time head of Microsoft) and Richard Stallman (head of the Free Software Foundation): the idea here was that those two represented the ultimate expressions of their philosophies. Richard Stallman was at the far extreme of belief in free software; Bill Gates was at the other far end, decrying the very concept of freedom. It was a throwaway joke, at the time; a caricature for the purpose of a laugh in a presentation.

I encourage everyone to read the original post, since it really sheds some light on the unreasonableness and extreme nature of the Free Software purist position.


Planet Larry is not officially affiliated with Gentoo Linux. Original artwork and logos copyright Gentoo Foundation. Yadda, yadda, yadda.