Posts for Saturday, July 3, 2010

Intro to programming with Falcon

I’ve spent a lot of my free time lately tinkering with Falcon. I’ve written a very small, and quite unusable, amount of code for the language, and I’ve written a couple of scripts using it as well. Most of those don’t work either. But the ones that do tend to work very well, and I enjoy the time I spend with it. So, to help me remember and potentially to help someone else learn, I’m going to start writing up my findings here.

Getting Started

Getting started for me always requires setting up my editor for whatever language I’m using. I try to use Vim as much as possible, so that generally means I have little to nothing to do. Languages like C, Ruby, Python, etc. are so finely tuned within Vim these days that there is almost nothing left to set up. Falcon, on the other hand, is still very new, and when I first started playing with it there were no files for Vim. I did end up finding a very basic syntax and indent file for Vim in the Falcon SVN, but they were by no means good enough. So I’ve spent a lot of time making the language work in Vim. In fact I have probably spent twice as much time working on the Vim config files as I have programming in the language itself! To get started with Vim I suggest you check out my GitHub repo. The syntax file is fairly feature rich at this point. It does just about everything I want it to do. The indent file, on the other hand, is still pretty bad. I mean it works, but it could do a lot better. If anyone is aware of other comparable Vim scripts for Falcon, I’d love to compare notes!

Hello World

The hello world is, of course, the default program for every language. In Falcon it’s as simple as ever, and of course, like most languages, we can spend all day figuring out different ways to write it. I’ll keep it simple though.

#!/usr/bin/env falcon

> "Hello World!"

Finished!

Enjoy the Penguins!


Posts for Friday, July 2, 2010

Dell sucks

Why did I order a computer from Dell? I guess I had a good opinion from 6 years ago when I last bought something from them.

Let's count the ways in which their customer service has failed me. (And my computer isn't even here yet.)

  1. As documented, their website couldn't process my credit card without a phone call.

  2. After a week of my computer being "in production", I started getting more phone calls from an unidentified phone number that Google told me was Dell. Fearing another billing problem, I called back. And I was told "Thanks for calling, but our order tracking system is down. And we're all going home. Call back tomorrow morning."

    If only Dell had some means to acquire reliable computer systems on which to build their order tracking database.

  3. I called the next day and was told my order was fine. I was also told (per script, I'm certain) that I could check my order status on Dell's website. Which of course I knew. I know it costs the company money every time someone calls, and they try to strongly discourage calls for that reason, but their script made it sound like I was an imbecile.

    I found it quite condescending. I dislike that these canned scripts pander to the lowest common denominator of customer. They should be happy to take my call. I just spent upwards of a thousand dollars on their crap.

  4. Turns out the phone calls I was getting were from someone trying to give me "free internet from Shaw or Telus for 3 months", and I was eligible because I bought a Dell computer. So I was being telemarketed before my computer even got here.

    I said I already had internet service, and they said "Oh, too bad, it's for new customers only." I do not appreciate this.

  5. I got an email saying my order shipped. Joy! 20 minutes later I got an email saying my order was delayed, and if it didn't ship in 5 days I should call. What?

    It really did ship though, I have a tracking number. Why the contradictory emails?

All of my phone dealings with Dell were via some offshored far-eastern country, judging by the accents of the phone reps. I have nothing against this in principle; I'm not a xenophobe. But the phone connection is always so static-filled and laggy that it really puts a damper on communication.

My computer isn't here yet, and I just hope to God it works and doesn't break in a month. I kind of wish this article had come out a week earlier.

That'll teach me for trying to save time, I guess. Next time I'll build my own system from scratch. Dell goes onto my List of Companies Not to Buy From in the Future (LCNBFF), along with Westinghouse and oh so many others.

Posts for Thursday, July 1, 2010

Kate syntax highlighting for Linux New Media articles

Since I'm occasionally writing articles for Linux New Media and they have this bogus syntax they expect you to follow when writing articles, I decided to put an end to my suffering.

Most of the time I use Kate for writing articles (I use Vim mainly for administration). Therefore the logical solution was to wrap those syntax rules into a highlighting file for Kate. Now I can finally make heads or tails of the whole mess!

So, in case you write for any of their magazines[1], you can now download my Kate syntax highlighting file and enjoy the goodness :]

BTW, if you want to write your own syntax highlighting for Kate, check out its online documentation and/or this article. Taking a peek at the already existing XML files in /usr/share/apps/katepart/syntax/ and ~/.kde4/share/apps/katepart/syntax/ might be a good idea as well.

Someday I may even write the same highlighting for Vim.

hook out >> sipping tea and writing an article about FOSS solutions to cloud computing for Linux Magazine


[1] Linux New Media publishes a large number of GNU/Linux magazines all over the world, among them Linux Magazine, Linux Pro Magazine, EasyLinux, Linux User and Linux Technical Review.

Posts for Wednesday, June 30, 2010

I am an edge case

I am an alien. An American who emigrated to Canada. This has resulted in a lot of fun and a bit of pain as I've managed to break the systems of many of the businesses I deal with.

As a programmer I can appreciate the importance (and sometimes difficulty) of handling edge cases. It's been an interesting experience living as an edge case myself.

H&R Block(heads)

Taxes are confusing enough when you don't have a wife from another country. The friendly folk at H&R Block had no idea how to handle my situation. Their computers demanded a Social Security Number for my wife, which she doesn't have, because she's Canadian. So they left it blank. (Leaving it blank was the consensus opinion of everyone at H&R Block, including the managers.)

To even be able to leave it blank, they had to print the forms, because the computer refused to submit my taxes electronically with a blank SSN. This should've been a red flag to me, looking back.

The correct thing to do was for her to get an ITIN from the US, which from what I know is like an SSN that doesn't get you SS benefits or allow you to work in the US. It's just for tracking. But they didn't tell me that.

Moral of this story: That's the last time I'll ever use H&R Block. If I want to do my taxes wrong, I can do that myself for free.

IRS

The IRS refused to believe that my wife existed without a number to assign her, so they rejected my tax return and threatened to charge me tons of penalties. So I drove down to the friendly neighborhood IRS office. Surely they'd know how to fix this, right?

Wrong. The fellow at the IRS was familiar with filing taxes for Mexicans living in the US, but not for a "non-resident alien Canadian spouse" like mine.

The ITIN docs say that I need to submit a notarized copy of my wife's passport, to get her an ITIN. But notarized by whom? By a notary in Canada or the US? The IRS agent spent at least an hour reading his enormous IRS manual, looking up treaties and international law, trying to figure this out. Eventually he found a footnote scribbled into the margin of his book that said a Canadian notary was OK. So that's what I gave him.

He filled out the new tax forms himself, stapled on a photocopy of the page from his own IRS manual saying the Canada-notarized passport was OK, stamped everything all official-like, mailed it away himself, and... a month later it was rejected again. I needed to get a US notary to do it, or my wife had to drive to the freaking capital of her province to get it super-notarized or something. Nine months later and 3 more trips to the IRS office, I finally got it worked out.

As a US citizen, living in Canada, working for a US company, being paid by a Canadian payroll company, I live in constant fear of doing my taxes next spring.

Moral of this story: Even the IRS doesn't know their own tax code. Thanks again, US Government.

No credit

I have good credit in the US. I was able to get a car loan without a co-signer 6 years ago. I have credit cards.

In Canada, I don't exist. My credit score here is zero, as you might expect. I tried to buy a car, and they just flat-out wouldn't let me without a co-signer for a loan. I guess I can't blame them. (Except that I have money, they have something I want to buy, and we both ended up losing out.)

I went to the bank to open a checking (er, "chequing") account, and all hell broke loose. They wanted my business, but how could they justify giving an account to someone with "no credit"? Eventually the bank manager managed to run a US credit check on me (at least she said she did) and they let me open an account.

But I can't get a credit card here. Not even a $500-limit high-interest card like they give to kids in college. Not even after I had the bank write a letter of recommendation vouching for me. I have to wait a year and keep getting paychecks before I show up in the system. ("Paycheques"?) I'm still looking for other options in the meantime.

Moral of this story: Well, no moral. Sucks to be me, I guess.

Shopping

I tried to buy a computer online recently (from Dell, which I'm starting to regret). Not having a Canadian credit card, I used my US card. This is OK; my card works up here, with a small foreign transaction fee of 1-3%.

But the website wouldn't take a US address as my billing address. I had to give a Canadian address. "Province" was a drop-down, mandatory field.

No matter, I'll just go to my bank's website and change my billing address to one in Canada. I checked with my bank before I moved, and they said it was no problem to keep the account even if I moved to another country. "We have lots of international customers!", said the teller.

Well, my bank's website won't let me specify a non-US billing address. "State" is a mandatory drop-down. Which is awesome. I emailed the bank and asked them to change my address for me, and they did. Now when I go to the "change my address" form on their website, half the fields are filled in and half are just blank. So if I ever use the form again, something will probably break.

Once I got on the phone with Dell, they were more than happy to take my US credit card as payment. Their online form just couldn't handle it.

Moral of this story: Text fields, not drop-downs.

And so on

When I fly across the border, I have to fill out a customs declaration form. There's a field asking "country of residence". Well, technically I am a resident of two countries. So I pick "US" when I fly down to the States, "Canada" when I fly back.

I tried to get a Costco membership in Canada, and they wanted a driver license. My license is from Oregon. Hilarity ensued, and they decided I didn't need one after all. Any time I need to give a driver license for any purpose, I end up breaking something.

Note that in every situation I've described, what I was trying to do was valid, and after some hassle, everything usually worked out OK. Computers just got in the way and slowed the process way down. And I'm not that much of an edge case. 250,000 people immigrate to Canada every year.

I suppose it might be better to optimize for the common cases, force people to pick their province from a drop-down. And then deal with the edge cases like mine manually later. Every text field is another opportunity for users to type in gibberish and chaos. But I wonder if the programmers actually thought about it this much, or if they were just being lazy.

I'm not really complaining. I don't expect the world to change to accommodate me. It's been more funny than annoying. But I do find it interesting to see the flaws in computer systems exposed. I get a certain sick satisfaction out of seeing people write "invalid" values into fields that I know are going to break someone's database down the line.

no more crashes after resuming from pm-suspend

motivation

hp elite book 8530w, source hp.com

i own a hp elitebook 8530w which is a good notebook if used with linux. it does have a nvidia card:

01:00.0 VGA compatible controller: nVidia Corporation G96M [Quadro FX 770M] (rev a1), according to lspci.

i use it with Linux ebooK 2.6.34-rc5 #4 SMP PREEMPT Wed Jun 30 10:48:22 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz GenuineIntel GNU/Linux.

ever since i bought this laptop it often got stuck on resume when using pm-suspend (sys-power/pm-utils).

it’s probably a x11-drivers/nvidia-drivers issue, but i’m not certain: after a broken resume i usually can’t see anything on the screen (it stays totally black, no backlight) and the computer isn’t accessible over the network either (can’t log in via ssh). so today i wanted to fix that by installing the nouveau driver. that failed, however, but it seems that i fixed the resume issue anyway.

HINT: that means i can use the proprietary driver, as it is not a problem anymore.

HINT: that also fixed the problem that after the first pm-suspend (or hibernate-ram) the consoles alt+f1 … to … alt+f12 weren’t accessible as they were blanked out completely. i might experiment with nouveau soon, but for now let’s blog how i fixed the resume issue:

updating pm-utils

my old version: sys-power/pm-utils-1.2.5 -> pm-suspend
my current version: sys-power/pm-utils-1.3.0-r2
emerge pm-utils

[ebuild  N    ] sys-power/pm-quirks-20100316
[ebuild  NS   ] app-text/docbook-xml-dtd-4.5-r1 [4.1.2-r6, 4.2-r1, 4.3-r1, 4.4-r2]
[ebuild     U ] sys-power/pm-utils-1.3.0-r2 [1.2.5]

used with x11-drivers/nvidia-drivers-195.36.24
after updating pm-utils there was no crash after resuming, which felt like a miracle! however, a new, odd problem showed up: the crash was gone but the backlight was very dim. in fact it was at the lowest setting possible, and i could not find any tool to fix that. after searching wikis and blogs for ages i remembered how i did it last time on my thinkpad (i had similar problems there).
there is a kernel interface for controlling the screen brightness using acpi.

how to increase the brightness?

ideal solution: using the Fn+F9 and Fn+F10 keys
ok, first check if these keys are mapped via acpi.
as root, i run:
acpi_listen
fn+f9    video/brightnessdown BRTDN 00000087 00000000
fn+f10   video/brightnessup BRTUP 00000086 00000000
fn+f11   –nothing–
so fn+f11 seems to be connected directly (i don’t know exactly how that works) but the other two keys show up fine in acpi_listen. so we can use these!
NOTE: i had the ‘ambient light sensor’ disabled in the BIOS but this didn’t help either. fn+f11 will switch to using the ‘ambient light sensor’ but i usually find it still too dim most of the time. it is far better on the ‘mac book pro’.
maybe nvidia-settings? NO! it does not control that.
even restarting X didn’t fix it, and neither did restarting the computer!?

what next? -> the acpi interface

the acpi interface was incomplete as there was no /proc/acpi/video/
after endless googling i realized that hp-wmi and wmi (CONFIG_HP_WMI=m) support was already included in my kernel as modules and even loaded but:
CONFIG_ACPI_VIDEO=m
was missing!
so i added it, reinstalled a new kernel with the proper modules and after a reboot the video interface was there:
/proc/acpi/video/DGFX/
so experimenting with it:
cat /proc/acpi/video/DGFX/LCD/brightness
levels:  0 5 10 15 20 25 30 33 36 40 43 46 50 55 60 65 70 75 80 83 86 90 93 96 100
current: 30
now, let’s increase that value to the maximum:
echo "100" > /proc/acpi/video/DGFX/LCD/brightness
and i’m finally there! i can actually see what i’m writing! ;-)

how to integrate this into the system?

we will combine the acpi with the /proc interface to set a proper brightness level. a nice reference for  doing so can be found at [1].

i first wrote a script in /etc/acpi/events/default but it didn’t work!
NOTE: i did not restart acpid, which was probably the problem.
anyway: i integrated the events into /etc/acpi/default.sh instead. this is better as one can check it for syntax errors and experiment with it. example:
first run:
acpi_listen
and then trigger an acpi event, for instance by pressing fn+f9 (on my elitebook), to find out what the generated event is called.
for instance: pressing fn+f3 here generates “button/sleep SBTN 00000080 00000000”.
note this down, or copy it with the mouse. then invoke the default.sh script with it as parameter, leaving the cryptic numbers aside. example:
./default.sh button/sleep
if this manual invocation works (no syntax errors in default.sh reported) you can also start using the acpi event by pressing fn+f3 (instead of manually invoking the script).

the /etc/acpi/events/brightness script

#!/bin/sh
# /etc/acpi/events/brightness
# usage: brightness up|down
# ramps the backlight by 20 units, one unit at a time

if [ "$1"x = "up"x ]; then
    # read the current level from the "current: NN" line
    X=$(cat /proc/acpi/video/DGFX/LCD/brightness | grep current | awk '{print $2}')
    Y=$(echo "$X+20" | bc)
    for i in `seq $X $Y`; do
        echo $i > /proc/acpi/video/DGFX/LCD/brightness
    done
fi

if [ "$1"x = "down"x ]; then
    X=$(cat /proc/acpi/video/DGFX/LCD/brightness | grep current | awk '{print $2}')
    Y=$(echo "$X-20" | bc)
    # count downwards, hence the -1 increment
    for i in `seq $X -1 $Y`; do
        echo $i > /proc/acpi/video/DGFX/LCD/brightness
    done
fi
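
The script above simply adds or subtracts 20 and does not check the result against the list of levels printed by the brightness file. As a rough sketch of how the target value could be clamped and snapped to a listed level, here is the arithmetic in Python. The function names are made up for illustration, and the sketch only parses text in the format shown earlier; it does not touch /proc.

```python
# Sketch only: parse_brightness() and next_level() are hypothetical helpers,
# not part of acpid or pm-utils.

def parse_brightness(text):
    """Parse the 'levels: ...' and 'current: ...' lines of the brightness file."""
    levels, current = [], None
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "levels":
            levels = [int(v) for v in rest.split()]
        elif key.strip() == "current":
            current = int(rest)
    return levels, current


def next_level(levels, current, direction, amount=20):
    """Return the listed level closest to current +/- amount, kept within range."""
    target = current + amount if direction == "up" else current - amount
    target = max(min(levels), min(max(levels), target))
    return min(levels, key=lambda level: abs(level - target))


sample = (
    "levels:  0 5 10 15 20 25 30 33 36 40 43 46 50 55 60 65 70 75 80 83 86 90 93 96 100\n"
    "current: 30\n"
)
levels, current = parse_brightness(sample)
print(next_level(levels, current, "up"))
```

With the sample above the result is 50; starting from 90 the "up" step stops at 100 instead of asking for 110.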

the /etc/acpi/default.sh script

#!/bin/sh
# /etc/acpi/default.sh
# Default acpi script that takes an entry for all actions

set $*

group=${1%%/*}
action=${1#*/}
device=$2
id=$3
value=$4

log_unhandled() {
    logger "ACPI event unhandled: $*"
}

case "$group" in
    video)
        case "$action" in
            brightnessup)
                /etc/acpi/events/brightness up
                ;;
            brightnessdown)
                /etc/acpi/events/brightness down
                ;;
        esac
        ;;

    button)
        case "$action" in
            power)
                /sbin/init 0
                ;;

            # if your laptop doesnt turn on/off the display via hardware
            # switch and instead just generates an acpi event, you can force
            # X to turn off the display via dpms.  note you will have to run
            # 'xhost +local:0' so root can access the X DISPLAY.
            #lid)
            #       xset dpms force off
            #       ;;

            sleep)
                /usr/sbin/pm-suspend
                ;;

            *)      log_unhandled $* ;;
        esac
        ;;

    ac_adapter)
        case "$value" in
            # Add code here to handle when the system is unplugged
            # (maybe change cpu scaling to powersave mode).  For
            # multicore systems, make sure you set powersave mode
            # for each core!
            #*0)
            #       cpufreq-set -g powersave
            #       ;;

            # Add code here to handle when the system is plugged in
            # (maybe change cpu scaling to performance mode).  For
            # multicore systems, make sure you set performance mode
            # for each core!
            #*1)
            #       cpufreq-set -g performance
            #       ;;

            *)      log_unhandled $* ;;
        esac
        ;;

    *)      log_unhandled $* ;;
esac
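
To make the parameter expansions at the top of default.sh easier to follow, here is a small Python sketch of the same splitting, applied to an event line as printed by acpi_listen. The function name parse_event is invented for this illustration; it only mirrors what `${1%%/*}` and `${1#*/}` do in the shell script.

```python
# Sketch only: parse_event() is a hypothetical helper mirroring default.sh.

def parse_event(line):
    """Split an acpi_listen line into group, action, device, id and value."""
    fields = line.split()
    name = fields[0]
    if "/" in name:
        group, _, action = name.partition("/")
    else:
        # without a slash both shell expansions leave the string unchanged
        group = action = name
    rest = fields[1:4] + [None] * (3 - len(fields[1:4]))
    return {
        "group": group,
        "action": action,
        "device": rest[0],
        "id": rest[1],
        "value": rest[2],
    }


event = parse_event("video/brightnessup BRTUP 00000086 00000000")
print(event["group"], event["action"])
```

For the fn+f10 line from above this yields the group "video" and the action "brightnessup", which is what the nested case statements dispatch on.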

links


Ricers and logic

The first post on this blog was called “Why Gentoo” and outlined my reasons for using that distribution. Today I was talking to Flameeyes in the context of his post on the recent libpng-1.4 stabilization. The problem with Gentoo is, was and probably always will be its image and parts of its user base.

On Gentoo you compile every package according to your own setup, allowing you to easily disable features in software you don’t want or need. This is killer feature one. The other killer feature is that it’s bloody simple to bring new software into Gentoo with ebuilds: They are trivial to write and can handle basically every kind of package. Awesome.

But recently we see some talk about how compiling stuff with Gentoo is so much faster than all the other distros. And that really pisses me off because it is retarded.

It reminds me of the “portage” vs. “paludis” discussions. For those with less knowledge about Gentoo’s past: When some people decided that the package manager “portage” was too slow (portage is written in Python) they started using paludis (a package manager written in C++). The same people would then brag that searching for a package would take 5 seconds less than with portage (or similar numbers). The point is: paludis takes ages to compile, so how often do you have to search in order to have a net benefit?

Let’s do some math. You have two programs, Prog_A and Prog_B. Prog_A takes 10 seconds for a job, Prog_B only takes 5 seconds. On the other hand, Prog_B takes 30 minutes (1,800 seconds) longer to compile than Prog_A. If we save 5 seconds on every execution of Prog_B, we only come out ahead once we have run Prog_B more than 1,800 / 5 = 360 times. “Oh yeah”, you say, “I probably use a lot of software that often!” but think about updates! Every update means compiling again, so the equation only holds water if you use Prog_B more than 360 times between updates.
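
For anyone who wants to plug in their own numbers, the break-even calculation can be sketched in a few lines of Python. The function names are made up for this example, and the figures are the ones from the paragraph above.

```python
# Sketch only: break_even_runs() and net_saving() are illustrative helpers.

def break_even_runs(extra_compile_seconds, seconds_saved_per_run):
    """Number of runs after which the extra compile time has paid for itself."""
    return extra_compile_seconds / seconds_saved_per_run


def net_saving(runs, extra_compile_seconds, seconds_saved_per_run):
    """Seconds gained (positive) or lost (negative) after a given number of runs."""
    return runs * seconds_saved_per_run - extra_compile_seconds


extra = 30 * 60      # Prog_B compiles 30 minutes longer
saved = 10 - 5       # Prog_B is 5 seconds faster per job

print(break_even_runs(extra, saved))
print(net_saving(100, extra, saved))
```

With these inputs the break-even point is 360 runs, and after 100 runs between two updates you are still 1,300 seconds behind.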

Now look at some programs you might really run a lot. Mozilla Firefox (including its runtime Xulrunner) takes quite a lot of time to compile, and do you really think, considering all the network lagginess in its domain, that whatever you compile makes such a big difference speed-wise? Do you compile OpenOffice yourself and spend hours upon hours on that? How fast do you type? I mean, seriously?

Gentoo is not about speed. It’s a meta-distribution that allows you to build your own custom system for exactly your use case. And that’s not about setting retarded CXXFLAGS in order to have some package manager run in fewer milliseconds.


Command-line warriors, part one

This is a post about some things you might have used a graphical tool for in the past, but which can be done just as well using command-line tools. Since I keep finding these little gems I hope I can continue this kind of post in the future. I’ll try to categorize the tips and I’ll only post the functionality I recently discovered and use myself, so this is not a reference for any of the programs mentioned.

Photography
You accidentally selected “Delete All” on your digital camera and now your photos are gone? You even fear they might have been overwritten by photos taken after the accident? Fear not, the same thing happened to me, and I was able to restore all the photos from the event plus photos going back as far as three months from the CF card in my 400d using the PhotoRec software.

Cameras today often have an orientation sensor, so pictures taken in portrait mode will automatically be rotated. But this usually happens in the viewer, while the orientation is simply an EXIF tag. Of course this takes up precious computing time and not every viewer supports it. jhead can not only display EXIF information in a consistent manner but can also rotate pictures losslessly and clear the orientation tag afterwards. The call is jhead -autorot *.JPG
If you want to rename your images using EXIF-information, exiftool is the way to go. The command exiftool '-FileName<Party_ ${CreateDate}_${filename}' -d %Y%m%d *.jpg would prefix all the photos with “Party” and the create-date, while keeping the original filename as the suffix. exiftool can shift dates too, if you ever forget to adjust for daylight saving.

When uploading the images you’ll often want to resize them. The ImageMagick collection can do just that, and many other things (like sharpening etc.). If you mogrify -resize 1600x1600 your photos will be resized to 1600px maximum edge length, while keeping the ratio. But be careful, mogrify overwrites the pictures in place!
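
To get a feeling for what “1600px maximum edge length, while keeping the ratio” means in numbers, here is a small Python sketch of the arithmetic. It is not ImageMagick code, the function name is invented, and the exact rounding mogrify applies is an assumption, so results may differ by a pixel.

```python
# Sketch only: fit_within() illustrates the arithmetic, it does not call ImageMagick.

def fit_within(width, height, box=1600):
    """Scale (width, height) so the longer edge equals box, keeping the ratio."""
    longest = max(width, height)
    return round(width * box / longest), round(height * box / longest)


print(fit_within(3888, 2592))   # landscape example
print(fit_within(2592, 3888))   # the same frame in portrait orientation
```

A 3888x2592 picture comes out as 1600x1067, and the portrait version as 1067x1600. Note that the sketch, like a plain 1600x1600 geometry, also scales smaller pictures up; ImageMagick’s geometry syntax has a “>” suffix (1600x1600\>) for shrinking only.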

Music
Thanks to services like video2mp3.net you can download a lot of music from YouTube. But sometimes there is silence at the end or the beginning of a song, and a whole collection of songs might have differing levels of volume. To cut an mp3 you can use mp3splt; this cuts everything after 03:30 minutes:
mp3splt file.mp3 00.00 03.30
The sound levels can be normalized, losslessly, using mp3gain:
mp3gain -rk *

Misc
When pretty-printing flat text files I use a2ps to format them nicely and get a PostScript document. a2ps however does not support UTF-8, while the only characters I care about (German umlauts) can be represented using Latin1 just fine. So, one can use iconv to convert them on the fly like this:
iconv --from-code=UTF-8 --to-code=LATIN1 textfile.txt | a2ps --font 9 -E -B -r --column=1 > out.ps

To export images from a PDF file I discovered pdfimages, which is part of the xpdf suite. Use it like this:
pdfimages -j foo.pdf bar


Well, let’s learn to cook.

Never having (really) cooked a meal in my life (no, macaroni and cheese doesn’t count), I decided to spend this holiday stocking up on asian and western recipes to aid me in my upcoming university life.

That quote comes from the recently initiated "Learning to cook" project on WIPUP. The full details as well as the stuff I’ve learned to cook is on WIPUP itself, but this project gave me an idea. Why hasn’t anybody made a cooking show run not by professional chefs, but instead by amateurs and university students? Given what has taken place in my kitchen over the past few days I’m quite sure it’s quality comedy material.

From "wait, aren’t we meant to defrost that first?" to "does the recipe mention all these leftover ingredients?", I must say it’s really been fun learning how to cook. Recipes and your own experiences welcome in comments. Here’s a picture of that awesome lasagna.

Well, back to work!


Posts for Tuesday, June 29, 2010

Goodbye Tokyo Cabinet, hello PostgreSQL?

The first version of this blog used MySQL; then I switched to Tokyo Cabinet. But now I've switched back to an RDBMS, this time PostgreSQL. Here's why.

Why did I switch to TC to begin with?

  1. There weren't any good ORM-type libraries for Clojure at the time (over a year ago). So there was a bit of an impedance mismatch trying to query and work with my data. In the DB I have separate tables for posts, comments, tags, categories. But 90% of the time I want to fetch a post and end up in Clojure with all related tags, comments etc. (A JOIN won't work here without a lot of work to un-mangle the results; you usually need multiple queries.)

    With TC I could store anything in the DB, so I just dumped a post into the DB as serialized hash-maps with all the comments, tags, and categories as sub-keys. So querying was easy. (Or was it? More below.)

  2. MySQL was really slow. This is largely because my queries were terrible, as I tried to solve the problem from #1 via brute force. TC on the other hand is fast.

  3. I tried to solve #2 by using a Clojure ref as a cache. But tying the STM to a database's transaction system is as far as I know difficult or impossible right now (per many threads on the mailing list). I had a lot of potential race conditions, which (as far as I know) never bit me, but probably would've eventually. I had to deal with keeping the cache up-to-date as comments were posted and posts were added and deleted and renamed. Remember:

"There are only two hard problems in Computer Science: cache invalidation and naming things." --Phil Karlton

So why did I stop using TC?

  1. I have no idea how to use a key/value store database properly. TC will take anything you dump into it, which is both a strength and a weakness.

    There's a lot of crap you have to do by hand that a proper database does for you. Consider checking for null values, for example; I ended up with a lot of nils in my data because my validations weren't 100% foolproof, or because I imported data via code that didn't run the validations and I never noticed. Or enforcing uniqueness of values; I had tag objects in the database with the same key but different values (due to capitalization differences), which screwed up a lot of stuff.

    On the other hand, there's a lot of information about how to use an RDBMS properly, and I have a lot of experience with it already. Constraints are easy to set up. Columns have types, which is nice. (Strange that I gravitate toward statically-typed databases while I gravitate toward dynamically-typed programming languages.)

  2. I have to compile and install Tokyo Cabinet by hand on my Linux distro. It's probably not worth the distro maintainers' time to maintain a package that so few people use. MySQL and PostgreSQL have lots of people working on keeping them running OK on most Linux distros.

  3. Some kinds of queries were still awkward in TC. "Give me post X" was great: I'd also get all the tags, categories etc. for free. But then how do you query to get all tags across all posts? Or all comments? Fetch all the posts and iterate over them, collecting their tags, then uniquify the resulting list? Not so pretty, and not so fast. So I was back to caching again, which still gave me nightmares about race conditions and dirty data.
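
The “fetch all the posts and iterate over them” approach from point 3, together with the capitalization problem from point 1, can be sketched like this. The sketch is in Python rather than Clojure, the post structure is a made-up stand-in for the serialized hash-maps, and all_tags is a hypothetical helper, not something from the blog's code.

```python
# Sketch only: posts are plain dicts standing in for the serialized hash-maps.

def all_tags(posts):
    """Collect every tag across all posts, treating 'Clojure' and 'clojure' as one."""
    seen = {}
    for post in posts:
        for tag in post.get("tags", []):
            key = tag.strip().lower()
            # keep the first spelling we meet for each normalized key
            seen.setdefault(key, tag)
    return sorted(seen.values(), key=str.lower)


posts = [
    {"title": "a", "tags": ["Clojure", "linux"]},
    {"title": "b", "tags": ["clojure", "Vim"]},
    {"title": "c"},
]
print(all_tags(posts))
```

Every call walks every post, which is the cost the text complains about; in an RDBMS the same question is a single query against a tags table, ideally with a uniqueness constraint on the normalized name.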

So now why am I using an RDBMS again?

  1. An RDBMS is exactly what I really need, if I could just query the thing concisely and get it to run fast. Thankfully there are some ORM-like libraries for Clojure in the works nowadays, already usable for a hobby project like this blog. There are clj-record, Carte, ClojureQL, and my own Oyako, and possibly others in the works.

  2. For my tiny blog's database, Oyako gives me slightly slower performance than TC, but within the same order of magnitude, which is good enough.

  3. Via Oyako I can (fairly concisely) fetch posts and get the associated tags, comments etc. But I can also easily fetch all tags, or all comments, since they're in their own tables. The "relational" part of RDBMS does come in handy sometimes.

Summary version

I switched to TC to begin with because I was using SQL wrong, and it was too slow and clumsy. Once I figured out how to use SQL correctly, it was a no-brainer to go back.

Introducing Gaka

The CSS for my blog is now being generated via gaka, a CSS-generating library I wrote this afternoon. It's extremely simple, but it got the job done for me. I turned around 600 lines of CSS into around 250 lines of Clojure without much effort. It looks like this:

user> (require '(gaka [core :as gaka]))
nil
user> (def rules [:div#foo
                  :margin "0px"
                  [:span.bar
                   :color "black"
                   :font-weight "bold"
                   [:a:hover
                    :text-decoration "none"]]])
#'user/rules
user> (println (gaka/css rules))
div#foo {
  margin: 0px;}

  div#foo span.bar {
    color: black;
    font-weight: bold;}

    div#foo span.bar a:hover {
      text-decoration: none;}
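
For readers who don't speak Clojure, the idea of turning nested rules into flat selectors can be sketched in Python. This is not how gaka is implemented and it does not reproduce gaka's exact output formatting; the functions flatten and to_css are invented for this illustration, with nested lists standing in for the Clojure vectors.

```python
# Sketch only: a Python illustration of nested rules, not gaka's implementation.

def flatten(rule, parent=""):
    """Return a list of (selector, declarations) pairs for a nested rule."""
    selector = (parent + " " + rule[0]).strip()
    declarations, children = [], []
    items = rule[1:]
    i = 0
    while i < len(items):
        if isinstance(items[i], list):
            children.append(items[i])          # a nested rule
            i += 1
        else:
            declarations.append((items[i], items[i + 1]))  # property, value
            i += 2
    flat = [(selector, declarations)]
    for child in children:
        flat.extend(flatten(child, selector))
    return flat


def to_css(rule):
    """Render the flattened rules as plain CSS text."""
    blocks = []
    for selector, declarations in flatten(rule):
        body = "\n".join(f"  {prop}: {value};" for prop, value in declarations)
        blocks.append(f"{selector} {{\n{body}\n}}")
    return "\n\n".join(blocks)


rules = ["div#foo",
         "margin", "0px",
         ["span.bar",
          "color", "black",
          "font-weight", "bold",
          ["a:hover",
           "text-decoration", "none"]]]
print(to_css(rules))
```

The three selectors it produces are div#foo, div#foo span.bar and div#foo span.bar a:hover, matching the selectors in the REPL output above.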

Gaka is partly inspired by Sass, which I found very pleasant to work with recently. And it's partly inspired by Hiccup, which is a delicious way to generate HTML in Clojure.

There's more info and more examples on github.

Posts for Monday, June 28, 2010

spamassassin / procmail with vpopmail accounts

copied from http://spamassassin.apache.org/logo/

motivation

i’ve had lots of trouble with mail the last two years with lots of spam still passing spamassassin. but that wasn’t so bad since we already had greylisting [1] running. that however still meant 8 spam mails per day with subjects such as:

  • Hot news for js! 70% off all June!
  • Totaler Ausverkauf von Top Zeitmesser
  • Yes, js, today -80% to all prices. Office accessible free
  • This crysis never ends
  • VIAGRA ® Official Site -95%

however, that changed after i started to collect the spam to train the bayesian filter [2] used by spamassassin. this posting is about how i’ve done it!

in contrast: after using bayes i receive one spam mail every 4th day.

i still have to monitor the spam, which is filed in a folder called ‘spam’ in one of my ‘imap’ folders, but since nearly everything in there really is spam, it’s simply a copy’n’paste operation of all files from ‘spam’ to ‘spam_train’. the ‘spam_train’ folder is used to train spamassassin. of course there are false positives and false negatives as well, but handling those is very easy.

is my setup special in any way?

no i don’t think so at all. however: most sites do encourage admins not to use bayes filters per vpopmail account. and because of that and other factors there is not much documentation of how to do so. since i had some issues with how to call spamassassin from procmail it did not work for a very long time. recently i bought my nokia n900 mobile phone and using mail on this device is very nice. receiving a spam mail per hour means a notification on the n900 which IS annoying if the notification mostly contains spam slogans.

my .procmailrc

# qmail Lazydog procmailrc file
SHELL="/bin/bash"
VHOME="/var/vpopmail/domains/lastlog.de/js"
VERBOSE="no"
LOGFILE="/var/vpopmail/domains/lastlog.de/js/procmail.log"
:0fw
| spamassassin --siteconfigpath=/var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf
:0:
* ^X-Spam-Flag: YES
/var/vpopmail/domains/lastlog.de/js/.maildir/.spam/new
:0:
* ^(From|Cc|To).*news@aktuell.conrad.de
/dev/null
# (other rules below)

my local.cf file, the only config file i changed:

# cat local.cf

required_score          3.2

rewrite_header subject _SCORE(0)_ |

use_bayes 1

use_dcc 1

use_pyzor 1

use_razor2 1

bayes_auto_learn 1

bayes_path /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes

bayes_file_mode 0777

report_safe 1

#add_header all Flag _YESNOCAPS_

ok_languages            en de

ok_locales              en de

# Google SafeBrowsing Plugin

loadplugin Mail::SpamAssassin::Plugin::GoogleSafeBrowsing

#body GOOGLE_SAFEBROWSING eval:check_google_safebrowsing_blocklists()

google_safebrowsing_apikey ABQIAA#VJ#J#VAQFbnTQ4uqKBRgArBR3gWufhSOf_c-vzV4UEN0steDDKD

google_safebrowsing_dir /var/cache/spamassassin

# scores for each url hit in message body

google_safebrowsing_blocklist goog-black-hash 0.3

google_safebrowsing_blocklist goog-malware-hash 0.5

how to experiment with spam

no blog or documentation talks about this but i think it’s very important. say you have a spam mail and you want to test your spamassassin configuration do this:

ls -la /var/vpopmail/domains/lastlog.de/js/.maildir/cur

-rw-rwx---+ 1 vpopmail vpopmail     2630 26. Dez 2009  msg.zQJ68:2,RS*

-rw-rwx---+ 1 vpopmail vpopmail    21905 16. Dez 2009  msg._zr78:2,S*

-rw-rwx---+ 1 vpopmail vpopmail     7433  9. Mär 2009  msg.zssm8:2,S*

-rw-rwx---+ 1 vpopmail vpopmail     2332  6. Mai 17:27 msg.ZuFm8:2,S*

-rw-rwx---+ 1 vpopmail vpopmail     3418 21. Nov 2008  msg.ZvFh8:2,S*

-rw-rwx---+ 1 vpopmail vpopmail     3237 23. Feb 13:07 msg.zZcg9:2,S*

let’s pick one of these messages: ‘msg.zf8y8:2,S’

now, let’s see what spamassassin thinks about this message with:

cat msg.zf8y8:2,S | spamassassin -t -D -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf >mail 2>debug

-t -D are only important for debugging

also have a look at:

tail -n 200 /var/vpopmail/domains/lastlog.de/js/procmail.log

creating the configuration for vpopmail usage of spamassassin

the message is processed by spamassassin and written to a file called ‘mail’ and the debug output going to stderr is written to a file called ‘debug’. this helped me a lot to verify that spamassassin was reading the right config files.

cat msg.zf8y8:2,S | spamassassin -t -D --siteconfigpath=/var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/local.cf >mail 2>debug

watch for things like this:

[7582] dbg: conf: finish parsing

[7582] dbg: plugin: Mail::SpamAssassin::Plugin::GoogleSafeBrowsing=HASH(0x14e3a10) implements ‘finish_parsing_end’, priority 0

[7582] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_toks

[7582] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_seen

[7582] dbg: bayes: found bayes db version 3

[7582] dbg: bayes: DB journal sync: last sync: 0

[7582] dbg: config: score set 3 chosen.

[7582] dbg: message: main message type: multipart/mixed

check: no loaded plugin implements ‘check_main’: cannot scan! at /usr/lib64/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/PerMsgStatus.pm line 164.

it seems that if i use --siteconfigpath, spamassassin expects all config files to be in ‘/var/vpopmail/domains/lastlog.de/js/.spamassassin/’ but usually they are in ‘/etc/spamassassin/’. so when using --siteconfigpath make sure you copy all files from ‘/etc/spamassassin/*’ to your own configuration directory ‘/var/vpopmail/domains/lastlog.de/js/.spamassassin/’.
WARNING: do not overwrite your customized files as there might be a local.cf in both paths!
to check if the right bayes path is used, look at the debug file we created:

cat debug | grep bayes
[20792] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_toks
[20792] dbg: bayes: tie-ing to DB file R/O /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/bayes_seen
[20792] dbg: bayes: found bayes db version 3
[20792] dbg: bayes: DB journal sync: last sync: 0
in this case everything is right.
also check whether the mail is classified as spam by looking at the ‘mail’ file:

Subject: 16.6 | VIAGRA ® Official Site -95%
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
bonker.serverkommune.de
X-Spam-Level: ****************
X-Spam-Status: Yes, score=16.6 required=5.0 tests=AWL,BAYES_99,
HTML_IMAGE_ONLY_08,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,MISSING_DATE,
MISSING_MID,MISSING_SUBJECT,NO_RELAYS,URIBL_AB_SURBL,URIBL_BLACK,
URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_WS_SURBL shortcircuit=no
autolearn=unavailable version=3.2.1
MIME-Version: 1.0
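when checking many test runs it can help to pull the verdict, score and rule names out of the ‘mail’ file automatically. the following is a small python sketch using only the standard library; the function name spam_summary and the shortened sample message are made up for illustration, and it assumes the X-Spam-Status continuation lines are folded with leading whitespace as usual for mail headers.

```python
import re
from email import message_from_string

# Shortened sample in the style of the headers above (not a real message).
RAW = (
    "Subject: 16.6 | VIAGRA Official Site -95%\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=16.6 required=5.0 tests=AWL,BAYES_99,\n"
    "\tHTML_MESSAGE,MISSING_DATE shortcircuit=no\n"
    "\tautolearn=unavailable version=3.2.1\n"
    "MIME-Version: 1.0\n"
    "\n"
    "body\n"
)


def spam_summary(text):
    """Extract flag, score, threshold and rule names from SpamAssassin headers."""
    msg = message_from_string(text)
    # Unfold the header: collapse line breaks and tabs into single spaces.
    status = " ".join(msg.get("X-Spam-Status", "").split())
    # Split off the leading "Yes"/"No" verdict from the key=value fields.
    verdict, _, rest = status.partition(", ")
    # The rule list may have been folded after a comma; glue it back together.
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(part.split("=", 1) for part in rest.split() if "=" in part)
    return {
        "flagged": msg.get("X-Spam-Flag", "").strip().upper() == "YES",
        "verdict": verdict,
        "score": float(fields["score"]) if "score" in fields else None,
        "required": float(fields["required"]) if "required" in fields else None,
        "tests": [t for t in fields.get("tests", "").split(",") if t],
    }


summary = spam_summary(RAW)
print(summary)
```

the X-Spam-Flag check mirrors what the procmail recipe above matches on; the score and rule list are only useful for debugging.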

initial bayes, sa-learn

since i had been training the db incorrectly for some time, i decided to retrain everything with:

cd /var/vpopmail/domains/lastlog.de/js/.spamassassin/db/

ls -la

-rw-rw----+ 1 joachim users    7824 21. Jun 15:17 bayes_journal

-rw-rwx---+ 1 joachim users  643072 21. Jun 15:09 bayes_seen*

-rw-rwx---+ 1 joachim users 5177344 21. Jun 15:09 bayes_toks*

just remove all 3 files (or create backups if in doubt) and then use sa-learn:

cat spamcommand

echo "=== <SPAM Train> ===="

echo "/var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur"

nice sa-learn -D 5 -u vpopmail --dbpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/db --siteconfigpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/user_prefs -C /var/vpopmail/domains/lastlog.de/js/.spamassassin --spam --dir /var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur

echo "=== </SPAM Train> ==="

echo ""

echo "=== <HAM Train> ==="

for i in cur .js@dune2.de/cur .notice/cur; do

echo "learning from: /var/vpopmail/domains/lastlog.de/js/.maildir/${i}"

nice sa-learn -D 5 -u vpopmail --dbpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/db --siteconfigpath /var/vpopmail/domains/lastlog.de/js/.spamassassin/ -p /var/vpopmail/domains/lastlog.de/js/.spamassassin/user_prefs -C /var/vpopmail/domains/lastlog.de/js/.spamassassin --ham --dir "/var/vpopmail/domains/lastlog.de/js/.maildir/${i}"

done

echo "=== </HAM Train> ==="

i execute this script every night using a cronjob, and so far it is working great! handling false positives and false negatives is easy as well since you only have to copy the wrong mails into the right folder and the next cronjob probably learns it right. i’m not sure on this but so far it is working.

./spamcommand

==== <SPAM Train> ===

/var/vpopmail/domains/lastlog.de/js/.maildir/.spam_train/cur

[7913] info: archive-iterator: skipping large message

Learned tokens from 2515 message(s) (2531 message(s) examined)

=== </SPAM Train> ===

=== <HAM Train> ===

[22465] info: archive-iterator: skipping large message

Learned tokens from 1100 message(s) (1171 message(s) examined)

[19782] info: archive-iterator: skipping large message

….

Learned tokens from 191 message(s) (199 message(s) examined)

=== </HAM Train> ===

./spamcommand  291,01s user 12,00s system 12% cpu 40:05,37 total

the first run of this script took about 40 minutes

rights management

since there might be multiple users accessing the bayes dbs you have to check permissions. most likely these users are: vpopmail and your login name.

final words

remove -t and -D if you have finished debugging as it will flood your logs. also have a look at [3], [4] and [5].

links

[1] http://de.wikipedia.org/wiki/Greylisting

[2] http://wiki.apache.org/spamassassin/BayesFaq

[3] http://wiki.apache.org/spamassassin/BayesInSpamAssassin

[4] http://standbytux.blogspot.com/2005/07/using-spamassassin-and-procmail-to.html

[5] http://faisal.com/docs/salearn


Posts for Wednesday, June 23, 2010

New blog engine up and running

Well, my new blog is up and running. Sorry for the temporary lack of cows in my layout. I'm dogfood-testing the blog engine in a fairly vanilla state until I work out some of the bugs. This layout is based upon barecity, a minimalist Wordpress theme that I adapted easily enough to my blog.

As a bonus, I applied a dirty hack to my RSS feed that I think should help avoid screwing up people's RSS readers with duplicate entries.

I'll write again soon with some info about the blog engine and some things I learned writing it.

(As mentioned previously, here's the code.)

Managing ZIP-based file formats in git

git rocks, everybody knows that, but this morning I ranted about the problems with putting binary files into it, since that removes the option to actually see the differences. Splitbrain came to the comments and pointed out that Microsoft’s new file formats (like pptx, docx and such) are just zipped packages of XML files, so at least some diffing should be possible. Remembering what I wrote about versioning PDF files in git, I thought it should be a quick hack, and in fact it really is:

Open your ~/.gitconfig file (create if not existing already) and add the following stanza:

[diff "zip"]
textconv = unzip -c -a

This uses “unzip -c -a FILENAME” to convert your zip file into ASCII text (unzip -c unzips to STDOUT). The next step is to create/modify the file REPOSITORY/.gitattributes and add the following:

*.pptx diff=zip

which tells git to use the zip-diffing description from the config for files matching the given mask (in this case everything ending with .pptx). Now git diff automatically unzips the files and diffs the ASCII output, which is a little better than just “binary files differ”. On the other hand, due to the convoluted mess that the corresponding XML of pptx files is, it doesn’t help a lot there, but for ZIP files containing text (for example source code archives) this is actually quite handy.
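To get a feeling for what git ends up comparing, here is a rough Python sketch of the same idea: dump every member of two archives as text and diff the dumps. It only approximates what “unzip -c -a” prints (the member header format and the sorted order are choices made for this sketch), and the helper names zip_to_text and make_zip are made up.

```python
import difflib
import io
import zipfile


def zip_to_text(data: bytes) -> str:
    """Concatenate all archive members as text, one labelled chunk per member."""
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            content = archive.read(name).decode("utf-8", errors="replace")
            chunks.append(f"--- {name} ---\n{content}")
    return "\n".join(chunks)


def make_zip(files: dict) -> bytes:
    """Build a small in-memory zip so the example needs no files on disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buf.getvalue()


old = zip_to_text(make_zip({"slide1.xml": "<p>Hello</p>\n"}))
new = zip_to_text(make_zip({"slide1.xml": "<p>Hello world</p>\n"}))

diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
print("\n".join(diff))
```

The two archives differ as binary blobs, but the text dumps differ in exactly one line, which is what makes the textconv trick useful.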

speed up firefox (reloaded)

I’ve posted before about this little hack on speeding up firefox. The key is that you actually move the entire .mozilla folder from disk to memory.

First you have to mount /tmp to memory (some linux distributions may do this by default) adding this line to /etc/fstab and rebooting:

none /tmp tmpfs size=512M,nr_inodes=200k,mode=01777 0 0

Then it’s safe to do:

cp -R /home/user/.mozilla /home/user/.mozilla_save
mv /home/user/.mozilla /tmp/mozilla
ln -s /tmp/mozilla /home/user/.mozilla

I updated the script so I can use it as a system init script:

#!/bin/bash

start() {
    mkdir -p /tmp/mozilla
    rsync -avi --delete /home/user/.mozilla_save/ /home/user/.mozilla/
}

stop() {
    size=`du -xs /home/user/.mozilla/ | awk '{print $1}'`
    digits=`expr length $size`
    if [ $digits -gt 4 ]; then
        rsync -avi --delete /home/user/.mozilla/ /home/user/.mozilla_save/
    else
        echo 'no!'
        exit 0
    fi
}

case "$1" in
start)
     start
     ;;
stop)
     stop
     ;;
*)
     echo $"Usage: $0 {start|stop}"
     exit 1;;
esac

So I placed this script at /etc/init.d/, made it executable and created a link inside /etc/rc5.d/

ln -s /etc/init.d/ffsync.sh /etc/rc5.d/S99ffsync

rc5 because the default runlevel on Fedora is 5. You can see yours from /etc/inittab.

So every time my system boots the above script runs with the start option (executing the start function) and every time it halts/reboots it runs with the stop option (stop function).

In order to be sure that no data loss will occur in the unlikely event of sudden shutdown, I have a cronjob that saves the mozilla folder every 15 minutes.

*/15 * * * * /etc/init.d/ffsync.sh stop
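The size check in the stop function is easy to misread, so here is a tiny Python sketch of what it tests. It assumes du reports the size in 1 KiB blocks (the GNU default); the function names are made up for this example. Presumably the guard is there so that an almost empty in-memory copy never overwrites the saved profile, but that reading is a guess.

```python
def safe_to_save(size_kib: int) -> bool:
    """Mirror of the shell test: the size printed by du has more than 4 digits."""
    return len(str(size_kib)) > 4


def safe_to_save_numeric(size_kib: int) -> bool:
    """The same condition written as a plain numeric comparison."""
    return size_kib >= 10_000


for size in (0, 512, 9_999, 10_000, 123_456):
    print(size, safe_to_save(size), safe_to_save_numeric(size))
```

In other words, the profile is only synced back to disk once it is roughly 10 MB or larger.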

Trust me. With the above hack you’ll see a significant difference in firefox’s speed. Especially if you’re using firefox’s awesome address bar to search through your browser’s history.

Breaking links is easy to do

I apologize in advance to everyone who subscribes to my blog's RSS feed, but this week your RSS reader is probably going to suddenly find 25 "new" posts from me.

My blog currently uses /blog/title as the URL scheme, with similar URLs for categories and tags etc. Soon, I'm probably going to change it to /blog/123/title, as part of the impending release of version 0.2 of my blog engine. (The code-in-progress is in a branch on github, for the daring and foolish among you.)

This way, I can change the title of a post without breaking everything. I have heretofore lacked this ability. It's easy to code, you just tell Compojure to ignore everything after the number in a route. Something like this:

(defroutes foo
  (GET ["/blog/:id:etc" :id #"\d+" :etc #"(/[^/]*)?"] [id]
    (pages/post-page id)))
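To check which URLs such a route would accept, the two patterns can be combined into one regular expression and tried out in Python. This is only a sketch of the matching behaviour I'd expect; how Compojure actually compiles the route is not shown here, and the function name `post_id` is made up.

```python
import re

# /blog/<digits>, optionally followed by one more path segment that is ignored.
ROUTE = re.compile(r"/blog/(?P<id>\d+)(?P<etc>(/[^/]*)?)")


def post_id(path):
    """Return the numeric post id for a matching path, else None."""
    match = ROUTE.fullmatch(path)
    return int(match.group("id")) if match else None


for path in ("/blog/123/old-title", "/blog/123/new-title", "/blog/123",
             "/blog/old-title", "/blog/123/a/b"):
    print(path, "->", post_id(path))
```

Both titles resolve to the same post, so renaming a post no longer breaks its links.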

It's only a few lines of code to change, but the ramifications are widespread. It'll instantly break every link to my blog, for example. At least it's pretty easy to set up a bunch of redirects in Compojure to avoid that. I think this'll work:

(require '(blog [db :as db]
                [link :as link])
         '(oyako [core :as oyako])
         '(ring.util [response :as response]))

(defn redirect-post [name]
  (when-let [post (oyako/fetch-one db/posts :url name)]
    (response/redirect (link/url post))))

(defroutes redirect-routes
  (GET ["/blog/:name" :name #"[^/]+$"] [name]
    (redirect-post name)))

(Oyako here is the experimental ORM-like library I'm using to interface with PostgreSQL nowadays, having ditched Tokyo Cabinet.)

Changing my URL scheme is also going to mess up RSS though, because I (foolishly) used post URLs as the GUIDs in my RSS feed up to this point. This problem I don't know how to avoid. I might reduce the number of posts included in my feed temporarily, to limit the damage.


Excellent Post About Free Software Purists

Edit: Almost forgot to mention: I heard about this blog post from flameeyes.

There was a really, really good blog post that actually clarified to me what was for so long a nuisance in discussions with Free Software purists (and I couldn’t reason out why at the time). Read this quote:

A long time ago, Larry Wall gave a “State of the Onion” speech. Larry is the guy behind Perl, the programming language, and every year he gave a talk about what he believes Perl is all about and where it’ll be going over the following twelve months. Larry’s not really a programmer; he’s a linguist who does a lot of programming. So his view on what a computer language should be is uniquely interesting, and should be read. Anyway, he spoke of freedom, and characterised the ends of the spectrum on the subject as “Bill” and “Richard”, those two being Bill Gates (at the time head of Microsoft) and Richard Stallman (head of the Free Software Foundation): the idea here was that those two represented the ultimate expressions of their philosophies. Richard Stallman was at the far extreme of belief in free software; Bill Gates was at the other far end, decrying the very concept of freedom. It was a throwaway joke, at the time; a caricature for the purpose of a laugh in a presentation.

I encourage everyone to read the original post, since it really brings some light out on the unreasonableness and extreme nature of the Free Software purist position.


Reparenting QGraphicsItem during Mouse Drag

QGraphicsItem::setFlag(ItemIsMovable) is a wonderful feature. With one line of code, an item becomes draggable with the mouse for repositioning. Unfortunately, things get a bit tricky when reparenting during drags.

Let’s say you have two QGraphicsItems called plate1 and plate2, instances of a subclass called PlateGraphicsItem. Inside plate1 and plate2, you have a few child items, food1, food2, food3, food4, etc, instances of a subclass called FoodGraphicsItem. Foods are inside of plates. What if we want to move one piece of food from one plate to another, smoothly dragging it there? Intuitively, we should be able to do this (after remembering to run setFlag(ItemSendsGeometryChanges)):

QVariant FoodGraphicsItem::itemChange(GraphicsItemChange change, const QVariant &value)
{
	if (change == ItemPositionChange && scene()) {
		QRectF newRect = mapRectToScene(boundingRect());
		newRect.moveTo(parentItem()->mapToScene(value.toPointF()));
		foreach (QGraphicsItem *item, scene()->items()) {
			if (item != parentItem() &&
			    qgraphicsitem_cast< PlateGraphicsItem* >(item) &&
			    item->mapRectToScene(item->boundingRect()).contains(newRect)) {
				setParentItem(item);
				break;
			}
		}
	}
	return QGraphicsItem::itemChange(change, value);
}

As the user drags the food, we see if its bounding box fits inside the bounding box of a different plate. If it does, we switch parents. But this doesn’t happen. As soon as the item is reparented, it leaps across the screen to the same relative position it was in while inside of the old plate. If it was on the bottom-left corner of the old plate, it moves to the bottom-left corner of the new plate. Not what we want. So it seems like we should be able to modify the inside of that loop like this:

				QPointF oldPos = scenePos();
				setParentItem(item);
				return QVariant(item->mapFromScene(oldPos));

But alas, this does not work either. As soon as you move the mouse again (while still dragging), the item leaps to where it was before the above fix. And remember: this leap puts the item far away from underneath the mouse cursor. So it works initially, but the next time Qt’s MouseMove event handler is called, we’re SOL.

After digging through the Qt source for quite a bit of time, it turns out that Qt keeps track of where the mouse was pressed originally, in scene coordinates, and uses this to compute the new location relative to the old parent’s coordinates. Not good. Mixing coordinate systems like this is not okay once the item is reparented. It turns out we have to simulate a mouse release event to clear Qt’s internal state. But this doesn’t completely do it, because the mouse event still has the scene coordinate of where the button was pressed, which means we also need to intercept and tamper with this information before Qt sees it. Essentially what this amounts to is reversing Qt’s coordinate equation, solving for zero, and adding back the location that we want. And this has to happen every time the user moves the mouse, but it should only happen once the item has been reparented. And if the item is reparented back to the original plate after a series of drags, we still have to do this transformation, since we have already cleared Qt’s internal state, which means we need to store the click location once we reparent.
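The coordinate mix-up is easier to see with numbers. Below is a deliberately simplified Python model, not Qt’s actual code: parents are plain translations, and the drag position is computed as “start position in parent coordinates plus mouse travel in the scene”. All names and numbers are made up for illustration.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def dragged_scene_pos(parent_origin, start_local, press_scene, mouse_scene):
    """Start position (in parent coordinates) plus mouse travel, mapped to the scene."""
    new_local = add(start_local, sub(mouse_scene, press_scene))
    return add(parent_origin, new_local)


plate1 = (0.0, 0.0)              # scene position of the old parent
plate2 = (200.0, 0.0)            # scene position of the new parent
start_in_plate1 = (20.0, 80.0)   # where the food sat when the drag began
press = (25.0, 85.0)             # scene position of the mouse press
mouse = (225.0, 85.0)            # current scene position of the mouse

# Still parented to plate1: the food follows the cursor.
before = dragged_scene_pos(plate1, start_in_plate1, press, mouse)

# Reparented to plate2, but the remembered start position is still in
# plate1's coordinates: the food leaps away from the cursor.
stale = dragged_scene_pos(plate2, start_in_plate1, press, mouse)

# Re-expressing the start position in plate2's coordinates fixes the leap.
start_in_plate2 = sub(add(plate1, start_in_plate1), plate2)
fixed = dragged_scene_pos(plate2, start_in_plate2, press, mouse)

print(before, stale, fixed)
```

In this model the leap is exactly the offset between the two plates, which matches the behaviour described above. The hack below attacks the same mismatch from the other side, by resetting Qt’s state and adjusting the press position the event carries.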

All in all, this horrible hack amounts to adding this (and making the above modification):

void FoodGraphicsItem::mouseMoveEvent(QGraphicsSceneMouseEvent *event)
{
	//BIG PHAT UGLY HACK!
	if (event->buttonDownScenePos(Qt::LeftButton) == m_manualMouseReleaseAt ||
	    !parentItem()->contains(parentItem()->mapFromScene(event->buttonDownScenePos(Qt::LeftButton)))) {
		m_manualMouseReleaseAt = event->buttonDownScenePos(Qt::LeftButton);
		QGraphicsSceneMouseEvent mouseRelease(QEvent::GraphicsSceneMouseRelease);
		QGraphicsItem::mouseReleaseEvent(&mouseRelease);
		event->setButtonDownScenePos(Qt::LeftButton, parentItem()->mapToScene(pos() + transform().map(event->buttonDownPos(Qt::LeftButton))));
	}
	QGraphicsItem::mouseMoveEvent(event);
}

This works, but it’s awful. Really awful. And it could break in future Qt versions. But for now, it works.

So I’m wondering — is this behavior by design, or have I uncovered an odd Qt bug?

Update: It looks like others have encountered the same problem. Well, here’s a work around. But is there a more correct solution?

Posts for Monday, June 21, 2010

Vim regex - remove kind-of-matching lines

I have a file where every line starts with a number (followed by whitespace and a bunch of other stuff). Every number appears on either one or two lines, and if two, the second line always has a b after the number.

I need to delete every line for which there's a corresponding b line. But if there's no corresponding b line I want to leave the original line there.

Before:

123 foo bar
456 blarg
789 quux
123b foo baz
789b quux blurble

After:

123b foo baz
456 blarg
789b quux blurble

Except in my real file, I have a thousand lines and it'd take a year to do by hand. Vim to the rescue:

:sort
:%s/^\v((\d+).*\n)(\2b.*)/\3/

And that is why Vim is awesome. Can you think of a shorter way to do this, in Vim or Emacs?
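For comparison, here's roughly the same transformation as a standalone Python sketch. It isn't shorter, but it spells out the logic the regex encodes. The function name `drop_superseded` is made up, and like `:sort` it returns the lines in sorted order.

```python
import re


def drop_superseded(lines):
    """Drop each 'N ...' line when an 'Nb ...' line with the same number exists."""
    # Collect every number that has a "b" variant.
    has_b = set()
    for line in lines:
        match = re.match(r"(\d+)b\b", line)
        if match:
            has_b.add(match.group(1))

    kept = []
    for line in lines:
        match = re.match(r"(\d+)(b?)\b", line)
        if match and match.group(2) == "" and match.group(1) in has_b:
            continue  # a plain line that is superseded by its b line
        kept.append(line)
    return sorted(kept)


before = ["123 foo bar", "456 blarg", "789 quux",
          "123b foo baz", "789b quux blurble"]

for line in drop_superseded(before):
    print(line)
```

Unlike the Vim version, this doesn't depend on the b line directly following its partner after sorting.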

Ads on license plates?

What if when your car stops at a red light, your license plate displays ad banners? What could possibly go wrong?

Quoth the person(?) who wrote this bill:

"We're just trying to find creative ways of generating additional revenues," he said. "It's an exciting marriage of technology with need, and an opportunity to keep California in the forefront."

The forefront of annoying the hell out of people. Certainly what I need is more distractions on the road. I mean, what if there's a new brand of toothpaste and I didn't find out yet? Someone somewhere needs to earn a dime for telling me about it by any means necessary.

I'm just waiting for the first company to propose paying new parents a few hundred dollars to tattoo ads on their babies.

Biting the Hand That Feeds

Granted, I’m not a Groklaw junkie. Lawsuits are the epitome of bureaucratic boring and are not a creational activity. So, keep in mind that I only read the occasional major headlines from The SCO Group’s escapade in futility. It does come as a relief that the show is finally over.

The situation in many ways paralleled the AT&T UNIX lawsuit of the ’90s. It is every bit as ironic considering, for instance, that BSD folks were largely responsible for the success of UNIX. In SCO’s case, their OS (SVr4 UNIX) is based largely on the work of Research UNIX, BSD, and even includes a wide selection of GNU tools in userland.

Except this time, no offending code was ever demonstrated, just a straw-man argument and utter defeat.

Still, I feel an overwhelming sense of sadness over the whole affair. Santa Cruz Operation and System V UNIX were respectable in their time. My guess is that SCO along with UnixWare (the natural evolution of System V) will fade into oblivion alongside the countless other dead UNIX implementations. Linux zealots often take jabs at other implementations, but I think UnixWare could have held a viable, if niche, place in the enterprise had it been under proper stewardship.

As time goes on, the UNIX diaspora seems to be waning. We are left with, essentially:

  • Linux, with a mostly GNU userland as the heavyweight contender
  • The BSDs, perhaps equal or greater in architectural quality but relatively unknown giants. We can however lump Mac OS X in here which is the most widely used.
  • Solaris, which might be considered an open source System V fork. An interesting OS that has a great lineage and potential, but lacks in trust and certainty for contributors at the moment.
  • AIX, a System V and BSD hybrid with plenty of IBM thrown in for good measure. Perhaps the gold bar and only remaining competitive mid-iron standard due to IBM’s silicon prowess.
  • HP-UX, an older System V lineage perhaps on a slow deathbed due to reliance on the vapid Itanium

History has been unforgiving to those companies that try to unfairly weasel programmers and users in this market. The lesson is to work with and encourage your development community and not bite the hand that feeds. My eyes are on Oracle for the time being. Sun had a hard enough time nurturing the Solaris community despite being a favorable company, and Oracle can just as easily kill this operating system through boneheaded maneuvers if it is not careful.



Posts for Sunday, June 20, 2010

Paludis 0.48.0 Released

Paludis 0.48.0 has been released:

  • --reset is now the default for git syncers. If Paludis is used to sync a git repository with modifications, those modifications will be lost. Use --no-reset to stay with the old (git pull) behaviour, but note that it is probably better to change your workflow such that Paludis never directly works with any repository it does not manage.
  • environment.conf is now general.conf. Your existing environment.conf should be renamed.
  • When switching UIDs, we now also pick up supplemental groups.
  • World is now updated for package moves.

Filed under: paludis releases Tagged: paludis

Posts for Thursday, June 17, 2010

Printer spam: what could possibly go wrong?

As further evidence that there are no depths to which companies won't stoop when it comes to advertising, HP has come up with a great idea: Get people to hook their printers up to the internet and then spew advertisements out of their printers.

Well, it's a win-win situation for the companies doing the advertising: Not only will people see your ads, they'll pay for the ink and paper to print them. Maybe not such a great situation for the end-user though.

And then there are the privacy implications of targeting ads based on geolocating the IP address of the printer. Which I find a bit disturbing, but I guess advertisers already do that with online ads. But wait, there's more:

Ads can also be targeted based on a user's behavior as well as the content, said Vyomesh Joshi, head of the HP's Imaging and Printing Group.

Looking at what I'm printing so you can try to sell me things? Just a bit creepy.

Most troubling to me is the intrusiveness of the whole thing. They're taking control of a physical object in my house and using it against me. May as well kidnap my cat and train him to spell out "BUY PEPSI" in his cat litter.

Quote some slimeball at HP:

"What we discovered is that people were not bothered by it [an advertisement]," Nigro said. "Part of it I think our belief is you're used to it. You're used to seeing things with ads."

Translation: "We know this is a really horrible idea, but if people are complacent enough to sit there and take it without complaint, what's stopping us?"

He's right though, people are used to it.

I guess TV, radio, internet, phones, product placement in movies and games, print media, billboards and the postal service just aren't enough. Clearly what the world really needs is another ad-delivery mechanism.

Posts for Wednesday, June 16, 2010


Restoring ssh connections on resume

I use pm-utils for hibernation support.
It has a hooks system which can execute stuff upon hibernate/suspend/thaw/resume/..., but they run as root.
If you want to run stuff as a regular user you could do something like

  su $user -c <command>

..but these commands have no access to your user environment.
In my user environment I have a variable which I need access to, namely SSH_AUTH_SOCK, which points to my agent which has some unlocked ssh keys. Obviously you don't want to reenter your ssh key passwords every time you resume.
(In fact, I started using hibernate/resume because I got tired of having to enter 4 passwords on boot. - 1 for dm_crypt, 1 for login, 2 for ssh keys, not because it is much faster)

The solution is very simple. Use this:

  sudo pm-hibernate && do-my-stuff.sh

This way, do-my-stuff.sh will be executed when you resume, after the complete environment has been restored.
Ideal to kill old ssh processes, and setup tunnels and ssh connections again.
I'm probably gonna integrate this into my microDE

Posts for Tuesday, June 15, 2010

Adobe support, the FSF and double standards

Flameeyes has a great article over on his blog about the recent shift in the perception of Adobe in the FLOSS world.
So I am not the only one who sees some weird stuff going on


Effective data visualisation for the WIPUP dashboard.

Data visualisation is a big, big topic. It’s often underestimated by the majority of society. But what is data visualisation? It’s not so much how effectively you present data – no, it’s how effectively you deliver information. Today since work has resumed on WIPUP I decided to take a look at how to implement effective data visualisation in the WIPUP dashboard. Before we continue, let’s take a look at what the dashboard looks like now:

Obviously unfinished, the dashboard looks like somebody vomited. Disregarding the obvious visual glitches such as the 0% segments in the pie chart and questionable use of colours, the most obvious thing that immediately catches our eye is … nothing. That’s right. It’s a mosaic of data, no clear correlations or focus is delivered to the user. Well, what would you expect if 6 charts were thrown at you? It’s not hard to see that there is a problem here to be fixed.

The first step is to consider, based on the results of dogfooding, which graphs aren’t really useful to the user at all. The two obvious ones are the comment activity and the popularity based on subscribers. If, hypothetically, there was a popular WIPUP user, the usefulness of these two would increase, but overall it’s generally agreed that the majority of users will not find them useful, and even if showing a variety of data, we have to look at the purpose behind comments – to initiate discussion. Such a qualitative concept is not well presented with quantitative means and thus should not be attempted. Looking at the subscribers chart, it may be useful, but doesn’t warrant its own graph.

The second step is to consider exactly what medium we are using to present the data – in this case we are using line charts and pie charts. Line charts are appropriate here as they are used with a timescale, but as for the pie charts – well, many data visualisation experts argue that there is rarely a good use case for pie charts. Pie charts should not have been invented. They are useless at showing data. It is hard to determine which segment you should focus on, and the colours distract rather than emphasise the data. Even worse, I have committed the ultimate evil of creating a 3D pie chart. Here’s something you should try out – cover all the percentages of the “activity per project” pie chart, and look at the two large segments (purple and green). Looks about the same proportion, eh? Nope. There’s a stunning 10% difference between them. Try the same with the orange and yellow together – big relative difference again. This trick even works with groups – the orange and yellow segments together look around the same as the pink and lime green, no? Yep, again, there’s a big difference. Conclusion: pie charts suck. The only benefit is that they look pretty.

The third step is to search for correlations. The most obvious one is how many updates I have made and how many views I’ve received (duh). The important thing here is that from the user perspective, they want that correlation. They want to see that the more they update, the more views they get. Thus we should try and emphasize this correlation. How better than to merge the two graphs? This way any mismatch of this ideal correlation (ie, an update they thought would be interesting, but from the statistics, show otherwise) would be obvious.

After a quick brainstorm, I copied the raw data onto OpenOffice Calc (equivalent of Microsoft Excel), and tried this alternative:

As you can see, I’ve compressed the data of 5 graphs into 2 graphs (say, I didn’t know Calc could do 2 y-axes). On the left, we see a stacked bar chart showing activity, kudos and subscribers (trackers are added as a subscriber to all projects, which is why it is all equal). Immediately the user can create a focus and see correlations. For example, this allows us to sort the data in terms of most to least. In this case, I’ve spent the most of my updates on uncategorised updates. However, we can now easily determine the correlation that even though the activity spent on uncategorised work is the most, it doesn’t create as big an impact as say, the Evan project, of which it is clear that the Kudos+Subscribers:Activity ratio is much greater. Infinitely more useful, don’t you think?

On the right, we can see we’ve shown the correlation by merging the two line graphs. Now we can clearly see that there is a slight correlation misalignment in the update I did on the 07/06 – perhaps the update wasn’t very interesting? Or perhaps it’s simply because I wasn’t as actively giving out updates as before – all very interesting conclusions.

Well folks, I hope I shared some useful stuff today. Next step, implementing the changes and revamping the dashboard.


Planet Larry is not officially affiliated with Gentoo Linux. Original artwork and logos copyright Gentoo Foundation. Yadda, yadda, yadda.