Ramblings of a Linux Advocate

Wednesday, December 9, 2009

Cookies Banned In Europe? WTF?!

Ok, here's one for the ages. The EU has enacted a law that bans the use of cookies on a website unless the visitor has "opted in." You can read all you want about it here: http://www.theregister.co.uk/2009/11/25/cookie_law/

Do the people enacting this law have any idea how the web operates? Or what kind of enforcement strategy they would even use? What if the server is outside Europe but the visitor is inside? Do they know that every major browser today allows you to do this already?

Here is a humorous comment by Information Society Commissioner Viviane Reding:

"In the E-Privacy Directive it is made very clear that a user can only give out his private data if there is prior consent so if there are spy cookies there must be a prior consent of the user, very clearly so. But there are also the so-called technical cookies, those which make the whole infrastructure of the internet function. Those are not concerned by this rule, just to clarify, because there were some critics that this amendment would make it impossible for the internet to function. It does not, it is a guarantee for the rights of the consumers."

Sorry Viviane, but "technical cookies" and "spy cookies", as you refer to them, are virtually impossible to tell apart. For instance, we employ multivariate testing on a variety of our websites to make sure that the best combination of a webpage is shown to certain types of users. The cookies involved are clearly used for tracking purposes, but without them our pages won't even load. Not to mention that if we increase our "conversion" we are actually creating a better web experience for all of the visitors to our page.

It's like they think marketers or websites use cookies to locate and eat citizens' puppies or something. I for one would love to see a new law enacted where every politician for any government has to pass a simple IT test before they take office. Then maybe we can avoid wasting time and taxpayer dollars on idiotic things like this and focus more on the real issues facing society today.

Wednesday, August 12, 2009

Fixing Firefox 3 with Shared /home

Ok, I finally took the plunge at the office and am moving all of our workstations over to a system where the /home directory is shared via GlusterFS (www.gluster.org) and I use NIS for central authentication. Everything works beautifully, but today I came upon a very strange issue related to Firefox 3. It turns out that FF is known to have issues with shared /home directories since it moved to SQLite. I found bugs reported with NFS, AFS, GlusterFS, etc.

Here is my down and dirty fix. Basically a script on login sets up a /tmp/firefox-$USER folder and symlinks ~/.mozilla/firefox there. On logout another script copies everything in /tmp back to a folder at ~/.mozilla/firefox-sync. I am using this on Ubuntu Jaunty and these instructions are for it.

Step 1: sudo nano /etc/gdm/PostLogin/Default

#!/bin/bash

# move the .mozilla/firefox directory if there
if [ ! -L "$HOME/.mozilla/firefox" ]; then
   if [ -d "$HOME/.mozilla/firefox" ]; then
           mv "$HOME/.mozilla/firefox" "$HOME/.mozilla/firefox-sync"
   fi
fi

if [ ! -d "/tmp/firefox-$USER" ]; then
   mkdir "/tmp/firefox-$USER"
else
   rm -rdf "/tmp/firefox-$USER/"
   mkdir "/tmp/firefox-$USER"
fi

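chown -R "$USER:$USER" "/tmp/firefox-$USER"

# copy the user's files over (skipped on a first login, when no saved profile exists yet)
if [ -d "$HOME/.mozilla/firefox-sync" ]; then
   cp -rpdf "$HOME/.mozilla/firefox-sync"/* "/tmp/firefox-$USER/"
fi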

# create the link
if [ ! -L "$HOME/.mozilla/firefox" ]; then
   ln -s "/tmp/firefox-$USER" "$HOME/.mozilla/firefox"
fi

Step 2: sudo chmod 755 /etc/gdm/PostLogin/Default

Step 3: sudo nano /etc/gdm/PostSession/Default

#!/bin/bash

# this syncs all the data from /tmp back to the firefox-sync folder
if [ -d "/tmp/firefox-$USER" ]; then
        rsync -a --delete "/tmp/firefox-$USER/" "$HOME/.mozilla/firefox-sync/"
fi

Step 4: sudo chmod 755 /etc/gdm/PostSession/Default
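
If you want to confirm the login script did its job, here is a minimal sketch of a check. The function name check_ff_link and its optional third argument (an alternate tmp root, handy for trying it outside of /tmp) are my own inventions for illustration and are not part of GDM or Firefox.

```shell
#!/bin/bash
# check_ff_link HOME_DIR USER_NAME [TMP_ROOT]
# Succeeds only if HOME_DIR/.mozilla/firefox is a symlink pointing at
# TMP_ROOT/firefox-USER_NAME, the layout the PostLogin script sets up.
# TMP_ROOT defaults to /tmp; the argument is a hypothetical convenience.
check_ff_link() {
    local home_dir="$1" user_name="$2" tmp_root="${3:-/tmp}"
    local link="$home_dir/.mozilla/firefox"
    local want="$tmp_root/firefox-$user_name"

    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        return 1
    fi
    if [ "$(readlink "$link")" != "$want" ]; then
        echo "wrong target: $(readlink "$link")"
        return 1
    fi
    echo "ok: $link -> $want"
}
```

After logging in, something like check_ff_link "$HOME" "$USER" should print an "ok" line. If it does not, the PostLogin script probably did not run or is not executable.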

I hope this helps someone out there who is trying to remedy this behavior on their network.

Saturday, November 15, 2008

SUDO CAR

About 5 months ago I finished my coding for the day and a thought ran into my mind that I thought would be rather funny. Here are the results:

*My wife reminds me that I am probably in the 1% of people that find this amusing.

Saturday, May 3, 2008

Microhoo and My 2 Cents

I thought I would make a new blog post. It has been a couple months, sorry to those of you that have commented awaiting a response. I will try to be more consistent about my posting times.

Well, this morning I was listening to 640 AM in the car, which out here in California is a talk radio station called KFI. I was very excited when I heard that Leo Laporte was the host. I had watched him all the time on TechTV when he was on shows like The Screensavers. In addition to the radio show, he is doing some other things with many of the folks from TechTV and new guys like Kevin Rose. His new screencasts can be seen/heard over at Twit.tv. He also appears in shows over at Revision3. I definitely recommend checking them out.

Anyways, Leo was talking about the recent Microsoft/Yahoo bid that I like to term Microhoo. Leo was making excellent points about how it doesn't seem like it makes sense to combine the companies and how it is in fact a bad idea. I have personal experience in the matter so I decided to call up and let him know from the "front lines" of the internet advertising world what we thought of the potential merger.

MSN and Reporting

As I told Leo, MSN is single-handedly the WORST company I have ever dealt with. It's amazing to me that the largest tech company in the world is soooo far behind in the online space. I told a story about how I was on the phone with an MSN advertising rep because I needed a certain type of Keyword/Ad copy report on clicks and stuff. The type of report, by the way, that all the other engines have by default. The rep's response to me over and over was, and I quote, "Well, we're not Google." I DON'T CARE!!! Just let me know how much I am spending with you on particular creatives! This is only one of many stories I could tell about the horror that is MSN advertising.

Yahoo Search Sucks

Here is some proof in the pudding of how far behind Yahoo is from Google as far as search technology goes. I told Leo that Yahoo has a lot of bright minds working for them. For instance, I really respect Jeremy Zawodny who works at Yahoo and is nothing less than a MySQL guru. However, while I am sure their MySQL setup is simply amazing their search algorithms leave a lot to be desired.

We were bidding on a variety of loan keywords in Yahoo and paying a prince's ransom for them. Loans of any kind are an incredibly competitive market in pay per click and you have to duke it out with thousands of other advertisers to earn your keep. One of the terms we saw doing poorly was the term "lenders". Upon further review we found a variety of other words that Yahoo was listing us for with this word that were completely off topic. The most notable of the off topic words was "lender's bagels". I think anyone here can understand that buying bagel terms for north of $3 a click really isn't going to work out in the long run.

At first I thought it was our fault. Heck, maybe we put in the word as Advanced match and were supposed to get any related term. However, when I checked we were using Standard NOT Advanced match on the keyword. In Yahoo you can choose to use Standard or Advanced Match when choosing what keywords you would like traffic for. Google on the other hand has Broad, Phrase and Exact which is infinitely more clear. But let me take a moment and introduce you to the debacle that is Standard Match. Here is word for word what the Yahoo API documentation says about Standard match:

"Sponsored Search displays your ads to users who enter search queries related to your keywords. Sponsored Search has two match types:

* Standard - Displays ads for exact matches to your keywords, as well as for singular or plural variations, common misspellings, and topics that are relevant to your keywords, titles, and descriptions.
* Advanced - Displays ads for a broader range of searches relevant to your keywords, titles, descriptions, and web content."

Here is where my problem lies. Standard match apparently "Displays ads for EXACT matches to your keywords". Oh, except when we want to add singular plural variations, misspellings and topics relevant. You have to be kidding me! Do they not understand the definition of the word exact?! Last time I checked it meant there was NO variation. That exact was exactly what it was. :) Here is the dictionary.com definition:

"precise, as opposed to approximate" and "admitting of no deviation"

Guess what Yahoo? When I want to purchase an exact word there is a darn good reason why I want exactly that word. I don't even have to go into the psychological difference of someone searching for even a slight variation of a word, even plurals. We are having a wonderful time right now trying to purchase the word "window" but alas keep getting "windows". As you can imagine, most of the plural searches tend to be software related while the singular ones are more about the building materials. You should see how many "Negative Match Keywords" we have to use with Yahoo just to try to home in on the words we want, it is crazy. In contrast, a Google Exact match is exactly that, just the word you want...go figure.
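
To make the difference concrete, here is a toy sketch in shell. Both function names are hypothetical, and loose_match is only my rough stand-in for the "exact plus plural variations" behavior the docs describe. It is certainly not Yahoo's real matching logic, which also covers misspellings and related topics.

```shell
#!/bin/bash
# exact_match KEYWORD QUERY
# True only when the query is the keyword, character for character.
exact_match() {
    [ "$1" = "$2" ]
}

# loose_match KEYWORD QUERY
# Hypothetical approximation of "exact, plus plural variations":
# accepts the keyword itself or the keyword with a trailing "s".
loose_match() {
    [ "$1" = "$2" ] || [ "${1}s" = "$2" ]
}
```

With these, exact_match window windows fails while loose_match window windows succeeds, which is exactly how a building materials ad ends up in front of someone shopping for software.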

Yahoo and API Difficulties

Next up is Yahoo. I told Leo that Yahoo is the second WORST company I have ever dealt with. No one at that company is empowered to make decisions and it seems so disjointed it's a wonder to me that they are still in business. Here is a personal story to clue you in on what I am talking about.

Several months ago, we were spending a sizable amount of money every month with Yahoo. (It was in the 5 figures and is now in fact in the 6 figures.) At that time we were using the API extensively. Side track: I founded a company about a year ago with 4 other guys and we have built our own technology that helps us find out what keyword with what landing page and what ad copy on what day at what time of the day is most effective. Basically like multivariate testing on crack. We then make decisions on keyword bidding and creatives using this data. Anyways, we were reaching our limit with the Yahoo API. We thought, no problem we can just call up customer service, explain the problem and then get the caps lifted...or so we thought.

After speaking with our rep we were told that we would have to speak with the API team about the problem and that she can't do anything about it. She let us know that someone from the team would contact us shortly because for some reason they can't be contacted. They can only contact us. A few days passed and we hadn't heard anything. A couple more calls to customer service let us know that a ticket was open and we should hear from someone soon. A few more days passed and still nothing.

One of our partners got the idea to start dialing for numbers at Yahoo. We knew that the local Yahoo numbers were all (454) 555-XXXX (that isn't the actual number but you get the idea). So he literally started randomly dialing numbers in that prefix. About ten calls into it we got a hold of a very nice man from some other division (I don't remember exactly which one). We told him about our problem with the API and he told us to hang on. He called the API team and once again said someone would contact us. However, this time it worked! In a couple of hours we had an email from a rep on the API team and were on the phone with them.

For those of you that don't know, Yahoo does their API a little bit differently than Google. With Google you get a certain amount for free and then anything above and beyond you pay a nominal fee. It makes sense because they don't want you endlessly banging away on their servers and the cost helps people make sure their code doesn't get stuck in endless loops. Because if it does, then you could end up paying a lot of money. Yahoo does it differently: they offer the API completely free of charge, which is nice. However, they restrict the quota so much that it is hard to do anything meaningful with it. If you have a valid reason to up the quota though, you can request a review and they will decide whether or not they will up your limits.

Next, they sent us a Microsoft Word document that is a couple pages long in which you need to detail all the reasons why you want more quota added to your account. A Word document?! If you are one of the largest internet companies in the world you would think you have enough developers to create a webform. Seriously, sending files back and forth through email doesn't give off the most polished image. Anyways, they "lost" the first file and so we had to send it again. 3 or 4 phone calls later our quota was finally lifted.

The whole time this was going on, our spend with them was stagnated. We couldn't grow because our API couldn't grow with us. This is a process that literally took weeks, when at the longest it should have taken hours.

In conclusion these are just a few of the examples of why I think these two companies would be a terrible combination. In fact, Leo said that if it went through it would be curtains for both of them. I couldn't agree more. While I have my differences with both, I do like the idea of other people jockeying for market share from Google. Competition is always better in my opinion. Albeit both Yahoo and MSN are tripping over their own shoelaces trying to catch up.

Tuesday, February 12, 2008

I Love My Wife

This was great. I opened the fridge this morning to find out that my wife had whipped up this lovely piece of artwork. For those of you that can't make out the bottom it says: "I love it when you speak code to me! XOXO" Man, I am such a geek.

Friday, January 18, 2008

Linux vs. Windows vs. OS X - The Kernel Debate

Let's talk about the underpinnings of an operating system. At the heart of all operating systems is something you may have heard of called the kernel. The kernel is the true brains of the OS and controls virtually all aspects of interacting with the underlying hardware. Basically without this essential component of the operating system your computer would just be an expensive paperweight with spinning fans.

There are two main approaches to kernel design. The first is the monolithic kernel. A monolithic kernel is one in which all of the core services (scheduling, memory management, file systems, device drivers) run together in kernel space as one big program. The second approach is known as the microkernel. An OS built on the microkernel approach keeps only a bare minimum in the kernel itself and runs the rest of those services as separate processes that all communicate with each other by passing messages.

Proponents of the micro approach make the case that the kernel becomes simpler when you split it up into smaller parts. In general, the kernel is a very complex body of code. (Many would say that is an understatement of Biblical proportions) The general idea is that you can make a very complex project much more palatable by breaking up its parts and working on those individually. In contrast, proponents of the monolithic approach argue that microkernels actually become more complex and have more exploits simply because the communication between all the parts becomes harder and harder to deal with as the kernel grows. They argue that the kernel should be treated as one entity so as to avoid this type of intricate communication network.

So, let's take a look at the 3 major desktop operating systems out there and find out what they are running under the hood.

Windows
With the development of Windows NT, the operating system moved toward a microkernel-style design. Windows NT, 2000, XP and Vista are all built on the NT kernel, which is usually described as a hybrid: it borrows the microkernel idea of splitting the system into separate subsystems, but most of those pieces still run in kernel space. The design is often compared to the MACH kernel, which was a microkernel project out of Carnegie Mellon University that was supposed to be the answer for all operating systems.

Mac OSX
Since the introduction of OSX, Apple has used a kernel called XNU, which combines code from the open source BSD kernels with, yep you guessed it, the MACH kernel. That gives it microkernel roots, although the BSD pieces run in kernel space right alongside MACH, so it is usually called a hybrid as well. The move was praised by many and in fact, I feel that it was the best move for the Mac.

Linux
Linux stands out from the other two in its monolithic approach. The Linux kernel runs as one large program in kernel space and is made up of over 5.9 million lines of code. In fact, almost all of the hardware drivers ship with the kernel itself, either built in or as loadable modules. This makes installing "drivers" in Linux a non issue. Most hardware simply works out of the box.

As you might have guessed from the title of this blog, I personally prefer the monolithic Linux kernel to the other two. Specifically, the kernel design in Linux can be credited with many of its benefits. The monolithic approach that Mr. Torvalds has architected scales, adapts, and performs surprisingly well. In fact, a big argument against microkernels has always been performance. Modern microkernels have become incredibly complex as they deal with new hardware and environments. This makes them more difficult to develop in my opinion. Every time a new piece is added to the kernel, the communication network that interconnects it to the rest of the pieces becomes exponentially more complex. This can many times lead to more security exploits as well as performance issues.

I hope this post can help people out there understand a little bit more about kernel design and the similarities and differences between the major operating systems.

Sunday, December 30, 2007

CentOS 5 + GFS

In our company, we recently decided that we needed to consolidate all our servers and add capacity. In the past we had 3 distinct "clusters" that all satisfied different needs. It was a bit of a mess with machines running Fedora Core 5, 6, and 7 with numerous points of failure amongst the machines. We were throwing all the resources of our company behind one major project and needed all our server horsepower to work together harmoniously and be easy to administer with lots of space to grow.

We quickly chose the CentOS distribution as our weapon of choice because you can't beat the price (it's free), it is built on very solid code (RedHat Enterprise Linux), and it has a long support lifetime. If I am not mistaken CentOS 5 will be supported for something like 8 years.

Our next decision was to invest in a SAN of some kind so we can speed up data access while adding redundancy with room to grow. We eventually purchased the SR1521 device from Coraid (http://www.coraid.com), and let me tell you, we will buy more devices from them again and again. This machine is unbelievable and uses the ATA over Ethernet protocol to move data quickly over a gigabit network. I will make another post about this device in the near future.

All in all we had 12 servers and a new storage network to use, so we immediately began researching clustering file systems. Being that CentOS is a RedHat derivative we ultimately decided to use GFS as it is natively supported (that doesn't mean that it's easy to setup) and is used in some very large clusters worldwide (which tells us it is production ready). We use GFS to share things like web server directories, various configuration directories and so on. This makes it incredibly easy for us to add a new server into the fold and have it up and running quickly.

I noticed that there isn't a ton of info on this subject on the net and the RedHat documentation was a little confusing so I will share about how we got it working for us.

First things first, when you are installing CentOS 5 be sure to install the Cluster FS option. You can include whatever else you would like, but this package is absolutely necessary. After install I immediately do the following:

yum install ntp
chkconfig ntpd on
service ntpd start

It is vital that the machines in your cluster are in sync as far as time is concerned. If they are out of sync it can cause problems later when more than one machine is accessing the same file at the same time.
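
For a rough sanity check across the cluster, here is a small sketch that takes a list of epoch timestamps (one per node) and prints the gap between the earliest and the latest. The function name max_skew is my own, and how you collect the timestamps is up to you. The ssh loop mentioned afterwards is only one assumption about how you might do it.

```shell
#!/bin/bash
# max_skew EPOCH [EPOCH...]
# Prints the difference in seconds between the smallest and largest
# timestamp passed in. Prints 0 when called with no arguments.
max_skew() {
    local min="$1" max="$1" t
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then
            min="$t"
        fi
        if [ "$t" -gt "$max" ]; then
            max="$t"
        fi
    done
    echo $(( max - min ))
}
```

One way to feed it would be something like max_skew $(for h in db1 db2; do ssh "$h" date +%s; done). Keep in mind the ssh round trips happen one after another, so a second or two of apparent skew can come from the loop itself. Anything much larger than that is worth looking into.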

The next thing I do is add my GFS mount point folder to the /etc/updatedb.conf file. Basically, this file has a line of all folders to NOT include when updatedb runs. Updatedb is a very nice indexing service that allows you to use the "locate" command to search for files and directories on your machine. A very handy tool, but when you have 11 machines banging on every byte of a multi-terabyte SAN at the exact same time it causes massive problems, and in fact our cluster was crashing EVERY morning between the hours of 4 and 7 am. You can take a look at my frustration here:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=11432&forum=41

The mount point that we use is /san so I simply added this to the /etc/updatedb.conf file like so:

PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/spool/cups /var/spool/squid /var/tmp /san"
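
If you have a pile of machines to update, editing that line by hand gets old quickly. Here is a minimal sketch that appends a path to PRUNEPATHS only if it is not already listed. The function name add_prune_path is hypothetical, and it assumes the PRUNEPATHS value sits on a single double-quoted line like the one above.

```shell
#!/bin/bash
# add_prune_path PATH [CONF]
# Appends PATH to the PRUNEPATHS line of CONF (default /etc/updatedb.conf)
# unless it is already listed there. Safe to run more than once.
add_prune_path() {
    local path="$1" conf="${2:-/etc/updatedb.conf}" tmp

    # Already present as a whole entry on the PRUNEPATHS line? Nothing to do.
    if grep -Eq "^PRUNEPATHS *=.*[\" ]${path}[\" ]" "$conf"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Insert the path just before the closing quote of the PRUNEPATHS value,
    # then write the result back in place so ownership and mode are kept.
    sed "s|^\(PRUNEPATHS *= *\"[^\"]*\)\"|\1 ${path}\"|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Running add_prune_path /san as root on each node would give the same result as the manual edit. Take a copy of the file first if you are nervous about it.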

When you are starting your cluster with your first machines there are a few files to set up. The first is /etc/lvm/lvm.conf. You don't need to use lvm for a GFS filesystem, but we do.
The only thing I do to this file is change the scan directory. Since I am using AoE with the Coraid device I simply changed my scan line to look like this:

scan = [ "/dev/etherd" ]

In our cluster the only logical volume we are using is on the Coraid device, so I didn't feel like scanning all of /dev every time. You could presumably keep this file at the defaults, or change it like I did to be more specific and hopefully make boot time a little quicker.

The next file we are going to set up is a biggie, and that is /etc/cluster/cluster.conf. Basically, this file is the granddaddy of them all in terms of GFS and tells the cluster who is a member, how the members should work together and so on. Here is a stripped down version of the file we use:

<?xml version="1.0"?>
<cluster config_version="25" name="san1">
  <fence_daemon post_fail_delay="0" post_join_delay="120"/>
  <clusternodes>
    <clusternode name="db1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="db1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="db2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="db2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_aoemask" name="fence-e1.0" shelf="1" slot="0" interface="eth1"/>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="192.168.2.247" login="somelogin" passwd="xxxxxx"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
  </rm>
</cluster>

There are some important things to note here. You will see that I am using hostnames (db1 and db2 here). You use hostnames to identify certain nodes in your cluster and therefore every server needs to have the same entries for these hostnames in its /etc/hosts file. You may be thinking to yourself, "well I could just use DNS for that." It is recommended to use the hosts file because it is quicker (no network latency to look up host names) and inherently more reliable because you don't rely on a few machines to translate names, rather every machine in the cluster knows exactly where the others are. Whenever I modify my hosts file I simply scp the /etc/hosts file to every machine in the cluster.
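
Since a node name that is missing from /etc/hosts is an easy mistake to make, here is a rough sketch that lists every clusternode name in cluster.conf with no matching hosts entry. The function name missing_cluster_hosts is made up for this post, and the sed pattern assumes each clusternode tag sits on one line with a double-quoted name attribute, as in the file above.

```shell
#!/bin/bash
# missing_cluster_hosts [CLUSTER_CONF] [HOSTS_FILE]
# Prints each clusternode name from CLUSTER_CONF that does not appear
# as a host name in HOSTS_FILE. Prints nothing when all names are found.
missing_cluster_hosts() {
    local conf="${1:-/etc/cluster/cluster.conf}" hosts="${2:-/etc/hosts}"
    local name

    # Pull the name="..." attribute out of every <clusternode ...> tag.
    sed -n 's/.*<clusternode[^>]* name="\([^"]*\)".*/\1/p' "$conf" |
    while read -r name; do
        # A hosts entry is an address followed by one or more names.
        # Lines that start with # are comments and are skipped.
        if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts"; then
            echo "$name"
        fi
    done
}
```

Run it on each node after copying the hosts file around. Empty output means every node in cluster.conf can be resolved locally.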

The second item you will notice in the cluster.conf file is the fencing section. Fencing is ultra ultra important to a GFS cluster. Basically, the cluster needs a way to remove nodes that it deems as unsafe to the cluster as a whole from the data stored on GFS. This ensures reliability of your data. The recommended way to do fencing is through a power switch that can be controlled over the network, but you can also do it at the SAN level. The Coraid device has a way to filter by MAC address and we actually used that for a while, but then switched to the power option because it was easier to work with. I will talk about these specific fencing options in a later post.

I am embarrassed to admit it, but we actually ran our cluster with manual fencing for a while because I didn't know about the Coraid MAC option yet and we hadn't purchased APC power strips yet. Let me just say that you can do it, but you will undoubtedly run into a problem like this. With manual fencing, if a node dies unexpectedly or is deemed unsafe, GFS doesn't know how to turn the node off, so it does the next best thing and locks EVERYONE out from the filesystem. NONE of your other machines will be able to read data from the GFS cluster until they ALL are rebooted. Oh yeah, and CentOS 5 specifically won't respond to the reboot command. It will try to reboot, but will hang forever. You have to physically cut the power to the server or press the power button to bring the machines and the cluster back. Needless to say, this is not a good option if your datacenter is miles away and it is 3 in the morning.

Once you have your /etc/cluster/cluster.conf, /etc/lvm/lvm.conf and /etc/hosts files ready to go you can start your cluster with the following commands:

service cman start
service clvmd start
service gfs start

Then you can mount your logical volume like this:

mount -t gfs /dev/san1/lvol0 /san -o noatime

Another performance booster of note is the "-o noatime" option. atime is a timestamp for the time the file was last accessed. You may in fact need it for your apps, but ours and many others couldn't care less when the last access was. With atime on you are forcing a small write for every read. If you don't need this parameter then using noatime will boost the performance of your GFS volumes.
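
To double check that the option actually took effect, here is a small sketch that looks up a mount point in /proc/mounts and tests for noatime among its options. The function name has_noatime is my own. The optional second argument is only there so you can point it at a saved copy of the mounts table.

```shell
#!/bin/bash
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE (default /proc/mounts)
# with noatime among its mount options.
has_noatime() {
    local mp="$1" mounts="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk -v mp="$mp" '
        $2 == mp && $4 ~ /(^|,)noatime(,|$)/ { ok = 1 }
        END { exit !ok }' "$mounts"
}
```

For example, has_noatime /san && echo "noatime is on" should print the message on a node that mounted the volume with the command above.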

I said before that we are using the Coraid device so the way we handle starting our GFS cluster at boot is by using the /etc/init.d/aoe-init script. Here is a sample version of this script:

#! /bin/sh
# aoe-init - example init script for ATA over Ethernet storage
#
# Edit this script for your purposes. (Changing "eth1" to the
# appropriate interface name, adding commands, etc.) You might
# need to tune the sleep times.
#
# Install this script in /etc/init.d with the other init scripts.
#
# Make it executable:
# chmod 755 /etc/init.d/aoe-init
#
# Install symlinks for boot time:
# cd /etc/rc3.d && ln -s ../init.d/aoe-init S99aoe-init
# cd /etc/rc5.d && ln -s ../init.d/aoe-init S99aoe-init
#
# Install symlinks for shutdown time:
# cd /etc/rc0.d && ln -s ../init.d/aoe-init K01aoe-init
# cd /etc/rc1.d && ln -s ../init.d/aoe-init K01aoe-init
# cd /etc/rc2.d && ln -s ../init.d/aoe-init K01aoe-init
# cd /etc/rc6.d && ln -s ../init.d/aoe-init K01aoe-init
#

case "$1" in
    "start")
        # load any needed network drivers here

        # replace "eth1" with your aoe network interface
        ifconfig eth1 up

        # time for network interface to come up
        sleep 4

        modprobe aoe

        # time for AoE discovery and udev
        sleep 7

        # add your raid assemble commands here
        # add any LVM commands if needed (e.g. vgchange)
        # add your filesystem mount commands here
        service cman start
        sleep 3
        service clvmd start
        sleep 3
        service gfs start
        sleep 3
        mount -t gfs /dev/san1/lvol0 /san -o noatime

        test -d /var/lock/subsys && touch /var/lock/subsys/aoe-init

        # Bring up http after the filesystem is mounted
        service httpd start
        /usr/bin/memcached -d -m 512 -l 192.168.2.100 -p 11211 -u nobody
        ;;
    "stop")
        # Stop http before the filesystem is unmounted
        service httpd stop

        # add your filesystem umount commands here
        umount /san

        sleep 3
        service gfs stop
        sleep 3
        service clvmd stop
        sleep 3
        service cman stop

        # deactivate LVM volume groups if needed
        # add your raid stop commands here
        rmmod aoe
        rm -f /var/lock/subsys/aoe-init
        ;;
    *)
        echo "usage: `basename $0` {start|stop}" 1>&2
        ;;
esac

I then do the following to make sure that these services are only started through the aoe-init script:

chkconfig gfs off
chkconfig cman off
chkconfig clvmd off
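
To confirm nothing is still set to start on its own, here is a quick sketch that reads a line of "chkconfig --list" output and prints any runlevels still marked "on". The function name enabled_runlevels is hypothetical. It assumes the usual output layout of a service name followed by entries like 3:on and 4:off.

```shell
#!/bin/bash
# enabled_runlevels
# Reads `chkconfig --list SERVICE` output on stdin and prints, for each
# line, the runlevels where the service is "on", separated by spaces.
# A fully disabled service produces an empty line.
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                split($i, parts, ":")
                out = out (out != "" ? " " : "") parts[1]
            }
        }
        print out
    }'
}
```

For example, chkconfig --list gfs | enabled_runlevels should print nothing but a blank line once the service has been turned off everywhere.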

I hope this info was helpful to someone out there. I will edit and add to this post to make it more thorough, as I am sure there are small elements I left out.