Posts

Showing posts from August, 2012

Selecting N random lines from a text file (Perl)

I often need to sample a few lines from a big file. The basic approach is to take just the first N lines, but that is not a proper sample: real sampling needs a random selection of lines to avoid bias. Here is my solution in Perl:

#!/usr/bin/perl

=head1 NAME

randomLines.pl

=head1 DESCRIPTION

Subsample some random lines from a text file.

=head1 USAGE

perl randomLines.pl [PARAM]

    Parameter       Description                   Value      Default
    -i --in         Input file                    File       STDIN
    -o --out        Output file                   File       STDOUT
    -n --num        Number of lines to sample     Integer    1000
    -t --total      Total lines in file           Integer    1000000
    -f --first      Include first line            Bool       No
    -h --help       Print this screen and exit
    -v --verbose    Verbose mode
    --version       Print version number and exit

=head1 EXAMPLES

1. S
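To illustrate the idea of picking lines at random instead of taking the first N, here is a minimal sketch using reservoir sampling. This is not the randomLines.pl script above, whose body is not shown here; the helper name reservoir_sample and the in-memory demo data are assumptions made only for this example. Reservoir sampling is swapped in because it needs a single pass and does not need to know the total number of lines in advance.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of randomLines.pl): return a reference to an
# array holding a uniform random sample of up to $n lines read from $fh.
sub reservoir_sample {
    my ($fh, $n) = @_;
    my @sample;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $n) {
            # Fill the reservoir with the first $n lines.
            push @sample, $line;
        }
        else {
            # Pick a slot in 0 .. $seen-1; the new line replaces an old one
            # only if the slot falls inside the reservoir.
            my $j = int(rand($seen));
            $sample[$j] = $line if $j < $n;
        }
    }
    return \@sample;
}

# Demo on an in-memory "file" of 100 lines (assumed data for illustration).
my $text = join '', map { "line $_\n" } 1 .. 100;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $picked = reservoir_sample($fh, 5);
close $fh;
print @$picked;
```

Each run prints five different lines. To sample a real file, open it for reading (or use STDIN) and pass that filehandle instead of the in-memory one.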

Moving to Fedora

After many years using Mandriva (even when it was Mandrake), I finally moved my personal systems to Fedora. I have always considered Mandriva an excellent Linux distribution, really easy to install and use (even more so than Ubuntu), but the current situation of Mandriva (please google it) told me it was time to move on. These days I develop (Perl, C, R) on my Mac with Snow Leopard, but all my heavy work runs on Linux servers (real and cloud), mainly CentOS, so I decided to stay in the Red Hat family. Besides my iMac, I own an Asus 1000HA netbook and an Acer Aspire 5517 laptop. First I tried Fedora 16 on the netbook, the XFCE remix, and it works wonderfully considering the low power of a netbook. This week I installed Fedora 17 on the Acer, and after a quick installation I'm using it to write this post. I have no complaints about the latest GNOME release; actually I like it, it's pretty. I only had one hardware problem, the wireless card, but amazingly after checking the status with dm