
Thu, Mar 22, 2012

Begin with a single step

• Post categories: Omni, FOSS, Technology, My Life, Programming

A truism regularly trotted out by perl users is that one of perl's biggest strengths is its ability to process text.

This was put to the test recently, following an issue at work which led to us needing to get a lot of data out of our logs in a hurry.

Specifically, we needed to know at any given time how many web requests we were getting, what type (web page, XML request, etc), what they were for, who they were for, who they were from, etc. And we needed all this graphed so we could quickly & reliably cross-reference what was happening on one server with what was happening on others.

This had to come from a variety of log files, with anything up to 10 million lines in them - hence why we had to make the data visual & automate it: Just reading the logs simply wasn't an option.

This task landed on my desk, with the sum total of guidance on it being "I've found Chart::Gnuplot useful before".

So I had to do two things: first, turn the vast amount of logged data into a useful summary; second, turn the summarised data into a selection of useful graphs.

The first part was pure text processing; the second was trying to quickly learn how to use a new module.

And I have to say, I was completely blown away with just how easy it was to get this task done. A couple of hours after I started, I had a useful graph. An hour or so more, and I had a multitude of useful graphs with a variety of breakdowns of the data, generated by a script that could take several helpful command-line options.

I've been using perl for two years now, on a daily basis at work & spending quite a lot of my free time on it as well - I'm a fair way through the Camel book, the Llama book, and HOP, and a reasonable way into OOP as well. I'm just about what The Definitive Guide to Catalyst would call an intermediate user (another book I'm trying to read in what little spare time I have). I still tend to feel like I'm barely even a beginner: there are so many things I'm aware I don't yet know.

After all, perl has been in active development by a multitude of very clever people all around the world for quarter of a century. It'd be surprising if it DIDN'T have a lot to learn by now.

But it's amazing just how much you can do with just the core fundamentals, quickly & easily. Perl was created for text processing, they say, but this is the first time I've really appreciated just how true it is. To be able to take millions of lines of text; use a single regex to break the useful data out of each line into discrete variables; use a foreach nested within a foreach to create a hash of hashes of hashes to store that data in; and then pass that hash to a subroutine that turns it into an array of objects with all the right names and descriptions in all the right places..
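For anyone who wants to see roughly what that pipeline looks like, here is a minimal sketch using only core perl. It is not the script from work: the log format, the field names, and the subroutine names summarise() and to_series() are all made up for illustration, and plain hashrefs stand in for the chart objects. The nesting is server, then request type, then minute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log lines: "<timestamp> <server> <request type> <path>".
# The real log format is not shown in this post; this one is an assumption.
my @log_lines = (
    '2012-03-20T10:01:17 web1 page /index.html',
    '2012-03-20T10:01:42 web1 xml /api/status',
    '2012-03-20T10:02:05 web2 page /about.html',
    '2012-03-20T10:02:30 web1 page /index.html',
    'a malformed line that the regex should skip',
);

# One regex pulls the useful fields out of each line into named captures.
my $line_re = qr{
    ^ \d{4}-\d{2}-\d{2} T (?<minute> \d{2}:\d{2} ) :\d{2}
    \s+ (?<server> \S+ )
    \s+ (?<type>   \S+ )
}x;

# Build a hash of hashes of hashes:
#   $count{server}{type}{minute} = number of requests seen.
sub summarise {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ $line_re;    # ignore lines that don't match
        $count{ $+{server} }{ $+{type} }{ $+{minute} }++;
    }
    return \%count;
}

# Walk the nested hash with a foreach inside a foreach and flatten it
# into an array of series, one per server/type pair, each with a title.
sub to_series {
    my ($count) = @_;
    my @series;
    for my $server ( sort keys %$count ) {
        for my $type ( sort keys %{ $count->{$server} } ) {
            my $per_minute = $count->{$server}{$type};
            my @minutes    = sort keys %$per_minute;
            push @series, {
                title => "$server $type requests",
                xdata => \@minutes,
                ydata => [ @{$per_minute}{@minutes} ],
            };
        }
    }
    return @series;
}

my @series = to_series( summarise(@log_lines) );
for my $s (@series) {
    my @points = map { "$s->{xdata}[$_]=$s->{ydata}[$_]" } 0 .. $#{ $s->{xdata} };
    print "$s->{title}: ", join( ', ', @points ), "\n";
}
```

The title, xdata and ydata keys were picked to match the options that Chart::Gnuplot::DataSet->new accepts, so in a real script each hashref could be swapped for a dataset object; check the module's documentation before relying on that.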

Sadly, hardly anyone I know would be capable of understanding why I was so pleased at how elegantly simple perl made it to handle the data I needed it to. But that's what blogs are for :)

And the documentation for the module I suddenly needed to learn, along with its included examples, made it incredibly easy to work out how to take the data I had and turn it into the pretty pictures I needed.

(Admittedly, it was a familiarity with sshfs that made getting those images from a remote server & viewing them locally with gimp trivial. But it was a Perl conference where I first heard of sshfs, so..)

It's taken two years to stop feeling like a perl novice. I dread to think how long it'll take before I feel like a competent, advanced user. But I was genuinely amazed at how fast & easy it was for me to turn huge text files into useful sets of graphical data, using a module I'd only just discovered the existence of.

It can occasionally be a little depressing to look at how much you've learned, look at how much there still is to learn, and feel disheartened that the former is so much smaller than the latter. Especially when you notice that even after you've learned something, actually putting it into practice is so hard you're using less than half what you know, which is less than half of what there IS to know.

So it's nice when you have the other occasions, where what you know (and can put quickly into practice) is more than enough to put together something powerful, reliable, and sophisticated enough that it seems even to you to be a little bit magical.

It reminds you that although it might take you a few years to feel like you really know what you're doing, it took hundreds of people a few decades to put together what you're using.

So stick with it, because if something that's so clever you can't comprehend it today becomes something you understand completely tomorrow, just think how much smarter that'll mean you are by then.


Comment from: Jakub Narębski [Visitor]
What Perl modules did you use?
23/03/12 @ 11:25
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
Chart::Gnuplot and Getopt::Long
23/03/12 @ 12:56
Comment from: Mike Schienle [Visitor]
Similar story to how I started using Perl on the job the first time in 1998. Kinko's needed some capacity planning info based on their web logs. The shell script I wrote for processing logs would take three days to complete. 1/2 day later with a little help from a colleague, I was down to a 3 hour run. Perl impressed me then and every day since.
23/03/12 @ 13:53
Comment from: Hari [Member] · http://harishankar.org/blog/
Python is way easier than Perl in some ways, and I think the learning curve is about 1/3rd the duration, though Python code doesn't run as fast as Perl does, especially with heavy text crunching.

However, I have the same things to say about Python that you do about Perl: simple, elegant problem solving and amazing for creating complex data structures with minimum effort.

I would say Python and Perl are about equal, all said and done, with each having different types of strengths. By now, Python is about as established as Perl is.
23/03/12 @ 15:39
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
@Mike - You parsed that volume of text with a shell script? Respect! :)

@Hari - It's important to emphasize the difference between how long it takes to learn perl, and how long it takes to learn enough perl. I'm planning a talk for the non-programmers of my local LUG, and I reckon I can get them up & running with 'core' perl in an hour or so.

But learning perl well enough to be one of those people who thinks nothing of creating a web framework, or a metaobject protocol, or a git interface.. that does take a while. In my own case, it'll take a year or two more yet :)

If it took that long because it was arcane, inconsistent, unintuitive, etc. then I'd have lost interest a long time ago. But it isn't - it takes that long because it's got quarter of a century of good ideas crammed into it, and no matter how smart you are it takes a while to get up to speed with that many concepts.

At some point I really must dust off my Python books and take another look at them. Maybe after I get through my current backlog of perl & haskell books...

My favourite observation on the "perl or python" subject has to be that python's philosophy is that there should be one right way to do something, which means the typical python coder believes the right way to write code is to do it all in python; whereas the perl motto is "there is more than one way to do it", so a perl programmer will readily concede that a good way to write a specific program might well be to write it in python :)
24/03/12 @ 19:30
Comment from: Ron Collinson [Visitor]
It can occasionally be a little depressing to look at how much you've learned, look at how much there still is to learn, and feel disheartened that the former is so much smaller than the latter.

I find it quite the opposite: if I thought I knew everything about something, I would feel it's time to move on.

Out of curiosity, which OOP perl book are you looking at?
25/03/12 @ 23:44
Comment from: Hari [Member] · http://harishankar.org/blog/

My favourite observation on the "perl or python" subject has to be that python's philosophy is that there should be one right way to do something, which means the typical python coder believes the right way to write code is to do it all in python;

I am not sure that is the Python philosophy. Actually what it means is that in Python, there is one obvious way to solve a problem.

And moreover, this philosophy applies only to the language constructs and syntax, not to the algorithms. Obviously programmers can implement different algorithms to solve the same problem even in Python.
26/03/12 @ 03:09
Comment from: doubi [Visitor]
Hi oneandoneis2,

Thanks for the inspiring post (which I found through szabgab's Perl Weekly newsletter) :-)

About the talk you plan to give at your local LUG - any chance you could record it?

For a while now I've been trying to get around to recording something for Hacker Public Radio along the same lines, but since it sounds like you might beat me to it, why duplicate the effort? :-)

30/03/12 @ 00:05
