
OneAndOneIs2


Thu, Nov 13, 2008

Fighting Fragmentation on Linux

• Post categories: Omni, FOSS, Technology, Helpful

Some time ago, I wrote what I expected to be a fairly uninteresting blog post explaining, in a non-technical way, why Linux doesn't need to be defragged the way Windows does.

It proved rather more popular than I expected (it hit Digg's front page twice before I put in anti-Digg measures to stop my server from melting) and still gets read hundreds of times a day.

I still keep an eye on where referrals to it come from, and occasionally go and look at some of them. And I still occasionally see people who are adamant that fsck tells them some of their files are non-contiguous (fragmented), that this is a problem, and that they want a solution.
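
If you want to see for yourself how scattered a given file is, the filefrag tool from e2fsprogs reports how many extents it occupies. Here's a rough Python sketch that wraps it - it assumes filefrag is installed and that its one-line summary keeps the usual "N extents found" wording:

import re
import subprocess
import sys

def extent_count(path):
    """Number of extents filefrag reports for path, or None if it can't tell us."""
    try:
        out = subprocess.run(["filefrag", path], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    match = re.search(r":\s*(\d+) extents? found", out)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    for path in sys.argv[1:]:
        count = extent_count(path)
        if count is None:
            print(f"{path}: couldn't get an extent count")
        elif count > 1:
            print(f"{path}: {count} extents - fragmented, but read on before you panic")
        else:
            print(f"{path}: contiguous")

Bear in mind that, as the rest of this post argues, a number bigger than one here is not by itself a problem.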

So here's another blog post about file fragmentation on Linux.

Am I bovvered?
So you have some files that are fragmented. And this obviously is slowing your machine down. Right?

Wrong.

This is the first thing you need to get your head around if you're out to keep your hard drive performance as high as possible: a fragmented file is not necessarily a cause of slowdown.

For starters, consider a movie file. Say, three hundred megabytes of file to be read. If that file is split into three and spread all over your hard drive, will it slow anything down?

No. Because your computer doesn't read the entire file before it starts playing it. This can be easily demonstrated by putting a movie onto a USB flash drive, starting playback, and then yanking the drive out.

So since your computer only reads the start of the file to begin with, it matters not in the slightest that the file is fragmented: So long as the hard drive can read those 300MB in under half an hour (and if it can't, throw it away or donate it to a museum), the fact that the file is fragmented is of no concern whatsoever.
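
That "read the start, play, keep fetching" behaviour is easy to sketch: a player pulls the file through a buffer one chunk at a time and can start showing video as soon as the first chunk arrives, wherever the remaining fragments happen to live on the disk. A toy illustration - the decode step and the file path are just placeholders:

def stream(path, chunk_size=1 << 20):
    """Yield a file one 1MB chunk at a time, the way a player buffers it."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def decode_and_display(chunk):
    pass  # stand-in for a real video decoder

for i, chunk in enumerate(stream("/path/to/movie.avi")):  # hypothetical file
    decode_and_display(chunk)
    if i == 0:
        print("playback has started after reading a single chunk")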

Your computer has hundreds if not thousands of similar files. As well as multimedia files that you WANT to take several minutes to get through, you have all kinds of small files whose access times are made irrelevant by the slowness of the application that reads them: Think about double-clicking a 100KB file to edit in OpenOffice - however long it takes to open the file, it's irrelevant compared to how damn long it takes to get OOo loaded from a cold start.

You might like it
The next thing you need to bear in mind is that a file being all in one place doesn't necessarily mean that it'll get read faster than a file that's scattered around a bit.

Some people are adamant, having watched Windows defrag a FAT partition, that all the files should be crammed together at the start of the disk, unfragmented. This cuts down on the slowest part of the hard drive reading process, the moving of the head.

Except it doesn't.

Everything crammed together makes sense in certain applications - a floppy disk or a read-only CD/DVD, for example: places where only one file is being read at a time from a single disc, and where cramming the files tightly together isn't going to mean that a single file edit instantly re-fragments everything.

However, this is the 21st century. Your hard disk is not a hard disk, it's a hard drive with multiple discs (AKA platters) inside it, and the times when you would only be reading or writing one file at a time are long, long gone.

It is perfectly feasible that in one single instant, my PC might be:

  • Updating the system log
  • Updating one or more IM chat logs
  • Reading an MP3/Ogg/Movie file
  • Downloading email from a server
  • Updating the web browser cache
  • Updating the file system's journal
  • Doing lots of other stuff

All of this could quite feasibly happen at the same time: it probably happens a hundred times a day, in fact. And every single one of these requires a file to be accessed on the hard drive.

Now, your hard drive can only access one file at a time. So it does clever things: holding writes in memory for a while, reading files in the order they are on the drive rather than the order they were requested, and so on.

So the chances that your hard drive has nothing to do other than try to read a fragmented file are really pretty low. It's fitting that one file into a queue of file reads and writes that it's busy with.
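
That "read in the order they are on the drive" trick is essentially an elevator scheduler: sweep the head across the disk once and service requests as you pass them, instead of bouncing back and forth in arrival order. A toy simulation (the block numbers are invented) shows why this matters far more than whether any one file is contiguous:

def head_travel(blocks, start=0):
    """Total distance the head moves servicing requests in the given order."""
    pos, travel = start, 0
    for block in blocks:
        travel += abs(block - pos)
        pos = block
    return travel

# Hypothetical queue of pending requests, in the order they arrived:
pending = [820, 14, 410, 33, 790, 55]

print("arrival order: ", head_travel(pending), "blocks of head travel")
print("elevator sweep:", head_travel(sorted(pending)), "blocks of head travel")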

Imagine this scenario: Your computer wants to read three files, A, B, and C. Here's a disk where they're non-fragmented:

   01       02       03       04       05       06
abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh
000AAAAA A0000000 0BBBBBB0 00000000 00CCCCCC 00000000

And here's one where they ARE fragmented:

   01       02       03       04       05       06
abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh
000AA000 0BBB00CC 0AA00BBB 000CCC00 AA00000C 00000000

Assuming your multi-tasking hard drive wants to read all three of these, which will be quicker to complete the job?

Answer: It makes no difference because the head still needs to go from '01 d' to '05 h' in one go, whether the files are fragmented or not.

In fact, the fragmented files might well be faster: The drive only has to read the first two blocks to get the first portions of each file. That might be enough that the applications accessing these files can begin their work with all three files at this point, whereas the non-fragmented version would only be working with one file at this point.

In this (highly simplified as usual) example, you gain a performance increase by scattering your files around the drive. Fragmentation is not necessarily a performance-killer.
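
You can put numbers on that with a few lines of code: give every block in the diagrams an address (01a is 0, 01b is 1, up to 06h at 47), measure the single sweep the head needs to cover everything, and note where the first block of each file sits. This quick sketch just transcribes the two layouts above:

# One string per 8-block group, copied from the diagrams above.
CONTIGUOUS = "000AAAAA" "A0000000" "0BBBBBB0" "00000000" "00CCCCCC" "00000000"
FRAGMENTED = "000AA000" "0BBB00CC" "0AA00BBB" "000CCC00" "AA00000C" "00000000"

def sweep_span(layout):
    """Distance of one left-to-right sweep over every occupied block."""
    used = [i for i, c in enumerate(layout) if c != "0"]
    return max(used) - min(used)

def first_blocks(layout):
    """Where the first block of each file sits, i.e. how soon each can be started."""
    return {name: layout.index(name) for name in "ABC"}

for label, layout in (("contiguous", CONTIGUOUS), ("fragmented", FRAGMENTED)):
    print(f"{label}: sweep covers {sweep_span(layout)} blocks, "
          f"first blocks at {first_blocks(layout)}")

Both layouts need the same 36-block sweep, but in the fragmented one the opening blocks of A, B, and C have all been passed within the first two groups.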

But even so...
Okay, so even Linux's clever filesystems can't always keep you completely clear of performance-degrading fragmentation. The average user won't suffer from it, but certain types of file usage - particularly heavy P2P usage - can result in files scattered all over your drive.

How to keep this from causing problems?

Carve up your hard drive!

Logically speaking, that is: Partitions are your friend!

Being simplistic again, the main cause of fragmented files is large files that get written to a lot. The worst offenders are P2P-downloaded files, as these get downloaded in huge numbers of individual chunks. But documents that are frequently edited - word processing, spreadsheets, image files - can all start out small and get big and problematic.

So, the first and simplest thing to do: Have a separate /home partition.

System files mostly just sit there being read. You don't make frequent updates to them: Your package manager or installation disc writes them to disk, and they remain unchanged until the next upgrade. You want to keep these tidy, system-critical files away from your messy, frequently-written-to personal files.

Your system will not slow down due to large numbers of fragmented files if none of the system files are fragmented: A roomy dedicated root partition will ensure this.

But if your /home partition gets badly organised, then it could still slow you down: A pristine Firefox could still be slowed down by having to try and read a hideously-scattered user profile. So safeguard your /home as much as possible too: Create another partition for fragmentation-prone files to be placed in. P2P files, 'living' documents, images you're going to edit, dump them all in here.

This needn't be a significant hardship: You can have this partition mounted within your home directory if you like. So long as it keeps your own config files and the like away from the fragmentation-prone files, it'll help.
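
If you want to check that a directory really is its own filesystem, rather than just another folder on the same partition, comparing device numbers will tell you. A small sketch - the paths below are only examples:

import os

def on_own_filesystem(path, parent):
    """True if path lives on a different filesystem from parent (different st_dev)."""
    return os.stat(path).st_dev != os.stat(parent).st_dev

home = os.path.expanduser("~")
scratch = os.path.join(home, "downloads")   # hypothetical fragmentation-prone area

if os.path.exists(scratch) and on_own_filesystem(scratch, home):
    print(f"{scratch} is a separate partition - its clutter can't touch the rest of {home}")
else:
    print(f"{scratch} shares a filesystem with {home} (or doesn't exist yet)")

(os.path.ismount() does much the same check if the directory you're testing is the mount point itself.)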

Backups
So partitioning can cut down on the influence fragmented files can have. But it doesn't actually stop the files being fragmented, does it?

These days, hard drives are cheap. They certainly cost less than losing all your data. It makes a lot of sense to buy a second hard drive to back up your files to: Far quicker than burning files to DVDs, and more space to write to as well.

In fact, you've got so much space, you could even set up a script to do this:

  1. Back up the contents of your fragmentation-prone partition
  2. Verify that the files have been properly backed up (MD5 or whatever)
  3. Erase the original, heavily-fragmented files
  4. Copy the files from your backup disk to the original partition

As simple as that, you have your fragmented files both backed up and defragmented. And it's actually quicker and better to defrag like this: Writing all your files in one go to a blank partition is far quicker than having to shuffle bits of them all over the place trying to fit them in around each other. And you're not cramming them all together in one place like Windows does, so they have "room to grow", which again makes them less prone to fragmenting in future - you're working with your filesystem's built-in algorithms instead of against them.
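
For the curious, here's a rough sketch of that four-step routine in Python. The two paths are made-up mount points, there's no real error handling, and you'd want to rehearse it on something disposable before pointing it at data you care about:

import hashlib
import shutil
from pathlib import Path

SOURCE = Path("/mnt/scratch")         # fragmentation-prone partition (example path)
BACKUP = Path("/mnt/backup/scratch")  # directory on the backup drive (example path)

def md5(path):
    """MD5 of one file, read a megabyte at a time."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def checksums(root):
    """Map every file under root (by relative path) to its MD5 sum."""
    return {p.relative_to(root): md5(p) for p in root.rglob("*") if p.is_file()}

# 1. Back up the contents of the fragmentation-prone partition.
shutil.copytree(SOURCE, BACKUP, dirs_exist_ok=True)

# 2. Verify that the files have been properly backed up (MD5 "or whatever").
assert checksums(SOURCE) == checksums(BACKUP), "backup doesn't match - stop here!"

# 3. Erase the original, heavily-fragmented files...
for child in SOURCE.iterdir():
    if child.is_dir():
        shutil.rmtree(child)
    else:
        child.unlink()

# 4. ...and copy them back in one go, freshly laid out by the filesystem.
shutil.copytree(BACKUP, SOURCE, dirs_exist_ok=True)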

A sensible partitioning strategy and occasional backup-defrags will keep your data secure and structured far better than one big partition with everything haphazardly dumped in it.

Don't look for a defrag utility to hide a poorly-thought-out hard drive arrangement. Invest some effort into organizing your data and you won't know or care if there's a defragmentation tool available.


14 comments

tian yuan
Comment from: tian yuan [Visitor]
This webpage gave me much help, thanks.
I have searched for a long time, but with no result.

I am a Chinese student; on our country's websites it is
hard to find this kind of article that explains things so clearly.
04/12/08 @ 01:56
Kushal Koolwal
Comment from: Kushal Koolwal [Visitor] · http://blogs.koolwal.net
Hi,

Got your point. However I am still curious to see if there is any defrag tool available for ext3 File Systems.
26/01/09 @ 23:13
Mauricio Farell
Comment from: Mauricio Farell [Visitor] Email · http://www.cyberplaza.net
Check this out:

http://en.wikipedia.org/wiki/Ext3#Defragmentation
07/02/09 @ 01:15
Tom
Comment from: Tom [Visitor] Email
Wow!! Fantastic, that validates my feelings about this issue and explains it clearly.

Copying files to another location and deleting the original seemed a good way of improving performance on Win98 "back in the day". But explaining it to other people when I didn't understand 'the why' myself was tricky.

Now the only mystery to me is why Windows suffers so badly when files are fragmented. This article suggests that you could expect things to remain much the same.

I have noticed that caching read/writes to RAM and swap, and indeed all the RAM usage, is much more efficient in Linux's (Linices(?) or is Linux one of those words that has its plural in-built - e.g. like sheep, err, not a good example lol). Umm, I've forgotten what I was asking now :( lol

Good luck and regards from
Tom :)

PS Thanks for a brilliantly clear article that 'laid some old skeletons to rest' for me :)
16/04/09 @ 11:59
freebirth one
Comment from: freebirth one [Visitor]
Nice one. Pushed my knowledge a bit closer to omniscience (but to quote a genius man from Germany: imagination is more important than knowledge :) ).

@Tom:
There are a couple of factors:
* fragmented pagefile - because of this, I started early on keeping my pagefile on its own partition
* fragmented registry hive - and yes, that's a really important factor. The config files of Linux are many in number, often changed and more or less often resized, but seldom fragmented. The registry files are few in number, often changed, often fragmented, and very often contain gaps. This can slow down your system really badly, especially the files which you can't even defrag with tools (ntuser.dat, to give one a name). Therefore the must-have PageDefrag
* and last but not least: normal fragmentation of system files, which normally happens when you install or upgrade something and have disturbing chunks of other files in the way. Therefore the partitioning into system, data, and perhaps other stuff.

So far my 2 Pence
06/07/09 @ 00:41
EatMe
Comment from: EatMe [Visitor] · http://paashaas.da.ru
Very clear and understandable article.

Well done.

Read it with pleasure.
25/07/09 @ 17:09
TheAsterisk!
Comment from: TheAsterisk! [Visitor]
@Tom, freebirth one: I simply prefer to set the pagefile (or swap file, or whatever else it goes by) in Windows to the same minimum and maximum size. It can't really pull anything over on you if it doesn't try to increase its size without warning that way. I've no simple ideas for the registry, though.

Nice article. This was one of those things where, after reading, I felt silly for not already thinking through to a similar end.
16/08/09 @ 16:56
Jennifer
Comment from: Jennifer [Visitor]
Thank you so much. I do performance testing for a software company, and our testing is suffering a performance 'variability' on EL5 due to the state of the disk (we think it's fragmentation, but it might be the location of the data on the drive).

I'm hoping that the partition solution will help, regardless of the root cause. Your site is by far the most helpful thing I've found.

Thanks again.
21/08/09 @ 22:09
Bob l'éponge
Comment from: Bob l'éponge [Visitor]
Thank you for your article! It's well done!
It gives me a better view of fragmentation on Linux systems.
21/10/09 @ 11:51
Tom
Comment from: Tom [Visitor] Email
Great article, but I'm wondering, given your emphasis on partitioning: is not that one of the few times when one should actually do defrag on a linux FS? I.e., given the scatter strategy, shouldn't one defrag before repartitioning?
27/02/10 @ 17:15
Anon Idiot
Comment from: Anon Idiot [Visitor]
I have only one thing to say... I'm bored.

I also have to say that harddrives don't work in the manner you're acting like they do. (Nor computers for that matter).

Computers do NOT multi-task, they share processing time. As such, you're only doing one task at a given time (multi-core processors don't actually bypass this, they just use more hardware at a given time), and in that regard it becomes obvious that the head will have to make 3 passes with your fragmented drive rather than 1 pass non-fragmented.

Let's say you're right, however. The processor sends the address locations for each file; now your fragmented file will make any shorthand impossible (such as saying start at address 01d, end at 02a), thus more data will have to be transmitted.

That aside, the harddrive has done a single sweep, loaded it all into the buffer... but what now? It has to organize what it just read. So let's say that there is an onboard CPU that calculates the size of the data being read, allocates memory from its onboard ram unit, and dynamically sorts the data as it is being read on one pass into their respective addresses.


Now there is no reason why a harddrive CAN'T do all of this, but the question you should be asking is... why bother? The kernel is the one that is supposed to be handling what files the harddrive reads and writes and at what time; trying to simultaneously read 3 files at once is absurdly overtaxing, better to break those files into smaller chunks and prioritize harddrive usage time.

It would, in fact, be harder for the harddrive to do this because then, instead of being fed deadlines, it'd have to send notifications on which file was done... and you'd start having memory fragmentation after a while (the on board ram)... so much easier having a stack instead of a heap.

Staying synchronized would mean files would be returned in the order they were requested... but if it's not then files would be returned when they're requested again, in the mean time here are more files to read.

If you get to the point where everything is synchronized, onboard (harddrive) cpu does all these calculations to successfully multi-task... you're still only going to have advantages with fragmented harddrives over non-fragmented harddrives.

Non-Multi-tasking harddrive simply reads the requested addresses, puts them into the buffer (stack, not heap), and returns those values as per request.


And, as usual, you blatantly ignore directory fragmentation (that is why you have the "shuffling"). It causes similar effects to file fragmentation, hence it is similarly important.



Final Note:
Only an idiot would say "Technology has advanced so much that we don't need to optimize our code." Don't make the same implications about hardware. If you don't have a video-buffer while reading the video, then that fragment would be visible (you'd see the break as the heads aligned to continue). Your implication is that BECAUSE of that buffer, there is no reason to OPTIMIZE your read speed.

The speed bonus from defragmenting a harddrive may seem small, it might even be inconsequential... but you'll never gain any speed bonus by intentionally fragmenting your harddrive.

Nor will you gain any support for Linux if you can't even maintain your own opinions (by which I mean saying that Linux is better because it doesn't get fragmented, then turning around and saying "well, Linux can get fragmented but fragmenting is good"... FAT has wonderful fragmentation ability, right? Why stay with ext2?)
18/12/10 @ 08:38
Hanto
Comment from: Hanto [Visitor]
@Anon Idiot, I agree with you completely, but of course I've only been working with hard drives for over 20 years, military and commercial, plus electronics in general for over 40 years.
Partitioning, for example, reduces the need for defragmentation if properly configured; however, in no fashion should it be considered as anything other than a technique for separating some types of files from other types of files, and partitioning can be rather good at preventing some types of files from being corrupted by virii, trojans, and the like, due to the simple fact that most virii/trojans go after the system partition, which on a single-partition drive is the whole drive, i.e. ALL platters.
Only a noob doesn't partition with as many partitions as needed to protect their data, and they all should be demanding that the O/S place personal data inside the folder/directory where it is needed, NOT ON THE SYSTEM PARTITION. Only the O/S in question should be placed on the system partition; all second and third party software should be placed on partitions other than the system partition. If programmers actually did this, almost no personal data would ever be lost except in extreme circumstances.
Thank you Anon Idiot for stating so eloquently the obvious.
24/02/11 @ 10:17
Perkins
Comment from: Perkins [Visitor] Email · http://alestan.publicvm.com
Eric Raymond did an excellent article some time ago on building the ideal Linux computer, and one on building a Linux computer on a budget. You can find the articles buried in his site, catb.org/~esr, but I only want to bring up one point he made. Lots of people think getting a computer with a faster CPU and more memory is the key to getting a fast and responsive system, but it is often not the case. Depending on the type of work you do with your computer, your hardware needs will vary some. On most Windows computers I've profiled, memory is the primary limited resource, with CPU or disk access time coming in second. On Linux, especially with the recent wave of energy-friendly low-RPM drives, it is almost always disk access time. Splitting data onto multiple partitions to avoid fragmentation can help reduce this a lot, but if you can split it between disk drives, it works even better - especially getting the swap partition onto a drive separate from the one with your frequently accessed files. But rather than guessing at why your computer is running slowly, grab atop or a similar program (Unix only) and see what your resource usage is. Atop lets you see which programs use how much memory and CPU, and which ones are reading from and writing to which filesystems.
19/10/11 @ 01:31
Dinko
Comment from: Dinko [Visitor]
I got to this blog by accident; nevertheless, I read the comments and decided to share my point of view (and some technical background plus real-life examples).
I must say that I find it amusing how Linux fans regard Linux as the best thing that happened to mankind after the discovery of fire. So, let's start.
Why is fragmentation bad? It's because the hard drive's head arm has to travel from point a to point b to point c to collect all the segments of a fragmented file. What the Linux OS does is scatter newly created files over the hard drive - ideally cutting the remaining empty space in half. Now what? We have no or minimal head arm movement, depending on the file size, during single file access, but we have a long travel to reach the next file to be processed. While Linux fans can say "Linux is not fragmenting files" or, more accurately, "Fragmentation is not a problem in Linux", the advantage of this approach is questionable and depends on what exactly is going on - DOS/Windows files kept together save on overall seek time, file to file. Seeking on the hard drive to reach data is the real performance killer.
Now, the fact is that Linux (with its associated file systems) can and does fragment files. I'd say that nobody dares to bite that apple, and we don't have a defragmenter for Linux file systems. Result: "Linux has no problem with fragmentation". I'll give you an example from a system that I have built and am maintaining. The project is a caching proxy server, serving, on average, 1500 users a day. The hard drive size is 135GB and 100GB is allocated to cache files (1GB is the swap area; 6GB is held as "reserved" space). The number of objects in the cache varies between 2.5 and 4 million. The proxy server produces up to 1GB of daily log, and so does the syslog. Logs are compressed to zip weekly and offloaded monthly. Access and munin statistics are kept for a year. Now, do your math and calculate how bad the file fragmentation is in this case.
28/10/11 @ 14:14
 
