Tuesday, June 24, 2014

GSoC Progress Report #2: Bucketed Global Cache

After my trip to France, I'm back to working on my Google Summer of Code project for darcs.

My goals this week are to finish my patches for the bucketed cache and for garbage collection of the global cache.

Bucketed Cache

Some programs, such as the ls command on Linux, have trouble with directories that contain a large number of files. To reduce the number of files per folder in the global cache (which can grow considerably), we decided to implement a bucketed global cache.

The patch for the bucketed cache is http://bugs.darcs.net/patch1162.
This patch transforms the global cache into a bucketed global cache.
That is, instead of keeping all the cache files in a single folder, it divides them into sub-folders according to the file name.

For example, using the first two characters of the hash, we put the patch named:

0000008516-0048abbb8a2b11870fe24fef48bc2ebb49cbd818a633b8250dc2023e4f6267c9

in the sub-folder 00/, i.e., in ~/.cache/darcs/patches/00/. The same applies to inventories and pristine files.
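
The bucket computation fits in a few lines. This is a minimal Python sketch, not the code from patch1162 (darcs itself is written in Haskell). The names bucket_folder and bucketed_path are hypothetical, and the sketch assumes cache file names have the form <size>-<hash>, falling back to the name's own first two characters when there is no '-'.

```python
import os


def bucket_folder(file_name: str) -> str:
    """Return the two-character bucket for a cache file name.

    Assumes names of the form '<size>-<hash>': the bucket is taken from the
    hash part after the first '-'. Names without a '-' are bucketed by their
    own first two characters (an assumption made for this sketch).
    """
    _size, sep, rest = file_name.partition("-")
    hash_part = rest if sep else file_name
    return hash_part[:2]


def bucketed_path(cache_root: str, subdir: str, file_name: str) -> str:
    """Build a path such as <cache_root>/patches/00/<file_name>."""
    return os.path.join(cache_root, subdir, bucket_folder(file_name), file_name)


# Example with the patch name from the text; it lands in patches/00/.
name = "0000008516-0048abbb8a2b11870fe24fef48bc2ebb49cbd818a633b8250dc2023e4f6267c9"
print(bucketed_path(os.path.expanduser("~/.cache/darcs"), "patches", name))
```

Two hexadecimal characters give at most 256 buckets per sub-folder, which keeps each directory small without making paths deep.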

However, there are two ways to implement this patch:
  1. We can forget the old cache (located in ~/.darcs/cache/).
    • The advantage of this approach is that the code is cleaner and faster (though perhaps not significantly faster), because we don't need to look for patches in multiple locations.
    • The disadvantage is that the user starts again with an empty cache.
      However, we could create a command that moves the old cache files to the new format.
  2. The other way is to always read both caches. The code for this version is a little more complicated because it needs to look in several locations, but the user's cache remains intact.
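
Option 2 amounts to a lookup with a fallback. The Python sketch below is only an illustration: find_in_caches is a hypothetical helper, and it assumes that the new cache is bucketed as described above while the old caches keep the same sub-folders (patches, inventories, ...) without buckets. It returns the first existing path, preferring the new cache.

```python
import os
from typing import Iterable, Optional


def _bucket(file_name: str) -> str:
    # First two characters of the hash that follows the '-'.
    _size, sep, rest = file_name.partition("-")
    return (rest if sep else file_name)[:2]


def find_in_caches(file_name: str, subdir: str, new_root: str,
                   old_roots: Iterable[str]) -> Optional[str]:
    """Return the first cache path that holds file_name, or None.

    The new bucketed cache is checked first; the old, flat caches are only
    consulted as a fallback, so existing user caches keep working.
    """
    candidates = [os.path.join(new_root, subdir, _bucket(file_name), file_name)]
    candidates += [os.path.join(root, subdir, file_name) for root in old_roots]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A call would look like find_in_caches(name, "patches", os.path.expanduser("~/.cache/darcs"), [os.path.expanduser("~/.darcs/cache")]). Each extra root adds a file-system check on a cache miss, which is the extra cost mentioned for this option.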

Different versions of darcs use different global caches:
  • 2.9.9 (+45 patches) (latest development version): reads the caches in both ~/.darcs/cache/ and ~/.cache/darcs/ (the new cache). In addition, each time the old cache is used, its files are linked into the new cache. However, this new cache is not bucketed.
  • 2.8.4 (the latest stable release, used by end users): only uses the cache in ~/.darcs/cache/.

The patch in http://bugs.darcs.net/msg17565 changes the 2.9.9 code so that, whenever the old cache is used, darcs links the files it reads into the new bucketed cache in ~/.cache/darcs/. It also provides a new command (darcs optimize cache) that moves the files from the old caches into the new bucketed cache.
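
A command like darcs optimize cache essentially walks the old caches and moves each file into its bucket. The following Python sketch illustrates that idea and is not the patch's implementation: migrate_cache is a hypothetical name, and the sub-folder names (patches, inventories, pristine.hashed) are an assumption about the old cache layout.

```python
import os
from typing import Sequence

# Assumed sub-folders of the old, flat cache (hypothetical for this sketch).
DEFAULT_SUBDIRS = ("patches", "inventories", "pristine.hashed")


def _bucket(file_name: str) -> str:
    # First two characters of the hash that follows the '-'.
    _size, sep, rest = file_name.partition("-")
    return (rest if sep else file_name)[:2]


def migrate_cache(old_root: str, new_root: str,
                  subdirs: Sequence[str] = DEFAULT_SUBDIRS) -> int:
    """Move every file of a flat cache into the bucketed cache.

    Returns the number of files removed from the old cache.
    """
    moved = 0
    for sub in subdirs:
        src_dir = os.path.join(old_root, sub)
        if not os.path.isdir(src_dir):
            continue  # nothing of this kind was ever cached
        for name in os.listdir(src_dir):
            src = os.path.join(src_dir, name)
            if not os.path.isfile(src):
                continue
            dst_dir = os.path.join(new_root, sub, _bucket(name))
            os.makedirs(dst_dir, exist_ok=True)
            dst = os.path.join(dst_dir, name)
            if os.path.exists(dst):
                # Already linked into the new cache; files are named by their
                # hash, so the old entry can simply be dropped.
                os.remove(src)
            else:
                # rename keeps the inode, so repositories' hard links stay valid.
                os.rename(src, dst)
            moved += 1
    return moved
```

The explicit existence check matters: if the destination is already a hard link to the source, POSIX rename() succeeds without doing anything and the old entry would stay behind. The sketch also assumes both caches live on the same file system, since rename cannot cross file systems.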

Tests were performed to see whether the bucketed cache improved the performance of darcs commands such as clone, but no significant difference was found.

Global Garbage Collection

This patch should give users a way to reduce the size of the global cache according to their needs.

For now, this patch only counts the hard links of each file in the global cache. If a file has only one link (that is, no repository links to it anymore), it is deleted.
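
The link-counting rule can be sketched as follows. This Python sketch assumes, as described above, that repositories share cache files through hard links, so a regular file with a link count of 1 is referenced only by the cache. The function collect_garbage and its dry_run flag are hypothetical and not options of darcs.

```python
import os
import stat
from typing import List


def collect_garbage(cache_root: str, dry_run: bool = False) -> List[str]:
    """Delete cache files that no repository links to anymore.

    A regular file whose hard-link count is 1 is referenced only by the
    cache itself. Returns the paths that were (or, with dry_run, would be)
    deleted.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks and other non-regular files.
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                if not dry_run:
                    os.remove(path)
                removed.append(path)
    return removed
```

Running it with dry_run=True first lists the candidates without deleting anything.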

This patch hasn't been sent to the screened repository yet.