2.1.4 Caching Scheme

The CDF library reads and writes open files in 512-byte blocks. For each open file, the CDF library maintains a cache of 512-byte memory buffers and attempts to keep in that cache the set of file blocks currently being accessed. This results in fewer actual I/O operations to the file when the same blocks are accessed repeatedly. When the cache is completely full and a new block of the file is accessed, one of the cache buffers is written back to the file (if it was modified) and the new block is read into that cache buffer (unless the file is being extended, in which case the cache buffer is simply cleared). This process is known as paging. Performance can be improved by optimizing the number of cache buffers for a file. There is a tradeoff between having too few cache buffers and having too many: too few will cause excessive paging, while too many may slow performance because of the overhead involved in maintaining the cache.
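The paging behavior described above can be illustrated with a small model. This is only a sketch and not the CDF library's implementation: the class `BlockCache` and the function `count_io` are hypothetical names, and the least-recently-used replacement policy is an assumption (the text says only that "one of the cache buffers" is reused). The file-extension case, in which a buffer is cleared rather than read, is not modeled.

```python
from collections import OrderedDict

BLOCK_SIZE = 512  # bytes per file block, as stated in the text


class BlockCache:
    """Toy model of a fixed-size block cache (not the CDF library's code).

    Assumption: the least recently used buffer is the one that is reused.
    """

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.buffers = OrderedDict()  # block number -> modified flag
        self.reads = 0    # blocks read from the file
        self.writes = 0   # modified blocks written back to the file

    def access(self, byte_offset, modify=False):
        block = byte_offset // BLOCK_SIZE
        if block in self.buffers:
            # Cache hit: no file I/O, just mark the block as recently used.
            self.buffers.move_to_end(block)
        else:
            if len(self.buffers) >= self.num_buffers:
                # Cache full: page out the least recently used buffer.
                _, modified = self.buffers.popitem(last=False)
                if modified:
                    self.writes += 1
            # Page in the newly accessed block.
            self.reads += 1
            self.buffers[block] = False
        if modify:
            self.buffers[block] = True


def count_io(num_buffers, offsets, modify=False):
    """Return (block reads, block write-backs) for a sequence of accesses."""
    cache = BlockCache(num_buffers)
    for offset in offsets:
        cache.access(offset, modify)
    return cache.reads, cache.writes


# Alternate between two blocks 50 times each.
offsets = [0, 512] * 50
print(count_io(1, offsets))  # (100, 0): every access pages
print(count_io(2, offsets))  # (2, 0): both blocks stay in the cache
```

With one buffer, every access in this pattern forces a page; with two buffers, each block is read once and then served from the cache.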

The CDF library attempts to choose optimal default cache sizes based on a CDF's format and number of variables. This is difficult because the CDF library does not know how an application will access a CDF. For that reason, an application may specify, via the Internal Interface, the number of cache buffers to be used for a file. The number of cache buffers may be changed as many times as necessary while a file is open (the first change overrides the default used by the CDF library). Default cache sizes may be configured for your CDF distribution when it is built and installed. Consult your system manager for the values of these defaults.

The situations in which it will be necessary to specify a cache size depend on how a CDF is accessed. For example, consider a variable in a multi-file, row-major CDF having a dimensionality of 2:[10,64], a data specification of CDF_REAL8/1, and variances of T/TT. Each record of this variable holds 640 8-byte values (5,120 bytes) and is therefore written across 10 file blocks, with the second dimension varying the fastest (since the CDF's variable majority is row-major). If single value reads were used to access this variable (see Section 2.3.13), only one cache buffer would be necessary if the second dimension were incremented the fastest (i.e., [1,1], [1,2],... [10,63], [10,64]). This is because the values of a record would be accessed sequentially from the first block to the last block. If, however, the first dimension were incremented the fastest (i.e., [1,1], [2,1],... [9,64], [10,64]), 10 cache buffers would improve performance. In that case the values of a record are not accessed sequentially; instead, each read is from a different block than the previous one. Since the reads would be spread across 10 blocks, having 10 cache buffers would be optimal.
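The arithmetic behind this example can be checked with a short simulation. The sketch below is not CDF library code; `block_of` and `block_reads` are hypothetical helpers. It assumes that a record begins on a block boundary and that the cache uses least-recently-used replacement, neither of which is stated in the text.

```python
from collections import OrderedDict

BLOCK_SIZE = 512   # bytes per file block
VALUE_SIZE = 8     # bytes per CDF_REAL8 value
DIMS = (10, 64)    # dimension sizes from the example, 2:[10,64]


def block_of(i, j):
    """Block index within a record for 1-based indices [i,j], row-major.

    Assumes the record starts on a block boundary.
    """
    offset = ((i - 1) * DIMS[1] + (j - 1)) * VALUE_SIZE
    return offset // BLOCK_SIZE


def block_reads(order, num_buffers):
    """Count block reads for a sequence of single value reads."""
    cache = OrderedDict()  # assumed least-recently-used replacement
    reads = 0
    for i, j in order:
        block = block_of(i, j)
        if block in cache:
            cache.move_to_end(block)       # hit: no file I/O
        else:
            if len(cache) >= num_buffers:
                cache.popitem(last=False)  # reuse the oldest buffer
            cache[block] = True
            reads += 1                     # miss: block read from file
    return reads


# Second dimension incremented the fastest: [1,1], [1,2], ... [10,64]
second_fastest = [(i, j) for i in range(1, 11) for j in range(1, 65)]
# First dimension incremented the fastest: [1,1], [2,1], ... [10,64]
first_fastest = [(i, j) for j in range(1, 65) for i in range(1, 11)]

print(block_reads(second_fastest, 1))   # 10
print(block_reads(first_fastest, 1))    # 640
print(block_reads(first_fastest, 10))   # 10
```

Each row of 64 values occupies exactly one 512-byte block, so stepping through the second dimension stays within a block while stepping through the first dimension changes blocks on every read. Under the assumed replacement policy, 10 buffers reduce the 640 block reads of the second access pattern to 10.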

A similar situation arises when accessing variables in a single-file CDF. If values are accessed for each variable at a record number, then performance will be improved by setting the number of cache buffers to be greater than the number of variables. This is because the variable values will most likely be located in that many different file blocks for a particular record number.

Cache sizes are specified using the <SELECT_,CDF_CACHESIZE_>, <SELECT_,rVAR_CACHESIZE_>, <SELECT_,rVARs_CACHESIZE_>, <SELECT_,zVAR_CACHESIZE_>, and <SELECT_,zVARs_CACHESIZE_> operations of the Internal Interface.

NOTE: The default cache sizes used by the CDF library are fairly conservative in order to minimize the problems that can arise due to memory limitations (especially on computers having limited memory such as the IBM PC and Macintosh). If the performance of your application is critical, it is very important to experiment with using larger cache sizes. Significant gains in performance can be achieved with the proper cache size. It is also important to allocate records if you know how many are to be written. This will reduce the fragmentation that can occur in a single-file CDF (which degrades performance because of the increased indexing that occurs for a variable). Allocating records is described in Section 2.3.8.

cdfsupport@nssdca.gsfc.nasa.gov