The Scientific Data Format Information FAQ provides a somewhat dated description of other access interfaces and formats for scientific data, including CDF and HDF. A brief comparison of CDF, netCDF, and HDF is given below. Another comparison appears in Jan Heijmans’s An Introduction to Distributed Visualization. John May’s book Parallel I/O for High Performance Computing includes a chapter on scientific data libraries that describes netCDF and HDF5, with example source code for reading and writing files using both interfaces.
The comparisons below are based on a high-level overview of each format, drawn from publicly available documentation and articles. To the best of our knowledge the information is accurate, although its accuracy may drift as new releases appear. The best and most complete way to evaluate which package fulfills your requirements is to obtain the documentation and software from each institution and examine them thoroughly.
CDF vs. netCDF
CDF was designed and developed in 1985 by the National Space Science Data Center (NSSDC) at NASA/GSFC. CDF was originally written in FORTRAN and available only in the VAX/VMS environment. NetCDF was developed a few years later at Unidata, part of the University Corporation for Atmospheric Research (UCAR). The netCDF model was based on the CDF conceptual model but added a number of features, such as C language bindings, portability to a number of platforms, and a machine-independent data format. Both models and their software have since matured substantially and are now quite similar in most respects, although they differ in the following ways:
Although the interfaces provide the same basic functionality, they differ syntactically. (See the user's guides for details.)
NetCDF supports named dimensions (e.g., TEMP[x, y, …]), whereas CDF uses the traditional logical method of indicating dimensionality (e.g., TEMP[true, true, …]), flagging for each dimension whether a variable varies along it.
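Named dimensions are easiest to see in netCDF's CDL text notation. The file below is a hypothetical sketch; the dimension and variable names are invented for illustration:

```
netcdf example {
dimensions:
    lat  = 180 ;                 // dimensions are declared once, by name...
    lon  = 360 ;
    time = UNLIMITED ;
variables:
    float TEMP(time, lat, lon) ; // ...and variables refer to them by name
        TEMP:units = "kelvin" ;
}
```

Because dimensions are shared named objects, several variables can declare that they are defined over the same lat/lon grid simply by naming the same dimensions.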
CDF supports both multi-file and single-file storage, whereas netCDF supports only single-file storage.
In addition to the machine-independent (XDR) encoding, CDF software can transparently access data files in any encoding currently supported by the CDF library; for example, a CDF application running on a Sun can read and write data encoded in VAX format. netCDF-3 software reads and writes data only in the XDR encoding, while netCDF-4 supports native encodings by default, using a "reader makes right" approach for portability.
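The portability XDR provides rests on a single fixed on-the-wire representation: big-endian byte order and fixed-size units, so every machine converts to and from one common form. A stdlib-only Python sketch of that idea (an illustration of the principle, not the actual XDR routines either library uses):

```python
import struct

def xdr_style_pack(values):
    # ">" forces big-endian (network) byte order with fixed 4-byte floats,
    # as XDR does, so the bytes are identical on every platform.
    return struct.pack(f">{len(values)}f", *values)

def xdr_style_unpack(data):
    # The reader performs the inverse conversion back to native values.
    return struct.unpack(f">{len(data) // 4}f", data)

encoded = xdr_style_pack([1.5, -2.25])
print(encoded.hex())            # same byte sequence regardless of host endianness
print(xdr_style_unpack(encoded))
```

The "reader makes right" approach mentioned above is the opposite trade-off: the writer stores data in its own native encoding, and only a reader on a differently-encoded machine pays the conversion cost.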
The CDF library uses an internal caching algorithm whose parameters the user can adjust, if desired, to tune performance.
NetCDF data objects are currently accessible via the HDF software; CDF data objects are not.
The CDF distribution includes a number of easy-to-use tools and utilities that let the user edit, browse, list, prototype, subset, compare, and export to ASCII the contents of CDF data files.
CDF vs. HDF4
CDF is a scientific data management software package and format based on a multidimensional (array) model. HDF is a Hierarchical Data Format developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. The HDF4 data model is based on the hierarchical relationships and dependencies among data. Although the two models differ significantly in their level of abstraction and in the way their inherent structures are defined and accessed (comparing them is in many ways like comparing apples to oranges), there is a large overlap in the types of scientific data each can support. Some of the obvious differences are as follows:
The HDF4 structure is based on a tagged format, storing tag identifiers (e.g., utility, raster image, scientific dataset, and Vgroup/Vdata tags) for each data object. The basic structure of an HDF file consists of an index of the tags of the objects in the file, pointers to the data associated with those tags, and the data themselves. The CDF structure is based on variable definitions (name, data type, number of dimensions, sizes, etc.), where a collection of data elements is defined in terms of a variable. This structure allows one to define an unlimited number of variables that are completely independent of one another (loosely coupled) and disparate in nature, a group of variables with strong dependencies on one another (tightly coupled), or both simultaneously. In addition, CDF supports extensive metadata capabilities (called attributes), which enable the user to further describe the contents of a CDF file.
HDF4 supports a set of interface routines for each supported object type (raster image, palette, scientific dataset, annotation, Vset, and Vgroup). CDF supports two interfaces through which a CDF file can be accessed: the Internal Interface and the Standard Interface. The Internal Interface is very robust and consists of a single variable-argument subroutine call that exposes all the functionality of the CDF software. The Standard Interface is built on top of the Internal Interface and consists of 23 subroutine calls with fixed argument lists; it provides a mechanism by which novice programmers can quickly and easily create a CDF data file.
HDF4 currently offers compression only for certain types of data objects, such as images. CDF supports compression of any data type, with a choice of run-length encoding, Huffman, adaptive Huffman, and GNU zip (gzip) algorithms.
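Run-length encoding, the simplest of these algorithms, replaces each run of repeated values with a (value, count) pair, which pays off for data with long constant stretches such as fill values. A toy Python sketch of the idea (an illustration only, not CDF's actual implementation):

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    # Collapse each run of identical bytes into a (byte, count) pair.
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    # Expand each (byte, count) pair back into its run.
    return b"".join(bytes([b]) * count for b, count in runs)

data = b"\x00\x00\x00\x00AAAB"
runs = rle_encode(data)
print(runs)                     # [(0, 4), (65, 3), (66, 1)]
assert rle_decode(runs) == data
```

The Huffman and gzip options trade more CPU time for better ratios on data without such obvious repetition.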
CDF maintains an internal cache whose size the user can adjust through the Internal Interface to improve performance on specific machines.
HDF4 data files are difficult to update because data records are physically stored in a contiguous fashion; extending a data record usually means rewriting the entire file. CDF maintains an internal directory of pointers for all the variables in a CDF file and does not require the data elements of a given variable to be contiguous. Existing variables can therefore be extended, modified, or deleted, and new variables can be added to an existing file.
In the late 1980s the CDF software was redesigned and rewritten in C (CDF 2.0). With little or no impact on performance, the redesign provided an open framework that could easily be extended with new functionality and features as needed. CDF is currently at Version 3.0, and its performance has been enhanced significantly.
CDF supports both host encoding and the machine-independent (XDR) encoding. In addition, the CDF software can transparently access data files in any encoding currently supported by the CDF library; for example, a CDF application running on a Sun can read and write data encoded in VAX format. HDF4 likewise supports both host encoding and the machine-independent (XDR) encoding.
CDF vs. HDF5
CDF is a scientific data management software package and format based on a multidimensional (array) model. HDF is a Hierarchical Data Format developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. The CDF data model is very similar to HDF5's: both are built on two basic objects, data and attributes. A data object holds the values themselves, while an attribute is a mechanism for describing data. HDF5 allows similar objects to be collected into groups; CDF has no grouping mechanism.
Although HDF4 and HDF5 were developed by the same organization, the HDF5 data model is entirely different from HDF4's, and the two formats are incompatible.
CDF was developed at the NASA Space Science Data Center at Goddard and is freely available. It was originally a VMS FORTRAN interface for scientific data access. In creating netCDF, Unidata reimplemented the library from scratch to use XDR for a machine-independent representation, designed the CDL (network Common Data form Language) text representation for netCDF data, and added aggregate data access, a single-file implementation, named dimensions, and variable-specific attributes.
NetCDF and CDF have since evolved independently. CDF now supports many of the same features as netCDF (aggregate data access, XDR representation, single-file representation, variable-specific attributes), but some differences remain (netCDF's classic format does not support native-mode representation; CDF does not support named dimensions). Data in CDF and netCDF form are not compatible, but NASA makes available translators between various scientific data formats. For a more detailed description of the differences between CDF and netCDF, see the CDF FAQ.