Creating CDF Files

Common Data Format (CDF) is a conceptual data abstraction for storing, manipulating, and accessing multidimensional datasets. CDF is referred to as a data abstraction because the actual physical format in which datasets are stored is not discussed. Instead, the form of the datasets and the means (interface) by which they may be manipulated are described. The main CDF library software installer (or compilation/installation from the command line) will place various files such as executables, test programs, batch files, help information and documentation into a directory of choice, or platform-specific application directories. The CDF home page contains examples, documentation, FAQs, and tools. The CDF User Guide and CDF Language Reference Manuals can be found here.

CDFs can be created in many programming languages (e.g., Python, C, Fortran, Perl, Java, IDL, and MATLAB) on many hardware platforms. Each language's CDF library provides routines to read attributes and variables in a variety of ways; see this page for information on all of the CDF language guides.

CDAWeb can also create custom CDFs from user-selected variables and time ranges if the data of interest are in the CDAWeb system (in addition to plotting and listing).

How do I make a CDF by writing custom software?

Creating a CDF dataset with custom code is similar to building a house in several ways. Both require some planning before building commences; the better the plan, the better the final product. Both houses and CDFs have global features as well as detailed specific features that can dictate the design. For the ISTP CDF designer, the data itself is always crucial to the design of the CDF: the data and the guidelines together dictate the design. When the planning stage is complete, the architect will have a blueprint of the house and the CDF designer will have a skeleton table. Before the (CDF) building stage can begin, a designer must have a machine-readable dataset ready to put into the form of a CDF dataset. The building stage involves writing software in one of the CDF-supported languages, which is discussed in detail in the CDF User Guide or Reference Manuals. This step can be skipped by using the program makeCDF. When the data is put into the CDF (by running the generation software), the CDF dataset is complete and can be viewed as a list of numbers (e.g., using CDFlist), transferred to another location, manipulated, or visualized via plotting software (e.g., using CDAWeb).

What libraries will write CDFs?

All of the CDF libraries provided by SPDF allow developers/dataset creators to write data to a CDF file. Some other CDF libraries also will create CDFs, while other libraries only read CDFs.

What metadata should I use?

There are two sets of metadata. Global attributes describe the CDF in global terms; under ISTP these include “Project”, “Discipline”, “Descriptor”, “PI_Name”, etc. Variable attributes describe individual variables. Using the SKTEditor tool to create a CDF will ensure the required metadata is included and validated in an ISTP-compliant CDF. Further information about metadata can be found in the ISTP Metadata Guidelines.
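As a minimal sketch of how the global attributes named above might be checked before a file is written, the following Python uses a plain dictionary to stand in for a CDF's global attributes. The tuple REQUIRED_EXAMPLES holds only the four names quoted in this section, not the full ISTP list, and missing_global_attrs is a hypothetical helper rather than part of any CDF library.

```python
# Illustrative only: a dict stands in for the global attributes of a CDF.
# The tuple lists just the attribute names quoted in the text above; the
# ISTP Metadata Guidelines define the complete required set.
REQUIRED_EXAMPLES = ("Project", "Discipline", "Descriptor", "PI_Name")


def missing_global_attrs(global_attrs, required=REQUIRED_EXAMPLES):
    """Return the required names that are absent or have an empty value."""
    return [name for name in required if not global_attrs.get(name)]


draft = {
    "Project": "ISTP>International Solar-Terrestrial Physics",
    "Discipline": "Solar Physics>Heliospheric Physics",
    "Descriptor": "CELIAS-PM>Proton Monitor",
}
print(missing_global_attrs(draft))  # ['PI_Name']
```

For real files, SKTEditor or a library-provided ISTP checker remains the appropriate validation route; this only illustrates the idea.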

When describing datasets, users should use standard common terminology (metadata). A CDF has two basic components. The first is a series of records comprising a collection of variables consisting of scalars, vectors, and n-dimensional arrays. The second is a set of attribute entries (metadata) describing the CDF in global terms or specifically for a single variable. This dual function of CDF is what provides its “dataset independence.” Both the metadata (attributes) and the data objects (variables) are combined into an integrated dataset.
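The combination of variables and attributes described above can be pictured with a small in-memory model. This is a conceptual sketch in Python only: the Variable and Dataset classes are hypothetical names invented for illustration and do not correspond to any CDF library API.

```python
class Variable:
    """A data object: one value (scalar, vector, or array) per record."""

    def __init__(self, name, records, attributes=None):
        self.name = name
        self.records = list(records)
        self.attributes = dict(attributes or {})  # variable-scope metadata


class Dataset:
    """Conceptual CDF: global attributes plus a collection of variables."""

    def __init__(self, global_attributes=None):
        self.global_attributes = dict(global_attributes or {})
        self.variables = {}

    def add(self, variable):
        self.variables[variable.name] = variable


ds = Dataset({"Project": "ISTP>International Solar-Terrestrial Physics"})
# A scalar variable: one number per record.
ds.add(Variable("V_p", [320.0, 315.0, 323.0],
                {"CATDESC": "Proton speed, scalar", "FIELDNAM": "Proton speed"}))
# A vector variable: three components per record.
ds.add(Variable("GSE_POS", [[199.1, -79.5, 8.0]] * 3))

print(sorted(ds.variables), ds.variables["V_p"].attributes["CATDESC"])
```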

What are best practices for laying out a CDF and naming the variables and filenames? (ISTP and non-ISTP practices)

Datasets should be thought of as a time-contiguous collection, with the same variables and structure (such as the same array dimensions) across all files. Use the maximum dimension size for the whole mission rather than changing dimension sizes file by file (and compress fill values). Any slowly varying information should be put into time-varying variables, rather than into global attributes or static variables, even if it does not change within a specific file. Slowly varying variables can point to their own time variables, or can use a CDF feature of “previous sparse” variables that are stored only when changed.
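The advice to keep one dimension size for the whole mission can be sketched as padding every record to a fixed length with the fill value. The names below (MAX_CHANNELS, pad_record, and the sample modes) are hypothetical, and plain Python lists stand in for CDF records.

```python
FILL = -1.0e31  # same fill value as the FILLVAL shown in the Python example below


def pad_record(values, max_len, fill=FILL):
    """Return one record padded with the fill value up to max_len elements."""
    if len(values) > max_len:
        raise ValueError("record longer than the mission-wide maximum")
    return list(values) + [fill] * (max_len - len(values))


MAX_CHANNELS = 8  # hypothetical mission-wide maximum
early_mode = [1.2, 3.4, 5.6, 7.8]            # 4 channels in one file
late_mode = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 6 channels in a later file

records = [pad_record(r, MAX_CHANNELS) for r in (early_mode, late_mode)]
print([len(r) for r in records])  # [8, 8]
```

Because the padding is a single repeated value, it costs little once the variable is compressed.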

The CDF time variables, particularly CDF_TIME_TT2000, and supported conversion routines are preferred over Unix time and other time schemes. Sometimes it is useful to include additional specific time variables, such as spacecraft elapsed time. CDFs can have multiple time variables, with each variable pointing to its corresponding time variable through the DEPEND_0 variable attribute.
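The DEPEND_0 linkage can be illustrated with a small consistency check. In this sketch a dictionary stands in for the variables of a CDF; the names Epoch_slow and Mode are hypothetical, and depend_0_problems is an illustrative helper, not a library function.

```python
# Hypothetical layout: each entry holds a variable's records and attributes.
variables = {
    "Epoch":      {"data": [0, 30, 60, 90], "attrs": {}},
    "Epoch_slow": {"data": [0, 90],         "attrs": {}},
    "V_p":        {"data": [320.0, 315.0, 323.0, 326.0],
                   "attrs": {"DEPEND_0": "Epoch"}},
    "Mode":       {"data": [1, 2],
                   "attrs": {"DEPEND_0": "Epoch_slow"}},
}


def depend_0_problems(variables):
    """List variables whose DEPEND_0 target is missing or has a different length."""
    problems = []
    for name, var in variables.items():
        target = var["attrs"].get("DEPEND_0")
        if target is None:
            continue  # the time variables in this sketch carry no DEPEND_0
        if target not in variables:
            problems.append(f"{name}: DEPEND_0 '{target}' not found")
        elif len(variables[target]["data"]) != len(var["data"]):
            problems.append(f"{name}: record count differs from '{target}'")
    return problems


print(depend_0_problems(variables))  # []
```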

The metadata specified in the ISTP Metadata Guidelines should be added to the datasets as global and variable attributes.

Data collection naming and file naming should follow the SPDF recommended dataset and file naming practices, so that future users will be able to easily distinguish between data collections and find the data they are interested in.

Names for variables and attributes should use only alphanumeric and underscore for best compatibility with various programming languages. Attributes themselves can use UTF-8 characters, but this may cause issues in some programming languages.
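A naming rule like this is easy to check automatically. The sketch below assumes one extra restriction that is not stated above, namely that a name starts with a letter, so that it is also a valid identifier in most languages; is_portable_name is a hypothetical helper.

```python
import re

# Letters, digits, and underscore only. The leading-letter requirement is an
# additional assumption made for this sketch.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_portable_name(name):
    """True if the whole name uses only the recommended characters."""
    return SAFE_NAME.fullmatch(name) is not None


for candidate in ("V_p", "GSE_POS", "B-field", "2nd_moment", "Vth p"):
    print(candidate, is_portable_name(candidate))
```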

Each attribute in a CDF has a unique name. Attribute names are case sensitive regardless of the operating system being used and may consist of up to CDF_ATTR_NAME_LEN or CDF_ATTR_NAME_LEN256 printable characters (including blanks). Trailing blanks, however, are ignored when the CDF library compares attribute names: “UNITS” and “UNITS ” (with trailing blanks) are considered to be the same name, so they cannot both exist in the same CDF. This was done because Version 1 of CDF padded attribute names on the right with blanks out to eight characters. When CDF Version 1 files were converted to Version 2, these trailing blanks remained in the attribute names. To allow CDF Version 2 applications to read such a CDF without having to be concerned with the trailing blanks, the CDF library ignores them when comparing attribute names. The trailing blanks are, however, returned as part of the name when an application program inquires about an attribute.
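The comparison rule described above (case sensitive, trailing blanks ignored) can be mimicked in a few lines of Python. This is only an illustration of the rule as stated here, with a hypothetical helper name; it is not the CDF library's own code.

```python
def same_attr_name(a, b):
    """Compare two attribute names: case sensitive, trailing blanks ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


print(same_attr_name("UNITS", "UNITS   "))  # True: only trailing blanks differ
print(same_attr_name("UNITS", "units"))     # False: case is significant
```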

Global and variable attributes are defined before using, and are defined for the whole file.

FILLVAL in particular, as well as the VALIDMIN/VALIDMAX variable attributes, must be defined with the same data type as their variables in order to work well. FILLVAL provides a specific bit sequence to match against the variable values that should be ignored. As an aside, PAD values are similar but are used inside the CDF to indicate that no value has been assigned (usually when creating a variable).
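Why the data types must match can be seen with the fill value -1.0e31. The sketch below uses only the Python standard library to round that value to 32 bits, as a 4-byte float variable would store it, and compares the result with the 64-bit value; as_float32 is a hypothetical helper.

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


FILL_DOUBLE = -1.0e31          # fill value expressed as a 64-bit float
stored = as_float32(-1.0e31)   # what a 32-bit variable actually holds

print(stored == FILL_DOUBLE)              # False: bit patterns differ
print(stored == as_float32(FILL_DOUBLE))  # True once both are 32-bit
```

A reader that compares 32-bit data against a 64-bit FILLVAL would therefore fail to recognise the fill records.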

We recommend compressing all variables except for the time variables, and turning on checksums and file validation to ensure file consistency when transferred.
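The benefit of both recommendations can be illustrated outside the CDF library. The Python sketch below uses zlib and SHA-256 from the standard library purely as stand-ins; it does not reproduce the CDF library's own compression or checksum options, and the record layout is hypothetical.

```python
import hashlib
import struct
import zlib

FILL = -1.0e31
# 1000 records of 8 values where only the first 2 carry data (hypothetical layout).
values = []
for i in range(1000):
    values.extend([float(i), float(i) * 0.5] + [FILL] * 6)

raw = struct.pack(f"<{len(values)}d", *values)
packed = zlib.compress(raw, 6)
print(len(raw), len(packed))  # the fill-heavy block shrinks considerably

# A digest computed before transfer and again afterwards detects corruption.
digest_before = hashlib.sha256(raw).hexdigest()
digest_after = hashlib.sha256(zlib.decompress(packed)).hexdigest()
print(digest_before == digest_after)  # True
```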

We recommend column-majority over row-majority for more intuitive use in IDL and some other languages. This affects the definitions of variable attributes for multi-dimensional variables.
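The difference between the two majorities is the order in which the elements of a multi-dimensional value are laid out. A minimal Python sketch, using nested lists and hypothetical helper names:

```python
def flatten_row_major(a):
    """Last index varies fastest (C-style)."""
    return [a[i][j] for i in range(len(a)) for j in range(len(a[0]))]


def flatten_column_major(a):
    """First index varies fastest (Fortran- and IDL-style)."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


grid = [[11, 12, 13],
        [21, 22, 23]]
print(flatten_row_major(grid))     # [11, 12, 13, 21, 22, 23]
print(flatten_column_major(grid))  # [11, 21, 12, 22, 13, 23]
```

The same six numbers appear in both cases; only their order changes, which is why any attribute that refers to a particular dimension has to be defined with the chosen majority in mind.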

Did I make a valid dataset?

You can test your CDF dataset by opening a sample CDF in the Java-based SKTEditor, which also includes a command-line checker. A JavaScript-based metadata editor is expected soon. Some CDF-related Python libraries also have checkers for ISTP metadata compliance.

How do I make a CDF from another scientific data format?

Use the Data Format Translation Tools.

How do I make CDF datasets for archiving at the Planetary Data System (PDS)?

PDS accepts CDF files with specific constraints, called CDF-A; see the description at the bottom of Data Format Translation Tools.

Data Systems and Analysis Tools that Produce CDFs:

CDAWeb (web based data browse system)

Data can be displayed as plots, listings, and data files in the following formats: CDF, CSV, JSON, and audio files. The user can make either a subset or superset from the original data set files by specifying a time range and variables to be put into a CDF (based on the original data set CDF layout). The interface to CDAWeb progresses through a series of pages (Home Page, Data Selector, Data Explorer), as follows:

CDAWeb HOME PAGE (known as the CDAWeb Source Selector Form)

From this form, choose a mission group (e.g., Wind, Geosynchronous spacecraft, ground-based investigations) and/or instrument type (e.g., Magnetic Field [space], Particles [space], Ground-Based HF-Radars) to get a list of datasets. One or more items must be selected in one column or both columns.
Mission groups and instruments are logically combined in a query such that selecting “Geotail” mission and “Magnetic Fields” instrument will only match Geotail Magnetic Field datasets as in a logical “AND” operation. Selection of multiple missions and multiple instruments becomes a logical “OR” operation; e.g., selecting Geotail and Wind missions and Electric Fields and Magnetic field instruments will match against all Geotail and Wind datasets that have either electric field or magnetic field data.
Press the “submit” button to find matching datasets.
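The selection logic described above (AND between the two columns, OR within each column) can be sketched as a simple filter. The catalog entries and identifiers below are hypothetical, each dataset is given a single instrument type for simplicity, and treating an empty column as "match everything" is an assumption based on the statement that only one column needs a selection.

```python
def matches(dataset, missions, instruments):
    """AND across the two columns, OR within each; an empty column matches all."""
    mission_ok = not missions or dataset["mission"] in missions
    instrument_ok = not instruments or dataset["instrument"] in instruments
    return mission_ok and instrument_ok


catalog = [  # hypothetical entries, not real CDAWeb dataset IDs
    {"id": "geotail_mag", "mission": "Geotail", "instrument": "Magnetic Fields"},
    {"id": "geotail_efd", "mission": "Geotail", "instrument": "Electric Fields"},
    {"id": "geotail_pla", "mission": "Geotail", "instrument": "Particles"},
    {"id": "wind_mag", "mission": "Wind", "instrument": "Magnetic Fields"},
    {"id": "soho_pla", "mission": "SOHO", "instrument": "Particles"},
]


def select(missions=(), instruments=()):
    """Return the IDs of catalog entries matching the two selections."""
    return [d["id"] for d in catalog
            if matches(d, set(missions), set(instruments))]


print(select(["Geotail"], ["Magnetic Fields"]))
print(select(["Geotail", "Wind"], ["Electric Fields", "Magnetic Fields"]))
```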

CDAWeb DATA SELECTOR

Datasets matching your query are pre-selected in a list and may be unselected by clicking on the checkboxes.
The list includes the dataset name, source name, PI, and affiliation.
Make your selections and press the “submit” button to view the variables/parameters in those datasets.

More information on CDAWeb can be found in the CDAWeb Quick Start Guide.

Autoplot Data analysis tool

Autoplot is an interactive browser for data on the web; give it a URL or the name of a file on your computer and it tries to create a sensible plot of the contents of the file. Autoplot was developed to allow quick and interactive browsing of data and metadata files that are often encountered on the web. Autoplot was developed under the NASA Heliophysics Data & Model Consortium (HDMC) for the Heliophysics program in a collaborative effort among several institutions, including support or code contributions from PDS-PPI Node, RBSP-ECT, and the Radio and Plasma Wave Group at The University of Iowa.

Programming Language Examples:

Python

See Example on Creating CDF in SpacePy

The following example reads a CDF into an xarray Dataset using the cdflib Python library:

$ pip install cdflib xarray
$ python3

>>> # Import required module
>>> import cdflib
>>> 
>>> # Read CDF file into Xarray Dataset object and display variables and global attributes
>>> data1 = cdflib.cdf_to_xarray("soho_celias-pm_30s_20200101_v02.cdf", to_unixtime=True, fillval_to_nan=True)
>>> data1
<xarray.Dataset>
Dimensions:    (Epoch: 2112, dim0: 3)
Coordinates:
  * Epoch      (Epoch) float64 1.578e+09 1.578e+09 ... 1.578e+09 1.578e+09
Dimensions without coordinates: dim0
Data variables:
    V_p        (Epoch) float32 320.0 315.0 323.0 326.0 ... 332.0 327.0 331.0
    N_p        (Epoch) float32 5.32 5.62 4.99 4.78 6.09 ... 5.0 4.88 5.08 4.93
    Vth_p      (Epoch) float32 19.0 19.0 19.0 19.0 18.0 ... 19.0 19.0 19.0 19.0
    NS_angle   (Epoch) float32 0.5 0.9 1.4 1.8 1.2 0.4 ... -0.0 -0.0 0.3 0.3 0.2
    V_He       (Epoch) float32 321.0 317.0 324.0 328.0 ... 334.0 328.0 333.0
    CRN        (Epoch) uint16 2225 2225 2225 2225 2225 ... 2225 2225 2225 2225
    GSE_POS    (Epoch, dim0) float32 199.1 -79.5 8.0 199.1 ... 198.4 -77.0 8.5
    label_GSE  (dim0) <U7 'X (GSE)' 'Y (GSE)' 'Z (GSE)'
    HC_RANGE   (Epoch) float32 145.8 145.8 145.8 145.8 ... 145.8 145.8 145.8
    HG_LAT     (Epoch) float32 -2.9 -2.9 -2.9 -2.9 -2.9 ... -3.0 -3.0 -3.0 -3.0
    HG_LONG    (Epoch) float32 71.0 71.0 71.0 71.0 71.0 ... 57.9 57.9 57.9 57.9
Attributes: (12/22)
    Project:                     ISTP>International Solar-Terrestrial Physics
    Source_name:                 SOHO>Solar Heliospheric Observatory
    Discipline:                  Solar Physics>Heliospheric Physics
    Data_type:                   30S>30 second resolution
    Descriptor:                  CELIAS-PM>Proton Monitor
    Data_version:                2
    ...                          ...
    Instrument_type:             Plasma and Solar Wind
    LINK_TEXT:                   SOHO CELIAS-PM 30 second data available at
    LINK_TITLE:                  the SOHO Archive
    HTTP_LINK:                   https://soho.nascom.nasa.gov/data/archive.html
>>>
>>> # Access global attributes
>>> # data1.attrs['Project']
>>> data1.Project
'ISTP>International Solar-Terrestrial Physics'
>>>
>>> # Display variable and its attributes 
>>> # data1.data_vars['V_p']
>>> data1.V_p
<xarray.DataArray 'V_p' (Epoch: 2112)>
array([320., 315., 323., ..., 332., 327., 331.], dtype=float32)
Coordinates:
  * Epoch    (Epoch) float64 1.578e+09 1.578e+09 ... 1.578e+09 1.578e+09
Attributes: (12/14)
    CATDESC:        Proton speed, scalar
    FIELDNAM:       Proton speed
    LABLAXIS:       Proton V
    FILLVAL:        [-1.e+31]
    VALIDMIN:       [0.]
    VALIDMAX:       [10000.]
    ...             ...
    DEPEND_0:       Epoch
    FORMAT:         F7.0
    DISPLAY_TYPE:   time_series
    standard_name:  Proton speed
    long_name:      Proton V
    units:          km/s
>>>
>>> # Access variable data
>>> # data1.data_vars['V_p'].data
>>> data1.V_p.data
array([320., 315., 323., ..., 332., 327., 331.], dtype=float32)
>>>
>>> # Access variable attributes
>>> # data1.V_p.attrs['CATDESC']
>>> data1.V_p.CATDESC
'Proton speed, scalar'

IDL®

This example creates a CDF file, writes a variable holding data, and adds global and variable attributes holding metadata. It assumes that the IDL variable epoch (the time values to write) and its length n_epoch have already been defined.

IDL> ; Create new CDF file, erase if the file already exists
IDL> cdf_id = CDF_CREATE('example.cdf', /CLOBBER)
IDL> ; Create CDF global attribute called Project, and write a string
IDL> att1_id = CDF_ATTCREATE(cdf_id, 'Project', /GLOBAL_SCOPE)
IDL> CDF_ATTPUT, cdf_id, 'Project', 0, 'ISTP>International Solar-Terrestrial Physics', /CDF_CHAR 
IDL> ; Create CDF zVariable called Epoch of type CDF_EPOCH, and write data from IDL variable called epoch
IDL> var1_id = CDF_VARCREATE(cdf_id, 'Epoch', /REC_VARY, ALLOCATE=n_epoch, /CDF_EPOCH, /ZVARIABLE)
IDL> CDF_VARPUT, cdf_id, 'Epoch', epoch
IDL> ; Create CDF variable attribute called CATDESC, and write a string for CDF variable Epoch
IDL> att2_id = CDF_ATTCREATE(cdf_id, 'CATDESC', /VARIABLE_SCOPE)
IDL> CDF_ATTPUT,  cdf_id, 'CATDESC', 'Epoch', 'Time, number of milliseconds since 01-Jan-0000 00:00:00.000', /CDF_CHAR
IDL> ; Close CDF file
IDL> CDF_CLOSE, cdf_id

MATLAB®

MATLAB® supports CDF reading and writing with two groups of modules: cdflib and cdfread/cdfinfo/cdfwrite. The module cdflib enables creating, reading, and writing portions of CDF variables, while cdfread/cdfwrite read and write whole variables. Each release of MATLAB® is based on a particular version of CDF; for instance, MATLAB® version 2018a is based on CDF V3.6.1. Although this CDF version includes the newer types CDF_INT8 and CDF_TIME_TT2000, the MATLAB® modules still do not support them. The CDF patch provides modified and expanded capabilities of the original cdfread/cdfinfo/cdfwrite from MATLAB®.

Times in CDF are generally stored in one of three forms: CDF_EPOCH (8-byte floating-point milliseconds from 0 AD), CDF_EPOCH16 (two 8-byte floats: seconds from 0 AD and picoseconds within that second), and CDF_TIME_TT2000 (8-byte integer of nanoseconds from J2000, i.e., noon on 1 January 2000 in Terrestrial Time). SPDF’s spdfcdfread can handle all time types, and can optionally convert their values to MATLAB®’s datenum. SPDF’s spdfcdfread also provides cdfepoch and cdftt2000 objects (more accurate but not as efficient as MATLAB®’s datenum).
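The arithmetic behind CDF_EPOCH and CDF_TIME_TT2000 does not depend on MATLAB® and can be sketched in a few lines of Python. This is a rough illustration only: the TT2000 helper hard-codes the 37 leap seconds in effect since 2017-01-01 and rejects earlier dates, whereas the CDF library's conversion routines carry the full leap-second table and should be used for real data. The function names are hypothetical.

```python
from datetime import datetime

# 719,528 days separate 0000-01-01 and 1970-01-01 in the proleptic calendar.
MS_FROM_YEAR0_TO_1970 = 62_167_219_200_000
J2000 = datetime(2000, 1, 1, 12, 0, 0)
# TT - UTC = 32.184 s + 37 leap seconds; assumed valid for UTC dates from
# 2017-01-01 until a further leap second is introduced.
TT_MINUS_UTC_NS = 69_184_000_000


def to_cdf_epoch(utc):
    """CDF_EPOCH: milliseconds since 0000-01-01T00:00:00, as a float."""
    delta = utc - datetime(1970, 1, 1)
    ms = (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds / 1000
    return float(MS_FROM_YEAR0_TO_1970 + ms)


def to_tt2000(utc):
    """CDF_TIME_TT2000: integer nanoseconds since J2000 (sketch, 2017 onward)."""
    if utc < datetime(2017, 1, 1):
        raise ValueError("this sketch only handles UTC dates from 2017-01-01")
    delta = utc - J2000
    ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1000
    return ns + TT_MINUS_UTC_NS


t = datetime(2020, 1, 1)
print(to_cdf_epoch(t))
print(to_tt2000(t))
```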

If users get error messages reading a CDF file while using the CDF package/module distributed with MATLAB®, they should try the patch from the CDF home page. The patch includes many user-suggested enhancements and features. It works only for MATLAB® version R2007a and later.

For additional programming language examples, please see the Quick Start Guide.