This page describes the structure of the files for a new underlying ("BINFIL3") binary mode for the EDSS/Models-3 I/O API, to supplement the existing (and default) netCDF-file mode, the in-memory BUFFERED mode, and the PVM-based virtual mode. Since this mode uses the native machine binary representation as its underlying data-representation layer, it should offer somewhat greater performance than the machine-independent lower layers (netCDF, PVM) for applications where I/O performance is critical. On the other hand, it is very desirable to keep the header metadata in a portable format, so that user-level programs can still read the data on binary-incompatible platforms and perform the appropriate data conversion themselves. For this reason, header metadata is stored in the portable formats described below.

The sequence of data structures in these files is modeled somewhat after the structure of netCDF files, although the mechanisms for storing some of the metadata in a machine-independent fashion are to some extent borrowed from ideas found in other formats, e.g., GRIB.
Restrictions and Limitations
- Initially, the supported platforms are ones with UNIXoid Fortrans (as listed below), but neither Win32 nor Cray. Of these two, Cray is the more difficult (made more so by the fact that I no longer have access to one of their systems...)
  - OSF/Alpha from DEC^H^H^HCompaq^H^H^H^H^H^H HP
  - HP/UX
  - IBM AIX
  - Sun
  - SGI
  - Linux
    - x86 with gcc/g77, gcc/lf95, pgcc/pgf90, gcc/pgf90, or icc/ifc;
    - Alpha with gcc/g77 or cc/fort;
    - ia64 with gcc/g77 or ecc/efc;
    - [PPC970 with either gcc/g77, gcc and Absoft f90, or IBM xlc/xlf should not be difficult but hasn't been done yet]
  - [Mac OS-X with either gcc/g77 or xlc/xlf should not be difficult but hasn't been done yet either, AFAIK]
- Initially, the supported data types are those needed for current air quality modeling (excluding the grid-nest and stream-hydrology data types):
  - CUSTOM3
  - GRDDED3
  - BNDARY3
  - IDDATA3
  - PROFIL3
  - SMATRX3
- Initially, the following two I/O routines (as far as I know, unused) are not supported:
  - READ4D
  - WRITE4D
Remarks on Implementation Strategy
- Implementation is in C, interfacing to Fortran in the same manner as the rest of the I/O API C code.
- Uses C stdio, and in particular uses fseeko() for seeks (instead of fseek()), in order to interoperate with large file systems (implies Linux glibc version > 2.0).
- Implementation is in file iobin3.c.
- INIT3 calls INITBIN3.
- FLUSH3 calls and other required disk synchronizations use the new routine SYNCFID, which unifies calls to FLUSHBIN3 and NF_SYNC.
- For BINFIL3 files,
  - CRTFIL3 calls CRTBIN3
  - OPNFIL3 calls OPNBIN3
  - RDTFLAG calls RDBFLAG
  - WRTFLAG calls WRBFLAG
  - RDVARS calls RDBVARS
  - WRVARS calls WRBVARS
  - XTRACT3 calls XTRBIN3
  - CLOSE3 calls CLOSEBIN3
- OPNLOG3 (called from OPEN3) now logs the implementation layer used.
- SHUT3 does a sequence of CLOSEBIN3 calls.
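The reason for fseeko() is that its offset argument is an off_t rather than a long, so a file position beyond 2 GB can be expressed even where long is 32 bits. The following is a minimal sketch, not code from iobin3.c: seek_time_step() is a hypothetical helper, and it assumes a fixed-size header followed by fixed-size time-step records. Defining _FILE_OFFSET_BITS as 64 before any system header is the glibc convention for getting a 64-bit off_t on 32-bit hosts.

```c
#define _FILE_OFFSET_BITS 64     /* glibc: 64-bit off_t even on 32-bit hosts */
#define _POSIX_C_SOURCE 200112L  /* expose fseeko() and ftello() */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: position fp at the start of time step `step`
   (1-based), assuming a header of header_bytes followed by records of
   record_bytes each. The arithmetic is done in off_t, not int or long,
   so offsets past 2 GB do not overflow. Returns 0 on success. */
static int seek_time_step(FILE *fp, off_t header_bytes,
                          off_t record_bytes, long step)
{
    off_t where = header_bytes + (off_t) (step - 1) * record_bytes;
    return fseeko(fp, where, SEEK_SET);
}
```

For example, with a 100-byte header and 40-byte records, seek_time_step(fp, 100, 40, 3) leaves fp at byte 180.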
The following representations of primitive data types of significance to the I/O API are used to store metadata in a portable fashion (so that the metadata can be interpreted on platforms other than the originating platform) in I/O API BINFIL3 files. In principle, this lets the application programmer use the BINFIL3 layer of the I/O API to read the data on any platform, determine the transformations necessary to interpret it on his platform, and then perform the transformations on the data and use it.
INT4
- represented by a 4-byte string, in little-endian order:
  - BYTE_0(X) contains (unsigned char)(X & 255), i.e., the least significant byte of X
  - BYTE_1(X) contains (unsigned char)((X/256) & 255)
  - BYTE_2(X) contains (unsigned char)((X/65536) & 255)
  - BYTE_3(X) contains (unsigned char)((X/16777216) & 255)
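A minimal C sketch of encoding and decoding INT4 values follows; put_int4() and get_int4() are hypothetical names, not routines from iobin3.c. Instead of the division form above, it shifts an unsigned copy of the value. That yields the same bytes for non-negative X, and well-defined two's-complement bytes for negative X (for which C division truncates toward zero, so the division form would not produce them).

```c
#include <stdint.h>

/* Encode x as a portable INT4: out[0] is the least significant byte,
   independent of the host's native byte order. */
static void put_int4(unsigned char out[4], int32_t x)
{
    uint32_t u = (uint32_t) x;   /* well-defined modulo 2^32 conversion */
    out[0] = (unsigned char) ( u        & 255u);
    out[1] = (unsigned char) ((u >> 8)  & 255u);
    out[2] = (unsigned char) ((u >> 16) & 255u);
    out[3] = (unsigned char) ((u >> 24) & 255u);
}

/* Decode a portable INT4 back into a native 32-bit integer. */
static int32_t get_int4(const unsigned char in[4])
{
    uint32_t u = (uint32_t) in[0]
               | ((uint32_t) in[1] << 8)
               | ((uint32_t) in[2] << 16)
               | ((uint32_t) in[3] << 24);
    if (u <= INT32_MAX)
        return (int32_t) u;
    /* Negative value: rebuild it without an implementation-defined cast. */
    return -(int32_t) (~u) - 1;
}
```

For example, put_int4(b, 258) stores the bytes 2, 1, 0, 0, and get_int4(b) returns 258.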
REAL
- represented by a character string formatted with a format equivalent to the Fortran FORMAT 1PE15.9, followed by a trailing ASCII NUL
DOUBLE
- represented by a character string formatted as 1PD27.19, followed by a trailing ASCII NUL
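Reading these fields back in C needs one extra step, because strtod() does not accept the Fortran D exponent letter. The sketch below uses hypothetical helpers put_double() and get_double(), and assumes a DOUBLE field occupies 28 bytes (27 formatted characters plus the NUL) and a REAL field 16 bytes; it is not the I/O API's own code. The writer uses C's %27.19E conversion, which gives the same layout as 1PD27.19 (one digit before the point, 19 after) but with an E exponent letter, so the reader accepts either letter.

```c
#include <stdio.h>
#include <stdlib.h>

#define DOUBLE_FIELD 28   /* assumed: 27 characters from 1PD27.19 + NUL */

/* Format x like 1PD27.19, right-justified in 27 characters and
   NUL-terminated. C writes 'E' where Fortran D editing writes 'D'. */
static void put_double(char out[DOUBLE_FIELD], double x)
{
    snprintf(out, DOUBLE_FIELD, "%27.19E", x);
}

/* Parse a REAL or DOUBLE field of at most `width` bytes. D/d exponent
   letters are changed to E so that strtod() understands them.
   Does not handle the Fortran form that omits the exponent letter when
   |exponent| > 99 (e.g. 1.0-100). */
static double get_double(const char *field, size_t width)
{
    char buf[64];
    size_t n = 0;
    while (n < width && n < sizeof buf - 1 && field[n] != '\0') {
        char c = field[n];
        buf[n++] = (c == 'D' || c == 'd') ? 'E' : c;
    }
    buf[n] = '\0';
    return strtod(buf, NULL);   /* skips the leading blanks itself */
}
```

Because %27.19E carries 20 significant digits, more than the 17 a double needs, put_double() followed by get_double() should round-trip exactly with a correctly rounding strtod(). A REAL field is parsed the same way and then converted to float by the caller.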
NAME
- Equivalent to a Fortran CHARACTER*16 type (fixed-length 16-byte string, padded on the right by blanks; not NUL-terminated as a C string would be)
LINE
- Equivalent to a Fortran CHARACTER*80 type (fixed-length 80-byte string, padded on the right by blanks)
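Because NAME and LINE fields are blank-padded rather than NUL-terminated, C code has to convert them explicitly. A minimal sketch with hypothetical helpers put_fixed() and get_fixed(): on the way out they mimic a Fortran CHARACTER assignment (truncate or pad with blanks), and on the way in they strip trailing blanks.

```c
#include <string.h>

#define NAMLEN 16   /* NAME: CHARACTER*16 */
#define LINLEN 80   /* LINE: CHARACTER*80 */

/* Copy C string s into a blank-padded field of `width` bytes (no NUL),
   truncating if s is too long. */
static void put_fixed(char *field, size_t width, const char *s)
{
    size_t n = strlen(s);
    if (n > width)
        n = width;
    memcpy(field, s, n);
    memset(field + n, ' ', width - n);
}

/* Copy a blank-padded field into a NUL-terminated C string, dropping
   trailing blanks. `out` must hold at least width + 1 bytes. */
static void get_fixed(char *out, const char *field, size_t width)
{
    size_t n = width;
    while (n > 0 && field[n - 1] == ' ')
        n--;
    memcpy(out, field, n);
    out[n] = '\0';
}
```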
STRING
- Equivalent to the Mac Fortran internal representation of a Fortran CHARACTER*(*) variable (with blank-padding on the right), i.e., as a C "struct hack":
  struct {
      INT4 length;
      char contents[ length ];
  } ;
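Reading a STRING means decoding its INT4 length before the contents can be copied. The sketch below uses a hypothetical get_string() that works on a buffer already read into memory; it checks the decoded length against the bytes actually available, so a corrupt length cannot cause an over-read.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader for a STRING: a portable little-endian INT4 length
   followed by that many bytes. Returns a malloc'd NUL-terminated copy
   (caller frees) and sets *used to the bytes consumed, or returns NULL
   if the buffer is too short or memory runs out. */
static char *get_string(const unsigned char *buf, size_t avail, size_t *used)
{
    if (avail < 4)
        return NULL;
    uint32_t len = (uint32_t) buf[0]
                 | ((uint32_t) buf[1] << 8)
                 | ((uint32_t) buf[2] << 16)
                 | ((uint32_t) buf[3] << 24);
    if (len > avail - 4)          /* truncated record or bad length */
        return NULL;
    char *s = malloc((size_t) len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, buf + 4, len);
    s[len] = '\0';
    *used = 4 + (size_t) len;
    return s;
}
```

Since the length is read as unsigned, a negative length in a damaged file shows up as a huge value and is rejected by the same bounds check.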
The structure of a BINFIL3 file is as follows:

Header Section
- Machine/Compiler Architecture Metadata
  - INT4 IOAPI_VRSN: I/O API version
  - INT4 BYTE_ORDER: byte order, i.e., the C subscripts at which BYTE_0, BYTE_1, BYTE_2, BYTE_3 would occur if we think of an integer as a C union:
    union { int idata; char cdata[4]; } ;
  - INT4 INTSIZE: size of Fortran "INTEGER"
  - INT4 REALSIZE: size of Fortran "REAL"
  - INT4 DBLESIZE: size of Fortran "DOUBLE PRECISION"
- Per-File Metadata
  - NAME GRIDNAME: grid name
  - NAME UPDATE_NAME: name of the last program writing to the file
  - LINE EXECUTION: value of environment variable EXECUTION_ID
  - LINE FILE_DESC[ MXDESC3=60 ]: array containing the file description (set by the programmer during OPEN3())
  - LINE UPDATE_DESC[ MXDESC3=60 ]: array containing the run description, from the file with logical name SCENFILE
- Dimension/Type Metadata
  - INT4 FTYPE: file data type: CUSTOM3, GRDDED3, BNDARY3, IDDATA3, PROFIL3, or SMATRX3
  - INT4 GDTYP: map projection type:
    LATGRD3=1 (Lat-Lon),
    LAMGRD3=2 (Lambert conformal conic),
    MERGRD3=3 (general tangent Mercator),
    STEGRD3=4 (general tangent stereographic),
    UTMGRD3=5 (UTM, a special case of Mercator),
    POLGRD3=6 (polar secant stereographic),
    EQMGRD3=7 (equatorial secant Mercator), or
    TRMGRD3=8 (transverse secant Mercator)
  - INT4 VGTYP: vertical coordinate type:
    VGSGPH3=1 (hydrostatic sigma-P),
    VGSGPN3=2 (nonhydrostatic sigma-P),
    VGSIGZ3=3 (sigma-Z),
    VGPRES3=4 (pressure (mb)),
    VGZVAL3=5 (Z (m above sea level)), or
    VGHVAL3=6 (H (m above ground))
  - INT4 NCOLS: number of grid columns
  - INT4 NROWS: number of grid rows
  - INT4 NLAYS: number of layers
  - INT4 NTHIK: for BNDARY3 files, perimeter thickness (cells), or for SMATRX3 files, number of matrix-columns (unused for other file types)
- Temporal Metadata
  - INT4 SDATE: starting date, coded YYYYDDD according to Models-3 conventions
  - INT4 STIME: starting time, coded HHMMSS according to Models-3 conventions
  - INT4 TSTEP: time step, coded HHMMSS according to Models-3 conventions
  - INT4 NRECS: current number of time-step records in the file (1-based Fortran-style counting)
- Spatial Metadata
  - DOUBLE P_ALPHA: first map projection descriptive parameter
  - DOUBLE P_BETA: second map projection descriptive parameter
  - DOUBLE P_GAMMA: third map projection descriptive parameter
  - DOUBLE X_CENTER: longitude of the Cartesian map projection coordinate origin (location where X=Y=0)
  - DOUBLE Y_CENTER: latitude of the Cartesian map projection coordinate origin (location where X=Y=0)
  - DOUBLE X_ORIGIN: Cartesian X-coordinate of the lower left corner of the (1,1) grid cell (map units)
  - DOUBLE Y_ORIGIN: Cartesian Y-coordinate of the lower left corner of the (1,1) grid cell (map units)
  - DOUBLE X_CELLSIZE: X-coordinate cell dimension (map units)
  - DOUBLE Y_CELLSIZE: Y-coordinate cell dimension (map units)
  - REAL VGTOP: model top, for sigma vertical-coordinate types
  - REAL VGLEVELS[0:NLAYS+1]: array of vertical coordinate level values; level 1 of the grid goes from vertical coordinate VGLEVELS[0] to VGLEVELS[1], etc.
- Per-Variable Metadata
  - NAME VNAME[ NVARS ]: array of variable names
  - NAME UNITS[ NVARS ]: array of units or 'none'
  - LINE VDESC[ NVARS ]: array of variable descriptions
  - INT4 VTYPE[ NVARS ]: array of variable types:
    M3BYTE = 1
    M3INT = 4
    M3REAL = 5
    M3DBLE = 6
- Additional attributes
  - Eventually: TBD, as necessary for the WRF extensions placed in I/O API Version 2.2. At this point, we anticipate that the implementation will be in terms of a sequence of <name-type-value> triplets. Not implemented at this time.

Data Section
- sequence of time step records

Time step Contents:
- Time Step Header
  - INT4 FLAGS[2,NVARS]: array of data-availability flags (with Fortran-style left-major, 1-based subscripting):
    - FLAGS[1,V] are the dates for the data record, encoded YYYYDDD
    - FLAGS[2,V] are the times for the data record, encoded HHMMSS
    - FLAGS[1,V] and FLAGS[2,V] are in consecutive memory/disk locations.
    (NOTE: This amount of data is not functionally necessary; however, it is included for historical reasons involving the convenience of visualization-system programmers.)
- array of data records, subscripted by variable 1, ..., NVARS:
  - <type> array of data for this variable and time step. Data is in native machine binary format.
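Because every time-step record has the same layout, the position of any variable's data can be computed rather than searched for. The sketch below is a hypothetical helper, data_offset(), that rests on assumptions not spelled out in the layout above: the total size of the header section is already known, records contain no padding, the FLAGS array takes 8 bytes per variable (two INT4 values), every variable has the same number of elements per time step (e.g., NCOLS*NROWS*NLAYS for GRDDED3), and an M3BYTE element occupies one byte. flag_index() shows where FLAGS(i,V) lands in a flat C array under the Fortran-style left-major subscripting described above.

```c
#include <stddef.h>
#include <stdint.h>

/* VTYPE codes from the per-variable metadata. */
enum { M3BYTE = 1, M3INT = 4, M3REAL = 5, M3DBLE = 6 };

/* Bytes per element for a VTYPE code, using the header's INTSIZE,
   REALSIZE and DBLESIZE (M3BYTE assumed to be 1 byte).
   Returns 0 for an unrecognized code. */
static size_t elem_size(int vtype, int intsize, int realsize, int dblesize)
{
    switch (vtype) {
    case M3BYTE: return 1;
    case M3INT:  return (size_t) intsize;
    case M3REAL: return (size_t) realsize;
    case M3DBLE: return (size_t) dblesize;
    default:     return 0;
    }
}

/* 0-based position of Fortran FLAGS(i,v) (both 1-based) in a flat array:
   the first subscript varies fastest, so the date (i=1) and time (i=2)
   of variable v are adjacent. */
static size_t flag_index(int i, int v)
{
    return (size_t) (v - 1) * 2 + (size_t) (i - 1);
}

/* Hypothetical: byte offset from the start of the file of the data for
   variable v in time step t (both 1-based). */
static int64_t data_offset(int64_t header_bytes, int t, int v,
                           int nvars, const int vtype[], int64_t nelems,
                           int intsize, int realsize, int dblesize)
{
    int64_t flags_bytes = 8 * (int64_t) nvars;  /* 2 INT4 flags per variable */
    int64_t step_bytes  = flags_bytes;          /* size of one time-step record */
    int64_t before_v    = 0;                    /* data bytes of variables 1..v-1 */

    for (int w = 0; w < nvars; w++) {
        int64_t bytes = nelems *
            (int64_t) elem_size(vtype[w], intsize, realsize, dblesize);
        step_bytes += bytes;
        if (w < v - 1)
            before_v += bytes;
    }
    return header_bytes + (int64_t) (t - 1) * step_bytes + flags_bytes + before_v;
}
```

With NVARS = 2 (one REAL and one DOUBLE PRECISION variable, 10 elements each, 4-byte REAL, 8-byte DOUBLE PRECISION) and a 1000-byte header, each record is 16 + 40 + 80 = 136 bytes, and the second variable of the second time step starts at byte 1000 + 136 + 16 + 40 = 1192.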
INTEGER IOAPI_VERSION
INTEGER IMPL_LAYER
INTEGER BYTE_ORDER
INTEGER INTEGER_SIZE
INTEGER REAL_SIZE
INTEGER DBLE_SIZE
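The BYTE_ORDER header field records where each byte of an integer sits on the writing host, which is what a reader needs in order to rearrange native binary data. The sketch below shows one way a reader might find its own host's byte positions through the union described in the header section, and then reorder a 4-byte value from the writer's order into its own. native_byte_positions() and reorder4() are hypothetical names; the sketch assumes the writer's positions have already been decoded into an int[4], since how the four subscripts are packed into the single INT4 BYTE_ORDER value is not spelled out here. It handles only 4-byte values and does not address differences in floating-point format.

```c
#include <stdint.h>

/* pos[k] receives the C subscript at which BYTE_k (k = 0 is least
   significant) of a 4-byte integer lives on this host. */
static void native_byte_positions(int pos[4])
{
    union { int32_t idata; unsigned char cdata[4]; } u;
    u.idata = 0x03020100;          /* BYTE_k holds the value k */
    for (int i = 0; i < 4; i++)
        pos[u.cdata[i]] = i;       /* byte k was found at subscript i */
}

/* Move one 4-byte value from the writer's byte positions wpos[] into
   the reader's byte positions rpos[]. */
static void reorder4(unsigned char out[4], const unsigned char in[4],
                     const int wpos[4], const int rpos[4])
{
    for (int k = 0; k < 4; k++)
        out[rpos[k]] = in[wpos[k]];
}
```

On a little-endian host native_byte_positions() yields {0, 1, 2, 3}, and on a big-endian host {3, 2, 1, 0}; when the writer's and reader's positions agree, reorder4() is a plain copy.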