The exercise of "wrapperizing" the Models-3/EDSS I/O API (M3IO) within the WRF I/O API (WRFIO) is dominated by just a few issues:
- different approaches to dataset/variable/metadata definition
- data distribution issues
- date&time sequencing assumptions
- similar/compatible approaches to field I/O (albeit with somewhat different allocation of responsibility for gather/scatter)
- minor data structure differences
These issues are quite resolvable. This paper gives a suggested implementation of the M3IO external I/O package within WRFIO, along with the metadata dictionary needed by the WRF "configure/build" system, additional requirements that this places on WRFIO, thoughts for an enhanced M3IO, and suggestions for augmenting WRFIO.
The current plan has two stages:
- to construct a restricted prototype WRFIO ext_m3io package on top of a minimally modified version of M3IO, for use with the Summer 2002 air quality forecast effort; and
- to construct an enhanced version of the Models-3/EDSS I/O API (M4IO), together with a full-function WRFIO ext_m4io package built on top of it.
As this document is being written, the effort to implement the first stage is underway. This effort is bearing fruit in terms of finding "holes", ambiguities, and other difficulties in the WRFIO specification.
M3IO takes the point of view that the API should encapsulate state to the greatest degree possible, and avoid shared state between itself and its callers. For example, a dataset must be defined in one atomic operation, in which the user fills in all the fields in a dataset definition and then calls the OPEN3() routine. Similarly, to get a file's metadata, one makes a single DESC3() call and is returned the complete description of a dataset, together with all its metadata. As is appropriate for providing the level of commonality and integrity required of an API used for multi-model environmental modeling applications, for model coupling, and for easy callability from legacy Fortran-77 applications, the set of metadata required is fixed by a prescriptive standard; it is an error not to supply the complete set of metadata needed by OPEN3() during data set creation.
On the other hand, the WRFIO paradigm is to supply tools that allow the definition of an open-ended set of metadata, at the modeler's discretion, but with the added complication of considerable shared state between WRFIO and its callers. The ext_pkg_get|put*() routines in WRFIO allow the user to define a dataset incrementally by means of repeated calls. Provided that the "configure/build" system can ensure that all of the definitions expected by M3IO are made (between calls to ext_pkg_open_for_write_begin() and ext_pkg_open_for_write()), there would seem to be no conceptual difficulty in wrapperizing M3IO within WRFIO. A complete dictionary of the metadata which must be set in order to completely define an M3IO-implemented data set is given in the section below, Required-Attribute Dictionary.
Ensuring that the "configure/build" system generates all the required definitions and metadata attributions during data set creation is a fundamental constraint of this "wrapperizing" approach.
One problem with this open-ended operational paradigm is the lack of routines -- within both WRFIO and M3IO as they presently exist -- for inquiring about the existence, names, and types of metadata that have been stored in a WRF data set, as we describe in the section, Additional Requirements, below.
A minor implementation issue is that M3IO only supports input and output of fields of types INTEGER, REAL, and REAL8, whereas WRF also has fields of type LOGICAL. It will be very simple to add support for the additional LOGICAL type required by WRF, and it has recently become understood how to add the CHARACTER(LEN=<n>) types requested by EPA for emissions and observational data. Changes for these new types will not break backwards data, source, or link compatibility of M3IO.
A larger issue is what to do with attributes not expected by M3IO (which has a fixed list of the attributes it expects, as documented below in the section Required-Attribute Dictionary); we conclude that the wrapper should go ahead and read or write these attributes at the raw netCDF (or PVM or other) lower I/O level, so that the resulting data sets are "M3IO-plus", with additional metadata not maintained by M3IO proper. (Due to the selective direct access nature of the underlying netCDF and PVM M3IO-implementation layers, the existence of additional attributes not accessed by the existing system is irrelevant to that system.) This WRF requirement, implied by the open-ended set of "put attribute" calls, could lead to an augmentation of the existing M3IO for the creation, inquiry, and access to additional metadata within M3IO datasets (probably the cleanest way to implement this WRF requirement). This issue, as well as the previous type extension issue, is discussed further below in the section, Enhancements to the Models-3 I/O API.
Another issue is that of time-dependent metadata, implied by the ext_pkg_put_*_td_type() routines: netCDF does not have provisions for implementing time dependent metadata, and consequently neither does M3IO. Presently, these are not used by the WRF model itself, and the draft document indicates "not yet implemented". One way to implement these is as auxiliary variables in the data set; doing so puts some additional constraints on WRFIO, as we describe below in the section, Additional Requirements.
A final issue is that of WRF "stagger": for M3IO files, all variables have the same grid structure, so that (for an MM5/MAQSIP example) the dot point variables (for which WRF stagger is "XY") inhabit different files from the cross point variables (for which WRF stagger is none, coded "-"). The three obvious potential solutions to this problem are (1) to extend M3IO to support stagger; (2) to dimension files to the extent of "XY" stagger, pad all output variables out to the extent of those dimensions (and subset on input), and add extra per-variable attributes describing the stagger-padding, with Z-staggered variables being written both to the 2-D and the 3-D M3IO data sets that represent a particular WRFIO data set; or (3) to let a WRFIO data set be implemented by multiple files, one for each stagger type. We recommend alternative (1) below, in the section, Enhancements to the Models-3 I/O API.
Due to the difficulties of maintaining the data structures necessary to store some indefinite number of additional non-standard metadata of multiple unknown types and sizes, the initial M3IO implementation of WRFIO to be used for the Summer 2002 air quality forecasting will not support any metadata except per-variable stagger and those metadata already standard for M3IO.
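The padding alternative (2) for stagger can be illustrated with a small Python sketch. It is not part of either API: the function names, the nested-list field layout, and the fill value are hypothetical, and it assumes the MM5-style convention that an unstaggered dimension is one cell shorter than the corresponding dot-point ("XY") dimension.

```python
FILL = -9.999e36  # hypothetical padding marker; the text does not specify one


def field_extent(stagger, ncols_dot, nrows_dot):
    """Columns and rows actually occupied by a variable with this stagger.

    Assumption: a dimension named in the stagger string has the full
    dot-point extent; any other dimension is one cell shorter.
    """
    s = stagger.upper()  # case is not significant
    ncols = ncols_dot if "X" in s else ncols_dot - 1
    nrows = nrows_dot if "Y" in s else nrows_dot - 1
    return ncols, nrows


def pad_field(field, stagger, ncols_dot, nrows_dot, fill=FILL):
    """Pad a 2-D field (list of rows) out to the dot-point extent for output."""
    ncols, nrows = field_extent(stagger, ncols_dot, nrows_dot)
    if len(field) != nrows or any(len(row) != ncols for row in field):
        raise ValueError("field shape does not match its declared stagger")
    padded = [list(row) + [fill] * (ncols_dot - ncols) for row in field]
    padded += [[fill] * ncols_dot for _ in range(nrows_dot - nrows)]
    return padded


def strip_field(padded, stagger, ncols_dot, nrows_dot):
    """Subset a padded field back to its own extent on input."""
    ncols, nrows = field_extent(stagger, ncols_dot, nrows_dot)
    return [row[:ncols] for row in padded[:nrows]]
```

The per-variable stagger attribute is what lets the input side know how much padding to strip; without it the padded and unpadded cases are indistinguishable in the file.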
The WRFIO ext_pkg_read|write_field() calls map fairly cleanly to M3IO READ3() and WRITE3() calls -- the biggest difference being the domain and patch dimensionality and stagger info contained in the calls, and the added per-level optional selectivity offered by the M3IO READ3(). The WRFIO notions of "output frames" and ext_pkg_get_next_var() (etc.) can easily be implemented by maintaining an internal per-dataset state variable that keeps the current-frame simulation clock and current variable for the dataset.
Initially, we propose to support two modes of operation, based on data distribution -- the modes that do not do distributed gather/scatter operations within WRFIO -- with place-holders in place to extend to a third which does implement internal WRFIO gather/scatter, as described below. (As we understand it, this restriction is consistent with the current functionalities supported by the netCDF implementation, as well.)
- Question 1:
- NetCDF doesn't have a notion of a LOGICAL type for variables and attributes. Is this mapped to netCDF INTEGER, and if so, is this guaranteed to be portable?
- Question 2:
- What is the role of the COM argument to ext_m3io_write_field() and ext_m3io_read_field()? Can it be used to distinguish between the non-gather/scatter first and second modes, and the (internal gather/scatter) third?
- Full-Domain to Full-Domain
- In this scenario, full-domain gathers are performed by the driver layer prior to calling ext_m3io_write_field() or subsequent to ext_m3io_read_field(). There is a single unified full-domain data set produced. This case can be recognized on the M3IO-package side by the fact that the patch-extent arguments are the same as the domain-dimension arguments (and the COM argument is turned off?).
- Distributed to Distributed (same patch decomposition)
- In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on a distributed data set (i.e., with different files for different patches), already partitioned correctly across the computational nodes, for distributed (this-patch) inputs or outputs. For this mode, patch-extent arguments are different from domain-dimension arguments (and the COM argument is turned off?).
- Distributed to Full-Domain
- In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on a full-domain data set and then do gather/scatter operations to generate distributed (this-patch) inputs or outputs. This mode of operation is distinguished by the fact that the COM argument is turned on (?).
- Distributed to Distributed (different patch decompositions)
- In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on a distributed data set (i.e., with different files for different patches) whose patch decomposition is different from the current WRF patch decomposition. We do not intend to support this mode. Is it necessary to detect it, and flag it as an error?
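The mode-recognition rules above can be written down as a small decision function. This Python sketch is illustrative only: the enum and function names are hypothetical, and treating the COM argument as a simple on/off switch is exactly the open question raised in Question 2, so that part is an assumption rather than documented WRFIO behaviour.

```python
from enum import Enum


class IOMode(Enum):
    FULL_TO_FULL = "full-domain to full-domain"
    DIST_TO_DIST = "distributed to distributed (same decomposition)"
    DIST_TO_FULL = "distributed to full-domain (internal gather/scatter)"


def classify_mode(domain_extent, patch_extent, com_active):
    """Pick the I/O mode from the field-call arguments.

    domain_extent, patch_extent: tuples of index bounds as passed to the
    read/write field calls.
    com_active: assumed boolean reading of the COM argument.
    """
    if com_active:
        # Assumed: an active COM requests gather/scatter inside the package.
        return IOMode.DIST_TO_FULL
    if tuple(patch_extent) == tuple(domain_extent):
        # Patch covers the whole domain: the driver already gathered.
        return IOMode.FULL_TO_FULL
    return IOMode.DIST_TO_DIST
```

The fourth scenario (a distributed data set with a different decomposition) cannot be recognized from these arguments alone; detecting it would need the decomposition recorded in the data set itself.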
M3IO clock objects use pairs INTEGER JDATE,JTIME to represent time accurate to (exact integer) seconds, using a coding of YYYYDDD:HHMMSS; time steps are exact integers coded H*MMSS. There is a complete set of routines for date and time manipulation; the entire system is robust with respect to negative time steps and "denormalized" dates and times such as 1999476:-234567 (this represents April 19, 2000 at 00:13:53 -- a date&time 23 hours, 45 minutes, and 67 seconds before the start of the 476-365=111th day of the year 2000). WRFIO clock objects use character(len=??) strings (of various inconsistent lengths in the documentation), documented as accurate to milliseconds and formatted YYYY-MM-DD-HH-MM-SS.SSS. What to do if the millisecond field for an M3IO implementation of WRFIO is nonzero is an open problem; however, it is not unreasonable that for many (or even most) applications, the WRF model time step will be an integer number of seconds, and this issue can be avoided. (On the other hand, there is a request for an enhanced I/O API that maintains an even more accurate notion of clock object, as documented below in the section, Enhancements to the Models-3 I/O API.) With Fortran-90/95 function overloading, one can even make this enhancement transparent to existing applications, as a (hidden-by-F90) M3IO extension. The M3IO direct access mode of operation, and hence its deterministic record-number requirement, can be preserved either by the use of rational-number fraction fields, or by double-precision fraction-fields (which have sufficient numerical precision that millisecond-level error tolerances can be preserved even for very long time step sequences). For the initial M3IO version for the Summer 2002 air quality forecasting effort, we will require exact-integer time steps; for the full M4IO, we will choose one of the enhanced formulations.
An issue arises with respect to time-step sequences. For M3IO, time step sequences are characterized by a starting date, a starting time, and a time step. The correspondence between elements of a time step sequence and "record numbers" is a deterministic one, as made possible by the facts that the relevant arithmetic is exact integer arithmetic and that the time step is constant for the entire time step sequence. This will create a "show-stopper" problem for data sets of high temporal resolution, if a WRF driver were constructed to use an adaptive time step scheme. For coarser output time steps (i.e., substantially larger than the solver time-step), we would resort to the scheme used by the MCPL() output module for MM5 to deal with the non-determinism of pre-Version-3 MM5 time-keeping: find the model time step that contains the desired input or output time step, and thereby keep the temporal error to at most one model time step (probably good enough for hourly input, for example).
A special case of this is the case of time-independent data: presently, I can see no means in WRFIO to define that a variable is time-independent (at least in the documentation to which I have access), nor do I see provision for time independent variables implied in what I read of the Registry. On the other hand, there are a number of variables for which it is clearly worthwhile to note that they are time independent: z, pb, pb8w, msft, msfu, msfv, f, e, sina, cosb, ht, etc.
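The JDATE:JTIME arithmetic and the deterministic record-number rule described above can be sketched in a few lines of Python. This is an illustration of the conventions, not the M3IO library code: all function names are hypothetical, and the 1-based record numbering and the restriction to positive time steps are assumptions made for the sketch.

```python
def days_in_year(year):
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return 366 if leap else 365


def hmmss_to_seconds(t):
    """Signed H*MMSS integer -> signed seconds (the sign covers every field)."""
    sign = -1 if t < 0 else 1
    t = abs(t)
    return sign * ((t // 10000) * 3600 + (t // 100 % 100) * 60 + t % 100)


def seconds_to_hmmss(s):
    sign = -1 if s < 0 else 1
    s = abs(s)
    return sign * ((s // 3600) * 10000 + (s % 3600 // 60) * 100 + s % 60)


def normalize(jdate, jtime):
    """Bring a possibly denormalized YYYYDDD:HHMMSS pair into standard range."""
    year, day = divmod(jdate, 1000)
    secs = hmmss_to_seconds(jtime)
    day += secs // 86400          # floor division borrows days for negatives
    secs %= 86400
    while day < 1:
        year -= 1
        day += days_in_year(year)
    while day > days_in_year(year):
        day -= days_in_year(year)
        year += 1
    return year * 1000 + day, seconds_to_hmmss(secs)


def seconds_between(sdate, stime, jdate, jtime):
    """Exact integer seconds from (sdate, stime) to (jdate, jtime)."""
    sdate, stime = normalize(sdate, stime)
    jdate, jtime = normalize(jdate, jtime)
    y0, d0 = divmod(sdate, 1000)
    y1, d1 = divmod(jdate, 1000)
    days = d1 - d0
    days += sum(days_in_year(y) for y in range(y0, y1))
    days -= sum(days_in_year(y) for y in range(y1, y0))
    return days * 86400 + hmmss_to_seconds(jtime) - hmmss_to_seconds(stime)


def record_number(sdate, stime, tstep, jdate, jtime):
    """Record number for a date&time, or None if it is not on the sequence.

    Assumptions: positive constant time step, records numbered from 1.
    """
    step = hmmss_to_seconds(tstep)
    elapsed = seconds_between(sdate, stime, jdate, jtime)
    if step <= 0 or elapsed < 0 or elapsed % step:
        return None
    return elapsed // step + 1
```

Because every quantity is an exact integer, the same date&time always maps to the same record; an adaptive time step breaks precisely the `elapsed % step` test.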
Another issue is the WRFIO concept of input and output frames. Since M3IO does selective direct access I/O, it naturally performs I/O operations when they are requested, and the sequential-file "frame" concept is not even relevant. As far as I can tell, however, this is an issue only in terms of requiring that the external M3IO package for WRFIO must maintain and update the state variables related to frames (ext_m3io_end_of_frame(DataHandle,Status) must increment the specified data set's date&time, for example).
QUESTION: Given the WRFIO concept of a "current frame" (with its own date&time) as maintained by routines ext_pkg_get_next_time(DataHandle, DateStr, Status), ext_pkg_set_time(DataHandle, DateStr, Status), ext_pkg_end_of_frame(DataHandle, Status), and ext_pkg_get_next_var(DataHandle, VarName, Status), and given that the routines ext_pkg_read_field(DataHandle, DateStr, ...) and ext_pkg_write_field(DataHandle, DateStr, ...) have a possibly different DateStr as one of the arguments, how do these latter two routines interact with the current-frame date&time? Also, how do the first three (dealing with potentially changing the current frame) interact with get_next_var?
As a consequence of its single-operation "build a file from a complete file-definition" mode of operation, and consistent with the integrity requirements for the multi-model nature of environmental modeling applications, it is required that during the "dry-run" file-creation phase, the WRF execute a set of "put-metadata" calls sufficient to complete an M3IO file definition, and specify these metadata by the names recognized by the external_m3io package. It is an error to call ext_open_for_write_commit() without having specified the entire set of metadata for a complete M3IO data set definition.
The dictionaries of required "global" and per-variable metadata are given below.
Question: What is the difference between WRFIO "global metadata" and "domain metadata"? Both have unique per-data-set values, applicable to the entire data set...
The following are global metadata that must be set while the file is still in the definition phase.
In addition, the M3IO runtime system always automatically maintains some additional metadata (such as the creation and update dates and times at the end of the list below), which are available on datasets opened for input.
DOMAIN_NAME (character(len≤32))
PROJ_TYPE (integer)
- Identifier-Token for the horizontal map-projection type, as documented at URL /products/ioapi/GRIDS.html#horiz
- 1: Lat-Lon
- 2: Lambert Conformal Conic
- 3: General Tangent Mercator
- 4: General Stereographic
- 5: Universal Transverse Mercator
- 6: Secant Polar Stereographic
- 7: Equatorial Mercator
- 8: General Transverse Mercator
PROJ_ALPHA (real8)
- First defining angle for the map projection, as documented in the URL above, in degrees
PROJ_BETA (real8)
- Second defining angle for the map projection, as documented in the URL above, in degrees
PROJ_GAMMA (real8)
- Third defining angle for the map projection, as documented in the URL above, in degrees
PROJ_XCENT (real8)
- Longitude of the Cartesian origin for the map projection, as documented in the URL above, in degrees
PROJ_YCENT (real8)
- Latitude of the Cartesian origin for the map projection, as documented in the URL above, in degrees
DOMAIN_XORIG (real8)
- Cartesian X coordinate for the starting (1,1) corner of the domain, as documented in the URL above, in meters.
DOMAIN_YORIG (real8)
- Cartesian Y coordinate for the starting (1,1) corner of the domain, as documented in the URL above, in meters.
DOMAIN_XCELL (real8)
- Cartesian cell-size in the X direction, as documented in the URL above, in meters.
DOMAIN_YCELL (real8)
- Cartesian cell-size in the Y direction, as documented in the URL above, in meters.
DOMAIN_NCOLS (integer)
- Number of dot point columns in the domain
DOMAIN_NROWS (integer)
- Number of dot point rows in the domain
DOMAIN_NTHIK (integer)
- Boundary thickness (# of cells) for the domain; usually +1 for an (unthickened) external boundary, or -1 for an internal boundary.
DOMAIN_NLAYS (integer)
- Number of levels for the domain
DOMAIN_VTYPE (integer)
- Vertical coordinate type for the domain, as documented at URL /products/ioapi/GRIDS.html#horiz and to be extended for WRF below:
- 1: Hydrostatic sigma-P (e.g., for MM4, MM5)
- 2: Non-hydrostatic sigma-P
- 3: Sigma-Z
- 4: Pressure (Pa)
- 5: Altitude Z (M above sea level)
- 6: Height H (M above ground level)
- 7,8,...: New vertical coordinate types to be defined for WRF, as documented below.
DOMAIN_VTOP (real)
- Domain top for sigma coordinates (Pa for Sigma-P, M for Sigma-Z)
DOMAIN_LEVELS (real(0:NLAYS))
- Vertical coordinate values for the level-surfaces. Constraint: Prior establishment of DOMAIN_NLAYS is required.
START_DATETIME (character(len=24) )
- starting date and time, given according to WRF conventions
TIME_STEP (character(len=24) )(?)
- dataset time step, given according to WRF time step conventions (which are TBD?)
NOTE: to the extent that WRF data sets have both time independent and time stepped variables, this will need to be a per-variable attribute that must either be zero (indicating a time independent variable) or have a common value for the entire data set (for the time stepped variables)
PROG_NAME (character(len=*))
- Name of the program creating the data set.
DATASET_DESC (character(len<=4800))
- dataset description (think of this as up to 60 lines of 80 characters each, with NEWLINE=ACHAR(10) as the delimiter)
DOMAIN_NVARS_2D (integer)
- Number of 2-D variables for the domain
DOMAIN_NVARS_3D (integer)
- Number of 3-D variables for the domain
CREATION_DATETIME (character(len=24))
- Date and time (GMT) that the dataset was created, given according to WRF conventions
UPDATE_DATETIME (character(len=24))
- Date and time (GMT) that the dataset was last updated, given according to WRF conventions
EXECUTION_ID (character(len=80) )
- Execution-ID for the program execution that created the dataset, according to Models-3 conventions.
UPDATE_DESC (character(len=4800) )
- Run/Execution-description for the program execution that created the dataset, according to Models-3 conventions.
Constraint 1: For the initial Summer 2002 M3IO air quality forecast implementation, only STAGGER and the standard M3IO global and per-variable metadata, as documented below, will be supported.
Constraint 2: The indicated data set must be in define-mode (i.e., dry-run, prior to commit) in order to set these metadata.
Constraint 3: Before these (or any per-variable) metadata are specified, the variable itself must have been registered by a prior (dry-run) ext_m3io_write_field() call.
VAR_TSTEP (character(len=24) )
- Time step specification for the indicated variable. Must either be 0 (indicating that the variable is time independent), or must agree with global attribute TIME_STEP.
VAR_UNITS (character(len≤32) )
- Units specification for the indicated variable; should be MKS / UDUNITS compliant.
VAR_DESC (character(len≤80) )
- one-line text description of the variable.
VAR_LEVELS (integer)
- Number of layers for the variable (either 1 or else matches DOMAIN_LEVELS).
VAR_TYPE (integer)
- Data type of the variable (INTEGER, LOGICAL, REAL, REAL8).
VAR_STAGGER (character(len≤32) )
- NEW: Stagger specification for the indicated variable; should be one of the following: "", "X", "Y", "Z", "XY", "XZ", "YZ", "XYZ". Case is not significant.
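The interplay between the incremental "put-metadata" calls, the three constraints, and the all-or-nothing commit can be sketched as follows. This Python model is hypothetical throughout: the class and method names are invented for illustration, `register_variable` stands in for the dry-run ext_m3io_write_field() call, and the choice of which global names count as caller-supplied (everything through DATASET_DESC) is an assumption. The status-code names are the ones proposed later in this paper.

```python
REQUIRED_GLOBAL = (
    "DOMAIN_NAME", "PROJ_TYPE", "PROJ_ALPHA", "PROJ_BETA", "PROJ_GAMMA",
    "PROJ_XCENT", "PROJ_YCENT", "DOMAIN_XORIG", "DOMAIN_YORIG",
    "DOMAIN_XCELL", "DOMAIN_YCELL", "DOMAIN_NCOLS", "DOMAIN_NROWS",
    "DOMAIN_NTHIK", "DOMAIN_NLAYS", "DOMAIN_VTYPE", "DOMAIN_VTOP",
    "DOMAIN_LEVELS", "START_DATETIME", "TIME_STEP", "PROG_NAME",
    "DATASET_DESC",
)
REQUIRED_PER_VAR = (
    "VAR_TSTEP", "VAR_UNITS", "VAR_DESC", "VAR_LEVELS", "VAR_TYPE",
    "VAR_STAGGER",
)


class DefinitionError(Exception):
    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code


class DatasetDefinition:
    """Accumulates metadata during the dry-run phase, then commits atomically."""

    def __init__(self):
        self.global_md = {}
        self.var_md = {}
        self.committed = False

    def _require_define_mode(self):
        if self.committed:  # Constraint 2
            raise DefinitionError("WRF_ERR_DEFN_CONSTRAINT", "not in define mode")

    def register_variable(self, var):
        self._require_define_mode()
        self.var_md.setdefault(var, {})

    def put_global(self, name, value):
        self._require_define_mode()
        if name == "DOMAIN_LEVELS" and "DOMAIN_NLAYS" not in self.global_md:
            raise DefinitionError("WRF_ERR_DEFN_CONSTRAINT", "set DOMAIN_NLAYS first")
        self.global_md[name] = value

    def put_var(self, var, name, value):
        self._require_define_mode()
        if var not in self.var_md:  # Constraint 3
            raise DefinitionError("WRF_ERR_DEFN_CONSTRAINT", f"{var} not registered")
        self.var_md[var][name] = value

    def missing(self):
        gaps = [n for n in REQUIRED_GLOBAL if n not in self.global_md]
        gaps += [f"{v}:{n}" for v, md in self.var_md.items()
                 for n in REQUIRED_PER_VAR if n not in md]
        return gaps

    def commit(self):
        gaps = self.missing()
        if gaps:
            raise DefinitionError("WRF_ERR_INCOMPLETE_DS_DEF", ", ".join(gaps))
        self.committed = True  # here the real package would make its one OPEN3() call
```

The point of the design is that nothing reaches M3IO until `commit()`; the wrapper absorbs the incremental WRFIO calls and presents M3IO with the single complete definition it expects.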
I'm not sure of the meaning of WARNING and FATAL in the error-value parameters in the list below; that issue should be re-visited. Moreover, from what I can infer from WRFV1/external/IOAPI, the current list of status codes is altogether too closely tied to the current netCDF implementation of WRFIO.
- NOTE
- The WRF documentation to which I have access describes WRF date&time representation variously as character(len=19), "accurate to exact milliseconds," and "as exemplified by the format 0000-01-00:00:00.0000". These are mutually inconsistent, and this inconsistency needs to be resolved. Moreover, the code and documentation do not consistently specify the field-delimiters; I've coded my wrappers to look for any non-digits.
- Standard representation for time-deltas
- (This may be already present, but I can't find it.) What I would suggest is that time deltas use a format adapted from the date&time representation: something like either <sign>[[H*:]MM:]SS.SSS or <sign>H*:MM:SS.SSS, where the sign, hours, and minutes fields are optional. Note that since "month" and "year" do not have invariant meanings, there should not be month-fields and year-fields in a time step representation.
NOTE: In module_date_time, the routines geth_idts and geth_newdate seem to imply that time-deltas should be integers. This is an error if, as the comments in that very code indicate, date and time representation is "YYYY-MM-DD HH:MM:SS.ffff".
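Both string conventions discussed in the two notes above lend themselves to short parsers. The Python sketch below is illustrative only and not the wrapper code referred to in the text: the function names are hypothetical, the date&time parser splits on any run of non-digits as described, fractional digits beyond milliseconds are truncated (an assumption), and the time-delta parser accepts the suggested <sign>[[H*:]MM:]SS.SSS form with the fraction treated as optional.

```python
import re
from datetime import date


def parse_wrf_datetime(text):
    """Split a WRF date&time string on any non-digit delimiters."""
    fields = re.findall(r"\d+", text)
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 numeric fields in {text!r}")
    year, month, day, hour, minute, second = (int(f) for f in fields[:6])
    frac = fields[6] if len(fields) > 6 else ""
    # Keep milliseconds only; any further digits are dropped.
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return year, month, day, hour, minute, second, millis


def to_jdate_jtime(text):
    """Convert to an M3IO-style (YYYYDDD, HHMMSS) pair.

    Rejects nonzero fractional seconds, matching the exact-integer-seconds
    restriction proposed for the initial implementation.
    """
    year, month, day, hour, minute, second, millis = parse_wrf_datetime(text)
    if millis:
        raise ValueError("fractional seconds are not representable")
    jdate = year * 1000 + date(year, month, day).timetuple().tm_yday
    return jdate, hour * 10000 + minute * 100 + second


_DELTA = re.compile(
    r"^([+-]?)(?:(?:(\d+):)?(\d{1,2}):)?(\d{1,2})(?:\.(\d{1,3}))?$"
)


def parse_time_delta(text):
    """Parse <sign>[[H*:]MM:]SS.SSS into signed integer milliseconds."""
    match = _DELTA.match(text.strip())
    if not match:
        raise ValueError(f"not a valid time delta: {text!r}")
    sign, hours, minutes, seconds, frac = match.groups()
    total = (int(hours or 0) * 60 + int(minutes or 0)) * 60 + int(seconds)
    total_ms = total * 1000 + int((frac or "").ljust(3, "0"))
    return -total_ms if sign == "-" else total_ms
```

Returning integer milliseconds keeps the arithmetic exact, and the absence of month and year fields means a delta always denotes the same number of seconds.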
- Specification of the vertical coordinate type(s)
- There is a list of current candidates. I need to know how many items there are in this list, and what their names are. There also need to be standard parameter tokens for the corresponding WRF vertical coordinate type IDs, which should go in one of the WRF "constants" modules -- which one?
- Provision for Time Independent Variables
- I do not see provision in the documentation, nor in what I interpret of the Registry, for stating that a variable (such as terrain height, map scale factors, map rotation angles, and reference atmosphere) is time independent. There need to be two extensions to the WRF to support this: augmentation of the Registry so that it allows the designation of a variable as time independent or time stepped, and a means within the WRF API to communicate this fact to WRFIO. For this latter, the standard could be that a dry-run call to ext_pkg_write_field() for which the DateStr argument is identically zero (i.e., 00:00:00.0000) designates the corresponding variable as time independent.
- Grid and Map Projection characterization
- Mathematically, the notion of a grid is subordinate to that of a map projection:
A map projection is a mapping from a domain on the surface of the Earth into a rectangle in a two-dimensional Cartesian space R2. The common mesoscale examples -- Lambert conformal conic, polar secant stereographic, equatorial Mercator -- require not only a set of (typically 3) defining angles, but also a Cartesian origin for their specification. It is not necessarily the case that the Cartesian origin falls at an origin "central lat-lon" coincident with the defining angles of the projection; to assume so would seem to be unwarranted. Moreover, at the Meso-Gamma and finer scales targeted by the WRF, there will be a desire (at least for air quality applications) to support the so-called Universal Transverse Mercator (UTM) map projections.
A grid has a map projection and a "lattice structure" defined in terms of that map projection -- typically needing the following, of which I can find only the last two in the Registry:
- starting corner X1, Y1 (e.g., the SW corner of the (1,1)-cell)
- cell-size DX, DY
- dimensions NX, NY
One can not assume that the grid must be centered relative to the Cartesian origin of the map projection, especially as one goes to smaller scales (and there can be particular modeling advantages to having the grid off-center, if by so doing one can line up a coordinate axis with the prevailing winds of the domain modeled).
As far as I can tell, the WRF does not have any proper characterization of grids relative to the map projections within which they "live".
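The lattice structure listed above (starting corner, cell size, dimensions) is enough to locate any cell in the projection's Cartesian coordinates, wherever the grid sits relative to the Cartesian origin. The following Python sketch shows the arithmetic; the class name, field names, and the sample numbers are hypothetical, and cells are indexed from (1,1) as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridDesc:
    x1: float  # X of the SW corner of the (1,1)-cell, meters
    y1: float  # Y of the SW corner of the (1,1)-cell, meters
    dx: float  # cell size in X, meters
    dy: float  # cell size in Y, meters
    nx: int    # number of columns
    ny: int    # number of rows

    def _check(self, col, row):
        if not (1 <= col <= self.nx and 1 <= row <= self.ny):
            raise IndexError(f"cell ({col},{row}) is outside the grid")

    def cell_sw_corner(self, col, row):
        self._check(col, row)
        return (self.x1 + (col - 1) * self.dx, self.y1 + (row - 1) * self.dy)

    def cell_center(self, col, row):
        self._check(col, row)
        return (self.x1 + (col - 0.5) * self.dx, self.y1 + (row - 0.5) * self.dy)


# An off-center grid: the projection's Cartesian origin (0, 0) lies outside it.
grid = GridDesc(x1=-300000.0, y1=120000.0, dx=12000.0, dy=12000.0, nx=50, ny=40)
```

Nothing in the arithmetic requires the grid to be centered on the origin, which is why the starting corner needs to be carried as metadata in its own right.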
- Routines for the definition of time-dependent metadata
ext_pkg_def_glb_td_char(DataHandle,Length,Status)
ext_pkg_def_glb_td_type(DataHandle,Count,Status)
ext_pkg_def_var_td_char(DataHandle,Var,Length,Status)
ext_pkg_def_var_td_type(DataHandle,Var,Count,Status)
INTEGER, INTENT( IN ):: DataHandle, Length, Count
CHARACTER(len=*), INTENT( IN ):: Var
INTEGER, INTENT( OUT ):: Status
Constraint 1: These methods must be called while the data set is in its data definition phase, prior to the call to ext_pkg_open_for_write_commit().
Constraint 2: These methods must be called prior to any call to a time dependent metadata routine ext_pkg_put_td*().
NOTE: the draft-document prototypes for routines ext_pkg_get_glb_td_*() and ext_pkg_get_dom_td_*() need DateStr as an additional argument.
- Program termination
- In order for WRFIO data sets to have correctly updated headers (or the equivalent), all program terminations must cause the call of ext_pkg_exit(). STOP and CALL EXIT() should be forbidden.
- WRF_ERR_INCOMPLETE_DS_DEF parameter
- It is a fatal error for data set creation to attempt to do a "commit" operation with a data set definition that is yet incomplete.
- WRF_ERR_DEFN_CONSTRAINT parameter
- Constraint violation (e.g., setting a per-variable metadatum prior to establishment of the variable via dry-run ext_pkg_write_field() call.)
- WRF_ERR_FATAL_ parameters for the following failures:
- Possibly one additional error-code WRF_ERR_FUBAR, or else specific error codes for the following:
- ext_m3io_init() failed
- ext_m3io_init() not yet called
- ext_m3io_exit() failed
- ext_m3io_open_dataset_for_read() failed
- ext_m3io_open_dataset_for_write_commit() when the supplied data set definition is not complete: WRF_ERR_INCOMPLETE_DS_DEF
- ext_m3io_inquire_opened() when the datahandle is incorrect for the specified file name.
- ext_m3io_close() failed, when the indicated file exists but the close operation failed.
- ext_m3io_*_md*(): metadata Element not in data set.
- Time-interpolation method ext_pkg_interp_field( <args>)
- where the argument list <args> is the same as that for ext_pkg_read_field().
Constraint: The indicated variable must be of type real or real8.
The corresponding M3IO routine, INTERP3(FNAME,VNAME,<date&time>,...), has proven quite valuable. It interpolates variable VNAME from file FNAME to the specified date and time (or returns a failure-status if this is not possible). The routine handles I/O optimization for the caller (i.e., maintains double buffering) behind the scenes.
- Time-Derivative method ext_pkg_ddt_field( <args>)
- Constraint: The indicated variable must be of type real or real8.
Given the double-buffering of a time-interpolation routine, it is easy to construct a time-derivative routine using the same buffer system. Such a routine is occasionally useful, e.g., for getting a rainfall rate from the prognostic cumulative rainfall variable.
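The double-buffering idea behind the interpolation and time-derivative methods can be sketched as below. This Python class is a hypothetical illustration and not INTERP3() itself: it assumes linear interpolation between the two bracketing records, times expressed in integer seconds, fields as flat lists of floats, and a caller-supplied `read_record(k)` function (0-based) standing in for the underlying file read.

```python
class TimeInterpolator:
    """Linear time interpolation over a fixed-step record sequence."""

    def __init__(self, read_record, start_sec, step_sec, n_records):
        if n_records < 2 or step_sec <= 0:
            raise ValueError("need at least two records and a positive step")
        self.read_record = read_record
        self.start, self.step, self.n = start_sec, step_sec, n_records
        self.buffers = {}  # record index -> field; never more than two entries

    def _load_bracket(self, t):
        """Make sure the two records bracketing time t are buffered."""
        last = self.start + (self.n - 1) * self.step
        if t < self.start or t > last:
            raise ValueError("requested time is outside the data set")
        k = min(int((t - self.start) // self.step), self.n - 2)
        # Reuse any buffer that is still needed; read only what is missing.
        self.buffers = {
            i: self.buffers[i] if i in self.buffers else self.read_record(i)
            for i in (k, k + 1)
        }
        return k

    def interp(self, t):
        k = self._load_bracket(t)
        w = (t - (self.start + k * self.step)) / self.step
        lo, hi = self.buffers[k], self.buffers[k + 1]
        return [a + w * (b - a) for a, b in zip(lo, hi)]

    def ddt(self, t):
        """Finite-difference time derivative from the same two buffers."""
        k = self._load_bracket(t)
        lo, hi = self.buffers[k], self.buffers[k + 1]
        return [(b - a) / self.step for a, b in zip(lo, hi)]
```

As the model clock advances through one input interval, repeated calls cost no I/O at all; crossing into the next interval costs exactly one read, because the upper buffer becomes the new lower buffer.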
- Windowed Output
- For research and diagnostic purposes it may be worthwhile to make special provisions for the capability to output selected variables at high temporal frequency (e.g., the model time step), but only for a specified window into the model domain (resource requirements may make full-domain model time step output impractical).
- Layer-Selective input method ext_pkg_read_field_level( <args>)
- where the argument list <args> augments that for ext_pkg_read_field() by adding an additional selector for model level.
Air quality modeling experience shows that there is relatively frequent use of the selective-read operations that pick out just one level (especially the model-bottom level) from a 3-D field. Given a selective direct-access lower API-layer such as netCDF, this is quite easy to implement.
- Metadata-name-inquiry functions
- Presently, only if the modeler already knows beforehand that a variable named foo has an attribute named bar and that the attribute has type qux, can the modeler inquire for the value of the attribute (and similarly for "domain" and "global" attributes).
Full functionality requires a complete set of inquiry functions that retrieve the names and types of all the per-variable, domain, and global metadata. The end result would be a fully reflective metadata interface for WRFIO data sets.
- UNITS
- I would like to suggest that units given for variables in the Registry should be compatible to the extent possible with the MKS UCAR UDUNITS package. This reflects upon capitalization issues, so that one has, e.g., M instead of m for "meters". (This is an operational requirement, rather than a software-system requirement...)
- Data set opening modes "unknown" and possibly "create"
- One of the most useful M3IO file-open modes is UNKNOWN, which behaves as follows:
- The caller must supply a data set definition.
- If the file does not exist, create it according to the caller-supplied definition.
- If the file exists, check that its definition is consistent with the caller-supplied definition.
- In particular, if the caller-supplied starting date&time is later than the file's starting date&time, the former must be a valid time step for the latter.
One of the least useful M3IO file-open modes is "create" mode: if the file exists, delete it and create a new file according to the caller-supplied definition. (This one was added at the insistence of the visualization people, who wanted it for scratch files, and who alone are allowed to use it: By agreement between the heads of EPA ORD and MCNC Environmental Modeling Center, modelers are forbidden to use this mode, on pain of being sentenced to not less than three years nor more than five years, on a 386/25 running Windows 3.0 and Microsoft Fortran 1.0. :-) If the visualization people get wind of this mode, they may insist upon it too...
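The UNKNOWN open-mode rules can be expressed as a short decision function. The Python sketch below is illustrative, not M3IO code: the function name and the dictionary keys are hypothetical, definitions are reduced to a variable list, a positive time step in seconds, and a start time in seconds, and rejecting a caller start time earlier than the file's is an assumption (the text only covers the later case).

```python
def open_unknown(existing, requested):
    """Apply UNKNOWN-mode rules.

    existing:  definition of the file on disk, or None if there is no file.
    requested: caller-supplied definition (required).
    Each definition is a dict with keys "variables", "tstep", "start".
    Returns "created" or "opened"; raises ValueError on inconsistency.
    """
    if requested is None:
        raise ValueError("UNKNOWN mode requires a caller-supplied definition")
    if existing is None:
        return "created"  # the file would be built from `requested`
    for key in ("variables", "tstep"):
        if existing[key] != requested[key]:
            raise ValueError(f"inconsistent {key}")
    offset = requested["start"] - existing["start"]
    if offset < 0:
        # Not covered by the text; rejected here as an assumption.
        raise ValueError("caller start precedes the file start")
    if offset % existing["tstep"]:
        raise ValueError("caller start is not a valid time step of the file")
    return "opened"
```

This is the behaviour that makes restarts painless: a rerun with the same definition simply reopens and continues the existing file, while a rerun with a changed definition fails loudly instead of corrupting it.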
There are a number of potential modifications to the Models-3 I/O API that would ameliorate incompatibilities between it and the WRF I/O API. Some of these would cause source-code, link, or data incompatibilities with the existing M3IO (which has been carefully maintained to preserve backward compatibility), and would at the very least lead to a "flag day". For this reason, and because the WRF-Chem air quality model, which (because of its planned use of SMOKE) is one of the chief targets of this effort, will in fact be a fourth generation air quality model (as the EPA's Models-3 is of the third generation), it is attractive to create a new library -- M4IO -- as a follow-on to M3IO. At the same time (if for no other reason than SMOKE compatibility), it is desirable to make compatibility between M3IO and M4IO as great as possible. It may be possible to make this almost entirely transparent by using Fortran-90 subroutine overloading for M4IO (although the Fortran-90 type system is not strong enough to do all that the API needs to do).
NOTE: Given the time and level of effort required to implement all the M3IO enhancements given below, and the level of agreement with EPA needed for M4IO acceptance by them, we propose that for the initial WRFIO-to-M3IO implementation to be used for the Summer 2002 air quality forecasting effort, we use the current version (Version 2.1) of M3IO, and require the following constraints upon WRF configurations used with it:
- Dates, times, and time steps are exactly representable in integer seconds (i.e., no fractional seconds).
- Variable names and units designations have length at most 16 characters.
- Neither M3IO-nonstandard nor time-dependent metadata will be stored.
- WRFIO data sets are represented as pairs of M3IO data sets, and have logical names built according to the formula
M3IONAME = TRIM(WRFIONAME) // _(2D|3D)
e.g., the WRF data set HISTORY1 becomes the pair of M3IO files HISTORY1_2D and HISTORY1_3D.
- "Stagger" is implemented by padding on output and pad-stripping on input, with Z-stagger implemented by writing the surface level to the 2-D M3IO file and the elevated levels to the 3-D M3IO file.
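The naming formula above can be sketched in Fortran as follows. This is only an illustration: the module and routine names, the status convention, and the assumption that M3IO logical names are limited to 16 characters (so that the WRFIO name must leave room for the three-character suffix) are ours, not part of either API.

```fortran
module m3io_names_sketch
  implicit none
  integer, parameter :: NAMLEN = 16   ! assumed M3IO logical-name length
contains

  ! Build the 2-D and 3-D M3IO logical names for one WRFIO data set.
  ! STATUS is 0 on success, 1 if the name is blank or the suffixed
  ! names would not fit in NAMLEN characters.
  subroutine m3io_name_pair(wrfioname, name2d, name3d, status)
    character(len=*),      intent(in)  :: wrfioname
    character(len=NAMLEN), intent(out) :: name2d, name3d
    integer,               intent(out) :: status

    name2d = ' '
    name3d = ' '
    if (len_trim(wrfioname) == 0 .or. len_trim(wrfioname) + 3 > NAMLEN) then
      status = 1
    else
      name2d = trim(wrfioname) // '_2D'
      name3d = trim(wrfioname) // '_3D'
      status = 0
    end if
  end subroutine m3io_name_pair

end module m3io_names_sketch
```

With this sketch, HISTORY1 yields HISTORY1_2D and HISTORY1_3D, while a 16-character WRFIO name is rejected rather than silently truncated.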
Potential changes/enhancements from M3IO that go into the new M4IO are the following:
- Additional LOGICAL and CHARACTER(LEN=<n>) field types
- Currently, M3IO supports READ3() and WRITE3() only for fields of types INTEGER, REAL, and REAL8 (INTERP3() makes sense only for the last two of these). It would be almost trivial to add support for LOGICAL; it has long been an EPA request to support fields of type CHARACTER(LEN=<n>), and we have recently worked out how to implement this cleanly as well.
Does not break backwards data, source, nor link compatibility.
NOTE: netCDF does not support LOGICAL; do we handle this by overloading INTEGER?
- New "extra-attribute" M3IO routines
LOGICAL INQATT3(FNAME,VNAME,NATTS,ANAMES,ATYPES,ASIZES)
CHARACTER*(*) FNAME ! logical file name
CHARACTER*(*) VNAME ! variable name, or "ALL"
CHARACTER*(*) ANAMES( MXATTS3 ) ! attribute names
INTEGER ATYPES( MXATTS3 ) ! " types (M3REAL,M3INT,M3DBLE)
INTEGER ASIZES( MXATTS3 ) ! " sizes/lengths
LOGICAL RDATT3(FNAME,VNAME,ANAME,ATYPE,AMAX,ASIZE,AVAL)
LOGICAL RDATTC(FNAME,VNAME,ANAME,CVAL )
LOGICAL WRATT3(FNAME,VNAME,ANAME,ATYPE,AMAX,AVAL)
LOGICAL WRATTC(FNAME,VNAME,ANAME,CVAL )
CHARACTER*(*) FNAME ! logical file name
CHARACTER*(*) VNAME ! variable name, or "ALL"
CHARACTER*(*) ANAME ! attribute name
INTEGER ATYPE ! attribute type (M3REAL,M3INT,M3DBLE)
INTEGER AMAX ! attribute dimensionality
INTEGER ASIZE ! attribute actual size
REAL AVAL( AMAX ) ! attribute value (numeric)
CHARACTER*(*) CVAL ! attribute value (character-string)
Does not break backwards data, source, nor link compatibility.
Work currently underway (prototype version coded and in test at this time); should fit into Summer 2002 air quality prototype and be used for the initial implementation of STAGGER.
- New "disk-synchronization" M3IO routine
LOGICAL SYNC3(FNAME)
CHARACTER*(*) FNAME ! logical file name
Does not break backwards data, source, nor link compatibility.
Work currently underway (prototype version coded and in test at this time); should fit into Summer 2002 air quality prototype and be used for the initial M3IO implementation.
- Higher-resolution date and time support
- WRF uses millisecond resolution; is this the appropriate way to go? There have been arguments in favor of three other approaches for date&time objects that are even stronger than millisecond temporal resolution (where below, date has a resolution of 1 day, and time has a resolution of 1 second):
- <date&time> = <INTEGER YYYYDDD date> + <INTEGER HHMMSS time> + <REAL8 0.SSS... fractional seconds>
- <date&time> = <INTEGER YYYYDDD date> + <REAL8 HHMMSS.SS... time>
- <date&time> = <INTEGER YYYYDDD date> + <INTEGER HHMMSS time> + <INTEGER FRAC(2)=(NUMERATOR,DENOMINATOR) rational-number fractional seconds>
Note that for the first two of these, REAL8 provides sufficient numerical precision to allow for the deterministic calculation of time step record numbers accurate to WRF's millisecond tolerances, even for very long time step sequences. Multi-century runs would run into difficulty, however. The rational-number approach has the virtue of exact arithmetic; however, it is much more complex both for us to implement and for the average modeler to use. Currently, I (CJC) favor the first of these approaches, in terms of maximizing the combination of usability and precision.
Breaks backwards source and link compatibility.
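As an illustration of the first representation (integer YYYYDDD date, integer HHMMSS time, REAL8 fractional seconds), the following sketch computes a time step record number in double precision. It is a minimal sketch under stated assumptions: the module and function names are hypothetical, the time step is passed as REAL8 seconds rather than in M3IO time-step form, times of day are assumed non-negative, and a result of 0 is our own convention for "not on a time step".

```fortran
module m3io_recnum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! stands in for REAL8
contains

  ! Day count (proleptic Gregorian) for a YYYYDDD date; only differences
  ! between two such counts are meaningful.
  pure integer function day_number(jdate) result(n)
    integer, intent(in) :: jdate
    integer :: y
    y = jdate/1000 - 1
    n = 365*y + y/4 - y/100 + y/400 + mod(jdate, 1000)
  end function day_number

  ! Seconds represented by a non-negative HHMMSS integer.
  pure integer function hhmmss_seconds(jtime) result(s)
    integer, intent(in) :: jtime
    s = 3600*(jtime/10000) + 60*mod(jtime/100, 100) + mod(jtime, 100)
  end function hhmmss_seconds

  ! 1-based record number of (jdate,jtime,frac) in a sequence starting at
  ! (sdate,stime,sfrac) with a step of STEP seconds.  Returns 0 if the
  ! time precedes the start or is not within TOL seconds of a step.
  integer function record_number(sdate, stime, sfrac, jdate, jtime, frac, &
                                 step, tol) result(rec)
    integer,  intent(in) :: sdate, stime, jdate, jtime
    real(dp), intent(in) :: sfrac, frac, step, tol
    real(dp) :: elapsed
    integer  :: n

    elapsed = 86400.0_dp * real(day_number(jdate) - day_number(sdate), dp) &
            + real(hhmmss_seconds(jtime) - hhmmss_seconds(stime), dp)     &
            + (frac - sfrac)
    n = nint(elapsed / step)
    if (n < 0 .or. abs(elapsed - real(n, dp)*step) > tol) then
      rec = 0
    else
      rec = n + 1
    end if
  end function record_number

end module m3io_recnum_sketch
```

For example, with a start of 2002001:000000 and a half-second step, the time 2002001:000001 plus 0.5 seconds falls on record 4. Default integers limit the record count here; a production version would need to consider that for very long sequences.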
- 32-character variable names
- M3IO currently supports 16-character variable names and units designations; WRF names are presently mandated to be at most 31 characters. (NOTE: A length of 32 bytes potentially gives alignment improvements to the internal data structures used for implementation.)
Does not break backwards data nor source compatibility. Breaks backwards link compatibility.
INTERIM STEP: modify M3IO internals so that it requires trimmed name lengths to be at most 16.
Done, 1/20/2002
- New vertical coordinate types for WRF
- This one is trivial to add, without breaking backwards compatibility -- it just adds new parameters in the PARMS include-file.
Does not break backwards data, source nor link compatibility.
- Increased level-dimension support
- Currently, M3IO stores a maximum of 101 full-level values in file headers and file-description data structures. This number could be increased as desired.
What is an appropriate maximum for WRF? 256? 512?
Does not break backwards data nor source compatibility; breaks link compatibility.
- Support for WRF "stagger"
- There are two ways to accomplish this, as described above in the section Dataset Definition Issues: to add a per-variable STAGGER attribute in M3IO data sets, and either (1) use it to adjust the actual dimensions for input and output, or (2) pad all output variables out to the extent of those dimensions (and subset on input), on the basis of STAGGER. The former alternative gives smaller data set sizes and more efficient I/O, but at substantially greater code complexity and with the probability of hard-to-diagnose program crashes if old-M3IO programs are fed new-M3IO data sets. The consequence is that different variables within an M3IO data set would have different dimensionality, breaking forwards compatibility -- old implementations would be unable to read new "staggered" variables without scrambling them, and might generate hard-to-diagnose program errors in the process.
Does not break backwards data compatibility. Breaks backwards source and link compatibility, potentially in ways that lead to hard-to-diagnose bugs.
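The padding alternative, (2) above, amounts to a pair of array copies. The sketch below shows one way this might look for 2-D REAL fields; the routine names, the caller-supplied fill value, and the choice to place the field in the low-index corner of the padded array are all assumptions made for illustration.

```fortran
module stagger_pad_sketch
  implicit none
contains

  ! Output side: copy FIELD into the low-index corner of PADDED and set
  ! the remaining cells to FILL.  OK is false if FIELD does not fit.
  subroutine pad_field(field, padded, fill, ok)
    real,    intent(in)  :: field(:,:)
    real,    intent(out) :: padded(:,:)
    real,    intent(in)  :: fill
    logical, intent(out) :: ok

    ok = size(field,1) <= size(padded,1) .and. &
         size(field,2) <= size(padded,2)
    padded = fill
    if (ok) padded(1:size(field,1), 1:size(field,2)) = field
  end subroutine pad_field

  ! Input side: strip the padding by subsetting PADDED to FIELD's extent.
  subroutine strip_field(padded, field, ok)
    real,    intent(in)  :: padded(:,:)
    real,    intent(out) :: field(:,:)
    logical, intent(out) :: ok

    ok = size(field,1) <= size(padded,1) .and. &
         size(field,2) <= size(padded,2)
    if (ok) then
      field = padded(1:size(field,1), 1:size(field,2))
    else
      field = 0.0
    end if
  end subroutine strip_field

end module stagger_pad_sketch
```

Because every variable is written at the same padded extent, an old M3IO reader still sees homogeneous dimensions; the cost is the extra storage for the fill cells.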
- NAMELIST support for logical names
- M3IO by default uses environment variables (initialized by script commands such as "setenv foo /bar/qux/dingbats.dat") to bind data set logical names (as used by model code) to physical path names (as used by system calls), whereas WRFIO has traditionally used NAMELISTs. If possible, the relevant M3IO utility routines will be modified to use either environment variables or an M3IO_NAMELIST, ideally in that priority order, to evaluate the bindings of logical names. M3IO_NAMELIST will itself be either the logical name for the namelist file, or else the physical path name for a file to be found in the current working directory.
NOTE: Since the M3IO environment-variable utilities are themselves C called from Fortran, it remains to be seen whether one can make this C code actually interpret the NAMELISTs. Perhaps the way to do this is to make the initialization routine do a putenv()?
Does not break backwards data, source nor link compatibility.
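The priority order described above (environment variable first, then M3IO_NAMELIST) might be prototyped entirely in Fortran along the following lines. This is a sketch, not the M3IO implementation: the namelist group name m3io_bindings, its parallel-array layout, the table sizes, and the routine name are all hypothetical, and it relies on the Fortran 2003 GET_ENVIRONMENT_VARIABLE intrinsic and the Fortran 2008 NEWUNIT= specifier.

```fortran
module name_binding_sketch
  implicit none
  integer, parameter :: MXBIND  = 64    ! hypothetical table capacity
  integer, parameter :: NAMLEN  = 16    ! assumed logical-name length
  integer, parameter :: PATHLEN = 256   ! hypothetical path-name length
contains

  ! Resolve logical name LNAME to a physical PATH.
  subroutine resolve_name(lname, path, found)
    character(len=*), intent(in)  :: lname
    character(len=*), intent(out) :: path
    logical,          intent(out) :: found
    character(len=NAMLEN)  :: lnames(MXBIND)
    character(len=PATHLEN) :: pnames(MXBIND)
    character(len=PATHLEN) :: nmlfile
    namelist /m3io_bindings/ lnames, pnames
    integer :: i, ios, iunit, length, stat

    path  = ' '
    found = .false.
    if (len_trim(lname) == 0) return

    ! 1. An environment variable named LNAME takes priority.
    call get_environment_variable(trim(lname), path, length, stat)
    if (stat == 0 .and. length > 0) then
      found = .true.
      return
    end if
    path = ' '

    ! 2. Otherwise read the namelist file: M3IO_NAMELIST is tried as a
    !    logical name, else as a file in the current working directory.
    call get_environment_variable('M3IO_NAMELIST', nmlfile, length, stat)
    if (stat /= 0 .or. length == 0) nmlfile = 'M3IO_NAMELIST'

    lnames = ' '
    pnames = ' '
    open(newunit=iunit, file=trim(nmlfile), status='old', &
         action='read', iostat=ios)
    if (ios /= 0) return
    read(iunit, nml=m3io_bindings, iostat=ios)
    close(iunit)
    if (ios /= 0) return

    do i = 1, MXBIND
      if (lnames(i) == lname .and. len_trim(pnames(i)) > 0) then
        path  = pnames(i)
        found = .true.
        return
      end if
    end do
  end subroutine resolve_name

end module name_binding_sketch
```

A matching namelist file would contain entries such as lnames(1) = 'foo', pnames(1) = '/bar/qux/dingbats.dat' inside an &m3io_bindings group. Doing the lookup in Fortran sidesteps the question of parsing NAMELISTs from C, at the price of a fixed-size table.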
- Geodetic-Spheroid metadata
- Presently, the knowledge of what spheroid is used for the map projection (and therefore for the geo-referencing of the data) is implicit, maintained in the head of the modeler. Especially for coupled environmental modeling systems, this is less than satisfactory; the data sets should contain a complete and accurate characterization of their georegistration.
Does not break backwards data compatibility. Breaks backwards source and link compatibility, potentially in ways that lead to hard-to-diagnose bugs.
- User-Defined metadata
- The present M3IO maintains a fixed set of global and per-variable metadata, with additional metadata stored as user-defined text in the (4800-character) file description and run/update description fields. The goal would be to provide additional interface-methods that yield a reflective interface to this user-defined metadata. We do not propose to relax the standards for mandatory metadata.
Does not break backwards data, source nor link compatibility.
- Fortran-90 overloaded-interface implementation
- Careful implementation of this would allow a backwards-compatible implementation that uses Fortran-90 polymorphism to handle the different kinds of time steps that may occur. Michael Metcalf's convert program can be of some help here, although we have seen that it does not deal with INCLUDE-files correctly, distorts the layout of comments and code, and requires some manual fixup and checking.
Does not break backwards data compatibility. Minimizes effects upon source compatibility. Breaks link compatibility. Causes portability difficulties due to the variety of different ways vendors implement modules.
- New Geospatial-Element Cell Complex data type
- This one isn't necessary for WRF (as it is presently conceived), nor for existing applications (although it would be useful for adding additional capabilities to SMOKE). However, it does fulfill a request from EPA for means to handle time-stepped/time-independent geospatial-coverage and finite element data in an efficient and powerful manner. It currently exists in prototype form (not included in the standard M3IO release); the EPA proposal describing its original conception can be found here
Does not break backwards data, source nor link compatibility.
- Additional software-implementation layers
- Currently, the M3IO is layered on top of two software libraries used for physical data storage/communication: netCDF (using files, and the netCDF2 Fortran interface), and PVM3, using the mailbox interface. It is perhaps desirable to extend or modify this in a number of ways:
- Add additional lower software-layers -- particularly MPI2
- Update the netCDF interface to use netCDF3
- Revise the name-binding mechanism for greater regularity in syntax (e.g., setenv foo MPI:/bar/stuff)
Does not break backwards data, source nor link compatibility.
- wrf_io_flags.h, wrf_status_codes.h location
- These are of general use for all external I/O packages and should not be squirreled away under the io_netcdf leaf subdirectory.
- UTM map projection support
- UTM map projections are a de facto standard for urban-scale air quality applications, and will be useful to support the urban-scale air quality modeling community (which already has its emissions databases configured in terms of UTM; note that emissions database development is one of the most resource-intensive activities in such air quality applications). Satisfying this request would largely be a matter of getting the correct preprocessor support; moreover, since almost all the original terrain and land cover databases are defined with respect to UTM, this should not be difficult (and would potentially avoid the loss of resolution that currently occurs when such data is re-mapped to another map projection).
- Names and Keywords
- Don't use variable names that happen to be the same as Fortran keywords, e.g., DATA in the argument list or implementation of ext_pkgget_glb_md_type().
The first requisite for layering WRFIO on top of an external M3IO package is to implement the simpler related objects and methods--such as date&time conversion routines--on top of which the API is built.
- QUESTION 1:
- Can we nail down that the WRF time-delta representation is as suggested above: similar to the WRF date&time representation, except that it may have a leading minus sign, it has hour, minute, and seconds fields, and the leading fields may be missing? e.g.,
12.375 : twelve-and-three-eighths seconds
08-00-00.000 : eight hours
-08-00-00.000 : negative eight hours
168-00-00.000 : one hundred sixty-eight hours (one week)
Note that the use of a hyphen as the delimiter in WRF date&time representation makes the construction and interpretation of negative time-differences rather trickier!
- QUESTION 2:
- Can we assume that the separator in the date&time character strings is always a hyphen (i.e., so that date-strings look like "YYYY-MM-DD-HH-MM-SS.SSS")?
- QUESTION 3:
- Are WRF date&time objects always normalized? Or can things like 2001-12-32-01-00-00.000 (that "ought to be" 1 A.M. on Jan. 1, 2002) happen?
SUBROUTINE wrftime2m3io(DateStr,JDate,JTime[, Frac])
character(len=*), intent(in):: DateStr
integer, intent(out):: JDate, JTime
REAL8, intent(out):: Frac
Overloaded module routine that converts from WRF "YYYY-MM-DD-HH-MM-SS.SSS" character string time representation to (extended) M3IO integer YYYYDDD:HHMMSS[:0.xxxxD0] time representation
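One possible shape for this overloaded routine is a Fortran-90 generic interface with two specific procedures, one with and one without the fractional-seconds argument. The sketch below assumes the hyphen-separated, normalized "YYYY-MM-DD-HH-MM-SS.SSS" form (Questions 2 and 3 above remain open), and uses JDate = -1 as a hypothetical marker for malformed input; the specific-procedure names are ours.

```fortran
module wrftime_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! stands in for REAL8

  interface wrftime2m3io
    module procedure wrftime2m3io_sec, wrftime2m3io_frac
  end interface wrftime2m3io

contains

  ! Extended form: also returns the fractional seconds.
  subroutine wrftime2m3io_frac(DateStr, JDate, JTime, Frac)
    character(len=*), intent(in)  :: DateStr
    integer,          intent(out) :: JDate, JTime
    real(dp),         intent(out) :: Frac
    ! Days preceding each month in a non-leap year.
    integer, parameter :: before(12) = &
      (/ 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 /)
    integer  :: yr, mo, dy, hh, mi, ss, jday, ios
    real(dp) :: sec
    logical  :: leap

    JDate = -1          ! hypothetical "malformed input" marker
    JTime = 0
    Frac  = 0.0_dp
    if (len_trim(DateStr) < 19) return
    read(DateStr(1:16), '(i4,1x,i2,1x,i2,1x,i2,1x,i2)', iostat=ios) &
         yr, mo, dy, hh, mi
    if (ios /= 0) return
    read(DateStr(18:len_trim(DateStr)), *, iostat=ios) sec
    if (ios /= 0) return
    if (mo < 1 .or. mo > 12 .or. sec < 0.0_dp) return

    leap = (mod(yr,4) == 0 .and. mod(yr,100) /= 0) .or. mod(yr,400) == 0
    jday = before(mo) + dy
    if (leap .and. mo > 2) jday = jday + 1

    ss    = int(sec)
    JDate = 1000*yr + jday
    JTime = 10000*hh + 100*mi + ss
    Frac  = sec - real(ss, dp)
  end subroutine wrftime2m3io_frac

  ! Standard form: whole seconds only; any fraction is discarded.
  subroutine wrftime2m3io_sec(DateStr, JDate, JTime)
    character(len=*), intent(in)  :: DateStr
    integer,          intent(out) :: JDate, JTime
    real(dp) :: frac
    call wrftime2m3io_frac(DateStr, JDate, JTime, frac)
  end subroutine wrftime2m3io_sec

end module wrftime_sketch
```

The compiler selects the specific procedure from the number of arguments, so callers written against the three-argument form continue to compile unchanged. Un-normalized inputs such as day 32 of a month are passed through without correction in this sketch.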
SUBROUTINE m3iotime2wrf(JDate,JTime[, Frac],DateStr)
integer, intent(in):: JDate, JTime
REAL8, intent(in):: Frac
character(len=19), intent(out):: DateStr
Overloaded module routine that converts from M3IO integer YYYYDDD:HHMMSS[:0.xxxxD0] time representation to WRF character string time representation
WRFDT2M3IO( DTStr, Tstep[, Tfrac] )
character(len=*), intent(in):: DTStr
integer, intent(out):: TStep
REAL8, intent(out):: TFrac
Overloaded module routine that converts a standard WRF time-delta (once that is defined) into M3IO time-delta representation.
Note that M3IO already has functions that convert back and forth between seconds and M3IO time-delta representation.
M3IODT2WRF( Tstep[, Tfrac], DTStr )
integer, intent(in):: TStep
REAL8, intent(in):: TFrac
character(len=*), intent(out):: DTStr
Overloaded module routine that converts an M3IO time-delta into standard WRF time-delta representation (once that is defined).
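Since the WRF time-delta format is still an open question, the following is only a sketch of how WRFDT2M3IO might parse the form proposed in Question 1: an optional leading minus sign, then hyphen-separated hour, minute, and seconds fields, with leading fields optionally missing. The routine name, the extra OK argument, and the decision to normalize and apply the sign to both outputs are assumptions.

```fortran
module wrfdt_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! stands in for REAL8
contains

  subroutine wrfdt2m3io_sketch(DTStr, TStep, TFrac, ok)
    character(len=*), intent(in)  :: DTStr
    integer,          intent(out) :: TStep   ! signed HHMMSS
    real(dp),         intent(out) :: TFrac   ! signed fractional seconds
    logical,          intent(out) :: ok
    character(len=len(DTStr)) :: s
    integer  :: n, lo, k, sgn, hh, mm, ss, total, ios
    real(dp) :: sec

    TStep = 0
    TFrac = 0.0_dp
    ok    = .false.
    hh    = 0
    mm    = 0
    s = adjustl(DTStr)
    n = len_trim(s)
    if (n == 0) return

    ! A hyphen in column 1 is the sign, not a field delimiter.
    sgn = 1
    lo  = 1
    if (s(1:1) == '-') then
      sgn = -1
      lo  = 2
    end if
    if (lo > n) return

    ! Seconds: everything after the last hyphen (or the whole string).
    k = index(s(lo:n), '-', back=.true.)
    if (k == n - lo + 1) return
    read(s(lo+k:n), *, iostat=ios) sec
    if (ios /= 0) return

    if (k > 0) then
      ! Minutes: the field before the seconds.
      n = lo + k - 2
      if (n < lo) return
      k = index(s(lo:n), '-', back=.true.)
      if (k == n - lo + 1) return
      read(s(lo+k:n), *, iostat=ios) mm
      if (ios /= 0) return
      if (k > 0) then
        ! Hours: whatever remains in front of the minutes.
        n = lo + k - 2
        if (n < lo) return
        read(s(lo:n), *, iostat=ios) hh
        if (ios /= 0) return
      end if
    end if
    if (hh < 0 .or. mm < 0 .or. sec < 0.0_dp) return

    ! Normalize to HHMMSS and apply the sign.
    ss    = int(sec)
    total = 3600*hh + 60*mm + ss
    TStep = sgn * (10000*(total/3600) + 100*mod(total/60, 60) + mod(total, 60))
    TFrac = real(sgn, dp) * (sec - real(ss, dp))
    ok    = .true.
  end subroutine wrfdt2m3io_sketch

end module wrfdt_sketch
```

Under these assumptions the four examples from Question 1 give time steps of 12 (with fraction 0.375), 80000, -80000, and 1680000 respectively. Treating only a column-1 hyphen as a sign is what keeps the delimiter ambiguity manageable.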
Each WRF data set will be implemented as a set of M3IO data sets, as necessitated by the fact that individual M3IO data sets have homogeneous time step, layer, and horizontal grid structures. The decision of just how "stagger" is implemented will determine just how many M3IO data sets there will be for each WRF data set.
Another issue that comes up in this regard is that of name bindings: both the WRFIO and the M3IO systems achieve directory independence by mapping logical names for data sets into physical (path) names for files. Logical names are program properties, and do not imply actual file-system location or implementation by other means. This independence allows M3IO transparently to support both persistent file-based data sets and PVM-mailbox based communications channels for coupling a set of cooperating processes (as will be used initially to couple SMOKE with WRF). M3IO uses environment variables (e.g., set using the csh setenv command) to bind logical names to physical names; WRF traditionally uses NAMELISTs. It may be possible to enhance the underlying M3IO environment variable routines so that they support both means of name binding. If so, this allows us to perform a transparent "behind the scenes" upgrade that supports both means of name binding when the enhanced routines become available.
For the initial implementation to be used for the Summer 2002 air quality forecast effort, we propose that the ext_m3io_ package behave as follows:
- Each WRFIO data set is implemented as a pair of M3IO data sets: the 2-D data set and the 3-D data set.
- Initially, "stagger" will be implemented by padding.
- Environment variable name binding will be employed.
module_m3io.F90 and wrappers-file ext_m3io.F90
This module encapsulates the state necessary to maintain the WRF datasets, as well as the input and output frames for each of them. It is USEd by the subroutines in ext_m3io.F90, which implements all of the ext_m3io_*() wrapper calls that give the WRFIO external-package API interface. The ext_m3io_*() routines provide Fortran-77 style (in practice, this means call-by-reference) implicit calling interfaces rather than explicit interfaces; this is necessitated by the fact that the subroutines (such as ext_m3io_write_field(), which writes both 2D and 3D fields of types INTEGER, LOGICAL, REAL, and REAL8) are overloaded to act upon what an explicit Fortran-90 interface would see as incompatible argument lists. (IMNHO, the lack of a "void pointer" type is a serious defect in the Fortran-9x standards, but we can't really do anything about it.)
- Wrapper-routines in ext_m3io.F90
- Each of the routines in the ext_m3io_*() API has an implementation that wraps M3IO calls for WRF, and maintains the state tables in module_m3io accordingly.
- PARAMETERs in module_m3io
- The M3IO parameters from PARMS3.fh are INCLUDEd. Additionally, there are
INTEGER, PARAMETER:: WRF_ERR_FUBAR : otherwise-unclassified WRFIO-M3IO error.
INTEGER, PARAMETER:: MXWRFIO
INTEGER, PARAMETER:: WRF_ERR_INCOMPLETE_DS_DEF
INTEGER, PARAMETER:: FS_PENDING : token to indicate a data set is currently in the dry-run state.
- tbd...
- Data Structures in module_m3io
- tbd...
- State Variables in module_m3io
- tbd... We need to
- Current number NWRFIO of data sets
- List FNAME of current data set names.
- Lists FID2D and FID3D of STATE3 ID's for the 2D and 3D M3IO files mapped to the current data set.
- Lists of standard metadata values for data sets currently in the dry-run state.
- tbd...
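To make the state variables listed above concrete, here is a minimal sketch of how they might be held as module data. The names NWRFIO, FNAME, FID2D, FID3D, MXWRFIO, and FS_PENDING come from the lists above; their values, the use of FS_PENDING as a placeholder file ID during the dry run, and the two helper functions are assumptions made for illustration.

```fortran
module module_m3io_sketch
  implicit none
  integer, parameter :: MXWRFIO    = 32   ! hypothetical capacity
  integer, parameter :: NAMLEN     = 16   ! assumed logical-name length
  integer, parameter :: FS_PENDING = -1   ! hypothetical dry-run token value

  integer               :: NWRFIO = 0            ! current number of data sets
  character(len=NAMLEN) :: FNAME(MXWRFIO) = ' '  ! data set names
  integer               :: FID2D(MXWRFIO) = 0    ! IDs of the 2D M3IO files
  integer               :: FID3D(MXWRFIO) = 0    ! IDs of the 3D M3IO files
contains

  ! Index of NAME in the table, or 0 if it is not present.
  integer function find_dataset(name) result(idx)
    character(len=*), intent(in) :: name
    integer :: i
    idx = 0
    do i = 1, NWRFIO
      if (FNAME(i) == name) then
        idx = i
        return
      end if
    end do
  end function find_dataset

  ! Register NAME in the dry-run state.  Returns its index, or 0 if the
  ! name is blank, too long, already present, or the table is full.
  integer function add_dataset(name) result(idx)
    character(len=*), intent(in) :: name
    idx = 0
    if (len_trim(name) == 0 .or. len_trim(name) > NAMLEN) return
    if (NWRFIO >= MXWRFIO) return
    if (find_dataset(name) /= 0) return
    NWRFIO     = NWRFIO + 1
    idx        = NWRFIO
    FNAME(idx) = name
    FID2D(idx) = FS_PENDING
    FID3D(idx) = FS_PENDING
  end function add_dataset

end module module_m3io_sketch
```

A wrapper would call add_dataset() when a data set is first opened for the dry run, and replace the FS_PENDING entries with real file IDs once the underlying M3IO files are opened.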
- Internal Subroutines in module_m3io
INTEGER m3io_put_att_<type>(FID,VID,ATTNAME,ATT)
INTEGER, intent(IN):: FID ! m3io file ID
INTEGER, intent(IN):: VID ! m3io vble ID, or 0 for global
CHARACTER(len=*), intent(IN):: ATTNAME ! attribute-name
<type>, intent(IN):: ATT ! attribute
There is one such function for each type in {character(len=*), integer, real, REAL8}. These functions are used to store only the non-M3IO variable or data set attributes.
INTEGER m3io_get_att_<type>(FID,VID,ATTNAME,ATT)
INTEGER, intent(IN):: FID ! m3io file ID
INTEGER, intent(IN):: VID ! m3io vble ID, or 0 for global
CHARACTER(len=*), intent(IN):: ATTNAME ! attribute-name
<type>, intent(OUT):: ATT ! attribute
Retrieve the indicated per-variable attribute for the indicated variable, or the indicated global dataset attribute, from the indicated M3IO file. There is one such function for each type in {character(len=*), integer, real, REAL8}. These functions are used to retrieve only the non-M3IO variable or data set attributes.
Constraint: Dataset must be in define-mode, else the call is an error.
SUBROUTINE wrftime2m3io(WRFTIME,JDATE,JTIME[,SECFRAC])
CHARACTER(LEN=*), INTENT( IN ):: WRFTIME
INTEGER , INTENT( OUT ):: JDATE, JTIME
REAL8 , INTENT( OUT ):: SECFRAC
This polymorphic routine implements the conversion from character string based WRF date&time representation to numeric M3IO or extended M3IO date&time representation.
SUBROUTINE m3iotime2wrf(JDATE,JTIME,[SECFRAC,]WRFTIME)
INTEGER , INTENT( IN ):: JDATE, JTIME
REAL8 , INTENT( IN ):: SECFRAC
CHARACTER(LEN=*), INTENT( OUT ):: WRFTIME
This polymorphic routine implements the conversion from the numeric M3IO or extended M3IO date&time representation to character string based WRF date&time representation.