If you run into troubles with I/O API related programs, it is useful to know the versions of all the software components. The CVS-related program ident can report to you versioning keywords in the various components of (binary) object, library, or executable files. For example, I can run the following sequence of commands on my desktop machine to find out the versioning information of various binary components:

    % cd $HOME/apps/$BIN
    % ident init3.o       (reports the INIT3 version: init3.F 87 2015-01-07 17:37:58Z coats)
    % ident libioapi.a    (reports the INIT3 and M3UTILIO versions)
    % ident m3stat        (reports the INIT3, M3UTILIO, and netCDF versions)

Each I/O API source file will have its version embedded in the file's header-comment, e.g.

    !! Version "$Id: ERRORS.html 265 2024-12-09 21:41:40Z coats $"
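If ident is not installed on a machine, a rough substitute is to search a file for the same keyword strings directly. This is only a minimal sketch: it assumes a grep that supports -a and -o (GNU and BSD grep do), it looks only for "$Id: ... $" keywords, and list_id_keywords is a hypothetical helper name, not part of the I/O API.

```shell
# Hypothetical helper: print every "$Id: ... $" keyword string found in a file,
# which approximates what ident reports for that keyword.
list_id_keywords() {
    # -a: treat binary files (objects, libraries, executables) as text
    # -o: print only the matching keyword, not the whole line
    grep -a -o '\$Id: [^$]*\$' "$1"
}
```

For example, list_id_keywords libioapi.a should list one keyword per source file that was compiled into the library.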
I/O error status numbers are compiler specific, so one needs to know what the underlying compiler is, and search for its error-code list (Google for the compiler and "Fortran runtime error codes").

Note that for CMAQ, mpif90 is actually a script "wrapping" around conventional Fortran+C compilers; this script "knows" which libraries and which include-files to use, so you are highly dependent upon the underlying compiler and need to search for its error-codes.
This version of gfortran takes a particularly idiosyncratic interpretation of the (latest) Fortran-2018 Standard.

As of July 12, 2020, the relevant ioapi/Makeinclude.${BIN} files have been modified to add the Fortran compile-flag

    -std=legacy

so that this interpretation does not cause a compile-error.
However, using this compiler version will cause the generation of a huge number of spurious warning-messages, as the compiler is still trying to enforce its version of the Fortran-2018 (not Fortran-90, not Fortran-95, not Fortran-2008) Standard.
Thanks to Mrs. Indumathi S Iyer, (SO/D), BARC, for pointing out this compiler problem and for help with testing the fix. --CJC
MODULE M3UTILIO
Since MODULE M3UTILIO itself INCLUDEs the standard I/O include-files and also has INTERFACE-blocks for (almost all of) the public I/O API functions, when you retrofit USE M3UTILIO into an old code, you must remove these INCLUDE-statements, together with the declarations and EXTERNAL statements for the public I/O API functions. If you missed some of these, you may see compile errors like the following:

    ...
    /home/coats/ioapi-3.2/ioapi/PARMS3.EXT(66): error #6401: The attributes of this name conflict with those made accessible by a USE statement. [NAMLEN3]
    ...
    /home/coats/ioapi-3.2/m3tools/m3tproc.f90(102): error #6401: The attributes of this name conflict with those made accessible by a USE statement. [GETNUM]
    ...

or

    ...
    Error: Symbol 'getnum' at (1) conflicts with symbol from module 'm3utilio', use-associated at (2)
    ...

To fix these errors, remove the corresponding INCLUDE-statements, function-declarations, and EXTERNAL statements.
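A quick way to locate candidates for removal is to search the retrofitted sources for leftover statements. The sketch below is an illustration only: find_leftover_decls is a hypothetical helper, the include-file names PARMS3.EXT, FDESC3.EXT and IODECL3.EXT are assumed to be the ones in use, and the search flags every EXTERNAL statement, so each hit still has to be reviewed by hand. Type declarations such as INTEGER GETNUM are not detected.

```shell
# Hypothetical helper: list lines that INCLUDE an I/O API include-file or that
# contain an EXTERNAL statement, with file name and line number where available.
find_leftover_decls() {
    # -n: show line numbers; -i: Fortran is case-insensitive; -E: extended regex
    grep -n -i -E "^[[:space:]]*(include[[:space:]]+['\"](PARMS3|FDESC3|IODECL3)\.EXT|external[[:space:]])" "$@"
}
```

Typical use would be find_leftover_decls *.f90 *.F in the model's source directory, after USE M3UTILIO has been added.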
Thanks to Christopher G. Nolte, Ph.D., US EPA Office of Research and Development, for his M3USER mailing-list comments on this one.

Problem: at run-time, messages like
Please verify that both the operating system and the processor support Intel® X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, F16C, AVX, FMA, BMI, LZCNT and AVX2 instructions.
or
Program Exception - illegal instruction
This is probably the result of compiling either the library or the model (or both) for a different processor-model than you are running it on.
Starting with the Pentium II processor (1997), successive generations of Intel processors have introduced more and more powerful vector-style instructions (MMX, SSE, SSE2, SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, ...) that can substantially speed up array-style calculations (including, particularly, the I/O API INTERP3()). Note that each processor generation does support all the previous generations of instructions (but not, of course, vice versa).

Well designed modeling codes will get approximately a 20-25% performance boost for using SSE4.2 instructions, a further 70-80% boost for AVX, and a further 25-30% for AVX2. Because of its sloppy coding, WRF will get less than half that much speedup, and CMAQ even less than that (due to the fact that these codes are so bottlenecked by main-memory operations that improving the arithmetic doesn't help much). Note that Intel and AMD have also improved the memory systems of the various processor generations, giving a further 5-10% performance boost per processor generation for that reason (independent of which instruction set you're using). In fact, the degree of speedup is a good measure of how well array based calculations are coded: good CFD applications will typically get an AVX speedup factor of about 1.8, whereas the (more poorly-coded) WRF gets only about 1.3 (which can be improved substantially by re-coding the advection and diffusion routines to be less memory-system-hostile).

See https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions and https://en.wikipedia.org/wiki/Advanced_Vector_Extensions for more information about the SSE and AVX families of new instructions.
On a Linux system, you can see what instructions are supported by running the following at the command line

    cat /proc/cpuinfo

and then looking at the flags sections for the instructions listed below.

Use of these instructions is typically governed by command-line directives given to the compiler; different compilers use different flags to govern this, and have different defaults. GNU and Intel compilers typically default to SSE3; PGI compilers typically default to the instruction set for the processor on which the compiler itself is being run. See your compiler's documentation on how to control this. Some examples are:
- Intel ifort/icc -x... directives:
    -xHost: Use all the instructions for this machine
    -xSSE4.2: Nehalem or later
    -xAVX: SandyBridge or later
    -xAVX2: Haswell or later
    -xCORE-AVX512: Skylake-X or later
- GNU gfortran/gcc -march=... -mtune=... directives: the first of these governs instruction set use; the second controls how the optimizer uses it
    -march=native -mtune=native: this machine's architecture
    -march=corei7 -mtune=corei7: Nehalem or later (SSE4.2)
    -march=corei7-avx -mtune=corei7-avx: SandyBridge or later (AVX)
    -march=core-avx2 -mtune=core-avx2: Haswell or later (AVX2)
- Portland Group pgf90/pgcc -tp=... directives:
    Default is this machine's architecture (dangerous if you have multiple different-generation machines!)
    -tp=nehalem: Nehalem or later (SSE4.2)
    -tp=sandybridge: SandyBridge or later (AVX)
    -tp=haswell: Haswell or later (AVX2)
Recent Intel processors and their instruction sets
- Nehalem (2008)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
- Sandy Bridge (2011)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
- Ivy Bridge (2012)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
- Haswell (2013)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
- Broadwell (2015)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
- (XEON server-processor) Skylake (2015)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
- (XEON server-processor) Skylake-X (2017)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, FMA3
- Kaby Lake (2017)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
- Coffee Lake (2018)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
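Checking the flags field of /proc/cpuinfo against a required instruction set can be scripted. The following is a minimal sketch, not part of the I/O API: has_cpu_flags is a hypothetical helper that takes the flags text as its first argument, and it assumes the lowercase names that Linux uses in that field (for example sse4_2, avx, avx2, fma).

```shell
# Hypothetical helper: has_cpu_flags "FLAGS TEXT" FLAG...
# Reports the first required flag that is absent from the flags text.
has_cpu_flags() {
    flags_line=$1
    shift
    for want in "$@"; do
        # Pad with blanks so that "avx" does not match inside "avx2"
        case " $flags_line " in
            *" $want "*) ;;
            *) echo "missing: $want"; return 1 ;;
        esac
    done
    echo "all present"
}
```

On a Linux machine this might be called as has_cpu_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" avx avx2 fma before running a binary built with -xAVX2 or an equivalent option.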
Generally, you may need to run these programs with

    limit stacksize unlimited

since they allocate scratch-variables "off the stack" (as is the usual/recommended practice in Fortran-90). You may possibly also need

    limit memoryuse unlimited
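The limit commands are csh/tcsh syntax. For run-scripts written in sh or bash, the corresponding built-in is ulimit. The sketch below is one hedged way to raise the stack limit; raise_stack_limit is a hypothetical helper name, and it falls back to the hard limit on systems where "unlimited" is refused.

```shell
# Hypothetical helper: raise the soft stack-size limit for this shell and
# for the programs it starts, then report the value now in effect.
raise_stack_limit() {
    # Try "unlimited" first; if the hard limit forbids it, use the hard limit itself
    ulimit -S -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
    ulimit -S -s
}
```

Call raise_stack_limit in the script before starting the model. The counterpart of limit memoryuse is ulimit -m, which many current Linux kernels accept but do not enforce.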
The configure script for netCDF Fortran will sometimes fail with the obscure error message that it could not find a (working) Fortran compiler, frequently when you well know that you have specified a perfectly-good such compiler. What it is actually saying is

    The combination of compiler and flags that you have specified does not work correctly.

This means you have a bad set of flags; frequently, that you are requiring static linking for which only shared libraries are available. For example, if your LDFLAGS contains

    -static -lm

then this obscure error will almost certainly happen, since almost always there will not be the static library libm.a but only instead the shared library libm.so.
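Before running configure it can help to check the flags for this particular trap. The sketch below is illustrative only; check_static_ldflags is a hypothetical helper, and it detects only a literal -static option in the string it is given.

```shell
# Hypothetical helper: warn if a link-flags string requests static linking.
check_static_ldflags() {
    # Pad with blanks so that only a whole-word "-static" matches
    case " $1 " in
        *" -static "*)
            echo "static linking requested: every -l library needs a .a version"
            return 1 ;;
        *)
            echo "ok" ;;
    esac
}
```

A typical call would be check_static_ldflags "$LDFLAGS" in the shell where configure is about to be run.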
The system-utility nm is very useful for this category of problems; it can be used to list all of the symbols in a library .a, object .o or executable. When they contain machine-code for a routine, it will show up with its linker-name (as opposed to source-code name), with a U for each use and a T for the routine's definition. For example, to find out about OPEN3 in libioapi.a:

    nm $io/../$BIN/libioapi.a | grep -i open3
                     U open3_
                     U open3_
                     U open3_
                     U open3_
                     U open3_
    open3.o:
    0000000000000000 d open3.firstime_
    0000000000000000 T open3_
    open3c.o:
                     U open3_
    0000000000000000 T open3c

says that OPEN3 is defined in open3.o and used 6 other times (including in open3c.o).

Generally, missing symbols with kmp, omp or openmp as parts of their name indicate that you may have compiled the I/O API with OpenMP parallelism enabled, but are not linking your program accordingly. Look in the relevant ioapi/Makeinclude.$BIN for the make-variables OMPFLAGS and OMPLIBS to see what you need to add to your program's Makefile.

Many other missing symbols (especially with nf_ or nc in them) are related to netCDF-library issues (and to the libraries which netCDF assumes); see the section on netCDF Version 4 issues.
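Counting definitions and uses by eye gets tedious for large libraries. The following sketch tallies them from nm output read on standard input; count_symbol_refs is a hypothetical helper, and it assumes the usual nm layout in which the symbol type letter is the field just before the linker-name.

```shell
# Hypothetical helper: count_symbol_refs LINKER_NAME  (nm output on stdin)
# Prints "<definitions> <undefined references>" for that linker-name.
count_symbol_refs() {
    awk -v sym="$1" '
        $NF == sym && $(NF-1) == "T" { defs++ }   # T: code for the routine is here
        $NF == sym && $(NF-1) == "U" { uses++ }   # U: referenced, defined elsewhere
        END { printf "%d %d\n", defs, uses }
    '
}
```

For the listing shown above, nm $io/../$BIN/libioapi.a | count_symbol_refs open3_ should print 1 6. A result starting with 0 means that the library never defines the routine, which points at a missing object file or a name-mangling mismatch.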
In general, you are best off if you can build the whole modeling system (libnetcdf.a, libpvm3.a, libioapi.a, and your model(s) CMAQ, SMOKE, etc.) with a common compiler set and common set of compile-flags. When this is not done, there are a number of compatibility issues with mixed compiler sets, and with the GNU 3.x-4.x compiler set these get worse. Some of these problems show up at link time; others at run-time. In particular, the following are known to have problems:
- Linux-distribution-supplied libnetcdf.a rarely works with CMAS-supported compiler sets. It is best to build your own netCDF library, with the same compilers and compiler-flags as your libioapi.a and models.
- NetCDF Versions 4.x have lots of changes; see this note about it in the build instructions.
- Compiler-version to compiler-version library troubles. These are known to happen in particular between versions for the Sun and Intel compiler sets. It is likely an issue with mixed GNU 3.x and 4.x systems as well.
- Link errors with pgf90 and ifort on Linux (below)
The following are not relevant for I/O API-3.0 or later, since Fortran-77 support has been dropped:
- Builds with mixed g77, g95, and/or gfortran: these seem to link correctly, but Fortran I/O gets messed up because they use different unit-number systems behind the scenes and give you troubles. Thanks to Erick Jones, BSI, for this one.
- Builds with mixed f77 f90 on various systems including Sun and SGI (troubles similar to the above...)
Starting with their Version 16 compilers, Intel has introduced a new compiler directive -qopenmp to enable OpenMP, and has deprecated the previous -openmp. This previous-version flag now results in a "deprecated flag" warning from the compiler. Changing the Makeinclude.*ifort* to match this compiler-change can eliminate this compile-warning for the latest set of Intel compilers, at the cost of making, for example, makes, Makeincludes, etc. incompatible with Intel-15 or earlier ones.
At least some versions of the Intel compilers icc and ifort cannot handle the internal complexity of some routines (usually iobin3.c) when compiling with full optimization: one will see error messages like the following when running make for the I/O API (where I've used backslashes to fold the compile-line to make it readable):

    cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi; \
    icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
    -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1 -O3 -traceback -xHost \
    -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c
    /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c(1111) (col. 29): internal error: 0_1529
    compilation aborted for /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c (code 4)
    make: *** [iobin3.o] Error 4

A measure that generally works is to re-do the last compile-command manually, but with a lower optimization, and then re-do the make. It is useful to cut-and-paste the last command into a sub-shell (enclosing the command by parentheses), with the "-O3" eliminated or reduced to "-O", as in the following example:

    ( cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi; \
    icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
    -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1 -traceback -xHost \
    -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c ) ! make

This trick is also useful when trying to do highly-optimized builds of other models that contain large, complex routines (WRF, CMAQ, ...)
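The flag edit can also be done mechanically rather than by hand. This is only a sketch under stated assumptions: lower_opt is a hypothetical helper, the failing compile-line is assumed to have been copied into a single string, and sed is assumed to accept -E (GNU and BSD sed do).

```shell
# Hypothetical helper: rewrite -O2, -O3, -Os (or a bare -O) in a compile-line to plain -O.
lower_opt() {
    # Match the flag only as a whole word, so that options like -DFOO=-O3x are untouched
    printf '%s\n' "$1" | sed -E 's/(^|[[:space:]])-O[0-9s]?([[:space:]]|$)/\1-O\2/g'
}
```

For example, lower_opt 'icc -c -O3 -traceback -xHost iobin3.c' prints the same command with -O in place of -O3; the result can then be pasted into the sub-shell described above.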
General Principle: various Fortran compilers "mangle" subroutine names (etc.) for the linker in various ways.

Note: 64-bit mode under Linux adds further issues.

Note added 2/24/2009: Aparna Vemuri of EPRI reports troubles with recent gcc compiler systems and Portland Group pgf90: the gcc Fortran name mangling system has changed, requiring a change in compile flags. For mixed pgf90/gcc builds, one can either remove the -Msecond_underscore flag from FOPTFLAGS in the Makeinclude.Linux2_x86_64pg_gcc* or else change the line CC = pgcc to CC = gcc in the Makeinclude.Linux2_x86_64pg_pgcc* files. These modifications have been made to the 2/24/2009 release of the Makeinclude.Linux2_x86_64pg_gcc*, with the older flags commented out, for use by those who need them.

In particular, Gnu Fortrans (g77 and g95) have different name mangling behavior than is the default with Portland Group pgf90. Vendor supplied NetCDF libraries libnetcdf.a always use the Gnu Fortran conventions, and as such are incompatible with the default compilation flags for SMOKE or CMAQ. For the Linux/Portland Group/SMOKE or CMAQ combination, you have two choices:

- Use the vendor supplied libnetcdf.a and default I/O API build, but fix the SMOKE or CMAQ compile flags, using ioapi/Makeinclude.Linux2_x86pg_gcc* as your guide; or
- Build libnetcdf.a from scratch for yourself, using compile flags compatible with your SMOKE or CMAQ build; build the I/O API using ioapi/Makeinclude.Linux2_x86pg_pgcc*; and use these libraries.

This Portland Group inconsistency is exactly why the I/O API is supplied with multiple /Makeinclude.Linux2_x86pg* files in the first place... Note that the I/O API supplies a script nm_test.csh and a make target make nametest to help you identify such problems.
Added 4/4/2005

Internal compiler errors have shown up with gcc/g77 on some Linux distributions for x86_64, particularly with Fedora Core 3 and Red Hat Enterprise Linux Version 3 for x86_64: the symptom is a sequence of messages such as the following:

    error: unable to find a register to spill in class `AREG'
    /work/IOAPI/ioapi/currec.f:93: error: this is the insn:
    (insn:HI 145 171 170 8 (parallel [
        (set (reg:SI 3 bx [95]) (div:SI (reg/v:SI 43 r14 [orig:67 secs ] [67]) (reg/v:SI 2 cx [orig:68 step ] [68])))
        (set (reg:SI 1 dx [96]) (mod:SI (reg/v:SI 43 r14 [orig:67 secs ] [67]) (reg/v:SI 2 cx [orig:68 step ] [68])))
        (clobber (reg:CC 17 flags)) ]) 264 {*divmodsi4_cltd}
        (insn_list:REG_DEP_ANTI 92 (insn_list:REG_DEP_OUTPUT 91 (insn_list 140 (insn_list 84 (insn_list:REG_DEP_ANTI 139 (nil))))))
        (expr_list:REG_DEAD (reg/v:SI 43 r14 [orig:67 secs ] [67]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_UNUSED (reg:SI 1 dx [96]) (nil)))))
    ...confused by earlier errors, bailing out

A workaround is to weaken architecture/optimization flags for binary type Linux2_x86_64 as described above to get around this compiler bug -- eliminating the -fschedule-insns and -march=opteron optimization flags from "Makeinclude.Linux2_x86_64" will tend to get rid of the problem. Note that this same compiler bug will bite you when trying to build lots of other stuff (TCL/TK, plplot, NCAR graphics) on FC3/gcc/g77 systems, and the same fix seems to work for many other problems as well.
relocation error link issues for x86_64 Linux
If sizes of individual arrays or of COMMON blocks exceed 2GB on the x86_64 platforms, Intel ifort and icc will give you failures, with messages about relocation errors at link-time. The problem is that the default "memory model" doesn't support huge arrays and huge code-sets properly. The "medium" memory model supports huge arrays, and the "large" memory model supports both huge arrays and huge code-sets. To get around this, you will need to add

    -mcmodel=medium -shared-intel

to your compile and link flags (for the medium model), and then recompile everything including libioapi.a and libnetcdf.a using these flags. Note that this generates a new binary type that should not be mixed with the default-model binaries. There is a new binary type BIN=Linux2_x86_64ifort_medium for this purpose, and there is a sample Makeinclude file for it, to demonstrate these flags: Makeinclude.Linux2_x86_64ifort_medium

Other compilers and other non-Linux x86_64 platforms will have similar problems, but the solutions are compiler specific.
Added 12/18/2003

SGI F90 compiler-flag problems: It seems that SGI version 7.4 and later Fortran compilers demand a different set of TARG flags than do 7.3.x and before. For example, for an Origin 3800, where hinv reports

    24 400 MHZ IP35 Processors
    CPU: MIPS R12000 Processor Chip Revision: 3.5
    ...

one would use the following sets of ARCHFLAGS compiler flags in Makeinclude.${BIN} with the different Fortran-90 compiler versions:

    -TARG:platform=ip35,processor=r12000 for 7.3.x and before
    -TARG:platform=ip35 -TARG:processor=r12000 for 7.4 and later

There are a number of problems with both the I/O API and netCDF with the newer (version 7.4) SGI compilers:
Added 12/18/2003
SGI claims to have fixed this in the latest patch for F90 version 7.4.1 (bug # 895393); I haven't had time to test it yet, though. -- CJC
- NetCDF and IRIX 7.4 compilers:
Experience indicates that the IRIX 7.4 compilers will not correctly build the netCDF library used by the I/O API. Although the make seems to succeed on that platform, make test fails almost immediately; attempts to use the libnetcdf.a that was built will also lead to program crashes.
At present, the only workaround we have is to use a libnetcdf.a built using IRIX 7.3 or earlier compilers.
- I/O API and IRIX 7.4 f90:
The IRIX 7.4 f90 compiler refuses to recognize industry-standard practice for linking BLOCK DATA subprograms from libraries. For the upcoming I/O API Version 3, we have put into place a workaround-hack that puts a conditionally-compiled non-Fortran-conforming SGI-only CALL INITBLK3 at the start of subroutine INIT3.

The IRIX 7.4 f90 compiler also thoroughly mangles the buffering of log-output in ways that we have not yet managed to decipher completely, much less repair. The outcome is that log output will show up in scrambled order. (Note that industry-standard mapping of WRITE(*,...) onto unbuffered UNIX standard output still happens with version 7.3 and must be preserved, but fails with version 7.4.)
- Multiply defined symbol nf_get_var_int64_ (etc.) errors on program builds:
Some configurations of netCDF-4 support INTEGER*8 (64-bit integer) variables, and some don't. I/O API-3.2 and later attempt to support these when they are available, and have to provide "hacks" when they're not. To detect netCDF-4 INTEGER*8 support:

    nm libnetcdff.a | grep nf_get_var_int64_

If this turns up a result, then you need to add the definition -DIOAPI_NCF4=1 to the make-variable ARCHFLAGS in your Makeinclude.${BIN}. Otherwise, you will get "multiply defined symbol" errors when you attempt to compile programs.
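The detection step can be wrapped so that a build script gets the extra definition directly. The sketch below is an illustration, not part of the I/O API build system; ncf4_archflag is a hypothetical helper that reads nm output on standard input.

```shell
# Hypothetical helper: print the extra ARCHFLAGS definition when the nm output
# on stdin shows that netCDF provides the 64-bit integer routines.
ncf4_archflag() {
    if grep -q 'nf_get_var_int64_'; then
        echo "-DIOAPI_NCF4=1"
    fi
}
```

Usage would be nm libnetcdff.a | ncf4_archflag; an empty result means that the definition should be left out of ARCHFLAGS.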
- NetCDF Error Troubleshooting Generalities:
All the netCDF "magic numbers" are defined in the I/O API NETCDF.EXT file (which is the I/O API name for the file netCDF calls src/fortran/netcdf.inc) and also (for I/O API-3.2) in the modncfio.F90: look for parameters nf_noerr, etc. Errors defined in netCDF 2.x have positive values in the range 1...32 (except for NCSYSERR which is -1); errors newly defined for netCDF 3.x are in the range -60...-1. General methodology: find the error-number and then try to figure out what's wrong from the name of the corresponding PARAMETER.

Note that UCAR re-defined some of these errors between versions 3.3.1 and 3.4 of netCDF (while leaving the various library versions link-compatible), so you may have to look at the src/fortran/netcdf.inc for the version of the netCDF libnetcdf.a you are linking with, if this is different from the version used to build your libioapi.a.

Martin Otte, US EPA, reports that there are similar errors encountered with netCDF Version 4, due to more stringent interpretation of flags for opening or creating files. This is fixed in the Oct. 28 I/O API distribution.
- I get "netCDF error -1"
This is
NCSYSERR
, meaning the system wouldn't give you permission for what you wanted to do. Most probably it means you need to check permissions on either the file you're trying to create or access, or on the directories in its directory path.
- I get "netCDF error 2"
"Not a netcdf id", which can happen both if the file honestly isn't a netCDF file, and also if it is a netCDF file, but wasn't shut correctly. (Unless you've declared a file "volatile" by setenv <file> <path> -v, netCDF doesn't update the file header until you call SHUT3() or M3EXIT().)
- I get "netCDF error 4"
"Invalid Argument", but almost certainly this means you're using netCDF library 2.x with an I/O API library built for netCDF version 3.x (NCAR accidentally changed one of the "magic numbers" used in opening files when they upgraded netCDF from 2.x to 3.x).
- I get "netCDF error -31"
This is a variant of the system permission problem. A directory spec with an extra nonexistent component, e.g., /foo/bar/qux/zorp when you really mean /foo/bar/zorp and the /foo/bar/qux doesn't exist, seems to cause Error -31. Can also happen by trying to open too many netCDF files simultaneously (although the I/O API has additional traps around this).

Or on a Cray vector machine, this may mean you're running up against your memory limit. (On Crays, netCDF v3.x dynamically-allocates a fairly large buffer to optimize I/O for each file; this allocation may well push you over your (interactive or queue) memory limit. For netCDF v3.4, there are tricks you can play with environment variables to manipulate these buffer sizes.) This error also has turned up with some of the more obscure file-permission problems.
- I get "netCDF error -40"
Probably means you tried to read data past the last date-and-time on the file (the I/O API runs netCDF in "verbose mode", so that netCDF will always print all error messages, including this one). Also can happen when the calling program is running in parallel, but a non-MP-enabled version of the I/O API library was linked in.
- List of netCDF errors, with attempted annotations
- ncnoerr = nf_noerr = 0: no error has been detected at this time.
- ncenfile = nc_syserr = -31: see above
- ncebadid = nf_ebadid = -33: not a netcdf ID (might indicate a bug in I/O API internals, or attempt to use a coupling-mode virtual file in a program linked to an I/O API library without coupling-mode enabled)
- nceexist = nf_eexist = -35: attempting to create a new file when the file already exists (from OPEN3() with status argument FSNEW3)
- nceinval = nf_einval = -36: invalid argument (see above about "incompatible netCDF and I/O API versions")
- nceperm = nf_eperm = -37: attempted write to a read only file
- nf_enotindefine = -38: operation not allowed in data mode (would indicate a bug in I/O API internals)
- nceindef = nf_eindefine = -39: operation not allowed in define mode (would indicate a bug in I/O API internals)
- ncecoord = nf_einvalcoords = -40: coordinates out of range -- probably, attempt to read past the last date-and-time on the file. Can also be caused by running a program in parallel with the non-MP-enabled version of the I/O API library. (Otherwise, would indicate a bug in I/O API internals)
- ncemaxds = nf_emaxdims = -41: maxncdims exceeded (would indicate a bug in I/O API internals)
- ncename = nf_enameinuse = -42: string match to name in use: indicates that you're trying to have two different variables with the same name when creating a file
- ncenoatt = nf_enotatt = -43: attribute not found: would indicate that a file is not a correct I/O API file, because it is missing some of the required FDESC3 header-components
- ncemaxat = nf_emaxatts = -44: maxncattrs exceeded (would indicate a bug in I/O API internals)
- ncebadty = nf_ebadtype = -45: not a netcdf data type: you are trying to create a file for which some value of VGTYP3D(<variable>) in FDESC3 is not one of M3INT, M3REAL, or M3DBLE
- ncebadd = nf_ebaddim = -46: invalid dimension ID (would indicate a bug in I/O API internals)
- nceunlim = nf_eunlimpos = -47: ncunlimited in the wrong index: Could be caused by incorrectly-set (or un-set) grid dimensions NCOLS3D, NROWS3D, NLAYS3D, or NTHIK3D (else would indicate a bug in I/O API internals).
- ncemaxvs = nf_emaxvars = -48: maxncvars exceeded (would indicate a bug in I/O API internals--probably means somebody changed INCLUDE-file PARMS3.EXT inappropriately for the target machine.)
- ncenotvr = nf_enotvar = -49: variable not found (attempt to read or write a variable not actually in the file; would indicate a bug in I/O API internals)
- nceglob = nf_eglobal = -50: action prohibited on ncglobal varid (would indicate a bug in I/O API internals)
- ncenotnc = nf_enotnc = -51: not a netcdf file: File not recognized as a netCDF file (possibly empty; possibly not closed properly (e.g., no SHUT3() or M3EXIT()); possibly generated by a program that uses HDF-enabled netCDF but being read by a program with (the recommended) HDF-disabled netCDF).
- ncests = nf_ests = -52: In Fortran, string too short (shouldn't happen with I/O API)
- ncentool = nf_emaxname = -53: variable-name or attribute-name too long (would indicate a bug in I/O API internals)
- nf_eunlimit = -54: something went wrong with the time dimension in a file; might indicate a bug in I/O API internals
- nf_enorecvars = -55: attempting to time-step a time-independent file; would indicate a bug in I/O API internals
- nf_echar = -56: Attempt to convert between text and numbers (would indicate a bug in I/O API internals)
- nf_eedge = -57: subscript out-of-bounds error (would indicate a bug in I/O API internals)
- nf_estride = -58: illegal stride (won't happen with I/O API)
- nf_ebadname = -59: variable name contains illegal characters
- nf_erange = -60: math result not representable (could not convert from native machine floating-point format to XDR/IEEE floating-point format; should be Cray PVP-only)
- NF_ENOMEM = -61: internal netCDF memory allocation failure
- NF_EVARSIZE = -62: Illegal variable-size: one or more variable sizes violate format constraints (possibly negative or zero)
- NF_EDIMSIZE = -63: Invalid dimension-size (possibly negative or zero)
- NF_ETRUNC = -64: File likely truncated or possibly corrupted
- NCFOOBAR = 32 NetCDF-3: Something is messed up, and netCDF doesn't have an error number for it, or doesn't understand how/why the messup happened
- other errors: should be OS errors, as defined in the system's /usr/include/sys/errno.h
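Because the numbers can differ between netCDF versions, it is often safer to look an error number up in the netcdf.inc (or NETCDF.EXT) that matches the library actually being linked than to rely on a list. The sketch below is a hedged illustration: lookup_nc_error is a hypothetical helper, and it assumes the include-file defines each code on its own line in the form parameter (name = value).

```shell
# Hypothetical helper: lookup_nc_error NUMBER INCLUDE_FILE
# Prints the name(s) of the PARAMETERs defined with that numeric value.
lookup_nc_error() {
    awk -v code="$1" '
        tolower($0) ~ /parameter/ {
            line = tolower($0)
            gsub(/[()=,]/, " ", line)     # "parameter (nf_enotnc = -51)" becomes three words
            n = split(line, f, " ")
            if (n == 3 && f[1] == "parameter" && f[3] + 0 == code + 0) print f[2]
        }
    ' "$2"
}
```

For instance, lookup_nc_error -51 netcdf.inc should print nf_enotnc if the file uses the assumed layout. Other constants in the file that happen to share the same value would also be printed, so the output still needs a sanity check.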
- What's this about notCDF?
This is only relevant for users at NCEP, where local politics forbids any copy of libnetcdf.a on their systems. libnotcdf.a is a library that satisfies linker references to libnetcdf.a with "stub" routines that merely report that the user is trying to use NCEP-forbidden netCDF file mode instead of NCEP-required native-binary file mode.
- Why does something with log output, netCDF crashes, netCDF failures, etc. happen with the SGI Version 7.4 compilers?
- Why does the I/O API "hang" inside env*() calls on my Linux box, using the Portland compilers?
Analysis due to Robert Elleman, Dept of Atmospheric Sciences, University of Washington: When programs are compiled with the Portland compilers without the -mp flag (as is the default for mcip) but the I/O API is compiled with this flag (as is the I/O API default), the program will hang (i.e., appear to freeze, consuming all available computational resources but making no evident progress).

Solution: either use the -mp compile flag for all compiles -- both program and library -- or use it for neither.
General principle: Make sure the program compile-flags and the I/O API compile-flags (and the netCDF compile-flags!) are consistent!
- "Why do I have trouble with my LOGFILE on my SGI?"
There is a problem with SGI f90 Version 7.4 and initialization of COMMON blocks. The Fortran language standard specifies that COMMON blocks must be initialized by BLOCK DATA subprograms, but (since the actual operations of compiling and linking are not covered by the language standard, which considers them "implementation details") does not specify just how to ensure that the BLOCK DATA subprogram is linked in with the rest of the executable. Usual and customary industry practice is that the use of a statement EXTERNAL FOOBLOCK in either the main program or in other subroutines that are called should ensure that BLOCK DATA FOOBLOCK is linked into the final executable. This does not happen with SGI f90 Version 7.4, even in very simple test cases. Note that BLOCK DATA INITBLK3 is needed to initialize I/O API internal data structures, including the unit number for LOGFILE and the number of I/O API files currently open; fortuitously, the latter seems to be initialized to zero (which is correct); the former is not initialized correctly, leading to failures to open and use a LOGFILE when you try to specify one.

Note that this error does not seem to happen with SGI f90 Version 7.3 or earlier. I have submitted this problem to SGI in an error report. Their reply is to suggest the use of non-standard CALL DATA INITBLK3, which would need to be done by every internal I/O API routine that references the STATE3 internal data structures.
--CJC
- "Why do I get messages about unresolved symbols with names like __mp_getlock, __mp_unlock, or something else with _mp or _kmp in it?"
This probably means that you are using a version of the libioapi.a that is enabled for OpenMP parallel usage, but have not activated the system parallel libraries in your model's build procedure. For Intel compilers this means that you need to add -openmp (for compiler-version 15 or earlier) or -qopenmp (for compiler-version 16 or later); for GNU compilers, -fopenmp; and for PGI compilers, -mp. See the variable OMPLIBS defined in your machine/compiler's Makeinclude.${BIN}.
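The flag choices named in this answer can be collected into one small lookup for use in build scripts. This is a sketch only: omp_flag and its compiler-family keywords (intel, gnu, pgi) are hypothetical names, and the authoritative values for a given platform remain OMPFLAGS and OMPLIBS in the Makeinclude file.

```shell
# Hypothetical helper: omp_flag FAMILY [MAJOR_VERSION]
# Prints the OpenMP compile/link flag for a compiler family.
omp_flag() {
    case "$1" in
        intel)
            # Version 16 and later use -qopenmp; version 15 and earlier use -openmp.
            # If no version is given, a recent compiler is assumed.
            if [ "${2:-16}" -ge 16 ]; then echo "-qopenmp"; else echo "-openmp"; fi ;;
        gnu) echo "-fopenmp" ;;
        pgi) echo "-mp" ;;
        *)   echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}
```

A model Makefile or build script could then add "$(omp_flag intel 15)" or "$(omp_flag gnu)" to its link line when the libioapi.a in use was built with OpenMP enabled.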
- "Why are my program log and my Fortran-style files missing or screwed up? And where did these fort.<nn> files come from?"
On some systems (notably Sun and SGI), there are incompatibilities in run-time libraries between f77 and f90. The upshot is that on these systems, you can link together Fortran-77 and C using f77, or Fortran-90 and C using f90, but you can't link together Fortran-77 and Fortran-90. The default I/O API distribution for I/O API-3.0 or later is built using f90 and runs into this problem when your model code is built using f77. The solution is to rebuild the model code using f90.
- Problems with RedHat 7.0 Linux (thanks to Zion Wang for chasing this down):
RH7 uses quite-nonstandard gcc v2.96 and glibc versions; there are patches available at URL http://www.redhat.com/support/errata/rh7-errata-bugfixes.html
RH7's gcc v2.96 does not work with the standard edition of the Portland Group F90 compiler; there is a version which does work; see URL http://www.pgroup.com/faq.htm: (UPDATE on: RED HAT 7.0 and 3.2 RELEASE COMPILERS!)
- "My program does a segmentation fault on the OPEN3 call when I attempt to create a new file!"
Probably the file description was not completely filled in. This has been observed, for example, when one of the variable names VNAME3D(I) in FDESC3.EXT was not set correctly. (What actually happens is that the FDESC3.EXT data structures are initialized to zero by the linker; then the netCDF internals don't handle strings that contain just ASCII zeros correctly.)
- "My program wrote the data out but I can't read/ncdump/PAVE it now!"
Probably the file wasn't shut correctly. (Unless you've declared a file "volatile" by setenv <file> <path> -v, netCDF doesn't update the file header until you call SHUT3() or M3EXIT().)
- "The log says OPEN3() could not open the file, and specifies the logical name rather than the physical file name."
This usually means one of two things:
- The program is opening the file with mode FSNEW3, which means that the file must not exist (and will be created anew by OPEN3()), but the file actually does exist. Delete the file and re-run.
- The script which ran your program failed to execute correctly the setenv to define the logical name for the file. Try using the env command in the script before you run the program, in order to get started debugging your script, and then check the value of the problem-file's logical name.
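Both causes above can be checked from the run script before the program starts. The following is a minimal sketch, written for a Bourne-style shell rather than csh; the function name check_logical_name and its "new" argument are hypothetical, and the check assumes the logical name holds a plain path with no trailing options.

```shell
#!/bin/sh
# Hypothetical pre-run check (not part of the I/O API) for one logical name.
#   $1 = logical name, i.e. the name of an environment variable
#   $2 = "new" if the program opens the file with FSNEW3 (optional)
# Assumes $1 is a valid shell variable name and holds a plain path.
check_logical_name() {
    name=$1
    mode=${2:-}
    # Indirect lookup: fetch the value of the variable whose name is in $name.
    eval "path=\${$name:-}"
    if [ -z "$path" ]; then
        echo "$name is not set: check the setenv/export in the run script"
        return 1
    fi
    if [ "$mode" = "new" ] && [ -e "$path" ]; then
        echo "$name=$path already exists, but FSNEW3 requires a new file"
        return 2
    fi
    echo "$name=$path looks OK"
}

# Usage sketch (OUTFILE is a made-up logical name):
#   check_logical_name OUTFILE new || exit 1
```

A non-zero return lets the script stop before the model run, which is usually cheaper than finding the OPEN3() failure in the program log afterwards.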
- "Why does the linker say ncabor_ or open3_ (etc.) is an undefined symbol?"
There are four probable causes we've been observing:
- NetCDF-4 Issues:
There are now two separate libraries (with the Fortran and the C parts of netCDF); you now need library flags -lnetcdff -lnetcdf (in that specific order).
netCDF-Fortran-4.4 and later seem to have dropped the older CALL NC*() interfaces in favor of the more recent IERR=NF_*() ones. I/O API Version 3.2 has been tediously recoded to replace the 790-odd older-style calls by the newer ones, so you need to either use that, or use an older netCDF version.
- Link command-line order:
Probably, the command line that links your program has -lnetcdf before -lioapi instead of after. (Most UNIX linkers only try to resolve things in terms of libraries yet to be defined, and don't go backwards. E.g., if you have
!!! INCORRECT !!!
f90 -o foo foo.o ... -lnetcdf -lioapi
the linker won't know where to go to find netCDF functions that are called in the I/O API; instead, if you use
!! CORRECT:
f90 -o foo foo.o ... -lioapi -lnetcdf
then the linker will scan "-lnetcdf" to find functions called in "-lioapi".)
Another possibility is that you are doing multilingual programming, and using maybe "cc" or "g++" or something else to do the link step. If so, you need to explicitly list the libraries that f90 would include. The list of these is vendor dependent but frequently looks something like
... -lf90 -lU90 -lm
One way to find out is to try to use the Fortran compiler in verbose mode (e.g., f90 -v ... on most UNIX systems) to do the linking: it may not find the needed C++ libraries, but it will tell you what libraries it needed for the Fortran part of the I/O API and you can then modify your original link command to use them.
- Compiler name inconsistencies
Compilers "mangle" the names of Fortran COMMON blocks, functions, and subroutines in various ways (usually turn them into lower case, and then prefix or postfix them by one (or, for gcc/g77, sometimes two) underscores). This will be a problem when you use the Intel or Portland Group compilers on Linux systems that come with a system-installed libnetcdf.a (which will have been built with gcc/g77).
The precise mangling behavior depends upon the compiler, your system defaults file for the compiler, and the compile/link command lines themselves. (It can also happen that netCDF was built without the expected Fortran or C++ support that your model was expecting.) A useful UNIX utility for diagnosing these problems is nm, which reports what linker visible symbols are present in binary executable, object (.o), or library (.so and .a) files. So if you see a linker error message like
symbol foo_ not found (referenced in bar.o)
then do the following sorts of things:
nm foo.o | grep -i foo
nm libnetcdf.a | grep -i foo
nm libioapi.a | grep -i foo
etc, and maybe
man -k foo
to try to find which program-component has the differently-mangled symbol that the linker needs. Then go back and review the compiler flags used in the build-process for that component.
I/O API Version 2.2 and later have a script nm_test.csh to help you with this: run
nm_test.csh <obj-file> <lib-file> <symbol>
- Bad compiler installation/configuration
Sometimes you'll find that the missing symbol was in a system routine that the compiler should have known about but somehow (maybe bad compiler-installation) didn't. That one happened to me earlier this week (as I write this May 3, 2002) on an HP system.
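For the link-order and NetCDF-4 causes in the list above, the ordering rule can be checked mechanically before the link step is run. The sketch below is an illustration only: the function name check_link_order is hypothetical, and it looks only at the three library flags named in the text (-lioapi, -lnetcdff, -lnetcdf), not at anything else on the command line.

```shell
#!/bin/sh
# Hypothetical check (not part of the I/O API): scan the words of a link
# command left to right and report a library-order problem, using the rule
# in the text that -lioapi must precede the netCDF libraries and that
# -lnetcdff must precede -lnetcdf.
check_link_order() {
    seen_f=0    # set once -lnetcdff has been seen
    seen_c=0    # set once -lnetcdf has been seen
    for arg in "$@"; do
        case "$arg" in
            -lioapi)
                if [ "$seen_f" -eq 1 ] || [ "$seen_c" -eq 1 ]; then
                    echo "INCORRECT: -lioapi appears after a netCDF library"
                    return 1
                fi ;;
            -lnetcdff)
                if [ "$seen_c" -eq 1 ]; then
                    echo "INCORRECT: -lnetcdff appears after -lnetcdf"
                    return 1
                fi
                seen_f=1 ;;
            -lnetcdf)
                seen_c=1 ;;
        esac
    done
    echo "OK: no ordering problem found"
}

# Example: check a link line of the form recommended above.
check_link_order f90 -o foo foo.o -lioapi -lnetcdff -lnetcdf
```

The check passes the command as separate words, so it can be called with the same argument list that the build script would hand to the compiler.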
- "PAVE reports bad values -- -9.xxE37 or something!"
This is a PAVE bug, not an I/O API bug: the original person who wrote the file-reader for PAVE couldn't be bothered to use the I/O API, but instead used raw netCDF reads without proper data-structure and error checking. NetCDF fills in "holes" in its files with a particular fill-value that you are seeing, and this is an indication that the data for that variable and time step was never written to the file. This happens, for example, at the starting time for an MM5/MCPL run, for some of the variables which aren't calculated until after the run is in progress.
This is fixed in PAVE Version 2 and later.
- "I get an error message that looks something like"
>>> WARNING in subroutine CHKFIL3 <<<
Inconsistent file attribute NVARS for file FOO
Value from file: 6
Value from caller: 9
This means that
- File FOO already exists
- You are trying to open it as "unknown" (FSTATUS=FSUNKN3) in the call to OPEN3
- The file description from within file FOO's header does not match the file description you have supplied in the FDESC3 COMMONs.
For the I/O API, you can't change a file's definition once it has already been created. What you probably want to do is to delete the existing file (or move it somewhere else), and re-run your program--this time creating a new file according to the description you supply.
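The "move it somewhere else" option can be made part of the run script. This is a minimal sketch under stated assumptions: the function name set_aside and the ".old" suffix are made up for illustration, and any earlier ".old" copy is silently overwritten by mv.

```shell
#!/bin/sh
# Hypothetical helper (not part of the I/O API): if the output file already
# exists, rename it so that the next run creates a fresh file from the
# description the program supplies.
#   $1 = physical path of the output file
# Note: an earlier "$1.old" copy, if any, is overwritten.
set_aside() {
    path=$1
    if [ -e "$path" ]; then
        mv "$path" "$path.old" || return 1
        echo "moved $path to $path.old"
    fi
}

# Usage sketch, where FOO is the logical name holding the file's path:
#   set_aside "$FOO"
```

If the old file is not needed, deleting it achieves the same thing; renaming only keeps a copy around for comparison.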
- "I get a compiler warning message that looks something like"
PGF90-W-0006-Input file empty (<somewhere>/ioapi/ddtvar3v.F)
PGF90/any Linux/x86 5.2-4: compilation completed with warnings
There are three worker routines that are empty after preprocessing for the non-coupling-mode compiles. Some compilers treat the attempt to compile an empty file as a problem situation... It isn't.
Send comments to
Carlie J. Coats, Jr.
carlie@jyarborough.com