The agenda has not yet been finalized and the list of topics is subject to change.
Tuesday, October 26
Tutorials
Wednesday, October 27
Plenary Session
Presentation Group 1
Presentation Group 2
Thursday, October 28
Presentation Group 3
Tutorials
This Tutorial gives a brief introduction to the HDF4 and HDF5 data formats and corresponding data models. It provides a short overview of other popular scientific data formats and data models such as NetCDF, FITS, GRIB and HDF-EOS and compares them with HDF.
Introduction to HDF5: Data Model and Programming Model
NCSA
This Tutorial gives a brief introduction to HDF5 for people who have never used it. It covers the HDF5 Data Model including HDF5 objects and their properties. It also briefly describes the HDF5 Programming Model and prepares participants for further self-study of HDF5 and hands-on sessions.
Introduction to the HDF5 Lite and High Level Interfaces
NCSA
The HDF5 High Level APIs consist of a set of functions built on top of the basic HDF5 library. The purpose is either to define functions which do more operations per call than the basic HDF5 interface or to build a set of functions for added standard object definitions (like images or tables). Topics to be presented include:
In this Tutorial we will discuss different storage methods for the HDF5 files (split files, family of files, multi-files), and datasets (compressed, external, compact), and related filters and properties. This tutorial will introduce advanced features of HDF5, including:
The IDL HDF5 interface consists of a dynamically loadable module that provides a set of IDL software routines to directly access the NCSA HDF5 C library. In addition, the IDL HDF5 module has the ability to create entire HDF5 files from IDL structures with minimal HDF5 knowledge. This tutorial will demonstrate the use of IDL's HDF5 routines.
Introduction to HDF NPOESS Products With Example Code
IPO, NPOESS
Abstract not available.
HDF Tutorials Part 2: Hands-on (self-paced)
NCSA
Web-based self-paced instructional materials on all the topics covered in the morning. Participants can focus on topics of interest. Instructors will be available to answer questions and help.
Making HDF5 Software Work for You
NCSA
The complexity and flexibility of HDF5 are easily misused, causing headaches for application developers and frustration for HDF5 application users. In this talk, we will try to give guidelines on how to use HDF5 efficiently and avoid common mistakes.
Wednesday, October 27
Plenary Session
HDF in EOS - Overview and Status
NASA ESDIS
HDF is the format of NASA EOS standard products. Since the launch of the TERRA mission in 1999, the EOSDIS has accumulated over 3 petabytes of data and derived products containing geophysical parameters, with an accumulation rate of over 3.5 terabytes per day. The vast majority of these products are stored natively using the Hierarchical Data Format (HDF). This presentation will give an overview of the range of EOS data products and the systems available to facilitate access by science and application users.
Overview and Status of HDF in NPOESS & NPP
IPO, NPOESS
The NPOESS Integrated Program Office and its principal contractors -- Raytheon and Northrop-Grumman -- are preparing for the launch of the NPP risk-reduction mission in 2006. This presentation will review program status and how HDF will be used to deliver NPOESS products at the domestic weather centrals and worldwide field terminals.
HDF Status and Development
NCSA
Update on HDF, including recent changes to the software, upcoming releases, collaborations, future plans.
HDF and HDF-EOS: Implications for Long-Term Archiving and Data Access
NSIDC
The Hierarchical Data Format (HDF) has been in existence since the 1980's. Over the years, HDF has been chosen as the format of choice for a number of large programs. One of these programs, NASA's Earth Observing System (EOS), chose to further specialize the format, creating what is now known as HDF-EOS. In addition to these large, typically government programs, a number of commercial data visualization, manipulation, and analysis packages support the format, indicating a level of acceptance by data users.
While HDF was not originally intended as a format for long-term data preservation, NASA's EOS program uses it not only as a data distribution, access and manipulation format but also as the active archive format for the data. Furthermore, the NPOESS program is currently considering HDF/HDF-EOS as its archive format. However, formats designed to facilitate data access and analysis are not necessarily well suited to long-term data preservation. This presentation will discuss some of the considerations driving the choice of an archive format, assess the HDF and HDF-EOS formats in terms of these considerations, and discuss implications of the choice of HDF/HDF-EOS as an archive format for archives responsible for preserving data and providing data access into the distant future.
Presentation Group 1
Content Framework for Operational Environmental Remote Sensing Data Sets: NPOESS concepts
Raytheon, NPOESS
This poster (or slides) describes the main concepts which relate data format (HDF5), metadata, and data organization, as they are being applied to NPOESS. It includes the motivation for selecting HDF5; the motivation and general implementation of FGDC base metadata and remote sensing extensions; and the current attempt to use best experiences with HDF-EOS swath and netCDF Climate & Forecast conventions to guide NPOESS product definition.
HDF-EOS to netCDF Converter
ESDIS DU
Last year at this conference, we announced the availability of a tool for converting HDF-EOS data to netCDF. The tool takes HDF-EOS 5 data as input, and generates COARDS-compatible output - if the input file has enough metadata to be COARDS-compliant, the output file will be COARDS-compliant. The tool is written in portable C, and ought to run on any platform where the HDF-EOS and netCDF libraries are available.
This year, we have made two major enhancements to the converter:
The HDF-EOS Support Group has developed crosswalks (mappings) for Earth Science Metadata and standards. In doing so, several themes emerged that are applicable to any type of metadata. Some themes are specific within a set of metadata and others are associated with the mapping of one set to another or to a standard. This presentation will discuss these themes and possible methods to address them.
The Intel® Array Visualizer (http://www.intel.com/software/products/compilers/fwin/array_vis.htm) is a software tool for loading, viewing, and saving array data. Supported file formats include HDF4, HDF5, netCDF, and FITS. Support for additional formats can be provided by the user through the use of a plug-in architecture.
The Visualizer includes a viewing program as well as libraries that provide a file format independent API for accessing and displaying data. In addition to programs written in C and Fortran, scripts can be written in JavaScript that work with the Visualizer object model. Scripts can also be embedded within a data file, enabling customized behaviors when viewed.
This presentation will provide a brief overview of the product covering areas of interest to HDF and EOS developers.
HDFexplorer
Space Research
HDF Explorer is a leading data visualization program that reads the HDF and HDF5 file formats. In the workshop, version 1.3 will be introduced, newer features being the reading and display of netCDF files, and the display of more metadata information.
Modular HDFView With an HDF-EOS Plug-in
NCSA
Modular HDFView is an improved HDFView with replaceable I/O and GUI modules. It consists of several interfaces that enable users to write and use alternative implementations of I/O and GUI components to replace default modules. The HDF-EOS plugin will be used as an example.
New HDF Utilities: diff, repack, jam
NCSA
Several new utility programs have been added to the HDF4 and HDF5 software distributions, and older tools have been enhanced.
The new tools for HDF4 include:
As more and more people use HDF products on Windows, HDF Windows support becomes increasingly important. However, since HDF products are mainly developed on Unix/Linux platforms, the significant differences between Windows and Unix/Linux make HDF Windows support harder. In this poster, we will present the HDF group's efforts to provide HDF5 support for Windows users. These include efforts to make building and testing HDF5 more convenient for users; to strengthen documentation support, including a Windows support website and rearranged installation documents; to fix bugs in a timely manner; and to investigate Cygwin support for HDF5.
Presentation Group 2
The HDF5 WRF-IO Module
NCSA
The Weather Research and Forecasting (WRF) model developed at NCAR is the community weather model. WRF is designed with a standard configuration and common IO APIs so that external IO packages can be added easily. An application can select which IO module to use. The current WRF software package supports a netCDF WRF IO module.
This talk presents new HDF5 WRF IO modules. The parallel HDF5-WRF module uses the parallel HDF5 library, which is implemented with MPI IO. Performance studies show that this module can improve performance for computations with large IO requirements. The sequential HDF5-WRF module supports in-memory compression, which can save disk space.
Netcdf-Java Common Data Model
UCAR/Unidata
NetCDF-Java version 2.2 provides an Application Programmer Interface (API) for a scientific data model called the Common Data Model (CDM). The CDM is the result of merging the NetCDF (version 3), OpenDAP (version 2), and the HDF5
A Pathfinding Project of HDF-EOS2 to HDF-EOS5 Transition for MODIS Surface Reflectance Product
HDF-EOS5 has many advantages over HDF-EOS2. Future Earth observation data products (such as Aura, NPOESS) are in HDF-EOS5 or HDF5. There is a need to find a path (or obstacles) to transition EOS products from HDF-EOS2 to HDF-EOS5. The goal is to preserve EOS data products in the long run.
The MODIS 8-day level 3 global 500 m land surface reflectance product (MOD09A1) was selected for this pathfinder project of HDF-EOS2 to HDF-EOS5 transition. Two methodologies are used for this study. The first approach is to directly modify the HDF-EOS2 Product Generation Executable (PGE21) code to produce HDF-EOS5 products. The second is to develop a standalone converter to change HDF-EOS2 data to HDF-EOS5 data. We analyzed PGE21 data types, objects, attributes, metadata and structures and found mapping relationships between HDF-EOS2 and HDF-EOS5. While keeping the algorithm and data values the same, HDF-EOS2 codes and data types are changed to HDF-EOS5. Finally, we added an internal compression feature in PGE21 to compress the HDF-EOS5 file. The standalone converter is based on the heconvert program code, to which we added metadata and compression features.
We evaluated the HDF-EOS5 data output from PGE21 and compared PGE21 performance for HDF-EOS2 and HDF-EOS5. Detailed results are presented in the poster.
Using HDF5 to Store Geospatial Records (NARA project)
The merits of HDF5 and HDF-EOS are considered for storing a variety of types of geospatial data. This work involves mapping many different geospatial data formats to HDF5 and/or HDF-EOS, converting sample files, developing visualization tools for examining these files, and performance analyses. Supported by the National Archives and Records Administration and the Illinois State Geological Survey, this use of HDF5 and HDF-EOS is seen to have potential value for the long-term preservation of geospatial data, as well as providing efficient storage and access in active geospatial data repositories.
Using HDF5 to Store Engineering Test Data (Boeing project)
Collecting and storing data from test systems and platforms has historically been reduced to unique in-house implementations. Man-months to man-years have been expended to create and develop these site- and system-specific storage file formats.
NCSA's HDF5 data management system has the functionality and performance needed to capture, store and retrieve flight test data -- it is comprehensive, scalable, flexible, and fast. Using HDF5 as the initial container for test data and replacing all the subsequent intermediate formats will yield cost savings for both non-recurring and recurring items associated with test data collection and processing. Ultimately, this work can lead to standardized formats, software, and tools benefiting a wide range of test systems in many industries.
The focus in the first year of this project will be on variable length array storage of test data in HDF5.
UML Representation of NPOESS Data Products in HDF5
NPOESS is a system of polar orbiting weather satellites and ground equipment used for the collection, analysis and distribution of weather data to government and civilian users. The NPOESS Preparatory Program (NPP) will be used as a bridge between the existing Earth Observing System (EOS) program and the NPOESS Program. The NPP will provide an opportunity to utilize new instruments, algorithms and data delivery packages prior to NPOESS. The NPP will be utilizing the upcoming JTA standard of Hierarchical Data Format, version 5 (HDF5) for packaging the data. HDF5 was chosen for use on NPOESS because it has the capability to operate well in high performance, data intensive environments. HDF5 can store data in a variety of ways. NPOESS has chosen to standardize its organization of the HDF5 files so data can be easily and consistently accessed and shared amongst the community. The organization of the HDF5 files is described using Unified Modeling Language (UML), a standard modeling language used to design structured or object-oriented software applications.
HDF-EOS Maintenance and Development
We summarize the current development and maintenance status of HDF-EOS and associated tools. A new browser, an HDF-EOS plugin for the NCSA-developed tool HDFView, is now available. The plugin offers browse capability for both HDF4-based and HDF5-based files. HDFView can also process vanilla HDF4 and HDF5 files. Functions including shuffling and szip compression have been added.
The HDF-EOS to GeoTIFF (HEG) conversion tool has been augmented to include new projections, other analysis features and support for additional MODIS and ASTER products. Subsetting features have also been augmented. A Mac OS X port will be available in October. The tool is available in both stand-alone and EOS DAAC online versions.
Profile of NPOESS HDF5 Files
The NPOESS program uses Unified Modeling Language (UML) to describe the format of the HDF5 files produced. For each unique type of data product, the HDF5 storage organization and the means to retrieve the data are the same. This provides a consistent data retrieval interface for manual and automated users of the data; without it, data access would require custom development and cumbersome maintenance. The data formats are described using UML to provide a profile of HDF5 files.
This poster will show each unique data type so far produced by NPOESS, and the contents of the files. We will also have overhead snapshots of the data contents.
Thursday, October 28
During the first year of the project "Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability" we have made progress towards the ultimate goals: to create a new data access library combining the desirable characteristics of netCDF-3 and HDF5, make netCDF more suitable for high-performance computing, provide a new simple, high-level interface for HDF5, and demonstrate benefits of the combination for advanced Earth science modeling. We have successfully integrated a netCDF interface with the HDF5 storage layer. We are experimenting with full-scale data sets in the prototype. The project is on schedule for initial release of netCDF-4 in the summer of 2005, with earlier test releases planned.
Survey of Data Format Tools
Background: The National Polar-orbiting Operational Environmental Satellite System (NPOESS) will deliver data to four U.S. Government processing centers in Hierarchical Data Format version 5 (HDF5). The National Oceanic and Atmospheric Administration (NOAA) will tailor the NPOESS data by subsetting and translating the HDF5 dataset into data formats such as BUFR, GeoTIFF, NetCDF and SDTS to meet the diverse requirements of its customers.
Presentation: We will present a survey and feature comparison of software tools currently available to evaluate, translate and extract data from datasets in HDF and other standard data formats. Our presentation will also identify opportunities to extend the functionality of existing toolsets to better support the NPOESS HDF5 dataset.
A Web Browser Plug-in for HDF
The main goal of the HDF browser plug-in is to let users click and view HDF files, remotely and locally, from popular web browsers. Different options for implementing the plug-in will be presented for comments. Depending on progress, a demo version may be available at the workshop.
Spatial Types: Looking Ahead to Spatial Search
It is not enough to collect the data and produce data products. In order to be useful the data has to be used. To facilitate data use, a search interface, and probably many, will eventually have to be developed. And those interfaces can only be as good as the metadata they have to work with. HDF is not so much a data format as a file format that packages the data with the metadata, so facilitating data access by providing adequate and appropriate metadata starts with data production.
In many areas this is not much of a challenge. The temporal coverage of the granules is generally well known, channel and derived parameter names are generally just a matter of convention, etc. But spatial coverage can vary quite a bit, especially for remotely sensed data, and can often be problematic. This paper goes through the five most common spatial types (point, grid, tile, scene, and swath), discusses the problems associated with each, and makes some recommendations for the metadata that needs to be included with the data to facilitate fast, efficient, accurate search when the time comes.
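As a rough illustration of why spatial metadata matters for search, the sketch below filters a small catalog of granules by bounding-box overlap. It is a minimal example under stated assumptions, not the approach recommended in the paper: the `GranuleMetadata` record, its field names, and the `search` helper are all hypothetical. A bounding box is a plausible fit for grid and tile types but can badly over-match for swaths, which is one reason the choice of spatial metadata is not trivial.

```python
from dataclasses import dataclass


@dataclass
class GranuleMetadata:
    """Hypothetical per-granule spatial metadata (degrees, no antimeridian crossing)."""
    name: str
    south: float
    north: float
    west: float
    east: float


def overlaps(g, south, north, west, east):
    """Return True if the granule's bounding box intersects the query box."""
    return not (g.north < south or g.south > north
                or g.east < west or g.west > east)


def search(catalog, south, north, west, east):
    """Return the names of all granules whose bounding box intersects the query box."""
    return [g.name for g in catalog if overlaps(g, south, north, west, east)]


# Two made-up granules and one query region.
catalog = [
    GranuleMetadata("granule_a", 40.0, 50.0, -105.0, -90.0),
    GranuleMetadata("granule_b", 0.0, 10.0, 20.0, 30.0),
]
hits = search(catalog, 42.0, 45.0, -100.0, -95.0)
print(hits)
```

The search never opens the data files; it relies only on metadata recorded at production time.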
HDF-EOS Subsetting Activity at UAH
The presentation will provide an overview of subsetting software development activity at UAH. Updates have been made to all packages, reflecting the latest versions of HDF5 and HE5. The library of tools (HSE) for subsetting HDF-EOS data is up-to-date for SGI, Sun, and Linux platforms. Subsetting software is operational at NSIDC DAAC and GDAAC, and in testing at LPDAAC. Ongoing work and plans will also be described, including row/column subsetting and index subsampling.
HDF-EOS Subsetting activities at GES DAAC
The Goddard Earth Sciences (GES) DAAC has an extensive archive of atmospheric, ocean, and radiance data products (MODIS, AIRS, Aura, etc.). Many of these are stored in the HDF-EOS or HDF file formats, in either version 4 or the new version 5. Many of these data products are too large to use in their full format, and thus it becomes necessary to subset only that information which is truly useful to the user. For high spatial resolution data, this involves subsetting the data by latitude and longitude. For hyper- or multi-spectral data, it involves subsetting the data by channel or band. Some data files contain many different measured parameters, and thus subsetting by geophysical parameter becomes important. There are other methods of reducing the data volume, such as retrieving only those regions that are cloud free.
Some data products are on-line (anonymous ftp data pool) and thus require subsetting on-the-fly, while other data are backed up in the near-line archive and must be subsetted on-demand. A third option involves pre-selected subsets (such as a geographic region, or set of popular parameters). Of course every data product ends up having some unique properties that make generic subsetting non-trivial. The GES DAAC has experience with all of these cases and is in the process of tailoring many of its data products to suit users' needs. In this presentation we provide an overview of the various ongoing subsetting activities at the GES DAAC.
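The two most common subsetting operations named above, by latitude/longitude and by geophysical parameter, can be sketched in a few lines. This is an illustrative example only and does not reflect the GES DAAC software: the function names, the dictionary-of-grids layout, and the sample values are assumptions, and the code works on plain Python lists rather than HDF-EOS files. It also assumes a regular grid whose bounding box does not cross the antimeridian.

```python
def subset_by_bbox(lats, lons, values, south, north, west, east):
    """Keep the rows and columns of a regular lat/lon grid that fall inside a box.

    lats, lons: 1-D lists of cell-center coordinates in degrees.
    values: 2-D list indexed as values[lat_index][lon_index].
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    return (
        [lats[i] for i in rows],
        [lons[j] for j in cols],
        [[values[i][j] for j in cols] for i in rows],
    )


def subset_by_parameter(granule, wanted):
    """Keep only the named geophysical parameters from a dict of grids."""
    return {name: grid for name, grid in granule.items() if name in wanted}


# Made-up sample granule with two parameters on a 4 x 3 grid.
lats = [10.0, 20.0, 30.0, 40.0]
lons = [-100.0, -90.0, -80.0]
granule = {
    "temperature": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    "cloud_fraction": [[0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]],
}

selected = subset_by_parameter(granule, {"temperature"})
sub_lats, sub_lons, sub_vals = subset_by_bbox(
    lats, lons, selected["temperature"], 15.0, 35.0, -95.0, -75.0
)
print(sub_lats, sub_lons, sub_vals)
```

Subsetting by channel or band follows the same pattern with an extra index; swath data, where latitude and longitude vary per pixel, needs a per-pixel mask instead of row and column selection.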
HDF/HDF-EOS Tools and Services at GES DAAC
Abstract not available.
An HDF-EOS Data Server Based on OpenDAP and ECHO
We describe a data server for publishing HDF-EOS datasets to the web. This system makes HDF-EOS datasets:
Web Interface for Searching, Subsetting, Stitching, Regridding, Resampling, and Reformating Data (WISRD) was designed to allow users to search for swath, scene, and gridded data sets by collection, parameter, date, and region of interest. Requested data granules are optionally stitched together and then gridded or regridded to a common user-definable grid covering the user's region of interest prior to delivery. Additionally, output data can be stitched together to create daily grids, and multiple output formats are supported. We will be demonstrating a prototype version of WISRD.
Tutorials
The HDF group has experienced remarkable successes, producing high quality open source software that is widely used throughout the world. This talk presents a collection of thoughts on the HDF Project's approach to software engineering. In writing these notes we have come to realize that any success the group has had is due to several factors, including:
HDF5 is designed to work well on high performance parallel systems and clusters. This tutorial will review the high performance features of HDF5, including:
Currently we are supporting NetCDF, HDF5, OpenDAP, Grib1 and Grib2. Other formats and access protocols, including McIdas ADDE, are being considered. This is work in progress, but an alpha release of the software and documentation will be available by October.
ESDIS DU
NCSA
NCSA
Raytheon, NPOESS Program
L-3/HDF-EOS Developers
Raytheon, NPOESS
Presentation Group 3
Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability
UCAR Unidata Program
NOAA/NESDIS
NCSA
NSIDC
UAH
GES DAAC
GES DAAC
ESDIS DU
Web Interface for Searching, Subsetting, Stitching, Regridding, Resampling, and Reformating Data (WISRD)
NSIDC
HDF Software Process--Lessons Learned
NCSA
These factors are little more than platitudes, however. The manner in which they are successfully applied can only be understood by examining the details. In this talk, we describe some of the details, emphasizing mostly those areas in which we have had success.
Participants should be familiar with MPI and MPI I/O and have a basic knowledge of the sequential HDF5 Library. The lecture will prepare them for the Parallel I/O hands-on session.