IODA
05b-ObsGroupAppend.cpp
1 /*
2  * (C) Copyright 2020-2021 UCAR
3  *
4  * This software is licensed under the terms of the Apache Licence Version 2.0
5  * which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
6  */
7 /*! \addtogroup ioda_cxx_ex
8  *
9  * @{
10  *
11  * \defgroup ioda_cxx_ex_5b Ex 5b: Appending to ObsGroups
12  * \brief Appending to ObsGroups
13  * \see 05b-ObsGroupAppend.cpp for comments and the walkthrough.
14  *
15  * @{
16  *
17  * \file 05b-ObsGroupAppend.cpp
18  * \brief ObsGroup
19  *
20  * The ObsGroup class is derived from the Group class and provides some help in
21  * organizing your groups, variables, attributes and dimension scales into a cohesive
22  * structure intended to house observation data. In this case "structure" refers to the
23  * hierarchical layout of the groups and the proper management of dimension scales
24  * associated with the variables.
25  *
26  * The ObsGroup and underlying layout policies (internal to ioda-engines) present a stable
27  * logical group hierarchical layout to the client while keeping the actual layout implemented
28  * in the backend open to change. The logical "frontend" layout appears to the client to be
29  * as shown below:
30  *
31  *   layout                                    notes
32  *
33  *     /                                   top-level group
34  *      nlocs                              dimension scales (variables, coordinate values)
35  *      nchans
36  *      ...
37  *      ObsValue/                          group: observational measurement values
38  *          brightness_temperature         variable: Tb, 2D, nlocs X nchans
39  *          air_temperature                variable: T, 1D, nlocs
40  *          ...
41  *      ObsError/                          group: observational error estimates
42  *          brightness_temperature
43  *          air_temperature
44  *          ...
45  *      PreQC/                             group: observational QC marks from data provider
46  *          brightness_temperature
47  *          air_temperature
48  *          ...
49  *      MetaData/                          group: meta data associated with locations
50  *          latitude
51  *          longitude
52  *          datetime
53  *          ...
54  *      ...
55  *
56  * It is intended to keep this layout stable so that the client interface remains stable.
57  * The actual layout used in the various backends can optionally be organized differently
58  * according to their needs.
59  *
60  * The ObsGroup class also assists with the management of dimension scales. For example, if
61  * a dimension is resized, the ObsGroup::resize function will resize the dimension scale
62  * along with all variables that use that dimension scale.
63  *
64  * The basic idea is to dimension observation data with nlocs as the first dimension, and
65  * allow nlocs to be resizable so that it's possible to incrementally append data along
66  * the nlocs (1st) dimension. For data that have rank > 1, the second through nth dimensions
67  * are of fixed size. For example, brightness_temperature can be stored as 2D data with
68  * dimensions (nlocs, nchans).
69  *
70  * \author Stephen Herbener (stephenh@ucar.edu), Ryan Honeyager (honeyage@ucar.edu)
71  **/
72 
73 #include <array> // Arrays are fixed-length vectors.
74 #include <iomanip> // std::setw
75 #include <iostream> // We want I/O.
76 #include <numeric> // std::iota
77 #include <string> // We want strings
78 #include <valarray> // Like a vector, but can also do basic element-wise math.
79 #include <vector> // We want vectors
80 
81 #include "Eigen/Dense" // Eigen Arrays and Matrices
82 #include "ioda/Engines/Factory.h" // Used to kickstart the Group engine.
83 #include "ioda/Exception.h" // Exceptions and debugging
84 #include "ioda/Group.h" // Groups have attributes.
85 #include "ioda/ObsGroup.h"
86 #include "unsupported/Eigen/CXX11/Tensor" // Eigen Tensors
87 
88 int main(int argc, char** argv) {
89  using namespace ioda; // All of the ioda functions are here.
90  using std::cerr;
91  using std::endl;
92  using std::string;
93  using std::vector;
94  try {
95  // It's possible to transfer data in smaller pieces so you can, for example, avoid
96  // reading the whole input file into memory. Transferring by pieces can also be useful
97  // when you don't know a priori how many locations are going to be read in. To accomplish
98  // this, you set the maximum size of the nlocs dimension to Unlimited and use the
99  // ObsGroup::resize function to allocate more space at the end of each variable for
100  // the incoming section.
101  //
102  // For this example, we'll use the same data as in example 05a, but transfer it to
103  // the backend in four pieces, 10 locations at a time.
104 
105  // Create the backend. For this code we are using a factory function,
106  // constructFromCmdLine, made for testing purposes, which allows one to specify
107  // a backend from the command line using the "--ioda-engine-options" option.
108  //
109  // There exists another factory function, constructBackend, which allows you
110  // to create a backend without requiring the command line option. The signature
111  // for this function is:
112  //
113  // constructBackend(BackendNames, BackendCreationParameters &);
114  //
115  //
116  // BackendNames is an enum type with values:
117  // Hdf5File - file backend using HDF5 file
118  // ObsStore - in-memory backend
119  //
120  // BackendCreationParameters is a C++ structure with members:
121  // fileName - string, used for file backend
122  //
123  // action - enum BackendFileActions type:
124  // Create - create a new file
125  // Open - open an existing file
126  //
127  // createMode - enum BackendCreateModes type:
128  // Truncate_If_Exists - overwrite existing file
129  // Fail_If_Exists - throw exception if file exists
130  //
131  // openMode - enum BackendOpenModes type:
132  // Read_Only - open in read only mode
133  // Read_Write - open in modify mode
134  //
135  // Here are some code examples:
136  //
137  // Create backend using an hdf5 file for writing:
138  //
139  // Engines::BackendNames backendName;
140  // backendName = Engines::BackendNames::Hdf5File;
141  //
142  // Engines::BackendCreationParameters backendParams;
143  // backendParams.fileName = fileName;
144  // backendParams.action = Engines::BackendFileActions::Create;
145  // backendParams.createMode = Engines::BackendCreateModes::Truncate_If_Exists;
146  //
147  // Group g = constructBackend(backendName, backendParams);
148  //
149  // Create backend using an hdf5 file for reading:
150  //
151  // Engines::BackendNames backendName;
152  // backendName = Engines::BackendNames::Hdf5File;
153  //
154  // Engines::BackendCreationParameters backendParams;
155  // backendParams.fileName = fileName;
156  // backendParams.action = Engines::BackendFileActions::Open;
157  // backendParams.openMode = Engines::BackendOpenModes::Read_Only;
158  //
159  // Group g = constructBackend(backendName, backendParams);
160  //
161  // Create an in-memory backend:
162  //
163  // Engines::BackendNames backendName;
164  // backendName = Engines::BackendNames::ObsStore;
165  //
166  // Engines::BackendCreationParameters backendParams;
167  //
168  // Group g = constructBackend(backendName, backendParams);
169  //
170 
171  // Create the backend using the command line construct function
172  Group g = Engines::constructFromCmdLine(argc, argv, "Example-05b.hdf5");
173 
174  // Create an ObsGroup object using the ObsGroup::generate function. This function
175  // takes a Group argument (the backend we just created above) and a vector of dimension
176  // creation specs.
177  const int numLocs = 40;
178  const int numChans = 30;
179  const int sectionSize = 10; // experiment with different sectionSize values
180 
181  // NewDimensionScales_t is a vector that holds specs for one dimension scale
182  // per element. An individual dimension scale spec is held in a NewDimensionScale
183  // object, whose constructor arguments are:
184  // 1st - dimension scale name
185  // 2nd - size of dimension. May be zero.
186  // 3rd - maximum size of dimension
187  // resizeable dimensions are said to have "unlimited" size, so there
188  // is a built-in variable ("Unlimited") that can be used to denote
189  // unlimited size. If Unspecified (the default), we assume that the
190  // maximum size is the same as the initial size (the previous parameter).
191  // 4th - suggested chunk size for dimension (and associated variables).
192  // This defaults to the initial size. This parameter must be nonzero. If
193  // the initial size is zero, it must be explicitly specified.
194  //
195  // For transferring data in pieces, make sure that nlocs maximum dimension size is
196  // set to Unlimited. We'll set the initial size of nlocs to the sectionSize (10).
197  ioda::NewDimensionScales_t newDims{
198  NewDimensionScale<int>("nlocs", sectionSize, Unlimited),
199  NewDimensionScale<int>("nchans", numChans)
200  };
201 
202  // Construct an ObsGroup object, with 2 dimensions nlocs, nchans, and attach
203  // the backend we constructed above. Under the hood, the ObsGroup::generate function
204  // initializes the dimension coordinate values to index numbering 1..n. This can be
205  // overwritten with other coordinate values if desired.
206  ObsGroup og = ObsGroup::generate(g, newDims);
207 
208  // We now have the top-level group containing the two dimension scales. We need
209  // Variable objects for these dimension scales later on for creating variables so
210  // build those now.
211  ioda::Variable nlocsVar = og.vars["nlocs"];
212  ioda::Variable nchansVar = og.vars["nchans"];
213 
214  // Next let's create the variables. The variable names should be specified using the
215  // hierarchy as described above. For example, the variable brightness_temperature
216  // in the group ObsValue is specified in a string as "ObsValue/brightness_temperature".
217  string tbName = "ObsValue/brightness_temperature";
218  string latName = "MetaData/latitude";
219  string lonName = "MetaData/longitude";
220 
221  // Set up the creation parameters for the variables. All three variables in this case
222  // are float types, so they can share the same creation parameters object.
223  ioda::VariableCreationParameters float_params;
224  float_params.chunk = true; // allow chunking
225  float_params.compressWithGZIP(); // compress using gzip
226  float_params.setFillValue<float>(-999); // set the fill value to -999
227 
228  // Create the variables. Note the use of the createWithScales function. This should
229  // always be used when working with an ObsGroup object.
230  Variable tbVar = og.vars.createWithScales<float>(tbName, {nlocsVar, nchansVar}, float_params);
231  Variable latVar = og.vars.createWithScales<float>(latName, {nlocsVar}, float_params);
232  Variable lonVar = og.vars.createWithScales<float>(lonName, {nlocsVar}, float_params);
233 
234  // Add attributes to variables. In this example, we are adding enough attribute
235  // information to allow Panoply to be able to plot the ObsValue/brightness_temperature
236  // variable. Note the "coordinates" attribute on tbVar. It is sufficient to just
237  // give the variable names (without the group structure) to Panoply (which apparently
238  // searches the entire group structure for these names). If you want to follow this
239  // example in your code, just give the variable names without the group prefixes
240  // to insulate your code from any subsequent group structure changes that might occur.
241  tbVar.atts.add<std::string>("coordinates", {"longitude latitude nchans"}, {1})
242  .add<std::string>("long_name", {"fictitious brightness temperature"}, {1})
243  .add<std::string>("units", {"K"}, {1})
244  .add<float>("valid_range", {100.0, 400.0}, {2});
245  latVar.atts.add<std::string>("long_name", {"latitude"}, {1})
246  .add<std::string>("units", {"degrees_north"}, {1})
247  .add<float>("valid_range", {-90.0, 90.0}, {2});
248  lonVar.atts.add<std::string>("long_name", {"longitude"}, {1})
249  .add<std::string>("units", {"degrees_east"}, {1})
250  .add<float>("valid_range", {-360.0, 360.0}, {2});
251 
252  // Let's create some data for this example.
253  Eigen::ArrayXXf tbData(numLocs, numChans);
254  std::vector<float> lonData(numLocs);
255  std::vector<float> latData(numLocs);
256  float midLoc = static_cast<float>(numLocs) / 2.0f;
257  float midChan = static_cast<float>(numChans) / 2.0f;
258  for (std::size_t i = 0; i < numLocs; ++i) {
259  lonData[i] = static_cast<float>(i % 8) * 3.0f;
260  // We use static code analysis tools to check for potential bugs
261  // in our source code. On the next line, the clang-tidy tool warns about
262  // our use of integer division before casting to a float. Since there is
263  // no actual bug, we indicate this with NOLINT.
264  latData[i] = static_cast<float>(i / 8) * 3.0f; // NOLINT(bugprone-integer-division)
265  for (std::size_t j = 0; j < numChans; ++j) {
266  float del_i = static_cast<float>(i) - midLoc;
267  float del_j = static_cast<float>(j) - midChan;
268  tbData(i, j) = 250.0f + sqrt(del_i * del_i + del_j * del_j);
269  }
270  }
271 
272  // Transfer the data piece by piece. In this case we are moving consecutive,
273  // contiguous pieces from the source to the backend.
274  //
275  // Things to consider:
276  // If numLocs/sectionSize has a remainder, then the final section needs to be
277  // smaller to match up.
278  //
279  // The new size for resizing the variables needs to be the current size
280  // plus the count for this section.
281  std::size_t numLocsTransferred = 0;
282  std::size_t isection = 1;
283  int fwidth = 10;
284  std::cout << "Transferring data in sections to backend:" << std::endl << std::endl;
285  std::cout << std::setw(fwidth) << "Section" << std::setw(fwidth) << "Start" << std::setw(fwidth)
286  << "Count" << std::setw(fwidth) << "Resize" << std::endl;
287  while (numLocsTransferred < numLocs) {
288  // Figure out the starting point and size (count) for the current piece.
289  std::size_t sectionStart = numLocsTransferred;
290  std::size_t sectionCount = sectionSize;
291  if ((sectionStart + sectionCount) > numLocs) {
292  sectionCount = numLocs - sectionStart;
293  }
294 
295  // Figure out the new size for the nlocs dimension
296  Dimensions nlocsDims = nlocsVar.getDimensions();
297  Dimensions_t nlocsNewSize
298  = (isection == 1) ? sectionCount : nlocsDims.dimsCur[0] + sectionCount;
299 
300  // Print out stats so you can see what's going on
301  std::cout << std::setw(fwidth) << isection << std::setw(fwidth) << sectionStart
302  << std::setw(fwidth) << sectionCount << std::setw(fwidth) << nlocsNewSize
303  << std::endl;
304 
305  // Resize the nlocs dimension
306  og.resize({std::pair<Variable, Dimensions_t>(nlocsVar, nlocsNewSize)});
307 
308  // Create selection objects for transferring the data
309  // We'll use the HDF5 hyperslab style of selection which denotes a start index
310  // and count for each dimension. The start and count values need to be vectors
311  // where the ith entry corresponds to the ith dimension of the variable. Latitude
312  // and longitude are 1D and Tb is 2D. Start with 1D starts and counts denoting
313  // the sections to transfer for lat and lon, then add the start and count values
314  // for channels and transfer Tb.
315 
316  // starts and counts for lat and lon
317  std::vector<Dimensions_t> starts(1, sectionStart);
318  std::vector<Dimensions_t> counts(1, sectionCount);
319 
320  Selection feSelect;
321  feSelect.extent({nlocsNewSize}).select({SelectionOperator::SET, starts, counts});
322  Selection beSelect;
323  beSelect.select({SelectionOperator::SET, starts, counts});
324 
325  latVar.write<float>(latData, feSelect, beSelect);
326  lonVar.write<float>(lonData, feSelect, beSelect);
327 
328  // Add the start and count values for the channels dimension. We will select
329  // all channels, so start is zero, and count is numChans
330  starts.push_back(0);
331  counts.push_back(numChans);
332 
333  Selection feSelect2D;
334  feSelect2D.extent({nlocsNewSize, numChans}).select({SelectionOperator::SET, starts, counts});
335  Selection beSelect2D;
336  beSelect2D.select({SelectionOperator::SET, starts, counts});
337 
338  tbVar.writeWithEigenRegular(tbData, feSelect2D, beSelect2D);
339 
340  numLocsTransferred += sectionCount;
341  isection++;
342  }
343 
344  // The ObsGroup::generate function has, under the hood, automatically assigned
345  // the coordinate values for nlocs and nchans dimension scale variables. The
346  // auto-assignment uses the values 1..n upon creation. Since we resized nlocs,
347  // the coordinates at this point will be set to 1..sectionSize followed by all
348  // zeros to the end of the variable. This can be addressed two ways:
349  //
350  // 1. In the above loop, add a write to the nlocs variable with the corresponding
351  // coordinate values for each section.
352  // 2. In the case where you simply want 1..n as the coordinate values, wait
353  // until transferring all the sections of variable data, check the size
354  // of the nlocs variable, and write the entire 1..n values to the variable.
355  //
356  // We'll do option 2 here
357  int nlocsSize = gsl::narrow<int>(nlocsVar.getDimensions().dimsCur[0]);
358  std::vector<int> nlocsVals(nlocsSize);
359  std::iota(nlocsVals.begin(), nlocsVals.end(), 1);
360  nlocsVar.write(nlocsVals);
361 
362  // Done!
363  } catch (const std::exception& e) {
364  ioda::unwind_exception_stack(e);
365  return 1;
366  }
367  return 0;
368 }
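
The section loop in the listing above combines three pieces of bookkeeping: working out the start and count of each piece (with a shorter final piece when numLocs is not a multiple of sectionSize), growing the nlocs dimension to the current size plus the count, and copying the matching rows. The following is a minimal standalone sketch of that arithmetic using only the C++ standard library. It does not call ioda at all; `Section`, `planSections`, `appendSection` and `indexCoordinates` are hypothetical names introduced here for illustration, and a plain row-major `std::vector<float>` stands in for the backend variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical type (not part of ioda): one contiguous piece along nlocs.
struct Section {
  std::size_t start;    // index of the first location in this piece
  std::size_t count;    // number of locations in this piece
  std::size_t newSize;  // size of nlocs once this piece has been appended
};

// Split numLocs locations into pieces of at most sectionSize locations.
// The final piece is shortened when numLocs is not a multiple of sectionSize.
std::vector<Section> planSections(std::size_t numLocs, std::size_t sectionSize) {
  assert(sectionSize > 0);  // a zero-length section would never make progress
  std::vector<Section> plan;
  for (std::size_t start = 0; start < numLocs; start += sectionSize) {
    const std::size_t count = std::min(sectionSize, numLocs - start);
    plan.push_back({start, count, start + count});
  }
  return plan;
}

// Stand-in for "resize, then write a hyperslab": grow a row-major
// (nlocs x numChans) buffer to sec.newSize rows and copy rows
// [sec.start, sec.start + sec.count) from src into the same rows of dst.
void appendSection(const std::vector<float>& src, std::vector<float>& dst,
                   std::size_t numChans, const Section& sec) {
  dst.resize(sec.newSize * numChans, -999.0f);  // new rows hold the fill value
  const auto first = static_cast<std::ptrdiff_t>(sec.start * numChans);
  const auto last = static_cast<std::ptrdiff_t>((sec.start + sec.count) * numChans);
  std::copy(src.begin() + first, src.begin() + last, dst.begin() + first);
}

// Coordinate values 1..n for a dimension of size n, written once at the end.
std::vector<int> indexCoordinates(std::size_t n) {
  std::vector<int> coords(n);
  std::iota(coords.begin(), coords.end(), 1);
  return coords;
}
```

With numLocs = 45 and sectionSize = 10, `planSections` yields five pieces, the last one covering locations 40 through 44. Calling `appendSection` for each piece in order reproduces the source buffer, which mirrors how the listing resizes nlocs before each write and uses the same start and count on both the frontend and backend selections.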