UFO
DataExtractorCSVBackend.h
/*
 * (C) Crown copyright 2021, Met Office
 *
 * This software is licensed under the terms of the Apache Licence Version 2.0
 * which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
 */

#ifndef UFO_UTILS_DATAEXTRACTOR_DATAEXTRACTORCSVBACKEND_H_
#define UFO_UTILS_DATAEXTRACTOR_DATAEXTRACTORCSVBACKEND_H_

#include <string>


namespace ufo
{

/// \brief Produces input for a DataExtractor by loading data from a CSV file.
///
/// The file should have the following structure (described in more detail in the example below):
/// * First line: comma-separated column names in ioda-v1 style (`var@Group`) or ioda-v2 style
///   (`Group/var`)
/// * Second line: comma-separated column data types (datetime, float, int or string)
/// * Further lines: comma-separated data entries
///
/// The number of entries in each line should be the same.
///
/// Here's an example of a file that could be read by this backend and used for bias correction:
///
///     station_id@MetaData,air_pressure@MetaData,air_temperature@ObsBias
///     string,float,float
///     ABC,30000,0.1
///     ABC,60000,0.2
///     ABC,90000,0.3
///     XYZ,40000,0.4
///     XYZ,80000,0.5
///
/// One of the columns (in the example above, `air_temperature@ObsBias`) contains the values to be
/// extracted (also known as the _payload_). The payload column is identified by the group it
/// belongs to, i.e. the part of its name following the `@` sign (ioda-v1 style) or preceding the
/// last `/` sign (ioda-v2 style); this group is specified in the call to the loadData() member
/// function. The values from the other columns (the _coordinates_) are compared against ObsSpace
/// variables with the same names to determine the row or rows from which the payload value should
/// be extracted for each observation. The details of this comparison (e.g. whether an exact match
/// is required, the nearest match is used, or piecewise linear interpolation is performed) depend
/// on how the class using the extracted data (e.g. the DrawValueFromFile ObsFunction) is
/// configured. The data type of each column must match the data type of the corresponding ObsSpace
/// variable. The type of the payload column should match the template parameter `ExtractedValue`,
/// which must be `float`, `int` or `std::string`. The column order does not matter.
///
/// Notes:
///
/// 1. A column containing channel numbers (which aren't stored in a separate ObsSpace variable)
///    should be labelled `channel_number@MetaData` or `MetaData/channel_number`.
///
/// 2. Single underscores serve as placeholders for missing values; for example, the following row
///
///        ABC,_,_
///
///    contains missing values in the second and third columns.
///
/// To continue the example above, suppose the file shown earlier is passed to the
/// DrawValueFromFile ObsFunction configured in the following way:
///
///     name: DrawValueFromFile@ObsFunction
///     options:
///       file: ... # path to the CSV file
///       group: ObsBias # group with the payload variable
///       interpolation:
///       - name: station_id@MetaData
///         method: exact
///       - name: air_pressure@MetaData
///         method: linear
///
/// For an observation taken by station XYZ at pressure 60000, the function would be evaluated in
/// the following way:
/// * First, find all rows in the CSV file with a value of `XYZ` in the `station_id@MetaData`
///   column.
/// * Then take the values of the `air_pressure@MetaData` and `air_temperature@ObsBias` columns
///   in these rows and use them to construct a piecewise linear interpolant. Evaluate this
///   interpolant at pressure 60000. The two matching rows have pressures 40000 and 80000 and
///   payload values 0.4 and 0.5. Since 60000 lies halfway between these pressures, this produces
///   0.4 + 0.5 * (0.5 - 0.4) = 0.45.
///
/// Refer to the documentation of the DrawValueFromFile ObsFunction for more information about the
/// available extraction methods.
template <typename ExtractedValue>
class DataExtractorCSVBackend : public DataExtractorBackend<ExtractedValue> {
 public:
  /// \brief Create a new instance.
  ///
  /// \param filepath Path to the CSV file that will be read by loadData().
  explicit DataExtractorCSVBackend(const std::string &filepath);

  DataExtractorInput<ExtractedValue> loadData(const std::string &payloadGroup) const override;

 private:
  std::string filepath_;
};

} // namespace ufo

#endif // UFO_UTILS_DATAEXTRACTOR_DATAEXTRACTORCSVBACKEND_H_
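The file layout documented above can be illustrated with a minimal, standalone C++17 sketch. It covers a names line, a types line, and data rows in which `_` marks a missing value, and it finds the payload column from its group. This is not the UFO implementation of loadData(). The `CsvTable` struct and the `splitLine`, `groupOf`, `parseCsv` and `payloadColumn` functions are hypothetical names chosen for this example. The sketch also assumes that fields contain no quoted commas and that exactly one column belongs to the payload group.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for a parsed file (not part of the UFO API).
struct CsvTable {
  std::vector<std::string> names;  // first line: column names
  std::vector<std::string> types;  // second line: datetime, float, int or string
  // Data rows; std::nullopt marks a missing value written as a single underscore.
  std::vector<std::vector<std::optional<std::string>>> rows;
};

// Split a line on commas. Quoted fields are not supported in this sketch.
std::vector<std::string> splitLine(const std::string &line) {
  std::vector<std::string> fields;
  std::istringstream stream(line);
  std::string field;
  while (std::getline(stream, field, ','))
    fields.push_back(field);
  if (!line.empty() && line.back() == ',')
    fields.push_back("");  // std::getline does not report a trailing empty field
  return fields;
}

// Group of a column name: text after '@' (ioda-v1, "var@Group")
// or text before the last '/' (ioda-v2, "Group/var").
std::string groupOf(const std::string &name) {
  const std::size_t at = name.find('@');
  if (at != std::string::npos)
    return name.substr(at + 1);
  const std::size_t slash = name.rfind('/');
  if (slash != std::string::npos)
    return name.substr(0, slash);
  return "";
}

// Hypothetical parser for the documented layout.
CsvTable parseCsv(std::istream &input) {
  CsvTable table;
  std::string line;
  if (!std::getline(input, line))
    throw std::runtime_error("missing line with column names");
  table.names = splitLine(line);
  if (!std::getline(input, line))
    throw std::runtime_error("missing line with column types");
  table.types = splitLine(line);
  if (table.types.size() != table.names.size())
    throw std::runtime_error("numbers of column names and types differ");
  for (const std::string &type : table.types)
    if (type != "datetime" && type != "float" && type != "int" && type != "string")
      throw std::runtime_error("unsupported column type: " + type);
  while (std::getline(input, line)) {
    if (line.empty())
      continue;  // this sketch tolerates blank lines, e.g. at the end of the file
    const std::vector<std::string> fields = splitLine(line);
    if (fields.size() != table.names.size())
      throw std::runtime_error("row with an unexpected number of entries: " + line);
    std::vector<std::optional<std::string>> row;
    for (const std::string &field : fields)
      row.push_back(field == "_" ? std::nullopt : std::optional<std::string>(field));
    assert(row.size() == table.names.size());
    table.rows.push_back(row);
  }
  return table;
}

// Index of the single column whose group equals payloadGroup.
std::size_t payloadColumn(const CsvTable &table, const std::string &payloadGroup) {
  std::optional<std::size_t> found;
  for (std::size_t i = 0; i < table.names.size(); ++i) {
    if (groupOf(table.names[i]) != payloadGroup)
      continue;
    if (found)
      throw std::runtime_error("more than one column in group " + payloadGroup);
    found = i;
  }
  if (!found)
    throw std::runtime_error("no column in group " + payloadGroup);
  return *found;
}
```

For the example file, `payloadColumn(table, "ObsBias")` returns 2, the index of `air_temperature@ObsBias`. A row such as `ABC,_,_` is stored with its second and third entries set to `std::nullopt`. Converting the string entries to `float`, `int` or datetime values according to the types line is left out of this sketch.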
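The worked example for station XYZ at pressure 60000 can be reproduced with a short C++ sketch. It assumes exact matching on `station_id@MetaData` followed by piecewise linear interpolation in `air_pressure@MetaData`, mirroring the `exact` and `linear` methods in the configuration above. The `Row` struct and the `interpolate` and `extract` functions are hypothetical and do not represent the DrawValueFromFile implementation. The sketch also assumes that the matching rows are already sorted by strictly increasing pressure. For pressures outside the tabulated range it simply throws; it makes no claim about how the real ObsFunction handles that case.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One row of the example CSV file (hypothetical struct for illustration).
struct Row {
  std::string stationId;  // station_id@MetaData (coordinate, matched exactly)
  double airPressure;     // air_pressure@MetaData (coordinate, interpolated)
  double payload;         // air_temperature@ObsBias
};

// Piecewise linear interpolation of (xs, ys) at x. Assumes xs is strictly increasing;
// throws if x lies outside [xs.front(), xs.back()].
double interpolate(const std::vector<double> &xs, const std::vector<double> &ys, double x) {
  assert(xs.size() == ys.size());
  if (xs.empty() || x < xs.front() || x > xs.back())
    throw std::out_of_range("value outside the interpolation range");
  for (std::size_t i = 1; i < xs.size(); ++i) {
    if (x <= xs[i]) {
      // Fraction of the way from xs[i - 1] to xs[i].
      const double weight = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
      return ys[i - 1] + weight * (ys[i] - ys[i - 1]);
    }
  }
  return ys.front();  // reached only for a single-point table with x == xs.front()
}

// Step 1: keep the rows whose station matches exactly.
// Step 2: interpolate the payload linearly in air pressure over those rows.
double extract(const std::vector<Row> &rows, const std::string &station, double pressure) {
  std::vector<double> pressures;
  std::vector<double> payloads;
  for (const Row &row : rows) {
    if (row.stationId == station) {
      pressures.push_back(row.airPressure);
      payloads.push_back(row.payload);
    }
  }
  return interpolate(pressures, payloads, pressure);
}
```

Called with the five rows of the example file, `extract(rows, "XYZ", 60000)` interpolates between (40000, 0.4) and (80000, 0.5) and returns approximately 0.45. The exact-match step runs first so that only one station's rows ever enter the interpolant.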