
Faster processing in EMADDC Release 2.2(-fast)

September 22, 2022

Over the last few months, EMADDC has been working on an update of the current operational Release 2.2 to deliver data to users more quickly.

Recently, the regular 15-minute window files have become available about 30 minutes after the start of a time window, compared to the previous 45-minute delay, because MUAC updated their system to reduce their delay.

With the new update, EMADDC will also output files in 5-minute windows. Each window is processed twice: 15 minutes and 25 minutes after the start of the window. These two delays have been chosen such that roughly 70% and 95% of all data, respectively, is contained in a file, as seen in the enclosed graphs. EMADDC refers to these files as "fast" files.
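As a rough illustration of this timing, the sketch below computes when the two processing runs of a fast window take place. It assumes that windows start at multiples of five minutes past the hour (as in the 12:00 to 12:05 example used for the graphs), and the function name is hypothetical. The moment a file actually appears on the server may be somewhat later than the start of a run.

```python
from datetime import datetime, timedelta

# Delays of the two processing runs, as stated in the text.
RUN_DELAYS = (timedelta(minutes=15), timedelta(minutes=25))


def fast_window_schedule(t):
    """Return the start of the 5-minute window containing t and its two run times.

    Assumption: windows start at multiples of five minutes past the hour.
    """
    start = t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)
    return start, [start + delay for delay in RUN_DELAYS]


# An observation at 12:03:30 falls in the 12:00 window.
start, runs = fast_window_schedule(datetime(2022, 9, 22, 12, 3, 30))
print(start, runs)
```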

Changes in Output format

ASCII files have been renamed from a txt extension to a csv extension, as columns are now correctly comma-separated, as requested by users. With the new formatting, missing values are now actually left empty instead of being filled with NaN as before. The header matches the content of each column, which makes it easier to import data into your software and no longer forces a fixed order of columns.
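Reading columns by name rather than by position could look roughly like the sketch below. The column names and values in the sample are purely illustrative and not taken from a real EMADDC file, and the sample leaves out any additional metadata the real header may carry. Empty fields are mapped to None so that missing values stay distinguishable from numbers.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative only.
SAMPLE = (
    "obs_id,latitude,longitude,temperature\n"
    "0,52.10,5.18,221.4\n"
    "1,52.31,4.76,\n"  # temperature is missing: the field is empty, not NaN
)


def read_observations(stream):
    """Read rows keyed by column name, mapping empty fields to None."""
    return [
        {name: (value if value != "" else None) for name, value in row.items()}
        for row in csv.DictReader(stream)
    ]


rows = read_observations(io.StringIO(SAMPLE))
print(rows[1]["temperature"])
```

Because every value is looked up by its header name, a change in column order would not affect this reader.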

EMADDC will temporarily output observations in bufr, csv and the "old" txt format to make it easier for users to switch. Note that fast files are only available in csv and bufr formats.

Files are overwritten: Sequence Number

The result of the second processing run of a fast window overwrites the file created by the first run. The "version" or "sequence number" is now provided in the csv header (Sequence Number) or in the updateSequenceNumber BUFR field, respectively. The csv file header also contains the time of processing and an "obs_id" offset.
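On the user side, the sequence number can be used to make sure an older run never replaces a newer one, for example when files are downloaded more than once. The sketch below is only one possible approach: the function, the storage layout and the key "window_1200" are hypothetical, and the sequence number is assumed to have been read from the file already and to increase with each run.

```python
def keep_newest(stored, name, sequence_number, payload):
    """Store payload under name unless a newer or equal sequence number is held.

    Returns True when the stored entry was replaced.
    """
    current = stored.get(name)
    if current is None or sequence_number > current[0]:
        stored[name] = (sequence_number, payload)
        return True
    return False


stored = {}
keep_newest(stored, "window_1200", 0, "first run")
keep_newest(stored, "window_1200", 1, "second run")
# A late re-download of the first run does not replace the second run.
keep_newest(stored, "window_1200", 0, "first run again")
print(stored["window_1200"])
```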

Introduction of 'obs_id'

The obs_id is newly added to these files, which makes it easier for EMADDC to find observations in the database. The obs_id is unique to each observation and is reset to 0 daily. Since the actual value in the database can be quite large, the header contains an offset that must be added to the obs_id to obtain it. Please use this obs_id and offset in communication with EMADDC regarding issues with observations.
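The relation between the two values is a simple addition, sketched below. The function name and the numbers are illustrative only; real offsets come from the file header.

```python
def database_obs_id(obs_id, offset):
    """Return the database identifier: the header offset plus the obs_id in the file."""
    return offset + obs_id


# Illustrative numbers only, not taken from a real file.
full_id = database_obs_id(obs_id=4821, offset=1_250_000_000)
print(full_id)
```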

Products only provided using gzip compression

In line with the news update of 16-12-2021, this update only outputs compressed files and no longer provides uncompressed files in parallel, to save bandwidth. This means that uncompressed files are no longer available on ftppro.knmi.nl, while the production server (ftpservice.knmi.nl) will follow on 06-12-2022.
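Gzip-compressed files can be read directly without unpacking them to disk first. The sketch below uses only the Python standard library. The file name and contents are hypothetical, and a temporary file stands in for a downloaded product.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path):
    """Decompress a gzip file on the fly and return its lines as text."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read().splitlines()


# Self-contained demonstration: write a small compressed file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.csv.gz")  # hypothetical file name
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write("obs_id,temperature\n0,221.4\n")
    lines = read_gzipped_text(demo_path)

print(lines)
```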

Release versioning

Finally, the new update provides an improved method for release versioning in output files and monitoring. The new version format consists of x.y followed by a machine name and a commit hash, e.g. 2.2Pf4b936f, where 2.2 is the branch or tag version, P the machine name and f4b936f the commit hash (up to 8 characters). The hash reflects minor changes in the code, which can be performance improvements, bug fixes or newly added sources. If the processing of observations or the correction methods change in a way that affects the quality of the observations, the tag/branch is updated and x or y is increased. Currently, we are working mostly on small changes (code improvements) and on the major upgrade to 3.0, so Release 2.2 will remain the main software for our products for the coming months. This new release does not affect quality and hence will be named Release 2.2(-fast).
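For monitoring or logging, such a version string could be split into its parts along the lines of the sketch below. It is based only on the single example 2.2Pf4b936f. The pattern assumes that the machine name is one letter and that the hash consists of 1 to 8 lowercase hexadecimal characters, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int
    minor: int
    machine: str
    commit: str


# Assumption: one letter for the machine, then 1 to 8 lowercase hex characters.
_PATTERN = re.compile(r"(\d+)\.(\d+)([A-Za-z])([0-9a-f]{1,8})")


def parse_release(text):
    """Split a release string such as '2.2Pf4b936f' into its components."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised release string: {text!r}")
    major, minor, machine, commit = match.groups()
    return ReleaseVersion(int(major), int(minor), machine, commit)


version = parse_release("2.2Pf4b936f")
print(version)
```

Comparing only the major and minor fields then indicates whether two files were produced by releases of the same quality level, since the hash changes with minor code updates.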

Observation cut-off time

Fast files give users quicker access to data processed by EMADDC. The graph below shows the number of observations present in a 5-minute window (12:00 to 12:05) with respect to observation cut-off time and region of interest. The definition of these regions can be found at the top of this article.

The graph below shows the number of observations present in a 10-minute window (12:00 to 12:10), contained in two fast files, with respect to observation cut-off time.

The graph below shows the number of observations present in a 15-minute window (12:00 to 12:15), contained in three fast files, with respect to observation cut-off time.

The new Release 2.2 is deployed for testing and files have been available on ftppro.knmi.nl since September 13th. The fast files are provided in a separate folder named "fast". Old txt files are currently still provided as well, but not for the "fast" files. The Acceptance and Production systems will be updated in February 2023 at the latest, making fast files also available on ftpservice.knmi.nl. Please adjust your processing to use compressed files and the new csv files to make use of the faster processing. The old txt format will be deprecated in Q1 2023.

For further questions contact the EMADDC team at emaddc[at]knmi.nl