Dimension rrdtool databases with years' worth of values

Creating an rrd database file requires some understanding of what you monitor and how long the data needs to be kept. Storage space is, of course, also a consideration when creating a new rrd database file.

Even though rrdtool database files are relatively small, a file configured to keep years of data can still grow large. To give a few numbers, let's assume an rrd database is filled with one value every 5 minutes. Since rrdtool thinks in seconds, these 5 minutes are 300 seconds.

Entries    Time Frame           File Size
12         1 Hour               1.3 KB
288        1 Day                7.8 KB
2016       1 Week               49 KB
8928       1 Month (31 Days)    211 KB
105120     1 Year (365 Days)    2.5 MB
525600     5 Years              13 MB
1051200    10 Years             25 MB
Calculated for a GAUGE data source with AVERAGE, MIN and MAX archives (RRA).

The above list shows the size for the rrd database file when created for GAUGE data stored in an AVERAGE, MIN and MAX archive (RRA = round robin archive). To reproduce the results shown here, just execute the following commands to generate the rrd files.

rrdtool create rrd_db_1hour.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:12 RRA:MAX:0.5:1:12 RRA:MIN:0.5:1:12
rrdtool create rrd_db_1day.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:288 RRA:MAX:0.5:1:288 RRA:MIN:0.5:1:288
rrdtool create rrd_db_1week.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:2016 RRA:MAX:0.5:1:2016 RRA:MIN:0.5:1:2016
rrdtool create rrd_db_1month.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:8928 RRA:MAX:0.5:1:8928 RRA:MIN:0.5:1:8928
rrdtool create rrd_db_1year.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:105120 RRA:MAX:0.5:1:105120 RRA:MIN:0.5:1:105120
rrdtool create rrd_db_5years.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:525600 RRA:MAX:0.5:1:525600 RRA:MIN:0.5:1:525600
rrdtool create rrd_db_10year.rrd --step 300 DS:data:GAUGE:300:U:U RRA:AVERAGE:0.5:1:1051200 RRA:MAX:0.5:1:1051200 RRA:MIN:0.5:1:1051200
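
The row counts used in these commands, and a rough idea of the resulting file size, can be worked out without running rrdtool. The following is a minimal sketch, assuming each stored value takes 8 bytes and ignoring the file header, which adds a little on top. The function names rows_for_days and data_bytes are made up for this example.

```shell
#!/bin/sh
# Rows per archive for a 5 minute (300 second) step, plus a rough size estimate.
# Assumption: 8 bytes per stored value; the file header is not counted.
step=300
rras=3   # one AVERAGE, one MAX and one MIN archive

rows_for_days() {
    # seconds in the period divided by the step length
    echo $(( $1 * 86400 / step ))
}

data_bytes() {
    # rows * 8 bytes * number of archives
    echo $(( $1 * 8 * rras ))
}

for days in 1 7 31 365 1825 3650; do
    rows=$(rows_for_days "$days")
    echo "$days days: $rows rows, about $(data_bytes "$rows") bytes of data"
done
```

For 10 years this gives 1051200 rows and roughly 25 million bytes of data, which is in line with the 25 MB shown in the list.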

When data only needs to be kept for a couple of weeks or a month, creating the files as above might be suitable, but when the rrd database should keep data for a year or more, the file gets significantly bigger. A few MB might sound like nothing, but monitoring systems or infrastructure rarely means keeping a single rrd file. Most of the time it means many, many of these rrd database files. So the goal is to keep them small and fast.

Fortunately, rrdtool can aggregate values as they age. As the measured data gets older, it becomes less important to drill down to the 5 minute values. It might be enough to see a value every 15 minutes after 14 days, every 30 minutes after about 2 months and hourly values after about 6 months. The command below creates an rrd database that holds values for up to 10 years while still keeping at least one value for every 12 hour interval.

Based on a value every 5 minutes (300 seconds), and calculating a month with 31 days and a year with 365 days, rrdtool can create an rrd database file for 10 years of data with a size of just under 1MB.

The following calculation is the base for the command below.

1 day => 288 values based on a 5 minute interval

14 days   (2 weeks)    5 minute values   (average of 1 value)     => 4032 entries
    (288 * 14 days) / 1
62 days   (2 months)   15 minute values  (average of 3 values)    => 5952 entries
    (288 * 62 days) / 3
183 days  (6 months)   30 minute values  (average of 6 values)    => 8784 entries
    (288 * 183 days) / 6
365 days  (1 year)     1 hour values     (average of 12 values)   => 8760 entries
    (288 * 365 days) / 12
1825 days (5 years)    6 hour values     (average of 72 values)   => 7300 entries
    (288 * 1825 days) / 72
3650 days (10 years)   12 hour values    (average of 144 values)  => 7300 entries
    (288 * 3650 days) / 144
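
The same calculation can be sketched in a few lines of shell, which makes it easier to try other retention periods. The days:steps pairs are taken from the list above. The function name entries is made up for this example.

```shell
#!/bin/sh
# Entries per archive: (values per day * days to keep) / values averaged per entry
step=300
per_day=$(( 86400 / step ))   # 288 values per day at a 5 minute interval

entries() {
    # $1 = days to keep, $2 = number of 5 minute values consolidated into one entry
    echo $(( per_day * $1 / $2 ))
}

# days:steps pairs from the list above
for tier in 14:1 62:3 183:6 365:12 1825:72 3650:144; do
    days=${tier%%:*}
    steps=${tier##*:}
    minutes=$(( steps * step / 60 ))
    echo "$days days, $minutes minute values: $(entries "$days" "$steps") entries"
done
```

Changing a pair, for example keeping the 5 minute values for 31 days instead of 14, shows right away how many entries that archive would need.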

The results of these calculations can be used directly in the command that creates the rrd database. The number of values to aggregate and the number of entries to keep for each RRA are the last two options in every RRA line (separated by colons).

$ rrdtool create 10years_data.rrd --step 300 \
DS:data:GAUGE:300:U:U \
RRA:AVERAGE:0.5:1:4032 \
RRA:AVERAGE:0.5:3:5952 \
RRA:AVERAGE:0.5:6:8784 \
RRA:AVERAGE:0.5:12:8760 \
RRA:AVERAGE:0.5:72:7300 \
RRA:AVERAGE:0.5:144:7300 \
RRA:MAX:0.5:1:4032 \
RRA:MAX:0.5:3:5952 \
RRA:MAX:0.5:6:8784 \
RRA:MAX:0.5:12:8784 \
RRA:MAX:0.5:72:7300 \
RRA:MAX:0.5:144:7300 \
RRA:MIN:0.5:1:4032 \
RRA:MIN:0.5:3:5952 \
RRA:MIN:0.5:6:8784 \
RRA:MIN:0.5:12:8784 \
RRA:MIN:0.5:72:7300 \
RRA:MIN:0.5:144:7300
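
Because the same six steps:entries pairs repeat for AVERAGE, MAX and MIN, the RRA arguments can also be generated instead of typed by hand. This is only a sketch: it prints the command rather than running it, and the function name build_rras is made up for this example. Note that it uses the calculated 8760 entries for the hourly archive in all three cases, whereas the command above uses 8784 for MAX and MIN.

```shell
#!/bin/sh
# steps:entries pairs from the calculation (5 min, 15 min, 30 min, 1 h, 6 h, 12 h)
tiers="1:4032 3:5952 6:8784 12:8760 72:7300 144:7300"

build_rras() {
    # emit one RRA definition per consolidation function and tier
    for cf in AVERAGE MAX MIN; do
        for tier in $tiers; do
            printf 'RRA:%s:0.5:%s ' "$cf" "$tier"
        done
    done
}

# Print the full command for review; copy it to a prompt to actually create the file
echo "rrdtool create 10years_data.rrd --step 300 DS:data:GAUGE:300:U:U $(build_rras)"
```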

When created with the above command, the rrd database is about 992KB (1015576 bytes) in size. Compared to about 25MB without aggregation, the difference is huge. On a system storing about 50 rrd databases, aggregating in this way makes the difference between using 1.25GB (50x 25MB) and just 49.6MB (50x 992KB) to store the rrd database files.
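
As a rough cross-check of these numbers, the entries of all archives can be added up and multiplied by 8 bytes per value. This is a sketch under that assumption. The entry counts are those in the create command as written, including the 8784 entries in the hourly MAX and MIN archives.

```shell
#!/bin/sh
# Entries per consolidation function, as written in the create command
avg_rows=$(( 4032 + 5952 + 8784 + 8760 + 7300 + 7300 ))
max_rows=$(( 4032 + 5952 + 8784 + 8784 + 7300 + 7300 ))
min_rows=$max_rows

# Assumption: 8 bytes per stored value, file header not counted
aggregated=$(( (avg_rows + max_rows + min_rows) * 8 ))
flat=$(( 1051200 * 3 * 8 ))   # 10 years of 5 minute values in three archives

echo "aggregated: $aggregated bytes of data"
echo "flat:       $flat bytes of data"
echo "reported:   1015576 bytes, leaving $(( 1015576 - aggregated )) bytes not explained by the data"
```

The estimate for the aggregated file lands about 4 KB below the reported file size. The remainder is presumably the file header, which holds the definitions of the data source and the 18 archives.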


Read more of my posts on my blog at http://blog.tinned-software.net/.
