Torrus Scalability Guide
  Introduction
    Installing Torrus in large enterprise or carrier networks requires
    careful planning and design to ensure reliable and efficient operation.

  Hardware Platform Recommendations
    Hardware planning for large Torrus installations is critically
    important. It is vital to understand the potential bottlenecks and
    performance limits before purchasing the hardware.

    First of all, you need to estimate the number of devices you are going
    to monitor, with some room for future growth. It is good practice to
    model the situation on a test server first, and then project the
    results to the larger number of network devices. The utilities that
    help in assessing the requirements are "torrus configinfo" and "torrus
    schedulerinfo".

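    For example, the configuration report on a small test tree gives you
    datasource statistics to extrapolate from. This is only a sketch; the
    exact command-line options may differ between Torrus versions, so
    check "torrus --help" and the manual pages:

      # report configuration statistics for a test tree named "test"
      torrus configinfo --tree=test
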
    The resources to plan for are the server CPU, RAM, and disks. While
    CPU and RAM are of great importance, it is the disk subsystem that
    often becomes the bottleneck.

   CPU
    For large installations, CPU power is one of the critical resources.

    One of the CPU-intensive processes is the XML configuration compiler.
    A configuration for a few hundred nodes may take tens of minutes to
    compile. With some complicated configurations, it may take several
    hours to recompile the whole datasource tree. Here CPU power literally
    translates into your own time when testing configuration changes or
    troubleshooting a problem.

    The SNMP collector is quite moderate in its CPU usage; still, when the
    number of SNMP variables reaches tens of thousands, CPU power becomes
    an important resource to pay attention to. In addition, the collector
    process initialization can be quite CPU-intensive. This happens every
    time the collector process starts, and whenever the configuration has
    been recompiled.

    The empirical estimate made by Christian Schnidrig is that one SNMP
    counter collected every 5 minutes occupies approximately 1.0e-5 of an
    Intel Xeon 2.8GHz CPU, including the OS overhead. For example, Torrus
    collectors running on 60'000 counters would keep the server busy at an
    average of 60%.
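
    In other words, the expected average CPU load is a simple product of
    the counter quantity and the per-counter share:

      60'000 counters x 1.0e-5 CPU share per counter = 0.6, i.e. about 60%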

   Memory
    The collector needs RAM to store all the counter information, and of
    course it is undesirable to swap. In addition, the more RAM you have
    available for disk cache, the faster the collector can update the data
    files.

    Each update of an RRD file consists of a number of operations: open
    the file, read the header, seek to the needed offset, and then write.
    With enough disk cache, the read operations may be served entirely from
    RAM, which significantly speeds up the collector's running cycle.

    According to Christian Schnidrig's empirical estimates, 30 KB of RAM
    per counter should be enough to hold all the necessary data, including
    the disk cache. For example, for 60'000 counters this gives 1'757 MB,
    so 2 GB of server RAM should be enough.
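
    The figure follows directly from the per-counter estimate:

      60'000 counters x 30 KB = 1'800'000 KB, which is roughly 1'757 MB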

    In addition, Apache with mod_perl occupies 20-30 MB of RAM per
    process, so a few hundred extra megabytes of RAM are good to have.

   Disk storage
    It is not recommended to use IDE disks. They are not designed for
    continuous and intensive use. As experienced by Christian Schnidrig, IDE
    disks don't live long under such load.

    It is recommended to reduce the number of RRD files by grouping the
    datasources. This dramatically reduces the number of read and write
    operations during the update process.

    As noted by Rodrigo Cunha, reducing the size of the filesystem
    read-ahead may significantly improve the disk cache usage. The RRD
    update process reads only a short header at the beginning of the RRD
    file, and the rest of the read-ahead data is never reused. On Linux,
    the following command would set the read-ahead size to 4 KB, which
    equals the i386 page size:

     /sbin/hdparm -a 4 /dev/sda
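
    To verify the effective value, the same utility may be run without an
    argument, or the blockdev utility may be used instead. Note that on
    most Linux systems these utilities count the read-ahead value in
    512-byte sectors, so it is worth double-checking the resulting size
    (the device name is only an example):

      /sbin/hdparm -a /dev/sda
      /sbin/blockdev --getra /dev/sda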

    For servers with tens of thousands of RRD files, it is recommended to
    use hashed data directories. The data directories then form a structure
    of 256 directories, with the hash function based on hostnames. See
    *Torrus SNMP Discovery User Guide* for more details.

    Spreading the data files over several physical disks also helps.

  Operating System Tuning
    Depending on the number of trees and processes that run on a single
    server, you might need to increase the maximum number of file handles
    that may be open at the same time, both system-wide and per process.
    See the manuals of your operating system for more details.
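
    On Linux, for example, the system-wide limit is controlled by the
    fs.file-max sysctl, and the per-process limit by ulimit; the values
    below are purely illustrative:

      # system-wide limit on open file handles
      sysctl -w fs.file-max=262144

      # per-process limit in the shell that starts the Torrus daemons
      ulimit -n 8192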

  Torrus Configuration Recommendations
   BerkeleyDB configuration tuning
    When using many collector and/or HTTP processes, it is important to
    increase the size of the BerkeleyDB lock region. The command

      db_stat -h /var/torrus/db -c

    would show you the current numbers of locks and lockers, and their
    maximum counts over the database history. The maximum numbers of lock
    objects and lockers can be tuned by creating the file DB_CONFIG in the
    database home directory, /var/torrus/db. The following settings would
    work fine with about 20 collector processes and 5 HTTP daemon
    processes:

       set_lk_max_lockers   6000
       set_lk_max_locks     3000

    It is also recommended to increase the cache size from the default
    256KB to a larger value, especially if the database has to hold large
    Torrus trees (hundreds or thousands of monitored devices). The
    following line in DB_CONFIG sets the cache size to 16MB:

       set_cachesize        0 16777216 1
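
    Put together, a minimal DB_CONFIG combining the settings from this
    section might look as follows (the values are the examples given
    above; adjust them according to your own db_stat measurements):

       # /var/torrus/db/DB_CONFIG
       # lock region sizing: about 20 collectors and 5 HTTP daemons
       set_lk_max_lockers   6000
       set_lk_max_locks     3000
       # cache size: 0 GB plus 16777216 bytes (16MB), in 1 cache segment
       set_cachesize        0 16777216 1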

    After updating DB_CONFIG, stop all Torrus processes, including HTTP
    server, then run

      db_recover -h /var/torrus/db

    Then start the processes again. Further info is available at:

    *   General access method configuration (BDB Reference)

        http://tinyurl.com/ybymk7t

    *   DB_CONFIG configuration file (BDB Reference)

        http://tinyurl.com/y9qjodv

    *   Configuring locking: sizing the system (BDB Reference)

        http://tinyurl.com/ya6dtww

    *   C API reference

        http://tinyurl.com/yczgnab

   XML compilation time
    For large datasource trees, the XML compilation may take tens of
    minutes, if not hours. Other processes are not suspended during the
    compilation; they keep using the previous configuration version.

    For debugging and testing, it is recommended to create a new tree,
    separate from the large production trees. That would save you a lot of
    time and allow you to see the results of changes quickly.

   Collector schedule tuning
    The Torrus collector has a very flexible scheduling mechanism. Each
    data source has its own pair of scheduler parameters: *period* and
    *timeoffset*. The period is usually set to the default of 300 seconds.
    Time is divided into even intervals; for the default 5-minute period,
    each hour's intervals start at minutes 00, 05, 10, 15, and so on. The
    timeoffset determines the moment within each interval when the data
    source should be collected. The default value for timeoffset is 10
    seconds. This means that the collector process would try to collect
    the values at 00:00:10, 00:05:10, ..., 23:55:10 every day.
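
    As an illustration, these parameters could be set on a subtree in the
    XML configuration. This is only a sketch; the subtree name and the
    values are examples:

      <subtree name="Routers">
        <param name="collector-period"     value="300"/>
        <param name="collector-timeoffset" value="10"/>
      </subtree>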

    Data sources with the same period and timeoffset values are grouped
    together. The SNMP collector works asynchronously and tries to send as
    many SNMP packets in parallel as possible. Due to this asynchronous
    architecture, the collector is able to perform thousands of queries at
    the same time with very little delay. Within the same collector
    process, a large number of datasources configured with the same
    schedule is usually not a problem.

    If you configure several datasource trees all with the same period and
    timeoffset values, every collector process would start flooding the
    network with SNMP packets at the same time. This may lead to packet
    loss and collector timeouts. In addition, all collector processes
    would try to update the RRD files concurrently, causing overall
    performance degradation. Therefore, it is better to assign different
    timeoffset values to different trees. This may be achieved by manually
    specifying the "collector-timeoffset" parameter in the discovery
    configuration files.
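
    For example, each tree's discovery input file could carry its own
    offset. Only the parameter lines are shown here, as the surrounding
    file structure depends on your installation:

      <!-- discovery input for tree A -->
      <param name="collector-timeoffset" value="10"/>

      <!-- discovery input for tree B -->
      <param name="collector-timeoffset" value="70"/>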

    In large installations, the collector schedules need thorough planning
    and tuning in order to ensure maximum performance and to minimize the
    load on the network devices' CPUs. The "torrus schedulerinfo" utility
    is designed to help you in this planning. It shows two types of
    reports: the configuration report gives you an idea of how many
    datasources are queried at which moments in time, and the runtime
    report gives you real-time statistics of the collector schedules,
    including the average and maximum running cycle, and statistics on
    missed or delayed cycles.
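
    For example (verify the exact options against the "torrus
    schedulerinfo" manual page for your version):

      # configuration report: planned datasources per time slot
      torrus schedulerinfo --tree=main --config

      # runtime report: actual cycle times, missed and delayed cycles
      torrus schedulerinfo --tree=main --runtime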

    There is a feature that eases the load in large installations: with
    dispersed timeoffsets enabled, the timeoffset for each datasource is
    evenly assigned to one of the allowed values, based on the name of the
    host and the name of the interface. By default, these values are 0,
    30, 60, ..., 270. With thousands of datasources, this feature smooths
    the CPU and disk load on the Torrus server and avoids CPU usage peaks
    on network devices with a large number of SNMP variables per device.
    It is recommended to analyse the current scheduler statistics before
    enabling this feature. If you run several large datasource trees,
    don't forget to plan and analyse the schedules for the whole system,
    not just for one tree.
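
    A possible sketch of enabling dispersed timeoffsets in a discovery
    input file is shown below. The parameter names follow the *Torrus SNMP
    Discovery User Guide*; verify them against your version before relying
    on this example:

      <param name="collector-dispersed-timeoffset" value="yes"/>
      <param name="collector-timeoffset-min"       value="0"/>
      <param name="collector-timeoffset-max"       value="300"/>
      <param name="collector-timeoffset-step"      value="30"/>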

  Distributed setup
   NFS-based setup
    The following setup allows you to distribute the load among several
    physical servers.

    Several Torrus backend servers run the collectors and store the RRD
    files on local storage, which is shared via NFS. The frontend server
    runs the Web interface, and possibly some monitor processes, accessing
    the data files over NFS.

    It is possible to organize the directory structure so that each data
    file is seen at the same path on every server. Then you can keep
    identical Torrus configurations on all servers, and launch the
    collector process on only one of them. The XML configuration files may
    be shared via NFS as well.
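
    As an illustration, each backend could export its local RRD directory,
    and the other servers could mount it at the identical path; the
    hostnames and paths below are examples only:

      # /etc/exports on backend1
      /data/torrus/rrd1   frontend.example.net(ro)

      # /etc/fstab on the frontend server
      backend1.example.net:/data/torrus/rrd1 /data/torrus/rrd1 nfs ro,hard 0 0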

    Be aware that the BerkeleyDB database home directory cannot be
    NFS-mounted. See the following link for more details:
    http://www.sleepycat.com/docs/ref/env/remote.html

    Backend servers may run near the limits of their system capacity;
    70-80% CPU usage should not be a problem. For the frontend machine, it
    is preferable that at least 50% of the average CPU time stays idle.

Authors
    Copyright (c) 2004-2005 Stanislav Sinyagin <ssinyagin@k-open.com>

    Copyright (c) 2004 Christian Schnidrig <christian.schnidrig@bluewin.ch>

