The Writer thread takes records from the Key Generation output queues and passes them to the MDM-RE Sort routine. The sort places each row into a memory buffer. When the buffer becomes full, its contents are sorted and the results are written to a sort work file on disk. Once all groups are sorted, the groups on disk are merged to create a fully sorted file. The fully sorted file is used as input to the
DBMS Load
utility.
The performance of the sort is affected by the
size of the sort buffer,
number of sort threads, and
the placement of the disk files.
These are controlled by the
DATABASE-OPTIONS=IDXSORT (. . . )
parameters defined in the SDF. The default sort buffer size is 64MB.
A large sort buffer is desirable because
there will be less sorted groups to merge (less random I/O)
sorted groups are written in bursts of I/O, so they create less disk contention
they allow larger I/O buffers during the merge phase
However large sort buffers will hold more unsorted records, and therefore they will be sorted less often and each sort operation will take longer (as compared to a smaller sort buffer).
While sorting occurs, the writer thread is blocked. This means it can not remove records from the key generation output queues, so they in turn will block if there is insufficient room to write their results. Therefore it is important that the key generation output queue is large enough to enable key generation to continue while sorting occurs. Since the key generation output queue size is determined by
SSALDR_RBSIZE
, it must be set quite high when large sort buffers are in use.
Tuning
Allocate as much sort memory as possible. Make sure it is not so large as to cause swapping to occur, as this negates the benefit of a memory based sort.
If the Key Generation threads have more waits than the writer thread, it indicates that
SSALDR_RBSIZE
should be increased.
Place the sort work file on a different device to the output file to avoid disk contention.
Compress-Key-Data
The appropriate size of the Compress-Key-Data parameter must be determined. Load a representative sample of data and use the histkg utility to determine the appropriate setting. Refer to the
Compressed Key Data
section in the
DESIGN GUIDE
for details.
DBMS Extents
When loading large amounts of data, it is wise to allocate large extents for the IDT and IDXs. Use the
DATABASE-OPTIONS=IDT(. . . )
and
IDX(. . . )
to allocate large extents and/or place the tables and indexes in appropriate tablespaces.
Partitioning Data
Loading extremely large systems requires a scalable solution. In this situation, consider partitioning the data on a logical criterion such as a range of IDs. Create one system per partition and load them in parallel.
CPU and I/O usage
Key Threads
The Table Loader automatically creates n key generation threads, where n is the number of CPUs available. You may override this value by setting the environment variable
SSALDR_KEYTH=n
.
Loader Threads
The number of Loader threads is set to the number of CPUs available on the machine. You may override this value by setting the environment variable
SSALDR_LOADTH=n
.
In general, the DBMS load is an I/O intensive operation. Creating too many Loader threads may cause I/O contention that could slow down the load process. Not all loader threads can be used in some cases:
When there is insufficient work to utilize all threads.
When there are only Lite Indexes left to load and the IDT has not been loaded yet.
When loading to a UDB database, UDB creates tablespace locks that prevent concurrent loads.
When loading to MSQ, all merge phases must be completed prior to starting the first mass load utilit (