After a few weeks of complaints about slow MDaemon queues, one week of hardware-swap testing, and one week of process tracing, the mail queue is now stable.
To summarize these weeks of study, a few settings should help MDaemon run faster:
- Offload the Remote Queue to another server by setting delivery options [Must]
- Use Server 2008 R2 with enough memory to cache more Metafile data
Enough memory = 4GB + (file count * 1KB)
For example, a mail store with 4 million files should have 8GB or more
- Registry tweak (based on Performance Tuning Guidelines for Windows Server 2008 R2)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
NtfsDisable8dot3NameCreation=1
NtfsDisableLastAccessUpdate=1
- Defragment the disk (it does not need to be done often)
- Application disk volume alignment: the MDaemon application (CFilter directory), Local Queue, and user folders should be on the same volume
- CPU – higher clock / better per-clock efficiency instead of more cores
a. We tested on an 8-core, 16-thread machine; it used only a few threads
b. Even when average load is below 50%, a 2.8GHz CPU performs better than a 2.4GHz CPU
I think a single-CPU Sandy Bridge 2600K server will run faster than a dual-socket Xeon E5625 server.
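
The two registry values in the list above can also be written out as a .reg file for import. The following is only a minimal sketch: the helper name build_reg_file is made up for illustration, and it assumes both values are DWORDs set to 1, as listed. It only builds the text; it does not touch the registry.

```python
# Sketch: build the text of a .reg file for the two NTFS values listed above.
# build_reg_file is a hypothetical helper, not part of any tool.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"
VALUES = {
    "NtfsDisable8dot3NameCreation": 1,  # stop creating 8.3 short names
    "NtfsDisableLastAccessUpdate": 1,   # stop updating last-access timestamps
}


def build_reg_file(key=KEY, values=VALUES):
    """Return .reg file content that sets each value as a REG_DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, dword in values.items():
        lines.append(f'"{name}"=dword:{dword:08x}')
    # .reg files conventionally use CRLF line endings
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file())
```

Review the generated text before importing it, and expect that a reboot may be needed before the change takes effect.
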
Offload the Remote Queue to another server by setting delivery options
After offloading the Remote Queue to another MDaemon server, the Incoming Queue runs faster (fewer lockups).

Use Server 2008 R2 with enough memory to cache more Metafile data
When a huge number of files and mailboxes are accessed frequently, the server caches Metafile data (file system metadata) in memory.
For example, with 4,385,755 files in the MDaemon folder, Metafile usage can reach 4.7GB.
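
The sizing rule from the summary (4GB + file count * 1KB) is easy to turn into a quick calculation. This is a rough sketch: the function names are mine, and it assumes binary units (1KB = 1024 bytes, 1GB = 1024^3 bytes), which the rule of thumb does not specify.

```python
# Sketch of the rule of thumb: memory = 4GB base + 1KB per file.
# Binary units are an assumption; the names are made up for illustration.
KIB = 1024
GIB = 1024 ** 3


def recommended_memory_bytes(file_count, base_bytes=4 * GIB, per_file_bytes=1 * KIB):
    """Return the suggested RAM in bytes for a mail store with file_count files."""
    return base_bytes + file_count * per_file_bytes


def to_gib(num_bytes):
    """Convert bytes to GiB as a float."""
    return num_bytes / GIB


if __name__ == "__main__":
    for files in (4_000_000, 4_385_755):
        print(files, "files ->", round(to_gib(recommended_memory_bytes(files)), 2), "GiB")
```

For the 4,385,755-file store mentioned above, this gives a little over 8GB, so the next common memory size up is the practical choice.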

More memory also benefits the Windows read cache. The EqualLogic SAN HQ report shows the read/write ratio changing from 66/33 to 15/85 after the system was upgraded to 16GB of memory. Assuming the write volume stayed the same, that means about 91% of read I/O is now served by the system cache.
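
The 91% figure can be checked from the two ratios alone. The sketch below assumes the amount of write I/O was the same before and after the upgrade, so the drop in reads per write is the share of reads that no longer reaches the SAN. The function name is made up for illustration.

```python
# Sketch: estimate the share of read I/O absorbed by the cache from two
# read/write ratios, assuming write volume did not change.
def read_io_saved(before, after):
    """before and after are (read_share, write_share) pairs, e.g. (66, 33)."""
    reads_per_write_before = before[0] / before[1]
    reads_per_write_after = after[0] / after[1]
    return 1 - reads_per_write_after / reads_per_write_before


if __name__ == "__main__":
    saved = read_io_saved((66, 33), (15, 85))
    print(f"{saved:.1%} of read I/O no longer hits the disk")
```

With 66/33 there are 2 reads per write; with 15/85 there are about 0.18 reads per write, which is roughly 9% of the earlier figure.
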
Application disk volume alignment
After tracing the server with Sysinternals’ Process Monitor (3 minutes of events, 5.9GB of data!), I found:
- MDaemon accesses the disk in 4KB units per read/write. That is fine for mail sized in the KB scale, but some users send mail in the megabyte scale.
- MDaemon creates Local Queue and Remote Queue entries by copying the message from Inbound. For example, an email addressed to 5 local mailboxes and 5 remote recipients across 3 different domains will do the following:
5 times (build a unique per-recipient mail header + copy the remaining message body in 4KB blocks from Inbound to Local until EOF)
3 times (copy the message in 4KB blocks to the Remote Queue + create a *.rte file to record the target recipients)
- When it processes the Local Queue and finds that a user has a mail forwarding setting, it creates the related Local Queue and Remote Queue entries, working in the same 4KB-per-block pattern as the Inbound Queue.
- The Local Queue runs the Antivirus and SpamAssassin checks; the file is moved to Cfilter\Temp during CFilter processing
- Finally, the local mail is moved to the user mailbox
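
To get a feel for how the 4KB pattern scales, here is a back-of-the-envelope sketch of the copy fan-out described in the list. It assumes one read and one write per 4KB block per copy, and it ignores the header rewrite, the *.rte files, forwarding, and the later CFilter moves. All names are made up for illustration.

```python
# Sketch: count 4KB reads and writes for the Inbound -> Local/Remote fan-out.
# Assumes one read plus one write per block per copy; everything else is ignored.
import math

BLOCK_BYTES = 4 * 1024


def blocks_per_copy(message_bytes, block_bytes=BLOCK_BYTES):
    """Number of block-sized operations needed to copy one message once."""
    return math.ceil(message_bytes / block_bytes)


def fanout_io(message_bytes, local_recipients, remote_domains):
    """One copy per local recipient plus one copy per remote domain."""
    copies = local_recipients + remote_domains
    per_copy = blocks_per_copy(message_bytes)
    return {"copies": copies, "reads": copies * per_copy, "writes": copies * per_copy}


if __name__ == "__main__":
    # A 10MB message to 5 local mailboxes and 3 remote domains
    print(fanout_io(10 * 1024 * 1024, local_recipients=5, remote_domains=3))
```

Under these assumptions, a single 10MB message to 5 local mailboxes and 3 remote domains produces 8 copies and about 20,000 reads plus as many writes, which is why megabyte-scale mail hurts so much more than KB-scale mail.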
As you can see, files move between CFilter, the Local Queue, and the user folder.
When these folders are not on the same disk, MDaemon still calls the file system's MoveFileA, and the move between volumes is handled by the system.
On Server 2008 R2, the read buffer is 512KB and the write block size is 64KB.
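
A quick way to check whether folders share a volume is to compare the device ID reported for each path. This is a minimal sketch with made-up function names; it assumes Python's os.stat reports a per-volume st_dev on your Windows build, so verify that before relying on it. The folder paths in the usage comment are placeholders, not real MDaemon defaults.

```python
# Sketch: report which folders are NOT on the same volume as the first one.
# Assumes os.stat(...).st_dev identifies the volume on this platform.
import os


def same_volume(path_a, path_b):
    """True when both paths report the same device/volume ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def folders_off_volume(folders):
    """folders maps a label to a path; returns labels not on the first entry's volume."""
    labels = list(folders)
    reference = folders[labels[0]]
    return [label for label in labels[1:] if not same_volume(reference, folders[label])]


# Usage (hypothetical paths, replace with your own):
# folders_off_volume({
#     "cfilter": r"D:\path\to\CFilter",
#     "local_queue": r"D:\path\to\LocalQueue",
#     "users": r"E:\path\to\Users",
# })
```

An empty result means a move between these folders can stay on one volume instead of being copied across volumes by the system.
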
CPU – higher clock / better per-clock efficiency instead of more cores
Upgrading from a single quad-core CPU to dual quad-core CPUs brings only a slight improvement.
This is because queue processing seems to be single-threaded (one thread scans the folder, one thread moves files).
However, CFEngine, SpamAssassin, and Antivirus seem to run multi-threaded, so the messages waiting in the Local Queue have mostly already been processed by MDAV and SpamD.