Perfmon / Performance Monitor, the undiscovered country!

Tuesday, January 10, 2012 2:34:26 PM (W. Europe Standard Time, UTC+01:00)

Use Perfmon to monitor servers and find bottlenecks

What and When to Measure

Bottlenecks occur when a resource reaches its capacity, causing the performance of the entire system to slow down. Bottlenecks are typically caused by insufficient or misconfigured resources, malfunctioning components, and incorrect requests for resources by a program.

There are five major resource areas that can cause bottlenecks and affect server performance: physical disk, memory, process, CPU, and network. If any of these resources are overutilized, your server or application can become noticeably slow or can even crash. I will go through each of these five areas, giving guidance on the counters you should be using and offering suggested thresholds to measure the pulse of your servers.

Since the sampling interval has a significant impact on the size of the log file and the server load, you should set the sample interval based on the average elapsed time for the issue to occur so you can establish a baseline before the issue occurs again. This will allow you to spot any trend leading to the issue.

Fifteen minutes will provide a good window for establishing a baseline during normal operations. Set the sample interval to 15 seconds if the average elapsed time for the issue to occur is about four hours. If the time for the issue to occur is eight hours or more, set the sampling interval to no less than five minutes; otherwise, you will end up with a very large log file, making it more difficult to analyze the data.
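Counting samples shows why the interval matters. Each counter records one value per interval, so the number of samples is the monitoring duration divided by the interval. The following minimal Python sketch does that arithmetic for the intervals suggested above. The function name is only illustrative.

```python
def sample_count(duration_hours: float, interval_seconds: int) -> int:
    """Number of values one counter records over the given duration."""
    return int(duration_hours * 3600 // interval_seconds)


if __name__ == "__main__":
    # Guidance from the text: about 4 h until the issue -> 15 s interval,
    # 8 h or more -> an interval of at least 5 minutes (300 s).
    for hours, interval in [(4, 15), (8, 15), (8, 300)]:
        print(f"{hours} h at {interval} s -> {sample_count(hours, interval)} samples per counter")
```

Keeping 15 seconds over eight hours doubles the samples per counter compared with four hours (1,920 vs. 960). A five-minute interval cuts the eight-hour total to 96. Multiply by the number of counters to get a feel for the log size.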

Hard Disk Bottleneck

Since the disk system stores and handles programs and data on the server, a bottleneck affecting disk usage and speed will have a big impact on the server’s overall performance.

Please note that if the disk objects have not been enabled on your server, you need to use the command-line tool Diskperf to enable them. Also, note that % Disk Time can exceed 100 percent and, therefore, I prefer to use % Idle Time, Avg. Disk sec/Read, and Avg. Disk sec/write to give me a more accurate picture of how busy the hard disk is. You can find more on % Disk Time in the Knowledge Base article available at support.microsoft.com/kb/310067.

Following are the counters the Microsoft Service Support engineers rely on for disk monitoring.

LogicalDisk\% Free Space This measures the percentage of free space on the selected logical disk drive. Take note if this falls below 15 percent, as you risk running out of free space for the OS to store critical files. One obvious solution here is to add more disk space.

PhysicalDisk\% Idle Time This measures the percentage of time the disk was idle during the sample interval. If this counter falls below 20 percent, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.

PhysicalDisk\Avg. Disk Sec/Read This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system.

PhysicalDisk\Avg. Disk Sec/Write This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system.

PhysicalDisk\Avg. Disk Queue Length This indicates how many I/O operations are waiting for the hard drive to become available. If the value here is larger than two times the number of spindles, the disk itself may be the bottleneck.

Memory\Cache Bytes This indicates the amount of memory being used for the file system cache. There may be a disk bottleneck if this value is greater than 300MB.
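The disk thresholds above can be collected into a single check. This minimal Python sketch assumes the counter values have already been gathered, for example exported to CSV with typeperf or relog. It also assumes latencies are in seconds, as Perfmon reports them, and that "300MB" means 300 × 1024 × 1024 bytes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    free_space_pct: float   # LogicalDisk\% Free Space
    idle_time_pct: float    # PhysicalDisk\% Idle Time
    sec_per_read: float     # PhysicalDisk\Avg. Disk sec/Read (seconds)
    sec_per_write: float    # PhysicalDisk\Avg. Disk sec/Write (seconds)
    queue_length: float     # PhysicalDisk\Avg. Disk Queue Length
    cache_bytes: float      # Memory\Cache Bytes


def disk_warnings(s: DiskSample, spindles: int, mission_critical: bool = False) -> list[str]:
    """Return the thresholds from the text that this sample violates."""
    # 10 ms for SQL Server / Exchange Server hosts, 25 ms otherwise.
    latency_limit = 0.010 if mission_critical else 0.025
    warnings = []
    if s.free_space_pct < 15:
        warnings.append("low free space")
    if s.idle_time_pct < 20:
        warnings.append("disk saturated")
    if s.sec_per_read > latency_limit:
        warnings.append("read latency")
    if s.sec_per_write > latency_limit:
        warnings.append("write latency")
    if s.queue_length > 2 * spindles:
        warnings.append("queue length")
    if s.cache_bytes > 300 * 1024 * 1024:  # assumes binary megabytes
        warnings.append("large file system cache")
    return warnings
```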

Memory Bottleneck

A memory shortage is typically due to insufficient RAM, a memory leak, or a memory switch placed inside the boot.ini. Before I get into memory counters, I should discuss the /3GB switch.

More memory reduces disk I/O activity and, in turn, improves application performance. The /3GB switch was introduced in Windows NT® as a way to provide more memory for the user-mode programs.

Windows uses a virtual address space of 4GB (independent of how much physical RAM the system has). By default, the lower 2GB are reserved for user-mode programs and the upper 2GB are reserved for kernel-mode programs. With the /3GB switch, 3GB are given to user-mode processes. This, of course, comes at the expense of the kernel memory, which will have only 1GB of virtual address space. This can cause problems because Pool Non-Paged Bytes, Pool Paged Bytes, Free System Page Table Entries, and desktop heap are all squeezed together within this 1GB space. Therefore, the /3GB switch should only be used after thorough testing has been done in your environment.

This is a consideration if you suspect you are experiencing a memory-related bottleneck. If the /3GB switch is not the cause of the problems, you can use these counters for diagnosing a potential memory bottleneck.

Memory\% Committed Bytes in Use This measures the ratio of Committed Bytes to the Commit Limit—in other words, the amount of virtual memory in use. This indicates insufficient memory if the number is greater than 80 percent. The obvious solution for this is to add more memory.

Memory\Available Mbytes This measures the amount of physical memory, in megabytes, available for running processes. If this value is less than 5 percent of the total physical RAM, that means there is insufficient memory, and that can increase paging activity. To resolve this problem, you should simply add more memory.

Memory\Free System Page Table Entries This indicates the number of page table entries not currently in use by the system. If the number is less than 5,000, there may well be a memory leak.

Memory\Pool Non-Paged Bytes This measures the size, in bytes, of the non-paged pool. This is an area of system memory for objects that cannot be written to disk but instead must remain in physical memory as long as they are allocated. There is a possible memory leak if the value is greater than 175MB (or 100MB with the /3GB switch). A typical Event ID 2019 is recorded in the system event log.

Memory\Pool Paged Bytes This measures the size, in bytes, of the paged pool. This is an area of system memory used for objects that can be written to disk when they are not being used. There may be a memory leak if this value is greater than 250MB (or 170MB with the /3GB switch). A typical Event ID 2020 is recorded in the system event log.

Memory\Pages/sec This measures the rate at which pages are read from or written to disk to resolve hard page faults. A value greater than 1,000 indicates excessive paging, which may point to a memory leak.
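The memory thresholds can be checked the same way. In this minimal Python sketch, the pool sizes are assumed to be already converted from bytes to MB. The /3GB flag switches to the lower pool limits given in the text. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    committed_pct: float   # Memory\% Committed Bytes In Use
    available_mb: float    # Memory\Available MBytes
    free_ptes: int         # Memory\Free System Page Table Entries
    nonpaged_mb: float     # Memory\Pool Nonpaged Bytes, converted to MB
    paged_mb: float        # Memory\Pool Paged Bytes, converted to MB
    pages_per_sec: float   # Memory\Pages/sec


def memory_warnings(s: MemorySample, total_ram_mb: float, switch_3gb: bool = False) -> list[str]:
    """Return the memory thresholds from the text that this sample violates."""
    nonpaged_limit = 100 if switch_3gb else 175  # MB
    paged_limit = 170 if switch_3gb else 250     # MB
    checks = [
        (s.committed_pct > 80, "commit charge above 80%"),
        (s.available_mb < 0.05 * total_ram_mb, "available memory below 5% of RAM"),
        (s.free_ptes < 5000, "few free system PTEs"),
        (s.nonpaged_mb > nonpaged_limit, "large non-paged pool (check for Event ID 2019)"),
        (s.paged_mb > paged_limit, "large paged pool (check for Event ID 2020)"),
        (s.pages_per_sec > 1000, "excessive paging"),
    ]
    return [message for hit, message in checks if hit]
```

The same sample can pass the pool checks on a default system and fail them with /3GB. That is why the text asks you to rule out the switch first.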

Processor Bottleneck

An overwhelmed processor can be due to the processor itself not offering enough power or it can be due to an inefficient application. You must double-check whether the processor spends a lot of time in paging as a result of insufficient physical memory. When investigating a potential processor bottleneck, the Microsoft Service Support engineers use the following counters.

Processor\% Processor Time This measures the percentage of elapsed time the processor spends executing a non-idle thread. If the percentage is greater than 85 percent, the processor is overwhelmed and the server may require a faster processor.

Processor\% User Time This measures the percentage of elapsed time the processor spends in user mode. If this value is high, the server is busy with the application. One possible solution here is to optimize the application that is using up the processor resources.

Processor\% Interrupt Time This measures the time the processor spends receiving and servicing hardware interrupts during the sample interval. This counter indicates a possible hardware issue if the value is greater than 15 percent.

System\Processor Queue Length This indicates the number of threads in the processor queue. The server doesn’t have enough processor power if the value is more than two times the number of CPUs for an extended period of time.
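The text does not define "an extended period of time". This Python sketch makes one explicit assumption: the queue length must exceed two times the number of CPUs in every sample of the window. It compares % Processor Time and % Interrupt Time on their averages. % User Time is left out because the text gives no numeric threshold for it. The function name is hypothetical.

```python
from statistics import mean


def processor_warnings(cpu_pct: list[float], interrupt_pct: list[float],
                       queue_len: list[float], cpu_count: int) -> list[str]:
    """Apply the processor thresholds from the text to a window of samples."""
    warnings = []
    if mean(cpu_pct) > 85:            # Processor\% Processor Time
        warnings.append("processor overwhelmed")
    if mean(interrupt_pct) > 15:      # Processor\% Interrupt Time
        warnings.append("possible hardware issue")
    # System\Processor Queue Length: "extended period" interpreted as every sample.
    if queue_len and all(q > 2 * cpu_count for q in queue_len):
        warnings.append("processor queue too long")
    return warnings
```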

Network Bottleneck

A network bottleneck, of course, affects the server’s ability to send and receive data across the network. It can be an issue with the network card on the server, or perhaps the network is saturated and needs to be segmented. You can use the following counters to diagnose potential network bottlenecks.

Network Interface\Bytes Total/Sec This measures the rate at which bytes are sent and received over each network adapter, including framing characters. The network is saturated if more than 70 percent of the interface bandwidth is consumed. For a 100-Mbps NIC, that threshold is about 8.75MB/sec (100 Mbps = 100,000 kbps = 12.5MB/sec; 12.5MB/sec × 70 percent = 8.75MB/sec). In a situation like this, you may want to add a faster network card or segment the network.

Network Interface\Output Queue Length This measures the length of the output packet queue, in packets. There is network saturation if the value is more than 2. You can address this problem by adding a faster network card or segmenting the network.
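Both network rules fit in a few lines of Python. This sketch assumes the link speed is given in decimal megabits per second and the Bytes Total/sec reading is in bytes, as Perfmon reports it. The function names are hypothetical.

```python
def saturation_threshold_bytes(link_mbps: float, utilisation: float = 0.70) -> float:
    """Bytes Total/sec above which the text considers the interface saturated."""
    bytes_per_sec = link_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_sec * utilisation


def network_warnings(bytes_total_per_sec: float, output_queue_length: float,
                     link_mbps: float) -> list[str]:
    warnings = []
    if bytes_total_per_sec > saturation_threshold_bytes(link_mbps):
        warnings.append("interface above 70% utilisation")
    if output_queue_length > 2:
        warnings.append("output queue length above 2")
    return warnings
```

For a 100-Mbps NIC, `saturation_threshold_bytes(100)` gives 8,750,000 bytes/sec, matching the 8.75MB/sec worked out in the text. For a gigabit NIC, the limit is ten times that.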

Process Bottleneck

Server performance will be significantly affected if you have a misbehaving process or non-optimized processes. Thread and handle leaks will eventually bring down a server, and excessive processor usage will bring a server to a crawl. The following counters are indispensable when diagnosing process-related bottlenecks.

Process\Handle Count This measures the total number of handles that are currently open by a process. This counter indicates a possible handle leak if the number is greater than 10,000.

Process\Thread Count This measures the number of threads currently active in a process. There may be a thread leak if the difference between the minimum and maximum thread count over the monitoring period exceeds 500.

Process\Private Bytes This indicates the amount of memory that this process has allocated that cannot be shared with other processes. If this value grows by more than 250MB between its minimum and maximum over the monitoring period, there may be a memory leak.
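The thread and Private Bytes rules compare the spread between the lowest and highest value seen for one process. The handle rule is an absolute limit. This minimal Python sketch treats the 250 figure for Private Bytes as megabytes, which is an assumption. It also takes each list to hold the samples of a single process instance. The function names are hypothetical.

```python
def spread(values: list[float]) -> float:
    """Difference between the highest and lowest sample."""
    return max(values) - min(values)


def process_warnings(handle_counts: list[float], thread_counts: list[float],
                     private_bytes: list[float]) -> list[str]:
    warnings = []
    if max(handle_counts) > 10_000:                  # Process\Handle Count
        warnings.append("possible handle leak")
    if spread(thread_counts) > 500:                  # Process\Thread Count
        warnings.append("possible thread leak")
    if spread(private_bytes) > 250 * 1024 * 1024:    # Process\Private Bytes, assumes 250 = MB
        warnings.append("possible memory leak")
    return warnings
```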

Wrapping Up

Now you know what counters the Service Support engineers at Microsoft use to diagnose various bottlenecks. Of course, you will most likely come up with your own set of favorite counters tailored to suit your specific needs. You may want to save time by not having to add all your favorite counters manually each time you need to monitor your servers. Fortunately, there is an option in the Performance Monitor that allows you to save all your counters in a template for later use.
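The text refers to saving counters as a template in the Performance Monitor GUI. A scriptable alternative uses a different technique. You keep the counter list in a plain text file and hand it to the built-in logman tool through its `-cf` option. The following is a minimal Python sketch. The collector name, the output path, and the instance selections such as `(*)` and `(_Total)` are assumptions you should adapt.

```python
from pathlib import Path

# Counters from the article; the instance choices are assumptions.
COUNTERS = [
    r"\LogicalDisk(*)\% Free Space",
    r"\PhysicalDisk(*)\% Idle Time",
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",
    r"\PhysicalDisk(*)\Avg. Disk Queue Length",
    r"\Memory\% Committed Bytes In Use",
    r"\Memory\Available MBytes",
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",
]


def write_counter_file(path: Path, counters: list[str]) -> None:
    """One counter path per line, the format logman -cf reads."""
    path.write_text("\n".join(counters) + "\n", encoding="utf-8")


def logman_command(name: str, counter_file: Path, interval: str, output: str) -> list[str]:
    """Build a 'logman create counter' call; interval uses [[hh:]mm:]ss."""
    return ["logman", "create", "counter", name,
            "-cf", str(counter_file), "-si", interval, "-o", output]


if __name__ == "__main__":
    cfg = Path("baseline_counters.txt")
    write_counter_file(cfg, COUNTERS)
    # Five-minute interval, in line with the low-overhead advice in the text.
    cmd = logman_command("ServerBaseline", cfg, "00:05:00", r"C:\PerfLogs\ServerBaseline")
    print(" ".join(cmd))
    # Then, from an elevated prompt on the server: run the printed command,
    # followed by "logman start ServerBaseline".
```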

You may still be wondering whether you should run Performance Monitor locally or remotely. And exactly what will the performance hit be when running Performance Monitor locally? This all depends on your specific environment. The performance hit on the server is almost negligible if you set intervals to at least five minutes.

You may want to run Performance Monitor locally if you know there is a performance issue on the server, since a remote Performance Monitor may not be able to capture data from the server once the server is running out of resources. Running it remotely from a central machine is really best suited to situations when you want to monitor or baseline multiple servers.

Firefox support for ADM / GPO

Friday, January 06, 2012 6:06:35 PM (W. Europe Standard Time, UTC+01:00)

Mozilla Firefox is, alongside Google Chrome, one of the most widely used multi-platform browsers, and it also supports technologies such as HTML5. Many companies still run intranet web applications built around IE6 techniques and quirks, so it makes sense to deploy Firefox as an alternative web browser. The drawback is that, as with the Chrome browser, there is no GPO support from Microsoft or the Mozilla Foundation. However, Firefox can be adapted very well to a kiosk mode, for example on a terminal server or thin client. There are two approaches. You can use reference files to define the browser's menu structure centrally. Alternatively, a free extension adds support for GPOs / ADMX templates. With GPO for Firefox, you can set the defaults for the home page and proxy and also lock specific functions centrally and flexibly.

Deploying GPO for Firefox essentially requires three components:

  1. An ADM template for the Group Policy Editor, in which you define the central settings
  2. A Firefox extension, so that Firefox checks whether it should receive central settings via GPO
  3. A script or an alternative method to distribute the extension to all clients automatically (not strictly required!)

To distribute the extension to the clients as a global extension, copy the .XPI file (the folder containing the extension files) from “%ProgramFiles%\Firefox\Extensions\” into a subfolder of the NETLOGON directory on a DC. A startup script then copies this folder into each client's program directory, unless it is already there in the current version. Once the extension has been installed successfully, you will see its icon on the right-hand side of the status bar on your clients.
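The post does not include the startup script itself. The following Python sketch illustrates the "copy only if missing or outdated" logic. In practice, a GPO startup script is more commonly a batch or VBScript file. The share path, the target path, and the folder name `gpo-extension` are hypothetical placeholders. "Outdated" is approximated by comparing the two folder trees with `filecmp`, which checks file size and modification time.

```python
import filecmp
import os
import shutil


def trees_differ(src: str, dst: str) -> bool:
    """True if the two directory trees differ in file names or (shallowly) in file contents."""
    cmp = filecmp.dircmp(src, dst)
    if cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files or cmp.common_funny:
        return True
    return any(trees_differ(os.path.join(src, d), os.path.join(dst, d)) for d in cmp.common_dirs)


def sync_extension(src: str, dst: str) -> bool:
    """Copy the extension folder from the share unless an identical copy exists.
    Returns True if a copy was made."""
    if os.path.isdir(dst) and not trees_differ(src, dst):
        return False
    if os.path.isdir(dst):
        shutil.rmtree(dst)        # drop the outdated version first
    shutil.copytree(src, dst)     # copy2 keeps timestamps, so the next check matches
    return True


if __name__ == "__main__":
    # Hypothetical paths: replace domain, share subfolder and extension folder name.
    SRC = r"\\example.local\NETLOGON\Firefox\gpo-extension"
    DST = r"C:\Program Files\Firefox\Extensions\gpo-extension"
    print("copied" if sync_extension(SRC, DST) else "already up to date")
```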

The settings can then be configured with the standard Group Policy Editor.

Besides this option, you can also customize the entire browser yourself. This only requires adjusting a few files:

user.js / prefs.js

These two files store the Firefox browser's settings, including the settings that can be set via GPO. It is important to create and edit the user.js file rather than prefs.js, because users can change prefs.js. Each time a new Firefox instance starts, the browser engine checks whether a user.js file exists. If it does, its values overwrite those in prefs.js for that instance.
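A user.js file is just a list of `user_pref()` calls, so it can be generated centrally. The following minimal Python sketch assumes you want to set the home page and a manual HTTP proxy and switch off automatic updates. The host names are hypothetical placeholders. `app.update.enabled` applies to the Firefox releases of that time, and newer releases may ignore it. The generated file goes into each user's Firefox profile directory.

```python
import json


def user_js(prefs: dict) -> str:
    """Render preference names and values as user.js lines.
    json.dumps yields valid JavaScript literals (true/false, numbers, quoted strings)."""
    lines = [f"user_pref({json.dumps(name)}, {json.dumps(value)});" for name, value in prefs.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    prefs = {
        "browser.startup.homepage": "http://intranet.example.local/",  # hypothetical URL
        "network.proxy.type": 1,                        # 1 = manual proxy configuration
        "network.proxy.http": "proxy.example.local",    # hypothetical proxy host
        "network.proxy.http_port": 8080,
        "app.update.enabled": False,                    # disable automatic updates
    }
    print(user_js(prefs), end="")
```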

userChrome.css

The settings in this file can restrict the browser so that users can no longer install extensions of their own. Such extensions may be insecure, or may simply not be approved for commercial use and would otherwise need to be licensed. The browser then starts in kiosk mode and is safe for day-to-day operation. This also disables the browser's automatic update function!

NetApp – New Windows MPIO (V3.5) available

Thursday, November 24, 2011 12:52:03 AM (W. Europe Standard Time, UTC+01:00)

NetApp has released a new version of their MPIO DSM (version 3.5), which includes several fixes and simplifies deployment on Windows systems connected to NetApp storage.

The Data ONTAP DSM 3.5 for Windows MPIO includes the following changes:

- Data ONTAP operating in Cluster-Mode is now supported, starting with version 8.1. Note the following about Cluster-Mode support:
  – Asymmetric logical unit access (ALUA) is required for all Fibre Channel (FC) paths and for all iSCSI paths to Cluster-Mode LUNs.
  – Mixed FC and iSCSI paths to the same Cluster-Mode LUN are supported.

- New Windows PowerShell cmdlets are available to manage the DSM. The cmdlets replace the dsmcli commands, which are deprecated starting in DSM 3.5. The dsmcli commands will be removed in a future release.
- The Windows Host Utilities are no longer required. The Windows Host Utilities components that enable you to configure Hyper-V systems (mbralign.exe and LinuxGuestConfig.iso) are now included with the DSM. While no longer required, installing the Windows Host Utilities on the same host as the DSM is still supported.
- Hyper-V guests running Red Hat Enterprise Linux (RHEL) are now supported. The Interoperability Matrix lists the specific versions supported.
- The number of reboots required to install or upgrade the DSM is reduced. For example, when you install Windows hotfixes, you can wait to reboot the host until after you install or upgrade the DSM.
- The timeout values set by the Data ONTAP DSM are updated based on ongoing testing.
- The options that you use to specify preferred paths for the Round Robin with Subset policy have changed. Note the following changes:
  – In the graphical user interface (GUI), you now use the Set Preferred and Clear Preferred options to specify preferred paths. The Set Active and Set Passive options are no longer available for Round Robin with Subset.
  – These changes do not alter how the Round Robin with Subset policy works. Round Robin with Subset is still an "active/active" policy that enables you to specify preferred and non-preferred paths. The changes align the GUI terminology with how the policy works.

The release notes can be found here: https://now.netapp.com/NOW/knowledge/docs/mpio/win/reldsm35/pdfs/rnote.pdf

NetApp MPIO Version 3.5 for Windows Systems can be found here:

https://now.netapp.com/NOW/download/software/mpio_win/3.5/

If you deploy NetApp MPIO DSM 3.5 on a fully patched Windows Server 2008 R2 SP1 system, setup reports an “error” unless the following two hotfixes are installed:

KB2522766 – The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2522766/

KB2528357 – Nonpaged pool leak when you disable and enable some storage controllers in Windows 7 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2528357
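To check a host before running setup, you can compare the installed hotfix IDs against these two KBs. The following minimal Python sketch parses the output of `wmic qfe get HotFixID`. That list comes from Win32_QuickFixEngineering and may not include every update, so treat a "missing" result as a hint to verify. The function names are hypothetical.

```python
import re
import subprocess
import sys

REQUIRED = {"KB2522766", "KB2528357"}  # hotfixes named in the text for DSM 3.5


def installed_hotfixes(qfe_output: str) -> set[str]:
    """Extract KB IDs from 'wmic qfe get HotFixID' output."""
    return {m.upper() for m in re.findall(r"KB\d+", qfe_output, flags=re.IGNORECASE)}


def missing_hotfixes(qfe_output: str, required=REQUIRED) -> set[str]:
    return set(required) - installed_hotfixes(qfe_output)


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(["wmic", "qfe", "get", "HotFixID"],
                         capture_output=True, text=True, check=True).stdout
    missing = missing_hotfixes(out)
    print("missing: " + ", ".join(sorted(missing)) if missing else "all required hotfixes present")
```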

After installing KB2522766, a reboot is required.

KB2528357 also requires a reboot.

After two reboots, you can now install MPIO DSM 3.5.

In my case, an “older” version of the NetApp MPIO DSM was already installed; setup detects it automatically and upgrades it.

Note: In my case this is a Hyper-V server, so I also install the Hyper-V guest utilities, one of the new features mentioned in the list above.

Finally finished!

After a third reboot, you can proceed with your SAN configuration.

Please stay tuned for more details on the new utilities and especially on NetApp's new PowerShell cmdlets, which ship as “PowerShell Toolkit Version 1.6”.

PowerShell Toolkit 1.6

http://communities.netapp.com/community/interfaces_and_tools/data_ontap_powershell_toolkit/data_ontap_powershell_toolkit_downloads
http://communities.netapp.com/community/interfaces_and_tools/data_ontap_powershell_toolkit?view=documents