Monitor and optimize operational resources in SQL Server
--
Describe performance monitoring
Describe performance monitoring tools
When you deploy an Azure Virtual Machine from the Azure Marketplace, an agent is installed in the virtual machine that provides a basic set of operating system metrics that are presented to you in the Azure portal.
These metrics pertain to the operating system, not SQL Server. You’ll notice that the namespace for each metric is the virtual machine host, not SQL Server.
In the case of a virtual machine, the default metrics captured are CPU, network utilization, and disk read and write operations.
Azure Monitor Insights allows you to collect additional data points such as storage latency, available memory, and disk capacity.
If you create a virtual machine with one of the pre-configured SQL Server images in the Azure Marketplace, you also get the SQL virtual machine resource provider.
Take note that to access future features and product improvements, you should also register your SQL Server VM with the SQL IaaS Agent extension.
Performance Monitor with SQL Server on an Azure Virtual Machine
Whether you are using an on-premises server or an Azure Virtual Machine, the Windows Server platform has a native tool called Performance Monitor (commonly shortened to perfmon after the name of its executable file) that allows you to easily and routinely monitor performance metrics.
Perfmon operates with counters for both the operating system and installed programs.
Describe critical performance metrics
You have seen how to collect data in both Azure Monitor and Windows Performance Monitor.
Metrics are gathered at regular intervals and are the gateway for alerting processes that will help to resolve issues quickly and efficiently.
Azure Monitor Metrics is a powerful subsystem that allows you not only to analyze and visualize your performance data, but also to trigger alerts that notify administrators or fire automated actions, such as an Azure Automation runbook or a webhook. You also have the option to archive your metrics data to Azure Storage, since active data is only retained for 93 days.
The alerts can be configured in a static manner (for example, raise an alert when CPU goes over 95%) or in a dynamic fashion using Dynamic Thresholds.
Dynamic Thresholds learn the historical behavior of the metric and raise an alert when the resources are operating in an abnormal manner.
These Dynamic Thresholds can detect seasonality in your workloads and adjust the alerting accordingly.
Both types of alerts offer Boolean operators such as ‘greater than’ and ‘less than’.
Along with Boolean operators, there are aggregate measurements to select from, such as average, minimum, maximum, count, and total.
The following options are available for defining the action to take:
· Automation Runbook
· Azure Function
· Email Azure Resource Manager Role
· Email/SMS/Push/Voice
· ITSM
· Azure Logic App
· Secure Webhook
· Webhook
Establish baseline metrics
A baseline is a collection of data measurements that helps you understand the normal “steady state” of your application or server’s performance.
Correlating SQL Server and operating system performance
If you are using Linux as the operating system, you will need to install InfluxDB, Collectd, and Grafana to capture data similar to Windows Performance Monitor.
Processor(_Total)\% Processor Time — This counter measures the CPU utilization of all of the processors on the server. It is a good indication of the overall workload, and when used in conjunction with other counters, this counter can identify problems with query performance.
Paging File(_Total)\% Usage — In a properly configured SQL Server, memory should not page to the paging file on disk. However, in some configurations you may have other services running that consume system memory and lead to the operating system paging memory to disk resulting in performance degradation.
PhysicalDisk(_Total)\Avg. Disk sec/Read and Avg. Disk sec/Write — This counter provides a good metric for how the storage subsystem is working. Your latency values in most cases should not be above 20 ms, and with Premium Storage you should see values less than 10 ms.
System\Processor Queue Length — This counter shows the number of threads that are waiting for time on the processor. If it is consistently greater than zero, the server is under CPU pressure, and your workload could benefit from more CPUs.
SQLServer:Buffer Manager\Page life expectancy — Page life expectancy indicates how long SQL Server expects a page to live in memory. There is no single correct value for this counter. Older documentation refers to 300 seconds as proper, but that guidance dates from a 32-bit era when servers had far less RAM. You should monitor this value over time and evaluate sudden drops. Such drops in the counter’s value could indicate poor query patterns, external memory pressure (for example, the server running a large SSIS package), or just normal system processing such as running a consistency check on a large database.
SQLServer:SQL Statistics\Batch Requests/sec — This counter is good for evaluating how consistently busy a SQL Server is over time. Once again there is no good or bad value, but you can use this counter in conjunction with % Processor time to better understand your workload and baselines.
SQLServer:SQL Statistics\SQL Compilations/sec and SQL Re-Compilations/sec — These counters are updated when SQL Server has to compile or recompile an execution plan for a query, either because there is no existing plan in the plan cache or because a plan was invalidated by a change. Recompilations can indicate T-SQL that uses RECOMPILE query hints, or pressure on the plan cache caused by many ad hoc queries or by memory pressure.
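You can also read several of these SQL Server counters from inside the engine itself. The following is a minimal sketch against the sys.dm_os_performance_counters DMV; note that the /sec counters are cumulative since the last restart, so to derive a per-second rate you would sample them twice and take the difference.
-- Read selected SQL Server performance counters from inside the engine
SELECT object_name,
       counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy',
                       N'Batch Requests/sec',
                       N'SQL Compilations/sec',
                       N'SQL Re-Compilations/sec');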
Wait statistics
Whenever an executing thread is forced to wait on an unavailable resource, SQL Server keeps track of the wait and its duration.
This information is easily identifiable via the dynamic management view (DMV) sys.dm_os_wait_stats.
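As a minimal sketch, the following query surfaces the top waits by cumulative wait time; on a real system you would typically filter out benign background wait types before drawing conclusions.
-- Top 10 wait types by total wait time since the last restart
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;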
Explore extended events
The extended events engine in Azure SQL is a lightweight and powerful monitoring system that allows you to capture granular information about activity in your databases and servers.
Some examples of issues you might troubleshoot with Extended Events include:
· Troubleshooting blocking and deadlocking performance issues.
· Identifying long-running queries.
· Monitoring Data Definition Language (DDL) operations.
· Logging missing column statistics.
· Observing memory pressure in your database.
· Long-running physical I/O operations.
Extended events cover the full surface area of SQL Server, and are divided into four channels, which define the audience of an event.
· Admin — Admin events are targeted for end users and administrators. The events included indicate a problem within a well-defined set of actions an administrator can take. An example of this is the generation of an XML deadlock report to help identify the root cause of the deadlock.
· Operational — Operational events are used for analysis and diagnostics of common problems. These events can be used to trigger an action or task based on an occurrence of the event. An example of an operational event would be a database in an availability group changing state, which would indicate a failover.
· Analytic — Analytic events are typically related to performance events and are published in high volume. Tracing stored procedure or query execution would be an example of an analytic event.
· Debug — Debug events are not necessarily fully documented and you should only use them when troubleshooting in conjunction with Microsoft support.
SQL Server provides numerous templates which are grouped into the following categories:
· Locks and Blocks
· Profiler Equivalents
· Query Execution
· System Monitoring
Administrators can start and stop extended event sessions at any time through the Extended Events node in SQL Server Management Studio.
You also have the option of enabling causality tracking, which adds a globally unique identifier (GUID) and sequence number to the output of each event, which allows you to easily step through the order that the events occurred.
Filters are a powerful feature of Extended Events that give you granular control, so that you capture only the specific occurrences of an event that you care about.
An extended event session has a target — a target can be simply thought of as a place for the engine to keep track of occurrences of an event.
Two of the more common targets are the event file, a file on the file system that stores events (in Azure SQL PaaS offerings, this data is written to Azure Blob Storage), and the ring buffer, which resides within SQL Server’s memory.
The ring buffer is most commonly used for live observation of an event session because it is a circular buffer, and data is not persisted beyond the session.
Event sessions can be scoped to a server or a database.
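As an illustration, here is a minimal server-scoped session (the session name and the one-second threshold are arbitrary) that captures completed batches running longer than one second to a ring buffer target; in Azure SQL Database you would scope the session with ON DATABASE instead.
-- Capture batches that run longer than one second (duration is in microseconds)
CREATE EVENT SESSION TrackLongBatches ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE duration > 1000000)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION TrackLongBatches ON SERVER STATE = START;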
Describe Azure Intelligent Insights
One of the benefits of using Azure SQL Database is the baseline performance collection that is built into the Azure platform.
Beyond the simple Azure Monitor data collection, Azure SQL Database Intelligent Insights is a component of Azure SQL Database that allows you to analyze performance of your queries.
This feature is built using data from the Query Store, which is enabled in your Azure SQL Database at creation time.
In order to enable Intelligent Insights, you need to add diagnostic settings to your database.
Intelligent Insights storage options
If you choose Azure storage, your data is stored in extended events format with an XEL extension.
Those files can only be viewed on the Azure SQL server where they were created.
If you choose an event hub, the data is stored in Avro format, which is a binary JSON format used for event reporting.
If you use Log Analytics as a destination, your data is stored in Log Analytics, and can be queried using the Kusto Query Language.
Identify problematic queries
Azure SQL Database offers a tool called Query Performance Insight that allows the administrator to quickly identify expensive queries.
The default view is Resource Consuming Queries.
This tab shows the top five queries sorted by the particular resource that you select on the left.
Explore causes of performance issues
Describe SQL Server Query Store
The Query Store can be thought of as a flight data recorder for SQL Server.
It collects, aggregates, and persistently stores performance information.
There are two types of data being captured.
One is the data about the query itself (number of executions, the plan it used in execution, the query text) and the other is the performance information, which collects runtime statistics for each execution of the query.
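The Query Store is on by default in Azure SQL Database; on SQL Server you can enable it per database, as in this minimal sketch (the database name is a placeholder).
-- Enable the Query Store and put it in read/write mode
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
ALTER DATABASE YourDatabase SET QUERY_STORE (OPERATION_MODE = READ_WRITE);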
The built-in reports include:
Regressed Queries — This report shows any queries where execution metrics have degraded in the period of time of interest (last hour, day, or week). This report is useful for evaluating the impact of minor or major changes in server configuration or database schema.
Overall Resource Consumption — This report allows you to quickly observe the most impactful queries in your database. This report allows click through to the “Top Resource Consuming Queries” report, which allows you to gather execution plan information.
Top Resource Consuming Queries — This report shows the query and query plan for the most impactful queries in a database for a time period. This data allows you to observe if a query has multiple execution plans, and whether or not those plans have high variability in performance.
Queries with Forced Plans — This report contains information about plan forcing, and any plan forcing failures (a situation where a forced execution plan was not honored).
Queries with High Variation — This report showcases queries that have a high degree of variance between executions and can be sorted by any of the above execution metrics.
Query Wait Statistics — This report allows you to see wait statistics in aggregate, and to drill through to get further information on queries that spent the most time waiting. Note that this wait information is aggregated and not as detailed as what you might observe in the sys.dm_os_wait_stats DMV.
Tracked Queries — This report is filtered by query_id and allows you to view the performance of a specific query and its execution plans. You can manually enter a query_id, or you can add queries from the Regressed Queries or Top Resource Consuming Queries reports. The query_id can be captured from the catalog view sys.query_store_query.
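For example, a minimal lookup of a query_id by query text (the search term is a placeholder) might look like this.
-- Find the query_id for queries whose text matches a search term
SELECT q.query_id,
       qt.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON q.query_text_id = qt.query_text_id
WHERE qt.query_sql_text LIKE N'%YourSearchTerm%';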
Describe blocking and locking in SQL Server
One feature of relational databases is locking, which is essential to maintain the atomicity, consistency, and isolation properties of the ACID model.
All RDBMSs will block actions that would violate the consistency and isolation of writes to a database.
On SQL Server, blocking occurs when one process holds a lock on a specific resource (row, page, table, database), and a second process attempts to acquire a lock with an incompatible lock type on the same resource.
Typically, locks are held for a very short period, and when the process holding the lock releases it, the blocked process can then acquire the lock and complete its transaction.
Another blocking scenario is deadlocking, which occurs when one transaction has a lock on a resource, and another transaction has a lock on a second resource.
Each transaction then attempts to take a lock on the resource which is currently locked by the other transaction.
Theoretically, this scenario would lead to an infinite wait, as neither transaction could complete.
However, the SQL Server engine has a mechanism for detecting these scenarios and will kill one of the transactions in order to alleviate the deadlock, based on which transaction has performed the least amount of work that would need to be rolled back.
Deadlocks are recorded in the system_health extended event session, which is enabled by default.
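To observe blocking as it happens, one common approach (a sketch, not the only option) is to query sys.dm_exec_requests for sessions with a nonzero blocking_session_id.
-- Currently blocked sessions and the sessions blocking them
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;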
Describe data file fragmentation
Fragmentation on storage occurs when a collection of data that should be stored together is broken up into many pieces that are not contiguous on disk.
Historically, non-contiguous data was problematic on systems with hard disk drives, which performed much better when executing sequential reads and writes (reading or writing a set of data in contiguous sectors on the disk).
Modern solid-state drives (SSDs) reduce the impact of fragmentation, at least at the operating system level, but it can still be impactful within the data files used by SQL Server.
Fragmentation occurs when indexes (both clustered and nonclustered) have pages in which the logical ordering of the index, which is based on the data value of the index key, does not match the physical ordering of the pages.
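You can measure this logical fragmentation with the sys.dm_db_index_physical_stats function; the following is a minimal sketch scoped to the current database.
-- Average fragmentation for each index in the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
   AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;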
Configure SQL Server resources for optimal performance
Explain how to optimize Azure storage for SQL Server virtual machines
Storage performance is a critical component of an I/O heavy application like a database engine.
Blob Storage — Blob storage is what is known as object-based storage and includes cold, hot, and archive storage tiers. In a SQL Server environment, blob storage will typically be used for database backups, using SQL Server’s backup to URL functionality.
File Storage — File storage is effectively a file share that can be mounted inside a virtual machine, without the need to set up any hardware. SQL Server can use File storage as a storage target for a failover cluster instance.
Disk Storage — Azure managed disks offer block storage that is presented to a virtual machine. These disks are managed just like a physical disk in an on-premises server, except that they are virtualized. There are several performance tiers within managed disks depending on your workload. This type of storage is the most commonly used type for SQL Server data and transaction log files.
Azure managed disks
Azure managed disks are block-level storage volumes that are presented to Azure Virtual Machines.
Block level storage refers to raw volumes of storage that are created and can be treated as an individual hard drive.
These block devices can be managed within the operating system, and the storage tier is not aware of the contents of the disk.
Typically, production SQL Server workloads will use either Ultra disk or Premium SSD, or some combination of the two.
Premium SSDs also support read-caching, which can benefit read-heavy database workloads by reducing the number of trips to the disk.
Striping disks for maximum throughput
One of the ways to get more performance and volume out of Azure disks is to stripe your data across multiple disks.
This technique does not apply to Ultra disk, as you can scale IOPS, throughput, and maximum size independently on a single disk.
SQL Server storage configuration best practices
· Create a separate volume for data and transaction log files
· Enable read caching on the data file volume
· Do not enable any caching on the log file volume
· Plan for an additional 20% of IOPS and throughput when building your storage for your VM to handle workload peaks
· Use the D: drive (the locally attached SSD) for TempDB files because TempDB is recreated upon server restart, so there is no risk of data loss
· Enable instant file initialization to reduce the impact of file-growth activities.
· Move trace file and error log directories to data disks
· For workloads requiring storage latency under one millisecond, consider using Ultra disk over Premium SSD.
Describe virtual machine resizing
There are many size options for Azure Virtual Machines.
For SQL Server workloads, the main characteristics to look for are the amount of memory available and the number of input/output operations per second (IOPS) the virtual machine can perform.
Optimize database storage
What is proportional fill?
If you insert one gigabyte of data into a SQL Server database with two data files, you might expect each data file to grow by roughly 512 megabytes.
However, this equal growth is not necessarily the case: SQL Server fills data files proportionally, writing more data to the file with more free space so that the files fill at the same rate.
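One way to observe proportional fill at work is to compare the size and used space of each data file, as in this sketch.
-- Size and used space (in MB) for each data file in the current database
SELECT name,
       size * 8 / 1024 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
FROM sys.database_files
WHERE type_desc = N'ROWS';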
Tempdb configuration in SQL Server
For tempdb-heavy workloads, there may be benefits to increasing the number of tempdb files beyond eight, up to the number of CPUs on your machine.
To support this, many DBAs used trace flag 1117, which forced all databases with multiple data files to grow all of their files at the same rate.
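As a sketch (the file name, path, and sizes are placeholders), adding an equally sized tempdb data file looks like the following; note that on SQL Server 2016 and later the AUTOGROW_ALL_FILES filegroup setting replaces trace flag 1117.
-- Add an equally sized data file to tempdb
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev9',
          FILENAME = N'D:\tempdb\tempdev9.ndf',
          SIZE = 8GB,
          FILEGROWTH = 64MB);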
Control SQL Server resources
Resource pools
A resource pool represents physical resources available on the server.
SQL Server always has two pools, default and internal, even when Resource Governor is not enabled.
The internal pool is used by critical SQL Server functions and cannot be restricted.
The default pool, and any resource pools you explicitly define, can be configured with limits on the resources it can use.
You can specify the following limits for each non-internal pool:
· Min/Max CPU percent
· Cap of CPU percent
· Min/Max memory percent
· NUMA node affinity
· Min/Max IOPS per volume
Workload group
A workload group is a container for session requests based on their classification by the classifier function.
Like resource pools there are two built-in groups, default and internal, and each workload group can only belong to one resource pool.
Classifier function
The classifier function runs at the time a connection is established to the SQL Server instance and classifies each connection into a given workload group.
If the function returns NULL, ‘default’, or the name of a nonexistent workload group, the session is placed in the default workload group.
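Putting the three pieces together, a minimal Resource Governor sketch (the pool, group, and application names are illustrative) could look like the following.
-- Limit a hypothetical reporting application to 20% CPU
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
GO
-- The classifier function must be created in master and be schema-bound
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF APP_NAME() = N'ReportApp'
        RETURN N'ReportingGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;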
Configure databases for performance
Describe database scoped configuration options
SQL Server has always had configuration options that were set at the database level.
For example, the recovery model has always been a database setting, but as more complex features have been introduced to the database, more options have been added.
Many of these options are tied to the compatibility level of the database, which is itself a database level configuration option.
Database configuration options break down into two groups, differing mainly in the syntax used to set them:
· Options configured by the ALTER DATABASE SCOPED CONFIGURATION syntax in T-SQL
· Options configured by the ALTER DATABASE syntax in T-SQL
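As a quick illustration of the two syntaxes (the database name is a placeholder):
-- A database scoped configuration option
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
-- A database-level option set with ALTER DATABASE
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 150;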
Describe intelligent query processing
IQP features break down into a few families of features:
· Adaptive Query Processing
§ Adaptive Joins — The database engine defers the choice between a hash join and a nested loops join based on the number of rows going into the join. Adaptive joins currently only work in batch execution mode.
§ Interleaved Execution — Currently this feature supports multi-statement table-valued functions (MSTVFs). Prior to SQL Server 2017, MSTVFs used a fixed row estimate of either one or 100 rows, depending on the version of SQL Server. This estimate could lead to suboptimal query plans if the function returned many more rows. With interleaved execution, an actual row count is generated from the MSTVF before the rest of the plan is compiled.
§ Memory Grant Feedback — SQL Server generates a memory grant in the initial plan of the query, based on row count estimates from statistics. Severe data skew could lead to either over- or under-estimates of row counts, which can cause over-grants of memory that decrease concurrency, or under-grants, which can cause the query to spill data to tempdb. With Memory Grant Feedback, SQL Server detects these conditions and decreases or increases the amount of memory granted to the query to either avoid the spill or overallocation.
· Table Variable Deferred Compilation
· Batch Mode on Rowstore
· Scalar UDF Inlining
· Approximate Query Processing
§ The APPROX_COUNT_DISTINCT function guarantees up to a 2% error rate within a 97% probability (see the example after this list)
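Here is a minimal sketch of approximate query processing (the table and column names are placeholders); APPROX_COUNT_DISTINCT requires SQL Server 2019 or later, or Azure SQL.
-- Approximate distinct count: faster and lighter on memory than COUNT(DISTINCT ...)
SELECT APPROX_COUNT_DISTINCT(OrderID) AS approx_distinct_orders
FROM dbo.Orders;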
Describe query store in Azure Database for MySQL and PostgreSQL
One of the value-added features in Azure Database for MySQL and PostgreSQL is an implementation of the Query Store in each database.
While the implementation differs slightly from the one in Azure SQL, the Query Store offers insightful, actionable performance information to the DBA.
Describe performance-related maintenance tasks in SQL Server
Maintain indexes
Rebuild and reorganize
Index fragmentation occurs when the logical ordering within index pages does not match the physical ordering.
The common guidance for index maintenance is (a T-SQL sketch follows the list):
· >5% but <30%: reorganize the index
· >30%: rebuild the index
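In T-SQL, that guidance translates to roughly the following (the index and table names are placeholders; ONLINE = ON requires Enterprise edition or Azure SQL).
-- 5-30% fragmentation: reorganize
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;
-- Over 30% fragmentation: rebuild
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (ONLINE = ON);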
Maintain statistics
Statistics are stored in the user database as binary large objects (blobs).
These blobs contain statistical information about the distribution of data values in one or more columns of a table or indexed view.
The query optimizer uses column and index statistics in order to determine cardinality, which is the number of rows a query is expected to return.
-- List user-created statistics with their last update time,
-- row count, and the number of rows sampled when they were built
SELECT sp.stats_id,
       s.name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.user_created = 1;
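If the output shows stale or sparsely sampled statistics, you can refresh them manually, as in this minimal example (the table name is a placeholder).
-- Refresh all statistics on a table by scanning every row
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;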
Describe automatic tuning
Automatic tuning features
SQL Server 2017 introduced a feature called automatic tuning.
Automatic tuning gathers performance metrics and applies machine learning against them to suggest improvements, or even to self-correct.
Automatic tuning, whether on-premises or in the cloud, allows you to identify issues caused by query execution plan regression.
Additionally, in Azure SQL Database you have the option to further improve query performance by index tuning.
Azure SQL Database automatic tuning can identify indexes that should be added or even removed from the database to enhance query performance.
If the forced plan does not perform better than the previous plan, it is subsequently unforced, and a new plan is compiled.
If the forced plan continues to outperform the previously poor plan, it remains forced until a recompile occurs.
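Plan regression correction is enabled per database, as in this minimal sketch (the database name is a placeholder).
-- Let the engine automatically force the last known good plan
ALTER DATABASE YourDatabase SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);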