Marcel

Common issues in Azure Virtual Desktop (VMs-Session Hosts relations)

VM-Session Host relation

In Azure Virtual Desktop, session hosts in host pools provide remote applications or desktops to users. A session host is an object in a host pool. The session host is NOT the virtual machine (which you can see in the Azure Virtual Machine view in Azure), but it has a property that references the virtual machine.

The session host (object) is what is relevant for Azure Virtual Desktop, and its "state" is reported to the AVD backend. The important states for a session host are:
- Available (the session host is ready to accept connections)
- Unavailable (the session host is not ready and, in most cases, off/deallocated)
- Shutdown (the session host is off/deallocated)
- Other, which are not relevant in this case

The AVD agents installed on the virtual machine report the state of the session host. So, the virtual machine is responsible for the state of the session host object in a pool.

Common issues in this relationship

While the session host object and the virtual machine are independent resources and can have different problems, some behaviors must be monitored and mitigated when they occur.

Orphan Virtual Machines

If a session host is deleted without the virtual machine, it no longer appears in the AVD section of the Azure Portal or in the session host list in Hydra for Azure Virtual Desktop. That is expected. But because the virtual machine was not deleted, you are still charged for it, even if you are not using the virtual machine as a session host. The costs are higher if the virtual machine is still running and "invisible" from the AVD view.

Unresponsive AVD Agents

As mentioned before, the AVD Agent (two services are involved on each virtual machine) reports the state of a session host to the AVD backend. If the service cannot report the session host state to the AVD backend, the session host stays in an "Unavailable" or "Shutdown" state. From the AVD perspective, it looks like the session host is not running even if the virtual machine runs. That means that customers are charged for the virtual machine but don’t get any value from the AVD perspective (users cannot log in; hosts look like they were deallocated). Possible causes of this issue:
- A known issue is that the internal session host token expires after 90 days. So, if a session host does not run at least once within 90 days to refresh the token, the host can no longer report the state to the AVD backend. https://learn.microsoft.com/azure/virtual-desktop/faq#how-often-should-i-turn-my-vms-on-to-prevent-registration-issues-
- The OS crashed inside the VM. If Windows crashes and does not restart automatically and come up cleanly, the AVD agent is not started and cannot send the "state" to the AVD backend.
- The network is not working or cannot reach the AVD backend (firewall): In this case, it’s technically not possible for the AVD Agent to report the state to the backend.
- The AVD agent failed to start. If the AVD Agent fails to start, it cannot update the state in the AVD backend. The reason must be found in the log files and the event log of the OS on the VM.
- If the network card or the AVD services are disabled (which can be used to reproduce this issue), the AVD agent cannot send the state to the AVD backend.
- …
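To illustrate the 90-day token cause, here is a minimal sketch that flags hosts that have not reported for a long time. It assumes that the time of the last report to the AVD backend is a usable proxy for the last token refresh; the function name `token_risk`, the 14-day warning margin, and the sample hosts are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta, timezone

# 90 days is the token lifetime mentioned in the text.
TOKEN_LIFETIME = timedelta(days=90)
# Assumption: warn two weeks before the lifetime is reached.
WARNING_MARGIN = timedelta(days=14)


def token_risk(last_report: datetime, now: datetime) -> str:
    """Classify a host by the time since it last reported to the AVD backend."""
    age = now - last_report
    if age >= TOKEN_LIFETIME:
        return "expired"
    if age >= TOKEN_LIFETIME - WARNING_MARGIN:
        return "at-risk"
    return "ok"


# Hypothetical sample data: host name -> time of the last report.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
hosts = {
    "avd-host-01": datetime(2024, 5, 30, tzinfo=timezone.utc),
    "avd-host-02": datetime(2024, 3, 10, tzinfo=timezone.utc),
    "avd-host-03": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
report = {name: token_risk(seen, now) for name, seen in hosts.items()}
```

Hosts in the "at-risk" group could simply be started once so that the agent can refresh the token before the 90 days are over.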

Other orphan resources

Orphan disks

Disks that are not assigned to a virtual machine. They still cause costs if you forget to delete them.

Orphan network interfaces

Network interfaces that are not connected to a virtual machine or used as a private endpoint. They don’t cause costs, but each one reserves one of the available IPs. If no IP is available for the next host/virtual machine, this can cause rollout issues.

Orphan session hosts

The session hosts exist, but the linked virtual machines do not. That doesn’t cause costs, but it can prevent autoscaling from working well.
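The three kinds of orphan resources above can be sketched as simple set comparisons. This is only an illustration, assuming the resources were already exported into plain dictionaries; the field names (`managed_by`, `vm_id`, `private_endpoint_id`), the function name, and the shortened resource IDs are hypothetical and not the real Azure schema.

```python
def find_orphan_resources(vms, disks, nics, session_hosts):
    """Return the names of disks, NICs and session hosts without a matching VM.

    All inputs are lists of plain dicts with hypothetical field names.
    """
    # Azure resource IDs are case-insensitive, so compare them in lower case.
    vm_ids = {vm["id"].lower() for vm in vms}

    # A disk without an owning VM is unattached but still billed.
    orphan_disks = [d["name"] for d in disks if not d.get("managed_by")]

    # A NIC counts as orphaned only if it serves neither a VM nor a private endpoint.
    orphan_nics = [
        n["name"] for n in nics
        if not n.get("vm_id") and not n.get("private_endpoint_id")
    ]

    # A session host is orphaned if the VM it references no longer exists.
    orphan_hosts = [
        h["name"] for h in session_hosts
        if (h.get("vm_id") or "").lower() not in vm_ids
    ]
    return {"disks": orphan_disks, "nics": orphan_nics, "session_hosts": orphan_hosts}


# Hypothetical sample data with shortened placeholder IDs.
vms = [{"id": "/subscriptions/000/vms/AVD-01", "name": "AVD-01"}]
disks = [
    {"name": "AVD-01-os", "managed_by": "/subscriptions/000/vms/AVD-01"},
    {"name": "AVD-07-os", "managed_by": None},
]
nics = [
    {"name": "AVD-01-nic", "vm_id": "/subscriptions/000/vms/AVD-01"},
    {"name": "storage-pe-nic", "private_endpoint_id": "/subscriptions/000/pe/storage"},
    {"name": "AVD-07-nic"},
]
session_hosts = [
    {"name": "AVD-01.contoso.local", "vm_id": "/subscriptions/000/vms/avd-01"},
    {"name": "AVD-09.contoso.local", "vm_id": "/subscriptions/000/vms/AVD-09"},
]
orphans = find_orphan_resources(vms, disks, nics, session_hosts)
```

In this sample, the disk and the network interface left over from "AVD-07" and the session host that points to the missing "AVD-09" are reported, while the private endpoint interface is not.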

Actions to prevent and monitor these kinds of issues

Using the AVD Deep Insights workbook

The workbook https://blog.itprocloud.de/AVD-Azure-Virtual-Desktop-Error-Drill-Down-Workbook/ uses the resource graph to show possible issues of both kinds. The "Resources" section shows the different types of possible issues. Important: Some of the listed issues are expected during rollout, maintenance (Windows Update), imaging, etc.

Orphan virtual machines

It shows virtual machines intended to be session hosts without a session host object. Only virtual machines with a tag with the name "AVD.Type" or "WVD.Type" and the value "SessionHost" are listed. WVDAdmin and Hydra for Azure Virtual Desktop set the tag automatically with each deployment. Unfortunately, hosts deployed by the Azure Portal are not listed, but the tag can be set after or during a deployment from the portal.
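The rule described above can be sketched in a few lines. This is not the workbook's actual query, only an illustration on plain dictionaries; the field names, the function names, and the exact, case-sensitive tag comparison are assumptions.

```python
# Tag names and value as described in the text.
SESSION_HOST_TAG_NAMES = ("AVD.Type", "WVD.Type")
SESSION_HOST_TAG_VALUE = "SessionHost"


def is_tagged_as_session_host(vm):
    """True if the VM carries one of the tags that marks it as a session host."""
    tags = vm.get("tags") or {}
    return any(tags.get(name) == SESSION_HOST_TAG_VALUE for name in SESSION_HOST_TAG_NAMES)


def find_orphan_vms(vms, session_hosts):
    """Return names of tagged VMs that no session host object references."""
    # Resource IDs are compared in lower case because they are case-insensitive.
    referenced = {(h.get("vm_id") or "").lower() for h in session_hosts}
    return [
        vm["name"] for vm in vms
        if is_tagged_as_session_host(vm) and vm["id"].lower() not in referenced
    ]


# Hypothetical sample data with shortened placeholder IDs.
vms = [
    {"id": "/vms/AVD-01", "name": "AVD-01", "tags": {"AVD.Type": "SessionHost"}},
    {"id": "/vms/AVD-02", "name": "AVD-02", "tags": {"WVD.Type": "SessionHost"}},
    {"id": "/vms/AVD-03", "name": "AVD-03", "tags": {}},  # e.g. deployed by the portal, no tag
    {"id": "/vms/DC-01", "name": "DC-01", "tags": None},
]
session_hosts = [{"name": "AVD-01.contoso.local", "vm_id": "/vms/avd-01"}]
orphan_vm_names = find_orphan_vms(vms, session_hosts)
```

"AVD-03" is also without a session host in this sample, but it is not listed because the tag is missing. That is the limitation for hosts deployed by the Azure Portal.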

Unresponsive AVD agents

It shows the session hosts where the linked virtual machine is running, but the session host is in the state "unavailable" or "shutdown." That is expected during a start/stop/rollout/delete of a session host but not during the normal runtime.
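As a minimal sketch of this check (again not the workbook's real query): a host is flagged if its virtual machine runs while the session host state is "unavailable" or "shutdown". The dictionary fields, the function name, and the power state strings are hypothetical.

```python
# States that are suspicious while the VM itself is running.
SUSPECT_STATES = {"unavailable", "shutdown"}


def find_unresponsive_agents(session_hosts, vm_power_states):
    """Return names of session hosts whose VM runs but whose state says otherwise.

    vm_power_states maps a lower-case VM resource ID to a power state string.
    """
    flagged = []
    for host in session_hosts:
        vm_id = (host.get("vm_id") or "").lower()
        vm_running = vm_power_states.get(vm_id) == "running"
        if vm_running and host["state"].lower() in SUSPECT_STATES:
            flagged.append(host["name"])
    return flagged


# Hypothetical sample data with shortened placeholder IDs.
vm_power_states = {
    "/vms/avd-01": "running",
    "/vms/avd-02": "running",
    "/vms/avd-03": "deallocated",
}
session_hosts = [
    {"name": "AVD-01", "vm_id": "/vms/AVD-01", "state": "Available"},
    {"name": "AVD-02", "vm_id": "/vms/AVD-02", "state": "Unavailable"},
    {"name": "AVD-03", "vm_id": "/vms/AVD-03", "state": "Shutdown"},
]
unresponsive = find_unresponsive_agents(session_hosts, vm_power_states)
```

"AVD-03" is not flagged: the session host is in "Shutdown", but the virtual machine is deallocated as well, so both views agree.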

Other orphan resources

The workbook also shows the following orphan resources:
- Orphan disks: Disks that are not assigned to a virtual machine. They still cause costs if you forget to delete them.
- Orphan network interfaces: They don’t cause costs, but each one reserves one of the available IPs. This can cause rollout issues if no IP is available for the next host/virtual machine.
- Orphan session hosts: The session hosts exist, but the linked virtual machines do not. That doesn’t cause costs, but it can prevent autoscaling from working well.

Deployment improvements with Hydra for Azure Virtual Desktop and WVDAdmin

Both tools can be used to easily deploy new session hosts. The newest versions install some scheduled tasks on the newly deployed session hosts / virtual machines to monitor the start-up of the AVD services. E.g., if the start of one service (AVD agent bootloader) fails in the first minutes after a start, the task will start the service again. That can help to get the service running again after such an issue.

Using Hydra for Azure Virtual Desktop

Failed start of session hosts (unresponsive AVD agent)

The start process of a session host is monitored if it is started with Hydra for Azure Virtual Desktop (in the Hydra portal or by autoscaling). If a session host does not change its state to "available" or "updating" within 10 minutes, the host will be marked as "critically failed". This is visible on the dashboard and in the session host list:

Notification on the dashboard:

Notification in the session host list:

Note: If a host is started outside of Hydra (in the Azure Portal or with the Start VM on connect feature), Hydra cannot mark it.

Tip: Because Hydra's logging data is stored in Log Analytics, it is possible to create an alert to be notified by mail or to open a ticket via REST.
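The 10-minute start check described above can be sketched as follows. This is not Hydra's implementation, only an illustration of the rule; the function name `classify_start` and the intermediate result "starting" are assumptions.

```python
from datetime import datetime, timedelta

# Values taken from the text: 10 minutes, and the two states that count as a good start.
START_TIMEOUT = timedelta(minutes=10)
HEALTHY_STATES = {"available", "updating"}


def classify_start(started_at: datetime, now: datetime, state: str) -> str:
    """Classify a host start by its current state and the time since the start."""
    if state.lower() in HEALTHY_STATES:
        return "ok"
    if now - started_at >= START_TIMEOUT:
        return "critically failed"
    return "starting"  # still inside the grace period


# Hypothetical sample: one start at 08:00, checked at three points in time.
started = datetime(2024, 6, 1, 8, 0)
results = [
    classify_start(started, datetime(2024, 6, 1, 8, 4), "Unavailable"),
    classify_start(started, datetime(2024, 6, 1, 8, 6), "Available"),
    classify_start(started, datetime(2024, 6, 1, 8, 12), "Unavailable"),
]
```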

Notification of orphan VMs and unresponsive AVD agents

From version 1.0.7.4, Hydra will show a notification if orphan hosts or unresponsive AVD agents are detected:

The detection runs every 20 minutes, and only resources seen three times in a row with the same possible issue are shown. So, the first notification is shown after roughly one hour (to be changed). That should prevent most false alerts, because such a state is possible during deployments, starts, stops, deletes, imaging, or if the service account does not have permission to see all needed resources.
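The "three times in a row" rule can be illustrated with a small streak counter. This is a sketch under assumptions and not Hydra's code; the class name `IssueDebouncer` and the way findings are represented (pairs of resource and issue) are hypothetical.

```python
# Number of consecutive detection runs required before a finding is shown (from the text).
REQUIRED_STREAK = 3


class IssueDebouncer:
    """Report a finding only after it was seen in several consecutive runs."""

    def __init__(self, required=REQUIRED_STREAK):
        self.required = required
        self.streaks = {}  # (resource, issue) -> number of consecutive sightings

    def observe(self, findings):
        """Record the findings of one detection run and return those to notify about."""
        findings = set(findings)
        # Findings missing in this run are dropped, which resets their streak.
        self.streaks = {key: self.streaks.get(key, 0) + 1 for key in findings}
        return {key for key, count in self.streaks.items() if count >= self.required}


# Hypothetical sample: four detection runs, e.g. 20 minutes apart.
debouncer = IssueDebouncer()
orphan = ("AVD-02", "orphan VM")
flapping = ("AVD-05", "unresponsive agent")
run1 = debouncer.observe({orphan, flapping})
run2 = debouncer.observe({orphan})            # AVD-05 recovered, its streak is reset
run3 = debouncer.observe({orphan, flapping})
run4 = debouncer.observe({orphan, flapping})
```

Only the orphan VM is reported (from the third run on). The host that recovered in between starts counting again and would need three further consecutive sightings.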

Further orphan resources

Orphan disks and network interfaces will soon be listed (like virtual machines) in Azure Resources -> Orphan Resources in Hydra.

Next steps

In an upcoming version, the hosts and VMs will be directly visible in the Hydra portal, including the possibility of taking some actions (like deleting resources, reinstalling the AVD agent, etc.).

Conclusion

Like any other IT solution, Azure Virtual Desktop may have issues. Therefore, it is important to monitor and take action to avoid additional costs without gaining value from the AVD perspective. I personally use the tools I built to deploy hosts more reliably and to get notified about possible issues. And: It always makes sense to monitor the cost and cost prediction of subscriptions in Azure.