How UI Automation helped me build a tool for a user who’s blind and who has MND/ALS
Hi,
I’m a strong believer that if a developer has a friend or family member that would benefit from some assistive technology tool, the developer should be able to build that tool. Hopefully they’d also share it with anyone whose quality of life might be impacted through access to the tool. The Windows UI Automation (UIA) API can sometimes be a great help in building an assistive technology solution.
A few months ago I was introduced to a gentleman who wanted to work with e-mail and web browsing on his Windows 7 PC. He’d done these things some time ago, but since then some features of the software he’d used had changed, and he’d benefit from some help in learning the new functionality. In addition to getting familiar with his versions of Windows, Windows Live Mail and Internet Explorer (IE), being blind, he’d also need to become familiar with the current version of his screen reader. So over the weeks that followed, we ran through the steps for reading and composing e-mail and browsing the web.
During this period, I was also conscious of how someone with MND/ALS, such as this gentleman, might find it a challenge to press all the keys required in order to browse the web. So this got me thinking about whether there’s anything I could build which might be useful in this situation. In particular, is there a way to perform specific actions by pressing only a single key, while minimizing the amount of hand movement? Perhaps one approach is to see what can be done through use of the Number Pad keys alone.
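To make the single-key idea concrete, here is a minimal sketch of a dispatch table that maps each NumPad digit to one action. The `NumPadDispatcher` class and its method names are hypothetical and are not taken from the tool itself. How key presses are captured and passed to `HandleKey` is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher (not from the original tool): maps a NumPad
// virtual-key code to a single action, so one key press triggers one task.
public class NumPadDispatcher
{
    // Win32 virtual-key codes for the NumPad digits run from
    // VK_NUMPAD0 (0x60) to VK_NUMPAD9 (0x69).
    public const int VkNumPad0 = 0x60;
    public const int VkNumPad9 = 0x69;

    private readonly Dictionary<int, Action> m_actions =
        new Dictionary<int, Action>();

    // Register the action to run when the given NumPad digit (0-9) is pressed.
    public void Register(int numPadDigit, Action action)
    {
        if (numPadDigit < 0 || numPadDigit > 9)
        {
            throw new ArgumentOutOfRangeException("numPadDigit");
        }
        if (action == null)
        {
            throw new ArgumentNullException("action");
        }
        m_actions[VkNumPad0 + numPadDigit] = action;
    }

    // Returns true if the key is a registered NumPad key and its action ran.
    public bool HandleKey(int virtualKeyCode)
    {
        Action action;
        if (m_actions.TryGetValue(virtualKeyCode, out action))
        {
            action();
            return true;
        }
        return false;
    }
}
```

An app might call `Register(7, ShowFavorites)` once at startup, where `ShowFavorites` is whatever method performs the task, and then pass each incoming virtual-key code to `HandleKey`.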
The screen reader used has a very useful feature whereby the default key combinations for certain actions can be replaced with other combinations, (or a single key press), which might be preferable to the user. So we could have changed the trigger for a certain set of actions to be NumPad keys. But in this case, we were interested in having a key press perform a custom action that went beyond controlling the screen reader. For example, say we have a key that should invoke the IE Favorites list. It’s possible that when that key’s pressed, IE is not in the foreground, or not even running at all. So in reaction to that key, I want to start IE if it’s not running, bring it into the foreground, and then invoke the Favorites list.
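The first two of those steps, starting IE if it's not running and bringing it into the foreground, could be sketched roughly as below. This is an illustration under assumptions rather than the tool's actual code: the `BrowserLauncher` class, the polling interval and the timeout parameter are all hypothetical. The Win32 functions used (`FindWindow`, `IsIconic`, `ShowWindow`, `SetForegroundWindow`) are standard user32 calls, and "IEFrame" is the IE window class name.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper (not from the original tool).
public static class BrowserLauncher
{
    public const string BrowserWindowClass = "IEFrame";
    private const int SW_RESTORE = 9;

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool IsIconic(IntPtr hWnd);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetForegroundWindow(IntPtr hWnd);

    // Returns the IE window handle, or IntPtr.Zero if no window appeared
    // within the timeout.
    public static IntPtr EnsureBrowserInForeground(int timeoutMs)
    {
        IntPtr hwnd = FindWindow(BrowserWindowClass, null);
        if (hwnd == IntPtr.Zero)
        {
            // IE isn't running, so start it and poll until its window exists.
            var startInfo = new ProcessStartInfo("iexplore.exe");
            startInfo.UseShellExecute = true;
            Process.Start(startInfo);

            int waitedMs = 0;
            while (hwnd == IntPtr.Zero && waitedMs < timeoutMs)
            {
                Thread.Sleep(250);
                waitedMs += 250;
                hwnd = FindWindow(BrowserWindowClass, null);
            }
        }

        if (hwnd != IntPtr.Zero)
        {
            // Restore the window if it's minimized, then request the foreground.
            // Windows can refuse this request in some situations.
            if (IsIconic(hwnd))
            {
                ShowWindow(hwnd, SW_RESTORE);
            }
            SetForegroundWindow(hwnd);
        }
        return hwnd;
    }
}
```

A caller would check the returned handle against `IntPtr.Zero` before going on to use UIA on the browser window.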
With this in mind, I set out to build a simple tool that would allow web browsing with his screen reader, only using single key presses on the NumPad. (If this seemed to have potential, I could enhance this to allow reading of e-mails too.) This is what I ended up with:
As part of building the app, I used the Windows UIA API. Below are a couple of related snippets.
More details on building the app, and how UIA helped me, can be found at https://herbi.org/WebKeys/WebKeysTechnicalDetails.htm.
This has been a very interesting project for me, and having built the tool, I’ve uploaded it to my web site and made it freely available for anyone who might find it useful. I can update it based on user requests.
Next I’d like to explore how the app can be controlled through a switch device. It would be straightforward for me to add scanning of the buttons shown in the UI, and have that controlled with a switch device. The challenge will be making that an efficient process, but all being well I can come up with some approach which provides a useable experience.
I hope you find building assistive technology apps with UIA as rewarding as I do.
Guy
Code which determines whether the IE Favorites list is visible
// UIA-related values taken from
// "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include\UIAutomationClient.h".
// Defining them in the app rather than pulling the values from interop.UIAutomationCore
// made some build step a little simpler when I first did this, (but I don't remember
// the details of that.)
private int c_propertyIdControlType = 30003;  // UIA_ControlTypePropertyId
private int c_propertyIdName = 30005;         // UIA_NamePropertyId
private int c_propertyIdAutomationId = 30011; // UIA_AutomationIdPropertyId
private int c_propertyIdTreeType = 50023;     // UIA_TreeControlTypeId (a control type, not a property)

// Detect whether the IE Favorites list is visible.
private bool IsFavoritesListVisible()
{
    bool fShowingFavorites = false;

    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
    IntPtr hwnd = Win32Interop.FindWindow(c_strBrowserWindowClass, null);
    if (hwnd != IntPtr.Zero)
    {
        // Get the UIA element that represents the IE window.
        IUIAutomationElement elementBrowser = m_uiautomation.ElementFromHandle(hwnd);
        if (elementBrowser != null)
        {
            // Find an element whose control type is UIA_TreeControlTypeId.
            IUIAutomationCondition conditionControlType =
                m_uiautomation.CreatePropertyCondition(c_propertyIdControlType, c_propertyIdTreeType);

            // Don't look for an element name "Favorites", as I expect that won't work on anything
            // but US-English systems.
            // IUIAutomationCondition conditionName =
            //     m_uiautomation.CreatePropertyCondition(c_propertyIdName, "Favorites");

            // Find an element whose AutomationID is "100".
            IUIAutomationCondition conditionAutomationId =
                m_uiautomation.CreatePropertyCondition(c_propertyIdAutomationId, "100");

            // Combine the control type condition and AutomationID condition into a single condition.
            IUIAutomationCondition condition =
                m_uiautomation.CreateAndCondition(conditionControlType, conditionAutomationId);

            // No cached properties or patterns are going to be accessed after we've tried to find
            // the Favorites list.
            IUIAutomationCacheRequest cacheRequest = m_uiautomation.CreateCacheRequest();

            // Now find the first element beneath the browser element that meets the condition.
            IUIAutomationElement elementFavorites = elementBrowser.FindFirstBuildCache(
                TreeScope.TreeScope_Descendants, condition, cacheRequest);
            if (elementFavorites != null)
            {
                // We've found the Favorites list.
                fShowingFavorites = true;
            }
        }
    }

    return fShowingFavorites;
}
Code which programmatically invokes an IE button
private int c_patternIdInvoke = 10000;       // UIA_InvokePatternId
private int c_propertyIdButtonType = 50000;  // UIA_ButtonControlTypeId (a control type, not a property)

private void InvokeButton(string buttonName)
{
    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
    IntPtr hwnd = Win32Interop.FindWindow(c_strBrowserWindowClass, null);
    if (hwnd != IntPtr.Zero)
    {
        // Get the UIA element that represents the IE window.
        IUIAutomationElement elementBrowser = m_uiautomation.ElementFromHandle(hwnd);
        if (elementBrowser != null)
        {
            // Create a cache request to get the Invoke pattern for the element. This means
            // we don't incur a cross-proc call to get the pattern later.
            IUIAutomationCacheRequest cacheRequest = m_uiautomation.CreateCacheRequest();
            cacheRequest.AddPattern(c_patternIdInvoke);

            // Search for a button whose name has been supplied to us. We could add other
            // conditions here if we want to. For example, only look for elements with an
            // IsEnabled property of true. We're not interested in getting a Back button
            // that's not enabled. Or create a condition which means we're only interested
            // in elements in the Control View of the UIA tree. That would mean we'll avoid
            // searching through elements which only appear in the Raw View of the tree.
            // Being careful about what conditions are set up before a call to find an
            // element can be a great way of optimizing performance.
            IUIAutomationCondition conditionControlType =
                m_uiautomation.CreatePropertyCondition(c_propertyIdControlType, c_propertyIdButtonType);
            IUIAutomationCondition conditionName =
                m_uiautomation.CreatePropertyCondition(c_propertyIdName, buttonName);

            // Combine the control type condition and name condition into a single condition.
            IUIAutomationCondition condition =
                m_uiautomation.CreateAndCondition(conditionControlType, conditionName);

            IUIAutomationElement elementButton = elementBrowser.FindFirstBuildCache(
                TreeScope.TreeScope_Descendants, condition, cacheRequest);
            if (elementButton != null)
            {
                // Get the Invoke pattern which we requested to be cached when the element was found.
                IUIAutomationInvokePattern pattern =
                    (IUIAutomationInvokePattern)elementButton.GetCachedPattern(c_patternIdInvoke);
                if (pattern != null)
                {
                    // Now invoke the button. This will incur a cross-proc call.
                    pattern.Invoke();
                }
            }
        }
    }
}
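The comments in InvokeButton() mention two extra conditions that could narrow the search: requiring IsEnabled to be true, and only matching elements in the Control View. A minimal sketch of how those might be combined is below. The `ButtonConditions` class and its method name are hypothetical, and the `using` line assumes the interop assembly exposes its types under the namespace `interop.UIAutomationCore`, so adjust it to match your own interop assembly. The value 30010 is UIA_IsEnabledPropertyId from UIAutomationClient.h.

```csharp
// Assumption: the UIA interop assembly's namespace matches its name.
using interop.UIAutomationCore;

// Hypothetical helper (not from the original tool).
public static class ButtonConditions
{
    public const int PropertyIdControlType = 30003; // UIA_ControlTypePropertyId
    public const int PropertyIdName = 30005;        // UIA_NamePropertyId
    public const int PropertyIdIsEnabled = 30010;   // UIA_IsEnabledPropertyId
    public const int ControlTypeIdButton = 50000;   // UIA_ButtonControlTypeId

    // Build a condition matching an enabled button with the supplied name,
    // restricted to elements in the Control View of the UIA tree.
    public static IUIAutomationCondition CreateEnabledButtonCondition(
        IUIAutomation uiautomation, string buttonName)
    {
        IUIAutomationCondition conditionControlType =
            uiautomation.CreatePropertyCondition(PropertyIdControlType, ControlTypeIdButton);
        IUIAutomationCondition conditionName =
            uiautomation.CreatePropertyCondition(PropertyIdName, buttonName);
        IUIAutomationCondition conditionEnabled =
            uiautomation.CreatePropertyCondition(PropertyIdIsEnabled, true);

        // CreateAndCondition() takes two conditions, so nest the calls.
        IUIAutomationCondition conditionButton =
            uiautomation.CreateAndCondition(conditionControlType, conditionName);
        IUIAutomationCondition conditionEnabledButton =
            uiautomation.CreateAndCondition(conditionButton, conditionEnabled);

        // ControlViewCondition is a ready-made condition supplied by UIA.
        return uiautomation.CreateAndCondition(
            uiautomation.ControlViewCondition, conditionEnabledButton);
    }
}
```

The returned condition could be passed to FindFirstBuildCache() in place of the two-part condition used in InvokeButton(), so that a disabled Back button, for example, is never returned.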
Comments
Anonymous
July 10, 2013
After I built the tool described above, (which uses a mix of UI Automation and keyboard simulation,) I updated it to work with the Narrator screen reader that comes with Windows 8. The only functionality that I needed to change related to the keyboard shortcuts used by the screen reader. This was straightforward for the link/heading/paragraph navigation, and the read-from-here command. I couldn't find a shortcut to move the Narrator cursor back to the top of a web page, so I just simulated F5 to refresh the page. That seemed to move the cursor to the top. Building this tool has been a really interesting project. I've ended up with a tool that helps a person to browse the web with very little hand movement, while using one of a couple of different screen readers. All being well, I can adjust it based on end-user feedback to make it more useful in practice. There's a demo video of the tool being used in conjunction with Narrator at the top of herbi.org/.../WebKeys.htm. It's pretty cool what can be done with a mix of UIA and SendInput().
Anonymous
July 30, 2013
Hi, great job, is source code available???
Anonymous
July 30, 2013
Hi Paulo, I'll try to make the source code available in the next 2-3 weeks. I'm in the middle of writing a new series of posts, and hopefully I'll upload all that by the end of next week. All being well I can share out the source for the "Herbi Web Keys" tool in the week or two after that. I'll probably just make the VS project downloadable from my Herbi.org site. Thanks, Guy
Anonymous
August 11, 2013
Hi Paulo, I've just made the source code for my Herbi Web Keys app available at herbi.org/.../WebKeysVSProject.htm. The zip file that you can download there contains the Visual Studio 2010 solution for the app. If you want to build the project, I expect you'll probably need to tweak a pre-build step, (as I mention at my site). The app itself isn't 100% robust, and I've listed a few known constraints at my site. These issues have been fine for my needs up to now, and I'll tighten the app up if I need to in the future. The code shows the keyboard simulation I did in order to control the screen readers and the browser. It also shows how the SpeechSynthesizer makes it incredibly easy to output speech from a C# app. The SpeechSynthesizer class is one of my favourite classes, in that it can provide such important functionality to an app, with so little code! :) But more importantly to this blog, the code has a couple of methods which use UIA, (through an interop assembly), to control the browser UI. It's a real blast building AT tools to explore how UIA can potentially be of use to people who want to interact with UI through non-mainstream ways. Thanks, Guy
Anonymous
August 15, 2013
Hi Guy, great news!!, I'm gonna download the code and use it as a learning tool. If I find any error or issue I'll let you know. Thank you very much for sharing your work. Paulo
Anonymous
April 29, 2014
Thanks, I'll download it too