|Home | About | Journals | Submit | Contact Us | Français|
Remote desktop technology, the enabler of access to applications hosted on remote hosts, relies primarily on scraping the pixels on the remote screen and redrawing them as a simple bitmap on the client’s local screen. Such a technology will simply not work with screen readers since the latter are innately tied to reading text. Since screen readers are locked-in to a specific OS platform, extant solutions that enable remote access with screen readers such as NVDARemote and JAWS Tandem require homogeneity of OS platforms at both the client and remote sites. This demo will present Sinter, a system that eliminates this requirement. With Sinter, a blind Mac user, for example, can now access a remote Windows application with VoiceOver, a scenario heretofore not possible.
Remote desktop technology is primarily used to access applications that are hosted on remote machines, such as a physician accessing a medical records application running on an office server from her home. The graphical display of the remote system is virtualized and shipped across the network to the client. Virtualization emulates the graphics card frame buffer in which the pixel values are scraped from the screen of the remote system, and redrawn as a simple bitmap on the client’s local screen. This approach yields seamless access to remote applications, as if they were running on the local desktop. Remote desktop access is indispensable to users engaged in activities such as telecommuting, distance learning, remote troubleshooting and maintenance.
Unfortunately, remote desktop technology in its current incarnation, namely via emulation of the graphics frame buffer, simply will not work for blind users. These users rely on screen readers to interact with digital content. Screen readers require semantic information, such as text and hierarchical relationships to narrate screen contents—all of this information is lost by the time the screen is rendered as a bitmap.
One obvious solution is to run a screen reader on the remote host and relay the synthesized audio to the client. But network delays, especially over WANs, can introduce unacceptable latencies to relaying audio, a fact validated by our experiments . A second approach, exemplified by commercial systems such as NVDARemote  and JAWS-Tandem , is to intercept text from the remote screen reader just before audio synthesis, relay this text to the local client and synthesize audio locally. Getting this to work requires identical screen readers, one running remotely and the other locally. The serious problem with such a solution is that screen readers are locked into a single operating system platform; the differences in the underlying accessibility APIs (e.g., Microsoft’s MSAA and UI Automation, Apple’s Accessibility, and GNOME’s ATK & AT-SPI) has been a strong barrier to portability. Consequently, NVDARemotelike solutions require both the remote and local platforms to be similar. Platform-specific remote access solutions are clearly inadequate in the modern computing era where computer users increasingly use applications designed for different OSes, spanning desktops, laptops, tablets, mobile devices, cloud and other platforms. For instance, a Mac user may access a cloud-based Windows remote desktop to run an application required for her job. Similarly, a Windows desktop user may use VMware workstation to develop and test a Linux server application. Such scenarios are the norm these days and, thus, calls for platform-independent remote access solutions for screen reader users. Our Sinter system  provides such a solution.
Sinter is predicated on the observation that applications on every OS consist of similar User Interface (UI) widgets such as buttons, drop-down menus, text fields, etc. The key idea then is to create a UI model of the application’s GUI from the existing, platform-specific accessibility APIs, analogous to an HTML document object model (DOM) tree, convert the model into a generic intermediate representation (IR), and finally render the IR encoding of the application GUI on a different platform, using native UI widgets. A screen reader on the client system can then read the native rendering of the IR. Fig. 1 is a high level schematic of this idea. The figure corresponds to a scenario where a user wants to run VoiceOver, the Mac screen reader, to read a Windows application on a remote system. In Sinter, a scraper on Windows mines the Windows-specific UI model of the application, and converts it into Sinter’s IR which is shipped to a client system–Mac in this example, where a proxy converts the IR back to a native representation of these elements that VoiceOver understands. The Sinter proxy relays keystrokes and other user inputs back to the scraper, and the scraper relays incremental changes in the UI back to the proxy. In Sinter, the IR is a generic XML which we found to be sufficiently expressive for encoding a majority of standard UI element types on Windows and Mac. Having an IR opens up powerful opportunities for personalizing or making application specific modifications of the UI itself. For example, we made a mega-ribbon for Word in Fig. 2, which avoids cumbersome ribbon navigation for frequently-used buttons.
The Sinter idea and the engineering that went into building the system was presented in the 2016 European Systems Conference . Since then, we have completed the implementation of a fully functional Sinter system. Furthermore, following IRB approval we also conducted a preliminary user study with 21 blind subjects at Lighthouse Guild. Currently, Sinter enables to screen-read the remote Windows application locally on a Mac with VoiceOver, and conversely, remote Mac application locally on Windows with JAWS and NVDA. Windows applications that can be accessed remotely from Mac clients include Microsoft Word, Windows Calculator, Windows Explorer, the Windows registry editor, and the DOS command line (command.exe) while remote Mac applications accessed from Windows clients include Apple Mail, HandBrake (a media ripping and transcoding utility), Messages, Calculator, and Contacts. Fig. 2 presents snapshots of Windows applications accessed remotely on an Mac client with Sinter.
A Sinter demo, which will be the first of its kind to the accessibility community, will exercise the functionalities of Sinter with these applications in place.
21 blind subjects participated in a preliminary study primarily primed to get an initial assessment of Sinter, and gather additional requirements from users. Towards that they were given tasks to calculate with a remote calculator, and edit documents using Notepad running remotely. A System Usability Scale (SUS) questionnaire was administered at the end of the study relating to usability, ease of use, learning curve, etc. Sinter averaged a SUS score of around 78 which is considered good from a usability perspective. More importantly, a noteworthy highlight that emerged from the study was that users very much liked the fact that they can access remote applications using their preferred screen reader with customized settings. This eliminates the need to learn new screen readers or platforms, and thus allows them to stay in their own comfort zone. They also felt Sinter had the potential to be a game changer since it does not need a screen reader to be installed on a remote host which is often the situation they encounter in practice.
Sinter requires only two simple conversions for each OS: one from the native accessibility API to the IR, and one from the IR back to the native accessibility API. Our experience with Sinter has been that, if one can reuse native libraries, this translation can be done in a few thousand lines of code in each direction. As a point of comparison, the NVDA Windows screen reader is over 50,000 lines of code. We found that IR transformation is a potent meta-programming tool for making accessibility enhancements that are desirable but missing in the original application. Their full potential remains to be explored. Finally, an extensive user study examining important aspects of Sinter such as impact of platform independence, and accessibility enhancing IR transformations is a task that we will undertake in the near future.
This research was supported in part by NSF grants IIS-1218570, IIS-1447549, CNS-1149229, CNS1161541, CNS-1228839, CNS-1405641, CNS-1408695, CNS1526707, and VMware.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).