Thursday, December 16, 2010

First Principles of UI Testing Drivers

A UI testing driver is a tool that helps automate interactions with a UI. It is typically used in automated functional and regression test suites. Essentially, a driver is a tool that can automatically drive through a UI.

I have worked on Frankenstein, have discussed and worked with Ketan on Twist while he was developing SWTBot, and have brainstormed with Hakan on his new web testing driver Krypton (sadly, I cannot find any public repository of this yet; I will post it when I get hold of it). While doing these, I found that all UI testing drivers share something fundamental, and if you understand these basic ideas, you can very easily work with any driver or implement your own for some new platform.

I would like to formalize the concepts of a UI testing driver in this post. This should give you a good mental model to understand the working of a driver. Before we start, let's answer a basic question:

How would you test UI manually?

Let's say you want to log in to a typical application. You need to enter your username and password into the text field and the password field respectively and then click on the "Sign In" button. If you want to do it manually, you first search for or locate the username field. Then you interact with it, i.e. type your username into the field. You would do the same with the password field. You would then search for or locate the "Sign In" button and click it.

The two main tasks that you do manually are:

1) Locate what you want and
2) Interact with it

These form the basic operations of any UI testing driver. If you can automate these two operations, you can essentially test any UI automatically i.e. have a driver for that UI.

1. Locate Elements

The first thing a driver needs to provide is a mechanism to locate or search for elements. Most drivers use a concept called a locator. A locator is like an address. Depending on how precise the locator is, a driver can get one or more UI elements that match it. I want to stick with the same term, locator, because it pretty accurately describes what it does. The actual syntax of locators is left up to the driver implementers. For example:

  1. .name could match an element whose CSS class is name.
  2. Table >> Chapter 1 >> Questions could match an element which is called "Questions" and is a list item of "Chapter 1" which in turn is a sub-list of "Table"

An important property of locators is that given a UI with some structure, a driver should return the same element or a set of elements for a given locator every time, even when there are new elements added or unrelated elements being deleted from that UI.

Example: If the locator text_with_label['Foo'] identifies a text field whose label is 'Foo', adding a new text field, say, after this field should not change what element the driver returns.

While this property is important for the sake of stable, non-flaky tests, it's not mandatory. Sometimes, non-deterministic locators that rely on nearness in terms of distance on the UI, or on relative positioning, are used. These can be quite handy when testing a UI that is not so well written and cannot be changed easily.
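
The stability property can be illustrated with a toy model. The following is a minimal sketch in Python, not the API of any real driver: the `Element` class and the `text_with_label` and `nth_text_field` functions are hypothetical names invented for this example. It compares a label-based locator with a purely positional lookup when an unrelated text field is added to the UI (here it is added in front of the existing field, which is the case that breaks positional lookups).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """A toy UI element (hypothetical, for illustration only)."""
    kind: str
    label: str


def text_with_label(ui: List[Element], label: str) -> List[Element]:
    """Locator: every text field whose label matches exactly."""
    return [e for e in ui if e.kind == "text" and e.label == label]


def nth_text_field(ui: List[Element], index: int) -> Element:
    """Positional lookup: the index-th text field in layout order."""
    return [e for e in ui if e.kind == "text"][index]


ui = [Element("text", "Foo"), Element("button", "Sign In")]
before_by_label = text_with_label(ui, "Foo")
before_by_position = nth_text_field(ui, 0)

# Add an unrelated text field in front of the existing one.
ui.insert(0, Element("text", "Bar"))

after_by_label = text_with_label(ui, "Foo")
after_by_position = nth_text_field(ui, 0)

# The label-based locator still returns the same field.
print(before_by_label == after_by_label)
# The positional lookup now returns the new "Bar" field instead.
print(before_by_position == after_by_position)
```

The label-based locator keeps resolving to the 'Foo' field, while the positional lookup silently starts pointing at a different element. That silent change is the kind of thing that shows up as a flaky test.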

Absolute Positions

Historically, tools like QTP used the absolute co-ordinates of a given UI component as a locator when recorded. This can be extremely flaky and should not be used as is. Most tools do not use this approach anymore and I have mentioned this just for the sake of legacy.

2. Interact with Elements

Once a driver knows what element it is dealing with, you can specify what you want to do with it. For example, you can click a button, choose an option from a drop down, drag and drop an image onto a Trash Can icon etc. In order to do these interactions, a driver needs to simulate what a user does. From simulating events to actually moving the mouse and sending keyboard events, a driver can choose to do it in a few different ways. The following are some ways of doing this:
  • OS Native events - In this approach, the driver sends OS level native events to the element identified by the locator. For example, in order to click on a button, the mouse pointer is actually moved to the button and a real mouse button click is sent at the OS level.
    • Good thing about this is that it is as close to what happens in the real world as you can get in automated testing
    • Bad thing about this is that in order to implement it, you would need to write very low level code or use libraries like Java's AWT Robot. Either way, the application under test needs to have focus, so you cannot run multiple tests on the same box and development becomes annoying.
    • Example: Frankenstein
  • Framework Native events - In this approach, the driver sends all possible events programmatically to an element that would make sense for a given interaction. For example, in SWTBot, in order to click on a button, the driver sends SWT Events such as MouseIn, MouseButtonDown, MouseButtonUp, MouseOut, MouseButtonClick etc to the button in the order in which the real events would be sent. 
    • Good thing about this approach is that it is very easy to develop. This can be run without giving the application under test focus. It works for the 98% case.
    • Bad thing about this is the absence of the perceived safety of doing the real thing. No one would be calling these events manually in a production environment, which makes this seem like a very high level integration test. Though this should not matter, it is sometimes brought up as an issue. In the 2% case, the issue could be that the event listener is hooked onto a different element, maybe a container, but the driver is sending the events to the element matched by the locator.
    • Example: Selenium, Sahi, SWTBot
  • Application Native events - In this approach, the driver sends events native to that application to an element. For example, inside a browser, you can send browser specific native events like COM events in IE, XPCOM events in Firefox etc. You would be working on a fairly high level compared to OS level events, but get the benefits of native events. This can be thought of as a middle ground between the first and second approaches. Webdriver uses this approach.
For the first and third approaches, the current position of the located element can be evaluated at the time of the interaction. Though this would be an absolute position, this is still OK as it won't be persisted. The event is then sent to the evaluated co-ordinate. This way, a driver would simulate a user's interaction.
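
The framework native approach can be sketched in a few lines. This is a minimal illustration in Python, not SWTBot's or Selenium's actual code: the `Widget` class and the `click` function are hypothetical, and the event names and their order are simply the ones listed in the SWTBot example above. It also reproduces the 2% case, where the listener is hooked onto a container while the driver sends events to the located element.

```python
from typing import Callable, Dict, List


class Widget:
    """A toy widget with per-event listeners (hypothetical, not SWT)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._listeners: Dict[str, List[Callable[[str], None]]] = {}

    def on(self, event: str, listener: Callable[[str], None]) -> None:
        """Register a listener for one event type."""
        self._listeners.setdefault(event, []).append(listener)

    def dispatch(self, event: str) -> None:
        """Deliver an event to listeners registered on this widget only."""
        for listener in self._listeners.get(event, []):
            listener(event)


# Event names and order as listed in the text; treated as an assumption here.
CLICK_SEQUENCE = ["MouseIn", "MouseButtonDown", "MouseButtonUp",
                  "MouseOut", "MouseButtonClick"]


def click(widget: Widget) -> None:
    """Simulate a click by sending every event a real click would produce."""
    for event in CLICK_SEQUENCE:
        widget.dispatch(event)


# Normal case: the listeners sit on the element the locator matched.
received: List[str] = []
button = Widget("sign_in")
for event_name in CLICK_SEQUENCE:
    button.on(event_name, received.append)
click(button)

# The 2% case: the listener sits on a container, but the driver sends
# the events to the inner element, so the listener never fires.
container_received: List[str] = []
container = Widget("form")
container.on("MouseButtonClick", container_received.append)
inner = Widget("label")
click(inner)

print(received)
print(container_received)
```

In this toy model events do not propagate from the inner widget to its container, which is exactly why the second listener stays silent. A driver that works this way has to either send the events to the right element or emulate the propagation itself.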

You can pretty much map what most UI testing drivers do to the above 2 basic operations.

Thursday, December 9, 2010

Web UI driver comparison

I have had the (dis?)pleasure of working with 3 of the major Web testing drivers out there on my day job. I want to do a Selenium RC vs. Webdriver vs. Sahi post. A lot of people at work ask our team this question as we have used all 3. After doing a search, I realized that there are a lot of biased posts out there, so I have tried to be as objective as I can.

Selenium RC (pre 2.0)

  • Supports pretty much every major browser that exists out there
  • Supports writing code in Java, Ruby, Python, PHP and a few other languages
  • Has been around for quite some time and the community is pretty active. If you get stuck with something, chances are that others have already faced the same problem and have talked about it on some Selenium forum
  • Provides different kinds of locators to identify elements on the browser - Name, ID, XPath and Dom.
  • Provides the ability to either inject JS or write extensions through user-extensions in order to enhance Selenium[1]
  • Has a recorder but only for Firefox
  • Has a notion of modes: Vanilla, Chrome, IEHTA, PI etc. It's very confusing to figure out what you want to use when you start off
  • Has a need for explicit waits for page loads, elements to appear etc.[2]
  • The architecture uses a proxy server that injects Selenium JS into each page, either in a different frame or into the page itself depending on the mode. On IE, one needs to set up the proxy server settings. This is very painful if you have a build farm with 20 IE machines. Also, one needs to manage the life cycle of the server, which is again, well, work.
  • Uses JS event emulation in order to do user actions[3]. This can cause 2 issues:
    • Selenium tries to send all events that make sense for a given action. In order to click, for example, it would have to do a "mouse in", "mouse down", "mouse up", "click" etc. But a user may have any sort of event listener, like "blur", which is not possible to simulate with Selenium.
    • I have faced some issues with testing frameworks like DOJO using Selenium (IDE & RC)
  • Frames and Windows are not easily testable.
  • You need to accept a Selenium certificate if you want to test HTTPS because of the proxy server.
  • Has a big flat interface with no notion of Browser elements.
My personal take on Selenium RC pre 2.0 is, it was a good contender, but there are better options now. I would not want to use this version anymore.

Webdriver (Pre Selenium 2.0)

  • Takes a different approach to implementation. Tries to be as close to native as possible with every browser i.e. IE has a plugin, Firefox has an addon etc. This gets rid of the need for a proxy server.
  • Tests are faster compared to Selenium RC
  • Creating a new webdriver instance is as simple as saying "new FirefoxDriver();", for example. Does not have the notion of modes
  • Though not very useful in an AJAX heavy application, supports HtmlUnit, which means you can run headless tests.
  • Has a nice abstraction for UI elements like Button, Checkbox etc. Nicer API which allows for nicer OO code.
  • Dealing with HTTPS is straightforward as you just need to accept the real certificate from the application under test
  • The driver development is very active. The last time I checked, Simon Stewart was working full time on this at Google.
  • Support on IE has a very major roadblock which has not been fixed in the last 5 months. This was the major reason why we had to ditch Webdriver. Basically the test hangs and we do not know what the problem is.
  • StaleElementException (StaleElementReferenceException in the Webdriver API): When you look up an element, Webdriver returns a reference to it. However, if the element gets replaced due to some JS activity, using that reference throws this exception. This can be very tricky to deal with.
  • Though one does not need to wait for page loads, one still needs to explicitly wait for elements to appear. For example, if you do an action which results in Ajax fetching a link and then you want to click on the link, you have to wait for the link to appear. There is no implicit waiting. Apparently, this is in the backlog, but it was not there 3 months ago when I last worked on webdriver.
  • API unimplemented on IE! Though the latest version has some of this fixed, there are some APIs that throw an exception on IE.
  • We realized that "class name" locators do not work on IE. So, we had to basically resort to XPaths for all locators. This can be tricky if you are not familiar with XPaths. Also, the locators become very verbose.
  • The move to Selenium 2.0 was not very well documented about 4 months back. However, I think it might have changed now.
  • No recorder yet
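
The explicit waiting described in the list above usually boils down to a polling loop. The following is a minimal sketch in Python and is not Webdriver's API: `wait_until` and `find_link` are hypothetical helpers written for this example, and the "page" is simulated so that the link only shows up on the third lookup, as if an Ajax call had just completed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(find: Callable[[], Optional[T]],
               timeout: float = 5.0,
               interval: float = 0.05) -> T:
    """Poll find() until it returns something other than None, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Simulated page: the link appears only on the third lookup.
attempts = {"count": 0}


def find_link() -> Optional[str]:
    attempts["count"] += 1
    return "link" if attempts["count"] >= 3 else None


link = wait_until(find_link, timeout=1.0, interval=0.01)
print(link, attempts["count"])
```

Every test that touches Ajax-loaded content ends up wrapping its lookups in something like this, which is the verbosity complained about above.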
Overall, I like Webdriver for its simplicity in startup and usage. If only the IE issues were ironed out and the locators were made nicer, it would have been a keeper for me.

Sahi


  • Provides all the features that Selenium does
  • Has a recorder that works on IE, FF and Safari
  • Implicitly tries out different locator strategies. It tries id, name, text and class name - in that order for a given locator. It also supports regular expression syntax for locators. This is immensely powerful.
  • Follows a concept of Element Stubs in its Java driver. This is very powerful. What it does is, when you say "driver.div('foo')", it returns an ElementStub. Whatever operation you perform on it, Sahi sends it over to the browser as a command and evaluates it there with implicit waits. This gets rid of
    • Need for explicit waiting because of JS or Ajax
    • The Stale Element issue faced in Webdriver.
  • Gets rid of all explicit waits. Sahi takes care of blocking when a page reloads, when there are Ajax requests in progress and when you are trying to act on an element which is not present yet. This makes the tests a lot terser and more stable.
  • Has different locator mechanism - In, Under and Near. Also supports the traditional XPaths & Dom locators.
  • Uses the same architecture of Selenium RC for its Java driver because of which one needs to deal with the proxy issues. Especially painful for IE.
  • Under and Near locators are devious. Though they give a good mileage to start with, if your page has a lot of repeated entries (like a list of sorts), you may be hitting the wrong element. Since Sahi returns the first element that matches by default, your tests may become flaky because a wrong element got matched for your locator. Only 'In' is deterministic, while 'Under' and 'Near' are not. My advice is do not use them if you can avoid them.
  • HTTPS is again painful because one has to accept the Sahi certificates. Though this should be a one-time thing, somehow I always have to do this on every Sahi upgrade. Doing this on a build farm is very painful.
  • There are some issues which are not solved yet because of which builds hang. The ElementStubs do not have a time out and they can potentially get stuck for ever. We have had builds which have stuck overnight and had to be manually killed.
  • The community is not big and it's just a few people, albeit full time, working on the tool. Turnaround time can be a little long.
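
The implicit locator strategy mentioned in the list above (try id, then name, then text, then class name, with optional regular expressions) can be sketched as follows. This is a toy illustration in Python and not Sahi's implementation: the page is modelled as a list of dictionaries, and `locate` and `matches` are hypothetical names. As in the description above, the first matching element wins.

```python
import re
from typing import Dict, List, Optional, Union

Element = Dict[str, str]

# Strategy order as described in the text: id, name, text, class name.
STRATEGIES = ("id", "name", "text", "class")


def matches(value: str, locator: Union[str, "re.Pattern"]) -> bool:
    """Exact comparison for plain strings, regex search for patterns."""
    if isinstance(locator, re.Pattern):
        return locator.search(value) is not None
    return value == locator


def locate(page: List[Element],
           locator: Union[str, "re.Pattern"]) -> Optional[Element]:
    """Return the first element matched by the first strategy that hits."""
    for attribute in STRATEGIES:
        for element in page:
            if attribute in element and matches(element[attribute], locator):
                return element
    return None


page = [
    {"id": "user", "name": "username", "class": "field"},
    {"id": "submit-42", "name": "go", "text": "Sign In", "class": "button"},
]

by_name = locate(page, "username")                    # found via name
by_text = locate(page, "Sign In")                     # found via text
by_regex = locate(page, re.compile(r"^submit-\d+$"))  # found via id
missing = locate(page, "does-not-exist")

print(by_name["id"], by_text["id"], by_regex["id"], missing)
```

The test author writes one short locator and the driver works out which attribute it refers to. The flip side, as noted above, is that returning the first match can pick the wrong element on pages with many repeated entries.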
The implicit waits and the locator friendliness (the strategy that goes through name etc.) alone are so powerful that I am sticking with Sahi for now.
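
The Element Stub idea can be sketched in a few lines as well. This is a minimal Python illustration under stated assumptions, not Sahi's code: the `ElementStub` class here is hypothetical, the page is a plain dictionary, and a "click" just increments a counter. The stub stores how to find the element and resolves it, with an implicit wait, only when an action is performed. Unlike the behaviour described in the list above, this sketch deliberately includes a timeout so that a missing element cannot block forever.

```python
import time
from typing import Callable, Dict, Optional


class ElementStub:
    """Remembers how to find an element; resolves it only when acted upon."""

    def __init__(self, finder: Callable[[], Optional[dict]],
                 timeout: float = 2.0, interval: float = 0.01) -> None:
        self._finder = finder
        self._timeout = timeout
        self._interval = interval

    def _resolve(self) -> dict:
        """Implicit wait: poll the finder until the element exists."""
        deadline = time.monotonic() + self._timeout
        while True:
            element = self._finder()
            if element is not None:
                return element
            if time.monotonic() >= deadline:
                raise TimeoutError("element did not appear in time")
            time.sleep(self._interval)

    def click(self) -> None:
        # Look the element up again on every action, so a replaced
        # element is never acted on through an old reference.
        element = self._resolve()
        element["clicks"] = element.get("clicks", 0) + 1


page: Dict[str, dict] = {}
lookups = {"count": 0}


def find_link() -> Optional[dict]:
    lookups["count"] += 1
    if lookups["count"] == 2:
        page["link"] = {"clicks": 0}  # the link appears on the second lookup
    return page.get("link")


stub = ElementStub(find_link)  # created before the element exists
stub.click()                   # waits implicitly, then clicks
first = page["link"]

page["link"] = {"clicks": 0}   # JS re-renders: the element is replaced
stub.click()                   # resolves again and hits the new element

print(first["clicks"], page["link"]["clicks"], page["link"] is first)
```

Because the lookup is repeated on every action, neither an explicit wait nor a stale reference shows up in the test code.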

Maybe if I get time, I can post the code from our git history to show what Webdriver and Sahi code look like for the same operation. Sahi code is a lot smaller.

If any of you have evaluated Wati[rjn], Krypton or any other drivers, please do leave a comment.

[1] - Hakan and I actually wrote an extension that waits for any open Ajax calls implicitly so that the users do not have to explicitly wait. The user-extension concept is pretty useful.

[2] - Explicit waits in tests are a bad idea. I did a quick Google search and did not find many entries. A new one coming up.

[3] - I will talk about the implication of this in a different post.

In the next few days...

I started my professional career working in the testing domain. I am a developer, but I was working on Frankenstein, the Swing UI testing driver, and then on Twist. Given that I work in an organization where we care a lot about automated functional and acceptance tests for regression, I have been writing a lot of tests as well.

I want to write a series of blog posts describing my experiences of working on different UI testing drivers and tools. I will also be writing about some inherent problems that I, and most likely anyone who has ever written a serious automation suite, have faced and how I have solved or intend to solve them.

First Post

Hello, World.