068: How We Do UI Testing Here at The Frontside

After the cliffhanger left in Episode 62: UI for U and I, we follow up with a short discussion about how we specifically do UI Testing at The Frontside in Austin, Texas.

Resources:

Transcript:

CHARLES: Hello everybody and welcome to The Frontside Podcast, Episode #68. My name is Charles Lowell. I'm a developer here at The Frontside and podcast host-in-training. I'm here today with Jeffrey and Elrick, two other developers here at The Frontside. We are going to carry on where we left off back in Episode 62. There was an implicit promise to talk about the way that we do UI testing.

JEFFREY: We had a cliffhanger.

CHARLES: Yeah, we did. It ended with a cliffhanger and someone actually called us on it, which I hate because they're making more work for us. But Ian Dickinson from Twitter writes, "Hey, you guys promised to talk about how you do UI testing at The Frontside but never actually delivered on that promise." We're going to try and make good on that today.

JEFFREY: We like to resolve our promises.

CHARLES: Oh!

[Laughter]

CHARLES: You've been on vacation for a week, Jeffrey, and I forgot what it was like to have you in the office.

JEFFREY: Not enough code puns.

CHARLES: Oh, code pun. There you go. It's like CodePen except all of the code has to make puns. I like it. The Internet is awash with people bloviating about testing so we figured we'd take our turn at it. We promise to keep it short. We'll keep it focused, but it seems like a value that we do have, so we might as well talk about it a little bit. You guys ready?

JEFFREY: Let's talk about testing.

CHARLES: I think one of the best things to do is to use something concrete, not just to talk about abstractions, not just to talk about things that might be but we're actually starting a project here in three days. It's going to kick off on Monday and testing is going to be a key part of that. Why don't we talk a little bit about what it's going to look like as we kind of lay the groundwork for that project?

JEFFREY: As we start this project, the very minimum baseline that we want to get immediately is acceptance tests in the browser. We want to make sure that when you fire up this app, it renders and the basic functionality of this app works immediately. As we're building features on top of this app, that's when we bring in unit tests, as we say, "We're building this new component. We're building this new feature that's part of this app. We're going to use test-driven development and unit tests to drive the creation of that." But ultimately, our test of quality for that app and our assurance of quality over the long term comes from the acceptance testing.
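
To make that baseline concrete, here is a rough sketch of what a first "does it even render?" acceptance test might look like in a classic Ember app of this era. The route, selector, and file paths are hypothetical, not taken from the project being discussed.

```js
// tests/acceptance/smoke-test.js
import { test } from 'qunit';
import moduleForAcceptance from '../helpers/module-for-acceptance';

moduleForAcceptance('Acceptance | smoke test');

test('the application boots and renders its main screen', function(assert) {
  // visit, currentURL and find are the global async test helpers that
  // classic Ember acceptance tests provide.
  visit('/');

  andThen(function() {
    assert.equal(currentURL(), '/');
    assert.ok(find('.app-header').length, 'the main header is rendered');
  });
});
```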

CHARLES: People often ask the question, "When is it appropriate to write those unit tests? When is it appropriate to write those acceptance tests and how much time do I spend doing each one?" Personally, when I was starting out with testing many, many, many years ago, I really, really liked unit tests and I liked developing my code based around unit tests. The thing that I liked about them was that they were fast. The entire test suite ran in a matter of seconds or minutes.

JEFFREY: And you get coverage.

CHARLES: Yes.

JEFFREY: Like you get your coverage numbers up. It is like every line that I wrote here has some kind of code coverage.

CHARLES: Right, and it feels really good. I also think that unit tests really are great for mapping out the functionality, in the sense of getting an intuitive feel for what it's like to use a unit before it's actually written. You get the experience like, "This is actually a terrible API because I can't write the test for it," so obviously it's not flexible and it's really mungy, so I really, really enjoyed that. The thing that I hated most was acceptance tests. They were hard to write, they were slow to run, and it seemed like when they would break, it was not always clear why.

JEFFREY: Wait, so we were just singing the praises of unit tests. What's wrong with them?

CHARLES: Part of it is kind of the way that you conceive of a test: what are you actually doing? I think you should think of tests not so much as something that's there for regression or something that's there to drive your design, both of which are very, very important things, but more as just a kind of measurement: taking a data point. What is it that I want to measure?

In the case of a unit test, I want to measure that this library call does exactly what I think it's going to do so I can take that data point, I can put it on my spreadsheet and I can say, "I got that base covered." The thing is that an acceptance test just measures a completely separate thing, and by acceptance test, we're talking about testing as much of the stack of your entire integrated application as you can.
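
A unit test that takes exactly that kind of data point might look like the sketch below, written in Mocha/Chai style; formatPrice is a hypothetical utility, not something from the project under discussion.

```js
// A single data point: "this library call does exactly what I think it does."
import { expect } from 'chai';
import { formatPrice } from '../src/utils/format-price';

describe('formatPrice', function() {
  it('formats an amount in cents as a dollar string', function() {
    expect(formatPrice(1999)).to.equal('$19.99');
  });

  it('handles zero', function() {
    expect(formatPrice(0)).to.equal('$0.00');
  });
});
```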

Oh, boy. Don't get me started on terminology but when you have an acceptance test, what you're measuring is, "Does my application actually function when all the pieces are integrated?" So you're just like a scientist in the laboratory and you're making an observation and you're writing it in your notebook. Yes or no, does it work?

In the same way that you're taking a perception when your eye perceives a photon of light, you can say, "That thing is there." When it bounces off the chair in the room, I know that the chair is there. If what you want to measure, what you want to perceive, is "does my application in fact work for a user?", then an acceptance test is the only thing that will actually take that measurement. It will actually let you write that data point down in your notebook and move on.

I think that the problem with unit tests... Well, there's nothing wrong with them, it's just they don't observe your integrated application, so that means you're blind. It's only part of the story, so that's why we find that acceptance tests really are the highest value tests, even though they suck to write, even though they take a long time, and even though when they break, sometimes they break at some weird integration point that you didn't see and that's really hard to diagnose. But you know what? That same integration point, that's a potential failure in your app and it's going to be just as weird and just as esoteric to track down. That's my take on it. One of the things that you're describing, Jeffrey, is we set up those acceptance test suites. Why do we set them up? What, because they're part of like a process?

JEFFREY: Yeah. That process is that we start at the very beginning: the very first day of our project is when we like to set up continuous integration, make sure the repo is all set up, and make sure the deployment pipeline is set up.

CHARLES: What do you mean by deployment pipeline?

JEFFREY: The deployment pipeline looks like this: we've got our repo and our version control in place, and from there, any time there's a push to master, any time a pull request gets accepted, we build it out on a continuous integration server, whether that be Travis or Circle or any of the solutions out there. We want to run that entire suite of acceptance tests and whatever unit tests we have, every time we have a push of code past a person's local box.

CHARLES: Right, a push to master is tantamount to a push to production.

JEFFREY: Yes, that is the ideal. Any time that the code has been validated that far, yes, this is ready to go to production and our acceptance test suite validates that we feel good about this going out.

CHARLES: Yeah, so the thing is if all you have is unit tests, you have no way of perceiving whether your application will actually work in a real browser, with real DOM events, talking to a real server. Do you really want to push to production?

[Laughter]

JEFFREY: We had some client work recently where we do have a lot of acceptance test coverage and actually a lot of unit tests too. We changed the browser running on the CI server from Chrome 54 to 58 and uncovered a lot of bugs that the acceptance test coverage found, bugs that unit tests just would never have revealed as problems for the end user.

CHARLES: Now, why is that? Let's take it down a layer now: when we do an acceptance test, what does that actually mean? What are we describing in terms of a web application?

JEFFREY: We're describing that we have an application that we're actually running in a real browser. Usually, you're going to have to have some kind of stubbed out things for authentication to make sure you can get around those real user problems but you're actually running the real application in a real browser, not in a headless browser but an actual browser with a visible DOM.

CHARLES: Yeah, not a simulated DOM.

JEFFREY: Exactly so you can surface the kinds of real problems that your customer will actually have.

CHARLES: In terms of interaction, you are not sending JavaScript actions.

JEFFREY: No, you're firing DOM events. You're saying, "Click this button," or, "Put this text into this input," and the underlying JavaScript should be pretty opaque to you. You shouldn't be calling into the JavaScript directly in any acceptance test. You want to rely solely on what's in the DOM.
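
A small sketch of what that looks like in practice, again in the classic Ember acceptance-test style: every interaction goes through the DOM, never through the application's JavaScript directly. The route and selectors are hypothetical.

```js
// fillIn and click are classic Ember acceptance-test helpers that dispatch
// real DOM events; the assertion only looks at what ended up in the DOM.
test('a user can search for a product', function(assert) {
  visit('/products');
  fillIn('input.search', 'wrench');
  click('button.search-submit');

  andThen(function() {
    assert.ok(find('.product-list .product-item').length > 0,
      'matching products are shown');
  });
});
```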

CHARLES: Yeah. I would say in the ideal setup with an acceptance test, you should be able to have your acceptance test suite run and then completely and totally rewrite your application, let's say rewrite it from Ember to Angular or something like that, and you don't have to change your acceptance test suite at all.

JEFFREY: Yeah, usually the only binding you have is IDs or classes or whatever you're using to select elements and that should be it.

CHARLES: Right, you're interacting with the DOM and that's it. So now in terms of the server, most modern web apps -- in fact, all of them, certainly in the kind of swimming pools in which we splash -- have a very, very heavy server component. There are a lot of requests just to even load the page, and then throughout the life of the page, it's one big chatterbox with the server. What do we do there?

JEFFREY: That's when we need to pull in a tool that can mock requests for us. The one that we fall back on a lot is Ember CLI Mirage, which is built on top of Pretender. It's a really nice way to run a fake server, basically. I would even take that a step further and would love for the tool to be another server completely that's running on your local box or your CI box or whatever, so that you get the full debugging available in developer tools and you actually see those requests as if they were real ones, just coming from a fake server.

CHARLES: Right. As Jeffrey said, right now we use a tool called Mirage, which for our listeners in the Ember community, they know exactly what we're talking about. It's the gold standard. What it does is it actually stubs out XMLHttpRequest, which is what most of the network traffic in your browser after the initial load is encoded in. It's got a factory API for generating high quality stub data, so your code is actually making a [inaudible] request and it's all working.
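
As a rough sketch of how that looks in practice (endpoint, model, and attribute names here are hypothetical, not from the project being discussed):

```js
// mirage/config.js -- a minimal Ember CLI Mirage setup: a route handler
// answering from Mirage's in-memory database.
export default function() {
  this.namespace = '/api';

  // GET /api/products responds with whatever products exist in Mirage's DB.
  this.get('/products');
}
```

```js
// mirage/factories/product.js -- a factory for generating stub data.
import { Factory } from 'ember-cli-mirage';

export default Factory.extend({
  name(i) {
    return `Product ${i}`;
  },
  priceInCents: 1999
});
```

In a test you would then seed data with something like server.createList('product', 5) before visiting the page, and the app's XHR calls get answered by the stub.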

Unfortunately, because it's a stub, as you said, none of the developer tools work. But there's been talk about making Mirage use service workers so that you would have all of that running. You could still run Mirage inside your browser; it would be a different process. I think service workers run off thread. That actually is very exciting. It's a great tool. I think it's a great segue to talk about one of the things that we love. This is absolutely a mandatory requirement for us when we start a project: to have acceptance testing in place.

Back in the bad old days, it would take us, I want to say, like two weeks just to set up our acceptance test suite for a project. This is when we were doing Backbone and this is when we were doing the early, early days of Ember. We would start a project and we'd spend all of that time upfront just so that we could have an acceptance testing framework, and that was actually a really hard sell because it's like, "What are you doing? You don't have anything to show," and it's like, "Well, we're actually setting up an app-a-scope so that we can observe your application and make sure that it's running."

The very first thing that we do is we set up an app-a-scope, and it's like, "Nobody sets up an app-a-scope" so they can actually see their application. But one of the great things that has happened, and one of the reasons we love Ember so much, is that you actually get this now pretty much for free. I think there's still some stuff that's very white box about Ember testing. A lot of times, we talked about that ideal where you should be able to swap out the entire implementation of your app but your acceptance test suite stays the same. That's not quite possible in Ember. You have to know about things like the run loop and it kind of creeps its way in. But it's 95% of the way there. It's mostly there. It's good enough. It's better than good enough. It's great. You just get that for free when you start a new Ember project.

JEFFREY: We've talked a lot about the Ember ecosystem and what we like there about testing and we're going to be doing some React work soon. What's the story there?

CHARLES: Well, I'm glad you asked, Jeffrey. Yes, we're going to be doing some React work soon so again, this is a new project and it's absolutely a 100% ironclad requirement that we're not going to develop applications without an app-a-scope but I think that the React community is in a place where you kind of have to build your own app-a-scope.

You know, actually, having kind of scoured the blogosphere, there are a lot of people in the React community who care very deeply about acceptance testing, but it does not yet seem like a mainstream concern or a mainstream pathway. For example, take Jest, which is the tool that is very, very popular in the React community. I was actually really excited reading the documentation, but it doesn't even run in the browser. It's Node.js only, which for us is a nonstarter. That makes it really fast, but it actually is not an app-a-scope, which is what we need.

It does not let us actually observe our application. It lets you observe the components from which your application is built, but you're actually blind to what your application looks like in production. I was kind of bummed about that because I know that there is work, and I know that the maintainers of Jest care very deeply about making it run in the browser eventually. There are some experimental pull requests in a couple of branches. Who knows? Maybe those are even merged right now, but the point is, it's still very early days there.

There are a couple of people who have used something like Nightmare, where they've booted it up and they're kind of controlling the acceptance tests running in Nightmare from Jest. Now, that sounds great, but part of your app-a-scope needs to perceive different browsers. In fact, users don't use Nightmare.js. They use Chrome, they use Mobile Safari, they use Firefox and normal Safari and Edge and what have you. There's actually a great set of tools. Testem and Karma are all set up to be able to run all these browsers and have those browsers connect to your test suite and run them in parallel.

Again, that's kind of the bar. That's what we're actually working towards right now. We're running some experiments to try and use Mirage and Karma with Mocha to actually get the multiplicity of browsers, actually using real browsers, being able to use real DOM events, and testing against a real API, or as real an API as we can get.
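
A rough sketch of the shape of that experiment, under the assumption that webpack is doing the bundling and the usual Karma launcher plugins are installed; none of this is the actual project configuration, just one plausible starting point.

```js
// karma.conf.js -- Mocha tests run by Karma against several real browsers in
// parallel. Browsers listed here require their karma launcher plugins, and
// the 'webpack' preprocessor requires karma-webpack.
module.exports = function(config) {
  config.set({
    frameworks: ['mocha'],
    files: ['tests/**/*.test.js'],
    preprocessors: {
      'tests/**/*.test.js': ['webpack']
    },
    browsers: ['Chrome', 'Firefox', 'Safari'],
    reporters: ['progress'],
    singleRun: true
  });
};
```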

I'm kind of excited about that work. I hope that the acceptance testing story gets a lot better in the React community. It's early days but like I said, we care very deeply about it. I know a lot of other people care very deeply about it. Some people feel differently, like it's not necessary. Some people feel like they don't need an app-a-scope. They're like, "You know what? My application is there and I know it. I don't actually want to look at my application but I know it's there." I think that's actually the pervasive model at Facebook and obviously, they're doing something right over there. Although, who knows? I'll keep my mouth shut.

No, but seriously, there's some good software that comes out of there. The Facebook application itself is pretty simple. Maybe it's enough to perceive your components and not have to perceive your application as a whole. Or there are other ways that you can go about it. If you've got lots and lots of money, you can have people do it and they do a pretty good job. I understand that's another strategy. In fact, I think that's what Facebook does.

I've read a lot of the debates about why they go with unit tests and why they don't go with acceptance tests. They've got a QA team. They probably have their own tools, or who knows? Maybe they use something like Capybara or WebDriver or Selenium, and those tools are really, really excellent and they are truly, truly black box tools.

JEFFREY: Elrick, you brought up an interesting new angle that I think is starting to come of age and that I think is an awesome addition to the toolkit, which is visual regression testing. It's amazing how far that's come in the past couple of years and how valuable that is for getting confidence around CSS. Unit test coverage is great for testing your individual components, for testing JavaScript functionality, but ultimately, the confidence around CSS comes from visual regression testing. That's the only way you can get that, and I think that's helping the CSS ecosystem be a little healthier, encouraging better practices there, and helping engineers who are more versed in JavaScript and not as versed in CSS feel more comfortable making CSS changes because they have that type of testing behind them.

ELRICK: When you're doing things programmatically, you wouldn't really know what's going on visually unless you physically go and check it and then that may not be the best solution because it takes a lot of time.

JEFFREY: Yeah.

CHARLES: The tool that we have the most direct hands-on experience with is Percy, and I think you guys have more experience with it than I do, but it's very intriguing. When I heard about this I was like, "How does this even work?"

JEFFREY: It's so great to have visual diffs. As it runs through your acceptance test suite, you specify, "Take a screenshot now," so you can compare against what came before. Sometimes you run into some finicky things that are like, "I have some data that's not completely locked in." One of the most common diffs I get is, "Hey, there's a date change in the screenshot from the acceptance tests." I'm like, "Oh, that's because I haven't hard-coded that date to always be the same." But it's great at surfacing things like, "This button moved 10 pixels. Did you intend to do that?"

In particular, when we upgraded a component style guide library, we noticed a lot of changes that came out of that. Actually, they were okay changes, but it was important to know that those had changed.
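
The "take a screenshot now" step Jeffrey describes might look roughly like this inside a classic Ember acceptance test, assuming the ember-percy addon's percySnapshot helper is available; registration details vary by version, and the route and selector are hypothetical.

```js
// Take a Percy snapshot mid-acceptance-test so the visual diff can be
// compared against the previous run.
test('the settings page looks right', function(assert) {
  visit('/settings');

  andThen(function() {
    assert.ok(find('.settings-form').length, 'the settings form rendered');
    percySnapshot('settings page');
  });
});
```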

CHARLES: Right, and there were actually some visual regressions too. I remember there was some button that turned red or something like that, and it's like, "Oops, something got screwed up in the cascade," which always seems to happen.

JEFFREY: I think it has caught some bugs before in some changes we've made to a select component. It's like, "This has the same functionality," but actually the classes around this particular component changed: "It doesn't look the same. What happened?" It's been a really valuable tool.

CHARLES: The acceptance test from the code perspective actually completely and totally worked, right?

JEFFREY: Yes. Exactly.

CHARLES: But, yeah, the acceptance tests didn't catch it because they were not perceiving the actual visual style of the application. It's an enhancement or an add-on to your app-a-scope so that it can see more things and you can perceive more things. I love it. When you have an acceptance test suite, then it allows you to do those power add-ons because you're actually loading up your whole application in a real browser, in a real DOM, and you're making it go through the paces that an actual user will go through, so you can do things like take visual diffs.

If you're just doing unit tests in a simulated DOM, that's just not a possibility because you're not actually running the painting algorithms. Whereas with an acceptance test, you are, so it allows you to perceive more things and therefore check for more things and catch more things.

JEFFREY: I think my next area of exploration that I'm interested in is, let's say you have a web app with sound effects. Is there something to validate the sound effects? Or take a video capture of your app because that would be really cool. I'm sure there's something out there but I think I'll be my [inaudible] to go find out.

CHARLES: And then, we actually touched on this in the episode on accessibility: you can test your accessibility APIs when you're running acceptance tests. Another thing that I might add, in terms of regression testing, is that acceptance tests have a much longer term payoff. I feel very strongly about test-driven development for your unit tests. I think unit tests are really, really great if you're building tiny units, especially novel ones that you haven't really built before, that aren't cookie cutter things like a component or something like that. It's some unique service or just a utility library or some bundle of functions.

Using tests to drive those out is extremely important. But I almost feel like once you've done that, you can throw those tests away. By all means, you're free to keep them around, but your tests also serve as a fence, a wall, a kind of exoskeleton for your code, and that can be great except when you want to change and refactor the code. Then you have to tear down the wall and you have to break the exoskeleton of the code.

If your code is siloed into all these tiny little exoskeletons, it's going to be very hard to move and refactor, or at least it's going to be hard to do without rearranging those walls. I guess my point is, with an acceptance test, you're making that wall very big. It covers a lot of area, so you have relative freedom to change things internally and rapidly. While the acceptance test is slow, the speed of internal change that it engenders, I think, is worth it. I think that's another payoff, because tests do have a shelf life, and I think that the shelf life for unit tests is very small, whereas for application-level tests, it's very large.

Then I guess the final thing is, really, there's no such thing as acceptance tests and integration tests and unit tests and blah-blah-blah-blah-blah, all the different types of tests. It really is just a matter of scope. It's like, how big are the lines that you're drawing? An acceptance test is the name we give to a test that has a very large scope. It covers a lot, whereas a unit test has a very small scope, but for every piece of code that is running, there's a whole spectrum in between.

All righty. Well, this concludes Episode 68 of The Frontside Podcast. I told you it was going to be a short one but I think it's going to be a good one. I know that this is a subject that's very, very near and dear to our hearts. We don't dedicate explicit time to it all that often but I'm sure we will return to it again in the future. Jeffrey, Elrick, thank you guys for joining us for this discussion and we're out.