
Conversation

Contributor

@mehmetf mehmetf commented Jun 23, 2020

Description

The widget tester supports a runAsync operation, which runs its callback in a real async zone. Today, this simply awaits the callback. Much like we do for FakeAsync, we should keep track of microtasks and Timers and expose an API to resolve them prior to exiting the real async zone.

This is not a silver-bullet solution. If apps/tests rely on long-running async operations (such as file IO, image processing, timers, etc.), it is highly likely that runAsync() will complete before all "side effects" have occurred. It is the test author's responsibility to wait for the right condition before testing their widget tree state. This is discussed at length in b/157763566. However, we concluded that adding some level of tracking for these tasks and giving users an API to await them would be desirable. This PR implements that proposal.

EDIT: After numerous discussions with the Flutter team, we have decided to make a documentation-only change to guide users toward writing better tests. The proposed API might have the opposite effect of encouraging more flaky test code.

Related Issues

There's an internal issue for this: b/157763566.

Tests

I added many tests to make sure the new API can successfully wait for Timers and long-running operations such as file IO.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I signed the CLA.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read Handling breaking changes.

Widget tests are fine, but I have not run the entire set of Flutter tests. In g3, this causes two esoteric failures which are likely timing issues.

  • No, no existing tests failed, so this is not a breaking change.
  • Yes, this is a breaking change.

@fluttergithubbot fluttergithubbot added the a: tests ("flutter test", flutter_test, or one of our tests) and framework (flutter/packages/flutter repository; see also f: labels) labels Jun 23, 2020
@mehmetf mehmetf requested a review from chunhtai June 23, 2020 22:14
Contributor

@chunhtai chunhtai left a comment

I don't think this is the correct way to solve this, and it may have unexpected consequences if people start to rely on it. However, I don't have a definitive way to resolve this problem either, and it has been haunting me for a long time as well. Maybe we should expose all the futures inside widgets for testing purposes, but that would be a pretty big change.

Contributor

This is scary. I do agree it would be convenient to have the ability to wait for a period of time, but this may be a source of flakiness.

I was actually thinking about solving this problem before. Is it possible to get all the timers and microtasks created within a zone (maybe by caching them when someone calls ZoneSpecification.scheduleMicrotask and ZoneSpecification.createPeriodicTimer), and provide some kind of API to expose those? A rough sketch of the idea is below.
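
To sketch the idea (the TaskTracker helper below is made up for illustration; a complete version would also need to intercept ZoneSpecification.createPeriodicTimer and handle cancelled timers):

```dart
import 'dart:async';

// Hypothetical helper: records the timers and microtasks created inside a
// zone so the framework (or a test) could later inspect or wait for them.
class TaskTracker {
  final Set<Timer> pendingTimers = <Timer>{};
  int pendingMicrotasks = 0;

  bool get isIdle => pendingTimers.isEmpty && pendingMicrotasks == 0;

  R track<R>(R Function() body) {
    return runZoned<R>(
      body,
      zoneSpecification: ZoneSpecification(
        scheduleMicrotask: (Zone self, ZoneDelegate parent, Zone zone, void Function() f) {
          pendingMicrotasks += 1;
          parent.scheduleMicrotask(zone, () {
            try {
              f();
            } finally {
              pendingMicrotasks -= 1;
            }
          });
        },
        createTimer: (Zone self, ZoneDelegate parent, Zone zone, Duration duration, void Function() f) {
          late Timer timer;
          timer = parent.createTimer(zone, duration, () {
            pendingTimers.remove(timer);
            f();
          });
          pendingTimers.add(timer);
          return timer;
        },
      ),
    );
  }
}
```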

Contributor Author

In general, I agree with you.

This PR does not make things less or more flaky. If your test relies on some condition resolving asynchronously, it may or may not have had a chance to resolve by the time runAsync returns. This PR aims to make that a little better by trying to flush the microtask queue before returning. It makes sense, but indeed it is not a guarantee, and thus it does not reduce flakes.

What it does do is expose the pre-existing ugliness to developers. Now we have an ominous parameter, minFramesUntilIdle, which, if we want to, we can default to 0 so we don't wait at all (the current behavior). With increased documentation and this new parameter, we are giving developers a strong signal that runAsync is working as intended, and that if they want less flakiness, they should consider changing their code.

Without this change, people are surprised by the behavior of runAsync and try to "fix" it by doing stuff like this:

runAsync(() async {
  // Some code that causes a side effect in my widgets.
});
runAsync(() => pumpEventQueue(times: someRandomNumber)); // I found this API that's supposed to help, I guess?

Fiddle with someRandomNumber until it passes. There's no other solution to this that I am aware of.

I am trying to make this part of the API and documentation so at least developers hopefully understand what they are doing and why they are doing it. If they start seeing flakes, they can refer to documentation and see why it is happening.

Replace code above with:

runAsync(
  () async {
    // Some code that causes a side effect in my widgets.
  },
  minFramesTillIdle: someRandomNumber, // Ah, but now I see this is a synchronization issue.
);

As for exposing timers and microtasks, I think that's a problem for an IDE to solve. While debugging these tests, the users should be able to see what kind of timers and futures are now "frozen" because we ended the real async zone. I have a generic bug for this that I am hoping will get some traction.

I tried saving this information and printing a warning message that there are unresolved tasks, but it is too noisy. It alerts when there's no need to alert (such as when there is no side effect) and it doesn't alert when it needs to (e.g. some IO operation will eventually post a microtask, but we have no evidence of that now).

The final thing we agree on: This problem bothers me :-). I think this PR makes things a bit more transparent (but, like you said, not any better).

Contributor

For exposing timers and microtasks, I meant exposing a handle for the user to await them. The problem is that some of our code (image decoding, for example) creates futures without any reference for a test to await, and that is the main reason people reach for pumpEventQueue and hope those futures get resolved.

Contributor Author

It is not just Futures. Image has streams and other subscription mechanisms as well. Is it possible to capture all of those in a zone? I know that Future.delayed creates a Timer, but I don't think an ImageCompleter stream would create anything that's observable in a zone. I will test this out and get back to you.

We would also need to create a polling mechanism with a completer to track timers and return when all timers report false for Timer.isActive (https://api.dart.dev/stable/2.8.4/dart-async/Timer/isActive.html), as Timers themselves do not provide completion notifications AFAIK. A rough sketch follows.
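
For illustration only (the helper name and poll interval are made up; `pendingTimers` would be populated by the zone-level tracking discussed above):

```dart
import 'dart:async';

// Completes once every tracked timer reports isActive == false.
Future<void> waitForTimersToResolve(Set<Timer> pendingTimers,
    {Duration pollInterval = const Duration(milliseconds: 10)}) {
  final Completer<void> completer = Completer<void>();
  Timer.periodic(pollInterval, (Timer poller) {
    if (pendingTimers.every((Timer t) => !t.isActive)) {
      poller.cancel();
      completer.complete();
    }
  });
  return completer.future;
}
```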

Contributor

@chunhtai chunhtai Jun 25, 2020

I believe the imageCompleter is just a wrapper over a codec future; the future will either be a wrapper of a timer or of a microtask. But then I think you are right that we need to find a way to flush the timers. Flushing microtasks should be as easy as doing await Future<void>.delayed(Duration.zero), but timers are tricky.

Contributor Author

I think you meant exposing the Timers/Futures etc. that we use in the widgets for testing. Even so, there are certain things where we "have to" rely on real async, such as file IO. Widget tests need to be able to load images from assets, which means a certain part of widget testing cannot be in a fake async zone.

Contributor

I am not familiar with file IO; why couldn't it run in fake async? The asset loading part should be fine because we mocked the asset channel to read files synchronously:

void mockFlutterAssets() {

The reason I brought up fake async is that I am thinking we can move away from writing tests with runAsync if we can make the fake async in testWidgets expose the timers and microtasks.

Contributor Author

There have been numerous discussions about trying to run everything in fake async. See the related issue #23179 and specifically #23179 (comment). For this to happen, we would need to enumerate all the possible things that Flutter widgets might do and provide sync implementations of them (much like you did for file loading). I am not sure how feasible that is.

Even if it were feasible, do you think we can still make progress on making runAsync more robust and chase that problem separately? We should seek incremental and innovative solutions together. If we get rid of runAsync at some point, I would be very happy, but it does not seem possible for the time being.

Contributor

I see, yes, you are right. I don't have an idea to fix the issue you mentioned either. I agree we should make runAsync more robust, at least for now.

Contributor Author

I updated the PR. I still had to introduce some "wait time" in there because microtasks can be scheduled by any async operation "back to back". However, I feel better about this approach because it is more deterministic and easy to explain. Let me know what you think.

Contributor

The purpose of the additional time is to increase the timeout value for the test section; I think we should not overload its usage. In the current implementation it both increases the wait time for timers/microtasks and also extends the timeout.

It also raises another question: what if there are unresolved futures that the user does not care about? Currently it will always fail if there are unresolved futures after the additionalTime.

How about we expose a new API, something like tester.resolveTimerAndMicrotask()? This API would create a periodic timer just like you do here, but the sideEffectCompleter would only resolve when there are no more entries in the timer cache and no pending microtasks.
In the test, the developer can decide whether they want to wait for the future by calling await tester.resolveTimerAndMicrotask(). If they have a bad test that creates an unresolved timer or microtask, it will just hang. See the sketch after this comment.

We don't do any waiting at the runAsync teardown; it will be the developer's choice to decide what they want to do.
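
In a test, that proposal would look roughly like this (resolveTimerAndMicrotask is hypothetical and never shipped; Placeholder is just an arbitrary widget):

```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('explicitly waits for real async work', (WidgetTester tester) async {
    await tester.runAsync(() async {
      await tester.pumpWidget(const Placeholder());
      // Hypothetical API: completes once every timer and microtask created in
      // this runAsync zone has resolved. A bad test that leaves an unresolved
      // periodic timer would simply hang here.
      await tester.resolveTimerAndMicrotask();
    });
    await tester.pump();
  });
}
```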

Contributor Author

The purpose of the additional time is to increase the timeout value for the test section; I think we should not overload its usage. In the current implementation it both increases the wait time for timers/microtasks and also extends the timeout.

Fair point. I can introduce a new parameter and also use this parameter for signaling that runAsync() should not wait for additional futures.

waitForIdle: const Duration(seconds: 2),

I would rather not introduce a new API. I would expect that, in the normal case, users actually care about the Futures that their widget code creates in runAsync; they would not be using runAsync otherwise. We should optimize for that most common case. Once this solution makes it to SO, everyone will simply copy and use them together anyway, so keeping the APIs separate would lose its meaning.

I agree with you that there will be advanced cases where these futures should not be awaited, but then we are getting into the highly custom test code category. Users can then create a future that they want to await and await that. I will put all of this in actionable error messages.

// Do not wait for idle.
await runAsync(() async { /* some code */ }, waitForIdle: null);
await runAsync(() => someSpecialFuture);

Another reason: The interplay between the two APIs would need to be carefully covered. For instance what if this happens:

await runAsync(() async { /* ... */ });
await resolveTimers(); // What if runAsync runs into an error? Now the user needs to check the return value and make sure there was no error in the async block.
//
// some code later
//
await resolveTimers();  // Does this throw? Does it just return? Either option is error-prone.

Let me know what you think.

Contributor

@chunhtai chunhtai Jun 27, 2020

I think I agree with most of it, but there are some more concerns I have.

waitForIdle: const Duration(seconds: 2),

I think we don't need the duration here. This can be a true/false value, and the test will either resolve the timers/microtasks or just time out the test section.

I agree with you that there will be advanced cases where these futures should not be awaited, but then we are getting into the highly custom test code category. Users can then create a future that they want to await and await that. I will put all of this in actionable error messages.

Part of this is also that we have some unexposed futures in our built-in widgets as well (for example, the image decoding in the Image widget or localization asset loading). Unless we expose those futures in the widget state, the user will not be able to resolve them if they want to test the result after those async loads. That is why I kind of want to have a separate API to resolve these futures. However, I am OK with not providing an API too, since we can use runAsync's waitForIdle to resolve the futures. It's just that the test code may be a bit more complicated.

Another reason: The interplay between the two APIs would need to be carefully covered. For instance what if this happens:

I would expect resolveTimers called without a real async zone (i.e. not in the runAsync callback) to just throw an error?

Contributor Author

@mehmetf mehmetf Jun 27, 2020

I see what you mean now. I thought you were proposing a separate API to await after runAsync was done.

Yes, providing a helper function to call inside the runAsync callback is tempting. I still prefer the default behavior to be more robust, because when things fail today, they fail without giving developers any direction on what might be the problem (such as defining a custom task future to await inside runAsync()).

Contributor Author

I will think about this a bit more. Running this PR over google3 broke 24 tests. At least 2 of them were pre-existing bugs in the tests themselves that this uncovered! However, many are due to premature timeouts. Perhaps I will have to do what you are proposing after all, @chunhtai, and expose the periodic timer as a Future for apps to listen on. I will dive deep into some use cases and see what I missed. The good news is there are at least 9000 tests that exercise realAsync and only 24 failed.

Contributor Author

Latest results: 10 failures.

  • 2 due to pre-existing bugs in tests.
  • 4 due to the tested code creating a periodic timer that never resolves.
    • I am inclined to warn people about this rather than awaiting them.
  • 4 due to running a lot of tests in a single file. This is the problematic one. Since we now wait up to 500ms to make sure no new microtasks are created, each test that uses runAsync is lengthened by half a second, which means that if you are running 120+ tests, your suite now runs at least a minute longer.

Due to the second and third items above, I am now leaning towards your proposal to expose this as a separate method. It would be nice to make the default method more robust, but it should not come at the expense of a half-second delay per test.

I will put in some robust documentation as well so people can make an informed decision on when to use this API.

Contributor

debug statement

Contributor Author

Will be removed once we agree on implementation :-)

Contributor

where is this used?

Contributor Author

Will be removed once we agree on implementation.

@chunhtai
Contributor

chunhtai commented Jun 27, 2020

This may change the direction of our testing framework; I would like to get some more attention on it. cc @goderbauer @jonahwilliams

@mehmetf
Contributor Author

mehmetf commented Jun 29, 2020

OK. The main implementation is somewhat done; please take a look. If all of you say "aye", I will fix the documentation, add better error messages, and add a couple more tests (such as calling resolveFutures in a fake async zone).

@mehmetf mehmetf changed the title from "runAsync attempts to drain the microtask queue" to "Track tasks created during runAsync" Jun 29, 2020
Contributor

@chunhtai chunhtai left a comment

looks aye to me

@mehmetf
Contributor Author

mehmetf commented Jul 1, 2020

Gentle ping :-) I would like to work on bringing this to g3 over the break so I can properly fix client code.

Member

@goderbauer goderbauer left a comment

I am not really comfortable with exposing an API where the recommendation is to not use it (https://github.com/flutter/flutter/wiki/Style-guide-for-Flutter-repo#avoid-apis-that-encourage-bad-practices). I am afraid that people will just use that as an escape hatch to pretend that the synchronization issues are resolved even though the underlying flakiness is still there and may cause trouble at any point in the future again.

I agree that we should document these problems, though, and tell people what they should do instead (e.g. restructure your code so you don't need runAsync or expose relevant futures so they can be awaited properly).
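
For example (all names below are made up), instead of reaching for an escape hatch, the widget can expose the future it kicks off so the test can await it directly rather than guessing how long to pump:

```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_test/flutter_test.dart';

// Stand-in for real async work (file IO, network, image decoding, ...).
Future<String> fetchName() async => 'Alice';

class AvatarLabel extends StatefulWidget {
  const AvatarLabel({Key? key}) : super(key: key);

  @override
  AvatarLabelState createState() => AvatarLabelState();
}

class AvatarLabelState extends State<AvatarLabel> {
  // Exposed for tests: completes when the async load has finished.
  late final Future<void> loaded;
  String _name = 'loading';

  @override
  void initState() {
    super.initState();
    loaded = _load();
  }

  Future<void> _load() async {
    final String name = await fetchName();
    if (mounted) {
      setState(() => _name = name);
    }
  }

  @override
  Widget build(BuildContext context) => Text(_name, textDirection: TextDirection.ltr);
}

void main() {
  testWidgets('awaits the exposed future instead of pumping blindly', (WidgetTester tester) async {
    await tester.runAsync(() async {
      await tester.pumpWidget(const AvatarLabel());
      final AvatarLabelState state = tester.state(find.byType(AvatarLabel));
      await state.loaded; // No guessing how long to wait.
    });
    await tester.pump();
    expect(find.text('Alice'), findsOneWidget);
  });
}
```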

Member

Adding an API that we do not recommend using seems ... wrong.

Contributor Author

I would also love to add "not recommended" to runAsync FWIW :-)

Contributor

what's wrong with runAsync?

Contributor Author

  • Zones: testWidgets() has two zones, which is itself tough to grasp. runAsync within testWidgets ups that to three. Most developers don't understand zones, and the side effects are bizarre and difficult to debug.
  • It makes it too easy to run into synchronization problems, which is another thing that's difficult to debug.

Sorry for being generic. I can probably compile concrete examples for these. These are some of the difficulties I myself have run into and seen other developers struggle with.

@mehmetf
Contributor Author

mehmetf commented Jul 1, 2020

I understand your concern, Michael, but I don't think documentation is going to cut it. At the end of the day, "stuff" happens in runAsync that developers do not understand. I would like to expose the internals of that (by exposing microtasks and timers to them) so it is easier to trace the code and understand what it is doing.

Originally I wanted runAsync to simply "handle" it for the user, but I agree that that's more like hiding the problem. resolveFutures(), though, is pretty transparent about what it is doing.

@goderbauer
Member

I would like to expose the internals of that (by exposing microtasks and timers to them) so it is easier to trace the code and understand what it is doing.

I would be fine with exposing the problem to them, but to me resolveFutures seems to suggest that it can solve the problem even though it just moves the source of flakiness to a different place.

@chunhtai
Contributor

chunhtai commented Jul 1, 2020

One reason documentation may not be enough is that we have the exact same problem in some of our own widgets. There is currently no way to properly test those widgets.

@mehmetf
Contributor Author

mehmetf commented Jul 1, 2020

What if we simply exposed the Timer and microtask queues through the widget_tester API?

This could be done by passing a real async state object to the runAsync callback, but it would be a breaking change.
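
Roughly, the breaking change being alluded to might look like this (RealAsyncState and the callback shape are purely hypothetical; none of this shipped):

```dart
import 'dart:async';

// Hypothetical: a handle onto the real async zone's bookkeeping that runAsync
// could hand to its callback.
abstract class RealAsyncState {
  List<Timer> get pendingTimers;
  int get pendingMicrotaskCount;
}

// The callback would gain a parameter, which is why existing runAsync call
// sites would break.
typedef RealAsyncCallback<T> = Future<T> Function(RealAsyncState state);
```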

@mehmetf
Contributor Author

mehmetf commented Jul 1, 2020

I don't fully like exposing these queues either. It feels too low-level, and the APIs are dangerous. What would users do with this information? There are so many ways to use them incorrectly that they feel more like a footgun.

I like the current PR because I don't feel like we are moving flakiness from one place to another; it feels more like exposing it in an easier-to-digest manner. As I was showing this code to a couple of colleagues, they commented that at least now they understand that the issue is synchronization.

I reckon this is the series of decisions made by a developer using this:

  • Use runAsync.
  • Everything works... great!
  • If not, Google it or read the documentation. Learn about synchronization.
    • Make the code more testable... great!
    • If not, find the API called resolveFutures(). Does it work? Great!
      • If not, look at the implementation and read its documentation. Put some breakpoints and see why the test is failing. I can now put breakpoints in resolveFutures(). Aha! I see that the task I was waiting for is not being executed. Trace where it was created, expose its Future, and now my test works!

Prior to this API, microtasks and timers were just an implementation detail that was easily missed (I have seen it happen). Now this method brings them forward (but does not expose them as an API, which would be dangerous). Users will be "led" to investigate these and figure out what's wrong with the code they are trying to test.

If you disagree with this, that's fine. I learned a lot over the course of this discussion, and I will create a mini version of this library in google3, since I can create a child zone inside the real async zone and capture these callbacks myself. I still believe in the value of this API (see above), but if you disagree and you don't want it to be part of the testing framework, I understand.

@mehmetf
Contributor Author

mehmetf commented Jul 2, 2020

@Hixie for the final call on this. Ian, this is related to the thread I had created with subject Is there a pumpAndSettle() equivalent for "real" async work?.

Relevant comments:
#60136 (comment)
#60136 (review)
#60136 (comment)

Let me know what you decide. Thanks!

@Hixie
Contributor

Hixie commented Jul 5, 2020

As @chunhtai, @goderbauer, and the style guide already pointed out: this is an API to make your test guaranteed flaky. We should not add such an API.

@Hixie
Contributor

Hixie commented Jul 5, 2020

I don't really understand the problem being solved by this PR. If you have an async API, then you should wait until it's done. Why would you need to wait on timers or whatnot? Just have the API tell you when it's done, and await that.

@mehmetf
Contributor Author

mehmetf commented Jul 5, 2020

This is an API to make your test guaranteed flaky. We should not add such an API.

This API won't change flakiness. Your test will still be flaky if it was already flaky before. This makes your next question a very good one.

I don't really understand the problem being solved by this PR. If you have an async API, then you should wait until it's done.

Developers often have to do unholy amounts of debugging just to figure out which async API is causing flakiness. This PR brings that problem front and center. If you set breakpoints in an IDE at the places where timers/microtasks are added and where the timer repeatedly checks their status, you can figure out where the flakiness is coming from and understand what's happening. I agree that this is a lousy way to do it, which is why I left the final decision with you. I was told the better way (flutter/devtools#2152) is not feasible in the near future.

I will strip all the APIs out of this PR and make it documentation-only. I believe we all agree that's a good idea.

@mehmetf
Contributor Author

mehmetf commented Jul 5, 2020

Done. I incorporated the doc comments and left this only as additional prose for the runAsync method itself. Thanks for the discussion!

Member

@goderbauer goderbauer left a comment

LGTM

Member

nit: maybe we should restructure this block a little bit for clarity and lead with "if you're experiencing a hanging test, …". I imagine that if you scan the docs for a solution, it is because of a hanging test, and you don't know yet that it is caused by an uncompleted task.

@mehmetf mehmetf changed the title from "Track tasks created during runAsync" to "Add more documentation on why tests might hang when using runAsync" Jul 6, 2020
@fluttergithubbot
Contributor

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat.

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@fluttergithubbot
Contributor

This pull request is not suitable for automatic merging in its current state.

  • The status or check suite docs-linux has failed. Please fix the issues identified (or deflake) before re-applying this label.

@mehmetf
Contributor Author

mehmetf commented Jul 7, 2020

I am not sure why dartdoc is failing with this PR. I am inclined to say that it is unrelated but since this is a documentation change, I am hesitant. Does anyone know what's up with it?

dartdoc:stderr: dartdoc failed: type 'FunctionTypeElementType' is not a subtype of type 'DefinedElementType'
dartdoc:stderr: package:dartdoc/src/model/method.dart 102:39                                    Method.modelType
dartdoc:stderr: package:dartdoc/src/model/method.dart 99:34                                     Method.linkedReturnType
dartdoc:stderr: dart:mirrors                                                                    _InstanceMirror.getField
dartdoc:stderr: package:mustache/src/renderer.dart 245:29                                       Renderer._getNamedProperty

@goderbauer
Member

Can you rebase this to the latest master? This is supposed to be fixed in the latest dartdoc version according to #60719.

///
/// If your widget test hangs and you are using [runAsync], chances are your
/// code depends on the result of a task that did not complete. Fake async
/// environment is unable to resolve a future that was created in [runAsync].
Contributor

"The fake async environment..."

@Hixie
Contributor

Hixie commented Jul 7, 2020

definitely +1 to adding docs

@Hixie
Contributor

Hixie commented Jul 7, 2020

I'm interested in learning more about how you used your original proposal here in the context of debugging a test. I'm definitely open to having better ways to debug flaky tests. Maybe this would be a good topic to discuss at an API design review meeting?

@fluttergithubbot fluttergithubbot merged commit 8c67781 into flutter:master Jul 7, 2020
@mehmetf
Contributor Author

mehmetf commented Jul 9, 2020

Thanks Ian. I will share an internal CL with you where I implemented a different approach. That also contains the API that I proposed here but with debug statements.

@mehmetf
Contributor Author

mehmetf commented Jul 9, 2020

Here's the internal CL: cl/319879256

mingwandroid pushed a commit to mingwandroid/flutter that referenced this pull request Sep 6, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 30, 2021