Chaos mode MVP: Skip branch optimization in MachBuffer #6039
I suspect that might be the case. The In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works) For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way. |
|
Thanks so much for making a start on this problem -- the infrastructure will surely pay off in a bunch of ways! I've read over most of this prototype; but before a detailed line-by-line review, I wanted to offer some high-level design feedback. I think it might be related to (or rather, might address) the issue with the
Does that make some sense at least? Basically, I want to push this all toward a more idiomatic Rust ownership model, which I think will have the side-effect of removing nondeterminism and making compilation a true pure function of a "control" input for one function body. |
|
Thank you both for the great feedback! I totally agree that a more idiomatic Rust ownership model would be much better. I remember starting out with mutable references and running into compiler errors about mutable aliasing. I was concerned there could be way too many of these issues while threading the control plane through every code path, and I didn't know how difficult it would be. I probably was scared away too easily. I'll have another go at implementing it that way. I understand that the use of
// Take lengths from the end of the data, since the `libFuzzer` folks
// found that this lets fuzzers more efficiently explore the input
// space.
I'm sure we would miss such fuzzing-specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse 🤔
|
Yeah, that's fair -- I suppose it's not the worst thing in the world to take a |
Another question that came up while we tried that at the beginning was about the size of this. A simple test indicates that references to zero-sized types are not optimized away:
struct Foo;
println!("{}", std::mem::size_of::<&Foo>()); // prints 8
...Am I missing something with that? How can we be certain that the performance of release builds is not affected?
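A minimal sketch of the measurement described above, with hypothetical function names. It suggests why the question comes up: the zero-sized value itself occupies no space, but a reference to it is still pointer-sized, so an extra reference parameter is a real argument unless inlining removes it.

```rust
use std::mem::size_of;

/// A zero-sized stand-in for a disabled control plane (hypothetical name).
struct Foo;

/// Size of the value itself: zero-sized types take no space.
fn size_of_value() -> usize {
    size_of::<Foo>()
}

/// Size of a shared reference to it: still a full pointer.
fn size_of_reference() -> usize {
    size_of::<&Foo>()
}
```

On a 64-bit target the second function returns 8, which matches the number quoted in the comment.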
|
Instead of passing |
|
I think I'm misunderstanding something, the function signature you're describing looks like this, right? (ignoring the lifetimes)
impl ControlPlane {
    fn as_mut(&mut self) -> Self {
        // ?
    }
}
I don't know how this function can be written without reference counting?
|
You'd do something like
struct ControlPlane<'a>(&'a mut ControlPlaneInner);
impl ControlPlane<'_> {
    /// Reborrow `ControlPlane`.
    fn as_mut(&mut self) -> ControlPlane<'_> {
        ControlPlane(/*this does an implicit reborrow*/self.0)
        // Equivalent to:
        // ControlPlane(&mut *self.0)
    }
}
And then use it like
// `mut` is needed on the binding because `as_mut` takes `&mut self`.
fn example(mut control_plane: ControlPlane<'_>) {
    foo(control_plane.as_mut());
    bar(control_plane.as_mut());
    baz(control_plane);
}
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5
cranelift/chaos/src/enabled.rs
Outdated
struct ChaosEngineData {
    /// # Safety
    ///
    /// This field must never be moved from, as it is referenced by
    /// the field `unstructured` for its entire lifetime.
    ///
    /// This pattern is the ["self-referential" type](
    /// https://morestina.net/blog/1868/self-referential-types-for-fun-and-profit)
from a safety perspective, the other extremely important detail of this trick is that ChaosEngineData must also never move once references to engine.data are taken. so the "this must not move"-ness of data kind of percolates through to any enclosing type until it's somewhere that won't move (which works out here because ChaosEngineData ends up owned by an Arc where it oughtn't be moved out of).
as an example that certainly won't come up here but would be Technically Possible, Arc::new(some_chaos_engine.data.try_unwrap().expect("that there is one reference in this example")) would yield a ChaosEngineData whose unstructured points to somewhere else, and would (hopefully! :D) fault on use.
Thanks for pointing that out! Does it also apply if some type is heap allocated? I think the article I got this from used a Box to create a level of indirection. The idea being that if the Box itself is moved, the values on the heap won't. So any existing references into that heap allocation would still be valid. In this case, the Vec is supposed to serve the same purpose as the Box in the article.
That being said, I just noticed that I got the order of the fields wrong, which the article warns against. data will be dropped before unstructured, which creates a dangling pointer and UB.(?) oops 😄 I definitely prefer a safe solution as well.
Aaand reading a bit further, I also forgot the thing about AliasableBox so you're definitely right, moving data would also be UB.
yeah that's the part that makes the blog post's solution a little more robust to moves - a &AliasableBox<ZipArchive<File>> could be made to dangle, but with private internals you can ensure that wouldn't happen. anyway, hopefully threading a &mut ControlPlane through the compiler as appropriate lets you avoid the whole construction, and double-hopefully the extra arguments don't affect compile time all that much :)
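The field-order problem mentioned in this thread follows from a language rule: Rust drops struct fields in declaration order. The sketch below uses hypothetical types (`Noisy`, `DataFirst`) that only record when they are dropped; it does not reproduce the self-referential construction itself, only the ordering that made it unsound.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped (hypothetical helper).
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Field names mirror the discussion; this is the problematic order,
/// with the referenced field declared before the field that refers to it.
struct DataFirst {
    data: Noisy,
    unstructured: Noisy,
}

/// Returns the order in which the two fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let value = DataFirst {
        data: Noisy { name: "data", log: Rc::clone(&log) },
        unstructured: Noisy { name: "unstructured", log: Rc::clone(&log) },
    };
    drop(value);
    // Copy the log out before `log` itself goes out of scope.
    let order = log.borrow().clone();
    order
}
```

Here `data` is dropped first, so anything in `unstructured` that pointed into it would briefly dangle. Declaring `unstructured` first reverses the order.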
I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific I think I want the design to follow these principles:
This has a few nice properties:
Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain (structured) data. That leads to less friction with Rust's ownership model as well as more determinism and control. IMHO the "reimplement Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter |
2de2de3 to
be761cf
Compare
|
The rename seems uncontroversial, so I did that right on this branch.
From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison. Usage of
|
|
I would prefer if we went with a In addition to the disadvantages you named, using a But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a
is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a Anyway, given all that, I would really strongly prefer the suggestions I gave above: a |
|
just in the interest of being explicit: i think part of the feedback here is, if there is an overhead for passing (also, i think the trick about holding a this happens to make me think that passing around a |
be761cf to
727982e
Compare
|
The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target |
cfallin
left a comment
This looks reasonable so far -- thanks for all of the efforts!
Some comments from our discussion just now; also, in the fuzz testcase toplevel, I think it might be slightly clearer to have TestCase::functions be a Vec<(Function, ControlPlane)>. You may need to clone the control plane or take ownership / destruct the TestCase to get a mut ControlPlane but that should be a reasonable refactor I think.
|
|
for inst in mem_insts.into_iter() {
-    inst.emit(&[], sink, emit_info, state);
+    inst.emit(&[], sink, emit_info, state, ctrl_plane);
ctrl_plane can probably go inside the state (EmitState)?
The issue with lifetimes that this would otherwise create (&mut ControlPlane inside of the struct) can be resolved, I think, by using std::mem::take to take ownership of the control plane temporarily in places where we emit.
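A minimal sketch of that suggestion, assuming the control plane implements `Default`. `ControlPlane`, `EmitState` and `with_emit_state` here are simplified, hypothetical stand-ins and not the Cranelift types: the state owns the control plane while emitting, so the struct needs no lifetime parameter, and ownership is handed back afterwards.

```rust
/// Hypothetical stand-in for the control plane: a stack of decisions.
#[derive(Default, Debug, PartialEq)]
struct ControlPlane {
    decisions: Vec<bool>,
}

/// Hypothetical emission state that owns the control plane while emitting.
struct EmitState {
    ctrl_plane: ControlPlane,
}

/// Moves the control plane into a fresh state, runs `emit`, and moves it back.
fn with_emit_state(ctrl_plane: &mut ControlPlane, emit: impl FnOnce(&mut EmitState)) {
    let mut state = EmitState {
        // `take` leaves a default (empty) control plane behind in the caller.
        ctrl_plane: std::mem::take(ctrl_plane),
    };
    emit(&mut state);
    // Hand ownership back so later compilation steps see the consumed data.
    *ctrl_plane = state.ctrl_plane;
}
```

The caller keeps a plain `&mut ControlPlane`, and any decisions consumed during emission remain consumed once the function returns.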
cranelift/codegen/src/context.rs
Outdated
pub want_disasm: bool,

/// TODO chaos: is this the right location to hold ownership?
pub ctrl_plane: ControlPlane,
I think it's probably better to pass in the control-plane state with each call to compile; the CompilerContext is otherwise not that semantically meaningful (meant to enable reuse).
if let Some(true) = ctrl_plane.get_decision() {
    println!("");
    println!("");
    println!("branch optimizations skipped");
(remove debugging printlns before merging)
cranelift/control/src/chaos.rs
Outdated
/// The control plane of chaos mode.
/// Please see the [crate-level documentation](crate).
///
/// **Clone liberally!** The chaos engine is reference counted.
outdated comment?
cranelift/control/src/chaos.rs
Outdated
// backtrace::Backtrace::force_capture()
//);
//None
panic!("trying to get a decision from a noop chaos engine");
Let's remove this is_noop mechanism before merging.
cranelift/control/src/chaos.rs
Outdated
}

/// TODO chaos: should be explained
pub fn no_chaos() -> Self {
I think this body makes sense as the Default impl (an empty ControlPlane should have no effect on Cranelift's behavior as it is today -- this also implies how to use bools, i.e. false should make no change).
cranelift/control/src/chaos.rs
Outdated
}

impl ControlPlane {
    pub fn get_decision(&mut self) -> Option<bool> {
I would return just bool (use .unwrap_or(false) on the pop).
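A sketch of what that could look like, assuming the control plane stores its pre-generated decisions in a `Vec<bool>` (the field name and layout are hypothetical). Exhausted data and a default-constructed control plane both fall back to `false`, meaning "no change to today's behavior".

```rust
/// Hypothetical control plane backed by a stack of pre-generated booleans.
#[derive(Default)]
struct ControlPlane {
    decisions: Vec<bool>,
}

impl ControlPlane {
    /// Returns the next decision, or `false` once the data is exhausted.
    fn get_decision(&mut self) -> bool {
        self.decisions.pop().unwrap_or(false)
    }
}
```

Callers can then write `if ctrl_plane.get_decision() { ... }` without matching on an `Option`.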
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
…ack by reference Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Remo Senekowitsch <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
cfallin
left a comment
A few small things below -- we're almost there! Thanks for your patience here!
/// mutable reference to it may be used to:
/// - move out of it, e.g. using [std::mem::swap]
/// - access that control plane temporarily
fn get_ctrl_plane(&mut self) -> &mut ControlPlane;
A few things:
- Usually a mut-accessor will be named like fn ctrl_plane_mut(&mut self) -> ...
- Let's have a different one too, fn take_ctrl_plane(self), that consumes the emit-state and gives us back the control-plane state
fn take_ctrl_plane(self)
I think that actually caught a mistake. I was taking the control plane out of the emission state inside a loop, where later loop iterations would use the state with a now-empty control plane.
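A sketch of the two accessors under discussion, using simplified hypothetical types rather than the real emission state. Because `take_ctrl_plane` takes `self` by value, calling it inside the loop would not compile (the state would be moved on the first iteration), which is how a consuming signature can surface the mistake described above.

```rust
/// Hypothetical stand-in for the control plane: a stack of decisions.
#[derive(Default, Debug, PartialEq)]
struct ControlPlane {
    decisions: Vec<bool>,
}

/// Hypothetical emission state owning the control plane.
struct EmitState {
    ctrl_plane: ControlPlane,
}

impl EmitState {
    fn new(ctrl_plane: ControlPlane) -> Self {
        Self { ctrl_plane }
    }

    /// Temporary mutable access while emission is still in progress.
    fn ctrl_plane_mut(&mut self) -> &mut ControlPlane {
        &mut self.ctrl_plane
    }

    /// Consumes the state and returns the control plane to the caller.
    fn take_ctrl_plane(self) -> ControlPlane {
        self.ctrl_plane
    }
}

/// Emits `n` instructions, counting how often a "skip" decision was drawn.
fn emit_all(ctrl_plane: ControlPlane, n: usize) -> (usize, ControlPlane) {
    let mut state = EmitState::new(ctrl_plane);
    let mut skipped = 0;
    for _ in 0..n {
        if state.ctrl_plane_mut().decisions.pop().unwrap_or(false) {
            skipped += 1;
        }
    }
    // Only after the loop is the state consumed.
    (skipped, state.take_ctrl_plane())
}
```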
/// The default value `false` will always be returned if the
/// pseudo-random data is exhausted or the control plane was constructed
/// with `default`.
pub fn get_decision(&mut self) -> bool {
One small tweak to the conditional-compilation strategy: I had been thinking that we could have the methods that produce decisions, like get_decision here, return a default value (false here) as a constant in the non-chaos-feature case; then the sites where we use these decisions, like in MachBuffer, don't require annotation with conditional compilation. What do you think?
I'm thinking about the potential performance impact in release builds, but I guess it's safe to assume the compiler inlines a constant false and removes the resulting if false {}.
In my view, it would be a nice aspect of the control plane that there is no way to (mis-)use it in regular builds. But I guess that every control plane API needs to have some default output value anyway... and that can probably always be inlined as well? And we can annotate these default-returning functions with #[inline].
What is the downside of conditional compilation at the call sites? It seemed like an easy way to be really, really sure nothing bad happens in release builds, but on second thought, it doesn't seem necessary.
Indeed, we should be able to trust the branch-folding here.
What is the downside of conditional compilation at the call sites?
The main downside is that it spreads the implementation of a conditional decision across distributed points -- the alternative, where everything is wired to a single module where all conditional-compilation logic lies, makes it easier to make changes in the future. (Another example of this principle in action is the memfd pooling-allocator mechanism in Wasmtime: when I implemented this in #3697 last year I originally had feature-conditional code in many places, but Alex convinced me to centralize everything into two versions of one module and remove conditionals everywhere else. The result is far cleaner!)
I'm assuming this goes for the Arbitrary implementation as well, so I removed the conditional compilation here too.
The shim control plane's Arbitrary implementation now returns the default without consuming any bytes.
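The centralization idea from this thread might be sketched as follows. The module name `control`, the function `optimize_branches` and the exact feature wiring are assumptions for illustration, not the layout of the actual crate: two versions of one module are selected by the `chaos` feature, and the call site carries no conditional compilation at all.

```rust
/// Real implementation, compiled only when the (assumed) `chaos` feature is on.
#[cfg(feature = "chaos")]
mod control {
    #[derive(Default)]
    pub struct ControlPlane {
        decisions: Vec<bool>,
    }

    impl ControlPlane {
        pub fn get_decision(&mut self) -> bool {
            self.decisions.pop().unwrap_or(false)
        }
    }
}

/// Shim used in regular builds: zero-sized, always answers `false`.
#[cfg(not(feature = "chaos"))]
mod control {
    #[derive(Default)]
    pub struct ControlPlane;

    impl ControlPlane {
        /// A constant result that the optimizer can fold at the call site.
        #[inline]
        pub fn get_decision(&mut self) -> bool {
            false
        }
    }
}

use control::ControlPlane;

/// Call site without any `cfg` annotations (hypothetical function).
/// Returns whether branch optimization was carried out.
fn optimize_branches(ctrl_plane: &mut ControlPlane) -> bool {
    if ctrl_plane.get_decision() {
        return false; // chaos mode asked to skip the optimization
    }
    true
}
```

With the shim, `get_decision` is a constant `false`, so the early return is dead code that branch folding is expected to remove. All feature-dependent logic stays in one module.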
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
Also cleanup a few straggling dependencies on cranelift-control that aren't needed anymore. Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
cfallin
left a comment
Thanks for the patience -- this all looks good now, and I think is a solid base for the remainder of the chaos-mode work.
|
As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on |
|
@remlse it looks like we'll need to add the new crate to a list in |
&mut self,
func: FuncId,
ctx: &mut Context,
ctrl_plane: &mut ControlPlane,
I don't think define_function should get this argument. If you need this fine control you should probably use define_function_bytes instead. This doesn't work with a module that serializes functions rather than immediately compiles them and it is confusing for most users of cranelift.
I think this is the place we actually needed that argument. So that would have to be rewritten with define_function_bytes. The module there is a JITModule and its define_function and define_function_bytes methods are not trivial, so it's not obvious to me how to do that.
cranelift-object implements define_function as ctx.compile_and_emit(self.isa(), &mut code, ctrl_plane) followed by define_function_bytes. You could do the same in TestFileCompiler.
@bjorn3 in general the approach we've been taking is to thread through the control-plane everywhere compilation can be invoked; conceptually it's now another input along with the CLIF. (It does have a Default implementation.) If there's a way to rename this variant to a third option, and then have a variant that uses a default control plane, we can perhaps do that. Would you be willing to do that in a followup PR?
From a usability perspective having another method would work. But when serializing rather than compiling, a ControlPlane argument doesn't really make any sense as you can't serialize ControlPlane. (I have local changes to make a serializing Module which I want to upstream. I'm using it to allow using cranelift-interpreter in cg_clif with minimal changes to cg_clif.)
I'm not sure I understand why serialization of modules implies the need to serialize a ControlPlane -- it is given as an argument, it isn't stored -- but please do create an issue or PR with a fix if you have one in mind. In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).
If the passed-in ControlPlane should affect the eventual compilation of the function, it would need to be serialized. If not, it doesn't really make much sense to pass in a ControlPlane.
In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).
👍
prtest:full Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
0fe0d00 to
bc76cae
Compare
Co-authored-by: Falk Zwimpfer <[email protected]> Co-authored-by: Moritz Waser <[email protected]>
The conflict was that all the versions were bumped from 0.95 to 0.96. I changed the version of cranelift-control accordingly.
|
@cfallin the full tests are passing now, maybe we can try to merge again? |
This is a draft of the MVP for chaos mode (#4134).
Edit: The implemented fuzz target changed to cranelift-fuzzgen.
It extends the fuzz target cranelift-icache for now, by allowing it to run with the feature chaos enabled. This will pseudo-randomly toggle branch optimization in MachBuffer via the new chaos mode control plane in the crate cranelift-chaos.
Quick command for the documentation:
Running the fuzz target with chaos mode enabled:
Passing a reference counted chaos engine around is not that bad, the diff is less noisy than I would've expected. I'm still planning to make an equivalent POC with private, global, mutable state in the cranelift-chaos crate to get a better idea of the trade-offs.
Note that because of this zulip topic, I didn't bump the version of arbitrary in this PR to keep those issues isolated. Once that's resolved, we think it's probably a good idea to update arbitrary while we're working with it.
I've added a couple print statements during development, and it seems the branch optimization is more often carried out than skipped. I guess this is consistent with libfuzzer's goal of generating data in a way that code coverage is maximized.
I also ran into a crash while running this fuzz target. The crash happens at cranelift-icache.rs:220:
Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar? In any case, I'll investigate to see if the panic is caused by my changes or something different.
Questions
Arc and Mutex instead of Rc and RefCell in the control plane, because the compiler was complaining about the Send trait not being implemented. So if Cranelift runs in parallel, won't that interfere with our plans with the fuel parameter? If fuel from the chaos engine is requested in a different order every time, we won't be able to deterministically reproduce bugs and pinpoint their origin. -> answer: Arc and Mutex must not be used.
ChaosEngine::todo()s in the wild. Is it ok to merge these in principle or should we find a different solution for adding the chaos engine everywhere incrementally? -> these have been removed
Todos
Base64: Av////////8AAAIAAAAAAAD5jIyMjAAKAAAAAPHx8fERDgcAAAAAAJkBAAAAAAAAKwBp/5r//wAAAAAAAAAHbS45azEAAAAACF0=) -> most likely due to usage of Arc and Mutex
cargo fuzz run --features chaos $TARGET)