Skip to content

Conversation

@senekor
Copy link
Contributor

@senekor senekor commented Mar 17, 2023

This is a draft of the MVP for chaos mode (#4134).

Edit: The implemented fuzz target changed to cranelift-fuzzgen.

It extends the fuzz target cranelift-icache for now, by allowing it to run with the feature chaos enabled. This will pseudo-randomly toggle branch optimization in MachBuffer via the new chaos mode control plane in the crate cranelift-chaos.

Quick command for the documentation:

cargo doc -p cranelift-chaos --document-private-items --open

Running the fuzz target with chaos mode enabled:

cargo fuzz run --no-default-features --features chaos cranelift-icache

Passing a reference counted chaos engine around is not that bad, the diff is less noisy than I would've expected. I'm still planning to make an equivalent POC with private, global, mutable state in the cranelift-chaos crate to get a better idea of the trade-offs.

Note that because of this zulip topic, I didn't bump the version of arbitrary in this PR to keep those issues isolated. Once that's resolved, we think it's probably a good idea to update arbitrary while we're working with it.

I've added a couple print statements during development, and it seems the branch optimization is more often carried out than skipped. I guess this is consistent with libfuzzer's goal of generating data in a way that code coverage is maximized.

I also ran into a crash while running this fuzz target. The crash happens at cranelift-icache.rs:220:

if expect_cache_hit {
    let after_mutation_result_from_cache = icache::try_finish_recompile(&func, &serialized)
        .expect("recompilation should always work for identity");
    assert_eq!(*after_mutation_result, after_mutation_result_from_cache); // <-- this assert fails

Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar? In any case, I'll investigate to see if the panic is caused by my changes or something different.

Questions

  • I needed to use Arc and Mutex instead of Rc and RefCell in the control plane, because the compiler was complaining about the Send trait not being implemented. So if Cranelift runs in parallel, won't that interfere with our plans with the fuel parameter? If fuel from the chaos engine is requested in a different order every time, we won't be able to deterministically reproduce bugs and pinpoint their origin.
    -> answer: Arc and Mutex must not be used.
  • Did I get the "paperwork" right?
    • version 0.95.0 like other cranelift crates
    • license
    • Cargo.toml
    • ... etc. ?
  • There are a several ChaosEngine::todo()s in the wild. Is it ok to merge these in principle or should we find a different solution for adding the chaos engine everywhere incrementally? -> these have been removed

Todos

  • Add appropriate explanations to the commit messages
  • Investigate crash (Base64: Av////////8AAAIAAAAAAAD5jIyMjAAKAAAAAPHx8fERDgcAAAAAAJkBAAAAAAAAKwBp/5r//wAAAAAAAAAHbS45azEAAAAACF0=)
    -> most likely due to usage of Arc and Mutex
  • Extend existing fuzzing documentation with an overview of chaos mode as well as how to run a target with chaos mode enabled (cargo fuzz run --features chaos $TARGET)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Mar 17, 2023
@afonso360
Copy link
Contributor

afonso360 commented Mar 17, 2023

Maybe someone has an intuition along the lines of: "Oh yes, of course that will fail when branch optimization is randomly skipped", or similar?

I suspect that might be the case. The icache fuzzer tests that our function cache is working properly. It does this by generating a random function, applying a random mutation and then recompiling and checking if the cache key matches or not.

In this case what I suspect might be happening is that It selected that no mutation should be applied and the cache key should be valid. But if we skip branch opts in the recompile and not in the original compilation or vice-versa, then it should panic! (This is a guess, I'm not too familiar with how our caching mechanism works)

For icache it would be nice to reset the chaos engine bytes on the second recompile so that we make the same decisions along the way.

@cfallin
Copy link
Member

cfallin commented Mar 17, 2023

Thanks so much for making a start on this problem -- the infrastructure will surely pay off in a bunch of ways!

I've read over most of this prototype; but before a detailed line-by-line review, I wanted to offer some high-level design feedback. I think it might be related to (or rather, might address) the issue with the icache fuzzer above too.

  • I'd prefer if we actually generalize just a little bit more and call the struct that is threaded through the compilation pipeline the ControlPlane; this abstracts over randomized decisions (chaos mode) but also optimization fuel and maybe user-directed optimization instructions and the like later. So, would you mind renaming both the types and the crate itself (maybe cranelift-control, though bikeshed comments welcome here)?
  • In general, we want to avoid unsafe code as much as possible. The "self-referential struct trick" wherein there's a Vec and then there's an Unstructured that holds a borrow to that Vec's contents, transmuted to erase the lifetime and get a &'static, is simply too unsafe for us to include in Cranelift code...
  • ... fortunately, though, I suspect it's not really necessary. I suspect that much of the trouble comes from trying to hold ownership of the ChaosEngine in various places, rather than threading it through the stack as needed. I suspect it should be possible, in principle to have a ControlPlane such that:
    • We have &mut self methods that ask it for decisions;
    • We pass a &mut ControlPlane into toplevel compilation entry points, and down the stack as necessary;
    • If we need to (e.g. when building the Lower context), we can store a &'a mut ControlPlane, but that 'a is internal to the compiler, and not exposed to the top-level entry point.
    • This ControlPlane should itself not have any lifetime parameters, ideally. I suspect that means we don't actually hold an Unstructured, since it expects to borrow data owned elsewhere; rather for now we can implement our own sort of equivalent, where we keep e.g. a Vec<bool> and take decisions off the back of it. (e.g., self.chaos_decisions.pop().unwrap_or(false) or something similar.)
    • We can then impl Arbitrary for ControlPlane, perhaps just the automatic derivation, and use this when fuzzing: take an arbitrary Function and compile it with an arbitrary ControlPlane.
  • The sharing of the ChaosEngine via an Arc, and the Mutex and such, are another red flag and warning that we're going down a nondeterministic path. I suspect this is a contributing factor to the icache fuzzer issues above. The issue is that it allows different parts of the compiler to pull random choices in some potentially unordered way, potentially across threads, and that gives up reproducibility. Instead, we should have one ControlPlane per function, because that is the unit of compilation in Cranelift. At the fuzzer level, we can easily ask for a Vec<ControlPlane>; that automatically impls Arbitrary if ControlPlane does.

Does that make some sense at least? Basically, I want to push this all toward a more idiomatic Rust ownership model, which I think will have the side-effect of removing nondeterminism and making compilation a true pure function of a "control" input for one function body.

@senekor
Copy link
Contributor Author

senekor commented Mar 17, 2023

Thank you both for the great feedback!

I totally agree that a more idiomatic Rust ownership model would be much better. I remember starting out with mutable references and running into compiler errors about mutable aliasing. I was concerned there could be way too many of these issues while threading the control plane through every code path, and I didn't know how difficult it would be. I probably was scared away too easily. I'll have another go at implementing it that way.

I understand that the use of unsafe for storing an Unstructured is a non-starter. The nice thing about it would've been that we get all the functionality of arbitrary for free and the API of the control plane would be very flexible with little effort. I'm still feeling a bit unsure about taking the manual approach. Not primarily because it would be more work, but because I expect the work would in many cases just be reimplementing Unstructured. And probably even in a worse way: Casually reading the source code of arbitrary, I'm seeing comments like this:

// Take lengths from the end of the data, since the `libFuzzer` folks
// found that this lets fuzzers more efficiently explore the input
// space.

I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?

I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse 🤔

@cfallin
Copy link
Member

cfallin commented Mar 17, 2023

Yeah, that's fair -- I suppose it's not the worst thing in the world to take a ctrl_plane: &mut ControlPlane<'_> everywhere and then keep an Unstructured inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!

@senekor
Copy link
Contributor Author

senekor commented Mar 18, 2023

We have &mut self methods that ask it for decisions

Another question that came up while we tried that at the beginning was about the size of this. A simple test indicates that references to zero-sized types are not optimized away:

struct Foo;
println!("{}", std::mem::size_of::<&Foo>()); // prints 8...

Am I missing something with that? How can we be certain that the performance of release builds is not affected?

@bjorn3
Copy link
Contributor

bjorn3 commented Mar 18, 2023

Instead of passing &mut ControlPlane<'_> you could pass ControlPlane<'_> and add a as_mut() method which takes &mut self and returns another ControlPlane<'_> which can be used to pass ControlPlane<'_> to multiple functions. Then ControlPlane<'_> could be a mutable reference during fuzzing and a ZST when not fuzzing.

@senekor
Copy link
Contributor Author

senekor commented Mar 18, 2023

I think I'm misunderstanding something, the function signature you're describing looks like this, right? (ignoring the lifetimes)

impl ControlPlane {
    fn as_mut(&mut self) -> Self {
        // ?
    }
}

I don't know how this function can be written without reference counting?

@bjorn3
Copy link
Contributor

bjorn3 commented Mar 18, 2023

You did do something like

struct ControlPlane<'a>(&'a mut ControlPlaneInner);

impl ControlPlane<'_> {
    /// Reborrow `ControlPlane`.
    fn as_mut(&mut self) -> ControlPlane<'_> {
        ControlPlane(/*this does an implicit reborrow*/self.0)
        // Equivalent to:
        // ControlPlane(&mut *self.0)
    }
}

And then use it like

fn example(control_plane: ControlPlane<'_>) {
    foo(control_plane.as_mut());
    bar(control_plane.as_mut());
    baz(control_plane);
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5cde2151f4f2b1c6a9ad5ea959b2f5f5

Comment on lines 8 to 15
struct ChaosEngineData {
/// # Safety
///
/// This field must never be moved from, as it is referenced by
/// the field `unstructured` for its entire lifetime.
///
/// This pattern is the ["self-referential" type](
/// https://morestina.net/blog/1868/self-referential-types-for-fun-and-profit)
Copy link
Contributor

@iximeow iximeow Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a safety perspective, the other extremely important detail of this trick is that ChaosEngineData must also never move once references to engine.data are taken. so the "this must not move"-ness of data kind of percolates through to any enclosing type until it's somewhere that won't move (which works out here because ChaosEngineData ends up owned by an Arc where it oughtn't be moved out of.

as an example that certainly won't come up here but would be Technically Possible, Arc::new(some_chaos_engine.data.try_unwrap().expect("that there is one reference in this example")) would yield a ChaosEngineData whose unstructured points to somewhere else, and would (hopefully! :D) fault on use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! Does it also apply if some type is heap allocated? I think the article I got this from used a Box to create a level of indirection. The idea being that if the Box itself is moved, the values on the heap won't. So any existing references into that heap allocation would still be valid. In this case, the Vec is supposed to serves the same purpose as the Box in the article.

That being said, I just noticed that I got the order of the fields wrong, which the article warns against. data will be dropped before unstructured, which creates a dangling pointer and UB.(?) oops 😄 I definitely prefer a safe solution as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaand reading a bit further, I also forgot the thing about AliasableBox so you're definitely right, moving data would also be UB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's the part that makes the blog post's solution a little more robust to moves - a &AliasableBox<ZipArchive<File>> could be made to dangle, but with private internals you can ensure that wouldn't happen. anyway, hopefully threading a &mut ControlPlane through the compiler as appropriate lets you avoid the whole construction, and double-hopefully the extra arguments don't affect compile time all that much :)

@cfallin
Copy link
Member

cfallin commented Mar 18, 2023

I'm sure we would miss such fuzzing specific optimizations if we reimplemented it ourselves, degrading the efficiency of our fuzz testing. On the other hand, we could just liberally copy-paste from the source code of arbitrary whatever we need, only adapting it so it works on an owned vector of bytes instead of a slice with some lifetime?

I'm just thinking it's a little sad that a small thing like a lifetime would stand in the way of such a good opportunity of code reuse 🤔

Yeah, that's fair -- I suppose it's not the worst thing in the world to take a ctrl_plane: &mut ControlPlane<'_> everywhere and then keep an Unstructured inside. Ideally then the way we cut off lifetime proliferation is to be strict about passing this down the stack only, and not storing it in structs (requiring an extra lifetime to that struct). A prototype of this option would probably tell us pretty quickly how tenable that is!

I let this bounce around in my head a bit more and I think I'm coming back to my original position: it's probably better not to carry the fuzzing-specific Unstructured everywhere, and instead build an abstraction around it. I got to this position by starting with a separation-of-concerns mindset but I think it has other nice properties.

I think I want the design to follow these principles:

  • Compilation output of some CLIF is a function of that CLIF, compiler settings, and a ControlPlane composed of Rust primitives (bools, later a fuel counter, etc);
  • The fuzzing infrastructure can create these ControlPlanes with an Arbitrary impl.

This has a few nice properties:

  • It avoids lifetime proliferation.
  • It allows us to more naturally handle the multiple-function case: it's much easier to ask a top-level Unstructured for a Vec<bool> per function than it is to somehow try to split the random input into chunks.
  • It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.
  • It means that we don't have to propagate "not enough data" errors upward from Unstructured.
  • It means that we could later drive the mechanism with some other source of decisions; we're not tied to arbitrary.

Basically, the compiler's core is too late for construction of structured data from random bytes; we should build an input for the compiler that is just plain (structured) data. That leads to less friction with Rust's ownership model as well as more determinism and control.

IMHO the "reimplement Arbitrary" concern is somewhat smaller in comparison: we shouldn't have to implement any of the detailed logic but rather the "pop a bool off a Vec" approach, and likewise for other data types we need. It's true that the fuzzer may give us only N bools in our Vec<bool>, and a particular compilation path asks for M > N bools (the last M - N of which are just defaulted to false); but libFuzzer is generally good about learning the structure of data that leads to better coverage.

Finally, a thought on zero-sized types: I think it will be fine to take the cost of an extra parameter &mut ControlPlane; as long as its methods in a chaos-disabled build return constants, and the ControlPlane itself is zero-sized, my intuition is that this will be minimal. (Basically, we're not injecting new paths into any hot loops, we're just adding one live usize-sized value passed into a few functions per compilation.) If it turns out to be measurable then we can definitely revisit but let's not worry much until we see that impact, IMHO.

@senekor senekor force-pushed the fuzz-skip-branch-opt branch 3 times, most recently from 2de2de3 to be761cf Compare March 19, 2023 09:38
@senekor
Copy link
Contributor Author

senekor commented Mar 19, 2023

The rename seems uncontroversial, so I did that right on this branch.

  • crate cranelift-chaos renamed -> cranelift-control
  • ChaosEngine renamed -> ControlPlane
  • should the feature also be renamed from chaos to control? I left it as is, assuming that future potential features of the control plane outside the scope of chaos mode might be implemented as different compilation features.

From my understanding, there are two major, orthogonal problems with the architecture. I think it's best to branch off from here so we can evaluate the possible approaches separately. I will create a draft PR for each approach so we have a basis for comparison.


Usage of unsafe vs. lifetime proliferation to make use of Unstructured

I agree with the feedback that unsafe must be avoided and that spreading &'a mut ControlPlane everywhere is also a bad idea.

approaches:

Edit: Currently, the internal representation is a Vec<bool> without using Unstructured. When it is useful to do so, we can use it in the future, internally, and without leaking any lifetimes. As an implementation detail, it's easy to change.


Non-deterministic order of perturbations because of multi-threading combined with Arc<Mutex<_>>

approaches:

  • Replace with Rc<RefCell<_>>, the compiler will then prevent control planes from being shared across threads.
  • Propagate &mut ControlPlane through the call stack instead of owned ControlPlanes. (use bjorn3's pattern to make the reference zero-sized)

Maybe this question can be discussed already, before I'm done with the POCs: Is there a downside to using internal mutability when multi-threading is prevented with Rc<RefCell<_>>? What is the benefit of regular mutable references? Off the top of my head, the downsides of internal mutability are in general:

  • It is error prone because users of &Foo may erroneously assume the state of Foo not to change.
  • It is less performant, because ownership rules are checked at runtime.

In our case, I would say these two do not apply: users of ControlPlane shouldn't be concerned with its internal state anyway and the small performance hit during fuzz testing shouldn't matter IMO. If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

It allows us to more naturally integrate into the cache infrastructure: compilation is a property of this ControlPlane, which is a piece of data that can impl Hash and Eq like anything else.

This may be an issue though, as Hash and Eq cannot be derived on Rc and RefCell. I am somewhat confident that we can write a straight forward manual implementation for those without much trouble, but I'll have to show that in the POC.

@cfallin
Copy link
Member

cfallin commented Mar 19, 2023

I would prefer if we went with a &mut ControlPlane as well, rather than subverting the ownership system with a Rc<RefCell<ControlPlane>>. Given the uncertainty here, I want to understand a bit more: are you proposing the alternative because you'd rather not have to pass an extra function parameter? Or for some other reason?

In addition to the disadvantages you named, using a RefCell will actually have a measurable impact on compile time, I suspect: every query of the control plane in an inner loop will require a dynamically-checked borrow. This alone is reason not to do it IMHO. There's also the fact that if stored anywhere, it would make data structures !Send which is problematic for our internal compilation parallelism.

But the other argument I would make is that subverting the Rust ownership model should not be a "why not" sort of discussion, but should be a "why is this the only possibility" sort of discussion. The alternative proposed here is "just pass a &mut around" and I don't really see the downsides. In particular, this statement

If single-threaded internal mutability works just as well for our purposes, I would prefer it over mutable references because of the better developer experience. It would be regrettable if issues with mutable references ever get in the way of regular Cranelift development.

is somewhat perplexing and is exactly the opposite of my experience with building large systems with Rust. Internal mutability is a "cheat code" that arises because of unavoidable pressure from the outside. The Rc<RefCell<T>> pattern is a bit of a crutch, in that it emulates how one can write code in other languages with less restrictive sharing of aliases; as a result, it is sort of a way to give up on thinking through the ownership model, and can result in unexpected logic bugs, as we've seen here already with nondeterminism. I would expect that, contrary to such a context parameter "getting in the way", this internal-mutability trick would get in the way by violating expectations and leading to all sorts of issues.

The "better developer experience" bit I would question specifically: what downsides are we avoiding by not passing a &mut context parameter around? Just that (the need for the parameter)? Or a perceived difficulty with &mut references in general? Or something else?

Anyway, given all that, I would really strongly prefer the suggestions I gave above: a &mut ControlPlane, passed as a normal function parameter; no internal mutability or other trickery, only idiomatic Rust; and for now let's not try to make the borrow itself zero-sized, because that adds complexity and I suspect won't matter much.

@iximeow
Copy link
Contributor

iximeow commented Mar 19, 2023

just in the interest of being explicit: i think part of the feedback here is, if there is an overhead for passing &mut ControlPlane around, it is overhead Cranelift would need to deal with at some point in the future anyway, for other reasons [like user-directed opt hints or optimization fuel], so that definitely won't be held against your MVP here. doing the simple thing with &mut ControlPlane is Probably Fine, and the relative benefit of having Chaos mode in the compiler almost certainly outweighs a small compile time regression if one is measurable.

(also, i think the trick about holding a &'a mut ControlPlaneInner actually ensures that ControlPlane is always a usize - a ref of a zero-size type is still a reference and still a usize wide. this example might help show which changes are actually resulting in optimizations: rustc deletes the ControlPlane argument when it's not needed, even if it is non-zero-size, but it turns out that if foo/bar/baz are left non-side-effectful they aren't useful to demonstrate the calling convention changes anyway!

this happens to make me think that passing around a &mut ControlPlane promises to have no cost - rather than "not enough to care" cost - in the chaos mode is disabled case after all :D)

@senekor senekor force-pushed the fuzz-skip-branch-opt branch from be761cf to 727982e Compare March 22, 2023 14:11
@senekor
Copy link
Contributor Author

senekor commented Mar 23, 2023

The approach with the mutable references is now working, thanks mostly to @MzrW. For now we made sure the fuzz target cranelift-fuzzgen compiles (with feature chaos) and works to validate the architecture.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable so far -- thanks for all of the efforts!

Some comments from our discussion just now; also, in the fuzz testcase toplevel, I think it might be slightly clearer to have TestCase::functions be a Vec<(Function, ControlPlane)>. You may need to clone the control plane or take ownership / destruct the TestCase to get a mut ControlPlane but that should be a reasonable refactor I think.


for inst in mem_insts.into_iter() {
inst.emit(&[], sink, emit_info, state);
inst.emit(&[], sink, emit_info, state, ctrl_plane);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctrl_plane can probably go inside the state (EmitState)?

The issue with lifetimes that this would otherwise create (&mut ControlPlane inside of the struct) can be resolved I think by std::mem::move to take ownership of the control plane temporarily in places where we emit.

pub want_disasm: bool,

/// TODO chaos: is this the right location to hold ownership?
pub ctrl_plane: ControlPlane,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better to pass in the control-plane state with each call to compile; the CompilerContext is otherwise not that semantically meaningful (meant to enable reuse).

if let Some(true) = ctrl_plane.get_decision() {
println!("");
println!("");
println!("branch optimizations skipped");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(remove debugging printlns before merging)

/// The control plane of chaos mode.
/// Please see the [crate-level documentation](crate).
///
/// **Clone liberally!** The chaos engine is reference counted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated comment?

// backtrace::Backtrace::force_capture()
//);
//None
panic!("trying to get a decision from a noop chaos engine");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this is_noop mechanism before merging.

}

/// TODO chaos: should be explained
pub fn no_chaos() -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this body makes sense as the Default impl (an empty ControlPlane should have no affect on Cranelift's behavior as it is today -- this also implies how to use bools, i.e. false should make no change).

}

impl ControlPlane {
pub fn get_decision(&mut self) -> Option<bool> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would return just bool (use .unwrap_or(false) on the pop).

senekor and others added 6 commits April 3, 2023 17:08
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
…ack by reference

Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Remo Senekowitsch <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
@senekor senekor requested a review from cfallin April 3, 2023 18:08
@senekor senekor marked this pull request as ready for review April 3, 2023 18:20
@senekor senekor requested review from a team as code owners April 3, 2023 18:20
@senekor senekor requested review from elliottt and pchickey and removed request for a team April 3, 2023 18:20
@elliottt elliottt removed their request for review April 3, 2023 22:18
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things below -- we're almost there! Thanks for your patience here!

/// mutable reference to it may be used to:
/// - move out of it, e.g. using [std::mem::swap]
/// - access that control plane temporarily
fn get_ctrl_plane(&mut self) -> &mut ControlPlane;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • Usually a mut-accessor will be named like fn ctrl_plane_mut(&mut self) -> ...
  • Let's have a different one too, fn take_ctrl_plane(self), that consumes the emit-state and gives us back the control-plane state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn take_ctrl_plane(self)

I think that actually caught a mistake. I was taking the control plane out of the emission state inside a loop, where later loop iterations would use the state with a now-empty control plane.

/// The default value `false` will always be returned if the
/// pseudo-random data is exhausted or the control plane was constructed
/// with `default`.
pub fn get_decision(&mut self) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small tweak to the conditional-compilation strategy: I had been thinking that we could have the methods that produce decisions, like get_decision here, return a default value (false here) as a constant in the non-chaos-feature case; then the sites where we use these decisions, like in MachBuffer, don't require annotation with conditional compilation. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about the potential performance impact in release builds, but I guess it's safe to assume the compiler inlines a constant false and removes the resulting if false {}.

In my view, it would be a nice aspect of the control plane that there is no way to (mis-)use it in regular builds. But I guess that every control plane API needs to have some default output value anyway... and that can probably always be inlined as well? And we can annotate these default-returning functions with #[inline].

What is the downside of conditional compilation at the call sites? It seemed like an easy way to be really, really sure nothing bad happens in release builds, but on second thought, it doesn't seem necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we should be able to trust the branch-folding here.

What is the downside of conditional compilation at the call sites?

The main downside is that it spreads the implementation of a conditional decision across distributed points -- the alternative, where everything is wired to a single module where all conditional-compilation logic lies, makes it easier to make changes in the future. (Another example of this principle in action is the memfd pooling-allocator mechanism in Wasmtime: when I implemented this in #3697 last year I originally had feature-conditional code in many places, but Alex convinced me to centralize everything into two versions of one module and remove conditionals everywhere else. The result is far cleaner!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this goes for the Arbitrary implementation as well, so I removed the conditional compilation here too.

The shim control plane's Arbitrary implementation now returns the default without consuming any bytes.

senekor and others added 2 commits April 4, 2023 20:38
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
Also cleanup a few straggling dependencies on cranelift-control
that aren't needed anymore.

Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience -- this all looks good now, and I think is a solid base for the remainder of the chaos-mode work.

@cfallin
Copy link
Member

cfallin commented Apr 4, 2023

As a timing note, I'm going to wait to merge this until tomorrow, after our next release's beta branch is cut; I want it to bake on main a little more than the two-week minimum with the beta-to-release delay (and want to give a chance to notice if we have any unexpected regressions, etc).

@cfallin cfallin added this pull request to the merge queue Apr 5, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 5, 2023
@cfallin
Copy link
Member

cfallin commented Apr 5, 2023

@remlse it looks like we'll need to add the new crate to a list in scripts/publish.rs; this was only caught in the full CI run in the merge queue. (If you want to test in your commit to fix this, you can add prtest:full to the commit message.)

&mut self,
func: FuncId,
ctx: &mut Context,
ctrl_plane: &mut ControlPlane,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think define_function should get this argument. If you need this fine control you should probably use define_function_bytes instead. This doesn't work with a module that serializes functions rather than immediately compiles them and it is confusing for most users of cranelift.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the place we actually needed that argument. So that would have to be rewritten with define_function_bytes. The module there is a JITModule and its define_function and define_function_bytes methods are not trivial, so it's not obvious to me how to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cranelift-object implements define_function as ctx.compile_and_emit(self.isa(), &mut code, ctrl_plane) followed by define_function_bytes. You could do the same in TestFileCompiler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjorn3 in general the approach we've been taking is to thread through the control-plane everywhere compilation can be invoked; conceptually it's now another input along with the CLIF. (It does have a Default implementation.) If there's a way to rename this variant to a third option, and then have a variant that uses a default control plane, we can perhaps do that. Would you be willing to do that in a followup PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a usability perspective having another method would work. But when serializing rather than compiling, a ControlPlane argument doesn't really make any sense as you can't serialize ControlPlane. (I have local changes to make a serializing Module which I want to upstream. I'm using it to allow using cranelift-interpreter in cg_clif with minimal changes to cg_clif.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why serialization of modules implies the need to serialize a ControlPlane -- it is given as an argument, it isn't stored -- but please do create an issue or PR with a fix if you have one in mind. In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the passed in ControlPlane should affect the eventual compilation of the function, it did need to be serialized. If not, there it doesn't really make much sense to pass in ControlPlane.

In the meantime I'll go ahead and merge this PR (which has been under review for a while and we have general consensus on).

👍

prtest:full

Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
@senekor senekor force-pushed the fuzz-skip-branch-opt branch from 0fe0d00 to bc76cae Compare April 5, 2023 17:44
senekor and others added 2 commits April 5, 2023 20:05
Co-authored-by: Falk Zwimpfer <[email protected]>
Co-authored-by: Moritz Waser <[email protected]>
The conflict was that all the versions were bumped from 0.95 to 0.96.
I changed the version of cranelift-control accordingly.
@senekor
Copy link
Contributor Author

senekor commented Apr 5, 2023

@cfallin the full tests are passing now, maybe we can try to merge again?

@cfallin cfallin added this pull request to the merge queue Apr 5, 2023
Merged via the queue into bytecodealliance:main with commit 7eb8914 Apr 5, 2023
@senekor senekor deleted the fuzz-skip-branch-opt branch April 16, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants