zeroize: replace atomic_fence with optimization_barrier
#1252
base: master
Conversation
I really don't understand the motivation for this or why you're making it a public function. Something like this seems like a potential replacement for the existing `atomic_fence`. Can you start with a demonstration of a problem in the existing implementation, then show what this solves? Making it public seems like an implementation detail leaking out of the API.
You can see a motivation example in the function docs. We can not "zeroize" types like `BlockRng` from `rand_core`, since we do not control their internals. I encountered a need for this function when working on buffer zeroization for block RNGs (see RustCrypto/stream-ciphers#491).

Yes. I plan to do it in a separate PR.
I'm not sure I understand the issue in RustCrypto/stream-ciphers#491 or what it's even trying to do... zeroize the entire keystream it generates?
The current `BlockRng` API does not give us any way to reliably zeroize its results buffer, so downstream crates have to resort to a hack. With the `observe` function we could write:

```rust
// in rand_core
impl<R: Generator> BlockRng<R> {
    pub fn zeroize(&mut self) {
        self.results = Default::default();
    }
}

// in chacha20
struct ChachaRng(BlockRng<ChaChaBlockRng>);

impl Drop for ChachaRng {
    fn drop(&mut self) {
        self.0.zeroize();
        zeroize::observe(self);
    }
}
```
I'm still not sure I follow... what calls the `zeroize` method in this scenario? Where exactly is the "hack"?
How is...

```rust
impl Drop for ChachaRng {
    fn drop(&mut self) {
        self.0.zeroize();
        zeroize::observe(self);
    }
}
```

...any different from...

```rust
impl Drop for ChachaRng {
    fn drop(&mut self) {
        self.0.zeroize();
    }
}
```
The difference is what the compiler is allowed to do with the writes. In the second variant `self.0.zeroize()` just assigns `Default::default()` to the results buffer; since the value is dropped immediately afterwards, the compiler is free to treat those writes as dead stores and remove them entirely.

Note that `zeroize::observe` acts as an optimization barrier: it forces the compiler to assume that the zeroed memory may still be read, so the preceding writes can not be optimized out.
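For concreteness, such a barrier can be sketched as an empty `asm!` block on targets where inline assembly is stable (x86_64 shown here; the name matches the PR's eventual `optimization_barrier`, but the exact signature and options are illustrative, not the final API):

```rust
/// Claim, via an empty inline-asm block, to read the memory behind `val`,
/// so the optimizer must assume earlier writes to it are observed and
/// cannot remove them as dead stores (best effort, no hard guarantee).
#[cfg(target_arch = "x86_64")]
pub fn optimization_barrier<T>(val: &T) {
    unsafe {
        core::arch::asm!(
            "# {0}", // expands to an assembly comment; no instructions emitted
            in(reg) val as *const T,
            options(readonly, preserves_flags, nostack)
        );
    }
}
```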
So the problem is that the writes performed by `BlockRng::zeroize` can be optimized away, because the compiler sees that the value is about to be dropped?
In a certain (uncharitable) sense, yes. But while types implementing `Zeroize` rely on volatile writes to prevent that, we can not do the same for foreign types like `BlockRng` whose internals we do not control, so an explicit barrier is the only option left.
Perhaps. I do still think something like this is better handled internally within `rand_core` itself, e.g. by having `BlockRng` zeroize its buffer on drop (possibly behind a crate feature).
Not everyone needs RNG buffer zeroization. Introducing a crate feature for it is also not desirable. I think that zeroization should be handled on the user side, by the crates which actually deal with secret data.

I don't understand why you are against exposing the function. It is a simple safe function which can not be misused (as opposed to lower-level `unsafe` alternatives).
I'm a bit worried about the notion that there's a user pulling in one crate which is expected to do some otherwise insecure zeroing, then separately pulling in `zeroize` just to make that zeroing stick.

Perhaps for a start you could use this technique as an internal replacement for `atomic_fence`?
Yes, it's unfortunate, but sometimes there is just no other way. Potentially unreliable erasure is better than no erasure at all. And it's much better than risking UB with `unsafe` workarounds.

Also note my point about efficiency:

```rust
struct Foo {
    a: [u8; 32],
    b: [u64; 16],
    c: u64,
}

impl Drop for Foo {
    fn drop(&mut self) {
        // This impl is more efficient
        self.a = [0; 32];
        self.b = [0u64; 16];
        self.c = 0;
        zeroize::observe(self);
        // than this impl
        self.a.zeroize();
        self.b.zeroize();
        self.c.zeroize();
    }
}
```

See https://rust.godbolt.org/z/K7hxoePKE
Personally, I don't see much point in it, but sure.
The existing performance problems are already noted in #743. If you replaced the current internals with this approach, we could evaluate what it actually buys us there.
Well, IIUC even `asm!`-based barriers do not give hard guarantees: AFAIK nothing explicitly forbids the compiler from analyzing the generated assembly.

cc @RalfJung
(PR title changed: "observe function" → "zeroize: replace atomic_fence with observe" → "zeroize: replace atomic_fence with optimization_barrier".)
Do you have more information about that? I wasn't aware compilers would change codegen based on an analysis of the ASM.
I did not mean that existing compilers do it, but that AFAIK it's not explicitly forbidden. For example, an advanced LTO pass could in theory apply optimizations directly to generated assembly, thus removing "useless" writes to a stack frame which immediately gets released. I tagged Ralf in the hope that he would clarify this point in case I am mistaken.
The details around what the compiler may and may not do with `asm!` blocks and `black_box` are subtle, and I can't give you firm answers there.
At the same time, the entire concept of zeroize is to try to do something that can't reliably be done in the Rust AM. From a formal opsem perspective, reliable zeroizing is impossible. (I wish it were different, but sadly that's where we are. Even if we wanted, we couldn't do much about that in rustc until LLVM has proper support for this.) Best-effort zeroizing is a lot more about what compilers and optimizers actually do than about their specifications, and I'm the wrong person to ask about that.
@RalfJung what term would you use for this sort of tactical defense against code elimination that, at some future date, could be subject to another round of cat-and-mouse when the compiler outsmarts it? |
I still think the best thing we could do with `asm!` is to write the actual memset/bzero routine in assembly, per architecture, so we aren't relying on the compiler to keep Rust-level writes around.
> From a formal opsem perspective, reliable zeroizing is impossible.

If we are to trust the compiler to not mess with `asm!` blocks, doesn't that give us a reliable way to zeroize memory in practice?
The paragraph you quoted was talking about reliable / sound reasoning principles.
I think we only need one implementation of memset/bzero per architecture. The implementation of zeroize itself is already abstracted such that there are two places to actually invoke it from: `volatile_write` and `volatile_set`.
I’ve heard “optimization barrier” used for this by several people, though if that sounds like it has a property this isn’t providing, perhaps something like “optimization impediment” is clearer?
I only see disadvantages when compared to the observation approach. We don't win anything in terms of language guarantees, while by making the zeroization routine a general assembly-written memset you prevent the compiler from applying optimizations which we want, such as unrolling loops, reusing zeroed registers, or SIMD-ifying code when appropriate. And it is obviously much harder to write correctly than slapping the same empty `asm!` block after ordinary writes.
That sounds like it may be in conflict with self-modifying code which needs the asm blocks to be unchanged. OTOH something like BOLT could do all sorts of "fun" things and we'd probably say it's fine, it's just incompatible with self-modifying code. But you're asking about things way outside of what I can confidently talk about. I suggest you bring this to the t-opsem Zulip; there are other people there that know more about these low-level things.
I always interpreted "barrier" as something that actually guarantees certain reorderings don't happen. So yeah, I think a term that makes the best-effort nature more clear would be preferable.
By using volatile writes we get a guarantee that the writes actually happen. We get somewhat similar guarantees out of an `asm!` barrier. If we remove the volatile writes to improve performance, I don't think we really have any guarantees at that point, just something future compiler versions are unlikely to see through. Maybe that's fine; as I said earlier, it seems to be fine for many e.g. libc implementations and their `explicit_bzero`-style functions.
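For reference, the existing defense being discussed is roughly the following shape (a simplified sketch combining the crate's `volatile_set` and `atomic_fence` helpers, not the exact `zeroize` source; the real crate may use `fence` or `compiler_fence`):

```rust
use core::{ptr, sync::atomic};

/// The `atomic_fence` in the PR title is approximately this: a SeqCst
/// fence intended to keep the zeroing writes ordered with later code.
fn atomic_fence() {
    atomic::fence(atomic::Ordering::SeqCst);
}

/// Zero `count` bytes at `dst` using per-byte volatile writes. Volatile
/// stores may not be elided by the compiler, which is the guarantee the
/// crate has historically relied on.
unsafe fn volatile_set(dst: *mut u8, val: u8, count: usize) {
    for i in 0..count {
        // SAFETY: the caller guarantees `dst..dst + count` is valid for writes.
        unsafe { ptr::write_volatile(dst.add(i), val) };
    }
    atomic_fence();
}
```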
Last time I asked about LTO I was told LTO would not touch inline `asm!` blocks.
My point was that the same hypothetical also fully applies to volatile writes and `asm!` blocks alike: in both cases we rely on the toolchain not doing such things, not on a guarantee.
IIUC the libc implementations rely on dynamic linking to act as an optimization "impediment" (in other words, on keeping the functions "external" and thus out of the optimizer's reach). We probably do not want to rely on that by default.
The major difference is you're still relying on Rust code to do the actual writing to memory, and I would be a lot more worried about that code being eliminated than I would about anything inside the `asm!` block itself.
@newpavlov I think the most common way to implement them is with an ASM barrier immediately after calling the normal `memset`.

Sidebar: we could potentially use those libc functions (#1254)
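For illustration, that pattern translated to Rust looks roughly like this (a sketch assuming a target with stable `asm!`; `secure_zero` is just an illustrative name, not an API from this PR or from libc):

```rust
use core::ptr;

/// Best-effort secure zero in the `explicit_bzero` style: ordinary
/// (optimizable) writes, followed by an empty asm block that claims to
/// read memory, so the zeroing cannot be treated as dead stores.
#[cfg(target_arch = "x86_64")]
pub unsafe fn secure_zero(dst: *mut u8, len: usize) {
    // SAFETY: the caller guarantees `dst..dst + len` is valid for writes.
    unsafe { ptr::write_bytes(dst, 0, len) };
    unsafe {
        core::arch::asm!(
            "# {0} {1}", // assembly comment only; no instructions emitted
            in(reg) dst,
            in(reg) len,
            options(readonly, preserves_flags, nostack)
        );
    }
}
```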
If we are to trust the black box model of `asm!`, then the compiler can not eliminate the preceding Rust-level writes either, since it has to assume the barrier may read the memory they touched.

Technically, IIUC it's allowed for the compiler to allocate a similarly sized buffer on the stack, zero it out, and then pass a pointer to it. But the same hypothetical applies to "secure" memsets as well (i.e. the compiler may call them on a copy of the zeroized object). But I believe we are in the realm of wild speculations at this point.
The reviewed hunk (the PR replaces the volatile writes with plain writes plus the new barrier):

```diff
-        unsafe {
-            volatile_set((self as *mut Self).cast::<u8>(), 0, size_of::<Self>());
-        }
+        unsafe { ptr::write_bytes(self, 0, 1) };
+        optimization_barrier(self);
```
I would prefer the removal of volatile writes be done in a separate PR, so this PR does what it says on the label: replaces atomic_fence with optimization_barrier.
The removal of volatile writes removes the longstanding defense this crate has been built on in the past, and I think that much deserves to be in the commit message (and I would also prefer it be spelled out nicely, and not jammed in like "replace atomic_fence with optimization_barrier and remove volatile writes").
Also I think we should probably leave the volatile writes in place unless we're using asm!. Otherwise we're relying on black_box as the only line of defense.
> I would prefer the removal of volatile writes be done in a separate PR

Ok. Will do it tomorrow.

> Otherwise we're relying on black_box as the only line of defense.

Well, AFAIK it works without issues on the mainline compiler. Yes, we don't have any hard guarantees for it, but, as we discussed above, that applies to the asm! approach as well. IIRC black_box gets ignored on an alternative compiler (I forgot the name), but I don't think it supports the target arches in question in the first place, and I am not sure how it handles volatile ops.
We could discuss potential hardening approaches for black_box, but I would strongly prefer to concentrate on optimization_barrier doing what we want instead of piling different hacks on top of each other. In other words, I want the snippet from my earlier comment to be worthy of use in practice.
I think it's unacceptable for black_box to be the only line of defense
> I would strongly prefer to concentrate on optimization_barrier doing what we want instead of piling different hacks on top of each other.
I also think this is a gross overstatement of the situation. This PR does not notably make the codebase smaller. What it does do is delete a lot of the SAFETY comments that describe the strategy the library uses.
Had you just changed volatile_write and volatile_set to call ptr::write_bytes, the PR itself could've included considerably fewer changes. They exist so there's a single consistent strategy used throughout the library, and as a place to document how that strategy works.
Adding a little bit of compile-time gating to implement those functions in terms of either volatile or non-volatile writes, depending on whether an asm! barrier is in place, is neither particularly complicated nor "piling different hacks on top of each other".
It's a question of "we get the guarantee from a volatile write" vs "we get the guarantee from an asm! optimization barrier" (which is, itself, perhaps overstating the situation). black_box has no guarantees. You would need to delete all the documentation that says this library has guarantees, and that's a change I don't want to make.
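A rough sketch of the gating described here (the arch list is only an example of targets where `asm!` is stable; the real cfg strategy and function visibility are up to the crate):

```rust
use core::ptr;

// With an asm!-based barrier in place after zeroization, plain
// (optimizable) writes are enough.
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
pub(crate) unsafe fn volatile_set(dst: *mut u8, val: u8, count: usize) {
    // SAFETY: the caller guarantees `dst..dst + count` is valid for writes.
    unsafe { ptr::write_bytes(dst, val, count) };
}

// Without a barrier, keep the historical per-element volatile writes.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub(crate) unsafe fn volatile_set(dst: *mut u8, val: u8, count: usize) {
    for i in 0..count {
        // SAFETY: as above.
        unsafe { ptr::write_volatile(dst.add(i), val) };
    }
}
```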
As I said, we could look into ways to harden black_box. For example, see here. We also could write:

```rust
pub fn observe<R>(val: &R) {
    let _ = unsafe { core::ptr::read_volatile(val) };
}
```

But, as expected, it results in a pretty atrocious codegen. We could improve it a bit by casting the reference to `[usize; N]` if size allows, but the `#[inline(never)]` solution looks better to me. Finally, some of the leftover targets support asm! on Nightly, so we could add an unstable feature for them.
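For completeness, the `#[inline(never)]` fallback mentioned here could look roughly like this (a sketch only; as discussed above, `black_box` documents no guarantees, so this is best-effort hardening for targets without stable `asm!`, not a proof):

```rust
/// Fallback barrier: `#[inline(never)]` hides the body from callers'
/// optimization passes, and `black_box` asks the compiler to treat the
/// reference as used. Neither provides a hard guarantee.
#[inline(never)]
pub fn optimization_barrier<T: ?Sized>(val: &T) {
    core::hint::black_box(val);
}
```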
FWIW, I agree -- `black_box` is explicitly documented as making no guarantees, so it should not be the only line of defense here.
@RalfJung What do you think about the following two variants?

```rust
pub struct Foo {
    a: u64,
}

// Variant 1: plain Rust write followed by an empty asm! barrier
impl Drop for Foo {
    fn drop(&mut self) {
        self.a = 0;
        unsafe {
            core::arch::asm!(
                "# {}",
                in(reg) &self.a,
                options(readonly, preserves_flags, nostack),
            );
        }
    }
}

// Variant 2 (alternative to Variant 1): the write itself is done inside asm!
impl Drop for Foo {
    fn drop(&mut self) {
        unsafe {
            core::arch::asm!(
                "mov qword ptr [{}], 0",
                in(reg) &mut self.a,
                options(preserves_flags, nostack),
            );
        }
    }
}
```

Assuming that the compiler is forbidden from analyzing the `asm!` template string, do both variants provide equivalent guarantees, or is one of them preferable?
As I said above, that's outside my expertise and I recommend asking on the t-opsem Zulip. At least to a first-order approximation, I'd prefer the 2nd variant, as that guarantees that the actual instructions you wrote there are in the final binary, so you don't even have to begin reasoning about what exactly the compiler may or may not do.