Skip to content

[WIP] Enable guest tracing #695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Conversation

dblnz
Copy link
Contributor

@dblnz dblnz commented Jul 7, 2025

Description

This PR updates the work @syntactically has done in #103 to latest main and extends it to support tracing the guest for performance.
This effort is part of the #669 issue.
Improvements on the trace file output will be addressed in #704 .

How it works

It defines a TraceBuffer in the guest that holds TraceRecords. These records are only sent to the host when the buffer gets full, or an exit (Out/Halt instructions) is anticipated.
The records are sent to the host using an Out instruction and providing the pointer to an array of TraceRecords.
The host accesses the guest memory, reads these records and writes them to the trace file.

The guest uses the Invariant TSC and rdtsc instruction to read the clock cycles in the guest and calculate the timestamp on the host.
The timestamp is relative to when the TraceInfo started (before creating the Hypervisor Driver in this case).

Two additional crates define the guest logic for tracing. One defines some macros meant to make the annotation of functions simpler to the user

#[hyperlight_guest_tracing_macro::trace_function]
fn function_to_trace() {}

Note: The data processing could be improved (maybe OTEL), I've just used what we had at hand.

Results

Below are some results I have generated based on our hello-world example in the hyperlight repository.

1. My host machine on WSL2 running KVM on Intel(R) Core(TM) Ultra 7 165H

Has Invariant TSC

[338µs    ] BLAKE3 hash of binary is 05e1c8a51d535507af833a5f4e5e4033a42f4e395cdba4cfe7453dcffe3ea76d
[58.197ms ] > entrypoint
[58.294ms ]   > load_gdt
[58.314ms ]   < load_gdt
[58.314ms ]   > load_idt
[58.328ms ]     > init_idt
[58.356ms ]     < init_idt
[58.356ms ]   < load_idt
[58.44ms  ]   hyperlight_main
[58.574ms ] > halt
[59.385ms ] > dispatch_function
[59.386ms ]   > internal_dispatch_function
[59.408ms ]     > try_pop_shared_input_data_into
[59.423ms ]       Start converting buffer
[59.566ms ]       Finish converting buffer
[59.567ms ]     < try_pop_shared_input_data_into
[59.567ms ]     > call_guest_function
[59.567ms ]       Calling guest function
[59.628ms ]       > call_host_function
[59.628ms ]         > call_host_function_without_returning_result
[59.671ms ]           > push_shared_output_data
[59.683ms ]             Start copy of data
[59.683ms ]             Finish copy of data
[59.683ms ]           < push_shared_output_data
[59.683ms ]           > out32
[60.936ms ]           < out32
[60.936ms ]         < call_host_function_without_returning_result
[60.937ms ]         > get_host_return_value
[61.002ms ]           > try_pop_shared_input_data_into
[61.002ms ]             Start converting buffer
[61.046ms ]             Finish converting buffer
[61.046ms ]           < try_pop_shared_input_data_into
[61.046ms ]         < get_host_return_value
[61.046ms ]       < call_host_function
[61.05ms  ]     < call_guest_function
[61.05ms  ]     > push_shared_output_data
[61.05ms  ]       Start copy of data
[61.05ms  ]       Finish copy of data
[61.05ms  ]     < push_shared_output_data
[61.05ms  ]   < internal_dispatch_function
[61.051ms ] > halt
2. Azure VM running mshv3 on AMD EPYC 7763 64-Core Processor Does not report an invariant TSC. I am currently investigating if I can get an Azure VM that reports supporting an invariant TSC.
[67µs     ] BLAKE3 hash of binary is 56561e9811e4a1907f2883cf34345cab6e48ad444201d8a9678860f46749dd41
[941µs    ] > entrypoint
[943µs    ]   > load_gdt
[943µs    ]   < load_gdt
[943µs    ]   > load_idt
[943µs    ]     > init_idt
[944µs    ]     < init_idt
[944µs    ]   < load_idt
[945µs    ]   hyperlight_main
[956µs    ] > halt
[1.431ms  ] > dispatch_function
[1.446ms  ]   > internal_dispatch_function
[1.446ms  ]     > try_pop_shared_input_data_into
[1.446ms  ]       Start converting buffer
[1.448ms  ]       Finish converting buffer
[1.448ms  ]     < try_pop_shared_input_data_into
[1.448ms  ]     > call_guest_function
[1.449ms  ]       Calling guest function
[1.45ms   ]       > call_host_function
[1.45ms   ]         > call_host_function_without_returning_result
[1.452ms  ]           > push_shared_output_data
[1.453ms  ]             Start copy of data
[1.453ms  ]             Finish copy of data
[1.453ms  ]           < push_shared_output_data
[1.453ms  ]           > out32
[1.908ms  ]           < out32
[1.908ms  ]         < call_host_function_without_returning_result
[1.909ms  ]         > get_host_return_value
[1.909ms  ]           > try_pop_shared_input_data_into
[1.909ms  ]             Start converting buffer
[1.91ms   ]             Finish converting buffer
[1.91ms   ]           < try_pop_shared_input_data_into
[1.91ms   ]         < get_host_return_value
[1.91ms   ]       < call_host_function
[1.912ms  ]     < call_guest_function
[1.912ms  ]     > push_shared_output_data
[1.912ms  ]       Start copy of data
[1.913ms  ]       Finish copy of data
[1.913ms  ]     < push_shared_output_data
[1.913ms  ]   < internal_dispatch_function
[1.913ms  ] > halt

TODO

  • For now, the timestamp of the traces is not calculated correctly. It needs to be calculated relative to the start of the guest and base of on the invariant TSC frequency.
  • Check the features introduced in the crates to ensure consistency
  • Expand dump_trace functionality to visualize the traces
  • Investigate invariant TSC not being correctly reported in Azure VMs.

@dblnz dblnz added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jul 7, 2025
@dblnz dblnz force-pushed the guest-tracing branch 3 times, most recently from d3ece3e to 9de4e6d Compare July 8, 2025 13:15
In the future, the outb handler will need to take a Hypervisor
instance in order to be able to access register and memory state of
the VM, so it doesn't make sense for these interfaces to be more
public than the `Hypervisor` trait.  Nobody outside of Hyperlight
seems to use these at the moment, so it's probably simplest to
restrict these to `pub(crate)`.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This adds (unused) support for creating trace files for sandboxes and
passing them around to relevant sandbox event handler code. This will
be used for collecting debug trace and profiling information.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
This will be useful in the near future, when it will allow
transforming the exe_info into unwind information without an extra
copy.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
syntactically and others added 6 commits July 9, 2025 13:08
This adds a new interface which tracing code can use to request the
values of registers from the hypervisor supervising a sandbox.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
This allows for producing profiles of memory usage.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
dblnz added 2 commits July 9, 2025 15:42
…ords

- `hyperlight-guest-tracing` defines a `TraceBuffer` that keeps
  `TraceRecord`s that the guest issues. When the buffer capacity is
  reached, it automatically issues an `Out` instruction with the
  corresponding info for the host to retrieve the buffer.
- The guest can issue `TraceRecord`s by using the
  `hyperlight-guest-tracing-macro` crate that pushes new records to the
  buffer.
- `hyperlight_common` contains the definitions for the frame types a
  guest can send to the host using the `Out` instruction.

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
- When an `Out` instruction is intercepted, the Hypervisor checks for
  the frame Id, to verify what type of exit it is.
  Based on this, when a trace record type is received, we copy the array
  of trace records from the guest's memory, calculate the timestamp
  based on the cycles returned by the guest's RDTSC and write the record
  to the trace file.

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
dblnz added 6 commits July 9, 2025 17:18
- Add traces wherever we think it might be of use to help us better
  profile the codebase.
- Add flush instructions before halting or running an `OutB` instruction
  to ensure the records are sent before the guest exits due to an error
  or a normal exit

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
@dblnz
Copy link
Contributor Author

dblnz commented Jul 9, 2025

The output data should be changed to something that other tools can open #704.

@dblnz dblnz marked this pull request as ready for review July 9, 2025 16:53
@dblnz dblnz requested a review from danbugs as a code owner July 9, 2025 16:53
@syntactically
Copy link
Member

Does not report an invariant TSC. I am currently investigating if I can get an Azure VM that reports supporting an invariant TSC.

Just to document this: it was confirmed that the host under a nested MSHV will incorrectly not see the invariant TSC bit, and this should be ignored; on the azure platforms, at least, the TSC is in fact invariant.

Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Could we add a step where the test binaries are executing in the CI pipeline? Right now it looks like none of this code is being exercised since it is all behind feature flag.


Hyperlight provides advanced observability features for guest code running inside micro virtual machines. You can enable guest-side tracing, stack unwinding, and memory profiling using the `trace_guest`, `unwind_guest`, and `mem_profile` features. This section explains how to build, run, and inspect guest traces.

### Building a Guest with Tracing Support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is focused on how to use the development tools to test from a hyperlight developers perspective but it would be nice to think about it from a user of hyper light. For instance, if I am writing my own guest what do I need to do? I can of course look at these samples but having some docs about it would be useful here.

Comment on lines +28 to +30
trace_guest = []
unwind_guest = []
mem_profile = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning for three separate features? From a usability standpoint it seems like a lot of things to keep track of.

From a maintainer standpoint it also feels like a large additional matrix of things to test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the confusion for me is that these seem like three related (as in use some base functionality) but different features but are wrapped up into a single PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsturtevant They kind of are, since Doru built this on top of my old memory profiling PR from way back when. The idea with that was that the base tracing feature that lets you generate a file of fine-grained events quickly is useful for other things (and indeed it is being used it here for the performance events), while the full memory profiling build is extraordinarily slow, since it generates a trace event on every allocation/deallocation. Carrying around the registers to unwind the guest stack also seemed like something that would be useful in other contexts but a different amount of overhead & not always needed, e.g. the performance events here don't I believe use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense that you may want to turn those on/off due to overhead. I think it would make for an easier review if these were split out, but I am fine with it if others are.

Copy link
Member

@syntactically syntactically Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend reviewing commit-by-commit; I put a lot of effort into making sure that each commit itself is small/atomic/reviewable. That means that we can have a PR like this one where we only merge things after they are useful (& consequently we know that the design is usable), but reviewers can understand and review each step along that path in isolation.

pub extern "C" fn abort() -> ! {
abort_with_code(&[0, 0xFF])
}

/// Exits the VM with an Abort OUT action and a specific code.
#[hyperlight_guest_tracing_macro::trace_function]
pub fn abort_with_code(code: &[u8]) -> ! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to flush on an abort too?

pub(crate) fn outb(port: u16, data: &[u8]) {
// Ensure all tracing data is flushed before sending OUT bytes
hyperlight_guest_tracing_macro::flush!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the abort section this means this gets called twice, is that expected?

@@ -60,6 +61,7 @@ pub(crate) fn call_guest_function(function_call: FunctionCall) -> Result<Vec<u8>
core::mem::transmute::<usize, GuestFunc>(function_pointer)
};

hyperlight_guest_tracing_macro::trace!("Calling guest function");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you need a corresponding "guest function completed"?

If so this seems like a common pattern we could include in the macro

as is done in the io.rs:

   hyperlight_guest_tracing_macro::trace!("Start copy of data");
   odb[stack_ptr_rel as usize..stack_ptr_rel as usize + data.len()].copy_from_slice(&data);
   hyperlight_guest_tracing_macro::trace!("Finish copy of data");

if let Ok(parsed) = syn::parse2::<TraceInput>(input2) {
let trace_message = match parsed.message {
Lit::Str(lit_str) => lit_str.value(),
_ => "expression".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail or print a message that indicates something when wrong using the marco?

@@ -616,6 +617,7 @@ impl ExclusiveSharedMemory {
/// Return the address of memory at an offset to this `SharedMemory` checking
/// that the memory is within the bounds of the `SharedMemory`.
#[instrument(err(Debug), skip_all, parent = Span::current(), level= "Trace")]
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all the deadcode allows?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are to get rid of clippy warnings when you build with the basic trace_guest infrastructure support but not with any of the things that use it. Maybe another option here is to replace the feature with a cfg item that is any(features_which_use_this)

.as_ref()
.shared_mem
.copy_to_slice(buffer, ptr as usize - SandboxMemoryLayout::BASE_ADDRESS)
// .read::<u64>((addr - SandboxMemoryLayout::BASE_ADDRESS as u64) as usize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this commented line

let start = hyperlight_guest_tracing::invariant_tsc::read_tsc();
let start_time = std::time::Instant::now();
// Sleep for 1 second to get a good sample
std::thread::sleep(std::time::Duration::from_secs(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause a delay when running with tracing feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants