
Conversation

@goatchurchprime (Contributor) commented Jul 19, 2025:

This PR is an alternative to #100508 and #105244, and is designed to fix the microphone-data reliability issues discovered while implementing VoIP, as identified in godotengine/godot-proposals#11347.

A comprehensive demo is at:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/micplotfeed/audio/mic_feed

As discussed at length in the Audio Group meetings, the API needed to be future-proofed for the case where there might be more than one microphone data stream, even though a single microphone input is currently hard-coded throughout the AudioDriver code on all platforms.

Accordingly, I have modeled a framework on the CameraServer and CameraFeed objects, creating a new MicrophoneServer object that holds a single MicrophoneFeed.

This MicrophoneFeed contains the following functions:

is_active() -> bool
set_active(p_is_active : bool) -> void

get_frames_available() -> int
get_frames(p_frames : int) -> PackedVector2Array
get_buffer_length_frames() -> int

I have renamed these functions from AudioEffectCapture's get_frames_available() and get_buffer(), because get_buffer() was confusing: you are getting frames from a buffer, not the buffer itself. I have also added a function to expose the size of the internal input buffer, so that its overflow condition is predictable.
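
For illustration, a polling loop over this API might look like the following (a minimal sketch; the linked demo project is the authoritative example):

extends Node

var feed # the single MicrophoneFeed

func _ready() -> void:
	feed = MicrophoneServer.get_feed() # accessor name per the CI log further down
	feed.set_active(true)

func _process(_delta: float) -> void:
	var available: int = feed.get_frames_available()
	if available > 0:
		# Presumably one stereo frame per Vector2 (x = left, y = right),
		# as with AudioEffectCapture.get_buffer().
		var frames: PackedVector2Array = feed.get_frames(available)
		# ... hand frames to VoIP encoding, plotting, etc. ...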

Known issues

  • I cannot see why MicrophoneFeed.xml is not being picked up by the documentation system. Probably some stupid typo I can't find.
  • I don't understand the remaining failures in the automatic integration checks.
  • I haven't found a good way to detect and handle the absence of the MicrophoneServer object in older versions. ClassDB.class_exists("MicrophoneServer") works to discriminate whether it is there or not, but there is no equivalent of ClassDB.instantiate("MicrophoneServer") for singleton global objects (see the sketch after this list).
  • Need to check that using AudioStreamMicrophone in the same project doesn't break the system (not advised, but it doesn't cause any unresolvable issues).
  • The code hasn't yet been tested on platforms other than Linux.
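
One possible workaround for the singleton-detection issue (a sketch, assuming MicrophoneServer is registered with Engine::add_singleton the way CameraServer is; untested):

func get_microphone_feed():
	# ClassDB.class_exists() confirms the class is compiled in; fetching the
	# singleton by name avoids a direct MicrophoneServer token that would
	# fail to compile on older engine versions.
	if ClassDB.class_exists("MicrophoneServer") and Engine.has_singleton("MicrophoneServer"):
		return Engine.get_singleton("MicrophoneServer").get_feed()
	return null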

Future work

When this is stable and in use, we can go back through AudioStreamMicrophone and rewrite it to depend on this API, as well as mitigate the design flaw of assuming that the mix rates of the input and output streams always match to high precision without any slippage.

After all this complexity has been added, it would also be useful to demonstrate at least one example of more than one microphone being implemented on any platform.

@goatchurchprime goatchurchprime requested review from a team as code owners July 19, 2025 11:44
@goatchurchprime goatchurchprime changed the title Gtch/micserver Add MicrophoneFeed with direct access to the microphone input buffer Jul 19, 2025
return buf.size() / 2;
}

PackedVector2Array MicrophoneFeed::get_frames(int p_frames) {
A Member commented:

This is wrong, since it mimics the audio capture API but does not use a ring buffer. You must use a ring buffer.

Reply:

The AudioDriver input_buffer is already a ring buffer, so I'm not sure where another ring buffer would come into play.

The current audio capture APIs (the capture effect and record effect) use separate ring buffers because they're receiving a fixed, transient batch of data from the AudioDriver input_buffer (pulled off it in AudioStreamMicrophone) as part of the audio server _mix_step, and need to store it until the user can request it in their own code, outside the audio server thread/loop. This implementation bypasses that, allowing users to pull from the audio driver ring buffer directly. Each feed can have its own buffer_ofs, so multiple places in the code can retrieve microphone data from the same device without stepping on each other's toes.
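
Roughly, that per-feed read pattern looks like this (a sketch with hypothetical names and parameters, not the PR's actual code):

// Each feed keeps its own read cursor (buffer_ofs) into the shared
// AudioDriver ring buffer; input_position is the driver's write cursor.
PackedVector2Array read_feed_frames(const Vector<int32_t> &input_buffer,
		unsigned int input_position, unsigned int &buffer_ofs, int p_frames) {
	PackedVector2Array frames;
	// Samples are interleaved stereo int32s: left, right, left, right...
	while (p_frames > 0 && buffer_ofs != input_position) {
		float l = (input_buffer[buffer_ofs] >> 16) / 32768.0f;
		buffer_ofs = (buffer_ofs + 1) % input_buffer.size();
		float r = (input_buffer[buffer_ofs] >> 16) / 32768.0f;
		buffer_ofs = (buffer_ofs + 1) % input_buffer.size();
		frames.push_back(Vector2(l, r));
		p_frames--;
	}
	return frames;
}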

The issue I have with this is that it should be locking the AudioDriver, since input_buffer is the single buffer being written to by the driver in the AudioDriver thread. As far as I can tell, not locking will lead to race conditions in multithreaded environments.

@goatchurchprime (Contributor, author) replied:

So, the locking in the current implementation is inconsistent.

There is no locking applied when samples are added to the buffer by AudioDriver::input_buffer_write(int32_t sample), but there is a lock applied when samples are taken from the buffer by AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames).

That means the locking isn't doing anything useful, since it is on only one half of the transaction. Assuming it was meant to do something, I added a corresponding lock to the input_buffer_write() function in my first PR, but there were complaints that I was adding a lock to the very time-sensitive audio thread, which could be potentially harmful.

The lack of a lock in input_buffer_write() evidently caused an index out-of-range crash that was mitigated by inserting an extra boundary test on the index, instead of finding the root cause, which could only be two threads executing input_position++ at the same time:

void AudioDriver::input_buffer_write(int32_t sample) {
	if ((int)input_position < input_buffer.size()) {
		input_buffer.write[input_position++] = sample;
		if ((int)input_position >= input_buffer.size()) {
			input_position = 0; // wrap the ring buffer
		}
	} else {
		// The extra boundary test mentioned above: it papers over the race
		// instead of preventing it.
		WARN_PRINT("input_buffer_write: Invalid input_position=" + itos(input_position) + " input_buffer.size()=" + itos(input_buffer.size()));
	}
}

In any case, two threads should never be entering this function, since that would result in choppy, out-of-order audio chunks being pulled from the operating system and buffered. I've only seen it happen when AudioDriverPulseAudio::input_start() was called a second time. Some of the code in my current PR protects against this happening again on the various platforms.

With regards to the race condition, I think it is safe, since input_buffer is never realloced, and MicrophoneFeed::get_frames() doesn't write to any contended values and can tolerate an out-of-date input_position value.

Indeed, there is no point in adding a lock to this function without adding the corresponding lock to input_buffer_write(). Unfortunately, we don't know what the consequences of acquiring a lock at a rate of 88.2 kHz in the audio thread would be, and since the code has persisted this long without that being a problem, changing it would be a risk.
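
For illustration, locking both halves of the transaction would look something like this (a sketch with a hypothetical dedicated input mutex; the per-sample cost is the unknown mentioned above):

Mutex input_mutex; // hypothetical new member of AudioDriver, shared with readers

void AudioDriver::input_buffer_write(int32_t sample) {
	MutexLock lock(input_mutex);
	// With both halves of the transaction locked, the defensive bounds
	// test above would no longer be needed.
	input_buffer.write[input_position++] = sample;
	if ((int)input_position >= input_buffer.size()) {
		input_position = 0; // wrap the ring buffer
	}
}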

Reply:

The drivers lock themselves in their own threads before making changes to input_buffer (except maybe the Android driver).

driver->lock();

You may be right that accessing input_buffer from multiple threads is fine (due to no reallocation), but it really doesn't feel correct.
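
Roughly, the per-chunk pattern is (hypothetical simplified driver, not verbatim engine code):

class AudioDriverExample : public AudioDriver {
	bool exit_thread = false;
	int read_chunk_from_os(int32_t *p_buf, int p_max); // hypothetical OS capture read

	void capture_thread_func() {
		int32_t chunk[256]; // one chunk of interleaved stereo samples
		while (!exit_thread) {
			int n = read_chunk_from_os(chunk, 256);
			lock(); // one lock per chunk of audio
			for (int i = 0; i < n; i++) {
				input_buffer_write(chunk[i]);
			}
			unlock();
		}
	}
};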

@goatchurchprime (Contributor, author) replied:

I stand corrected. Note that the lock is per chunk of audio, not per sample.

Yes, it was the Android version which had all the bugs I have been most interested in fixing.

@Alex2782 (Member) commented:

Many GitHub checks failed; for example, GHA / 🍎 macOS / Template (target=template_release) (pull_request), failing after 4m:

[doctest] doctest version is "2.4.12"
[doctest] run with "--help" for options
===============================================================================
./tests/core/object/test_class_db.h:879:
TEST SUITE: [ClassDB]
TEST CASE:  [ClassDB] Add exposed classes, builtin types, and global enums
  [ClassDB] Validate exposed classes

./tests/core/object/test_class_db.h:467: FATAL ERROR: REQUIRE_FALSE( !p_context.has_type(p_method.return_type) ) is NOT correct!
  values: REQUIRE_FALSE( true )
  logged: Method return type 'MicrophoneFeed' not found: 'MicrophoneServer.get_feed'.

===============================================================================
[doctest] test cases:    1223 |    1222 passed | 1 failed | 3 skipped
[doctest] assertions: 2349280 | 2349279 passed | 1 failed |
[doctest] Status: FAILURE!


@Calinou Calinou added this to the 4.x milestone Jul 22, 2025

@AThousandShips AThousandShips removed request for a team September 30, 2025 08:59
@goatchurchprime (Contributor, author) commented:

@adamscott

The deeper proposal is to separate the Audio Input (microphone code) from Audio Output (speakers) and move this code into a default microphone feed class.

Unfortunately the Godot codebase does not make this separation easy to do.

Here is why.

The Godot AudioDriver class (which contains the single microphone input buffer) has a derived class for each platform: AudioDriverPulseAudio, AudioDriverXAudio2, AudioDriverWeb, AudioDriverOpenSL, AudioDriverALSA, AudioDriverCoreAudio, and AudioDriverWASAPI.

Each of these classes manages the single input and the single output for that platform -- sometimes within the same function.

For example, in AudioDriverWASAPI there is a 300-line initialization function that takes a p_input parameter to say whether it is setting up the input or the output device:

Error AudioDriverWASAPI::audio_device_init(AudioDeviceWASAPI *p_device, bool p_input, bool p_reinit, bool p_no_audio_client_3)

Here are the options for what can be done.

Option 1: Separate the Audio Input and Output at a deep level

This requires me to write a new AudioInputDriver class along with 7 derived classes (AudioInputDriverPulseAudio, AudioInputDriverXAudio2, AudioInputDriverWeb, AudioInputDriverOpenSL, AudioInputDriverALSA, AudioInputDriverCoreAudio, and AudioInputDriverWASAPI) and to cut and paste the input-related functions into each of these for each platform.

The AudioInputDriverWASAPI case would require copy-pasting that 300-line audio_device_init() function, which would draw complaints about code duplication.

Option 2 (what I have implemented): Touch the AudioDriver code as little as humanly possible so nothing breaks:

Leave all the code and its technical debt (in relation to the feature of multiple microphones) in place, including its single microphone input buffer on which the 7 different platform implementations depend, and simply access this single buffer from the one and only MicrophoneFeed.

Option 3: Something in between

Leave the AudioDriver class and its 7 derived classes as they are, but extend the core function AudioDriver::input_buffer_write(int32_t sample) to take a microphone_feed_id parameter that lets each platform push audio data to one of multiple microphone buffers in the core class.
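
Concretely, that extension might look something like this (an illustrative sketch with a hypothetical MicrophoneBuffer struct; not implemented):

void AudioDriver::input_buffer_write(int p_microphone_feed_id, int32_t sample) {
	// microphone_buffers would be a hypothetical Vector<MicrophoneBuffer>
	// member of the core class, one ring buffer per feed.
	MicrophoneBuffer &mb = microphone_buffers.write[p_microphone_feed_id];
	mb.buffer.write[mb.position++] = sample;
	if ((int)mb.position >= mb.buffer.size()) {
		mb.position = 0; // wrap this feed's ring buffer
	}
}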

Since none of the platforms implements multiple microphones, the value of microphone_feed_id would always be zero. So although I could write code that looked like it managed and looked up multiple MicrophoneFeed objects, it would do nothing more than the current PR does. I think what is in the PR is more honest, because it is not trying to fool anyone with code that doesn't do anything.

In my opinion, the appropriate time to implement code that can manage multiple MicrophoneFeeds is when at least one platform's AudioDriver has been extended to support it; otherwise the implementation will be speculative and likely to be wrong.

@BuzzLord
Copy link

I made a similar implementation of a MicrophoneServer/MicrophoneFeed (branch here) but with a slightly different interface. I wasn't happy with some aspects of it (I forget specifically what, though...), so I didn't turn it into a PR after seeing this one, but maybe we can pull something from it into here.

My implementation has the MicrophoneServer create a MicrophoneFeed on demand, based on an input_device name (which comes from AudioServer.get_input_device_list()). Multiple feeds can be used to record mic data independently, since each feed has its own buffer offset into the AudioDriver input_buffer.

However, since only one real microphone exists in the AudioServer (as currently implemented), the feed will throw an error if you try to start recording with a different device than the active one. I removed the ability to set the input_device via AudioServer, and only allow it to happen through the MicrophoneServer/Feed.
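
Usage is roughly as follows (a sketch; create_feed() is an illustrative name and may not match the branch's actual method):

var devices: PackedStringArray = AudioServer.get_input_device_list()
var feed = MicrophoneServer.create_feed(devices[0]) # one feed per device name
feed.set_active(true)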
