25 changes: 19 additions & 6 deletions app/background.cc
@@ -9,6 +9,8 @@
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

#include <lib/libbackscrub.h>

// Internal state of background processing
struct background_t {
int debug;
@@ -18,6 +20,7 @@ struct background_t {
int frame;
double fps;
cv::Mat raw;
int bg_stored;

Collaborator:
bool perhaps as it's used only for true/false?

Author:
You are right; I have changed it to bool. It is initialized with false and set with true.

std::mutex rawmux;
cv::Mat thumb;
std::mutex thumbmux;
@@ -130,6 +133,7 @@ std::shared_ptr<background_t> load_background(const std::string& path, int debug
pbkd->debug = debug;
pbkd->video = false;
pbkd->run = false;
pbkd->bg_stored = false;
pbkd->cap.open(path, cv::CAP_ANY); // explicitly ask for auto-detection of backend
if (!pbkd->cap.isOpened()) {
if (pbkd->debug) fprintf(stderr, "background: cap cannot open: %s\n", path.c_str());
@@ -143,7 +147,7 @@ std::shared_ptr<background_t> load_background(const std::string& path, int debug
// if: can read 2 video frames => it's a video
// else: is loaded as an image => it's an image
// else: it's not usable.
if (pbkd->cap.read(pbkd->raw) && pbkd->cap.read(pbkd->raw)) {
if (cnt > -1) {

Collaborator:
If I recall, I chose this current method as cnt could be > 1 for some image files (multiple resolutions?) but they would not play as a video... please test with all the variations in backgrounds folder 😄

Author:
I will perform the tests. Using the fps, which should be greater than 0.0 for a video, may be a better approach.

Author:
Unfortunately, my system does not work with animated.gif, so I can't check whether systems that do support it report a frame rate of 45 fps (the value obtained when converting the file to a stream via ffmpeg).

A JPEG file may contain a thumbnail image; in that case reading twice succeeds and the file is recognized as a video. Reading three times would give the expected result, but that does not seem like the right approach.

The picture/video test is now based on the fps. I think this should always work.
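
A minimal sketch of what such an fps-based check could look like (assuming the properties are queried with cap.get() right after opening the capture in load_background(); the 0.001 threshold is an assumption):

// Sketch only: still images typically report an fps of 0,
// so a plausible frame rate is used as the video indicator.
double fps = pbkd->cap.get(cv::CAP_PROP_FPS);
if (fps > 0.001) {
	// video: reset the position and start the reader thread
} else {
	// still image: read it once and cache the resized copy
}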

Collaborator:
I'll grab your PR and check what I find here in the next day or so - thanks.

Collaborator:
OK, I've checked the behaviour as the current PR has it (fps > 0), and it detects all the backgrounds as videos, but does not fail, since the video loop logic resets the position on each request... we may be able to simplify all this to assume video at all times?

Author:
I am surprised!
The backscrub binary from the cropping branch works as expected: a video is recognized as a video and pictures as pictures.

For total_landscaping.jpg I get as output:

background properties:
	vid: no
	fcc: 00000000 ()
	fps: 0.000000
	cnt: -1

What does your system report?

I use Fedora 36 XFCE with OpenCV 4.5.5. What do you use?
Maybe we should check for fps > 0.001 in order to have the right condition.

I have tried the following in my development environment:

// read two frames in sequence; for a real video both reads succeed
// and both frames have the same width
int w1 = 0, w2 = -1;
if (pbkd->cap.read(pbkd->raw)) {
	w1 = pbkd->raw.cols;
	if (pbkd->cap.read(pbkd->raw)) {
		w2 = pbkd->raw.cols;
	}
}
if (w2 == w1) {
	// treat as video
}

This is basically the same as the old condition, but it additionally requires that the widths of two consecutive frames are the same.
The final test passes if we have a video and fails if we have an image, even one with an embedded thumbnail (which must be smaller).

Detection was okay on my system.

Author:
According to the OpenCV documentation, cap.get(cv::CAP_PROP_FRAME_COUNT) returns the 'Number of frames in the video file'.
The original intention of using the count as a video/still-image flag was fine; asking for CAP_PROP_FPS should also work.

Working with images as if they were a video is not a great idea. If the image is large, repeatedly reading it takes more time than retrieving the thumb once, as now done within background.cc (your suggestion).

Author:
@phlash "If I recall, I chose this current method as cnt could be > 1 for some image files (multiple resolutions?) but they would not play as a video... please test with all the variations in backgrounds folder 😄"
That was your statement, but it should not be true given the definition of CAP_PROP_FRAME_COUNT.
For my image, which the old code recognizes as a video, I get -1 as the value; an image is not counted as a frame.

Collaborator (@phlash, Sep 14, 2022):
@jjsarton I'm running on Debian stable (11), OpenCV 4.5.1, here's what I get for each background type:

  • animated.gif => vid: yes, fps: 45.0.., cnt: 36
  • background_bauhaus.png => vid: yes, fps: 25.0.., cnt: -2147483648
  • rotating_earth.webm => vid: yes, fps: 30.0.., cnt: 916
  • total_landscaping.jpg => vid: yes, fps: 25.0.., cnt: 1

...so this looks like we are heavily dependent on unstable OpenCV behaviour 😞, which is the reason I chose to ignore both fps and cnt and instead attempt to load two frames in sequence.

// it's a video, try a reset and start reader thread..
if (pbkd->cap.set(cv::CAP_PROP_POS_FRAMES, 0))
pbkd->frame = 0;
@@ -183,13 +187,22 @@ int grab_background(std::shared_ptr<background_t> pbkd, int width, int height, c
if (pbkd->video) {
// grab frame & frame no. under mutex
std::unique_lock<std::mutex> hold(pbkd->rawmux);
cv::resize(pbkd->raw, out, cv::Size(width, height));
cv::Rect crop = bs_calc_cropping(pbkd->raw.cols, pbkd->raw.rows, width, height);
cv::resize(pbkd->raw(crop), out, cv::Size(width, height));
frm = pbkd->frame;
} else {
// resize still image as requested into out
cv::resize(pbkd->raw, out, cv::Size(width, height));
frm = 1;
}
if (!pbkd->bg_stored) {
// resize still image as requested into out
cv::Rect crop = bs_calc_cropping(pbkd->raw.cols, pbkd->raw.rows, width, height);
// Under some circumstances we must do the job in two steps!
// Otherwise this resize(pbkd->raw(crop), pbkd->raw, ...) may fail.
pbkd->raw(crop).copyTo(pbkd->raw);
cv::resize(pbkd->raw, pbkd->raw, cv::Size(width, height));
pbkd->bg_stored = true;
}
out = pbkd->raw ;
frm = 1;
}
return frm;
}

57 changes: 47 additions & 10 deletions app/deepseg.cc
@@ -84,7 +84,7 @@ std::optional<std::pair<size_t, size_t>> geometryFromString(const std::string& i
}

// OpenCV helper functions
cv::Mat convert_rgb_to_yuyv( cv::Mat input ) {
cv::Mat convert_rgb_to_yuyv(cv::Mat input) {
cv::Mat tmp;
cv::cvtColor(input, tmp, cv::COLOR_RGB2YUV);
std::vector<cv::Mat> yuv;
@@ -372,6 +372,7 @@ int main(int argc, char* argv[]) try {
bool flipVertical = false;
int fourcc = 0;
size_t blur_strength = 0;
cv::Rect crop_region(0, 0, 0, 0);

const char* modelname = "selfiesegmentation_mlkit-256x256-2021_01_19-v1215.f16.tflite";

@@ -568,6 +569,12 @@
if (expWidth != vidGeo.value().first) {
fprintf(stderr, "Warning: virtual camera aspect ratio does not match capture device.\n");
}
// calculate crop region, only if result always smaller
if (expWidth != vidGeo->first) {
crop_region = bs_calc_cropping(
capGeo->first, capGeo->second,
vidGeo->first, vidGeo->second);
}

// dump settings..
printf("debug: %d\n", debug);
@@ -600,7 +607,11 @@
}
}
// default green screen background (at capture true geometry)
cv::Mat bg = cv::Mat(capGeo.value().second, capGeo.value().first, CV_8UC3, cv::Scalar(0, 255, 0));
std::pair<size_t, size_t> bg_dim = *capGeo;
if (crop_region.height) {
bg_dim = {crop_region.width, crop_region.height};
}
cv::Mat bg(bg_dim.second, bg_dim.first, CV_8UC3, cv::Scalar(0, 255, 0));

// Virtual camera (at specified geometry)
int lbfd = loopback_init(s_vcam, vidGeo.value().first, vidGeo.value().second, debug);
@@ -613,11 +624,24 @@
loopback_free(lbfd);
});


// Processing components, all at capture true geometry
cv::Mat mask(capGeo.value().second, capGeo.value().first, CV_8U);
std::pair<size_t, size_t> mask_dim = *capGeo;
if (crop_region.height) {
mask_dim = {crop_region.width, crop_region.height};
}
cv::Mat mask(mask_dim.second, mask_dim.first, CV_8U);

cv::Mat raw;
CalcMask ai(s_model.value(), threads, capGeo.value().first, capGeo.value().second);
int aiw,aih;
if (!crop_region.width) {
aiw=capGeo->first;
aih=capGeo->second;
} else {
aiw=crop_region.width;
aih=crop_region.height;
}
CalcMask ai(*s_model, threads, aiw, aih);

ti.lastns = timestamp();
printf("Startup: %ldns\n", diffnanosecs(ti.lastns,ti.bootns));

@@ -631,22 +655,35 @@
// copy new frame to buffer
cap.retrieve(raw);
ti.retrns = timestamp();

if (raw.rows == 0 || raw.cols == 0) continue; // sanity check

if (crop_region.height) {
raw(crop_region).copyTo(raw);
}
ai.set_input_frame(raw);
ti.copyns = timestamp();

if (raw.rows == 0 || raw.cols == 0) continue; // sanity check
// do background detection magic
ai.get_output_mask(mask);
ti.copyns = timestamp();

if (filterActive) {
// do background detection magic
ai.get_output_mask(mask);

// get background frame:
// - specified source if set
// - copy of input video if blur_strength != 0
// - default green (initial value)
bool canBlur = false;
if (pbk) {
if (grab_background(pbk, capGeo.value().first, capGeo.value().second, bg)<0)
int tw,th;
if (crop_region.height) {
tw = crop_region.width;
th = crop_region.height;
} else {
tw = capGeo->first;
th = capGeo->second;
}
if (grab_background(pbk, tw, th, bg) < 0)
throw "Failed to read background frame";
canBlur = true;
} else if (blur_strength) {
27 changes: 26 additions & 1 deletion lib/libbackscrub.cc
@@ -365,7 +365,11 @@ bool bs_maskgen_process(void *context, cv::Mat &frame, cv::Mat &mask) {

// scale up into full-sized mask
cv::Mat tmpbuf;
cv::resize(ctx.ofinal(ctx.in_roidim),tmpbuf,ctx.mroi.size());
// with body-pix-float-050-8.tflite the size of ctx.ofinal is 33x33
// and the wanted roi may be greater than 33x33, so we can crash with
// cv::resize(ctx.ofinal(ctx.in_roidim),tmpbuf,ctx.mroi.size());
ctx.ofinal.copyTo(tmpbuf);
cv::resize(tmpbuf, tmpbuf, ctx.mroi.size());

Collaborator (@phlash, Sep 10, 2022):
Not sure this is correct? Selecting an ROI from the final output is done because we may have centred the frame into the model earlier (line 289), and this change removes that selection step. The calculations at line 237 onward should ensure that in_roidim cannot be larger than ofinal (unless the model itself has output dim < input dim, which I have never seen but I guess could occur, and which is indeed the case for the bodypix model). Looks like some assumptions in this code need reviewing, and an additional model output roidim should be calculated to use here.

After going away and thinking about this - this is a separate bug that you do not need to fix in this PR. Let's raise another issue for this.

Author:
I modified the original code because I had a crash within OpenCV; the model I used has an output of 33x33.
I have already performed a lot of tests and never saw an error here, but I will check this again.

Author:
I have just tested this again. With the old code, using backscrub/models/body-pix-float-050-8.tflite, I get a crash again; with my correction it works well.
The real size of ofinal is 33x33, while ctx.in_roidim gives a size of 256x256!

Collaborator:
Yep - it's because they have an output stride variable in the model, which is 8 for this one (per the filename), so the output is (input dim - 1)/stride + 1 == (257 - 1)/8 + 1 == 33.

I'm happy to leave the crash for now, you are officially "not making it worse" with this PR 😁, and we'll probably want to revisit the logic in multiple places to fix this properly.
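
As a small illustration of that relation (a sketch, using the usual BodyPix convention that output size = (input size - 1) / stride + 1; the helper name is made up):

// Hypothetical helper showing how the mask output size follows from input size and output stride.
int model_output_dim(int input_dim, int output_stride) {
	return (input_dim - 1) / output_stride + 1;
}
// model_output_dim(257, 8) == 33, matching the 33x33 ofinal discussed above.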

Collaborator:
This is now covered by issue #156, and thus can be fixed by a separate PR that properly deals with the fact that we do not account for models that have different input and output sizes at all.


// blur at full size for maximum smoothness
cv::blur(tmpbuf,ctx.mroi,ctx.blur);
@@ -375,3 +379,24 @@
return true;
}

cv::Rect bs_calc_cropping(int inWidth, int inHeight, int targetWidth, int targetHigh) {
// if the input and output aspect ratio are not the same
// we can crop the source image. For example if the
// input image has a 16:9 (1280x720) ratio and the output is 4:3 (960x720)
// we will return the cropRegion set as x=160, width=960, y=0, height=720
// which is the centered part of the original image
cv::Rect cropRegion = {0, 0, 0, 0};
float sc = (float)targetWidth / inWidth;
float st = (float)targetHigh / inHeight;
sc = st > sc ? st : sc;

int sx = (int)(targetWidth / sc) - inWidth;
cropRegion.x = (sx < 0 ? -sx : sx) / 2;

int sy = (int)(targetHigh / sc) - inHeight;
cropRegion.y = (sy < 0 ? -sy : sy) / 2;

cropRegion.width = inWidth - cropRegion.x * 2;
cropRegion.height = inHeight - cropRegion.y * 2;
return cropRegion;
}
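
For reference, a minimal usage sketch of the new bs_calc_cropping() helper, mirroring how grab_background() uses it above (the input file and the 960x720 target are assumptions taken from the discussion and the worked example in the comment):

#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <lib/libbackscrub.h>

int main() {
	// Load a 16:9 still (hypothetical path), centre-crop it to 4:3, then scale to 960x720.
	cv::Mat raw = cv::imread("backgrounds/total_landscaping.jpg");
	cv::Rect crop = bs_calc_cropping(raw.cols, raw.rows, 960, 720);
	cv::Mat out;
	cv::resize(raw(crop), out, cv::Size(960, 720));
	return 0;
}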
2 changes: 2 additions & 0 deletions lib/libbackscrub.h
@@ -38,4 +38,6 @@ extern void bs_maskgen_delete(void *context);
// Process a video frame into a mask
extern bool bs_maskgen_process(void *context, cv::Mat& frame, cv::Mat &mask);

extern cv::Rect bs_calc_cropping(int inWidth, int inHeight, int targetWidth, int targetHight);

#endif