Skip to content

Conversation

@amadio
Copy link
Member

@amadio amadio commented May 10, 2017

Passes test_snapshot.C on my computer. However, I believe there are problems elsewhere when more than one cluster exists in the generated trees (i.e. when ForeachSlot actually has more than one slot). We need to make sure to test TDataFrame with trees with multiple slots in the future.

Not closing the file breaks code that needs to open the merged output
file immediately afterwards. For example, the TDataFrame::Snapshot()
action.
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=ON -Dccache=ON

@amadio amadio requested a review from dpiparo May 10, 2017 11:06
@amadio
Copy link
Member Author

amadio commented May 10, 2017

For example, the script below sometimes fails because the Mean results are different due to the different ordering of the data. Setting the auto flush to a large number then should get rid of the problem because then there will be only a single cluster in the input tree.

#include "TFile.h"
#include "TH1F.h"
#include "TTree.h"

#include <iostream>
#include <random>

#include "ROOT/TDataFrame.hxx"

void fill_tree(const char *filename, const char *treeName)
{
   TFile f(filename, "RECREATE");
   TTree t(treeName, treeName);

   t.SetAutoFlush(64); // for creation of clusters to test in MT mode

   int b1;
   float b2;
   t.Branch("b1", &b1);
   t.Branch("b2", &b2);
   for (int i = 0; i < 1024 * 64; ++i) {
      b1 = i;
      b2 = std::rand();
      t.Fill();
   }
   t.Write();
   f.Close();
   return;
}

class A {
public:
   A(){};
   A(int i):fI(i){}
   int GetI(){return fI;}
private:
   int fI = 0;
};

int test_snapshot()
{
   ROOT::EnableImplicitMT();

   auto fileName = "test_snapshot.root";
   auto outFileName = "test_snapshot_output.root";
   auto treeName = "myTree";

   fill_tree(fileName, treeName);

   ROOT::Experimental::TDataFrame d(treeName, fileName);

   auto d_cut = d.Filter("b1 % 2 == 0");

   auto d2 = d_cut.Define("a","A(b1)")
                  .Define("b1_square", "b1 * b1")
                  .Define("b2_vector", [](float b2){ std::vector<float> v; for (int i=0;i < 3; i++) v.push_back(b2*i); return v;}, {"b2"});

   auto snapshot_tdf = d2.Snapshot<int, int, std::vector<float>, A>(treeName, outFileName, {"b1", "b1_square", "b2_vector", "a"});

   // Open the new file and list the branches of the tree
   TFile f(outFileName);
   TTree* t;
   f.GetObject(treeName, t);
   for (auto branch : *t->GetListOfBranches()) {
      std::cout << "Branch: " << branch->GetName() << std::endl;
   }
   f.Close();

   auto max_b1 = snapshot_tdf.Max("b1");
   auto max_a = snapshot_tdf.Define("a_val",[](A& a){return a.GetI();},{"a"}).Max("a_val");

   std::cout << "Maximum: max(b1) = " << *max_b1 << ", max(A(b1)) = " << *max_a << std::endl;

   if (*max_b1 != *max_a) {
      std::cerr << "Error: the maximum values of two branches which are supposed to be identical differ!\n";
      return 1;
   }

   return 0;
}

fCV.notify_one();

fMergingThread->join();
fFile->Close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should'nt it be SafeDelete(fFile);
i.e. is the object pointed to by fFile re-used after this statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, fFile is owned by the TBufferMerger class, it's a unique_ptr. It gets destroyed with the server. I just noticed, however, that the documentation on the header is wrong, I will update it.

unsigned int nSlots = df->GetNSlots();
TBufferMerger merger(filename.c_str(), "RECREATE");
std::vector<std::shared_ptr<TBufferMergerFile>> files(nSlots);
std::vector<std::shared_ptr<TTree>> trees(nSlots);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 'finding' the pointer where the shared_ptr are copied/shared. Wouldn't this work the same with unique_ptr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[As a side note since ownership is 'implicitly' shared with the TFile. A raw pointer here would work too].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files need to be shared, because the server returns std::shared_ptr and keeps a weak_ptr to them to check they are expired when the server goes down. The TTrees can probably be unique_ptr, but there is not std::make_unique in C++11, only make_shared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but there is not std::make_unique in C++11
So? you can still make unique_ptr, the syntax is just slightly longer (unique_ptr(new X(...))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will make it unique_ptr, although I don't think there is that big of an advantage over shared_ptr here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, should we just backport std::make_unique to be able to use it now? If string_view was backported, I think backporting std::make_unique would be even more useful to write better code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@pcanal pcanal May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, should we just backport std::make_unique to be able to use it now? If string_view was backported, I think backporting std::make_unique would be even more useful to write better code.

@Axel-Naumann What's is your opinion on this?

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=ON -Dccache=ON

@phsft-bot
Copy link

Build failed on mac1012/native.
See console output.

Errors:

  • clang: error: linker command failed with exit code 1 (use -v to see invocation)

@amadio
Copy link
Member Author

amadio commented May 10, 2017

Failed with 13:53:02 FAILED: lib/libOracle.so. Is master broken?

@amadio amadio force-pushed the tdf-parallel-snapshot branch from 157fed0 to 77ff196 Compare May 10, 2017 12:03
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=ON -Dccache=ON

@phsft-bot
Copy link

Build failed on mac1012/native.
See console output.

Errors:

  • clang: error: linker command failed with exit code 1 (use -v to see invocation)

@phsft-bot
Copy link

@dpiparo dpiparo merged commit 2e2e90d into root-project:master May 10, 2017
@amadio
Copy link
Member Author

amadio commented May 10, 2017

@dpiparo Thank you for merging. Let's keep track of that failing test, however. It runs on my machine, but there may be an actual problem there due to the different environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants