This repository was archived by the owner on Nov 15, 2021. It is now read-only.

Conversation

@lanfeust69
Contributor

Not clear to me how to unit test this...

When OpenCover runs on a number of subprocesses, OpenCover.Console.exe could crash due to the problem in InstrumentationPoint.cs.
Additionally, some of these subprocesses could die badly (since the error occurs in the native profiler dll) because of the issue in ProfilerCommunication.cpp. With a small test exe that creates threads intensively, I could make it crash almost instantly, but of course no longer after the fix.

I also added the apparently "normal" line feed in traces, else it would behave poorly in windbg.

+ InstrumentationPoint.cs: concurrent calls to InstrumentationPoint's
constructor could result in a mismatch between its index in
InstrumentationPoints and its UniqueSequencePoint. Moreover, things can go
really bad if the InstrumentationPoints.Add() triggers a resize.

+ ProfilerCommunication.cpp: the *read* of m_visitmap must also be
synchronized, since it is in an "unreadable" state while written to.
@sawilde sawilde merged commit f6e0330 into OpenCover:master Dec 15, 2015
@sawilde
Member

sawilde commented Dec 15, 2015

I made a change to ProfilerCommunication.cpp to only apply lock if necessary.

Are you able to test whether the fix is working in your setup? The change was for performance reasons, as I didn't want unnecessary overhead when the entry already exists.

@lanfeust69
Contributor Author

I'm afraid that's not correct. A lookup in an unordered_map is not atomic, and can easily fail if the map is modified while being read. Besides, the symptom in #373 is an infinite loop in operator[], which could happen during the first read (the one outside the critical section). I haven't seen my ThreadIntensive.exe sample crash in Release build, but in Debug build I do get assertion failures:

Assertion Failed : Expression: vector subscript out of range

OpenCover_Profiler!std::_Debug_message+0x40
OpenCover_Profiler!std::vector<std::_List_unchecked_iterator<std::_List_val<std::_List_simple_types<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> > > >,std::_Wrap_alloc<std::allocator<std::_List_unchecked_iterator<std::_List_val<std::_List_simple_types<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> > > > > > >::operator[]+0x51
OpenCover_Profiler!std::_Hash<std::_Umap_traits<unsigned long,_MSG_SendVisitPoints_Request * __ptr64,std::_Uhash_compare<unsigned long,std::hash<unsigned long>,std::equal_to<unsigned long> >,std::allocator<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> >,0> >::_Vec_lo+0x3c
OpenCover_Profiler!std::_Hash<std::_Umap_traits<unsigned long,_MSG_SendVisitPoints_Request * __ptr64,std::_Uhash_compare<unsigned long,std::hash<unsigned long>,std::equal_to<unsigned long> >,std::allocator<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> >,0> >::_Begin+0x37
OpenCover_Profiler!std::_Hash<std::_Umap_traits<unsigned long,_MSG_SendVisitPoints_Request * __ptr64,std::_Uhash_compare<unsigned long,std::hash<unsigned long>,std::equal_to<unsigned long> >,std::allocator<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> >,0> >::lower_bound+0x76
OpenCover_Profiler!std::_Hash<std::_Umap_traits<unsigned long,_MSG_SendVisitPoints_Request * __ptr64,std::_Uhash_compare<unsigned long,std::hash<unsigned long>,std::equal_to<unsigned long> >,std::allocator<std::pair<unsigned long const ,_MSG_SendVisitPoints_Request * __ptr64> >,0> >::find+0x44
OpenCover_Profiler!ProfilerCommunication::GetVisitMapForOSThread+0x57
OpenCover_Profiler!ProfilerCommunication::AddVisitPointToThreadBuffer+0x3f
OpenCover_Profiler!CCodeCoverage::AddVisitPoint+0xbd
OpenCover_Profiler!InstrumentPointVisit+0x2c

So even if it seems to work, that is by chance (which is logical: things are much more likely to go really bad when the map is modified concurrently), and it isn't reliable.

However, this gives me a baseline for performance that I didn't have before (where the "fast" version crashed almost instantly), so I can look again at the alternatives.

On a somewhat related note, I found other issues (very likely including other race conditions) that I'm still investigating. I'll try to push a PR once our "real" coverage runs are consistent, which for now is far from the case. Those are big runs, with a few thousand processes spawned and a cumulative run time of 8 days over an elapsed time of 5 hours. I'll soon be away for the holidays, though, and I'm really not sure I'll be done before then, so it'll likely be next year.

@sawilde
Member

sawilde commented Dec 19, 2015

thanks - do you think switching to concurrent collections might be better - a balance between performance and stability?

@lanfeust69
Contributor Author

Yes, concurrent collections are very likely the way to go, even if there may be some specific places where an even better option is possible.
BTW, at least one other issue is the concurrent use of Module.Aliases, which can cause exceptions in BasePersistance.GetMethod().
As expected, I haven't been able to finish my investigations before leaving for the holidays, but I'll be back on it next year.
