Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Fix getting affinity set on MUSL on Jetson TX2
The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the
number of processors reported as configured in the system is always
larger than the maximum CPU index. However, it turns out that it is not
true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and
5 in the affinity mask and the 1 and 2 are never reported. GLIBC reports
6 as the number of configured CPUs, however MUSL reports just 4. The
PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as
configured as the upper bound for scanning affinity set, so on Jetson
TX2, the affinity mask returned had just two bits set while there were
4 CPUs. That triggered an assert in the GCToOSInterface::Initialize.

This change fixes that by reading the maximum CPU index from the
/proc/cpuinfo. It falls back to using the number of processors
configured when the /proc/cpuinfo is not available (on macOS, FreeBSD,
...)
  • Loading branch information
janvorli committed Nov 22, 2019
commit 3b9f16d40bc280236b99055781c131635671b5fa
3 changes: 3 additions & 0 deletions src/coreclr/src/pal/src/include/pal/palinternal.h
Original file line number Diff line number Diff line change
Expand Up @@ -665,6 +665,9 @@ typedef enum _TimeConversionConstants
bool
ReadMemoryValueFromFile(const char* filename, uint64_t* val);

int
GetMaxCpuIndex();

#ifdef __APPLE__
bool
GetApplicationContainerFolder(PathCharString& buffer, const char *applicationGroupId, int applicationGroupIdLength);
Expand Down
49 changes: 48 additions & 1 deletion src/coreclr/src/pal/src/misc/sysinfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ DWORD
PALAPI
PAL_GetLogicalCpuCountFromOS()
{
static int nrcpus = -1;
static volatile int nrcpus = -1;

if (nrcpus == -1)
{
Expand All @@ -167,6 +167,53 @@ PAL_GetLogicalCpuCountFromOS()
return nrcpus;
}

/*++
Function:
GetMaxCpuIndex

The GetMaxCpuIndex function returns maximum index of CPU available on the system. On some systems, it doesn't match
the PAL_GetTotalCpuCount() - 1, as the CPU indices may form a noncontinuous range. E.g. on Jetson TX2, there are
four CPUs available and their indices are 0, 3, 4, 5. While glibc returns 6 as the number of CPUs configured,
MUSL returns 4.
--*/
int GetMaxCpuIndex()
{
static volatile int maxCpuIndex = -1;

if (maxCpuIndex == -1)
{
FILE* cpuInfoFile = fopen("/proc/cpuinfo", "r");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be under #ifdef __linux__ or something like this. (procfs exists in other Unices but cpuinfo is Linux-only.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intentionally not like that. I try to avoid checking for specific platform if possible. This code is run just once and on non-linux platforms, it fails to open the file and uses the fallback path.

if (cpuInfoFile != NULL)
{
char *line = nullptr;
size_t lineLen = 0;

while (getline(&line, &lineLen, cpuInfoFile) != -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not use fscanf() directly? Saves a malloc roundtrip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run this just once, so couple of mallocs don't hurt. And this is a pattern that we use at quite a number of places, so I wanted to be consistent. Also, for fscanf, I would need to use a fixed line buffer, assume max line size etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No need to pass a buffer to fscanf() if all you're looking for is an integer with %d.)

{
int cpuIndex;
int fieldsParsed = sscanf(line, "processor : %d", &cpuIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I looked at this, but the format of /proc/cpuinfo is completely different for each and every architecture supported by Linux. It just so happens that both x86 (32 and 64-bit) and ARM (32 and 64 bit) have similar structure there as far as lines matching that scanf() format string, but this is quite fragile.

A better way on Linux is to read /sys/devices/system/cpu/online (for the online CPUs) or /sys/devices/system/cpu/possible (for the possible CPUs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I didn't know the format is so architecture specific that it may not contain the "processor:" line. Thank for pointing it out.


if ((fieldsParsed == 1) && (cpuIndex > maxCpuIndex))
{
maxCpuIndex = cpuIndex;
}
}

free(line);
fclose(cpuInfoFile);
}

if (maxCpuIndex == -1)
{
// Reading the /proc/cpuinfo has failed, return the total CPU count - 1 as a fallback as we don't
// have anything better
maxCpuIndex = PAL_GetTotalCpuCount() - 1;
}
}

return maxCpuIndex;
}

/*++
Function:
GetSystemInfo
Expand Down
4 changes: 2 additions & 2 deletions src/coreclr/src/pal/src/thread/thread.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2935,10 +2935,10 @@ PAL_GetCurrentThreadAffinitySet(SIZE_T size, UINT_PTR* data)
if (st == 0)
{
const SIZE_T BitsPerBitsetEntry = 8 * sizeof(UINT_PTR);
int nrcpus = PAL_GetTotalCpuCount();
int maxCpuIndex = GetMaxCpuIndex();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use CPU_COUNT macro instead of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPU_COUNT counts number of set bits in the set, that's not what we want. We could possibly use the CPU_SETSIZE macro, however it would mean that we would scan 1024 bits (the current value of the macro). But maybe you are right, we call this function just once in the whole process lifetime, so it wouldn't hurt. And if we ever need to use it more, we can always add this optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPU_COUNT counts number of set bits in the set

You can potentially use it to exit the loop once all set bits are processed, without going all the way to 1024.

But I agree that looping till CPU_SETSIZE should be ok for one-time initialization like this.


// Get info for as much processors as it is possible to fit into the resulting set
SIZE_T remainingCount = std::min(size * BitsPerBitsetEntry, (SIZE_T)nrcpus);
SIZE_T remainingCount = std::min(size * BitsPerBitsetEntry, (SIZE_T)maxCpuIndex + 1);
SIZE_T i = 0;
while (remainingCount != 0)
{
Expand Down