Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client Request Cached Methods List from JIT Server #20129

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

luke-li-2003
Copy link
Contributor

@luke-li-2003 luke-li-2003 commented Sep 6, 2024

Added an option that makes the client request a list of cached methods from the server on bootstrap, and set the count for those methods as scount to improve rampup

Depends on eclipse/omr#7458

@luke-li-2003
Copy link
Contributor Author

@mpirvu

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Sep 6, 2024
Copy link

@marius-pirvu marius-pirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided a partial review of the code.

return _cacheName;
}
private:
std::string _cacheName;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be marked const because nobody should be modifying it.

{
return "Requesting AOT cache content";
}
std::string getCacheName() const

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could return a const reference to the embedded string

class StreamAotCacheMapRequest: public virtual std::exception
{
public:
StreamAotCacheMapRequest(std::string cacheName) : _cacheName(cacheName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's better to use a const reference for the parameter.

@@ -6159,6 +6159,9 @@ typedef struct J9JavaVM {
UDATA closeScopeNotifyCount;
#endif /* JAVA_SPEC_VERSION >= 22 */
UDATA unsafeIndexableHeaderSize;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid adding new fields to the JavaVM struct. If only the JIT needs them, then J9JITConfig is a better place. The new fields are specific to JITServer so they need to be protected with #if defined(J9VM_OPT_JITSERVER)

return rc;
}

JITServer::ClientStream *client =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the server is not available, this will throw an exception which needs to be caught.

@@ -1101,6 +1109,43 @@ TR::CompilationInfoPerThreadRemote::processEntry(TR_MethodToBeCompiled &entry, J
stream->writeError(compilationLowPhysicalMemory, (uint64_t) computeServerMemoryState(getCompilationInfo()));
abortCompilation = true;
}
catch (const JITServer::StreamAotCacheMapRequest &e)
{
std::string aotCacheName = e.getCacheName();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return just a const reference to the internal cache name.

if (TR::Options::getVerboseOption(TR_VerboseJITServer))
{
TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer,
"compThreadID=%d handling request for AOT cache list %s",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"handling request for AOT cache list %s" --> "handling request for AOT cache %s method list"

auto cachedAOTMethod = aotCache->getCachedMethodHead();

std::vector<std::string> methodSignaturesV;
for (;cachedAOTMethod != NULL;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to acquire the lock that protects the map/list of methods.
We also need to account for possible exceptions (like not enough memory to allocate)

auto aotCache = aotCacheMap->get(aotCacheName, 0, pending);
auto cachedAOTMethod = aotCache->getCachedMethodHead();

std::vector<std::string> methodSignaturesV;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you know the number of elements in the vector you are building, it's better to size the vector to the right length, to avoid multiple resize operations later on which involves copying object.

cachedAOTMethod = cachedAOTMethod->getNextRecord())
{
const SerializedAOTMethod &serializedAOTMethod = cachedAOTMethod->data();
std::string sig = std::string(serializedAOTMethod.signature());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, the code can throw bad_alloc so we need another try region.

@luke-li-2003 luke-li-2003 force-pushed the GetServerAOTCacheList branch 4 times, most recently from 1796628 to 0952284 Compare September 17, 2024 14:27
@luke-li-2003
Copy link
Contributor Author

Forgot to force-push, should be fine now

@luke-li-2003 luke-li-2003 force-pushed the GetServerAOTCacheList branch 3 times, most recently from dcc00d3 to cf589c4 Compare September 17, 2024 15:31
int32_t sigLen = J9UTF8_LENGTH(className) + J9UTF8_LENGTH(name) +
J9UTF8_LENGTH(signature) + 2;

char *sigC = new char[sizeof(char) * (sigLen + 1)];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenJ9 has its own memory management that provides accounting, so using the system allocator is not a good choice. On the other hand we want to minimize the usage of persistent allocator because of the overhead and fragmentation it can create. Here's an example on how this achieved in other parts of the code: https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/control/HookedByTheJit.cpp#L180
Basically we use the stack for most methods, and for the few very long signatures we use the persistent allocator. In this case I think that we can not set a low count for methods that have a very long name. These should be very few.


// contains
methodCachedAtServer =
(serverAOTMethodSet->find(std::string(sigC)) == serverAOTMethodSet->end());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that == needs to be changed to !=

#if defined(J9VM_OPT_JITSERVER)
else if (methodCachedAtServer) // Higher priority than SCC
{
int32_t scount = optionsAOT->getInitialSCount();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check: if local shared classes is not used, optionsAOT may not exist.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may use countInOptionSet || TR::Options::getCountsAreProvidedByUser() to check whether the use has specified counts.

@@ -1101,6 +1108,58 @@ TR::CompilationInfoPerThreadRemote::processEntry(TR_MethodToBeCompiled &entry, J
stream->writeError(compilationLowPhysicalMemory, (uint64_t) computeServerMemoryState(getCompilationInfo()));
abortCompilation = true;
}
catch (const JITServer::StreamAotCacheMapRequest &e)
{
const std::string aotCacheName = e.getCacheName();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a reference since that's what getCacheName() returns.

Copy link

@marius-pirvu marius-pirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the IDs of the various network messages has changed, we need to increment the minor version of the server.
Look how we do that in PR #19477 in file runtime/compiler/net/CommunicationStream.hpp

methodCachedAtServer =
(serverAOTMethodSet->find(std::string(sigC)) != serverAOTMethodSet->end());

if (TR::Options::getVerboseOption(TR_VerboseJITServer))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many of these messages that will be printed. I suggest we do the printing when both TR_VerboseJITServer and TR_VerboseCounts are enabled, since this code relates to setting counts.

@@ -475,6 +515,24 @@ static void jitHookInitializeSendTarget(J9HookInterface * * hook, UDATA eventNum
TR::Options::getHighCodeCacheOccupancyBCount() :
TR::Options::getHighCodeCacheOccupancyCount();
}
#if defined(J9VM_OPT_JITSERVER)
else if (methodCachedAtServer) // Higher priority than SCC

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this code should not have higher priority than the case when a method is in local SCC, because the JIT will load the method from local SCC if available (it's just faster than the remote AOT cache)


auto aotCacheMap = compInfo->getJITServerAOTCacheMap();
bool pending = false;
auto aotCache = aotCacheMap->get(aotCacheName, 0, pending);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ask Christian about the interaction of this code with AOT cache persistence.
@param pending Output flag indicating that a cache load operation from file is in progress, but not read yet
Can we just ignore the pending parameter?

{
const SerializedAOTMethod &serializedAOTMethod = cachedAOTMethod->data();
std::string sig = std::string(serializedAOTMethod.signature());
methodSignaturesV.push_back(sig);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion
methodSignaturesV.push_back(std::string(serializedAOTMethod.signature());:

{
{
OMR::CriticalSection cs(cachedMethodMonitor);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

size_t numAOTMethods = _cachedMethodMap.size(); // Get number of methods in AOT cache (needs an accessor)
methodSignaturesV.reserve(numAOTMethods);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the size of the map needs to be done after we obtain the lock.

@@ -4181,6 +4239,9 @@ void JitShutdown(J9JITConfig * jitConfig)

TR::CompilationInfo * compInfo = TR::CompilationInfo::get(jitConfig);

if (jitConfig->serverAOTMethodSet)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to destruct the set. Calling just jitPersistentFree will only deallocate the memory used for the set itself, but not the entries added to the set.

@luke-li-2003 luke-li-2003 force-pushed the GetServerAOTCacheList branch 6 times, most recently from 2fa5254 to 4f059f1 Compare September 23, 2024 16:20
catch (const std::bad_alloc &e)
{
if (TR::Options::isAnyVerboseOptionSet(TR_VerboseJITServer,
TR_VerboseCompilationDispatch))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason for printing a message into vlog under TR_VerboseCompilationDispatch

@@ -628,6 +669,24 @@ static void jitHookInitializeSendTarget(J9HookInterface * * hook, UDATA eventNum
}
#endif // defined(J9VM_INTERP_AOT_COMPILE_SUPPORT) && defined(J9VM_OPT_SHARED_CLASSES) && (defined(TR_HOST_X86) || defined(TR_HOST_POWER) || defined(TR_HOST_S390) || defined(TR_HOST_ARM) || defined(TR_HOST_ARM64))
} // if (TR::Options::sharedClassCache())
#if defined(J9VM_OPT_JITSERVER)
else if (methodCachedAtServer) // Not cached in SCC but the server

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to set a low count either when SCC/AOT is not set, or when it is set but the method is not in local SCC.
We may have to use the code below in more than one place. Maybe define it in a function?

Added an option that makes the client request a list of cached
methods from the server on bootstrap, and set the count for
those methods as scount to improve rampup

Signed-off-by: Luke Li <luke.li@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project depends:omr
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

4 participants