{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":280276446,"defaultBranch":"main","name":"kineto","ownerLogin":"pytorch","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2020-07-16T23:03:00.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/21003710?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1716334956.0","currentOid":""},"activityList":{"items":[{"before":"2f7ce6f5475bba4e630e4d43504b8c2d5bd390da","after":"3d355d17e15d0fe647e86146434e6edd68025f74","ref":"refs/heads/main","pushedAt":"2024-09-13T23:53:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Overlapping D2H transfer example\n\nSummary:\nThis diff instantiates 6 CPU threads that schedule CUDA kernels and memory copies between CPU and GPU. For each thread there are 2 CUDA streams: one for CUDA kernels, one for memory copies. For every kernel call, we instantiate in advance a pair of inputs and outputs on the host and on the device.\n\nThe CUDA kernel needs the following inputs:\n- A buffer input\n- B buffer input\n- C buffer output\n\nThe memory copies are as follows:\n- A is in pinned memory and copied via a H2D\n- B is in pageable memory and copied via H2D\n- 50% of C output buffers are in pinned and 50% are in pageable\n\nThe D2H to H2D ratio in the current config is somwhere around 2:1\n\n3 out of 6 threads are issuing transactions on the lowest CUDA stream priority and the rest of them are using default priority.\n\nReviewed By: xerothermic\n\nDifferential Revision: D62670279\n\nfbshipit-source-id: e3bca9af984eeae4bce865c35663bfd53ef97ba0","shortMessageHtmlLink":"Overlapping D2H transfer example"}},{"before":"79be4704ca3d04728ef51bd19300d9e0fc99003c","after":"2f7ce6f5475bba4e630e4d43504b8c2d5bd390da","ref":"refs/heads/main","pushedAt":"2024-09-13T19:20:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Extend CPU User Annotations to End of Profile (#986)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/986\n\nIf a CPU User Annotation doesn't end by the time the profile ends, the annotation is marked as a 0-length event. This can be annoying to look at because it seems like profiler just never got the annotation event when it did. Lets set the end time to the end of profiling.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D62604717\n\nfbshipit-source-id: 34cb06b87c3c369601e1e6df859f61377b8198f6","shortMessageHtmlLink":"Extend CPU User Annotations to End of Profile (#986)"}},{"before":"ca1eedb9214bb12fa906180575894ffb78de8ac7","after":"79be4704ca3d04728ef51bd19300d9e0fc99003c","ref":"refs/heads/main","pushedAt":"2024-09-13T03:39:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Modify CUDA test to attempt overlapping D2H transfers\n\nSummary:\nDistributed checkpointing for GenAI requires very expensive memory downloads from the GPU which can block the trainer thread if it happens that it issues a new D2H transfer.\n\nFor example, we want that model parameters and optimizer state downloads to overlap with compute. However if for some reason the forward pass thread or the backward pass issue a D2H transfer, it will have to wait until the checkpoint download was completed.\n\nThis code is a test program for Kineto that issues CUDA kernels, memory copies and UVM accesses in a configurable way. This change enables us to issue multiple GPU D2H downloads to host memory using multiple streams on multiple threads. Previously the D2H downloads were very short because we downloaded a single output value of 4 bytes. With the change we download an entire buffer.\n\nReviewed By: xerothermic\n\nDifferential Revision: D62601073\n\nfbshipit-source-id: ed192723403787f37d45bf63d39e1a768df4a1d3","shortMessageHtmlLink":"Modify CUDA test to attempt overlapping D2H transfers"}},{"before":"76f23345a70fca6ba7e31939a6443c959454239b","after":"ca1eedb9214bb12fa906180575894ffb78de8ac7","ref":"refs/heads/main","pushedAt":"2024-09-13T01:43:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add CUPTI/RoCM versions to traces (#985)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/985\n\nBecause of the differences that are emerging between different versions, it would be useful in the metadata we could see which third-party library version we are using. We add them to our kineto traces in this diff.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D62538511\n\nfbshipit-source-id: 813af45c1d2e82002ca7b4b7f3788407f13c254c","shortMessageHtmlLink":"Add CUPTI/RoCM versions to traces (#985)"}},{"before":"58ed2c0229465f80f9ccc32fb9cb3bd4c7074e9c","after":"76f23345a70fca6ba7e31939a6443c959454239b","ref":"refs/heads/main","pushedAt":"2024-08-30T15:47:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add Grid/Block To AMD Kernel Profiles (#983)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/983\n\nRoctracer does not give the grid/block alongside device activities; however, they do have the information in the launch event. Using the correlation we can then stitch these properties using a map from correlation to grid or block. Currently this won't work for RCCL events until https://github.com/ROCm/roctracer/issues/100 is resolved\n\nReviewed By: leitian, aaronenyeshi\n\nDifferential Revision: D61743013\n\nfbshipit-source-id: 1205c62f45e8982b88f7a664857090d981f2cb3c","shortMessageHtmlLink":"Add Grid/Block To AMD Kernel Profiles (#983)"}},{"before":"464bccf98be687a03ea8e23b52fd1448f466d5cc","after":"58ed2c0229465f80f9ccc32fb9cb3bd4c7074e9c","ref":"refs/heads/main","pushedAt":"2024-08-27T02:33:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fixed use of designated initializers Windows (#976)\n\nSummary:\nFix CUPTI code not compiling on MSVC due to designated initializers not working on Windows.\n`C:\\Work\\pytorch\\third_party\\kineto\\libkineto\\src\\CuptiActivityProfiler.cpp(658): error C7555: use of designated initializers requires at least '/std:c++20'`\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/976\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D61810245\n\nPulled By: sraikund16\n\nfbshipit-source-id: 240a8163477e70b3a3cef0a78bc7cea260d5528e","shortMessageHtmlLink":"Fixed use of designated initializers Windows (#976)"}},{"before":"ba4caf65c07238a02c859c46f5729e7402f4a252","after":"464bccf98be687a03ea8e23b52fd1448f466d5cc","ref":"refs/heads/main","pushedAt":"2024-08-22T20:28:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Enable `-Wheader-hygiene` for kineto/PACKAGE +1 (#981)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/981\n\nPer title\n\n#buildmore - Be thorough\n\n#buildsonlynotests - No runtime effects!\n\n - If you approve of this diff, please use the \"Accept & Ship\" button\n\nno-ig-test - Doesn't require Instagram testing.\n\n(1 file modified.)\n\nReviewed By: dmm-fb\n\nDifferential Revision: D56539017\n\nfbshipit-source-id: 79940cb7874dca56e07726f3171bc177afa81d56","shortMessageHtmlLink":"Enable -Wheader-hygiene for kineto/PACKAGE +1 (#981)"}},{"before":"ba989ce52ace5372a8e7d299294722715bda584a","after":"ba4caf65c07238a02c859c46f5729e7402f4a252","ref":"refs/heads/main","pushedAt":"2024-08-22T17:03:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Correct the condition when setting External id (#978)\n\nSummary:\nThe `External id` will be set to op.linkedActivity()->correlationId() if op.linkedActivity() is not nullptr. So in `if` condition we should check if op.linkedActivity()->correlationId() is 0.\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/978\n\nReviewed By: xuzhao9\n\nDifferential Revision: D61608345\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: edc2c121a970756f008cae10a175b1f7066af18c","shortMessageHtmlLink":"Correct the condition when setting External id (#978)"}},{"before":"120cbc2b8018330ed8d4984520c0d3ad236ea2a9","after":"ba989ce52ace5372a8e7d299294722715bda584a","ref":"refs/heads/main","pushedAt":"2024-08-22T14:56:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Remove libkineto_ci.yml (#980)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/980\n\nSince we already cover building libkineto static library and shared library in libkineto_cuda.yml (which has CUPTI installed), we don't need this CI build anymore.\n\nThis CI build has been broken for awhile and not required.\n\nTest Plan: N/A\n\nReviewed By: sraikund16\n\nDifferential Revision: D61612169\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: 3c6d0974aa757b7f4120227cbd2becfc81fa6ca2","shortMessageHtmlLink":"Remove libkineto_ci.yml (#980)"}},{"before":"7d5e58feb39403253b43452a9cdf6fb073cb59b9","after":"120cbc2b8018330ed8d4984520c0d3ad236ea2a9","ref":"refs/heads/main","pushedAt":"2024-08-20T15:32:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Auto-init when CUDA_INJECTION64_PATH=none is set (#979)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/979\n\nCUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However CUDA_INJECTION64_PATH=none is a commonly used config option, and we should not skip auto-init since the path is set to none.\n\nThis diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled.\n\nTest Plan:\nCI\n\nRan locally with `CUDA_INJECTION64_PATH=none` and on-demand profiling works!\n```\n$ CUDA_INJECTION64_PATH=none buck run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_resnet_integration_test\nBuck UI: https://www.internalfb.com/buck2/282115c8-6c29-414a-9d39-c4afb73d4e52\nNetwork: Up: 0B Down: 0B\nJobs completed: 19751. Time elapsed: 1.5s.\nBUILD SUCCEEDED\n[INFO: args.py: 113]: Setting master address to localhost and port to 62421\n[INFO: pytorch_resnet_integration_test.py: 256]: Start: ready to do work 2114906\nWARNING: Logging before InitGoogleLogging() is written to STDERR\nI0819 09:07:08.277374 2116870 C10dScubaLogger.cpp:32] Registering C10dScubaLogger...\nI0819 09:07:08.584887 2116870 ProcessGroupNCCL.cpp:865] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0\nI0819 09:07:08.584949 2116870 ProcessGroupNCCL.cpp:874] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 0, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1\nINFO:2024-08-19 09:07:11 2116870:2116870 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1\nINFO:2024-08-19 09:07:11 2116870:2116870 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000\nINFO:2024-08-19 09:07:11 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 09:07:12 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nI0819 09:07:15.780548 2116870 ProcessGroupNCCL.cpp:2130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.059698 ms\nNCCL version 2.18.3+cudaCUDA_MAJOR.CUDA_MINOR\nI0819 09:07:15.851341 2116870 ProcessGroupNCCL.cpp:2240] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x7f67ce8ec000 on CUDA device:\nI0819 09:07:15.851400 2116870 ProcessGroupNCCL.cpp:2245] [PG ID 0 PG GUID 0(default_pg) Rank 0] NCCL_DEBUG: WARN\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 0, peak allocated GPU mem: 2.97GB, peak active GPU mem: 2.97GB, peak reserved GPU mem: 3.07GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 20, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 40, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 60, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 80, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 100, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 120, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 140, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 160, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 180, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 200, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 220, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\nINFO:2024-08-19 09:07:36 2116870:2119791 ConfigLoader.cpp:271] Received config from dyno:\n\n ACTIVITIES_COMPRESSION_ALGORITHM=GZIP\n REQUEST_GROUP_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96\n REQUEST_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96\n ACTIVITIES_DURATION_MSECS=500\n PROFILE_REPORT_INPUT_SHAPES=true\n PROFILE_PROFILE_MEMORY=false\n PROFILE_WITH_STACK=false\n PROFILE_WITH_FLOPS=false\n PROFILE_WITH_MODULES=false\n ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724083651/localhost/\n REQUEST_TRACE_ID=5172367202475739244\n\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 240, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\nINFO:2024-08-19 09:07:37 2116870:2121591 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724083663089205388\nINFO:2024-08-19 09:07:37 2116870:2121591 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_2116870.json\n Log file: /tmp/libkineto_activities_2116870.json\n Trace start time: 2024-08-19 09:07:43 Trace duration: 500ms\n Warmup duration: 5s\n Max GPU buffer size: 128MB\n Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync\n Manifold bucket: gpu_traces\n Manifold object: tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json\n Trace compression enabled: 1\n TTL in seconds: 31536000 (365 days)\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s\nINFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 260, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 280, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 300, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 320, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\nSTAGE:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up\nINFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1194] Tracing started\nINFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1219] Tracing complete.\nSTAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 340, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\nINFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers\nINFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:307] Processed 690712 GPU records (34396720 bytes)\nINFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 172266, Blocklisted runtime = 388516, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0\nINFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1269] Traces Recorded:\nINFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations\nINFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_2116870.json\nINFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_2116870.json\nSTAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing\nI0819 09:07:45.141360 2122707 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 360, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 380, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\nINFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz\nINFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:\nINFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz&bucket=gpu_traces\nINFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 3486626 us\nSTAGE:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 400, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 420, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 440, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 460, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 480, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 500, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 520, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 540, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 560, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 580, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 600, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 620, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 640, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 660, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 680, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 700, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 720, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 740, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 760, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 780, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 800, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 820, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 840, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 860, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 880, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 900, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 920, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 940, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 960, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 199]: step: 980, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.\n[INFO: pytorch_resnet_integration_test.py: 216]: running function took 83.87466025352478 seconds to complete\nI0819 09:08:26.611253 2116870 ProcessGroupNCCL.cpp:1164] [PG ID 0 PG GUID 0(default_pg) Rank 0] Launching ProcessGroupNCCL abort asynchrounously.\nI0819 09:08:26.612347 2125072 ProcessGroupNCCL.cpp:1111] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroying ncclComm_ 0x7f67ce8ec000 on CUDA device: 0\nI0819 09:08:26.612396 2125072 NCCLUtils.hpp:407] Aborting ncclComm_ 0x7f67ce8ec000 with reason: No abort reason provided.\nI0819 09:08:27.125118 2125072 ProcessGroupNCCL.cpp:1130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroyed communicator on CUDA device: 0 with stream: 3\nI0819 09:08:27.125389 2116870 ProcessGroupNCCL.cpp:1065] [PG ID 0 PG GUID 0(default_pg) Rank 0] future is successfully executed for: ProcessGroup abort\nI0819 09:08:27.125425 2116870 ProcessGroupNCCL.cpp:1170] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL aborts successfully.\nI0819 09:08:27.127446 2116870 ProcessGroupNCCL.cpp:1179] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destructor entered.\nI0819 09:08:27.127477 2116870 ProcessGroupNCCL.cpp:1200] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL watchdog thread joined.\nI0819 09:08:27.127542 2116870 ProcessGroupNCCL.cpp:1204] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL heart beat monitor thread joined.\n/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning:\n\nImplicitly cleaning up \n\n[INFO: pytorch_resnet_integration_test.py: 267]: Finish: Completed test workload\n```\n\nAlso, ran dyno/rust/kineto_integration_test:\n```\n$ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters\nFile changed: fbsource//xplat/kineto/libkineto/fb/init.cpp\nFile changed: fbcode//kineto/libkineto/fb/init.cpp\nBuck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd\nNetwork: Up: 0B Down: 2.9MiB (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb)\nJobs completed: 76046. Time elapsed: 24.5s.\nCache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0)\nBUILD SUCCEEDED\n----------------------------------------\nTrace collector program = common-on-demand-tracing-library\n----------------------------------------\nAutodetected job type Tw for job id dacluster/dauser/daname/2\nBuilding trace tester\nRunning /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester\nSoft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`.\nSoft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`.\nSoft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`.\nBuck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d\nNetwork: Up: 17KiB Down: 2.9MiB (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e)\nJobs completed: 182380. Time elapsed: 1:10.3s.\nCache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1)\nBUILD SUCCEEDED\nRunning ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based\n\nWaiting 20 for the trace tester to start up...\n----------------------------------------\nCollecting gpu trace on application with\n pid = n/a,\n job_id = dacluster/dauser/daname/2,\n collect iters = true\n----------------------------------------\nI0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)\nI0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)\nTrace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: \"https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces\" }]\n\nTrace tester is running in background, please be patient...\nTrace tester application stdout:\n ---\n\n ---\nTrace tester application stderr:\n ---\nI0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.\nINFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000\nINFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1\nW0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())\nINFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno:\n\n ACTIVITIES_COMPRESSION_ALGORITHM=GZIP\n REQUEST_GROUP_TRACE_ID=test_group_trace_id\n ACTIVITIES_DURATION_MSECS=500\n PROFILE_REPORT_INPUT_SHAPES=true\n PROFILE_PROFILE_MEMORY=false\n PROFILE_WITH_STACK=false\n PROFILE_WITH_FLOPS=false\n PROFILE_WITH_MODULES=false\n ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/\n REQUEST_TRACE_ID=17832852696324168180\n\nINFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096354388906844\nINFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json\n Log file: /tmp/libkineto_activities_3175601.json\n Trace start time: 2024-08-19 12:39:14 Trace duration: 500ms\n Warmup duration: 5s\n Max GPU buffer size: 128MB\n Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync\n Manifold bucket: gpu_traces\n Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json\n Trace compression enabled: 1\n TTL in seconds: 31536000 (365 days)\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s\nINFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s\nSTAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up\nINFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started\nINFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete.\nSTAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection\nINFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers\nINFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes)\nINFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0\nINFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded:\nINFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations\nINFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json\nINFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json\nSTAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing\nI0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)\nW0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier\nE0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid\nW0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''\nI0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'\nINFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz\nINFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:\nINFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces\nINFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us\nSTAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push\nWARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client\n\n ---\nAutodetected job type Tw for job id dacluster/dauser/daname/2\nBuilding trace tester\nRunning /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester\nBuck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1\nNetwork: Up: 0B Down: 0B\nJobs completed: 5. Time elapsed: 0.1s.\nBUILD SUCCEEDED\nRunning ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based\n\nWaiting 20 for the trace tester to start up...\n----------------------------------------\nCollecting gpu trace on application with\n pid = 3181985,\n job_id = dacluster/dauser/daname/2,\n collect iters = true\n----------------------------------------\nI0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)\nI0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)\nTrace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: \"https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces\" }]\n\nTrace tester is running in background, please be patient...\nTrace tester application stdout:\n ---\n\n ---\nTrace tester application stderr:\n ---\nI0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.\nINFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000\nINFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1\nINFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno:\n\n ACTIVITIES_COMPRESSION_ALGORITHM=GZIP\n REQUEST_GROUP_TRACE_ID=test_group_trace_id\n ACTIVITIES_DURATION_MSECS=500\n PROFILE_REPORT_INPUT_SHAPES=true\n PROFILE_PROFILE_MEMORY=false\n PROFILE_WITH_STACK=false\n PROFILE_WITH_FLOPS=false\n PROFILE_WITH_MODULES=false\n ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/\n REQUEST_TRACE_ID=17968765238600337472\n\nINFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096443817688761\nINFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json\n Log file: /tmp/libkineto_activities_3181985.json\n Trace start time: 2024-08-19 12:40:43 Trace duration: 500ms\n Warmup duration: 5s\n Max GPU buffer size: 128MB\n Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync\n Manifold bucket: gpu_traces\n Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json\n Trace compression enabled: 1\n TTL in seconds: 31536000 (365 days)\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s\nINFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s\nW0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())\nSTAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up\nINFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started\nINFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete.\nSTAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection\nINFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers\nINFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes)\nINFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0\nINFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded:\nINFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations\nINFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json\nINFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json\nSTAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing\nI0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)\nW0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier\nE0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid\nW0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''\nI0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'\nINFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz\nINFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:\nINFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces\nINFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us\nSTAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push\nWARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client\n\n ---\nBuilding trace tester\nRunning /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester\nBuck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5\nNetwork: Up: 0B Down: 0B\nJobs completed: 5. Time elapsed: 0.2s.\nBUILD SUCCEEDED\nRunning ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based\n\nWaiting 20 for the trace tester to start up...\n----------------------------------------\nCollecting gpu trace on application with\n pid = 3190209,\n job_id = n/a,\n collect iters = true\n----------------------------------------\nI0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)\nI0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)\nTrace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: \"https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces\" }]\n\nTrace tester is running in background, please be patient...\nTrace tester application stdout:\n ---\n\n ---\nTrace tester application stderr:\n ---\nI0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.\nINFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000\nINFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1\nINFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno:\n\n ACTIVITIES_COMPRESSION_ALGORITHM=GZIP\n REQUEST_GROUP_TRACE_ID=test_group_trace_id\n ACTIVITIES_DURATION_MSECS=500\n PROFILE_REPORT_INPUT_SHAPES=true\n PROFILE_PROFILE_MEMORY=false\n PROFILE_WITH_STACK=false\n PROFILE_WITH_FLOPS=false\n PROFILE_WITH_MODULES=false\n ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/\n REQUEST_TRACE_ID=3582354066664069411\n\nINFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096543391277741\nINFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json\n Log file: /tmp/libkineto_activities_3190209.json\n Trace start time: 2024-08-19 12:42:23 Trace duration: 500ms\n Warmup duration: 5s\n Max GPU buffer size: 128MB\n Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync\n Manifold bucket: gpu_traces\n Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json\n Trace compression enabled: 1\n TTL in seconds: 31536000 (365 days)\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s\nINFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s\nW0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())\nSTAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up\nINFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started\nINFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete.\nSTAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection\nINFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers\nINFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes)\nINFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0\nINFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded:\nINFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations\nINFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json\nINFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json\nSTAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing\nI0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)\nW0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier\nE0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid\nW0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''\nI0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'\nINFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz\nINFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:\nINFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces\nINFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us\nSTAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push\n\n ---\nBuilding trace tester\nRunning /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester\nBuck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7\nNetwork: Up: 0B Down: 0B\nJobs completed: 5. Time elapsed: 0.2s.\nBUILD SUCCEEDED\nRunning ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based\n\nWaiting 20 for the trace tester to start up...\n----------------------------------------\nCollecting gpu trace on application with\n pid = n/a,\n job_id = n/a,\n collect iters = true\n----------------------------------------\nI0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity)\nI0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es)\nTrace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: \"https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces\" }, PidTraceUrlPair { pid: 3190209, trace_url: \"https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces\" }]\n\nTrace tester is running in background, please be patient...\nTrace tester application stdout:\n ---\n\n ---\nTrace tester application stderr:\n ---\nI0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.\nINFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000\nINFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0\nINFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1\nINFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno:\n\n ACTIVITIES_COMPRESSION_ALGORITHM=GZIP\n REQUEST_GROUP_TRACE_ID=test_group_trace_id\n ACTIVITIES_DURATION_MSECS=500\n PROFILE_REPORT_INPUT_SHAPES=true\n PROFILE_PROFILE_MEMORY=false\n PROFILE_WITH_STACK=false\n PROFILE_WITH_FLOPS=false\n PROFILE_WITH_MODULES=false\n ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/\n REQUEST_TRACE_ID=6007547554162055103\n\nINFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096638759081016\nINFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json\n Log file: /tmp/libkineto_activities_3196469.json\n Trace start time: 2024-08-19 12:43:58 Trace duration: 500ms\n Warmup duration: 5s\n Max GPU buffer size: 128MB\n Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync\n Manifold bucket: gpu_traces\n Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json\n Trace compression enabled: 1\n TTL in seconds: 31536000 (365 days)\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s\nINFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s\nW0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())\nSTAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up\nINFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started\nINFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete.\nSTAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection\nINFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers\nINFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes)\nINFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0\nINFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded:\nINFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations\nINFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json\nINFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json\nSTAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing\nI0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)\nW0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier\nE0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid\nW0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''\nI0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'\nINFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz\nINFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:\nINFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces\nINFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us\nSTAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push\n\n ---\n```\n\nReviewed By: briancoutinho\n\nDifferential Revision: D61478101\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf","shortMessageHtmlLink":"Auto-init when CUDA_INJECTION64_PATH=none is set (#979)"}},{"before":"d9753139d181b9ff42872465aac0e5d3018be415","after":"7d5e58feb39403253b43452a9cdf6fb073cb59b9","ref":"refs/heads/main","pushedAt":"2024-08-16T21:26:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Avoid marking every profile loop stop as Collection stage, use data available to mark errored stages. (#977)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/977\n\nWe already have the data collected to know if the collection was stopped due to `collectionDone` or `stopCollection`, the later is only set when CUPTI abruptly stops in events like not finding buffers.\n\nWe infact also set this in the itnernal Error Counters, so leverage that functionality within UST logging as well to denote a terminal stage within Kineto.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D61226939\n\nfbshipit-source-id: a4d5fa525d4457d44f0b959e4761b82de160152c","shortMessageHtmlLink":"Avoid marking every profile loop stop as Collection stage, use data a…"}},{"before":"0cded49b0cce22f7d777861db5c937da7f3e22db","after":"d9753139d181b9ff42872465aac0e5d3018be415","ref":"refs/heads/main","pushedAt":"2024-08-07T18:07:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add API for Dynamic Activity Toggling [1/n] (#972)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/972\n\nDuring PT2 there are many GPU/CPU events that are unneccessary to profile in between a given step. To remedy this, we can add an API that takes in a list of activities and an arg to toggle said activies or not. For this diff we are just adding the Kineto side to turn on/off the \"CUDA\" events which includes AMD/Roctracer. A follow up will be added for the generic profiler side. Subsequent diffs will be added for CPU toggling and e2e testing.\n\nDifferential Revision: D60542040\n\nfbshipit-source-id: 2608ec912812c9004cb87371bd8fca8145a95621","shortMessageHtmlLink":"Add API for Dynamic Activity Toggling [1/n] (#972)"}},{"before":"1bb9b7632d129c0ee2eb47501215bcc42a6445ea","after":"0cded49b0cce22f7d777861db5c937da7f3e22db","ref":"refs/heads/main","pushedAt":"2024-08-06T16:38:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Introduce XPU profiler by following kineto plugin design (#961)\n\nSummary:\nAs XPU became a PyTorch built-in device, the profiler support is indispensable part of functionality completeness. In this PR, the XPU profiler is introduced by following kineto plugin design under libkineto/src/plugin/xpupti. The XPU profiler plugin is built on the foundation of intel PTI toolkit (https://github.com/intel/pti-gpu), and underlying SYCL runtime. The LIBKINETO_NOXPUPTI option is added to enable or disable the XPU profiler plugin during kineto build stage.\n\nCC: aaronenyeshi briancoutinho davidberard98 sraikund16\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/961\n\nReviewed By: xuzhao9\n\nDifferential Revision: D60830913\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: a24444e1ab1ed074bfcf5a9012076fa7c193b178","shortMessageHtmlLink":"Introduce XPU profiler by following kineto plugin design (#961)"}},{"before":"da2f2682cabaf95d601fa2a9b7e0979f84fe7667","after":"1bb9b7632d129c0ee2eb47501215bcc42a6445ea","ref":"refs/heads/main","pushedAt":"2024-08-02T20:56:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Move group trace id and trace id to OSS kineto Config (#970)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/970\n\nPreviously the `trace_id` and `group_trace_id` fields were only parsed for `FBConfig`. There are OSS use-cases which would benefit from using these fields, so this diff moves the parsing of these fields to `Config`\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D60256788\n\nfbshipit-source-id: 6e9ecccd5f888b704ba30688dff4c41014959347","shortMessageHtmlLink":"Move group trace id and trace id to OSS kineto Config (#970)"}},{"before":"ac37db4e279e49222ea2d90460bfd8938292c998","after":"da2f2682cabaf95d601fa2a9b7e0979f84fe7667","ref":"refs/heads/main","pushedAt":"2024-08-01T19:43:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add Logging for Empty Traces (#968)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/968\n\nRecently, we have had users seen empty traces when the system is idle leading to confusion as to whether it was caused by a bug in kineto formatting or not. This diff adds further logging to ensure that the user is aware that kineto finds no valid trace events and is expecting an empty trace.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D60311331\n\nfbshipit-source-id: 1836d37c622a85f17c08d6337480cb5bc49eca70","shortMessageHtmlLink":"Add Logging for Empty Traces (#968)"}},{"before":"18606c9835c2e063e9884b50e6c4c28b0c5627f1","after":"ac37db4e279e49222ea2d90460bfd8938292c998","ref":"refs/heads/main","pushedAt":"2024-07-31T20:46:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add perfetto trace analysis benchmark (#969)\n\nSummary:\nAdd a benchmark for trace analysis tasks on backends like Perfetto.\n\nStep 1: install the benchmark (this will download and decompress sample trace from Amazon S3)\n\n```\n$ python -m benchmarks.perfetto.install\n\nChecking out https://ossci-datasets.s3.amazonaws.com/torchbench/traces/torchbench_traces.tar.gz to /Users/xzhao9/git/kineto/benchmarks/trace_analysis/.data/torchbench_traces.tar.gz\ndecompressing input tarball: /Users/xzhao9/git/kineto/benchmarks/trace_analysis/.data/torchbench_traces.tar.gz...OK\nRequirement already satisfied: perfetto in /Users/xzhao9/miniconda3/envs/test-numpy/lib/python3.11/site-packages (from -r /Users/xzhao9/git/kineto/benchmarks/trace_analysis/requirements.txt (line 1)) (0.7.0)\nRequirement already satisfied: tabulate in /Users/xzhao9/miniconda3/envs/test-numpy/lib/python3.11/site-packages (from -r /Users/xzhao9/git/kineto/benchmarks/trace_analysis/requirements.txt (line 2)) (0.9.0)\nRequirement already satisfied: protobuf in /Users/xzhao9/miniconda3/envs/test-numpy/lib/python3.11/site-packages (from perfetto->-r /Users/xzhao9/git/kineto/benchmarks/trace_analysis/requirements.txt (line 1)) (4.25.3)\n```\n\nStep 2: run the benchmark\n\n```\n$ python -m benchmarks.perfetto.run\n\n input-task perfetto-latency\n---------------------------------------------- ------------------\n torchbench_resnet50_3080ti-load 8.53069\ntorchbench_resnet50_3080ti-search_gemm_kernels 0.067583\n torchbench_resnet50_3080ti-select_kernels 0.000549563\n torchbench_resnet50_3080ti-group_kernels 0.0145147\n```\n\nRight now, only latency metric is available. We could add other metrics like memory footprint later.\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/969\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D60466932\n\nPulled By: xuzhao9\n\nfbshipit-source-id: 075476fc6b9b93f94d2d5d44183324360fdbd558","shortMessageHtmlLink":"Add perfetto trace analysis benchmark (#969)"}},{"before":"188c5f55562fae85dbff3d00017c850c2e2c944f","after":"18606c9835c2e063e9884b50e6c4c28b0c5627f1","ref":"refs/heads/main","pushedAt":"2024-07-31T15:21:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Fix Kineto Stress Test (#971)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/971\n\nThe Kineto Stress test was failing for two reasons:\n\n1) In the test itself, we did not set the converter for the TSC timestamp, making all values out of range\n\n2) In Kineto itself there was a bug regarding child profilers. When these profilers are set their spans are set to all 0. However, when we set the record start in Kineto we use the start span of the CPU trace to get the start of the trace itself. In the future we should probably decouple these things but for now I added an edge case to ensure that the profile is not malformed.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D60415474\n\nfbshipit-source-id: e7b268aa0d41532448a28e1c1b4fe2ba81fda8da","shortMessageHtmlLink":"Fix Kineto Stress Test (#971)"}},{"before":"c2bc75243981c2c8793562502f9bbd3f0aeeb2ba","after":"188c5f55562fae85dbff3d00017c850c2e2c944f","ref":"refs/heads/main","pushedAt":"2024-07-29T15:30:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"fix static variable client access vialotion in DamonConfigLoader (#965)\n\nSummary:\nAs discussed in pytorch issue [#129626](https://github.com/pytorch/pytorch/issues/129626) , `updateThread` of `Config` will keep on accessing `client` after main thread quits. But when main thread quits, `client` will be destructed. Functions in `updateThread` may use a dangling pointer of `client`, which will cause invalid memory access, and finally, `Segment fault` occurs, a core file will be generated, which doesn't behave as expected.\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/965\n\nReviewed By: xuzhao9\n\nDifferential Revision: D60291572\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: d97291aa825deeb672eb8e2d8385633c2838396d","shortMessageHtmlLink":"fix static variable client access vialotion in DamonConfigLoader (#965)"}},{"before":"eb34f147f2af821da931c18457c26b076c8491dd","after":"c2bc75243981c2c8793562502f9bbd3f0aeeb2ba","ref":"refs/heads/main","pushedAt":"2024-07-23T01:23:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"populate src/dst rank to GPU kernel (#963)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/963\n\nSplit from D59794535 V2- populate src/dst rank for p2p kernel in kineto\n\nReviewed By: sraikund16\n\nDifferential Revision: D59952097\n\nfbshipit-source-id: cbe1587d90ae21bb6224915cbbd58f71457e9d09","shortMessageHtmlLink":"populate src/dst rank to GPU kernel (#963)"}},{"before":"eeb4e9b44da82d09709c482dc2cde40f973cc0a1","after":"eb34f147f2af821da931c18457c26b076c8491dd","ref":"refs/heads/main","pushedAt":"2024-07-12T16:29:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Update libfmt to 11-0-0. Closes #958. (#959)\n\nSummary:\nFixes https://github.com/pytorch/kineto/issues/958\n\nPull Request resolved: https://github.com/pytorch/kineto/pull/959\n\nReviewed By: davidberard98, yoyoyocmu\n\nDifferential Revision: D59637583\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: 48def339ed3097ecb4746d7870203040ac535858","shortMessageHtmlLink":"Update libfmt to 11-0-0. Closes #958. (#959)"}},{"before":"1173979024f2b1c109e35f0719b7bb452082d18e","after":"eeb4e9b44da82d09709c482dc2cde40f973cc0a1","ref":"refs/heads/main","pushedAt":"2024-07-09T19:57:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Del un KINETO_NAMESPACE @ kineto/libkineto/src/IpcFabricConfigClient.h:26\n\nSummary:\nRemoves a `using namespace` from the global namespace in pursuit of enabling `-Wheader-hygiene`.\nQualifies instances that relied on the `using namespace`.\n\nReviewed By: palmje\n\nDifferential Revision: D59178186\n\nfbshipit-source-id: a492680f3038e58a683df3a816ec0852c3242b5d","shortMessageHtmlLink":"Del un KINETO_NAMESPACE @ kineto/libkineto/src/IpcFabricConfigClient.…"}},{"before":"9d33b110176854afb0649586674ca435b9d44f1d","after":"1173979024f2b1c109e35f0719b7bb452082d18e","ref":"refs/heads/main","pushedAt":"2024-07-03T00:55:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add Dist Info for On-Demand NCCL Traces (#956)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/956\n\nCurrently, auto-trace uses pytorches c10d.distributed library to set up the \"distributedInfo\" field in the chrome tracing. This makes sense as all the process groups are initialized and maintained in python. Because of this, however, on-demand is unable to retrieve this information directly as it has no access to the python side of a graph. However, we can glean the process group information based on the information that is added in to the dispatcher.\n\nIn this diff, we keep a global to enumerate all the nccl information via the args in the collectives as we print them out to the trace. At the end of the trace we output the distributed information. In order to prevent double counting for auto-trace, we skip if there was previously distributedInfo outputted to the trace.\n\nFixes https://github.com/pytorch/kineto/issues/885\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D58736045\n\nfbshipit-source-id: 7e1a5e45398a63066f6f8f5b2e7541d67a6463a4","shortMessageHtmlLink":"Add Dist Info for On-Demand NCCL Traces (#956)"}},{"before":"817889e836d6c1234260c6581917f43e8a940eb0","after":"9d33b110176854afb0649586674ca435b9d44f1d","ref":"refs/heads/main","pushedAt":"2024-07-01T22:32:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Remove all instances of TMP_USE_TSC_AS_TIMESTAMP (#957)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/957\n\nNow that D56584521 is in, we can remove all insteances of TMP_USE_TSC_AS_TIMESTAMP\n\nReviewed By: swolchok\n\nDifferential Revision: D59132726\n\nfbshipit-source-id: 4edfaff832aba9a4ca1779e31df4cf7041339c00","shortMessageHtmlLink":"Remove all instances of TMP_USE_TSC_AS_TIMESTAMP (#957)"}},{"before":"4e919b2d8877e0fcc435255bec912b6584767a02","after":"817889e836d6c1234260c6581917f43e8a940eb0","ref":"refs/heads/main","pushedAt":"2024-06-24T19:26:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Remove global using namespace (#955)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/955\n\nFixing this:\n\n```\nkineto/libkineto/include/ThreadUtil.h:33:17: error: using namespace directive in global context in header [-Werror,-Wheader-hygiene]\n 33 | using namespace libkineto;\n | ^\n1 error generated.\n```\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D58904139\n\nfbshipit-source-id: 425e9d8a94e9015dbedbc0c111204c047e815451","shortMessageHtmlLink":"Remove global using namespace (#955)"}},{"before":"0063575f9c7c5a695c622e6e331a85b143504d00","after":"4e919b2d8877e0fcc435255bec912b6584767a02","ref":"refs/heads/main","pushedAt":"2024-06-21T01:51:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add API for JK Inside Config Validation (#954)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/954\n\nRight now we initialize all the configuration children when config is initialized. This is fine for the most part except for the fact that we set the selected activities only after the config is initialized. This makes our JustKnobs calls always get ignored. We can fix this by calling an API to check the Knobs after the parent is validated (we set activities before validation). The API (setActivityDependentConfig) in AbstractConfig will iterate through all child configs and call the child-defined version of setActivityDependentConfig. In FBConfig, will contain the JK reading.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D58759304\n\nfbshipit-source-id: ebd0d978627299b5ddecba58fc934e9342b31da3","shortMessageHtmlLink":"Add API for JK Inside Config Validation (#954)"}},{"before":"36c2a78237b29a1d7eed5d8a14bd5ab004dba701","after":"0063575f9c7c5a695c622e6e331a85b143504d00","ref":"refs/heads/main","pushedAt":"2024-06-18T15:51:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add RoctracerActivityProfilerTest Unit Tests (#950)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/950\n\nAdding 5 unit tests for the CuptiActivityProfiler and its interactions with RoctracerActivityApi. Mock the RoctracerLogger and the RoctracerActivityApi and pass them into the CuptiActivityProfiler orchestrator, then verify the trace files produced.\n\n- **SyncTrace:** Mocks 5 CPU ops, 3 CPU Kernel Runtime Launches, 2 CPU Memcpy Runtime events, 3 GPU Kernels, 2 GPU Memcpy events. Checks that the trace output matches what was passed in, and names are correct after parsing.\n- **GpuNCCLCollectiveTest:** Similar to CUPTI, check that NCCL metadata is properly passed into the CPU and GPU ops.\n- **GpuUserAnnotationTest:** Check that GPU user annotations added via CorrelationDomain1 works properly, and the annotations are in the final trace.\n- **SubActivityProfilers:** Check subactivityprofiler children will continue to work, such as glow runtime mock activity profiler.\n- **JsonGPUIDSortTest:** Check that JSON file contains expected number of process_labels and process_sort_index , so that GPU rows in Chrome traces will always be sorted after CPU rows.\n\nTest Plan: Ran locally on AMDGPU.\n\nReviewed By: houseroad\n\nDifferential Revision: D58554825\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: 32177e7d4db3aa6c6e1b676e5500358a302d7ee7","shortMessageHtmlLink":"Add RoctracerActivityProfilerTest Unit Tests (#950)"}},{"before":"c0c0dbdb378f2e515ecb3f92139aaabb497c034d","after":"36c2a78237b29a1d7eed5d8a14bd5ab004dba701","ref":"refs/heads/main","pushedAt":"2024-06-17T20:06:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add JustKnobs to TSC CUPTI Callback (#947)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/947\n\nAdd JustKnobs so that we can turn TSC CUPTI Callback on and off. Since we need to meet open source compliance, we need to first add a default value of true in the Config class. In the FBConfig class we then add a method that checks the JK and overrides the default value of true. We also add a static variable to the CuptiActivityProfiler which will be overriden by the Config value upon initialization. Once this is set, the profiler as well as all headers will use the flag that was set by the JK. Using this flag we can essentially use the TSC timestamp or system clock via killswitch. Since the JK can run asynchronously across many initializations, wrap it in a try/catch.\n\nOn top of this change, ApproximateClock is moved from src/ to include/. This is because it is a forward facing header so it makes sense to expose it as a public header. Added also to the bazel file as a public header.\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D58485370\n\nfbshipit-source-id: f69a331d11940767090718207913ae963516bbb0","shortMessageHtmlLink":"Add JustKnobs to TSC CUPTI Callback (#947)"}},{"before":"db1f25ec9bb2e2610caa630b3d89f3fe6c4d6a1d","after":"c0c0dbdb378f2e515ecb3f92139aaabb497c034d","ref":"refs/heads/main","pushedAt":"2024-06-14T17:10:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Skip CUPTIRangeProfilerApiTest and CuptiRangeProfilerTest due to SEGFAULT in CUDA 12.4 (#951)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/951\n\nGithub Actions CI on NVIDIA A10G instances are failing for these tests since the upgrade to 12.4:\n\n19 - CuptiRangeProfilerApiTest.asyncLaunchUserRange (SEGFAULT)\n20 - CuptiRangeProfilerApiTest.asyncLaunchAutoRange (SEGFAULT)\n24 - CuptiRangeProfilerTest.UserRangeTest (SEGFAULT)\n25 - CuptiRangeProfilerTest.AutoRangeTest (SEGFAULT)\n\nWe are tracking the issue here: https://github.com/pytorch/kineto/issues/949\n\nTest Plan: CI\n\nReviewed By: sraikund16\n\nDifferential Revision: D58588836\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: 4b1c02d18e235d1c8a4f7c5162d59950cfa89dcb","shortMessageHtmlLink":"Skip CUPTIRangeProfilerApiTest and CuptiRangeProfilerTest due to SEGF…"}},{"before":"a6f80204f246f3f19c727b50cba5a9d4511c86b7","after":"db1f25ec9bb2e2610caa630b3d89f3fe6c4d6a1d","ref":"refs/heads/main","pushedAt":"2024-06-12T17:33:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Move root level TARGETS into subdirectories (#946)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/946\n\nMoving unit tests, samples, and stress test binaries into their respective subdirectories to make libkineto/TARGETS cleaner.\n\nReviewed By: davidberard98\n\nDifferential Revision: D58441051\n\nPulled By: aaronenyeshi\n\nfbshipit-source-id: 0f7f62e6d14fc5f6d4f6b5b4e57cf0760046602e","shortMessageHtmlLink":"Move root level TARGETS into subdirectories (#946)"}},{"before":"8681ff11e1fa54da39023076c5c43eddd87b7a8a","after":"a6f80204f246f3f19c727b50cba5a9d4511c86b7","ref":"refs/heads/main","pushedAt":"2024-06-10T18:35:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Reintroduce CUPTI TSC Timestamp Compilation Flag to OSS (#945)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/kineto/pull/945\n\nNeed to add flag back in for OSS to be able to use TSC timestamp for TSC events once we merge D56584521\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D58263179\n\nfbshipit-source-id: 8539d358ddcea3f6a923d5f89dc8a62622363aee","shortMessageHtmlLink":"Reintroduce CUPTI TSC Timestamp Compilation Flag to OSS (#945)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEtfapPwA","startCursor":null,"endCursor":null}},"title":"Activity · pytorch/kineto"}