|
Fix events seemingly broken apart at a comma.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Support negative exponents when parsing from a json metric string by
making the numbers after the 'e' optional in the 'Event' insertion fix
up.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
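The fix-up described above can be sketched roughly as follows. This is a simplified illustration, not the actual metric parsing code: the helper name `insert_events` and the event-name pattern are assumptions made for the example. Only the idea of making the digits after the 'e' optional comes from the message.

```python
import re

def insert_events(expr: str) -> str:
    # Simplified, hypothetical stand-in for the event-name rewrite: wrap every
    # run that starts with a letter in Event(r"..."). A '-' ends the run, so
    # "1e-9" is split into 1, Event(r"e"), '-' and 9.
    py = re.sub(r'([a-zA-Z][a-zA-Z0-9_.]*)', r'Event(r"\1")', expr)
    # Fix up scientific notation: digits followed by a bogus "e..." event.
    # Using [0-9]* rather than [0-9]+ also undoes the bare "e" that a
    # negative exponent leaves behind.
    return re.sub(r'([0-9]+)Event\(r"(e[0-9]*)"\)', r'\1\2', py)
```

With `[0-9]+` in the second pattern, `1e-9` would stay broken as `1Event(r"e")-9`, which is the case the change is meant to cover.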
|
|
It can be convenient to have unnamed metric groups for the sake of
organizing other metrics and metric groups. An unspecified name
shouldn't contribute to the MetricGroup json value, so don't record
it.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
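A minimal sketch of the intent, assuming a tree of groups where the MetricGroup value is the ';'-joined list of enclosing group names. The `Metric` and `MetricGroup` classes and the `metric_group_values` helper are hypothetical and only illustrate that an unnamed group adds nothing to the recorded value.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Metric:
    name: str

@dataclass
class MetricGroup:
    name: Optional[str]  # None marks an unnamed, purely organizational group
    members: List[Union["MetricGroup", Metric]] = field(default_factory=list)

def metric_group_values(group: MetricGroup, parents: tuple = ()) -> dict:
    """Map each metric name to its ';'-joined MetricGroup value."""
    # Only named groups contribute to the recorded group path.
    path = parents + ((group.name,) if group.name else ())
    result = {}
    for member in group.members:
        if isinstance(member, MetricGroup):
            result.update(metric_group_values(member, path))
        else:
            result[member.name] = ";".join(path)
    return result
```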
|
|
Add a function to recursively generate metric group descriptions.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Previous metric constraints were binary, either none or don't group
when the NMI watchdog is present. Update to match the definitions in
'enum metric_event_groups' in pmu-events.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Allow multiple metricgroups.json files by handling any file ending
with metricgroups.json as a metricgroups file.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
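The file selection can be sketched as a suffix check. The helper name `is_metricgroups_file` is an assumption for illustration and is not taken from the build scripts.

```python
import os

def is_metricgroups_file(path: str) -> bool:
    """True for 'metricgroups.json' and any file whose name ends with it."""
    return os.path.basename(path).endswith("metricgroups.json")
```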
|
|
This happens on hybrid machine metrics. Be tolerant and don't cause
the ilist application to crash with an exception.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Ensure the metric_leader is copied and set up correctly. In
compute_metric determine the correct metric_leader event to match the
requested CPU. Fixes the handling of metrics particularly on hybrid
machines.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a test case for the python interpreter like below so that we can
make sure it won't break again. To validate the effect of build-ID
generation, it adds and removes the JIT'ed DSOs to/from the build-ID
cache for the test.
$ perf test -vv jitdump
84: python profiling with jitdump:
--- start ---
test child forked, pid 214316
Run python with -Xperf_jit
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.180 MB /tmp/__perf_test.perf.data.XbqZNm (140 samples) ]
Generate JIT-ed DSOs using perf inject
Add JIT-ed DSOs to the build-ID cache
Check the symbol containing the script name
Found 108 matching lines
Remove JIT-ed DSOs from the build-ID cache
---- end(0) ----
84: python profiling with jitdump : Ok
Cc: Pablo Galindo <pablogsal@gmail.com>
Link: https://docs.python.org/3/howto/perf_profiling.html#how-to-work-without-frame-pointers
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It was reported that Python backtraces with JIT dump were broken after the
change to the built-in SHA-1 implementation. It seems Python generates the
same JIT code for each function. They become separate DSOs but the
contents are the same. The only difference is in the symbol name.
As a result, every JIT'ed DSO had the same build-ID, which confused
perf and resulted in no Python symbols (from JIT) in the output.
Looking back at the original code before the conversion, it used the
load_addr as well as the code section to distinguish each DSO. But it'd
be better to use contents of symtab and strtab instead as it aligns with
some linker behaviors.
This patch adds a buffer to save all the contents in a single place for
SHA-1 calculation. Probably we need to add sha1_update() or similar to
update the existing hash value with different contents and use it here.
But it's out of scope for this change and I'd like something that can be
backported to the stable trees easily.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Pablo Galindo <pablogsal@gmail.com>
Cc: Fangrui Song <maskray@sourceware.org>
Link: https://github.com/python/cpython/issues/139544
Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
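The effect of hashing more than the code section can be illustrated with a short sketch. It assumes the buffer holds the code followed by the symtab and strtab contents; the function name `jit_build_id` and the byte strings in the usage note are made up for the example and do not reflect the genelf code.

```python
import hashlib

def jit_build_id(code: bytes, symtab: bytes, strtab: bytes) -> str:
    """SHA-1 over a single buffer holding code, symtab and strtab contents."""
    buf = bytearray()
    buf += code     # identical for every JIT'ed function in the reported case
    buf += symtab
    buf += strtab   # carries the symbol name, so it differs per DSO
    return hashlib.sha1(buf).hexdigest()
```

Two DSOs with the same code but different symbol names, for example `jit_build_id(code, sym, b"\0py::foo\0")` and `jit_build_id(code, sym, b"\0py::bar\0")`, then get different IDs, whereas hashing `code` alone would give the same value for both.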
|
|
The mem-loads-aux event exists on hybrid systems but the "cpu" PMU
does not. This causes an event parsing error which erroneously makes
the test look like it is failing. Avoid naming the PMU to prevent
this. Rather than cleaning up perf.data in the directory where the test
is run, explicitly send the 'perf record' output to /dev/null and avoid
any cleanup scripts.
Fixes: fc9c17b22352 ("perf test: Add a perf event fallback test")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
So that it can show the correct encoding info in the JSON output.
$ perf list -j hw
[
{
"Unit": "cpu",
"Topic": "legacy hardware",
"EventName": "branch-instructions",
"EventType": "Kernel PMU event",
"BriefDescription": "Retired branch instructions [This event is an alias of branches]",
"Encoding": "cpu/event=0xc4/"
},
...
Reviewed-by: Ian Rogers <irogers@google.com>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- klp-build livepatch module generation (Josh Poimboeuf)
Introduce new objtool features and a klp-build script to generate
livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch
project which began in 2012 and has been used for many years to
generate livepatch modules for production kernels. However, this is a
complete rewrite which incorporates hard-earned lessons from 12+
years of maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for
symbol/section/reloc inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines
script which injects #line directives into the source .patch to
preserve the original line numbers at compile time.
- Disassemble code with libopcodes instead of running objdump
(Alexandre Chartre)
- Disassemble support (-d option to objtool) by Alexandre Chartre,
which supports the decoding of various Linux kernel code generation
specials such as alternatives:
17ef: sched_balance_find_dst_group+0x62f mov 0x34(%r9),%edx
17f3: sched_balance_find_dst_group+0x633 | <alternative.17f3> | X86_FEATURE_POPCNT
17f3: sched_balance_find_dst_group+0x633 | call 0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
17f8: sched_balance_find_dst_group+0x638 cmp %eax,%edx
... jump table alternatives:
1895: sched_use_asym_prio+0x5 test $0x8,%ch
1898: sched_use_asym_prio+0x8 je 0x18a9 <sched_use_asym_prio+0x19>
189a: sched_use_asym_prio+0xa | <jump_table.189a> | JUMP
189a: sched_use_asym_prio+0xa | jmp 0x18ae <sched_use_asym_prio+0x1e> | nop2
189c: sched_use_asym_prio+0xc mov $0x1,%eax
18a1: sched_use_asym_prio+0x11 and $0x80,%ecx
... exception table alternatives:
native_read_msr:
5b80: native_read_msr+0x0 mov %edi,%ecx
5b82: native_read_msr+0x2 | <ex_table.5b82> | EXCEPTION
5b82: native_read_msr+0x2 | rdmsr | resume at 0x5b84 <native_read_msr+0x4>
5b84: native_read_msr+0x4 shl $0x20,%rdx
.... x86 feature flag decoding (also see the X86_FEATURE_POPCNT
example in sched_balance_find_dst_group() above):
2faaf: start_thread_common.constprop.0+0x1f jne 0x2fba4 <start_thread_common.constprop.0+0x114>
2fab5: start_thread_common.constprop.0+0x25 | <alternative.2fab5> | X86_FEATURE_ALWAYS | X86_BUG_NULL_SEG
2fab5: start_thread_common.constprop.0+0x25 | jmp 0x2faba <.altinstr_aux+0x2f4> | jmp 0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
2faba: start_thread_common.constprop.0+0x2a mov $0x2b,%eax
... NOP sequence shortening:
1048e2: snapshot_write_finalize+0xc2 je 0x104917 <snapshot_write_finalize+0xf7>
1048e4: snapshot_write_finalize+0xc4 nop6
1048ea: snapshot_write_finalize+0xca nop11
1048f5: snapshot_write_finalize+0xd5 nop11
104900: snapshot_write_finalize+0xe0 mov %rax,%rcx
104903: snapshot_write_finalize+0xe3 mov 0x10(%rdx),%rax
... and much more.
- Function validation tracing support (Alexandre Chartre)
- Various -ffunction-sections fixes (Josh Poimboeuf)
- Clang AutoFDO (Automated Feedback-Directed Optimizations) support
(Josh Poimboeuf)
- Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
Thorsten Blum)
* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
objtool: Fix segfault on unknown alternatives
objtool: Build with disassembly can fail when including bdf.h
objtool: Trim trailing NOPs in alternative
objtool: Add wide output for disassembly
objtool: Compact output for alternatives with one instruction
objtool: Improve naming of group alternatives
objtool: Add Function to get the name of a CPU feature
objtool: Provide access to feature and flags of group alternatives
objtool: Fix address references in alternatives
objtool: Disassemble jump table alternatives
objtool: Disassemble exception table alternatives
objtool: Print addresses with alternative instructions
objtool: Disassemble group alternatives
objtool: Print headers for alternatives
objtool: Preserve alternatives order
objtool: Add the --disas=<function-pattern> action
objtool: Do not validate IBT for .return_sites and .call_sites
objtool: Improve tracing of alternative instructions
objtool: Add functions to better name alternatives
objtool: Identify the different types of alternatives
...
|
|
Recent changes in the linux-next kernel add a new field to syscall
events that exposes the userspace contents, as shown below.
# cat /sys/kernel/tracing/events/syscalls/sys_enter_write/format
name: sys_enter_write
ID: 758
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:unsigned int fd; offset:16; size:8; signed:0;
field:const char * buf; offset:24; size:8; signed:0;
field:size_t count; offset:32; size:8; signed:0;
field:__data_loc char[] __buf_val; offset:40; size:4; signed:0;
print fmt: "fd: 0x%08lx, buf: 0x%08lx (%s), count: 0x%08lx", ((unsigned long)(REC->fd)),
((unsigned long)(REC->buf)), __print_dynamic_array(__buf_val, 1),
((unsigned long)(REC->count))
We have a different way to handle those arguments, and this change
confuses perf trace and makes some tests fail. Fix it by skipping
the new fields that have the "__data_loc char[]" type.
Maybe we can switch to this instead of the BPF augmentation later.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Howard Chu <howardchu95@gmail.com>
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
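The skip can be sketched against the tracefs format text shown above. This is an illustration only: `syscall_arg_fields` is a hypothetical helper that works on the text of the format file, not the perf trace implementation.

```python
def syscall_arg_fields(format_text: str) -> list[str]:
    """Return 'type name' declarations, skipping "__data_loc char[]" fields."""
    fields = []
    for line in format_text.splitlines():
        line = line.strip()
        if not line.startswith("field:"):
            continue
        # "field:<type> <name>; offset:..; size:..; signed:..;"
        decl = line[len("field:"):].split(";", 1)[0].strip()
        if decl.startswith("__data_loc char[]"):
            continue  # new dynamic-array field carrying userspace contents
        fields.append(decl)
    return fields
```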
|
|
Simplify the build ID reading code by removing the non-blocking option.
Having to pass the correct option to this function was fragile and a
mistake would result in a hang, see the linked fix. Furthermore,
compressed files are always opened blocking anyway, ignoring the
non-blocking option.
We also don't expect to read build IDs from non-regular files. The only
hits to this function that are non-regular are devices that won't be elf
files with build IDs, for example "/dev/dri/renderD129".
Now instead of opening these as non-blocking and failing to read, we
skip them. Even if something like a pipe or character device did have a
build ID, I don't think it would have worked because you need to call
read() in a loop, check for -EAGAIN and handle timeouts to make
non-blocking reads work.
Link: https://lore.kernel.org/linux-perf-users/20251022-james-perf-fix-dso-block-v1-1-c4faab150546@linaro.org/
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
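The new behaviour amounts to checking the file type before opening it. A minimal sketch, assuming a hypothetical helper `should_read_build_id`; it is not the perf code, which is written in C.

```python
import os
import stat

def should_read_build_id(path: str) -> bool:
    """Only regular files are candidates for build-ID reading."""
    try:
        return stat.S_ISREG(os.stat(path).st_mode)
    except OSError:
        # Missing or inaccessible paths are skipped as well.
        return False
```

A character device such as "/dev/dri/renderD129" fails the `S_ISREG` check and is skipped without ever being opened, so no blocking or non-blocking read is attempted.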
|
|
T-HEAD C920 has a V2 iteration, which supports Sscompmf. The V2
iteration supports the same perf events as V1.
Reuse T-HEAD c900-legacy JSON file for T-HEAD C920V2.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove duplicate check for PERF_PMU_TYPE_DRM_END in perf_pmu__kind.
Fixes: f0feb21e0a10 ("perf pmu: Add PMU kind to simplify differentiating")
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Closes: https://lore.kernel.org/linux-perf-users/CA+G8Dh+wLx+FvjjoEkypqvXhbzWEQVpykovzrsHi2_eQjHkzQA@mail.gmail.com/
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes
so document them. Also document existing 'event_filter' bits that were
missing from the doc and the fact that latency values are stored in the
weight field.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_event_attr has gained a new field, config4, so add support for it
extending the existing configN support.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Usage of strcpy() can lead to buffer overflows. Therefore, it has been
replaced with strncpy(). The output file path is provided as a parameter
and might be restricted by command-line by default. But this defensive
patch will prevent any potential overflow, making the code more robust
against future changes in input handling.
Testing:
- ran perf test from tools/perf and did not observe any regression with
the earlier code
Signed-off-by: Hrishikesh Suresh <hrishikesh123s@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Like regular output mode, it should honor command line arguments to
limit to a certain type of PMUs or events.
$ perf list -j hw
[
{
"Unit": "cpu",
"Topic": "legacy hardware",
"EventName": "branch-instructions",
"EventType": "Kernel PMU event",
"BriefDescription": "Retired branch instructions [This event is an alias of branches]",
"Encoding": "cpu/event=0xc4\n/"
},
{
"Unit": "cpu",
"Topic": "legacy hardware",
"EventName": "branch-misses",
"EventType": "Kernel PMU event",
"BriefDescription": "Mispredicted branch instructions",
"Encoding": "cpu/event=0xc5\n/"
},
...
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The JSON print state has only one different field (need_sep). Let's
add the default print state to the json state and use it. Then we can
use the 'ps' variable to update the state properly.
This is a preparation for the next commit.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When --unit option is used, pmu_glob is set to the argument. It should
match with event PMU and display the matching ones only. But it also
shows raw events and metrics after that.
$ perf list --unit tool
List of pre-defined events (to be used in -e or -M):
tool:
core_wide
[1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
duration_time
[Wall clock interval time in nanoseconds. Unit: tool]
has_pmem
[1 if persistent memory installed otherwise 0. Unit: tool]
num_cores
[Number of cores. A core consists of 1 or more thread,with each thread being associated with a logical Linux CPU. Unit: tool]
num_cpus
[Number of logical Linux CPUs. There may be multiple such CPUs on a core. Unit: tool]
...
rNNN [Raw event descriptor]
cpu/event=0..255,pc,edge,.../modifier [Raw event descriptor]
[(see 'man perf-list' or 'man perf-record' on how to encode it)]
breakpoint//modifier [Raw event descriptor]
cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor]
cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor]
drm_i915//modifier [Raw event descriptor]
hwmon_acpitz//modifier [Raw event descriptor]
hwmon_ac//modifier [Raw event descriptor]
hwmon_bat0//modifier [Raw event descriptor]
hwmon_coretemp//modifier [Raw event descriptor]
...
Metric Groups:
Backend: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
tma_core_bound
[This metric represents fraction of slots where Core non-memory issues were of a bottleneck]
tma_info_core_ilp
[Instruction-Level-Parallelism (average number of uops executed when there is execution) per thread (logical-processor)]
tma_info_memory_l2mpki
[L2 cache true misses per kilo instruction for retired demand loads]
...
This change makes it print the tool PMU events only.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Determine if a metric is default from `perf list --raw-dump $m`, e.g.:
```
$ perf list --raw-dump l1_prefetch_miss_rate
Default4 l1_prefetch_miss_rate
```
If a metric has "not supported" or "no supported events" then ignore
these failures for default metrics. Tidy up the skip/fail messages in
the output to make them easier to spot/read.
```
$ perf test -vv "all metrics"
...
Testing llc_miss_rate
[Ignored llc_miss_rate] failed but as a Default metric this can be expected
Error: No supported events found. The LLC-loads event is not supported.
...
```
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Closes: https://lore.kernel.org/linux-perf-users/20251119104751.51960-1-tmricht@linux.ibm.com/
Reported-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: James Clark <james.clark@linaro.org>
Closes: https://lore.kernel.org/lkml/aRi9xnwdLh3Dir9f@google.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
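The decision described above can be sketched as follows. The test itself is a shell script; this Python version only illustrates the logic. The function name `ignore_failure` and the assumption that a "Default" prefix in the raw dump marks a default metric are inferred from the example output.

```python
IGNORABLE = ("not supported", "no supported events")

def ignore_failure(raw_dump_output: str, error_output: str) -> bool:
    """True when a failing metric is a Default metric with an expected error."""
    # e.g. "Default4 l1_prefetch_miss_rate": a group starting with "Default"
    is_default = any(tok.startswith("Default") for tok in raw_dump_output.split())
    err = error_output.lower()
    return is_default and any(msg in err for msg in IGNORABLE)
```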
|
|
Cross-merge networking fixes after downstream PR (net-6.18-rc7).
No conflicts, adjacent changes:
tools/testing/selftests/net/af_unix/Makefile
e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The IDs are associated with perf events and not applicable to non-perf
event PMUs. The failure to generate the ids was causing perf stat
record to fail.
```
$ perf stat record -a sleep 1
Performance counter stats for 'system wide':
47,941 context-switches # nan cs/sec cs_per_second
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
3,261 cpu-migrations # nan migrations/sec migrations_per_second
516 page-faults # nan faults/sec page_faults_per_second
7,525,483 cpu_core/branch-misses/ # 2.3 % branch_miss_rate
322,069,004 cpu_core/branches/ # nan M/sec branch_frequency
1,895,684,291 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
2,789,777,426 cpu_core/instructions/ # 1.5 instructions insn_per_cycle
7,074,765 cpu_atom/branch-misses/ # 3.2 % branch_miss_rate (49.89%)
224,225,412 cpu_atom/branches/ # nan M/sec branch_frequency (50.29%)
2,061,679,981 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.33%)
2,011,242,533 cpu_atom/instructions/ # 1.0 instructions insn_per_cycle (50.33%)
TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation
# 28.3 % tma_frontend_bound
# 35.2 % tma_backend_bound
# 27.5 % tma_retiring
TopdownL1 (cpu_atom) # 36.8 % tma_backend_bound (59.65%)
# 22.8 % tma_frontend_bound (59.60%)
# 11.6 % tma_bad_speculation
# 28.8 % tma_retiring (59.59%)
1.006777519 seconds time elapsed
$ perf stat report
Performance counter stats for 'perf':
1,013,376,154 duration_time
<not counted> duration_time
<not counted> duration_time
<not counted> duration_time
<not counted> duration_time
<not counted> duration_time
47,941 context-switches
0.00 msec cpu-clock
3,261 cpu-migrations
516 page-faults
7,525,483 cpu_core/branch-misses/
322,069,814 cpu_core/branches/
322,069,004 cpu_core/branches/
1,895,684,291 cpu_core/cpu-cycles/
1,895,679,209 cpu_core/cpu-cycles/
2,789,777,426 cpu_core/instructions/
<not counted> cpu_core/cpu-cycles/
<not counted> cpu_core/stalled-cycles-frontend/
<not counted> cpu_core/cpu-cycles/
<not counted> cpu_core/stalled-cycles-backend/
<not counted> cpu_core/stalled-cycles-backend/
<not counted> cpu_core/instructions/
<not counted> cpu_core/stalled-cycles-frontend/
7,074,765 cpu_atom/branch-misses/ (49.89%)
221,679,088 cpu_atom/branches/ (49.89%)
224,225,412 cpu_atom/branches/ (50.29%)
2,061,679,981 cpu_atom/cpu-cycles/ (50.33%)
2,016,259,567 cpu_atom/cpu-cycles/ (50.33%)
2,011,242,533 cpu_atom/instructions/ (50.33%)
<not counted> cpu_atom/cpu-cycles/
<not counted> cpu_atom/stalled-cycles-frontend/
<not counted> cpu_atom/cpu-cycles/
<not counted> cpu_atom/stalled-cycles-backend/
<not counted> cpu_atom/stalled-cycles-backend/
<not counted> cpu_atom/instructions/
<not counted> cpu_atom/stalled-cycles-frontend/
17,145,113 cpu_core/INT_MISC.UOP_DROPPING/
10,594,226,100 cpu_core/TOPDOWN.SLOTS/
2,919,021,401 cpu_core/topdown-retiring/
943,101,838 cpu_core/topdown-bad-spec/
3,031,152,533 cpu_core/topdown-fe-bound/
3,739,756,791 cpu_core/topdown-be-bound/
1,909,501,648 cpu_atom/CPU_CLK_UNHALTED.CORE/ (60.04%)
3,516,608,359 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (59.65%)
2,179,403,876 cpu_atom/TOPDOWN_FE_BOUND.ALL/ (59.60%)
2,745,732,458 cpu_atom/TOPDOWN_RETIRING.ALL/ (59.59%)
1.006777519 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
```
Reported-by: James Clark <james.clark@linaro.org>
Closes: https://lore.kernel.org/lkml/ca0f0cd3-7335-48f9-8737-2f70a75b019a@linaro.org/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than perf_pmu__is_xxx calls, add a notion of kind so that a
single call can be used.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Writing currently fails on non-x86 and hybrid CPUs. Switch to the more
regular find_core_pmu that is normally used in this case. Tested on
a hybrid Alder Lake system.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add an additional test to the maps tests covering
maps__fixup_overlap_and_insert. Change the test suite to support more
than just 1 test.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The case of __maps__fixup_overlap_and_insert where the "new" map
covers existing mappings can create a use-after-free with reference
count checking enabled. The issue is that "pos" holds a map pointer
from maps_by_address that is put from maps_by_address but then used to
look for a map in maps_by_name (the compared map is now a
use-after-free). The issue stems from using maps__remove which redoes
some of the searches already done by __maps__fixup_overlap_and_insert,
so optimize the code (by avoiding repeated searches) and avoid the
use-after-free by inlining the appropriate removal code.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511141407.f9edcfa6-lkp@intel.com
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When reading a metric like memory bandwidth on multiple sockets, the
additional sockets will be on CPUs > 0. Because of the affinity
reading, the counters are read on CPU 0 along with the time, then the
later sockets are read. This can lead to the later sockets having a
bandwidth larger than is possible for the period of time. To avoid
this move the reading of tool events to occur after all other events
are read.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Synthesize memory samples for SIMD operations (including Advanced SIMD,
SVE, and SME). To provide complete information, also generate data
source entries for SIMD operations.
Since memory operations are not limited to load and store, set
PERF_MEM_OP_STORE if the operation does not fall into these cases.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The other operations contain SME data processing, ASE (Advanced SIMD)
and floating-point operations. Expose this info in the records.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Report GCS related info in records.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Expose memset and memcpy related info in records.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
SVE / SME operations can be predicated or gather load / scatter store;
save the relevant info into the record.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Extended memory operations include atomic (AT), acquire/release (AR),
and exclusive (EXCL) operations. Save the relevant information
in the records.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Save MTE tag info in memory record.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Record register access info for load / store operations.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce the ARM_SPE_OP_DP (data processing) macro as associated
information for SVE operations. For SVE register access, only
ARM_SPE_OP_SVE is set; for SVE data processing, both ARM_SPE_OP_SVE and
ARM_SPE_OP_DP are set together.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
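The flag combination can be illustrated with a small sketch. The bit values below are assumptions chosen for the example, not the ones defined in perf, and `classify_sve` is a hypothetical helper.

```python
from enum import IntFlag

class SpeOp(IntFlag):
    # Bit values are illustrative assumptions, not the ones defined in perf.
    SVE = 1 << 0  # stands in for ARM_SPE_OP_SVE
    DP = 1 << 1   # stands in for ARM_SPE_OP_DP

def classify_sve(op: SpeOp) -> str:
    """Tell SVE register access apart from SVE data processing."""
    if SpeOp.SVE not in op:
        return "not SVE"
    return "SVE data processing" if SpeOp.DP in op else "SVE register access"
```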
|
|
Consolidate operation types as follows:
(a) Extract the second-level types into separate enums.
(b) The second-level types for memory and SIMD operations are classified
by modules. E.g., an operation may relate to general register,
SIMD/FP, SVE, etc.
(c) The associated information tells details. E.g., an operation is
load or store, whether it is atomic operation, etc.
Start the enum items for the second-level types from 8 to accommodate
more entries within a 32-bit integer.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove unused SVE operation types. These operations will be reintroduced
in subsequent refactoring, but with a different format.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
For SME data processing, decode its Effective vector length or Tile Size
(ETS), and print whether it is a floating-point operation.
After:
. 00000000: 49 00 SME-OTHER ETS 1024 FP
. 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18
. 0000000b: 9a 00 00 LAT 0 XLAT
. 0000000e: 43 00 DATA-SOURCE 0
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a check for the other operation class, which prevents incorrect
classification. Parse the ASE and FP fields.
After:
. 0000002f: 48 06 OTHER ASE FP INSN-OTHER
. 00000031: b2 08 80 48 01 08 00 ff ff VA 0xffff000801488008
. 0000003a: 9a 00 00 LAT 0 XLAT
. 0000003d: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Decode a load or store from a GCS operation and the associated "common"
field.
After:
. 00000000: 49 44 LD GCS COMM
. 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18
. 0000000b: 9a 00 00 LAT 0 XLAT
. 0000000e: 43 00 DATA-SOURCE 0
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rename the extended subclass and the SVE/SME register access subclass, so
that the naming is consistent across all subclasses.
Add a log "SVE-SME-REG" for the SVE/SME register access, which is easier
to parse.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The operation subclass is extracted from bits [7..1] of the payload.
Since bit [0] is not parsed, there is no chance to match the memset type
(0x25). As a result, the memset payload is never parsed successfully.
Instead of extracting a unified bit field, change to extract the
specific bits for each operation subclass.
Fixes: 34fb60400e32 ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
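The mismatch can be checked with a few lines of arithmetic. This sketch assumes the old code masked bits [7..1] in place (mask 0xFE) before comparing. The function names are hypothetical, and the whole-byte compare in `is_memset` is only one possible per-subclass check.

```python
MEMSET_SUBCLASS = 0x25  # memset type from the message; bit [0] is set

def old_subclass(payload: int) -> int:
    """Keep bits [7..1] in place, so bit [0] is always cleared (assumed old behaviour)."""
    return payload & 0xFE

def is_memset(payload: int) -> bool:
    """Hypothetical per-subclass check that compares the bits memset needs."""
    return (payload & 0xFF) == MEMSET_SUBCLASS

# No payload byte can ever produce 0x25 once bit [0] has been masked off.
never_matches = all(old_subclass(p) != MEMSET_SUBCLASS for p in range(256))
```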
|
|
The user and system time events can record on different CPUs, but for
all other events a single CPU map of just CPU 0 makes sense. In
parse-events detect a tool PMU and then pass the perf_event_attr so
that the tool_pmu can return CPUs specific for the event. This avoids
a CPU map of all online CPUs being used for events like
duration_time. Avoiding this avoids the evlist CPUs containing CPUs
for which duration_time just gives 0. Minimizing the evlist CPUs can
remove unnecessary sched_setaffinity syscalls that delay metric
calculations.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
walltime_nsecs_stats is no longer used for counter values, so move it
into stat_config, where it controls certain things like noise
measurement.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|