| Age | Commit message (Collapse) | Author |
|
Add SRIOV check when setting VCN ring's supported reset mask.
Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt
behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like
normal TT/doorbell/preempt memory and unconditionally accessed
ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a
NULL pointer dereference when computing PDE flags.
Fix this by checking that ttm is non-NULL before reading ttm->caching.
This prevents the crash for MMIO_REMAP and also makes the code more
defensive if other BOs ever come through without a ttm_tt.
Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement")
Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This fixes sparse mappings (aka. partially resident textures).
Check the correct flags.
Since a recent refactor, the code works with uAPI flags (for
mapping buffer objects), and not PTE (page table entry) flags.
Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Accoreding to CP updated to RS64 on gfx11,
WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW.
That packet is used for MCBP on F32 based system.
So it would lead to incorrect GRBM write and FW is not handling that
extra case correctly.
[How]
With gfx11 rs64 enabled, skip emit de meta data.
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
During the suspend sequence VPE is already going to be power gated
as part of vpe_suspend(). It's unnecessary to call during calls to
amdgpu_device_set_pg_state().
It actually can expose a race condition with the firmware if s0i3
sequence starts as well. Drop these calls.
Cc: Peyton.Lee@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use %ptSp instead of open coded variants to print content of
struct timespec64 in human readable format.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251113150217.3030010-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
[Why]
Existing routine has two conversion sequence,
pbn_to_kbps and kbps_to_pbn with margin.
Non of those has without-margin calculation.
kbps_to_pbn with margin conversion includes
fec overhead which has already been included in
pbn_div calculation with 0.994 factor considered.
It is a double counted fec overhead factor that causes
potential bw loss.
[How]
Add without-margin calculation.
Fix fec overhead double counted issue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3735
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e0dec00f3d05e8c0eceaaebfdca217f8d10d380c)
Cc: stable@vger.kernel.org
|
|
[Why]
On DCN20 & DCN30, the 6th DPP's & HUBP's are powered on permanently and
cannot be power gated. Thus, when dpp_reset() is invoked for the DPP5,
while it's still powered on, the cached cursor_state
(dpp_base->pos.cur0_ctl.bits.cur0_enable)
and the actual state (CUR0_ENABLE) bit are unsycned. This can cause a
double cursor in full screen with non-native scaling.
[How]
Force disable cursor on DPP5 on plane powerdown for ASICs w/ 6 DPPs/HUBPs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4673
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 79b3c037f972dcb13e325a8eabfb8da835764e15)
Cc: stable@vger.kernel.org
|
|
[Why]
Some monitors perform rapid “autoscan” HPD re‑assertions right after a
disconnect or powersaving mode enablement. These appear as a quick
disconnect→reconnect with an identical EDID. Since Linux has no HDMI
hotplug detection (HPD) filter, these quick reconnects are seen as hotplug
events, which can unintentionally wake a system with DPMS off.
An example: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Such 'fake reconnects' are considered when the interval between a
disconnect and a connect is within 1500ms (experimentally chosen using
several monitors), and the two connections have the same EDID.
[How]
Implement a time-based debounce mechanism:
1. On HDMI disconnect detection, instead of immediately processing the
HPD event, save the current sink and schedule delayed work (default 1500ms)
2. If another HDMI disconnect HPD event arrives during the debounce period,
it reschedules the pending work, ensuring only the final state is processed.
3. When the debounce timer expires, re-detect the display and compare the
new sink with the cached one using EDID comparison.
4. If sinks match (same EDID), this was a spontaneous HPD toggle:
- Update connector state internally
- Skip hotplug event to prevent desktop rearrangement
If sinks differ, this was a real display change:
- Process normally with the hotplug event
The debounce delay is configurable via module parameter
'hdmi_hpd_debounce_delay_ms'.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c918e75e1ed95be76f8e3156a411188f650fe03f)
|
|
[Why]
Empirical measurement of some monitors that fail to read EDID while
booting shows that the number of retries with a 30ms delay between
tries is as high as 16.
[How]
Increase number of retries to 20.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ad1c59ad7cf74ec06e32fe2c330ac1e957222288)
Cc: stable@vger.kernel.org
|
|
[Why]
When a monitor is booting it's possible that it isn't ready to retrieve
link caps and this can lead to an EDID read failure:
```
[drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
amdgpu 0000:c5:00.0: [drm] *ERROR* No EDID read.
```
[How]
Rather than msleep once and try a few times, msleep each time. Should
be no changes for existing working monitors, but should correct reading
caps on a monitor that is slow to boot.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 669dca37b3348a447db04bbdcbb3def94d5997cc)
Cc: stable@vger.kernel.org
|
|
[why]
1. With allow_0_dtb_clk enabled, the time required to latch DTBCLK to 600 MHz
depends on the SMU. If DTBCLK is not latched to 600 MHz before set_mode completes,
gating DTBCLK causes the DP2 sink to lose its clock source.
2. The existing DTBCLK gating sequence ungates DTBCLK based on both pix_clk and ref_dtbclk,
but gates DTBCLK when either pix_clk or ref_dtbclk is zero.
pix_clk can be zero outside the set_mode sequence before DTBCLK is properly latched,
which can lead to DTBCLK being gated by mistake.
[how]
Consider both pixel_clk and ref_dtbclk when determining when it is safe to gate DTBCLK;
this is more accurate.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4701
Fixes: 5949e7c4890c ("drm/amd/display: Enable Dynamic DTBCLK Switch")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d04eb0c402780ca037b62a6aecf23b863545ebca)
Cc: stable@vger.kernel.org
|
|
Instead of checking that the signal is analog before calling the
HWSS disable_audio_stream() function to disable audio, move
the check inside the HWSS function.
Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-5-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Match pre-existing patterns in the DC code base.
Instead of returning early from the construct_phy() function,
add a label at the end and use goto to jump there.
Additionally, respect the DC logger and let it log the function
even when it returns early.
Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-4-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In the detect_link_and_local_sink() function, do not modify the
EDID capabilities of the display based on the connector. Instead,
respect the analog flag better and when the analog flag is set,
check that the connector indeed supports analog displays.
Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-3-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the following warning that some users are reporting
with some kernel configurations:
"positional initialization of field in 'struct' declared
with 'designated_init' attribute"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20251113163348.137315-2-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Change comments from kernel-doc style "/**" to normal C comments
"/*" since the comments are not in kernel-doc format.
This fixes around 39 kernel-doc warnings like this one:
drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c:1322: warning:
This comment starts with '/**', but isn't a kernel-doc comment.
Refer Documentation/doc-guide/kernel-doc.rst
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511062036.Ry8Z2APc-lkp@intel.com/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Unregister mce notifier on unload.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This version brings along the following updates:
- Add interface to capture expected HW state from SW state
- Add panel Replay capability detection, DPCD reading, and enablement logic
- Re-check seamless boot enablement on subsequent dc_commit_streams
- Improve DPCD link capability retrieval with increased retries and per-retry delays
- Add HPD filter for HDMI
- Add pipe topology history tracking to DC
- Fix MST initialization on resume when switching from SST to MST during suspend
- Fix double cursor on DCN20 & DCN30 in non-native scaling
- Check DCCG_AUDIO_DTO2 register mask before access
- Fix pbn to kbps conversion
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
Ignore Coverity false positive analysis in the dmub_cmd.h
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Existing routine has two conversion sequence,
pbn_to_kbps and kbps_to_pbn with margin.
Non of those has without-margin calculation.
kbps_to_pbn with margin conversion includes
fec overhead which has already been included in
pbn_div calculation with 0.994 factor considered.
It is a double counted fec overhead factor that causes
potential bw loss.
[How]
Add without-margin calculation.
Fix fec overhead double counted issue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3735
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
Check DCCG_AUDIO_DTO2 register mask exist before access.
Also, add a existing DIO_CLOCK_control register for later use.
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
Check that the stream exists to add link->local_sink null pointer access
protection.
Reviewed-by: Harold Sun <harold.sun@amd.com>
Reviewed-by: Ethan Cheung <ethan.cheung@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
On DCN20 & DCN30, the 6th DPP's & HUBP's are powered on permanently and
cannot be power gated. Thus, when dpp_reset() is invoked for the DPP5,
while it's still powered on, the cached cursor_state
(dpp_base->pos.cur0_ctl.bits.cur0_enable)
and the actual state (CUR0_ENABLE) bit are unsycned. This can cause a
double cursor in full screen with non-native scaling.
[How]
Force disable cursor on DPP5 on plane powerdown for ASICs w/ 6 DPPs/HUBPs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4673
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
There is no way to check pipe topology update history through a
dump.
[How]
Add a topology history structure to dc with snapshots of the most recent
pipe topology updates.
Reviewed-by: George Shen <george.shen@amd.com>
Signed-off-by: Nicholas Carbones <ncarbone@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Some monitors perform rapid “autoscan” HPD re‑assertions right after a
disconnect or powersaving mode enablement. These appear as a quick
disconnect→reconnect with an identical EDID. Since Linux has no HDMI
hotplug detection (HPD) filter, these quick reconnects are seen as hotplug
events, which can unintentionally wake a system with DPMS off.
An example: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Such 'fake reconnects' are considered when the interval between a
disconnect and a connect is within 1500ms (experimentally chosen using
several monitors), and the two connections have the same EDID.
[How]
Implement a time-based debounce mechanism:
1. On HDMI disconnect detection, instead of immediately processing the
HPD event, save the current sink and schedule delayed work (default 1500ms)
2. If another HDMI disconnect HPD event arrives during the debounce period,
it reschedules the pending work, ensuring only the final state is processed.
3. When the debounce timer expires, re-detect the display and compare the
new sink with the cached one using EDID comparison.
4. If sinks match (same EDID), this was a spontaneous HPD toggle:
- Update connector state internally
- Skip hotplug event to prevent desktop rearrangement
If sinks differ, this was a real display change:
- Process normally with the hotplug event
The debounce delay is configurable via module parameter
'hdmi_hpd_debounce_delay_ms'.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Empirical measurement of some monitors that fail to read EDID while
booting shows that the number of retries with a 30ms delay between
tries is as high as 16.
[How]
Increase number of retries to 20.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
When a monitor is booting it's possible that it isn't ready to retrieve
link caps and this can lead to an EDID read failure:
```
[drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
amdgpu 0000:c5:00.0: [drm] *ERROR* No EDID read.
```
[How]
Rather than msleep once and try a few times, msleep each time. Should
be no changes for existing working monitors, but should correct reading
caps on a monitor that is slow to boot.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
If the seamless boot feature has already been enabled, and
dc_commit_streams is called again before receiving a flip, the
driver will adjust the engine clock without turning off the screen,
which will cause garbage to occur. However, in reality, the Pixel
Clock from the first dc_commit_streams and the second dc_commit_streams
are different.
[How]
If the apply seamless boot flag in the previous stream has not been
cleared, and dc_commit_streams is received again, we need to recheck
whether seamless boot should be disabled
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
Read Panel replay caps from DPCD when retrieving link capability
Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
1.Add flow to enable and configure panel replay enablement and
configuration
2.Add registry key for enable option
3.Add replay version check to be compatible with freesync replay
4.Add AC/DC switch function to notify ac/dc change.
5.Add flow in set event function to check and decide Replay
enable/disable
Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why&How]
For supporting VESA PR, add flow to determine the support capability
Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
To debug certain issues, such as underflow, it is common practice to
dump the HW state of all registers for analysis. The first thing to
check with the dump is to ensure all values are programmed as expected
according to SW state.
[How]
Add interface to DC to capture expected HW register values based on SW
state.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use pci_rebar_get_max_size() to simplify amdgpu_device_resize_fb_bar().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113180053.27944-11-ilpo.jarvinen@linux.intel.com
|
|
PCI core handles releasing device's resources and their rollback in case of
failure of a BAR resizing operation. Releasing resource prior to calling
pci_resize_resource() prevents PCI core from restoring the BARs as they
were.
Remove driver-side release of BARs from the amdgpu driver.
Also remove the driver initiated assignment as pci_resize_resource() should
try to assign as much as possible. If the driver side call manages to get
more required resources assigned in some scenario, such a problem should be
fixed inside pci_resize_resource() instead.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU
Link: https://patch.msgid.link/20251113162628.5946-11-ilpo.jarvinen@linux.intel.com
|
|
BAR resize operation is implemented in the pci_resize_resource() and
pbus_reassign_bridge_resources() functions. pci_resize_resource() can be
called either from __resource_resize_store() from sysfs or directly by the
driver for the Endpoint Device.
The pci_resize_resource() requires that caller has released the device
resources that share the bridge window with the BAR to be resized as
otherwise the bridge window is pinned in place and cannot be changed.
pbus_reassign_bridge_resources() rolls back resources if the resize
operation fails, but rollback is performed only for the bridge windows.
Because releasing the device resources are done by the caller of the BAR
resize interface, these functions performing the BAR resize do not have
access to the device resources as they were before the resize.
pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources()
after rolling back the bridge windows as they were, however, it will not
guarantee the resource are assigned due to differences in how FW and the
kernel assign the resources (alignment of the start address and tail).
To perform rollback robustly, the BAR resize interface has to be altered to
also release the device resources that share the bridge window with the BAR
to be resized.
Also, remove restoring from the entries failed list as saved list should
now contain both the bridge windows and device resources so the extra
restore is duplicated work.
Some drivers (currently only amdgpu) want to prevent releasing some
resources. Add exclude_bars param to pci_resize_resource() and make amdgpu
pass its register BAR (BAR 2 or 5), which should never be released during
resize operation. Normally 64-bit prefetchable resources do not share a
bridge window with the 32-bit only register BAR, but there are various
fallbacks in the resource assignment logic which may make the resources
share the bridge window in rare cases.
This change (together with the driver side changes) is to counter the
resource releases that had to be done to prevent resource tree corruption
in the ("PCI: Release assigned resource before restoring them") change. As
such, it likely restores functionality in cases where device resources were
released to avoid resource tree conflicts which appeared to be "working"
when such conflicts were not correctly detected by the kernel.
Reported-by: Simon Richter <Simon.Richter@hogyros.de>
Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash amdgpu BAR selection from
https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com
|
|
Now that the DC analog connector support and VCE1 support landed,
amdgpu is at feature parity with the old radeon driver
on SI dGPUs.
Enabling the amdgpu driver by default for SI dGPUs has the
following benefits:
- More stable OpenGL support through RadeonSI
- Vulkan support through RADV
- Improved performance
- Better display features through DC
Users who want to keep using the old driver can do so using:
amdgpu.si_support=0 radeon.si_support=1
v2:
- Update documentation in Kconfig file
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The amdgpu driver has been working well on CIK dGPUs for years.
Now that the DC analog connector support landed,
amdgpu is at feature parity with the old radeon driver
on CIK dGPUs.
Enabling the amdgpu driver by default for CIK dGPUs has the
following benefits:
- More stable OpenGL support through RadeonSI
- Vulkan support through RADV
- Improved performance
- Better display features through DC
Users who want to keep using the old driver can do so using:
amdgpu.cik_support=0 radeon.cik_support=1
v2:
- Update documentation in Kconfig file
v3:
- Rebase documentation updates (Alex)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This code only applies to amdgpu processing
poison consumption after uniras is enabled,
but not to sriov.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add lock to serialize sriov command execution.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Synchronize sriov host to add block_mmsch bit field.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise job submissions can fail with ENOMEM.
We probably need to re-design the per VMID tracking at some point.
Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When we run out of VMIDs we need to wait for some to become available.
Previously we were using a dma_fence_array for that, but this means that
we have to allocate memory.
Instead just wait for the first not signaled fence from the least recently
used VMID to signal. That is not as efficient since we end up in this
function multiple times again, but allocating memory can easily fail or
deadlock if we have to wait for memory to become available.
v2: remove now unused VM manager fields
v3: fix dma_fence reference
Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The amd-smi tool relies on extended peer link information to report xgmi
link metrics. The necessary xgmi ta command, GET_EXTEND_PEER_LINKS, has
been enabled in the host driver and this change is necessary for the
guest to make use of it. To handle the case where the host driver does
not have the latest xgmi ta, the guest driver checks for guest support
through a pf2vf feature flag before invoking psp.
Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adds new sriov msg flag to match host, feature flag in the amdgim
enum, and a wrapper macro to check it.
Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_xgmi_fill_topology_info and psp_xgmi_reflect_topology_info
perform the same logic of copying topology info of one node to every
other node in the hive. Instead of having two functions that purport to
do the same thing, this refactoring moves the logic of the fill function
to the reflect function and adds reflecting port number info as well for
complete functionality.
Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The amdgpu driver has been working well on CIK dGPUs for years.
Now that the DC analog connector support landed, these GPUs
are at feature parity with the old radeon driver.
Additionally, amdgpu yields extra performance, supports Vulkan
and provides more display features through DC as well as more
robust power management.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move the determination into a separate function.
Change amdgpu.si_support and amdgpu.cik_support so that their
default value is -1 (default).
This prepares the code for changing the default driver based
on the chip.
Also adjust the module param documentation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On single GPU systems, avoid accesses to XGMI link registers.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the following corner case:-
Consider a 2M huge page SVM allocation, followed by prefetch call for
the first 4K page. The whole range is initially mapped with single PTE.
After the prefetch, this range gets split to first page + rest of the
pages. Currently, the first page mapping is not updated on MI300A (APU)
since page hasn't migrated. However, after range split PTE mapping it not
valid.
Fix this by forcing page table update for the whole range when prefetch
is called. Calling prefetch on APU doesn't improve performance. If all
it deteriotes. However, functionality has to be supported.
v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout
VRAM
v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU
v4: Remove v2 and v3 changes. Force update_mapping when range is split
at a size that is not aligned to prange granularity
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 076470b9f6f8d9c7c8ca73a9f054942a686f9ba7)
|