This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
As fetching command list headers and and the list of command headers is a fixed 1:1 relation now, they can be implemented within a single call.
This cleans up the Step() logic quite a bit.
Fetching every u32 from memory leads to a big overhead. So let's fetch all of them as a block if possible.
This reduces the Memory::* calls by the dma_pusher by a factor of 10.
Gets rid of the largest set of mutable global state within the core.
This also paves a way for eliminating usages of GetInstance() on the
System class as a follow-up.
Note that no behavioral changes have been made, and this simply extracts
the functionality into a class. This also has the benefit of making
dependencies on the core timing functionality explicit within the
relevant interfaces.
Previously, we were completely ignoring for screenshots whether the game uses RGB or sRGB.
This resulted in screenshot colors that looked off for some titles.
There are some potential edge cases where gl_state may fail to track the
state if a related state changes while the toggle is disabled or it
didn't change. This addresses that.
Handles a pool of resources protected by fences. Manages resource
overflow allocating more resources.
This class is intended to be used through inheritance.
Fences take ownership of objects, protecting them from GPU-side or
driver-side concurrent access. They must be commited from the resource
manager. Their usage flow is: commit the fence from the resource
manager, protect resources with it and use them, send the fence to an
execution queue and Wait for it if needed and then call Release. Used
resources will automatically be signaled when they are free to be
reused.
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
We already store a reference to the system instance that the renderer is
created with, so we don't need to refer to the system instance via
Core::System::GetInstance()
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).
Places all of the timing-related functionality under the existing Core
namespace to keep things consistent, rather than having the timing
utilities sitting in its own completely separate namespace.
When I originally added the compute assert I used the wrong
documentation. This addresses that.
The dispatch register was tested with homebrew against hardware and is
triggered by some games (e.g. Super Mario Odyssey). What exactly is
missing to get a valid program bound by this engine requires more
investigation.
This was originally included because texture operations returned a vec4.
These operations now return a single float and the F4 prefix doesn't
mean anything.
Previous code relied on GLSL parameter order (something that's always
ill-formed on an IR design). This approach passes spatial coordiantes
through operation nodes and array and depth compare values in the the
texture metadata. It still contains an "extra" vector containing generic
nodes for bias and component index (for example) which is still a bit
ill-formed but it should be better than the previous approach.
i965 (and probably all mesa drivers) require GL_PROGRAM_SEPARABLE when using
glProgramBinary. This is probably required by the standard but it's ignored by
permisive proprietary drivers.
Some games search conditionally use global memory instructions. This
allows the heuristic to search inside conditional nodes for the source
constant buffer.
Some games call LDG at the top of a basic block, making the tracking
heuristic to fail. This commit lets the heuristic the decoded nodes as a
whole instead of per basic blocks.
This may lead to some false positives but allows it the heuristic to
track cases it previously couldn't.