Amends a few interfaces to be able to handle the migration over to the
new Memory class by passing the class by reference as a function
parameter where necessary.
Notably, within the filesystem services, this eliminates two ReadBlock()
calls by using the helper functions of HLERequestContext to do that for
us.
These will eventually be migrated into the main Memory class, but for
now, we put them in an anonymous namespace, so that the other functions
that use them, can be migrated over separately.
A fairly straightforward migration. These member functions can just be
mostly moved verbatim with minor changes. We already have the necessary
plumbing in places that they're used.
IsKernelVirtualAddress() can remain a non-member function, since it
doesn't rely on class state in any form.
Migrates all of the direct mapping facilities over to the new memory
class. In the process, this also obsoletes the need for memory_setup.h,
so we can remove it entirely from the project.
Currently, the main memory management code is one of the remaining
places where we have global state. The next series of changes will aim
to rectify this.
This change simply introduces the main skeleton of the class that will
contain all the necessary state.
* core_timing: Use better reference tracking for EventType.
- Moves ownership of the event to the caller, ensuring we don't fire events for destroyed objects.
- Removes need for unique names - we won't be using this for save states anyways.
The heuristic to detect AMD's driver was not working properly since it
also included Intel. Instead of using heuristics to detect it, compare
the GL_VENDOR string.
We relies on UNREACHABLE's noreturn attribute to eliminate parent's "no return value" warning. However, this was wrapped in a `if(!false)` block, which compilers may not unfold to recognize the noreturn nature.
SSBOs and other resources are limited per pipeline on Intel and AMD.
Heuristically reserve resources per stage having in mind the reported
OpenGL limits.
The current shared memory size seems to be smaller than what the game
actually uses. This makes Nvidia's driver consistently blow up; in the
case of FE3H it made it explode on Qt's SwapBuffers while SDL2 worked
just fine. For now keep this hack since it's still progress over the
previous hardcoded shared memory size.
Drop the usage of ARB_compute_variable_group_size and specialize compute
shaders instead. This permits compute to run on AMD and Intel
proprietary drivers.
Some games like "Fire Emblem: Three Houses" bind 2D textures to offsets
used by instructions of 1D textures. To handle the discrepancy this
commit uses the the texture type from the binding and modifies the
emitted code IR to build a valid backend expression.
E.g.: Bound texture is 2D and instruction is 1D, the emitted IR samples
a 2D texture in the coordinate ivec2(X, 0).
This commit ensures cond var threads act exactly as they do in the real
console. The original implementation uses an RBTree and the behavior of
cond var threads is that at the same priority level they act like a
FIFO.
This commit aims to redo the full setup of invalid textures and
guarantee correct behavior across backends in the case of finding one by
using black dummy textures that match the target of the expected
texture.
While DEPBAR is stubbed it doesn't change anything from our end. Shading
languages handle what this instruction does implicitly. We are not
getting anything out fo this log except noise.
Nvidia has sane default output values for varyings, but the other
vendors don't apply these. To properly emulate this we would have to
analyze the shader header. For the time being, apply the same default
Nvidia applies so we get the same behaviour on non-Nvidia drivers.
This commit corrects the behavior of cancel synchronization when the
thread is running/ready and ensures the next wait is cancelled as it's
suppose to.
format_lookup_table: Drop bitfields
format_lookup_table: Use std::array for definition table
format_lookup_table: Include <limits> instead of <numeric>
Use a large flat array to look up texture formats. This allows us to
properly implement formats with different component types. It should
also be faster.
Abstracted ComponentType was not being used in a meaningful way.
This commit drops its usage.
There is one place where it was being used to test compatibility between
two cached surfaces, but this one is implied in the pixel format.
Removing the component type test doesn't change the behaviour.
Maintains implementation parity between QueryApplicationPlayStatistics
and QueryApplicationPlayStatisticsByUid.
These function the same behaviorally underneath the hood, with the only
difference being that one allows specifying a UID.
This properly handles unicode-based paths on Windows, while opening a
raw stream doesn't out-of-the-box.
Prevents file creation from potentially failing on Windows PCs that make
use of unicode characters in their save paths (e.g. writing to a user's
AppData folder, where the user has a name with non-ASCII characters).
Since the introduction of this library, numerous improvements have been
made. Notably, many of the warnings we would get by simply including the
library header have now been fixed. This makes it much easier to make
conversion warning an error.
Uncovered a bug within Thread's SetCoreAndAffinityMask() where an
unsigned variable (ideal_core) was being compared against "< 0", which
would always be a false condition.
We can also get rid of an unused function (GetNextProcessorId) which contained a sign
mismatch warning.
Quite frequently there have been cases where code has been merged into
the core that produces warning. In order to prevent this from occurring,
we can make the compiler flag these cases and allow our CI to flag down
any code that would generate these warnings.
This is beneficial given silent conversions from signed/unsigned can
result in logic bugs. This forces one writing changes to be explicit
about when signedness conversions are desirable, rather than leaving it
up to readers' interpretation.
Currently the codebase isn't in a state where it will build successfully
with this change applied, but this will be addressed in subsequent
follow-up changes. This set of changes will focus on making it build
properly with these changes for MSVC as a starting point for basic
coverage.
`boost::make_iterator_range` is available when `boost/range/iterator_range.hpp` is included.
Also include `boost/icl/interval_map.hpp` and `boost/icl/interval_set.hpp`.
Avoids potential allocations due to the usage of std::string on strings
that we know at compile time. Most of these might fit in SSO, but it
adds complexity that can be easily avoided with string views.
Emulates negative y viewports with ARB_clip_control. This allows us to
more easily emulated pipelines with tessellation and/or geometry shader
stages. It also avoids corrupting games with transform feedbacks and
negative viewports (gl_Position.y was being modified).
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
Update src/video_core/shader/control_flow.cpp
Co-Authored-By: Mat M. <mathew1800@gmail.com>
- This does not actually seem to exist in the real kernel - games reset these automatically.
# Conflicts:
# src/core/hle/service/am/applets/applets.cpp
# src/core/hle/service/filesystem/fsp_srv.cpp
Nvidia's OpenGL driver maps gl(Named)BufferSubData with some requirements
to a fast. This path has an extra memcpy but updates the buffer without
orphaning or waiting for previous calls. It can be seen as a better
model for "push constants" that can upload a whole UBO instead of 256
bytes.
This path has some requirements established here:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering-techniques.pdf#page=24
Instead of using the stream buffer, this commits moves constant buffers
uploads to calls of glNamedBufferSubData and from my testing it brings a
performance improvement. This is disabled when the vendor is not Nvidia
since it brings performance regressions.
Originally on the last commit I thought TLD4 acted the same as TLD4S and
didn't have a mask. It actually does have a component mask. This commit
corrects that.
This commit fixes an issue where not all 4 results of tld4 were being
written, the color component was defaulted to red, among other things.
It also implements the bindless variant.
Bindless textures were using u64 to pack the buffer and offset from
where they come from. Drop this in favor of separated entries in the
struct.
Remove the usage of std::set in favor of std::list (it's not std::vector
to avoid reference invalidations) for samplers and images.
Greatly shrinks the amount of generated code for GetDecodeTable().
Collapses an assembly output of 9000+ lines down to ~3621 with Clang,
and 6513 down to ~2616 with GCC, given it's now allowed to construct all
the entries as a sequence of constant data.
Given the overall size of the maps are very small, we can use arrays of
pairs here instead of always heap allocating a new map every time the
functions are called. Given the small size of the maps, the difference
in container lookups are negligible, especially given the entries are
already sorted.