21 KiB
title, date, excerpt, tags, rating
title | date | excerpt | tags | rating |
---|---|---|---|---|
VirtualTexture学习笔记 | 2024-02-20 18:26:49 | ⭐ |
前言
- UE4 Runtime Virtual Texture 实现机制及源码解析:https://zhuanlan.zhihu.com/p/143709152
- UE Virtual Texture图文浅析:https://zhuanlan.zhihu.com/p/642580472
相关概念
- Virtual Texture:虚拟纹理,以下简称 VT
- Runtime Virtual Texture:UE4 运行时虚拟纹理系统,以下简称 RVT
- VT feedback:存储当前屏幕像素对应的 VT Page 信息,用于加载 VT 数据。
- VT Physical Texture:虚拟纹理对应的物理纹理资源
- PageTable:虚拟纹理页表,用来寻址 VT Physical Texture Page 数据。
- PageTable Texture:包含虚拟纹理页表数据的纹理资源,通过此纹理资源可查询 Physical Texture Page 信息。有些 VT 系统也叫 Indirection Texture,由于本文分析 UE4 VT 的内容,这里采用 UE4 术语。
- PageTable Buffer:包含虚拟纹理页表数据内容的 GPU Buffer 资源。
相关类:
- URuntimeVirtualTexture(UObject)
- FRuntimeVirtualTextureRenderResource
- UVirtualTexture(UObject)
- UVirtualTexture2D(UTexture2D)
UE5 VirtualHeightfieldMesh简述
https://zhuanlan.zhihu.com/p/575398476
可能的相关类
- VirtualHeightfieldMesh
- UVirtualHeightfieldMeshComponent
- UHeightfieldMinMaxTexture
- BuildTexture()
- UHeightfieldMinMaxTexture
- FVirtualHeightfieldMeshSceneProxy
- FVirtualHeightfieldMeshRendererExtension
- AddWork()
- SubmitWork()
- FVirtualTextureFeedbackBuffer 参考#Pass1的补充VirtualTextureFeedback
- UVirtualHeightfieldMeshComponent
- UNiagaraDataInterfaceLandscape
- UNiagaraDataInterfaceVirtualTexture(NiagaraDataInterfaceVirtualTextureTemplate.ush)
- GetAttributesValid()
- SampleRVTLayer()
- SampleRVT()
- URuntimeVirtualTextureComponent
VirtualHeightfieldMesh
首先是MinMaxTexture。全称UHeightfieldMinMaxTexture(下简称MinMaxTexture),可以说是整个VHM中最重要的部分之一。它是离线生成的,目的主要是以下几个:
- 用作Instance的剔除(遮挡剔除查询+Frustum剔除)
- 用作决定VHM的LOD
- 用作平滑VHM的顶点位置
其中比较关键的几个成员变量为:
- TObjectPtr Texture:BGRA8格式、贴图大小与RVT的Tile数量一致、有全部mipmap。每个像素存储RVT一个Tile中的最小值以及最大值,各为16bit、encode在RGBA的4个通道上。
- TObjectPtr LodBiasTexture:G8格式、贴图大小与RTV的Tile数量一致、无mipmap。每个像素存储了Texture对应像素周围3x3blur之后的结果。
- TObjectPtr LodBiasMinMaxTexture:BGRA8格式、贴图大小与RTV的Tile数量一致、有全部mipmap。类似于HZB、每个像素存储LodBiasTexture的最小值以及最大值,各为8bit、存在RG两个通道上。
- int32 MaxCPULevels:表示共需要在CPU端存储多少层level的数据。
- TArray
<FVector2D>
TextureData:CPU端存储Texture贴图的数据,共MaxCPULevels层mipmap。
TextureData的获取
因此要生成MinMaxTexture、最关键的就是要得到TextureData,其入口函数为位于HeightfieldMinMaxTextureBuilder.cpp的VirtualHeightfieldmesh::BuildMinMaxTexture中。由于Texture存储的是RVT中每个Tile中最小最大值,因此不难想象到其大致流程可以分为以下几步:
- 遍历RVT的每个Tile并绘制到一张中间贴图上,然后计算这张中间纹理的最小最大值、存储至目标贴图对应的位置上;
- 为目标贴图生成mipmap;
- 将目标贴图回读至CPU、得到TextureData。
将Tile绘制到一张中间贴图使用的是自带的RuntimeVirtualTexture::RenderPagesStandAlone函数;计算最小最大值是通过Downsample的方式计算而成。如下图所示为2x2Tiles、4TileSize的RVT,计算Tile0的最小最大值的示意过程图:
Downsample的ComputeShader为TMinMaxTextureCS。遍历计算完每个Tile的最小最大值后,同样通过Downsample为目标贴图生成全mipmap。
最后为了将贴图回读到CPU,先是通过CopyTexture的方式将目标贴图的各个mipmap复制到各自带有CPUReadback Flag的贴图后,再通过MapStagingSurface/UnmapStagingSurface的方式复制到CPU内存上。由于是比较常规的操作,就不过多介绍了。
至此也就得到了带有所有mipmap的CPU端的TextureData,接着将此作为参数调用UHeightfieldMinMaxTexture::BuildTexture以生成剩下的内容(即Texture、LodBiasTexture、LodBiasMinMaxTexture)。
FVirtualHeightfieldMeshSceneProxy
至此离线生成的MinMaxTexture介绍完毕,后面都是实时渲染内容的介绍部分。所有内容都围绕着VHM的SceneProxy也就是FVirtualHeightfieldMeshSceneProxy展开。
遮挡剔除
关于硬件的遮挡剔除用法可以参考DX12的官方sample[8]
首先是遮挡剔除部分,VHM做了Tile级别且带有LOD的遮挡剔除。VHM的SceneProxy重写了函数GetOcclusionQueries,函数实现只是简单地返回OcclusionVolumes:
OcclusionVolumes的构建在函数BuildOcclusionVolumes中,其基本思路为取MinMaxTexture中CPU端的TextureData的数据、获得每个Tile的高度最小最大值来创建该Tile的Bounds信息。
可以看到OcclusionVolumes是带有Lod的。当然实际上这里的代码的LodIndex不一定从0开始,因为Component中有一项成员变量NumOcclusionLod、表示创建多少层mipmap的OcclusionVolumes。另外有一点需要注意的是,NumOcclusionLod默认值为0、也就是说VHM的遮挡剔除默认没有开启。
由于VHM需要在ComputePass中动态地构建Instance绘制的IndirectArgs、因此SceneProxy还重写了函数AcceptOcclusionResults,用以获取遮挡剔除的结果。具体是将UE返回的遮挡剔除的结果存在贴图OcclusionTexture上、以便能够作为SRV在后续的Pass中访问:
void FVirtualHeightfieldMeshSceneProxy::AcceptOcclusionResults(FSceneView const* View, TArray<bool>* Results, int32 ResultsStart, int32 NumResults)
{
// 由于构建IndirectArgs跟SceneProxy不在同一个地方,因此用了一个全局变量来保存遮挡剔除的结果
FOcclusionResults& OcclusionResults = GOcclusionResults.Emplace(FOcclusionResultsKey(this, View));
OcclusionResults.TextureSize = OcclusionGridSize;
OcclusionResults.NumTextureMips = NumOcclusionLods;
// 创建贴图,并将遮挡剔除结果Copy至贴图上
FRHIResourceCreateInfo CreateInfo(TEXT("VirtualHeightfieldMesh.OcclusionTexture"));
OcclusionResults.OcclusionTexture = RHICreateTexture2D(OcclusionGridSize.X, OcclusionGridSize.Y, PF_G8, NumOcclusionLods, 1, TexCreate_ShaderResource, CreateInfo);
bool const* Src = Results->GetData() + ResultsStart;
FIntPoint Size = OcclusionGridSize;
for (int32 MipIndex = 0; MipIndex < NumOcclusionLods; ++MipIndex)
{
uint32 Stride;
uint8* Dst = (uint8*)RHILockTexture2D(OcclusionResults.OcclusionTexture, MipIndex, RLM_WriteOnly, Stride, false);
for (int Y = 0; Y < Size.Y; ++Y)
{
for (int X = 0; X < Size.X; ++X)
{
Dst[Y * Stride + X] = *(Src++) ? 255 : 0;
}
}
RHIUnlockTexture2D(OcclusionResults.OcclusionTexture, MipIndex, false);
Size.X = FMath::Max(Size.X / 2, 1);
Size.Y = FMath::Max(Size.Y / 2, 1);
}
}
整体思路
至此就开始真正的VHM的Mesh的数据构建了。为了后续的代码细节能够更加易懂,这里再说明一下VHM构建mesh的整体思路:假设我们有一个工作队列为QueueBuffer,每一项工作就是从QueueBuffer中取出一项工作(更准确地说,取出一个Quad)、对这个Quad判断是否需要进行细化、如果需要细分则将这个Quad细分为4个Quad并塞入QueueBuffer中。
重复这个取出→处理→放回的过程,直到QueueBuffer中没有工作为止。示意图如下:
RVT相关代码(Pass1:CollectQuad)
如果不能细分,那么就会增加一个Instance、将其Instance的数据写入RWQuadBuffer中。RWQuadBuffer将会用在后续的CullInstance Pass中,以真正地构建IndirectArgs:
// 无法继续细分的情况
// 用以后续对RVT进行采样
uint PhysicalAddress = PageTableTexture.Load(int3(Pos, Level));
InterlockAdd(RWQuadInfo.Write, 1, Write);
RWQuadBuffer[Write] = Pack(InitQuadRenderItem(Pos, Level, PhysicalAddress, bCull | bOcclude));
其中的RWQuadInfo是我编的变量名、实际的代码中并不存在。或者说实际上这里的变量名是RWIndirectArgsBuffer,但是并不是前面所说的用以绘制的IndirectArgs。为了不让大家混淆,这里改了下变量名
另外也能由此看出的是,VHM也许曾经想过利用IndirectArgs数组来绘制(即DXSample中将符合条件的生成IndirectArgs放进数组中)。但是最后改成的是一个IndirectArgs但是Instance的绘制方式
PS. PageTableTexture的类型为RHITextuire。相关Shader代码位于VirtualHeightfieldMesh.usf
Pass1的补充VirtualTextureFeedback
不再继续进行细分后、说明后续就要对该Level的RVT进行采样,因此需要处理对应的Feedback信息、让虚幻可以加载对应的Page。shader代码如下图所示:
c++中则要将这个RWFeedbackBuffer喂给虚幻的函数SubmitVirtualTextureFeedbackBuffer:
相关代码段
FVertexFactoryIntermediates GetVertexFactoryIntermediates(FVertexFactoryInput Input)
{
...
// Sample height from virtual texture.
VTUniform Uniform = VTUniform_Unpack(VHM.VTPackedUniform);
Uniform.vPageBorderSize -= .5f * VHM.PhysicalTextureSize.y; // Half texel offset is used in VT write and in sampling because we want texel locations to match landscape vertices.
VTPageTableUniform PageTableUniform = VTPageTableUniform_Unpack(VHM.VTPackedPageTableUniform0, VHM.VTPackedPageTableUniform1);
VTPageTableResult VTResult0 = TextureLoadVirtualPageTableLevel(VHM.PageTableTexture, PageTableUniform, NormalizedPos, VTADDRESSMODE_CLAMP, VTADDRESSMODE_CLAMP, floor(SampleLevel));
float2 UV0 = VTComputePhysicalUVs(VTResult0, 0, Uniform);
float Height0 = VHM.HeightTexture.SampleLevel(VHM.HeightSampler, UV0, 0);
VTPageTableResult VTResult1 = TextureLoadVirtualPageTableLevel(VHM.PageTableTexture, PageTableUniform, NormalizedPos, VTADDRESSMODE_CLAMP, VTADDRESSMODE_CLAMP, ceil(SampleLevel));
float2 UV1 = VTComputePhysicalUVs(VTResult1, 0, Uniform);
float Height1 = VHM.HeightTexture.SampleLevel(VHM.HeightSampler, UV1, 0);
float Height = lerp(Height0.x, Height1.x, frac(SampleLevel));
...
}
渲染线程创建VT的相关逻辑:
void FVirtualHeightfieldMeshSceneProxy::CreateRenderThreadResources()
{
if (RuntimeVirtualTexture != nullptr)
{
if (!bCallbackRegistered)
{
GetRendererModule().AddVirtualTextureProducerDestroyedCallback(RuntimeVirtualTexture->GetProducerHandle(), &OnVirtualTextureDestroyedCB, this);
bCallbackRegistered = true;
}
//URuntimeVirtualTexture* RuntimeVirtualTexture;
if (RuntimeVirtualTexture->GetMaterialType() == ERuntimeVirtualTextureMaterialType::WorldHeight)
{
AllocatedVirtualTexture = RuntimeVirtualTexture->GetAllocatedVirtualTexture();
NumQuadsPerTileSide = RuntimeVirtualTexture->GetTileSize();
if (AllocatedVirtualTexture != nullptr)
{
// Gather vertex factory uniform parameters.
FVirtualHeightfieldMeshVertexFactoryParameters UniformParams;
UniformParams.PageTableTexture = AllocatedVirtualTexture->GetPageTableTexture(0);
UniformParams.HeightTexture = AllocatedVirtualTexture->GetPhysicalTextureSRV(0, false);
UniformParams.HeightSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
UniformParams.LodBiasTexture = LodBiasTexture ? LodBiasTexture->GetResource()->TextureRHI : GBlackTexture->TextureRHI;
UniformParams.LodBiasSampler = TStaticSamplerState<SF_Point>::GetRHI();
UniformParams.NumQuadsPerTileSide = NumQuadsPerTileSide;
FUintVector4 PackedUniform;
AllocatedVirtualTexture->GetPackedUniform(&PackedUniform, 0);
UniformParams.VTPackedUniform = PackedUniform;
FUintVector4 PackedPageTableUniform[2];
AllocatedVirtualTexture->GetPackedPageTableUniform(PackedPageTableUniform);
UniformParams.VTPackedPageTableUniform0 = PackedPageTableUniform[0];
UniformParams.VTPackedPageTableUniform1 = PackedPageTableUniform[1];
const float PageTableSizeX = AllocatedVirtualTexture->GetWidthInTiles();
const float PageTableSizeY = AllocatedVirtualTexture->GetHeightInTiles();
UniformParams.PageTableSize = FVector4f(PageTableSizeX, PageTableSizeY, 1.f / PageTableSizeX, 1.f / PageTableSizeY);
const float PhysicalTextureSize = AllocatedVirtualTexture->GetPhysicalTextureSize(0);
UniformParams.PhysicalTextureSize = FVector2f(PhysicalTextureSize, 1.f / PhysicalTextureSize);
UniformParams.VirtualHeightfieldToLocal = FMatrix44f(UVToLocal);
UniformParams.VirtualHeightfieldToWorld = FMatrix44f(UVToWorld); // LWC_TODO: Precision loss
UniformParams.MaxLod = AllocatedVirtualTexture->GetMaxLevel();
UniformParams.LodBiasScale = LodBiasScale;
// Create vertex factory.
VertexFactory = new FVirtualHeightfieldMeshVertexFactory(GetScene().GetFeatureLevel(), UniformParams);
VertexFactory->InitResource(FRHICommandListImmediate::Get());
}
}
}
}
RVT生成相关
RVT相关操作总结
CPU端创建:
作为UniformParameter传递到GPU端:
AllocatedVirtualTexture = RuntimeVirtualTexture->GetAllocatedVirtualTexture();
//PageTableTexture、Texture&Sampler
FVirtualHeightfieldMeshVertexFactoryParameters UniformParams;
UniformParams.PageTableTexture = AllocatedVirtualTexture->GetPageTableTexture(0);
UniformParams.HeightTexture = AllocatedVirtualTexture->GetPhysicalTextureSRV(0, false);
UniformParams.HeightSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
//VTPackedUniform&VTPackedPageTableUniform
FUintVector4 PackedUniform;
AllocatedVirtualTexture->GetPackedUniform(&PackedUniform, 0);
UniformParams.VTPackedUniform = PackedUniform;
FUintVector4 PackedPageTableUniform[2];
AllocatedVirtualTexture->GetPackedPageTableUniform(PackedPageTableUniform);
UniformParams.VTPackedPageTableUniform0 = PackedPageTableUniform[0];
UniformParams.VTPackedPageTableUniform1 = PackedPageTableUniform[1];
//PageTableSize
const float PageTableSizeX = AllocatedVirtualTexture->GetWidthInTiles();
const float PageTableSizeY = AllocatedVirtualTexture->GetHeightInTiles();
UniformParams.PageTableSize = FVector4f(PageTableSizeX, PageTableSizeY, 1.f / PageTableSizeX, 1.f / PageTableSizeY);
//PhysicalTextureSize
const float PhysicalTextureSize = AllocatedVirtualTexture->GetPhysicalTextureSize(0);
UniformParams.PhysicalTextureSize = FVector2f(PhysicalTextureSize, 1.f / PhysicalTextureSize);
//Local <=> World Matrix
UniformParams.VirtualHeightfieldToLocal = FMatrix44f(UVToLocal);
UniformParams.VirtualHeightfieldToWorld = FMatrix44f(UVToWorld); // LWC_TODO: Precision loss
//MaxLod
UniformParams.MaxLod = AllocatedVirtualTexture->GetMaxLevel();
GPU端采样:
VTUniform Uniform = VTUniform_Unpack(VHM.VTPackedUniform);
Uniform.vPageBorderSize -= .5f * VHM.PhysicalTextureSize.y; // Half texel offset is used in VT write and in sampling because we want texel locations to match landscape vertices.
VTPageTableUniform PageTableUniform = VTPageTableUniform_Unpack(VHM.VTPackedPageTableUniform0, VHM.VTPackedPageTableUniform1);
VTPageTableResult VTResult0 = TextureLoadVirtualPageTableLevel(VHM.PageTableTexture, PageTableUniform, NormalizedPos, VTADDRESSMODE_CLAMP, VTADDRESSMODE_CLAMP, floor(SampleLevel));
float2 UV0 = VTComputePhysicalUVs(VTResult0, 0, Uniform);
float Height0 = VHM.HeightTexture.SampleLevel(VHM.HeightSampler, UV0, 0);
VTPageTableResult VTResult1 = TextureLoadVirtualPageTableLevel(VHM.PageTableTexture, PageTableUniform, NormalizedPos, VTADDRESSMODE_CLAMP, VTADDRESSMODE_CLAMP, ceil(SampleLevel));
float2 UV1 = VTComputePhysicalUVs(VTResult1, 0, Uniform);
float Height1 = VHM.HeightTexture.SampleLevel(VHM.HeightSampler, UV1, 0);
float Height = lerp(Height0.x, Height1.x, frac(SampleLevel));
NiagaraDataInterfaceVirtualTextureTemplate.ush中的代码:
//其他相关VT操作函数位于VirtualTextureCommon.ush
float4 SampleRVTLayer_{ParameterName}(float2 SampleUV, Texture2D InTexture, Texture2D<uint4> InPageTable, uint4 InTextureUniforms)
{
VTPageTableResult PageTable = TextureLoadVirtualPageTableLevel(InPageTable, VTPageTableUniform_Unpack({ParameterName}_PageTableUniforms[0], {ParameterName}_PageTableUniforms[1]), SampleUV, VTADDRESSMODE_CLAMP, VTADDRESSMODE_CLAMP, 0.0f);
return TextureVirtualSample(InTexture, {ParameterName}_SharedSampler, PageTable, 0, VTUniform_Unpack(InTextureUniforms));
}
void SampleRVT_{ParameterName}(in float3 WorldPosition, out bool bInsideVolume, out float3 BaseColor, out float Specular, out float Roughness, out float3 Normal, out float WorldHeight, out float Mask)
{
bInsideVolume = false;
BaseColor = float3(0.0f, 0.0f, 0.0f);
Specular = 0.5f;
Roughness = 0.5f;
Normal = float3(0.0f, 0.0f, 1.0f);
WorldHeight = 0.0f;
Mask = 1.0f;
// Get Sample Location
FLWCVector3 LWCWorldPosition = MakeLWCVector3({ParameterName}_SystemLWCTile, WorldPosition);
FLWCVector3 LWCUVOrigin = MakeLWCVector3({ParameterName}_SystemLWCTile, {ParameterName}_UVUniforms[0].xyz);
float2 SampleUV = VirtualTextureWorldToUV(LWCWorldPosition, LWCUVOrigin, {ParameterName}_UVUniforms[1].xyz, {ParameterName}_UVUniforms[2].xyz);
// Test to see if we are inside the volume, but still take the samples as it will clamp to the edge
bInsideVolume = all(SampleUV >- 0.0f) && all(SampleUV < 1.0f);
// Sample Textures
float4 LayerSample[3];
LayerSample[0] = ({ParameterName}_ValidLayersMask & 0x1) != 0 ? SampleRVTLayer_{ParameterName}(SampleUV, {ParameterName}_VirtualTexture0, {ParameterName}_VirtualTexture0PageTable, {ParameterName}_VirtualTexture0TextureUniforms) : 0;
LayerSample[1] = ({ParameterName}_ValidLayersMask & 0x2) != 0 ? SampleRVTLayer_{ParameterName}(SampleUV, {ParameterName}_VirtualTexture1, {ParameterName}_VirtualTexture1PageTable, {ParameterName}_VirtualTexture1TextureUniforms) : 0;
LayerSample[2] = ({ParameterName}_ValidLayersMask & 0x4) != 0 ? SampleRVTLayer_{ParameterName}(SampleUV, {ParameterName}_VirtualTexture2, {ParameterName}_VirtualTexture2PageTable, {ParameterName}_VirtualTexture2TextureUniforms) : 0;
// Sample Available Attributes
switch ( {ParameterName}_MaterialType )
{
case ERuntimeVirtualTextureMaterialType_BaseColor:
{
BaseColor = LayerSample[0].xyz;
break;
}
case ERuntimeVirtualTextureMaterialType_BaseColor_Normal_Roughness:
{
BaseColor = VirtualTextureUnpackBaseColorSRGB(LayerSample[0]);
Roughness = LayerSample[1].y;
Normal = VirtualTextureUnpackNormalBGR565(LayerSample[1]);
break;
}
case ERuntimeVirtualTextureMaterialType_BaseColor_Normal_DEPRECATED:
case ERuntimeVirtualTextureMaterialType_BaseColor_Normal_Specular:
{
BaseColor = LayerSample[0].xyz;
Specular = LayerSample[1].x;
Roughness = LayerSample[1].y;
Normal = VirtualTextureUnpackNormalBC3BC3(LayerSample[0], LayerSample[1]);
break;
}
case ERuntimeVirtualTextureMaterialType_BaseColor_Normal_Specular_YCoCg:
{
BaseColor = VirtualTextureUnpackBaseColorYCoCg(LayerSample[0]);
Specular = LayerSample[2].x;
Roughness = LayerSample[2].y;
Normal = VirtualTextureUnpackNormalBC5BC1(LayerSample[1], LayerSample[2]);
break;
}
case ERuntimeVirtualTextureMaterialType_BaseColor_Normal_Specular_Mask_YCoCg:
{
BaseColor = VirtualTextureUnpackBaseColorYCoCg(LayerSample[0]);
Specular = LayerSample[2].x;
Roughness = LayerSample[2].y;
Normal = VirtualTextureUnpackNormalBC5BC1(LayerSample[1], LayerSample[2]);
Mask = LayerSample[2].w;
break;
}
case ERuntimeVirtualTextureMaterialType_WorldHeight:
{
WorldHeight = VirtualTextureUnpackHeight(LayerSample[0], {ParameterName}_WorldHeightUnpack);
break;
}
}
}
VT还存在一个反馈机制,具体可以参考:#Pass1的补充VirtualTextureFeedback
/** GPU fence pool. Contains a fence array that is kept in sync with the FeedbackItems ring buffer. Fences are used to know when a transfer is ready to Map() without stalling. */
/** GPU 栅栏池。其中包含一个与 FeedbackItems 环形缓冲区保持同步的栅栏数组。栅栏用于了解传输何时准备就绪,可在不停滞的情况下进行 Map()。 */
class FFeedbackGPUFencePool* Fences;