# Erosion 8

### From DaveWiki

As an aside to our previous adventures in erosion, we'll take a quick look at the overall algorithm performance and make a few changes for the sake of efficiency. Very basic profiling of the erosion application so far gives:

Method | Debug Time (ms) | Release Time (ms)
---|---|---
CreateBaseNoise() | 280 | 240
FillSeaLevel() | 10 | 0.3
FindAllLakes() | 5500 | 160
ErodeAll() 10 pixel Blend | 81000 | 23500
ErodeAll() 4 pixel Blend | 14000 | 4000
ErodeAll() Simple Blend | 1700 | 100
ErodeAll() No Blend | 900 | 65
CreateLakeMap() | 8 | 0.8
CreateCombinedMap() | 15 | 0.6

While the release times of most components do not look too bad, keep in mind that most of the algorithms used so far were chosen for their implementation simplicity and are at least O(n^2), if not O(n^3) or worse. The current test map size is only 256x256; if we move to a more realistically sized height map, say 10000x10000, some of our current execution times may well be measured in years.

## Lake Creation

While lake creation does not have the largest execution time we'll consider it first. One potential optimization is to eliminate the detection and creation of duplicate lakes. Of the 332 lakes found in our test map, 48 are actually duplicates. To do this we will fill in our lake height map as we find lakes instead of creating it all afterwards. This lets us check whether we are already inside a known lake, both preventing duplicates and increasing our search efficiency.

```cpp
#include "erosion7.h"

class CErosionSample8 : public CErosionSample7
{
  public:
    int m_FillIterations;
    int m_TotalFillIterations;

    /*
     * Class Constructor
     */
    CErosionSample8 ()
    {
        m_FillIterations = 0;
        m_TotalFillIterations = 0;
    }

    /*
     * Override to test some optimizations.
     */
    virtual bool FindLakeLevel (const int OrigX, const int OrigY)
    {
        bool  Result;
        float LakeLevel;
        float NewLevel;

        m_LakeTempMap = m_HeightMap;
        m_LakeTempMap.SetBorderValue(2);
        LakeLevel = m_LakeTempMap.GetValue(OrigX, OrigY);

        m_pFillLakeSource = &m_LakeTempMap;
        m_pFillLakeOutput = &m_LakeTempMap;
        m_FillIterations  = 0;

        while (LakeLevel < 1.0)
        {
            NewLevel = LakeLevel + FILL_DELTA;
            ++m_FillIterations;

            m_FoundLakeOut = false;
            m_LakeOutX = -1;
            m_LakeOutY = -1;

            Result = FillLake(OrigX, OrigY, NewLevel, LakeLevel);
            if (Result) break;

            LakeLevel = NewLevel;
        }

        m_TotalFillIterations += m_FillIterations;
        AddLake(OrigX, OrigY, LakeLevel, m_LakeOutX, m_LakeOutY);

        m_LakeTempMap = m_HeightMap;
        m_LakeTempMap.SetBorderValue(2);
        m_pFillLakeSource = &m_LakeTempMap;
        m_pFillLakeOutput = &m_LakeMap;
        FillLake(OrigX, OrigY, LakeLevel, -1);

        return (true);
    }

    /*
     * Override to test some optimizations.
     */
    bool FindLakes (const int OrigX, const int OrigY)
    {
        float Value;
        bool  Result;
        int   X = OrigX;
        int   Y = OrigY;

        /* Find the first local minimum from the original point */
        do
        {
            Result = davelib::FindMinimumNeighbour(m_HeightMap, X, Y);
        } while (Result);

        /* If there is already a lake here stop */
        if (m_LakeMap.GetValue(X, Y) > -1) return (true);

        /* Stop if we've reached sea level directly */
        Value = m_HeightMap.GetValue(X, Y);
        if (Value <= m_SeaLevel) return (false);

        /* Already found this lake? */
        if (FindExistingLake(X, Y) != NULL) return (true);

        printf("Lake (%d, %d)\n", OrigX, OrigY);
        FindLakeLevel(X, Y);
        return (true);
    }

    /*
     * Override to test some optimizations.
     */
    void FindAllLakes (void)
    {
        int X, Y;
        davelib::CProfileTime Profile(_T("FindAllLakes()"));

        for (Y = 0; Y < m_HeightMap.GetHeight(); ++Y)
        {
            for (X = 0; X < m_HeightMap.GetWidth(); ++X)
            {
                /* Don't bother looking if there's already a lake found here */
                if (m_LakeMap.GetValue(X, Y) > -1) continue;
                FindLakes(X, Y);
            }
        }
    }
};
```

This reduces the FindAllLakes() profile from 5500 to 1600 ms in Debug builds and from 160 to 120 ms in Release builds, with no duplicate lakes present. While not a huge savings for this small map it will undoubtedly be useful for larger maps with many more lakes.

Another optimization that can be tested is changing how the lake level is computed. The current method is basically:

- Find local minimum
- Increase water level a little bit and fill lake (Level += FILL_DELTA)
- If the lake has not yet overflowed (no outlet found) repeat step #2

For our test height map the average number of iterations required to find the lake level is just ~4, but for large lakes it can exceed 50 (1155 total iterations for the 284 unique lakes in our test map). If we wanted a more accurate lake level we would have to reduce FILL_DELTA, which would result in more iterations.

Rather than using a linear method we can try a binary partition method to both reduce the number of iterations and increase the accuracy of the lake level computation. The algorithm is essentially:

- Find local minimum
- Choose an initial FILL_DELTA (larger than used in the previous method)
- Increase fill level by FILL_DELTA
- If the lake is not full, repeat step #3
- Once the lake overflows, halve FILL_DELTA (FILL_DELTA /= 2)
- If FILL_DELTA is smaller than the desired accuracy, stop
- Continue filling from the last unfilled level at step #3

If the original FILL_DELTA=0.001 (land heights vary from -1.0 to 1.0) then for the new binary search method:

- Choose an initial FILL_DELTA of 0.01
- Assume lake depths vary from 0.001 to 0.060 (on the test height map)
- Original number of iterations vary from 1 to 60
- Using the new binary method with an accuracy of 0.001 the number of iterations should be from 5 to 11
- If we increase the accuracy by a factor of 10, to 0.0001, the number of binary iterations will be from 8 to 14 (compared to 1 to 600 for the original method)

Whether or not the binary search method will be faster depends on the average lake depth. If most lakes are very small and shallow then the binary method might actually be slower at lower accuracies. Some quick tests with binary search code (not shown here) do indeed show this trend. For example, with a fill accuracy of 0.00001 the iterative method takes 440 ms to perform 103351 fill iterations while the binary method takes 220 ms for only 2066 fill iterations. Unfortunately, the increased overall complexity of the binary method results in a relatively small performance increase, and it is actually slower for less accurate fills.

## Erosion Blending

The erosion process itself is very quick but the addition of nice blending slows it down by a factor of 200 or so. One likely reason is that the most recent ErodePoint() method uses the slow floating point functions sqrt() and pow(). If we can avoid these functions we should be able to significantly reduce the blend running time while still getting close to the same results.

Looking at the inner loop of the complex blend method:

```cpp
Distance = sqrt((float)((Y1 - Y)*(Y1 - Y) + (X1 - X)*(X1 - X)));
if (Distance > BLEND_RADIUS) continue;

HeightDelta = pow(BLEND_FALLOFF, Distance) * ERODE_DELTA;
m_HeightMap.SetValue(X1, Y1, m_HeightMap.GetValue(X1, Y1) - HeightDelta);
```

Since the *HeightDelta* value doesn't depend on the height map we can precompute all the blend weights for a set blend size. Putting this into action:

```cpp
class CErosionSample8a : public CErosionSample8
{
  public:
    static const int BLEND_SIZE = 9;

    float m_ErodeBlend[BLEND_SIZE][BLEND_SIZE];

    /*
     * Class Constructor
     */
    CErosionSample8a ()
    {
        BLEND_RADIUS = (BLEND_SIZE - 1)/2;
        PreComputeBlend();
    }

    /*
     * Precompute the erosion blend array for faster blend times.
     */
    virtual void PreComputeBlend (void)
    {
        int   X, Y;
        int   X1, Y1;
        float Distance;

        X1 = BLEND_RADIUS;
        Y1 = X1;

        for (Y = 0; Y < BLEND_SIZE; ++Y)
        {
            for (X = 0; X < BLEND_SIZE; ++X)
            {
                m_ErodeBlend[Y][X] = 0;

                Distance = sqrt((float)((Y1 - Y)*(Y1 - Y) + (X1 - X)*(X1 - X)));
                if (Distance > BLEND_RADIUS) continue;

                m_ErodeBlend[Y][X] = pow(BLEND_FALLOFF, Distance) * EROSION_DELTA;
            }
        }
    }

    /*
     * Override to use the optimized blend effect.
     */
    virtual void ErodePoint (const int X, const int Y)
    {
        int X1, Y1;

        for (Y1 = Y - BLEND_RADIUS; Y1 <= Y + BLEND_RADIUS; ++Y1)
        {
            for (X1 = X - BLEND_RADIUS; X1 <= X + BLEND_RADIUS; ++X1)
            {
                m_HeightMap.SetValue(X1, Y1, m_HeightMap.GetValue(X1, Y1) -
                        m_ErodeBlend[Y1 - Y + BLEND_RADIUS][X1 - X + BLEND_RADIUS]);
            }
        }
    }
};
```

While this method's blend result is not identical to the previous version's, it is significantly faster (23 s down to 2 s in release builds):

In some ways this type of blending could be considered superior as it results in a sharper river feature with nicely feathered banks (although it is difficult to tell from a gray scale height map how it will look in 3D).

Another method to optimize the erosion blending is to perform all the blending in one step as a post process.

```cpp
class CErosionSample8b : public CErosionSample8a
{
  public:
    /*
     * Blends the given point in the erosion map to the supplied output map.
     */
    virtual void BlendErosionMap (utils::NoiseMap& OutputMap, const int X, const int Y)
    {
        int   X1, Y1;
        float Level;
        float NewHeight = 0;

        for (Y1 = Y - BLEND_RADIUS; Y1 <= Y + BLEND_RADIUS; ++Y1)
        {
            if (Y1 < 0 || Y1 >= m_Height) continue;

            for (X1 = X - BLEND_RADIUS; X1 <= X + BLEND_RADIUS; ++X1)
            {
                if (X1 < 0 || X1 >= m_Width) continue;

                Level = -m_ErosionMap.GetValue(X1, Y1) / EROSION_DELTA;
                if (Level <= 0) continue;

                NewHeight += Level * m_ErodeBlend[Y1 - Y + BLEND_RADIUS][X1 - X + BLEND_RADIUS];
            }
        }

        OutputMap.SetValue(X, Y, m_ErosionMap.GetValue(X, Y) - NewHeight);
    }

    /*
     * Blends the completed erosion map.
     */
    virtual void BlendErosionMap (void)
    {
        utils::NoiseMap TempMap;
        int             X, Y;

        TempMap = m_ErosionMap;

        for (Y = 0; Y < m_Height; ++Y)
        {
            for (X = 0; X < m_Width; ++X)
            {
                BlendErosionMap(TempMap, X, Y);
            }
        }

        m_ErosionMap = TempMap;
    }

    /*
     * Override to provide an unblended erosion effect.
     */
    virtual void ErodePoint (const int X, const int Y)
    {
        m_ErosionMap.SetValue(X, Y, m_ErosionMap.GetValue(X, Y) - EROSION_DELTA);
    }

    /*
     * Override to add the post-blend erosion effect.
     */
    virtual void ErodeAll (void)
    {
        CErosionSample8a::ErodeAll();
        BlendErosionMap();
    }
};
```

With this improvement erosion times drop even further (from 2100 to 70 ms) while the results remain similar:

## Summary

While we were only able to speed up lake detection a little, the largest improvement came from the blend effect. By applying a few relatively simple optimizations we increased its performance by a factor of roughly 330 (23.5 s down to 0.07 s in release builds) while keeping the end result close to the original.