Your argument about it looking obviously tiled seems odd when you're fine with using very large textures.
Not really. You're confused about the issue.
Look, with SMF 2.0, you'd load up your textures, call the shader, and run one pass over the geometry. Your main slowdown's texture loading time across the bus, which can be reduced using PBO.
With SM3, you'd load up your textures per layer, call the shader, and run a pass over the geometry. Then you'd do it again. And again. And again. Why? Because you can't just throw unlimited textures at GPUs running GLSL. IIRC, I hit the limit at 16 while working on P.O.P.S. (this is hardware-dependent, so welcome to driver hell btw).
For a 20-layer SM3, my theoretical bare minimum where it would start not looking like ass, you'd have 80 texture loading calls if you want the full treatment: diffuse, specular/glow/reflectivity, and normalmap/depthmap for each layer (60 textures), plus 20 8-bit grayscales to give you interpolation values per layer. 80, vs. 12.
The number of calls matters, btw. I tested that, and it matters regardless of the texture size. IIRC, it's something to do with memory handling, but tbh, I couldn't follow it. All I know is that it actually matters and that it's testable, which is why I've been doing more stuff with atlases lately, btw. It's starting to make more sense.
You could cut it to two main tiles per layer and just keep specularity in texture1, which gets you down to 60 textures. I really think we need all three, though, and possibly another 8-bit channel for advanced water, but that's another subject. Either way, 20 of those textures (the blenders) need to be heightmap -1, btw. So they may be larger than 1024!!!
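To make the arithmetic above easy to poke at, here's a rough sketch in Python. The function name and parameters are made up for illustration, and it assumes one blend texture per layer with nothing shared between layers, which is how I read the numbers above.

```python
def layered_texture_loads(layers, tile_maps_per_layer=3, blend_maps_per_layer=1):
    """Count texture loads for a layered terrain.

    Assumes every layer brings its own tile maps (diffuse, specular,
    normal, ...) plus its own blend map, with nothing shared.
    """
    return layers * (tile_maps_per_layer + blend_maps_per_layer)


full_treatment = layered_texture_loads(20)                    # 3 tile maps + 1 blender
two_tile_maps = layered_texture_loads(20, tile_maps_per_layer=2)
print(full_treatment, two_tile_maps)
```

Change the layer count or the maps per layer to see how quickly the total climbs.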
So, how many times are we re-rendering the geometry? A lot, because otherwise the GLSL won't compile on most end-users' hardware. That's a huge amount of load on the GPU side of things, even if we're maybe saving some time moving the textures across the bus. Hence why fillrate will kill you in the end.
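A back-of-the-envelope estimate of the pass count, as a sketch only. It assumes a layer's textures all have to be bound in the same pass and that the only constraint is the number of texture units the hardware exposes (16 was the limit mentioned above, but it is hardware-dependent). The function and its parameters are hypothetical, not anything from the engine.

```python
import math


def geometry_passes(layers, textures_per_layer, max_texture_units):
    """Estimate how many times the terrain geometry gets drawn.

    Assumption: a layer cannot be split across passes, so each pass
    handles as many whole layers as fit into the available units.
    """
    layers_per_pass = max_texture_units // textures_per_layer
    if layers_per_pass == 0:
        raise ValueError("one layer needs more texture units than are available")
    return math.ceil(layers / layers_per_pass)


print(geometry_passes(20, 4, 16))  # 4 layers fit per pass
print(geometry_passes(20, 4, 8))   # weaker hardware, only 2 layers per pass
```

Every extra pass redraws the same terrain, so the fill cost scales with that number.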
Meh. Smoth, go build a test SM3, using the current stuff, with 20 layers with different tiles and normalmaps. Just see it, it doesn't require anything more than some colored noise per tile, same for the blenders.
Look, just to prove my point that I'm actually interested in being helpful, though, here's a free idea:
Instead of having tiles loaded per rendering pass, reduce the number of gl.texture calls by slicing up the blend textures into sectors and atlasing them. This obviously would put an upper limit on the number of blending levels that could be used, but that can get really high, depending on how many slices you use.
Make it a requirement that all tiles be the same size, and atlas them as well at runtime. Then you can reduce the number of texture calls considerably: three or four blenders per visible sector, maybe 15 tile atlases, depending on how many blend layers, the size of the tiles, and the size of the atlases you want to throw across the bus. That's still a much larger load than SMF 2.0, but it would be a considerable improvement over loose tiles, and atlases are fairly easy to code (although the GLSL side would not be so fun, I ran into that with P.O.P.S.... but that's another story).
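Here's roughly what the lookup math looks like when all tiles are the same size and get packed into a square grid atlas. This is a sketch in Python with hypothetical names; a shader would do the same remap per fragment. It ignores filtering and mipmap bleed at tile borders, which a real implementation would have to pad against.

```python
def atlas_uv_rect(tile_index, tile_size, atlas_size):
    """Return (u0, v0, u_scale, v_scale) of a tile inside a square grid atlas."""
    tiles_per_row = atlas_size // tile_size
    if not 0 <= tile_index < tiles_per_row * tiles_per_row:
        raise IndexError("tile index does not fit in this atlas")
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    scale = tile_size / atlas_size
    return (col * scale, row * scale, scale, scale)


def to_atlas_uv(tile_uv, rect):
    """Remap a repeating tile UV into the tile's sub-rectangle of the atlas."""
    u, v = tile_uv
    u0, v0, u_scale, v_scale = rect
    # Wrap into [0, 1) first so the tile still repeats across the terrain.
    return (u0 + (u % 1.0) * u_scale, v0 + (v % 1.0) * v_scale)


rect = atlas_uv_rect(5, tile_size=512, atlas_size=2048)  # 4x4 grid, tile at col 1, row 1
print(rect, to_atlas_uv((1.5, 0.25), rect))
```

The same-size requirement is what keeps this to one divide and one modulo per lookup instead of a per-tile rectangle table.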
The only major problem with this is determining which blender sectors are in POV, but that's something where SMF's source should be very helpful.
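One simple way to approximate the visibility check, assuming you already have an axis-aligned rectangle of the visible ground area in map coordinates (getting that rectangle from the camera is the part where existing engine code would help). The names below are hypothetical, and this is a coarse overlap test, not what SMF actually does.

```python
def visible_sectors(view_min, view_max, map_size, sectors_per_side):
    """List (col, row) blend sectors overlapped by the visible ground rectangle.

    view_min / view_max: (x, z) corners of the visible area in map units.
    map_size: (width, depth) of the map in the same units.
    """
    # Nothing to bind if the view rectangle misses the map entirely.
    if (view_max[0] < 0 or view_max[1] < 0
            or view_min[0] > map_size[0] or view_min[1] > map_size[1]):
        return []
    sector_w = map_size[0] / sectors_per_side
    sector_d = map_size[1] / sectors_per_side
    last = sectors_per_side - 1

    def clamp(i):
        return min(last, max(0, i))

    col0 = clamp(int(view_min[0] // sector_w))
    col1 = clamp(int(view_max[0] // sector_w))
    row0 = clamp(int(view_min[1] // sector_d))
    row1 = clamp(int(view_max[1] // sector_d))
    return [(col, row)
            for row in range(row0, row1 + 1)
            for col in range(col0, col1 + 1)]


print(visible_sectors((1000, 3000), (2500, 3500), (8192, 8192), 4))
```

Only the blend slices for the returned sectors would need to be bound that frame.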
Anyhow, that should help, once you get the basics working, in terms of optimizing it. IDK whether JC used that strategy for SM3 or not, but it seems like a good way to reduce the load across the bus each frame to me.