Replacing URLs in databases and serialised PHP

Replacing URLs in databases and serialised PHP

Post just about everything that isn't directly related to Spring here!

Moderator: Moderators

User avatar
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Replacing URLs in databases and serialised PHP

Post by AF »

In recent events, the issue of replacing URLs in the mysql databases of PHP applications, in particular WordPress, though others are applicable also, has been raised.

As this can be a frustrating and stressful issue, I think it best to save people time and hassle for the future. The issue arises because a standard search replace corrupts serialised PHP data structures in mysql tables.

Solutions

ICIT Search Replace

Built by a company I worked at, this tool comes as both a visual GUI with dry run and incremental features, and a CLI version. While aimed at WordPress, it works for any mysql database. It's well tested and recommended widely, and intended to scale to very large data sets of hundreds of thousands of replacements.

Thanks to the frontend browser interface it is by far the most convenient method, and will automatically pick up any wordpress configurations if found.

WP CLI has a search replace function. It comes as standard with most vagrant based environments and is maintained by several high profile WordPress developers with affiliations to the Core developers. It is reliable but it does not scale as well as the above mentioned tool.

WP Dictator sidesteps the need for the search replace entirely. A standard WordPress export coupled with a WP Dictator export describes the entirety of a WordPress site including its content, users, options, theme, etc. The output is standard JSON, on which standard Unix tools can search replace on. Most instances of URLs are unneeded in this setup, and the data set is more portable than an SQL dump.

Conclusion

Migrating databases and changing URLs need not be painful. if despite this you still have issues, feel free to contact myself, or visit http://wordpress.stackexchange.net ( these people were mad enough to vote me in as a community moderator in the previous elections ).

I would hope anybody doing migrations in future employs automated provisioning to avoid this issue entirely. Nobody should have to manually manipulate database strings enmasse, and nobody has to.
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

The company I work for currently makes use of the Interconnect tool fairly extensively as we are moving many dev environments from MediaTemple to DigitalOcean and at times the databases like to go screwy (IE someone screwed up). Very useful and reliable tool.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

why does wordpress store absolute urls at all in db?

imo there is no need to store it in db. you could argue about performance, but thats just an excuse, if you really care about performance you have to make static html files.

very simplified, thats all code which would be needed:

Code: Select all

str_replace("URL_PREFIX", "http://springrts.com/);
in short, wordpress heavily sucks in this part.

this plugin would make much more sense:
http://wordpress.org/plugins/root-relative-urls/
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

Root relatives breaks a lot of things and is not suitable for a production environment.'
abma wrote:why does wordpress store absolute urls at all in db?
Many reasons. Not the least of which is SEO.
abma wrote:you could argue about performance, but thats just an excuse, if you really care about performance you have to make static html files.
nope.jpg
Data serialization offers many advantages. As more entries are added to a site, its database must necessarily grow. Databases grow in proportion to the data they contain, but they are often larger than the sum of the contained data. This is a design feature of databases, which can provide random access to any data member it contains. This is possible since databases store data in tables.

To access any field in a database, the user merely has to enter the cell coordinates via an SQL query. Databases are optimized for access speed, which provides users with prompt access to data. The trade-off is that databases take up more room than other data structures.

[..this] is simply what happens when you work in an SQL database: you have to decide between slowing down and complicating even fairly basic queries with a bunch of joins, or serializing data so that items with different fields can live in the same table (which slows down and complicates a bunch of other stuff).

Remember, we're working with tables here -- every row has to have the same columns, right down the line.

Relative URLs in a production environment are fail:
https://yoast.com/relative-urls-issues/

More info:
http://wordpress.stackexchange.com/ques ... e-database
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

read what i wrote. absoulte urls in _DB_ are fail.

still in webpages they suck as well because it makes caching very difficult if the site is accessable by http AND https AND IP.

for SEO absolute urls are maybe important, but technically they are very bad as we absite should be always accessable by ip/http/https. its also bad that you need to modify database contens when having a test and productive environment.

its bad software when processes have to be changed to use it. it should be vice versa: change the software so the process can be the same.

this is a BIG limitation in wordpress.

also relative you have to distingish when using the word "relative urls", i mean root-relative urls! 100% relative urls mostly makes no sense / big problems. (=don't use ../ in urls)
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

Why on earth are you caching https content?

It isn't a "huge limitation". For example, with root relative plugin, just disable it on live site, and enable on dev staging. It's not particularly complicated.

As for the rest, I'll let AF explain it to you.
100Gbps
Posts: 74
Joined: 30 Jan 2009, 13:19

Re: Replacing URLs in databases and serialised PHP

Post by 100Gbps »

abma wrote:

Code: Select all

str_replace("URL_PREFIX", "http://springrts.com/);
"URL_PREFIX" can be (accidentaly) written as "uRL_PREFIX"*, so 'str_ireplace()' is better.

* If URL_PREFIX is not some fancy reserved word (array with url prefixes), added in standard PHP. I can't find it in official docs.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

Forboding Angel wrote:Why on earth are you caching https content?
to reduce server load? its very expensive when all stuff has to be computed always.


Forboding Angel wrote:It isn't a "huge limitation". For example, with root relative plugin, just disable it on live site, and enable on dev staging. It's not particularly complicated.
it is! exactly because of that. you can't use the same stuff on test and dev site which means you will apply untested changes which leads to unexpected results as well.
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

Can't test same stuff? Are you high? It's a plugin. You click enable, you click disable. Everything is exactly the same... Plugin is enabled on dev side, disabled on live side. It's not rocket science and it only has to be set a single time. Everything else stays the same.

Untested changes? You do not understand what you're talking about.

Additionally, you don't cache https content serverside. That happens clientside.
Last edited by Forboding Angel on 04 May 2014, 03:43, edited 1 time in total.
User avatar
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: Replacing URLs in databases and serialised PHP

Post by Silentwings »

Everything is exactly the same.... Everything else stays the same.
Come on - I don't really want to get involved in this discussion but abma makes a perfectly valid point about not applying untested changes, and (afaics) your two statements there directly contradict each other.
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

Silentwings, a plugin is activated via the plugin menu. All the plugin in question does is rewrite urls for dev environments.

On the live side of things, you leave it deactivated. On the Dev side of things, you leave it activated. It really is that simple.

The entire plugin is a single file: http://pastebin.com/NuVCkQZC

That way, your test and live environments are identical. There is no "applying untested changes" and the very assertion is nonsensical.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

Forboding Angel wrote: On the live side of things, you leave it deactivated. On the Dev side of things, you leave it activated. It really is that simple.

That way, your test and live environments are identical. There is no "applying untested changes" and the very assertion is nonsensical.
wrong, because of that, they are not identical! seems we need to discuss about the definition of "identical" :'-(

http://en.wiktionary.org/wiki/identical
User avatar
Forboding Angel
Evolution RTS Developer
Posts: 14673
Joined: 17 Nov 2005, 02:43

Re: Replacing URLs in databases and serialised PHP

Post by Forboding Angel »

The difference is quite literally a Boolean in the database. Yeah, for all intents and purposes and anything that matters at all, it is identical.

Come on dude... Myself and AF work for companies using this exact same method creating sites for multimillion dollar corporations (some of the ones I've done recently are likely household names to you) and somehow or another the spring site is what... So special that the same methods used by large design firms somehow don't apply anymore?

Are you, in actuality, trying to make that argument?

It is identical. The content doesn't change one iota. The only difference is that on the dev side it uses a relative url and on the live side it uses and absolute url. The content is exactly the same and literally nothing changes between live/dev.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

A wrong boolean could kill people! They are not identical!
malric
Posts: 521
Joined: 30 Dec 2005, 22:22

Re: Replacing URLs in databases and serialised PHP

Post by malric »

Forboding Angel wrote:Come on dude... Myself and AF work for companies using this exact same method creating sites for multimillion dollar corporations (some of the ones I've done recently are likely household names to you) and somehow or another the spring site is what... So special that the same methods used by large design firms somehow don't apply anymore?
Are you seriously using this as an argument? Didn't you notice the occasional fails of "multi-million dollar" corporations?

Thanks for explaining how stuff works, but I still think that ideally you should not have absolute url-s in the database. One simple example is what will you do if you want to change domain? Lets say from springrts.com to springrts.org? In my opinion it would have been nicer if the URL-s would always be re-written based on some root URL (so, same as what the plugin does, but then it would be a configuration parameter, not an "addon"). Do you have any argument why something like that would not be good?
User avatar
smoth
Posts: 22309
Joined: 13 Jan 2005, 00:46

Re: Replacing URLs in databases and serialised PHP

Post by smoth »

I think he is saying that it is a test and proven method.

Whatever
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

smoth wrote:I think he is saying that it is a test and proven method.
then he should say that. i don't have a crystal-ball which allows me to read other minds.

still its bad that the idenitcal db of wordpress can't be used for dev and productive sites. this breaks our current process of testing & applying changes.

also i see it as big disadvantage of wordpress that it can't be easily accessed by ip and http and https urls at the same time without changing configuration (without the plugin). this plugin is a must-have.
User avatar
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Replacing URLs in databases and serialised PHP

Post by AF »

also i see it as big disadvantage of wordpress that it can't be easily accessed by ip and http and https urls at the same time without changing configuration (without the plugin). this plugin is a must-have.
It's undesirable to have any site accessible directly via IP, but if it was, what you said would not be true.

There are also millions of WordPress websites handling both http and https traffic from the same installation, so there is strong evidence that the rest of what you said is not true.
to reduce server load? its very expensive when all stuff has to be computed always.
It is a horrendous idea to cache most HTTPS pages in full. https connections imply that secure data is being transferred or that personal data is being transferred, imagine logging in over https and seeing MY private messages instead of your own, does that sound logical?

What you actually want is object caching, and other forms of fragment caching. Full page caching should not be applied to https connections.

To add to this, full page caching over https is more difficult, for example, Varnish does not distinguish between http and https, and requires a 3rd box sitting in front acting as a proxy to forward http connections to the varnish cache and https to the original box or load balancers.

So no, for security, data protection, and practical reasons, caching full HTTPS pages is a bad idea. Cache objects and data at the PHP level instead. It's on the same level of insecurity as sharing a database, once someone breaks in they have access to everything, imagine if you visited an admin dashboard page shortly after the site administrator had been there and saw the full admin interface with all the stats and data?
str_replace("URL_PREFIX", "http://springrts.com/);
This, and the whole URL in the database thing...

1. There are constants you can put in the wp-config.php file that set the base URL and install URL manually
2. It's not always possibly to figure out where the installation is located URL-wise depending on where execution started on the server. For historical reasons not everything is routed through index.php, although most things are, and new stuff is.
3. There is no performance cost here, because the site URL is pulled in from the options table, and is marked as auto-loaded. This means WordPress pulls it in amongst other options on every page load. There are optimisations such as the wp-config.php change, and database caching that improve this, but any half decent MySQL installation will have cached that query, as will any decent object cache.
4. Not every WordPress installation has a single URL. Would you seriously expect an installation such as wordpress.com with millions of sites on a single installation to use str_replace?
5. I would argue a single query to the database ( which as I said is not what happens, the data is pulled in along with the rest of the auto-loaded options ), would be faster on a well set up system, than continuous string replacement searches, which would be a sloppy method of doing things
6. Links between pages and posts already sidestep this issue entirely and work fine moving from one site to another, as they have the necessary meta data if added correctly to account for this, as well as slug changes etc
7. A lot of plugins add direct references to URLs, these are poorly built plugins that generally aren't popular, that should have used Post IDs instead.

To the idea that you should be able to use the same database for test and production, I say that is also an idea with impracticalities. You should be able to throw away your test environment and replace it on a whim.

That something as simple as a search replace of serialised data is such an issue indicates:

1. A lack of a build process
2. Poorly built code
3. Poor content and data exports
it is! exactly because of that. you can't use the same stuff on test and dev site which means you will apply untested changes which leads to unexpected results as well.
This is a self defeating point, because the moment you change the URL the site is installed at you've changed the environment, regardless of the need for any database change.
which leads to unexpected results as well.
This in particular suggests the tests pass because you're applying a particular set of data, not because what has been built is flexible and robust enough to fail gracefully and handle unexpected data ( data which may be entered into the live site at a future data )
this is a BIG limitation in wordpress.
No, as I mentioned, this is actually an advantage, and it's pretty much a requirement in any situation with a single installation serving multiple domains, especially in situations where new domains can be added from the front end.


------------

Migrations in WordPress are something that happen on a regular basis with a large amount of tooling and methods to doing it. As somebody who has moved hundreds of sites, everything from a simple export and import, setting up virtual machines, deploying websites, or consolidating as many as 50 out of date installations into a single updated Multi-site installation via bash scripting, the URL search replace is not the most difficult part.

Keeping in mind WordPress is not a brand new project, perhaps you would have seen fit to do some research?

https://core.trac.wordpress.org/ticket/17048

Regarding relative URLs Andrew Nacin, core developer who lead several releases:
Root-relative URLs aren't really proper. /path/ might not be WordPress, it might be outside of the install. So really it's not much different than an absolute URL.

Any relative URLs also make it significantly more difficult to perform transformations when the install is moved. The find-replace is going to be necessary in most situations, and having an absolute URL is ironically more portable for those reasons.

As you said, absolute URLs are needed in numerous other places. Needing to add these in conditionally will add to processing, as well as introduce potential bugs (and incompatibilities with plugins).
Unfortunately, the opinion of those threads are skewed in the direction you mention due to most of them not really understanding the problem or the design decision (nor are most of them active core contributors or core developers). I've tried to cover a bit of the design decision on wp-hackers before, and I've been able to locate this reply of mine (the first part, in particular):
http://lists.automattic.com/pipermail/w ... 36195.html
Simply using relative URLs is not appropriate. Especially when the user can use wp-config.php settings to move around the locations of uploads, plugins, themes, admin panel, etc, sometimes to completely new domains entirely.

Also keep in mind WordPress is not the only major CMS to store the URL in the database. Perhaps then you need to ask why?

If it really is an open shut case as you suggest, then why have you not opened a Trac ticket on WP Core saying so and suggesting what be done and gotten props on the WordPress 4.0 release? The time and thanks of all those people who you say are struggling with the consequences of this?

Exactly how would we handle multiple sites and a single installation that have different content on one site to another? How would we determine which is why without a base URL to match against in the sites and blogs table? What about the scenarios currently catered for by the domain mapping plugin? Will I need installs for each and every site now?

Or perhaps there are good practical reasons for having the URL in the database

I recommend you do the following:

1. A question on WordPress stack exchange asking why the URL of the root of the WP install must be saved in the database, and why it can't just be defined in wp-config.php
2. Ask core developers on twitter, they're usually friendly and nice to reply. Except Otto. Otto does not tolerate fools
3. Join IRC and enter the #wordpress room on freenode, or the #wordpress-dev room, and ask. A smattering of mid to high level developers and core developers, including project leads, will be able to see your question. Perhaps the person who originally wrote the code can explain their reasoning to you?

This way you can get a more varied response. The WordPress developers are not stupid, and the project has a use for these URLs in the database ( and as I mentioned with wp-config.php constants, methods of providing alternatives ).
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Replacing URLs in databases and serialised PHP

Post by abma »

who should ever read this post? (TL;DR)
1. A question on WordPress stack exchange asking why the URL of the root of the WP install must be saved in the database, and why it can't just be defined in wp-config.php
2. Ask core developers on twitter, they're usually friendly and nice to reply. Except Otto. Otto does not tolerate fools
3. Join IRC and enter the #wordpress room on freenode, or the #wordpress-dev room, and ask. A smattering of mid to high level developers and core developers, including project leads, will be able to see your question. Perhaps the person who originally wrote the code can explain their reasoning to you?
why should i do that? i explained my POV what wordpress is doing wrong. to use it for spring, the process of applying changes to the website has to be changed, this is bad.

related to the trac ticked: i've read this trac thing already. you maybe have seen that there are MANY responses to it, the general stance there is imo: "yeah, we should fix that, but it would break to many stuff".

replacing text in db is not KISS.

@AF, Forb
also keep in mind, you are the experts related to wordpress. if you are unable to solve such isses / find a acceptable solution then i don't see us setting up a wordpress site for springrts.com.
malric
Posts: 521
Joined: 30 Dec 2005, 22:22

Re: Replacing URLs in databases and serialised PHP

Post by malric »

AF wrote:Also keep in mind WordPress is not the only major CMS to store the URL in the database. Perhaps then you need to ask why?.
I interacted with Joomla and my impression it is that it does things differently. References:
http://www.inmotionhosting.com/support/ ... la-website (one line change to move domain name)
http://codex.wordpress.org/Moving_WordPress (use a script or an online tool)

So, for me, Joomla's method sounds simpler (and I do not care how they implement it). Maybe there are some complex cases where would you want to do what Wordpress does but for simple cases I would favor the simples method.

Personally, based on the headaches I had with Wordpress and Joomla, I would choose Joomla, but the difference in headaches was only marginal :wink: . I am sure it depends a lot from case to case, if you have training and so on. Edit: of course, what I did were small/simple cases, nothing on the scale of millions of websites. I am just a casual user of these things.
Post Reply

Return to “Off Topic Discussion”