Replacing URLs in databases and serialised PHP
Moderator: Moderators
Replacing URLs in databases and serialised PHP
In recent events, the issue of replacing URLs in the mysql databases of PHP applications, in particular WordPress, though others are applicable also, has been raised.
As this can be a frustrating and stressful issue, I think it best to save people time and hassle for the future. The issue arises because a standard search replace corrupts serialised PHP data structures in mysql tables.
Solutions
ICIT Search Replace
Built by a company I worked at, this tool comes as both a visual GUI with dry run and incremental features, and a CLI version. While aimed at WordPress, it works for any mysql database. It's well tested and recommended widely, and intended to scale to very large data sets of hundreds of thousands of replacements.
Thanks to the frontend browser interface it is by far the most convenient method, and will automatically pick up any wordpress configurations if found.
WP CLI has a search replace function. It comes as standard with most vagrant based environments and is maintained by several high profile WordPress developers with affiliations to the Core developers. It is reliable but it does not scale as well as the above mentioned tool.
WP Dictator sidesteps the need for the search replace entirely. A standard WordPress export coupled with a WP Dictator export describes the entirety of a WordPress site including its content, users, options, theme, etc. The output is standard JSON, on which standard Unix tools can search replace on. Most instances of URLs are unneeded in this setup, and the data set is more portable than an SQL dump.
Conclusion
Migrating databases and changing URLs need not be painful. if despite this you still have issues, feel free to contact myself, or visit http://wordpress.stackexchange.net ( these people were mad enough to vote me in as a community moderator in the previous elections ).
I would hope anybody doing migrations in future employs automated provisioning to avoid this issue entirely. Nobody should have to manually manipulate database strings enmasse, and nobody has to.
As this can be a frustrating and stressful issue, I think it best to save people time and hassle for the future. The issue arises because a standard search replace corrupts serialised PHP data structures in mysql tables.
Solutions
ICIT Search Replace
Built by a company I worked at, this tool comes as both a visual GUI with dry run and incremental features, and a CLI version. While aimed at WordPress, it works for any mysql database. It's well tested and recommended widely, and intended to scale to very large data sets of hundreds of thousands of replacements.
Thanks to the frontend browser interface it is by far the most convenient method, and will automatically pick up any wordpress configurations if found.
WP CLI has a search replace function. It comes as standard with most vagrant based environments and is maintained by several high profile WordPress developers with affiliations to the Core developers. It is reliable but it does not scale as well as the above mentioned tool.
WP Dictator sidesteps the need for the search replace entirely. A standard WordPress export coupled with a WP Dictator export describes the entirety of a WordPress site including its content, users, options, theme, etc. The output is standard JSON, on which standard Unix tools can search replace on. Most instances of URLs are unneeded in this setup, and the data set is more portable than an SQL dump.
Conclusion
Migrating databases and changing URLs need not be painful. if despite this you still have issues, feel free to contact myself, or visit http://wordpress.stackexchange.net ( these people were mad enough to vote me in as a community moderator in the previous elections ).
I would hope anybody doing migrations in future employs automated provisioning to avoid this issue entirely. Nobody should have to manually manipulate database strings enmasse, and nobody has to.
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
The company I work for currently makes use of the Interconnect tool fairly extensively as we are moving many dev environments from MediaTemple to DigitalOcean and at times the databases like to go screwy (IE someone screwed up). Very useful and reliable tool.
Re: Replacing URLs in databases and serialised PHP
why does wordpress store absolute urls at all in db?
imo there is no need to store it in db. you could argue about performance, but thats just an excuse, if you really care about performance you have to make static html files.
very simplified, thats all code which would be needed:
in short, wordpress heavily sucks in this part.
this plugin would make much more sense:
http://wordpress.org/plugins/root-relative-urls/
imo there is no need to store it in db. you could argue about performance, but thats just an excuse, if you really care about performance you have to make static html files.
very simplified, thats all code which would be needed:
Code: Select all
str_replace("URL_PREFIX", "http://springrts.com/);
this plugin would make much more sense:
http://wordpress.org/plugins/root-relative-urls/
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
Root relatives breaks a lot of things and is not suitable for a production environment.'
Relative URLs in a production environment are fail:
https://yoast.com/relative-urls-issues/
More info:
http://wordpress.stackexchange.com/ques ... e-database
Many reasons. Not the least of which is SEO.abma wrote:why does wordpress store absolute urls at all in db?
nope.jpgabma wrote:you could argue about performance, but thats just an excuse, if you really care about performance you have to make static html files.
Data serialization offers many advantages. As more entries are added to a site, its database must necessarily grow. Databases grow in proportion to the data they contain, but they are often larger than the sum of the contained data. This is a design feature of databases, which can provide random access to any data member it contains. This is possible since databases store data in tables.
To access any field in a database, the user merely has to enter the cell coordinates via an SQL query. Databases are optimized for access speed, which provides users with prompt access to data. The trade-off is that databases take up more room than other data structures.
[..this] is simply what happens when you work in an SQL database: you have to decide between slowing down and complicating even fairly basic queries with a bunch of joins, or serializing data so that items with different fields can live in the same table (which slows down and complicates a bunch of other stuff).
Remember, we're working with tables here -- every row has to have the same columns, right down the line.
Relative URLs in a production environment are fail:
https://yoast.com/relative-urls-issues/
More info:
http://wordpress.stackexchange.com/ques ... e-database
Re: Replacing URLs in databases and serialised PHP
read what i wrote. absoulte urls in _DB_ are fail.
still in webpages they suck as well because it makes caching very difficult if the site is accessable by http AND https AND IP.
for SEO absolute urls are maybe important, but technically they are very bad as we absite should be always accessable by ip/http/https. its also bad that you need to modify database contens when having a test and productive environment.
its bad software when processes have to be changed to use it. it should be vice versa: change the software so the process can be the same.
this is a BIG limitation in wordpress.
also relative you have to distingish when using the word "relative urls", i mean root-relative urls! 100% relative urls mostly makes no sense / big problems. (=don't use ../ in urls)
still in webpages they suck as well because it makes caching very difficult if the site is accessable by http AND https AND IP.
for SEO absolute urls are maybe important, but technically they are very bad as we absite should be always accessable by ip/http/https. its also bad that you need to modify database contens when having a test and productive environment.
its bad software when processes have to be changed to use it. it should be vice versa: change the software so the process can be the same.
this is a BIG limitation in wordpress.
also relative you have to distingish when using the word "relative urls", i mean root-relative urls! 100% relative urls mostly makes no sense / big problems. (=don't use ../ in urls)
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
Why on earth are you caching https content?
It isn't a "huge limitation". For example, with root relative plugin, just disable it on live site, and enable on dev staging. It's not particularly complicated.
As for the rest, I'll let AF explain it to you.
It isn't a "huge limitation". For example, with root relative plugin, just disable it on live site, and enable on dev staging. It's not particularly complicated.
As for the rest, I'll let AF explain it to you.
Re: Replacing URLs in databases and serialised PHP
"URL_PREFIX" can be (accidentaly) written as "uRL_PREFIX"*, so 'str_ireplace()' is better.abma wrote:Code: Select all
str_replace("URL_PREFIX", "http://springrts.com/);
* If URL_PREFIX is not some fancy reserved word (array with url prefixes), added in standard PHP. I can't find it in official docs.
Re: Replacing URLs in databases and serialised PHP
to reduce server load? its very expensive when all stuff has to be computed always.Forboding Angel wrote:Why on earth are you caching https content?
it is! exactly because of that. you can't use the same stuff on test and dev site which means you will apply untested changes which leads to unexpected results as well.Forboding Angel wrote:It isn't a "huge limitation". For example, with root relative plugin, just disable it on live site, and enable on dev staging. It's not particularly complicated.
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
Can't test same stuff? Are you high? It's a plugin. You click enable, you click disable. Everything is exactly the same... Plugin is enabled on dev side, disabled on live side. It's not rocket science and it only has to be set a single time. Everything else stays the same.
Untested changes? You do not understand what you're talking about.
Additionally, you don't cache https content serverside. That happens clientside.
Untested changes? You do not understand what you're talking about.
Additionally, you don't cache https content serverside. That happens clientside.
Last edited by Forboding Angel on 04 May 2014, 03:43, edited 1 time in total.
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: Replacing URLs in databases and serialised PHP
Come on - I don't really want to get involved in this discussion but abma makes a perfectly valid point about not applying untested changes, and (afaics) your two statements there directly contradict each other.Everything is exactly the same.... Everything else stays the same.
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
Silentwings, a plugin is activated via the plugin menu. All the plugin in question does is rewrite urls for dev environments.
On the live side of things, you leave it deactivated. On the Dev side of things, you leave it activated. It really is that simple.
The entire plugin is a single file: http://pastebin.com/NuVCkQZC
That way, your test and live environments are identical. There is no "applying untested changes" and the very assertion is nonsensical.
On the live side of things, you leave it deactivated. On the Dev side of things, you leave it activated. It really is that simple.
The entire plugin is a single file: http://pastebin.com/NuVCkQZC
That way, your test and live environments are identical. There is no "applying untested changes" and the very assertion is nonsensical.
Re: Replacing URLs in databases and serialised PHP
wrong, because of that, they are not identical! seems we need to discuss about the definition of "identical" :'-(Forboding Angel wrote: On the live side of things, you leave it deactivated. On the Dev side of things, you leave it activated. It really is that simple.
That way, your test and live environments are identical. There is no "applying untested changes" and the very assertion is nonsensical.
http://en.wiktionary.org/wiki/identical
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Replacing URLs in databases and serialised PHP
The difference is quite literally a Boolean in the database. Yeah, for all intents and purposes and anything that matters at all, it is identical.
Come on dude... Myself and AF work for companies using this exact same method creating sites for multimillion dollar corporations (some of the ones I've done recently are likely household names to you) and somehow or another the spring site is what... So special that the same methods used by large design firms somehow don't apply anymore?
Are you, in actuality, trying to make that argument?
It is identical. The content doesn't change one iota. The only difference is that on the dev side it uses a relative url and on the live side it uses and absolute url. The content is exactly the same and literally nothing changes between live/dev.
Come on dude... Myself and AF work for companies using this exact same method creating sites for multimillion dollar corporations (some of the ones I've done recently are likely household names to you) and somehow or another the spring site is what... So special that the same methods used by large design firms somehow don't apply anymore?
Are you, in actuality, trying to make that argument?
It is identical. The content doesn't change one iota. The only difference is that on the dev side it uses a relative url and on the live side it uses and absolute url. The content is exactly the same and literally nothing changes between live/dev.
Re: Replacing URLs in databases and serialised PHP
A wrong boolean could kill people! They are not identical!
Re: Replacing URLs in databases and serialised PHP
Are you seriously using this as an argument? Didn't you notice the occasional fails of "multi-million dollar" corporations?Forboding Angel wrote:Come on dude... Myself and AF work for companies using this exact same method creating sites for multimillion dollar corporations (some of the ones I've done recently are likely household names to you) and somehow or another the spring site is what... So special that the same methods used by large design firms somehow don't apply anymore?
Thanks for explaining how stuff works, but I still think that ideally you should not have absolute url-s in the database. One simple example is what will you do if you want to change domain? Lets say from springrts.com to springrts.org? In my opinion it would have been nicer if the URL-s would always be re-written based on some root URL (so, same as what the plugin does, but then it would be a configuration parameter, not an "addon"). Do you have any argument why something like that would not be good?
Re: Replacing URLs in databases and serialised PHP
I think he is saying that it is a test and proven method.
Whatever
Whatever
Re: Replacing URLs in databases and serialised PHP
then he should say that. i don't have a crystal-ball which allows me to read other minds.smoth wrote:I think he is saying that it is a test and proven method.
still its bad that the idenitcal db of wordpress can't be used for dev and productive sites. this breaks our current process of testing & applying changes.
also i see it as big disadvantage of wordpress that it can't be easily accessed by ip and http and https urls at the same time without changing configuration (without the plugin). this plugin is a must-have.
Re: Replacing URLs in databases and serialised PHP
It's undesirable to have any site accessible directly via IP, but if it was, what you said would not be true.also i see it as big disadvantage of wordpress that it can't be easily accessed by ip and http and https urls at the same time without changing configuration (without the plugin). this plugin is a must-have.
There are also millions of WordPress websites handling both http and https traffic from the same installation, so there is strong evidence that the rest of what you said is not true.
It is a horrendous idea to cache most HTTPS pages in full. https connections imply that secure data is being transferred or that personal data is being transferred, imagine logging in over https and seeing MY private messages instead of your own, does that sound logical?to reduce server load? its very expensive when all stuff has to be computed always.
What you actually want is object caching, and other forms of fragment caching. Full page caching should not be applied to https connections.
To add to this, full page caching over https is more difficult, for example, Varnish does not distinguish between http and https, and requires a 3rd box sitting in front acting as a proxy to forward http connections to the varnish cache and https to the original box or load balancers.
So no, for security, data protection, and practical reasons, caching full HTTPS pages is a bad idea. Cache objects and data at the PHP level instead. It's on the same level of insecurity as sharing a database, once someone breaks in they have access to everything, imagine if you visited an admin dashboard page shortly after the site administrator had been there and saw the full admin interface with all the stats and data?
This, and the whole URL in the database thing...str_replace("URL_PREFIX", "http://springrts.com/);
1. There are constants you can put in the wp-config.php file that set the base URL and install URL manually
2. It's not always possibly to figure out where the installation is located URL-wise depending on where execution started on the server. For historical reasons not everything is routed through index.php, although most things are, and new stuff is.
3. There is no performance cost here, because the site URL is pulled in from the options table, and is marked as auto-loaded. This means WordPress pulls it in amongst other options on every page load. There are optimisations such as the wp-config.php change, and database caching that improve this, but any half decent MySQL installation will have cached that query, as will any decent object cache.
4. Not every WordPress installation has a single URL. Would you seriously expect an installation such as wordpress.com with millions of sites on a single installation to use str_replace?
5. I would argue a single query to the database ( which as I said is not what happens, the data is pulled in along with the rest of the auto-loaded options ), would be faster on a well set up system, than continuous string replacement searches, which would be a sloppy method of doing things
6. Links between pages and posts already sidestep this issue entirely and work fine moving from one site to another, as they have the necessary meta data if added correctly to account for this, as well as slug changes etc
7. A lot of plugins add direct references to URLs, these are poorly built plugins that generally aren't popular, that should have used Post IDs instead.
To the idea that you should be able to use the same database for test and production, I say that is also an idea with impracticalities. You should be able to throw away your test environment and replace it on a whim.
That something as simple as a search replace of serialised data is such an issue indicates:
1. A lack of a build process
2. Poorly built code
3. Poor content and data exports
This is a self defeating point, because the moment you change the URL the site is installed at you've changed the environment, regardless of the need for any database change.it is! exactly because of that. you can't use the same stuff on test and dev site which means you will apply untested changes which leads to unexpected results as well.
This in particular suggests the tests pass because you're applying a particular set of data, not because what has been built is flexible and robust enough to fail gracefully and handle unexpected data ( data which may be entered into the live site at a future data )which leads to unexpected results as well.
No, as I mentioned, this is actually an advantage, and it's pretty much a requirement in any situation with a single installation serving multiple domains, especially in situations where new domains can be added from the front end.this is a BIG limitation in wordpress.
------------
Migrations in WordPress are something that happen on a regular basis with a large amount of tooling and methods to doing it. As somebody who has moved hundreds of sites, everything from a simple export and import, setting up virtual machines, deploying websites, or consolidating as many as 50 out of date installations into a single updated Multi-site installation via bash scripting, the URL search replace is not the most difficult part.
Keeping in mind WordPress is not a brand new project, perhaps you would have seen fit to do some research?
https://core.trac.wordpress.org/ticket/17048
Regarding relative URLs Andrew Nacin, core developer who lead several releases:
Simply using relative URLs is not appropriate. Especially when the user can use wp-config.php settings to move around the locations of uploads, plugins, themes, admin panel, etc, sometimes to completely new domains entirely.Root-relative URLs aren't really proper. /path/ might not be WordPress, it might be outside of the install. So really it's not much different than an absolute URL.
Any relative URLs also make it significantly more difficult to perform transformations when the install is moved. The find-replace is going to be necessary in most situations, and having an absolute URL is ironically more portable for those reasons.
As you said, absolute URLs are needed in numerous other places. Needing to add these in conditionally will add to processing, as well as introduce potential bugs (and incompatibilities with plugins).
Unfortunately, the opinion of those threads are skewed in the direction you mention due to most of them not really understanding the problem or the design decision (nor are most of them active core contributors or core developers). I've tried to cover a bit of the design decision on wp-hackers before, and I've been able to locate this reply of mine (the first part, in particular):
http://lists.automattic.com/pipermail/w ... 36195.html
Also keep in mind WordPress is not the only major CMS to store the URL in the database. Perhaps then you need to ask why?
If it really is an open shut case as you suggest, then why have you not opened a Trac ticket on WP Core saying so and suggesting what be done and gotten props on the WordPress 4.0 release? The time and thanks of all those people who you say are struggling with the consequences of this?
Exactly how would we handle multiple sites and a single installation that have different content on one site to another? How would we determine which is why without a base URL to match against in the sites and blogs table? What about the scenarios currently catered for by the domain mapping plugin? Will I need installs for each and every site now?
Or perhaps there are good practical reasons for having the URL in the database
I recommend you do the following:
1. A question on WordPress stack exchange asking why the URL of the root of the WP install must be saved in the database, and why it can't just be defined in wp-config.php
2. Ask core developers on twitter, they're usually friendly and nice to reply. Except Otto. Otto does not tolerate fools
3. Join IRC and enter the #wordpress room on freenode, or the #wordpress-dev room, and ask. A smattering of mid to high level developers and core developers, including project leads, will be able to see your question. Perhaps the person who originally wrote the code can explain their reasoning to you?
This way you can get a more varied response. The WordPress developers are not stupid, and the project has a use for these URLs in the database ( and as I mentioned with wp-config.php constants, methods of providing alternatives ).
Re: Replacing URLs in databases and serialised PHP
who should ever read this post? (TL;DR)
related to the trac ticked: i've read this trac thing already. you maybe have seen that there are MANY responses to it, the general stance there is imo: "yeah, we should fix that, but it would break to many stuff".
replacing text in db is not KISS.
@AF, Forb
also keep in mind, you are the experts related to wordpress. if you are unable to solve such isses / find a acceptable solution then i don't see us setting up a wordpress site for springrts.com.
why should i do that? i explained my POV what wordpress is doing wrong. to use it for spring, the process of applying changes to the website has to be changed, this is bad.1. A question on WordPress stack exchange asking why the URL of the root of the WP install must be saved in the database, and why it can't just be defined in wp-config.php
2. Ask core developers on twitter, they're usually friendly and nice to reply. Except Otto. Otto does not tolerate fools
3. Join IRC and enter the #wordpress room on freenode, or the #wordpress-dev room, and ask. A smattering of mid to high level developers and core developers, including project leads, will be able to see your question. Perhaps the person who originally wrote the code can explain their reasoning to you?
related to the trac ticked: i've read this trac thing already. you maybe have seen that there are MANY responses to it, the general stance there is imo: "yeah, we should fix that, but it would break to many stuff".
replacing text in db is not KISS.
@AF, Forb
also keep in mind, you are the experts related to wordpress. if you are unable to solve such isses / find a acceptable solution then i don't see us setting up a wordpress site for springrts.com.
Re: Replacing URLs in databases and serialised PHP
I interacted with Joomla and my impression it is that it does things differently. References:AF wrote:Also keep in mind WordPress is not the only major CMS to store the URL in the database. Perhaps then you need to ask why?.
http://www.inmotionhosting.com/support/ ... la-website (one line change to move domain name)
http://codex.wordpress.org/Moving_WordPress (use a script or an online tool)
So, for me, Joomla's method sounds simpler (and I do not care how they implement it). Maybe there are some complex cases where would you want to do what Wordpress does but for simple cases I would favor the simples method.
Personally, based on the headaches I had with Wordpress and Joomla, I would choose Joomla, but the difference in headaches was only marginal
