Wednesday, April 23, 2008

Avoid storing Markup (HTML) in the database

I see this to often; Storing HTML in the database. Then UI wants to change the HTML, but the data grew to 100 GB, so the really only feasible way to change the HTML is via a post process after the database fetch. This post process produces a huge list of preg_replace statements to rebuild the HTML on display. This consumes a lot of memory over time and slows down the APP, plus its time consuming to debug.


Store URLS if the apps need to. Or better yet build the schema to store the bare minimum the app needs to generate the HTML.

Here is a compelling reason why not to store HTML:

The data needed to generate the HTML is 10 bytes, but the TEXT field consumes 1024 bytes, all of which is the same text.

Thus for an app that should cost very little to maintain now costs 10 times as much.

If your building a search engine strip out the HTML and store the text. If its a quick app, take this saying into account

The is nothing more Permanent then a temporary solution


A exception that I don't mind: php serialize - this is markup that (rarely) does not change.

4 comments:

Ambush Commander said...

There are notable cases when storing HTML in the database is acceptable and desirable:

1. When the data itself is HTML! (i.e. user input for a wiki-page or the like)

2. When the HTML is meant to be used as a cache. Squid proxies or memcached, in many cases, will be more appropriate, but if you have a markup language that is intensive to convert to HTML, storing it in a database column helps performance a bit.

Also, it is often a good idea to store text in its own database, and then use a foreign key to link it to the relevant row. This way you can avoid TEXT columns in heavily accessed tables.

Anonymous said...

I agree, but I think there are very good times to store HTML data in the database. Most administration sections of a web site are storing HTML values such as p tags, br tags etc. I think as long as you don't start storing div's and span's you will be ok. Good topic~!

Sheeri K. Cabral said...

It's perfectly OK to store markup in the database. The problem isn't in the storing, it's when it's retrieved. :)

It may seem silly to say that, but it's true. A very simple script can be called to generate a static page from the db contents; this script can be run hourly, daily, monthly, and/or manually as part of a release process.

There's plenty of reasons to store HTML in the database, but there are very very very few instances in which one needs to retrieve HTML from the database over and over and over and over.

Anonymous said...

I'm storing XHTML 1.1 in MySQL database. We publish a daily bulletin and each article is stored in MySQL.
Advantage:
Security. MySQL give us a extra level of security, cause doesn't not exist XHTML files to alter.
Availability: Retrieving or searching is faster on MySQL stored text.
Disadvantages:
Processing:
Some of this articles are really big, and we have some problems on retrieving these with phpMyAdmin.
But we are now using PHP gzcompress/gzuncompress to avoid this inconvenient.

See you