PHP/Zend "string types" patch
This is a proposed extension to the PHP language (and the Zend engine
that drives the PHP interpreter). The patch introduces five types of
strings: plain string, SQL string, HTML string, URL (query) string and
undefined (unknown type) string. The types differ in how they escape the
characters that have a special meaning in SQL (quotes, NUL), HTML (ampersand,
less-than, greater-than, double quote) and URLs (nearly everything except
plain letters and digits). The conversion is done automatically when
requested. This
language extension is fully backwards-compatible; users who don't know
about the new features (or don't want to know) need not worry: their
existing scripts should work the same without any change. For users who
do know about this and want to use it, I believe this new feature should
significantly improve code readability, reduce code size and lower the
probability of bugs.
strtypes-v1.patch.gz (130 kB).
The uncompressed patch is about 1.2 MB, but most of that volume consists of
changes to the parser, which is generated, so the hand-made changes are
much smaller. The patch applies cleanly against the PHP 4.2.0 release.
Apply it by running "zcat strtypes-v1.patch.gz | patch -p1"
in the root directory of the PHP source tree, then compile as usual.
The license is the same as that of PHP, of course. (Whatever that may be.)
Every string in PHP has a type. The type names below are constants both in
PHP and in the C source (zend.h); the type is stored in a new member,
"int str_type", of zval.value.str. The possible types are:
- STR_UNDEFINED (0) - this is the default. STR_UNDEFINED strings
don't participate in any automatic conversion. All strings have
this type unless explicitly converted or cast. The only exceptions
are GET, POST and COOKIE parameters: these are STR_SQL when
magic_quotes_gpc is on (the default), and STR_PLAIN otherwise.
- STR_PLAIN (1) - the plain string, no escaped characters.
- STR_SQL (2) - SQL string; you would get an SQL string from a plain
string in normal PHP using the AddSlashes() function.
- STR_HTML (3) - HTML string; you would get an HTML string from a
plain string in normal PHP using the HtmlSpecialChars() function.
- STR_QUERY (4) - URL query string; you would get this from a
plain string in normal PHP using the RawUrlEncode() function.
You create a string of a specific type either by using a special new
syntax, or by conversion or casting. The new syntax looks like this:
$plain = p"When A < 0, A is called 'negative'.";
echo h"A definition of a math term: <B>$plain</B>";
mysql_query(s"INSERT INTO table VALUES ('$plain')");
$url = q"http://example.com/cgi-bin/script?param=$plain";
As you may have guessed, the first line creates a plain string, the
second line an HTML string, the third line an SQL string and the fourth
line a URL query string. This alone doesn't bring anything special.
The exciting new thing is that on each of these lines except the
first, an automatic conversion occurs whenever a variable is
interpolated into the string (in place of $varname).
So on line 2, you can safely output HTML without calling
HtmlSpecialChars on the interpolated string, because this happens
automatically: the less-than sign is converted to &lt;
on the fly. On line 3, you can safely issue the SQL command without
worrying about the inserted string containing apostrophes - they will
be escaped with backslashes (or doubled, if you have
magic_quotes_sybase set to on) automatically. And on line 4, you
can safely pass the string as a URL parameter to a CGI script,
because it will, again, be automatically converted.
There are three new built-in functions that are always available:
- int str_type_get(string str) - returns the type of the string.
- string str_type_set(string str, int type) - returns the
string with its type changed, but does no conversion.
- string str_type_convert(string str, int type) - returns the
string converted to the given type.
The type argument is one of the STR_* constants described above.
When concatenating strings, the second is converted to the type of
the first. When either of the two strings is STR_UNDEFINED, no
conversion occurs. The resulting string always has the same type
as the first string.
When comparing strings, they are first converted to the same type.
Again, when either of them is STR_UNDEFINED, no conversion occurs.
The code is very much in alpha status. It is nearly untested and is very
likely to contain bugs. I warn you: this is the first time I have even
seen PHP/Zend internals, let alone hacked them. Try it, test it, even
use it if you want, but don't blame me for any catastrophes. And be aware
that I will probably still make changes to the semantics and maybe even
the syntax. I do want to hear about the bugs you find, of course.
As I said, this is the first time I have hacked PHP/Zend internals, so I
can't really say I understand everything. If I may criticize the PHP developers
a little, the code could sure use some commenting! Also, there are many
macros, but they are rarely used: for example, there is ZVAL_STRING to
set a zval to a given string value, but most of the code just manually
sets the zval fields instead of using it. This makes it difficult to
ensure that every string is initialized to a valid string type. I have
searched for "[.>]type = IS_STRING" and added
"<zval>.value.str.str_type = STR_UNDEFINED" everywhere, but still
there are cases where a string somehow gets an invalid type. You can
find these if you turn on reporting of E_NOTICE - see lines 521 and 544
in zend_operators.c, function _convert_to_string_type. This is where I
could use some help, because currently I have no idea how to track these cases down.
P.S.: Aha! Perhaps I missed some Z_TYPE_P(xxx) = IS_STRING cases - I think
I didn't search for those!
Another problem is with the self-test suite that comes with PHP. After
making my changes, I tried running "make test" and was quite shocked to
find that it reported 45 failed tests. But then I ran it on the
unmodified PHP 4.2.0, and the result was the same. So I must be doing
something terribly wrong. README.TESTING says "You must build CGI SAPI".
After untarring php-4.2.0.tar.bz2, I ran "./configure" without parameters
and sure enough, it said (hidden in about a kilometer of useless stuff)
"checking for chosen SAPI module... cgi". So I thought that was OK and
ran "make test", but the result was
"No rule to make target `/root/php/php-4.2.0/sapi/cli/php'". So then
I tried "./configure --enable-cli". "make test" then ran, but as I said,
45 tests failed. Since this was on the unmodified PHP, it can't be
because I broke PHP with my changes. :-) I must be doing something wrong,
or the test suite simply doesn't work. This is another thing where
I would like some help.
Also see section 'Known problems' below.
Known problems, things to do and things to think about
As mentioned in the previous section, sometimes strings are not
initialized to a valid type. This surely occurs especially in the
The following outputs '00', but should output '11'. I don't know why.
echo str_type_get(str_type_set(h"<tag>\n", STR_PLAIN));
echo str_type_get(str_type_convert(h"<tag>\n", STR_PLAIN));
Currently, all strings are STR_UNDEFINED unless converted or cast,
except GET/POST/COOKIE parameters, which are STR_SQL when magic_quotes_gpc
is on, and STR_PLAIN otherwise. Other strings are unaffected (i.e.
strings affected by magic_quotes_runtime are still always STR_UNDEFINED).
This should be completely backwards-compatible, because unless you use
the new features, only STR_UNDEFINED and exactly _one_ other string type
is used, so no conversion should ever occur. We might want to give
magic_quotes_runtime-affected strings a type as well, but this would
break compatibility in the case where magic_quotes_runtime is set
differently from magic_quotes_gpc. I see two possibilities: either we
simply let it be as it is - after all, magic_quotes_runtime can be turned
off at runtime, so it doesn't have the same problem as magic_quotes_gpc,
which the programmer may have no way of changing. The other possibility
is to give magic_quotes_runtime-affected strings a type, but only
when the programmer asks for it with a run-time setting. This would be
better, but of course much more work.
Currently, when converting from a STR_SQL, STR_HTML or STR_QUERY string
to another of these types, the string is first converted to STR_PLAIN
(i.e. the escaped characters are un-escaped) and then to the target type.
Perhaps this intermediate step should be skipped? It seems it would be
Unlike HtmlSpecialChars and HtmlEntities, the string type conversion
doesn't handle multi-byte character sets. The conversion from HTML to
PLAIN handles only the four HTML entities that the reverse conversion
uses (amp, lt, gt, quot).
The conversion from and to a URL query string is equivalent to the "raw"
URL encode/decode functions, i.e. it uses %20 as space, not '+'.
var_dump and print_r should be changed to output the string type
specifier. Also, (un)serialization must preserve the string type.
There are surely more unresolved problems and questions that I forgot
to mention or that I don't even know of.
Additional information and links
There is a bug database
entry (no. 16480), a Zend
"Into the Future" forum
in the same forum. And there is a
thread - sorry, my fault) going on in the php-dev mailing list.
I want to know what you think. Do you like/dislike the idea? Do you
have anything to say about the proposed changes? Did you try it and
it worked? Great, tell me more! It didn't work? Not so great, but tell
me still more! I'm keen to hear from you. And of course, if you are a
member of the PHP/Zend team, I would like to know whether you are willing
to include this in a future version of PHP. Write to me at
Last modified: 2002-05-25 23:14:31