Become a Patron!

My Amazon wishlist can be found here.

Life Line

PHP Internals News: Episode 92: First Class Callable Syntax

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 92. Today I'm talking with Nikita Popov about a first class callable syntax RFC that he's proposing together with Joe Watkins. Nikita, would you please introduce yourself?

Nikita Popov 0:36

Hi, Derick. I'm Nikita and I am still working at JetBrains. And still working on PHP core development.

Derick Rethans 0:43

Just like about half an hour ago when we recorded an earlier episode.

Nikita Popov 0:47

Exactly.

Derick Rethans 0:48

This RFC has no relation to read only properties. What is the first class callable syntax RFC about?

Nikita Popov 0:55

The context here is that PHP has the callable syntax based on literals, which is that if you just use a plain string, it's interpreted as a function name, and an array where the first element is an object, and the second one is a method name, that's methods. Or the first element is the class name, and the second one is method name, that's a static method.

Derick Rethans 1:17

I would consider this concept a bit of a hack, especially the the one with the arrays, and I reckon you feel similar and hence this RFC?

Nikita Popov 1:27

Yes, I do. So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two strings inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array element will also be renamed. But there's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that callables are not scope independent. For example, if you have a private method, then like at the point where you create your callable, like as an array, it might be callable there, but then you pass it to some other function. And that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both like this callable syntax based on arrays, and also the callable type. It's a callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable callability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable.

Derick Rethans 3:01

And it's guaranteed to always be callable.

Nikita Popov 3:03

Yeah, exactly.

Derick Rethans 3:04

What does the syntax like?

Nikita Popov 3:06

The syntax is the funny bit. As a bit of context. This proposal was created as an alternative or as a subset of the partial function application RFC.

Derick Rethans 3:17

That is just as hard to pronounce as first class callable syntax RFC.

Nikita Popov 3:21

Yes, that's why we say PFA. The PFA RFC has a more general feature. It also allows you to create a reference to a callable as a side effect. But more generally, it allows you to also bind some of the arguments to a fixed value. And has like finer control over for example, you can create a callable that has three required parameters, by passing three question mark arguments. While the new syntax only allows you to use the signature of the original function. But the syntax between both of those is compatible. So the new RFC is a subset of PFA. And that's why it uses the syntax where you do a normal function call, but then pass three dots or an ellipsis as arguments.

Derick Rethans 4:08

Instead of passing the function's or method's normal arguments, you use the three dots.

Nikita Popov 4:14

I think like the way to think about the syntax is that this is similar to like a variadic argument, or to the argument unpacking syntax, just that the arguments haven't yet been provided, they will be provided during the actual call. But I think the syntax was definitely the most contentious bit in the discussion of the RFC. I think this is mainly related to the fact that if you the see this code snippet, it looks a bit like, like the example code where the arguments haven't been filled in. While now this is like actual syntax.

Derick Rethans 4:44

I'm sure there's quite a few tutorials out there explaining how PHP works by using dot dot dot. That is not something you can avoid.

Nikita Popov 4:54

Well, we can avoid it, but it's fairly tricky question. I mean, the reason for this dot dot dot syntax, on one hand, this the compatibility with partial functions. I mean, the PFA, RFC has recently been declined. But in the future, we could extend the current syntax to full partial functions. And we would not end up with two different ways. So that's one benefit of the syntax. But the other part is that PHP has different symbol tables for different kinds of symbols. People often ask, why can't you just write like strlen as a plain name, not inside a string, and have that be treated as a reference to this function? And the answer to that is that we can't do that because you can't have a constant that's called strlen. Normally, that would be reference to constant and the same actually applies to all other callable types as well. So if you have something like methods, like object or method name, that would right now be interpreted as a property access. And for static methods, it will be interpreted as a as a class constant access. So we have this ambiguity here. Even if we add an additional symbol to this, for example, like for classes, we have the syntax, class name, and then scope operator class, that gives you the class name. We could do something like strlen, scope operator function, or fn, or whatever, and have that return the callable. That would work, but it also has some ambiguities. For example, if you have something like object, arrow methods, and then scope operator fn, you have this ambiguity. Is this referencing the method of that name? Or is it referencing a callable stored inside the property of that name? This is like fundamentally ambiguous. The way we would resolve it is we will just say that this index is only usable with real simple, so it will always refer to a method, and you couldn't use the syntax to convert the callable stored in a property into a proper callable. I'm actually not sure how I should distinguish these two concepts, because we have the existing callable, strings and arrays, and the first class callables, which are really closure objects.

Derick Rethans 7:11

Which actually sort of brings me to the next question which just popped in my head, which is: Does this first class scalable syntax, what is returned as return a closure or an existing callable type as we have now, with a callable type being a single string, or this array syntax that we now use.

Nikita Popov 7:28

The syntax returns a closure. Actually, the syntax works essentially the same way as the closureFromCallable method. And we do need to return a closure otherwise, we don't get this behaviour where the scope is bound at the time where the callable is created, rather than called. I think maybe going forward, I would generally recommend that people use a closure type, instead of a callable type in type declarations. I mean, you already cannot use callable for property types. Exactly due to this problem that callability is context dependent. While we only forbid it in property types, the same general problem also exists for argument and return types. And especially with the new syntax being introduced here, I think it's best to use closure instead of callable in the future.

Derick Rethans 8:18

Does that sort of mean that first class scalable syntax is syntactic sugar? Or does it do more than the closureFromCallable method?

Nikita Popov 8:27

No, I think it's effectively just syntactic sugar for closureFromCallable.

Derick Rethans 8:34

I'm actually not sure whether Xdebug is be able to do anything with these closure from callable things to begin with. So that is something I'm going to have to investigate.

Nikita Popov 8:44

Be able as in like, display that it actually refers to a specific method rather than just some kind of closure?

Derick Rethans 8:51

Yeah, because at the moment, it shows you the file name and the line numbers, it doesn't have a name right if you create normal closures, but in this case, it's important to know that it actually refers to specific methods, which is the same thing as the closureFromCallable syntax would also do, but I've never done anything with that.

Nikita Popov 9:10

But I think there is a way to get like the underlying prototype for the closure, and you should be able to determine it from there.

Derick Rethans 9:18

The first class callable syntax, are there situations where you can't use it?

Nikita Popov 9:22

One place where you don't want to use the new syntax as if you don't want to actually create a closure object, and validate callability at the point of creation. For example, creating this first class callable also implies that you have to autoload the class for a static method. If you have some kind of like large definition of of handler, of static handler methods for routes or something like that, then using the first class callable syntax would imply that you have to immediately create closure objects for all of these and immediately load all those classes. That's a use case where you might want to stick with the old syntax.

Derick Rethans 10:01

But wouldn't opcache resolve that issue really?

Nikita Popov 10:04

No, opcache is really exactly the reason why you wouldn't want to do that. For example, for my fastroute library, I cache all the data as a static array. And that's something that OpCache can cache very efficiently because it's in shared memory and accessing it is essentially zero cost. If you include something like first class callables in it, then those have to always be created at runtime, because we don't have concept like, like a persistent object. That means that this can no longer, I mean, the whole script can be in shared memory, but it still has to be executed always at runtime to construct the whole data structure. And that's going to be less efficient. To give a more clear answer to your question is that the first class callable syntax has a cost when creating the callable, and if you are in a situation where avoiding that cost is really critical for performance, that's why you wouldn't want to use it.

Derick Rethans 11:00

And instead you'd have to use the old scalable syntax that we already have.

Nikita Popov 11:03

Exactly. So for that reason, I think that the old syntax is not going to be removed in the near future at least, though maybe we can deprecate certain aspects of it. For example, the syntax also allows you to do highly context dependent things like referencing self, which is even worse than the situation with a private method, because self could refer to something different every time you call it. Those are some things we might want to deprecate early, but the main syntax itself was probably going to stay for a while.

Derick Rethans 11:34

Because callability is checked when you create the closures does that mean it also checks for strictness then? If your PHP file has been declared with strict types?

Nikita Popov 11:45

Strictness is handled the same as with closureFromCallable.The strictness is still determined at the time where the call is made, not where the callable was created, which actually, I am not a fan of how PHP handles strict types together with dynamic calls. But that's like a pre existing problem. And this isn't touching on this.

Derick Rethans 12:06

The language has many issues that probably could have been done better if it was designed from scratch. But that ship has sailed 26 years ago.

Nikita Popov 12:15

The strict types are not quite that old.

Derick Rethans 12:17

No, that is true. The language itself is of course.

Derick Rethans 12:23

Okay, thank you very much then, for taking the time this morning to talk to me about first class scalable syntax.

Nikita Popov 12:29

Thanks for having me, Derick.

Derick Rethans 12:30

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.

Shortlink

This article has a short URL available: https://drck.me/pin092-gfi

Comments

No comments yet

PHP Internals News: Episode 91: is_literal

In this episode of "PHP Internals News" I chat with Craig Francis (Twitter, GitHub, Website), and Joe Watkins (Twitter, GitHub, Website) about the "is_literal" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 91. Today I'm talking with Craig Francis and Joe Watkins, talking about the is_literal RFC that they have been proposing. Craig, would you please introduce yourself?

Craig Francis 0:34

Hi, I'm Craig Francis. I've been a PHP developer for about 20 years, doing code auditing, pentesting, training. And I'm also the co-lead for the Bristol chapter of OWASP, which is the open web application security project.

Derick Rethans 0:48

Very well. And Joe, will you introduce yourself as well, please?

Joe Watkins 0:51

Hi, everyone. I'm Joe, the same Joe from last time.

Derick Rethans 0:56

Well, it's good to have you back, Joe, and welcome to the podcast Craig. Let's dive straight in. What is the problem that this proposal's trying to resolve?

Craig Francis 1:05

So we try to address the problem where injection vulnerabilities are being introduced by developers. When they use libraries incorrectly, we will have people using the libraries, but they still introduce injection vulnerabilities because they use it incorrectly.

Derick Rethans 1:17

What is this RFC proposing?

Craig Francis 1:19

We're providing a function for libraries to easily check that certain strings have been written by the developer. It's an idea developed by Christoph Kern in 2016. There is a link in the video, and the Google using this to prevent injection vulnerabilities in their Java and Go libraries. It works because libraries know how to handle these data safely, typically using parameterised queries, or escaping where appropriate, but they still require certain values to be written by the developer. So for example, when using a query a database, the developer might need to write a complex WHERE clause or maybe they're using functions like datediff, round, if null, although obviously, this function could be used by developers themselves if they want to, but the primary purpose is for the library to check these values.

Derick Rethans 2:05

That is a method of doing it. What is this RFC adding to PHP itself?

Craig Francis 2:09

It just simply provides a function which just returns true or false if the variable is a literal, and that's basically a string that was written by the developer. It's a bit like if you did is_int or is_string, it's just a different way of just sort of saying, has this variable been written by the developer?

Derick Rethans 2:28

Is that basically it?

Craig Francis 2:30

That's it? Yeah.

Joe Watkins 2:32

It would also return true for variables that are the result of concatenation of other variables that would pass the is literal check. Now, this differs from Google, because they introduced that at the language level, but not only at the language level, at the idiom level. So that when you open a file that's got queries in PHP, commonly, if they're long, basic concatenation is used to build the query and format it in the file so that it's readable. So that it wouldn't really be very useful if those queries that you see everywhere in stuff like PHPMyAdmin, and WordPress, and Drupal and just normal code weren't considered literal, just because they're spread over several lines with the concatenation operator. It's strictly not just stuff that's written by the programmer, but also stuff that was written by the programmer or concatenated, with other stuff that was written by the programmer.

Derick Rethans 3:33

Now in the past, we have seen something about adding taint supports to PHP, right? How is this different, or perhaps similar, to taint checking?

Craig Francis 3:44

At the moment today, there is a taint extension, which is something you need to go out your way to install, and actually learn about and how to use. But the main difference is that taint checking goes on the basis of say, this variable is safe or unsafe. And the problem is that it considers anything that had been through an escaping function like html_entities as safe. But of course, the problem is that escaping is difficult. And it's very easy to make mistakes with that. A classic example is if you take a value from a user, an SSH SSH, their homepage URL, if you use HTML encoding, and then put it into the href attribute of a link, that can also result in HTML injection vulnerability, because the escaping is not aware of the context which is used. Because if the evil user put in a JavaScript URL, that is in inline JavaScript, that has created a problem because taint checking would assume that because you use HTML encoding it is safe, and all I'm saying is that is it creates a false sense of security. And by stripping out all that support for escaping, it means that you can focus on libraries doing that work because they know the context, they understand the domain, and we can just keep it a much simpler, and much safer approach.

Derick Rethans 5:02

Would you say that the is_literal feature is mostly aimed at library authors and not individual developers?

Craig Francis 5:09

Yeah, exactly. Because the library authors know what they're doing. They're using well tested code, many eyes over it. The problem libraries have at the moment is that they trust the developer to write things themselves. And unfortunately, developers introduce a lot of injection vulnerabilities with those strings before they even get into the library.

Derick Rethans 5:30

How would a library deal with with strings that aren't literal then?

Craig Francis 5:35

So it really depends on each individual example. And the RFC does include quite a lot of examples of how each one will be dealt with. The classic one is, let's say you're sorting by a column in a database, because if we're dealing with SQL, the field name might come from the user. But that is also quite a risky thing to do if you start including whatever field name the user wrote. So in the RFC, I've created a very simple example where the developer would create an array of fields that you can sort by, and then whatever the user provides, you search through that array, and you pull out the one that you that matches and is fine. And therefore you are pulling out a literal and including into the SQL. To be fair, these ones are quite unique. And each one needs to be dealt with in its own way. But I've yet to find an example where you can't do it with a literal. Having said that, I think Larry Garfield actually gave an example where a content management system changed its database structure. And the way that would work is the library would have to deal with it, they would receive the value for a field, and then that field would be escaped and treated as a field, it understands it as a field, and it will process it as such, then it can include into the SQL, knowing full well that everything else in that SQL is a literal, and then it can just build up SQL in its own way internally.

Derick Rethans 6:58

Okay, talking a little bit about the implementation here. Since PHP seven, we have this concept of interned strings, or maybe even before that actually, I don't quite remember. Which is pretty much a flag on each string and PHP that says, this's been created by the engine, or by coconut. Why would strings have to have an extra flag here to remember that it is created by the programmer?

Joe Watkins 7:21

Well, interned does not mean literal. It's an optimization in the engine, should we use strings. We're free to do whatever we want with that. At the moment, it by happenstance, most interned strings are those written by the programmer. If you think about the sort of strings that are written by the programmer, like a class name, when those things are declared internally, by an extension, or by core code, those things are interned as if they were written by the programmer. They don't mean literal, we're free to use interned strings for whatever we want. For example, a while ago, someone suggested that we should intern keys while JSON decoding or unserializing. It didn't happen, but it could happen. And then we'd have the problem of, well, how do we separate out all this other input. There is another optimization attached to interned strings, which is one character strings, where if you type only one character, or you call a Class A or B, or whatever, the permanent interned string will be used. That results in when the chr function is called, that results in the return of that function always being marked as interned. So it would show as literal, which is not a very nice side effect. And that's just a side effect that we can see today. We don't want to reuse the string really, it does need to be distinct. Also, if you're going to concatenate, whether you do it with the VM or a specific function, obviously, you need to be able to distinguish between an interned string and a literal string, which interned means it has a specific life cycle and specific value. And we can't break that.

Derick Rethans 9:00

So there are really two different concepts, is what you're saying, and hence, they need to have a special flag for that?

Joe Watkins 9:06

Yeah, they're very, two very separate concepts. And we don't we don't want to restrict the future of what interned strings may be used for. We don't want to muddy the concept of a literal.

Derick Rethans 9:16

Of course, any sort of mechanism that languages built into solve or prevent injections in any sort of form, there's always ways around it. Theoretically, how would you go around the is_literal checks to still get a user inputted value into something that passes the is_literal check?

Craig Francis 9:36

Generally speaking, you would never need it because the library should know how to deal with every scenario anyway. And it's not that difficult. We're only talking about things like in the database world, you'll be taking value from field names and therefore it should receive field names or table names. And, you know, we are providing a guardrail as a safety net. And what should happen is that the default way in which programmers work should guide them, to do it the right way. We're not saying that you can't do weird things to intentionally work around this. A really ugly version, which you should never do, but use eval and var_export together, it's horrible. But if you are so desperate, you need to get around this. That's what we're doing it. But in reality, we can't find any examples where you'd actually need to do this.

Joe Watkins 10:22

I would say that, hey, there's this idea that most people writing PHP are using libraries, and they're using frameworks. I don't actually find that to be true. I've been working in PHP for a long time. And most of the big projects I've worked on for a long time did not start out using frameworks. And they did not start out using libraries. They look a bit like that today, but their core, they are custom. There may be a framework buried in there. But there is so much code that the framework is a component and is not the main deal. Most code, we actually do write ourselves, because that's what we're paid to do. I think we don't decide how people are going to use it, and we don't decide where they're going to use it. The fact is, like Craig said, it's a guardrail that you can work around easily. And if you find a use case for doing that, then we shouldn't prejudge, and say, well, that's the wrong thing to do. It might not be the wrong thing to do. For example, an earlier version of the idea included support for integers. We considered integers safe, regardless of their source. If you wanted to do that, in your application, you could do that very easily and still retain the integrity of the guardrail is not compromised. I wouldn't focus on this is for libraries, and this is for frameworks, because these things become so small in the scheme of things that they're meaningless. I mean, most of the code we work on is code that we wrote, it is not frameworks.

Derick Rethans 11:48

That also nicely answers my next question, which is what's happened to integers, which have now nicely covered. The RFC talks about that as hard to educate people to do the right thing. And that is_literal is more focused, so to say, on libraries, and perhaps query building frameworks as the RFC alludes to. But I would say that most of these query building tools or libraries already deal with escaping from input value. So why would it make sense for them to start using is_literal if you're handling most of these cases already anyway?

Craig Francis 12:24

If you look at the intro of the RFC, there's a link to show examples of how libraries currently receive the strings. And you're right about the Query Builder approach is a risky thing, I would still argue it's an important part. That's why libraries still provide them. Doctrine has a nice example of DQL. The doctrine query language is an abstraction that they've created, which is also vulnerable to injection vulnerabilities. And it gives the developer a lot more control over a very basic API. I still think people should try and use the higher level API's because they do provide a nice safe default, but that depends on which library use, they're not always safe by default. So for example, when you're sort of saying: I want to find all records where field parameter one, is equal to value two, a lot of the libraries assumed that the first parameter there is safe and written by the developer. They can't just necessarily simply escape it as though it's a field because that value might be something like date, bracket, field, bracket, and it's sort of relying on the developer to write that correctly, and not make any mistakes. And that hasn't proven to be the case, you know, they do include user values in there.

Derick Rethans 13:43

Just going back a little bit about some of the feedback, because feedback to the RFC has happened for quite some time now. And there were lots of different approaches first tried as well, and suggested to add additional functions and stuff like that. So what's been the major pushback to this latest iteration of the RFC?

Joe Watkins 14:01

So I think the most pushback has come from an earlier suggestion that we could allow integers to be concatenated and considered literal. We experimented with that, and it is possible, but in order to make it possible, you have to disable an optimization in the engine, that would not be an acceptable implementation detail for Dmitri. It turns out we didn't actually, we don't need to track their source technically, but it made people extremely uncomfortable when we said that, and even when we got an independent security expert to comment on the RFC, and he tried to explain that it was no problem, but it was just not accepted by the general public. I'm not sure why.

Derick Rethans 14:45

All right. Do you have anything to add Craig?

Craig Francis 14:48

The explanation given by people is they liked the simpler definition of what that was as if it's a string written by the developer. Once you start introducing integers from any source, while it is safe, it made people feel, yeah, what is this. And that's where we also had the slight issue because we had to find a new name for it. And I did the silly thing of sort of asking for suggestions, and then bringing up a vote. And then we had, I think it's 18 to three people saying that it should be called is_trusted, and you have that sinking moment of going, Oh, this is going to cause problems, but hey, democracy. It creates that illusion that it's something more. So that's why we sort of went actually, while I like Scott's idea of having the idea of maybe calling it is_noble. It is a vague concept, which people have to understand. And it's a bit strange. Whereas going back to the simpler, original example, they've all seem to grasp grasp of that one. And we could just keep with the original name of is_literal, which I've not heard any real complaints about.

Derick Rethans 15:53

I think some people were equivalenting is_trusted with something that we've had before in PHP called Safe mode, which was anything but of course.

Craig Francis 16:02

Yes, no, definitely.

Derick Rethans 16:03

We're sort of coming to the end of what to chat about here. Does the introduction of is literal introduce any BC breaks?

Craig Francis 16:11

Only if the user land version of is_literal, which I'm fairly sure is going to be unlikely. So on dividing their own function called that.

Derick Rethans 16:18

Did you check for it?

Craig Francis 16:20

Yes.

Derick Rethans 16:21

So if you haven't found it, then it's unlikely to to exist.

Craig Francis 16:24

There are still private repositories, we can't shop through all their show, check through all their code. But yeah.

Derick Rethans 16:29

Did I miss anything?

Craig Francis 16:31

We covered future scope, which is the potential for a first class type, which I think would be useful for IDs and static analysers. But this is very much a secondary discussion, because that could build on things like intersection types, but we still need to focus on what the flag does. And there's also possibility of using this with the native functions themselves, but we do have to be careful with that one, because, you know, we got things like PHPMyAdmin. We have to be able to make the output from libraries as trusted because they're unlikely to still be providing a literal string at the end of it. So that's a discussion for the future. And the only other thing is that, you know, the vote ends on the 19th of July.

Derick Rethans 17:08

Which is the upcoming Monday. How is the vote going? Are you confident that it will pass?

Craig Francis 17:13

Not at the moment, we're sort of trying to talk to the people who voted against it. And we've not actually had any complaints as such. The only person who sort of mentioned anything was saying that we should rely on documentation and the documentation is already there. And it's not working. I think a lot of people just voted no, because they just sort of going well, that's the safe default. I don't think it's necessary. Or, you know, I'd like the status quo. And we still are trying to sell the idea and say: Look, it's really simple. It's not really having a performance impact. And it can really help libraries solve a problem, which is actually happening.

Derick Rethans 17:46

Is this something that came out of the people that write PHP libraries or something that you came up with?

Craig Francis 17:52

So I've come gone to the library authors and suggested you know, this is how Google do it. Would you like something similar? And we've certainly had red bean and Propel ORM saw show positive support for that. And I've also talked to Matthew Brown, who works on the Psalm static checking analysis. He's very positive about it, so much so that Psalm now also includes this as well. Obviously, static analysis is not going to be used by everyone. So we would like to bring this back to PHP so that libraries can use it without relying on all developers using static analysis.

Derick Rethans 18:25

Thank you very much. Glad that you were both here to explain what this is_literal RFC is about.

Craig Francis 18:31

Thank you very much, Derick.

Joe Watkins 18:33

Thanks for having us.

Derick Rethans 18:37

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.

Show Notes

Shortlink

This article has a short URL available: https://drck.me/pin091-gf8

Comments

No comments yet

PHP Internals News: Episode 90: Read Only Properties

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Read Only Properties" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 90. Today I'm talking with Nikita Popov about the read only properties version two RFC that he's proposing. Nikita, would you please introduce yourself?

Nikita Popov 0:33

Hi, Derick. I'm Nikita and I do PHP core development work by JetBrains.

Derick Rethans 0:39

What does this RFC proposing?

Nikita Popov 0:41

This RFC is proposing read only properties, which means that the property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have typed properties. A remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value for example. One nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP.

Derick Rethans 1:35

You mean the immutable object?

Nikita Popov 1:37

Sorry, immutable. And read only properties address that case. So you can simply put a public read only typed property in your class, and then it can be initialized once in the constructor and you can be... You don't have to be afraid that someone outside the class is going to modify it afterwards. That's the basic premise of this RFC.

Derick Rethans 1:57

But it also means that objects of the class itself can modify that value any more, either.

Nikita Popov 2:01

Exactly. So that's, I think, a primary distinction we have to make. Genuinely, there are two ways to make this read only concept work. One is like actually read only or maybe more precisely init once, which is what this RFC proposes. We can only set that once and then even in the same class, you can't modify it again. And the alternative is the asymmetric visibility approach where you say that, okay, only in the public scope, the property can only be read, but in the private scope, you can modify it. I think the distinction there is very important, because read only property tells you that it's genuinely read only, like, if you access a property multiple times in sequence, you will always get back the same value. While the asymmetric visibility only says that the public interface is read only, but internally, it could be mutated. And that might like be, you know, intentional, just that you want to like have your state management private, but that the property is not supposed to be immutable.

Derick Rethans 3:05

How's this RFC different from read only properties, version one?

Nikita Popov 3:09

Read only properties version one was called write once properties. I think the naming is kind of one of the more important differences. The new RFC is also effectively write once, but I think it's really important to view it from an API perspective as read only because that's what the user gets to see. While write once gives you this impression, that is know that you can externally from outside the class like passing the value once I know like dependency injection, that is what they would think of when they hear write ones. And from the technical site, there is a related difference. And that difference is that new RFC only allows you to initialize read only properties inside the class scope. That means if you do something really weird, like leaving a property uninitialized in the constructor, it's not possible for someone outside the class to initialize it instead, just like an extra safety check. Of course, you can, as usual bypass that, like if you're writing a serializer or hydrator, you can use reflection to initialize it outside the class. But normally you won't be able to.

Derick Rethans 4:13

Does that mean that these read only properties can also be initialized from a normal method instead of just from the constructor?

Nikita Popov 4:19

That's true. Yes, that's possible.

Derick Rethans 4:21

So the RFC talks about that a read only property cannot be assigned from outside a class. Does that mean it can be set by a different object of the same class?

Nikita Popov 4:29

Yeah, that's how scoping works in PHP. Scoping is always class based, not object based, it's a common misconception that if you have like private scope, you can access different objects of the same class.

Derick Rethans 4:42

That was a surprise to me the first time I ran into that, but once you know, it's obvious that it's should work all over the place right. Now, the RFC states that you can only use read only with typed properties. Why is that?

Nikita Popov 4:55

So this is related to the initialisation concept, that typed introduced. Typed properties start out, if you don't give them a default value, they start out in an uninitialized state. And we reuse that state for read only properties. You can only assign to the property while it's uninitialized. And once it's initialized, you cannot assigned to it any more or even unset it back to an uninitialised state. For non typed properties, you also can get into this uninitialized state by explicitly unsetting it. But the problem is that this is not the default state. Untyped properties always have no default value, even if you don't specify one, which effectively means that these properties are always initialized. So they will be kind of useless if you used read only with them. Which is why we make this distinction to avoid any confusion. And if you want to use an untyped read only property, you do that by using the mixed type, which is the same but has the initialisation semantics of typed properties.

Derick Rethans 5:56

What would that mean if you have say, a resource or class typed property with a read only keyword? Can you not read or write to a resource any more? Or modify properties on an object, that is the value of a read only typed property?

Nikita Popov 6:12

No, you can still modify those, because we have to distinguish the concepts of like exterior and interior mutability here. So objects and resources are... Well, I mean, we often say they are passed by reference, which is not strictly true, because those are not PHP references. But the important part is that they only pass around some kind of handle and you can still modify the inside of that handle. What you can't do is you can't reassign to a different resource or reassign to a different object, it's always the same object, the same resource, but the insides of the objects, those can change. Of course, if your object also only contains read only properties, then you won't be able to change this.

Derick Rethans 6:55

Okay, and that answered another question that I have: how it possible to make the whole object read only or on all of its properties? Where the answer is by setting all the properties to read only.

Nikita Popov 7:05

We could like add a read only class modifier that makes all the properties implicitly read only but maybe future scope.

Derick Rethans 7:13

Is there a reason why read only properties can't have a default value?

Nikita Popov 7:18

So this is, again, same issue with initialization. If they have a default value, then they're already initialized. So you can't overwrite it. Like we could allow it, but you just would never be able to change them from the default value, which is something we could allow it just wouldn't be very useful.

Derick Rethans 7:36

in PHP, eight zero PHP introduced promoted or constructor promoted properties, I think it's the full name of it. How does this read only property tie in with that? Because can you set the read only flag on a constructor promoted property?

Nikita Popov 7:49

Yeah, you can. So that works as expected. Important bit there is that with the promoted properties, you can set the default value. And the reason is that for promoted properties, the default value is not the default value of the property. It's the default value of the parameter. And that works just fine.

Derick Rethans 8:07

But again, it would be a constant.

Nikita Popov 8:10

No, in that case, it's not a constant, because it's just a default value for the parameter and the code we generate, we just assign this parameter to the property IN the constructor.

Derick Rethans 8:19

Which means that if you instantiate the class with different arguments, they would of course, override the default value of the constructor argument value, right?

Nikita Popov 8:28

That's right. If you pass it explicitly, then we use the explicit value. If you don't pass an argument, then we use the default of the argument, but it still gets assigned to the property through the like automatically generated code for the promotion.

Derick Rethans 8:42

Promotion isn't actually... there isn't actually really a language feature, but more of a copy and paste mechanism.

Nikita Popov 8:50

Yes, this is pure bit of syntax sugar.

Derick Rethans 8:53

Which sometimes can be handy. But all kinds of interesting things that we added to properties or type system in general, usually inheritance comes into play. Are there any issues here with read only and inheritance? What does it do to variants or traits or things like that, all the usual things that we need to take into consideration?

Nikita Popov 9:13

Most of our rules for properties, most of our inheritance rules for properties are invariant, which means that the property in the child class has to basically look the same as the property in the parent class. And this is also true for read only properties. So we say that if the parent property is read only the child property has to be read only, and the same the other direction and for the type as well. So we say that the type of the read only property has to match with the parent property. Those rules are very conservative and like they could theoretically maybe be relaxed in some cases. For read only properties on read only is like kind of return type and return types in PHP are covariant. So one could argue that the type should also be covariant here The problem is that like here, we get into this little detail that read only is really not read only, but init once. So there is this one assignment. And assignment is more like an argument. So is contravariant. So if we made them covariant, then we would basically lie about this one initializing assignment. And we don't like to lie about these things, at least without some further consideration.

Derick Rethans 10:24

Read only doesn't change the variance rules at all, which is different from the property accessors RFC we spoke about a couple of months ago. Talking about that this is actually a competing RFC, or do they tie together?

Nikita Popov 10:37

Well, I should first say that probably the variance stuff defined in the accessors RFC maybe isn't right, for the same reason and should also like just keep things invariant. But it's more complicated there because it also has abstract properties, or abstract accessors. And then the abstract case, that's where the different variance rules are safe. To get back to your question, they are kind of competing, but could also be seen as just complimentary. So read only is basically a small subset of the accessors feature. I mean, accessors also allow you to implement read only properties as part of a much more general framework. And read only properties cover like only this small but very useful corner in a way that's much simpler. And I think that's not just simpler from a technical perspective, but also simpler to understand for the programmer, because there are a lot of less edge cases involved.

Derick Rethans 11:32

Because we're getting pretty close to feature freeze now. Are you still intending to take the property accessors RFC forward for PHP eight one?

Nikita Popov 11:41

No, definitely not.

Derick Rethans 11:43

But you have taken the read only properties one forwards because we're already voting on it. At the moment, it looks like the vote is going to pass quite easily. So there's that. We don't have to speculate about that, like we had to do with previous RFCs. Do you think I've missed anything talking about read only properties? Or have we covered everything?

Nikita Popov 12:01

We've missed one important bit. And that's the cloning. So the read only properties RFC is not entirely uncontroversial. And basically all the controversy is related to cloning. The problem are wither methods as they're used by PSR seven, for example, which are commonly implemented by cloning the original object, and then modifying one property in it. This doesn't work with read only properties, because you know, if you clone the object, then it already has all the properties initialized. So if you try to change something, then you'll get an error that you're modifying read only property. So these patterns are pretty fundamentally incompatible. There are possible solutions to the problem primarily a dedicated syntax that goes into the code name of cloneWith where you clone an object, but override certain properties directly as part of the syntax, which could bypass the read only property checks. The contention here is basically there are some people consider this clone support and these wither methods are very important. Well, I mean, read only properties are about immutable objects. And like PSR seven is a fairly popular read only object pattern, they think that without support for cloning or for withers, the feature loses all value. That's why the alternative suggestions are either to first introduce some more cloning functionality. So like this mentioned, cloneWith or instead implement asymmetric visibility, which does not have this problem because inside the class, you can always modify the property. So it works perfectly fine with cloning. My personal view on this is well I personally use cloning in PHP approximately never. So this is not a big loss for me, and I'm happy for this from my perspective, edge case, to be addressed at a later time. I can understand that other people put more value on this then I do.

Derick Rethans 13:55

There doesn't seem to be enough push against that for this RFC to fail. So there is that. Thanks Nikita for explaining the read only properties RFC which will very likely see in PHP eight one. Thanks for taking the time.

Nikita Popov 14:08

Thanks for having me Derick, once again.

Derick Rethans 14:13

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.

Shortlink

This article has a short URL available: https://drck.me/pin090-gey

Comments

No comments yet