PHP Internals News: Episode 61: Stable Sorting
In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Stable Sorting RFC.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
Transcript
- Derick Rethans 0:18
-
Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 61. Today I'm talking with Nikita Popov about a rather small RFC that he's proposing called stable sorting. Hello Nikita, how are you this morning?
- Nikita 0:36
-
Hey, Derick, I'm great. How are you?
- Derick Rethans 0:38
-
Not too bad myself. Let's jump straight in here. The title of the RFC is stable sorting, what does that mean, what is stable sorting, or what is sorting stability?
- Nikita 0:48
-
Sorting stability refers to the behaviour of the sort when it comes to equal elements. And equal share means that we sort comparison function. For example, the one you pass to usort says the elements are equal, but there is still some way to distinguish them. For example, if you're sorting some objects, to take the example from the RFC, we have an array with users, and users have an age, and we use usort to only sort the users by age. Then according to the comparison callback all users with the same age are equal. But of course, the user also has other fields on which we can distinguish it. And the question is now in what order will equal elements appear. If we have a stable sort, then they will appear in the order they were originally in. So it's something not going to change.
- Derick Rethans 1:41
-
And that is not what PHP sorting mechanism currently does?
- Nikita 1:44
-
Right. PHP currently uses an unstable sort, which means that the order is simply unspecified. It will be deterministic. I mean if you take the same input array and sort it, then every time we will get the same result. But there is no well specified order or relative order of elements. There's just some order. The reason why we have this behaviour is that well there are, I would say, two, the only two sorting algorithms. There is merge sort. Which is a guaranteed n log n sort that the stable, but has the disadvantage that that requires additional memory to perform the merge step. The other side there is a quicksort, which is an average case n log n sorting algorithm and is unstable, but does not require any additional memory. And in practice, everyone uses one of these algorithms, usually with a couple of extensions on sort of merge sort. Nowadays we use timsort, but which is still based on the same underlying principle, and for quicksort, we have sort which is better than quicksort, which tries to avoid some of the bad worst case performance which quicksort can have. PHP currently uses us a quicksort, which means that our sorting results are unstable.
- Derick Rethans 3:07
-
Okay, and this RFC suggesting to change that. How would you do that? How would you modify quicksort to make it stable?
- Nikita 3:15
-
Two ways. One is to just change the sorting algorithm. So as I mentioned, the really popular stable sorting is timsort, which is used by Python by Java and probably lots of other languages at this point. And the other possibility is to stick with an unstable source. So to stick with quicksort, but to artificially enforce that the comparison function does not have, does not report equal elements that are not really equal. And we can do that by introducing an extra artificial fallback comparison. We remember the order of the elements in the original array. And as the comparison function tells us that elements are equal. You will check against this original order, which means that, okay are sort of still unstable, but because the comparison, we'll never actually report that two elements are equal unless they really equal. It doesn't matter for the result.
- Derick Rethans 4:16
-
So you're basically artificially changing the key to have the original index in the array.
- Nikita 4:24
-
That's pretty much exactly the implementation. And this is actually also how you would implement the stable sort if you'd do it in PHP code. So you would take your array and convert it into an array of pairs, where you have the original array value and the original position of the array element. Difference is just that if you do this in PHP code this is extremely extremely inefficient, in terms of memory and performance, while when we do it internally it's essentially free because we already have a little bit of unused space in each array element. We can easily store the current position.
- Derick Rethans 5:02
-
Do you think there will be much of a performance hit here?
- Nikita 5:04
-
So I expect that there is a bit of performance hit, but for typical usage, not much. For the good case where your array does not actually contain any equal elements, the overhead should be very small, something like maybe one or 2%,. If your array does contain a huge number of duplicates. Then there is more overhead, and the effect is basically that the sort performance, no longer depends on the number of duplicates you have. Previously if you had a lot of duplicates, then the sort became faster, the more duplicates you had. Well now, as you add more duplicates, the sorting performance will stay both stable. That's really the difference in performance.
- Derick Rethans 5:53
-
If you have the numbers in the RFC I'll make sure to link to them. There are possibility that is that this is going to break any code?
- Nikita 6:01
-
Yes, it could break tests.
- Derick Rethans 6:04
-
Tests, because the test's output can change because the sorting order of arrays might have changed.
- Nikita 6:11
-
Exactly. So we already had such a change in PHP seven, where we switched from a pure quicksort, to a hybrid quicksort and insertion sort, which means that effectively we have a stable source for arrays smaller than 16 elements and an unstable source for larger arrays, which is weird, weird intermediate state.
- Derick Rethans 6:33
-
Yes.
- Nikita 6:35
-
I think that one already had quite a bit of fallout for testing purposes. Hopefully this one will be a little bit smaller because most tests will work on a few elements. Those would have already been stable previously. But there is definitely going to be a little bit of fallout for unit testing.
- Derick Rethans 6:56
-
At the moment we're talking about this, the RFC's already up for voting. By the time this podcast has come out. It's pretty likely that it has been accepted for PHP eight, because I think the voting was 51 to zero or something like this.
- Nikita 7:10
-
It's 36 to zero.
- Derick Rethans 7:13
-
There you go. Thank you, Nikita for taking the time this morning to talk to me about stable sorting.
- Nikita 7:19
-
Thanks for having me.
- Derick Rethans 7:23
-
Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.
Show Notes
-
RFC: Stable Sorting
Life Line
Updated a waste_basket
Created 11 benches, 2 life_rings, and 3 other objects; Updated 8 benches and a waste_basket; Deleted a bench and a log; Confirmed a cafe
I walked 7.8km in 1h52m27s
Tomorrow we have elections in the UK!
Lots of local authorities, all London Councils, the Welsh Senedd, and the Scottish Parliament.
Don't forget to vote if you have the right.
I get to vote for myself again 😎.
Benches, and corrections for the QE II Gardens
Addresses on College Road
Created 2 main entrances and an entrance; Updated an entrance, a residential building, and a house building
Created an apartments building and a main entrance
I walked 9.5km in 2h21m10s
Created a waste_basket
I walked 6.6km in 1h13m23s
On my walk from Aylesbury to Princes Risborough I spotted a few new bird species. I didn't get all the best photos though!
A Common Buzzard, a Yellow Wagtail, a Greater White throat, and a Green Woodpecker.
#photography #Birds #BirdPhotogaphy #BirdsOfMastodon #nature #Buckinghamshire
Updated an alcohol shop
I walked 9.4km in 1h58m25s
Updated 2 benches
Created a bench; Updated a bench
I hiked 19.0km in 4h35m50s
I hiked 19.0km in 4h35m50s
I walked 6.8km in 1h15m36s
Updated an estate_agent office
I walked 4.1km in 55m33s
I walked 1.1km in 10m05s
My First Lapwing!
I went to the London Wetland Centre yesterday, for a day out in nature.
While hiding in a hide, this chap and a friend showed up starting to forage for grubs.
#BirdPhotography #BirdsOfMastodon #Photography #Birds #London #Nature
Created a waste_basket; Updated a cafe and a restaurant; Confirmed an estate_agent office
I walked 6.6km in 1h8m53s



Shortlink
This article has a short URL available: https://drck.me/pin061-fng