Combining sorting and pagination in an API?

Benjamin Hall

One of my upcoming tasks is going to be enabling pagination on a particular set of REST(ish) API endpoints. The issues are:

This particular set of endpoints feeds into a spreadsheet/table display with sortable columns. And the pagination should respect the sort order (so if you're sorting by date and load more, it should grab the next X by date).
Some of the things we sort by are synthetic columns which are combinations of things returned from these endpoints and also from other endpoints.
None of the things are exactly in a state where we can let the database do the pagination directly (ie feeding in the limit/offset into the query), since the domain model is relatively complex and pieces are inferred from combinations of database-stored values.

This does have the simplifying factor that I'm in control of both sides of the API and there aren't 3rd-party integrators.

So the question becomes: anyone have experience with such things? Anything particular that you hate when working with a paginated API? Anything that makes life much easier?

Using express/Node.js as the API server, with PHP/JS and native (Swift/Kotlin) consumers, if that matters.

dkf

@Benjamin-Hall said in Combining sorting and pagination in an API?:

None of the things are exactly in a state where we can let the database do the pagination directly (ie feeding in the limit/offset into the query), since the domain model is relatively complex and pieces are inferred from combinations of database-stored values.

That's horrible. If you've got some kind of weird filtering and sorting going on then the only real option you've got is to hold the complete filtered, sorted collection somewhere (probably attached to the server-side session state) and computing that will make it expensive to do the common case of just wanting the first page of the collection.

Or you could do the approach used by a former colleague which was to just return a randomly sorted set each time, with filtering then applied. It took ages for people to realize that that's what he'd done.

Jaime

@Benjamin-Hall said in Combining sorting and pagination in an API?:

Anything particular that you hate when working with a paginated API

I hate the ideas of pagination, sorting, and data sets that change fairly rapidly all coming together.

The only way to make them work well, is to keep full snapshots on the server side and navigate through the previously materialized data. That's potentially very abusive to server resources and sometimes creates as many problems as it solves. Sometimes you can get the snappy UI you want by simply chunking the data to the client asynchronously and letting the client deal with paging.

PleegWat

@dkf said in Combining sorting and pagination in an API?:

That's horrible. If you've got some kind of weird filtering and sorting going on then the only real option you've got is to hold the complete filtered, sorted collection somewhere (probably attached to the server-side session state) and computing that will make it expensive to do the common case of just wanting the first page of the collection.

Page 3 only needs to be cheap if it gets used a lot. If 99.999% of requests are for the first page only you can probably get away with a topN algorithm.

Benjamin Hall

@dkf said in Combining sorting and pagination in an API?:

@Benjamin-Hall said in Combining sorting and pagination in an API?:

None of the things are exactly in a state where we can let the database do the pagination directly (ie feeding in the limit/offset into the query), since the domain model is relatively complex and pieces are inferred from combinations of database-stored values.

That's horrible. If you've got some kind of weird filtering and sorting going on then the only real option you've got is to hold the complete filtered, sorted collection somewhere (probably attached to the server-side session state) and computing that will make it expensive to do the common case of just wanting the first page of the collection.

Or you could do the approach used by a former colleague which was to just return a randomly sorted set each time, with filtering then applied. It took ages for people to realize that that's what he'd done.

Currently what we're doing (which works ok for the very small data sets we have, but will not work well later) is

Return the entire set, sorted by whatever order the index pulls out of the database for the "top level" object. This is usually the primary key, but since there are joins in the "main" query, I've seen some cases where it's not exactly that.
Then do the sorting client side and fake the pagination (basically only showing results 1-N without actually querying for anything more).

dkf

@Benjamin-Hall That has one huge advantage: it actually works.

Getting good performance (once you're not just pulling everything to the client) basically requires that you move all filtering and sorting to the database and have the right indices set up. There's no real way to get it right otherwise. (Yes, that makes inserts and updates potentially somewhat expensive.)

Tsaukpaetra

@Benjamin-Hall said in Combining sorting and pagination in an API?:

Some of the things we sort by are synthetic columns which are combinations of things returned from these endpoints and also from other endpoints.

😱

Yikes

Tsaukpaetra

But to answer the question: yeah the easiest way is to materialize the whole shebang (maybe with caching) but if you're doing it across multiple sources there's no good way about it.

Benjamin Hall

@Tsaukpaetra said in Combining sorting and pagination in an API?:

But to answer the question: yeah the easiest way is to materialize the whole shebang (maybe with caching) but if you're doing it across multiple sources there's no good way about it.

Problem with trying to cache stuff is that it means some form of nasty session-management/per-session cache. Because what a given call can see is specific to a user and agency and permission level.

If it were up to me, the database queries would be smart enough to "fill in all the details" so that the information returned would be ready to go out of the box. One big issue is that it involves user names (real names) for other users, which our particular service doesn't have direct access to--they're in a separate service's database (effectively in a shared "account" database since they're shared between multiple teams via I-wish-it-were-SSO). There are endpoints that our system could use to get that data, but that's comparatively expensive (another set of HTTP requests and a segregated DB lookup), so that can't be handled by the DB at all.

Most of the rest I guess I could fake--sort by one element of the conjoined thing, which happens to be the element we really care about, usually time. Which we have. That way, when the front-end assembles the displayed data, it's already in the right order.

I think, since I'm still kinda acting as Product Owner, I'm going to quietly de-prioritize this particular ticket until we really need it. It works the way it is, for now. And before trying to fix performance things like this, we need real customers to complain about things. Which thing (real customers for this particular product) is something we're rather lacking ATM.

Tsaukpaetra

@Benjamin-Hall said in Combining sorting and pagination in an API?:

Which thing (real customers for this particular product) is something we're rather lacking ATM.

I feel that.

At some point I need to convert these NVARCHAR(255) ID columns into the GUID type they actually are... But nobody is here to care about it yet.

Gribnit

@Tsaukpaetra said in Combining sorting and pagination in an API?:

@Benjamin-Hall said in Combining sorting and pagination in an API?:

Which thing (real customers for this particular product) is something we're rather lacking ATM.

I feel that.

At some point I need to convert these NVARCHAR(255) ID columns into the GUID type they actually are... But nobody is here to care about it yet.

It'll be fine. All the data has to move, undergoing a trivial transformation. What could go wrong?

Redirect Notice

Tsaukpaetra

@Gribnit said in Combining sorting and pagination in an API?:

Indeed...

@Gribnit said in Combining sorting and pagination in an API?:

undergoing a trivial transformation.

It is indeed trivial, on the surface, because (thankfully) the original dev used boilerplate code that doesn't actually care if the column type is indeed nvarchar, so long as it's a GUID, and since that hasn't been violated (yet), it's a quick and easy 500 line SQL alteration sequence.

Edit: I lied, apparently it's 1881 lines.

UpdateScript - Adjust AspNetUsers ID column to be more appropriate.sql

dkf

@Tsaukpaetra said in Combining sorting and pagination in an API?:

Edit: I lied, apparently it's 1881 lines.

UpdateScript - Adjust AspNetUsers ID column to be more appropriate.sql

I hope you're not maintaining that by hand? (Generated code is nasty, but not usually looked at.)

Tsaukpaetra

@dkf said in Combining sorting and pagination in an API?:

@Tsaukpaetra said in Combining sorting and pagination in an API?:

Edit: I lied, apparently it's 1881 lines.

UpdateScript - Adjust AspNetUsers ID column to be more appropriate.sql

I hope you're not maintaining that by hand? (Generated code is nasty, but not usually looked at.)

Absolutely not, that's just to check and make sure nothing insane is being done.

boomzilla

I've had to deal with this sort of thing in various ways.

Look for ways to break the data up, especially if you already have some filters you can prepopulate for the user (based on things about them, e.g., which department do they belong to?, or dates, or whatever makes sense). Load the stuff that would be in there and then load other stuff in the background. If that's now how your application works now...change it so that it does.

We've pretty much standardized on using datatables.net for presentation. It does the pagination and sorting and filtering for us and seems pretty performant.

Well, some of the filtering we do ourselves before we put the data into the table, in part because we might not have loaded everything and in part because while the built in search works, their filtering plugins don't (at least for Angular, for the version we're on, YMMV). But it's also easier to hook into our own widgets and then manipulate the table directly (via the API).

Gribnit

@boomzilla said in Combining sorting and pagination in an API?:

I've had to deal with this sort of thing in various ways.

Look for ways to break the data up, especially if you already have some filters you can prepopulate for the user (based on things about them, e.g., which department do they belong to?, or dates, or whatever makes sense). Load the stuff that would be in there and then load other stuff in the background. If that's now how your application works now...change it so that it does.

We've pretty much standardized on using datatables.net for presentation. It does the pagination and sorting and filtering for us and seems pretty performant.

Well, some of the filtering we do ourselves before we put the data into the table, in part because we might not have loaded everything and in part because while the built in search works, their filtering plugins don't (at least for Angular, for the version we're on, YMMV). But it's also easier to hook into our own widgets and then manipulate the table directly (via the API).

I am gonna find this, and hit refresh, so many times...

boomzilla

@Gribnit

dkf

@Tsaukpaetra said in Combining sorting and pagination in an API?:

Absolutely not, that's just to check and make sure nothing insane is being done.

There's plenty in there that's insane, but I suspect that it might be insane and correct. Still, I find the foreign key constraints that seem to not really constrain anything much on either update or delete a bit weird to say the least, and the pattern of doing conditional inserts via EVAL makes my teeth itch…

Tsaukpaetra

@dkf said in Combining sorting and pagination in an API?:

constrain anything much on either update or delete

Yeah there's nothing specified for ON UPDATE and ON DELETE for anything in this database.

There isn't much deleting and key updating in here anyways so it doesn't really matter much... As far as I know anyways.

Benjamin Hall

@dkf said in Combining sorting and pagination in an API?:

@Tsaukpaetra said in Combining sorting and pagination in an API?:

Absolutely not, that's just to check and make sure nothing insane is being done.

There's plenty in there that's insane, but I suspect that it might be insane and correct. Still, I find the foreign key constraints that seem to not really constrain anything much on either update or delete a bit weird to say the least, and the pattern of doing conditional inserts via EVAL makes my teeth itch…

We've got a fun situation with our databases:

We use a combination of tools to do patches to the schema when things change. Liquibase is the big one, using pt-online-schema-change in the background to do it without losing data and without downtime by working on a copy.
pt-online-schema-change prefixes the copied foreign keys with an underscore each time it has to copy the table. But since we didn't start out using this tool...our original production database and our non-production/other stacks have been copied different numbers of times. And so they have different numbers of underscores. This makes liquibase rollbacks (which are supposed to be automatic) fail, because it can't find the right foreign key.
ON DELETE CASCADE doesn't work. It's set up right, but liquibase isn't adding it right. And because of the previous issue, we can't just remove and re-add (not that there's any hope that would work).
pt-online-schema-change doesn't handle self-referential (same-table) foreign keys at all and fails trying to remove them or change them. Unless you set a flag...which liquibase doesn't do.

Gribnit

@Benjamin-Hall your team owns your own databases???