Problem: paginating requests from two sets of results

This is an issue I bumped into at work, and I’m not entirely sure I got to the right conclusion.

Let’s say you’re a reading list curation company, and you run two services.

The first you call BookSite, which is an internal service. Its job is to collect ISBNs and book titles, match them together, and offer that data through an API for you to use on the second website.

BookListSite lets users pull together interesting lists of books, using filters and sort orders and whatnot. However, it doesn’t keep any of the books’ metadata in its own data stores. It makes a RecommendedBook, but whenever the user looks up that recommendation, we make a call to BookSite to get the extra information.

BookSite offers a Graphiti API to get this data. That’s nice because it handles the vast majority of this complexity for us: the user can search by author names, book titles, order by release date or title, even pagination is handled by Graphiti. BookListSite can largely just make a request and pass it through a Presenter and job done!

Now, BookListSite has a brand new feature: recommended books.

Since all user data is kept on BookListSite, that’s where RecommendedBooks are generated and stored. Each user has multiple recommendations with a ranking, 1 being the book we think they’ll most like. (They might even be generated by a data analytics team who are happier working in Python, off to the side somewhere, and who hand over a CSV with BookId,UserId,Ranking each day.)

So the problem: when a user searches for an author, and then asks for the books to be sorted by Recommended first, how the heck do we do that? When your PM helpfully tells you that the rest of the books should be ordered by release date, the problem gets a little more complicated.

Graphiti doesn’t support “order by this arbitrary sequence I’m providing,” and who can blame it.

My first attempt wasn’t a complete solution: order by release date via Graphiti, and then sort them by ranking “locally”. This worked just fine on my development machine. But on production my hubris quickly became evident: what if the recommended books appear on the second page of the Graphiti API? Those won’t get pulled to the top.

The solution we’ve gone live with is fairly simple, but still quite delicate.

1. Make an API request by BookID, asking for all the recommended books first.
2. Sort these by ranking locally.
3. Make a second API request for page 1 of the books in the typical order.
4. Filter out any from the Recommended list, to avoid them showing up again.
5. Get page 2 of the books when they’re (lazily) requested.
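
A rough sketch of those steps, to make the shape of the workaround concrete. Everything here is a stand-in rather than the production code: FakeBookSite is a hypothetical in-memory replacement for the BookSite API, and first_page, later_page and the rankings hash are names invented for illustration.

```ruby
# Hypothetical stand-ins: a Book record and an in-memory "BookSite".
Book = Struct.new(:id, :title, :release_year)

class FakeBookSite
  def initialize(books)
    @books = books
  end

  # Step 1: look books up by ID. Like the real API, no order is promised.
  def by_ids(ids)
    @books.select { |book| ids.include?(book.id) }
  end

  # Steps 3 and 5: one page of the normal search, ordered by release date.
  def page(number, size:)
    @books.sort_by(&:release_year).each_slice(size).to_a[number - 1] || []
  end
end

# rankings maps book ID => ranking, 1 being the strongest recommendation.
def first_page(api, rankings, page_size:)
  # Steps 1 and 2: fetch the recommended books, then sort them locally.
  recommended = api.by_ids(rankings.keys).sort_by { |book| rankings.fetch(book.id) }
  # Steps 3 and 4: fetch page 1 of the normal search and drop the duplicates.
  normal = api.page(1, size: page_size).reject { |book| rankings.key?(book.id) }
  recommended + normal
end

# Step 5: later pages only need the duplicate filter.
def later_page(api, rankings, number, page_size:)
  api.page(number, size: page_size).reject { |book| rankings.key?(book.id) }
end

books = (1..6).map { |id| Book.new(id, "Book #{id}", 2000 + id) }
api = FakeBookSite.new(books)
rankings = { 5 => 1, 2 => 2 }

first_page(api, rankings, page_size: 3).map(&:id)    # => [5, 2, 1, 3]
later_page(api, rankings, 2, page_size: 3).map(&:id) # => [4, 6]
```

Even in this toy version the first payload holds four books although the page size is three, and any later page made up entirely of recommended books would come back empty after filtering.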

One thing I’m nervous about here is that this only works because I know there are typically a small number of Recommendations. Fewer than 16 usually, which can be a large page, but not awful.

The other disadvantage is that the first payload of books the user sees is a large one: potentially (RecommendedBooks.count + NormalBookSearch.count). And then after that, it’s possible that the second page is entirely empty! Our lazy loading will handle that, but the user will notice a delay whilst we load and throw away an entire page of results.

So, other ideas?

Would it be awful to let BookSite know about the Recommendations? That way, we could continue living in bliss. Graphiti can be made aware of the custom behaviour around this sort order, and nothing special has to happen on BookListSite to get pagination or filtering working.

The negatives here are largely around feature creep, I think. As well as the complexities of keeping the RecommendedBooks lists in sync between the two services. But ultimately, it just isn’t BookSite’s responsibility. (OR IS IT???)

Smarter pagination state. I think the real fix here might be having BookListSite be cleverer with pagination. Instead of continuing to think in BookSite API result pages, we should be thinking about BookSite Books and using an enumerator on those, hiding away how we got them in the first place.

Instead of the conversation being the 5 steps above, we should instead be asking a BookEnumerator (or something) for “the next book”, or “the next 12 books”. That Enumerator can keep track of where it’s getting them from. It could have an array of endpoints to exhaust:

bookLocations: ['books?filter[id]=12,87,33,21,...&filter[author]=Orwell', 'books?filter[author]=Orwell&sort=release_date'],
currentLocationIndex: 0,
nextPage: 'books?filter[id]=12,87,33,21&page[number]=2'

Once the first bookLocation has run out of books to return, the Enumerator can increment currentLocationIndex, and start on results from the next location.

When asked for the next 10 items, they may come from two different pages or endpoints. Completely transparently! That means that BookListSite can keep asking for results until it has a page-full to send to the user.
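
To make the idea a little more concrete, here is a minimal sketch built on Ruby’s Enumerator. It rests on assumptions: FakeLocation is a hypothetical stand-in for an endpoint (anything that answers fetch_page(number) with an array, empty once exhausted), and next_books is an invented method name. State lives in the Ruby process only, so this says nothing yet about surviving between HTTP requests.

```ruby
# Hypothetical endpoint: serves pre-sliced pages and counts requests made.
class FakeLocation
  attr_reader :requests

  def initialize(items, per_page:)
    @pages = items.each_slice(per_page).to_a
    @requests = 0
  end

  def fetch_page(number)
    @requests += 1
    @pages[number - 1] || []
  end
end

class BookEnumerator
  def initialize(locations)
    # Pages are only fetched when the consumer actually needs another book.
    @books = Enumerator.new do |yielder|
      locations.each do |location|
        page_number = 1
        loop do
          page = location.fetch_page(page_number)
          break if page.empty? # this location is exhausted, move to the next
          page.each { |book| yielder << book }
          page_number += 1
        end
      end
    end
  end

  # Returns up to `count` books, continuing from where the last call stopped.
  def next_books(count)
    taken = []
    count.times { taken << @books.next }
    taken
  rescue StopIteration
    taken # every location has run dry; return whatever was collected
  end
end

recommended = FakeLocation.new([12, 87, 33], per_page: 2)
by_author = FakeLocation.new((100..110).to_a, per_page: 4)
books = BookEnumerator.new([recommended, by_author])

books.next_books(5) # => [12, 87, 33, 100, 101]
books.next_books(5) # => [102, 103, 104, 105, 106]
```

The first call crosses from one location to the other without the caller noticing. The price in this sketch is one extra request per location, because an empty page is the only signal that a location has run out.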

I’ll try implementing something like this tomorrow. Please do shout out if there are other, BETTER patterns. I have to run to the gym now in the terrible rain, so I must apologise for the even-less-than-usual editing in this post.

Tomorrow

Ha! Tomorrow! “Tomorrow!” It’s the 19th of November now.

I managed to get in a few hours working on this problem today and gosh, it’s complex. Let’s recap the requirements:

1. We’ve got two sites: a database API called BookSite and a user-facing one called BookListSite. BookListSite has no metadata about books, but does let users search through them by making calls to BookSite.
2. There’s a page on BookListSite which shows users books by authors they’ve collected. The user can choose to order these by a number of facets: author, release date, etc.
   - Using an API, we can ask for an author’s books and have them ordered for us by BookSite.
3. A new product is RecommendedBooks. Recommended book data is stored on BookListSite. So, to get the metadata for those, we have to send a long list of IDs. Good news is that there are usually only 16 recommended books, so they can usually be collected in one page.
4. We want the RecommendedBooks to appear at the top of the results pages.
5. These need to be ordered by a special ‘recommendation ranking’.
6. If a recommended book appears in the normal search results, only display it as recommended. Never duplicated.

I’ve made an attempt at the smarter pagination idea mentioned above.

With the code I have, we can now tell BookSearchPaginator the two queries we have, and it will seamlessly paginate through them.

You can see the code in this gist, if you’re curious.

all_recommendations = BookSiteJsonApiClient::Book.where(id: [23, 11, 88, 19])
all_sherlock = BookSiteJsonApiClient::Book.where(author: 'Sherlock').order(:release_date)

paginator = BookSearchPaginator.new([all_recommendations, all_sherlock]) do
  # some auth stuff; yield
end

results, cursor = paginator.next(page_size: 20)

This will return the four recommended books first, and the remaining 16 will be Sherlock books. The cursor might look like:

{
  current_query_index: 1,
  current_page_number: 1,
  current_item: 16
}

The cursor can be passed back into the paginator to get the next set of books:

results, new_cursor = paginator.next(page_size: 20, cursor: cursor)

Thanks to the information in the cursor, the paginator knows where to continue from. The idea would be that this cursor can be encoded (and ideally signed, so it can’t be tampered with) and sent to the browser, which would send it back when asking for the next page.
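
A minimal sketch of how a cursor like the one above might travel to the browser and back, using only Ruby’s standard library. The names encode_cursor, decode_cursor and CURSOR_SECRET are assumptions made for illustration, not part of the paginator in the gist. The cursor is serialised as JSON, Base64-encoded, and given an HMAC signature so a modified token is rejected.

```ruby
require 'json'
require 'openssl'

# Assumption: a server-side secret that would really come from configuration.
CURSOR_SECRET = 'change-me'

def sign_cursor(payload)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), CURSOR_SECRET, payload)
end

# Turns the cursor hash into an opaque "payload.signature" token.
def encode_cursor(cursor)
  payload = [JSON.generate(cursor)].pack('m0') # strict Base64, no newlines
  "#{payload}.#{sign_cursor(payload)}"
end

# Returns the cursor hash, or nil if the token is missing or was altered.
def decode_cursor(token)
  payload, signature = token.to_s.split('.', 2)
  return nil if payload.nil? || signature.nil?
  # Plain == keeps the sketch short; a constant-time comparison would be
  # preferable in production code.
  return nil unless signature == sign_cursor(payload)

  JSON.parse(payload.unpack1('m0'), symbolize_names: true)
end

cursor = { current_query_index: 1, current_page_number: 1, current_item: 16 }
token = encode_cursor(cursor)
decode_cursor(token) == cursor # => true
```

A missing or invalid token decodes to nil, which the caller could treat as “start from the first page”.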

There are two frustrating problems with this though:

1. There’s no guarantee of the order in which our API will return Book.where(id: [23, 11, 88, 19]), so we still need to sort those records. Could this be solved by passing along a post_fetch_sort proc for each endpoint? But what if the collection has 21 items, rather than the expected 16 that we set our page size to? That 17th item won’t be ‘sorted’ until the second page. No good!
2. Requirement 6 is now quite hard to meet. With the problematic code that’s on production right now, it’s easy to say “get the recommended books; get the other books, minus the recommended ones”. However, with the two APIs merged into a single Enumerator of books, it’s not possible to tell “are we still looking at the recommended books, or the normal ones?”
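
One possible direction for both problems, sketched under heavy assumptions rather than taken from the gist. Each query is a hypothetical hash with :pages (what the API would return, page by page) and an optional :post_fetch_sort proc. A query that has a proc is loaded in full before anything is emitted, which only makes sense for a small collection such as the recommendations. Every book ID is emitted together with the index of the query it came from, and IDs already seen are skipped.

```ruby
# Yields [book_id, query_index] pairs across all queries, in order.
def tagged_books(queries)
  Enumerator.new do |yielder|
    seen_ids = {}
    queries.each_with_index do |query, query_index|
      pages = query[:pages]
      if query[:post_fetch_sort]
        # Load the whole (small) collection first, so a 17th or 21st item is
        # ranked alongside the rest rather than left for a later page.
        pages = [query[:post_fetch_sort].call(pages.to_a.flatten(1))]
      end

      pages.each do |page|
        page.each do |id|
          next if seen_ids[id] # already emitted by an earlier query
          seen_ids[id] = true
          yielder << [id, query_index]
        end
      end
    end
  end
end

rankings = { 23 => 1, 11 => 2, 88 => 3, 19 => 4 }
queries = [
  { pages: [[19, 88], [11, 23]], # the API's order is arbitrary
    post_fetch_sort: ->(ids) { ids.sort_by { |id| rankings.fetch(id) } } },
  { pages: [[1, 11, 2], [3, 88, 4]] } # already ordered by release date
]

tagged_books(queries).first(5)
# => [[23, 0], [11, 0], [88, 0], [19, 0], [1, 1]]
```

The query index answers “recommended or normal?” for each book. The seen_ids hash only lives for one enumeration, though. Across separate requests it would have to be rebuilt, perhaps from the recommended IDs that BookListSite already stores.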

Is it time to call this feature too complex?

There are other ways to present RecommendedBooks: not splicing them in with non-recommended listings, for instance. Maybe this is a problem to solve with a Product Owner, rather than one that needs more code.