Introducing superlifter: DataLoader for GraphQL in Clojure (Part 1)


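This is the first half of an introduction to superlifter, a library that brings DataLoader to GraphQL in Clojure. In this part, I walk through building a Lacinia server and explain the N+1 problem that GraphQL servers commonly run into.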

Greetings
What DataLoader is in GraphQL
Building a GraphQL server with Lacinia Pedestal
Reproducing the N+1 problem
Summary
The usual

Greetings

Hello! I'm @atsfour from Opt Technologies. I used to work as a manager (see my earlier post), but my wish to stay hands-on with technology won out, and I now develop a GraphQL API in Clojure.

I've been writing Clojure for about a year, and our product uses Lacinia and Pedestal.

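Apollo is the best-known name in GraphQL, but in this article I'll walk through a concrete DataLoader implementation using a sample built with Lacinia, the Clojure GraphQL library we use at work, and superlifter, which provides DataLoader functionality.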

What DataLoader is in GraphQL

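Plenty of other articles cover GraphQL itself, so I'll skip that here. The explanation on the official GraphQL website (in English) is very clear, and it alone should be enough to grasp the basics. (As someone who struggles with English, my recent habit is to read the original side by side with Google Translate, which keeps getting more accurate.)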

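A big attraction of GraphQL is that API clients can query related data as deeply as they like, but supporting such flexible queries makes the server side prone to the N+1 problem: one query fetches a list of N items, and resolving a related field then issues one more query per item, N+1 queries in total. DataLoader is a leading countermeasure to this problem.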

Building a GraphQL server with Lacinia Pedestal

Since the focus here is DataLoader, I'll keep the server setup minimal.

The sample code for the server built for this article is available here.

For setting up a Lacinia Pedestal server, this article by my colleague @lagénorhynque is a good reference. It is a little old and some parts have since been updated, but most of it should still work as is.

I used that article as a reference while building a new minimal server from scratch.

Configuration

Here are the settings for the minimal server.

resources/lacinia_superlifter_sample/config.edn

{:duct.profile/base
 {:duct.core/project-ns  lacinia-superlifter-sample

  :duct.server/pedestal
  {:base-service #ig/ref :lacinia-superlifter-sample.graphql/service
   :service #:io.pedestal.http{:type :jetty
                               :join? true
                               :host #duct/env "SERVER_HOST"
                               :port #duct/env ["SERVER_PORT" Int :or 8888]}}

  :lacinia-superlifter-sample.graphql/schema
  {:path "lacinia_superlifter_sample/schema.graphql"}

  :lacinia-superlifter-sample.graphql/service
  {:schema #ig/ref :lacinia-superlifter-sample.graphql/schema
   :options {:api-path "/graphql"
             :ide-path "/"
             :asset-path "/assets/graphiql"
             :app-context {}
             :env :prod}}}

 :duct.profile/dev #duct/include "dev"
 :duct.profile/prod {}

 :duct.module/pedestal {}}

src/clj/lacinia_superlifter_sample/graphql.clj

(ns lacinia-superlifter-sample.graphql
  (:require
    [clojure.java.io :as io]
    [com.walmartlabs.lacinia.parser.schema :as parser.schema]
    [com.walmartlabs.lacinia.pedestal2 :as lacinia.pedestal2]
    [com.walmartlabs.lacinia.schema :as schema]
    [com.walmartlabs.lacinia.util :as util]
    [integrant.core :as ig]
    [io.pedestal.http :as http]
    [lacinia-superlifter-sample.resolver :as resolver]))


;; Load the SDL file from the classpath, parse it, and attach a resolver map
;; (empty for now; the resolver maps shown later are swapped in here).
(defmethod ig/init-key ::schema
  [_ {:keys [path]}]
  (-> (io/resource path)
      slurp
      (parser.schema/parse-schema)
      (util/inject-resolvers {})))


;; api-path (POST) executes GraphQL requests; ide-path (GET) serves GraphiQL,
;; plus the static asset routes GraphiQL needs.
(defn routes
  [interceptors {:keys [api-path ide-path asset-path]
                 :as options}]
  (into #{[api-path :post interceptors
           :route-name ::graphql-api]
          [ide-path :get (lacinia.pedestal2/graphiql-ide-handler options)
           :route-name ::graphiql-ide]}
        (lacinia.pedestal2/graphiql-asset-routes asset-path)))


;; Compile the schema and build the Pedestal service map
;; (referenced as :base-service by :duct.server/pedestal in config.edn).
(defmethod ig/init-key ::service
  [_ {:keys [schema options]}]
  (let [compiled-schema (schema/compile schema)
        interceptors (lacinia.pedestal2/default-interceptors compiled-schema (:app-context options))]
    (lacinia.pedestal2/enable-graphiql
      {:env (:env options)
       ::http/routes (routes interceptors options)
       ::http/allowed-origins (constantly true)
       ::http/container-options {}})))

We build up the GraphQL functionality by swapping different resolver maps into the inject-resolvers call (the empty map {} above).
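As a rough model of what swapping a resolver map into inject-resolvers means, the sketch below attaches each function in a resolver map to the field named by its :Type/field key, on a hand-written schema map. inject-resolvers-sketch is a hypothetical simplification, not Lacinia's implementation, and the schema map only loosely resembles what parse-schema actually produces.

```clojure
;; Hypothetical simplification of resolver injection: for each :Type/field key,
;; store the function under :resolve in that field's definition.
(defn inject-resolvers-sketch
  [schema resolvers]
  (reduce-kv
    (fn [acc k resolver]
      (let [type-name  (keyword (namespace k)) ; :Person/friends -> :Person
            field-name (keyword (name k))]     ; :Person/friends -> :friends
        (assoc-in acc [:objects type-name :fields field-name :resolve] resolver)))
    schema
    resolvers))

;; Hand-written schema map, loosely shaped like the article's SDL.
(def schema
  {:objects {:Query  {:fields {:persons {:type '(non-null (list (non-null :Person)))}}}
             :Person {:fields {:id      {:type 'Int}
                               :name    {:type 'String}
                               :friends {:type '(non-null (list (non-null :Person)))}}}}})

(def injected
  (inject-resolvers-sketch schema
                           {:Query/persons  (fn [_ _ _] [])
                            :Person/friends (fn [_ _ _] [])}))
```

Fields not named in the resolver map (here :id and :name) are left without a :resolve entry, and with an empty map nothing is attached at all.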

GraphQL schema

Here is the GraphQL schema used in this article. It is very simple, but the key point is that it allows queries that dig through the network, such as the friends of friends.

schema {
  query: Query
}

type Query {
  persons: [Person!]!
}

type Person {
  id: Int
  name: String
  friends: [Person!]!
}

Data

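Since this is a minimal implementation, I skip the database and hard-code both the data and the data access. person-friendships forms a directed graph of friendships: each entry means that person_id counts friend_id as a friend, not necessarily the other way around.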

(This data is fictional, so please don't read too much into the relationships.)

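;; Namespace name taken from the logger name in the logs below; log/info is
;; assumed to come from clojure.tools.logging.
(ns lacinia-superlifter-sample.repository
  (:require [clojure.tools.logging :as log]))


(def ^:const ^:private persons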
  [{:id 1
    :name "夜神 月"}
   {:id 2
    :name "L"}
   {:id 3
    :name "夜神 総一郎"}
   {:id 4
    :name "弥 海砂"}
   {:id 5
    :name "ニア"}])


(def ^:const ^:private person-friendships
  [{:person_id 1 :friend_id 2}
   {:person_id 1 :friend_id 4}
   {:person_id 2 :friend_id 1}
   {:person_id 2 :friend_id 3}
   {:person_id 2 :friend_id 5}
   {:person_id 3 :friend_id 1}
   {:person_id 3 :friend_id 2}
   {:person_id 4 :friend_id 1}
   {:person_id 4 :friend_id 2}
   {:person_id 4 :friend_id 3}
   {:person_id 5 :friend_id 2}])

Data access

The data access functions are defined just as simply. Think of each of them as sending SQL to an RDB.

The functions that correspond to SQL queries log every call, so we can see how many SQL queries are issued for a given GraphQL query.

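;; id -> person map (update-vals requires Clojure 1.11+)
(def ^:private person-map
  (-> (group-by :id persons)
      (update-vals first)))


;; person id -> seq of that person's friends (as person maps)
(def ^:private friendship-map
  (-> (group-by :person_id person-friendships)
      (update-vals #(map (fn [{:keys [friend_id]}] (get person-map friend_id)) %))))


;; Log every "SQL query" with its arguments,
;; e.g. "Repository accessed. name: fetch-friends-by-id args: [1]"
(defn- db-access-log
  [name & args]
  (log/info (str "Repository accessed. name: " name " args: " (vec args))))


;; Roughly SELECT * FROM persons
(defn list-persons
  []
  (db-access-log "list-persons")
  persons)


;; Roughly: select the friends of one person by person id
(defn fetch-friends-by-person-id
  [id]
  (db-access-log "fetch-friends-by-id" id)
  (get friendship-map id))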
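Reading the log is enough for this article, but in automated tests it may be handier to count accesses directly. The following self-contained variant of the repository replaces logging with a hypothetical access-count atom. To stay short it uses a trimmed subset of the article's data (three persons, five friendships).

```clojure
;; Hypothetical counter used instead of log/info, so the number of
;; "SQL queries" can be asserted in tests.
(def access-count (atom 0))

;; Trimmed subset of the article's data.
(def ^:private persons
  [{:id 1 :name "夜神 月"}
   {:id 2 :name "L"}
   {:id 4 :name "弥 海砂"}])

(def ^:private person-friendships
  [{:person_id 1 :friend_id 2}
   {:person_id 1 :friend_id 4}
   {:person_id 2 :friend_id 1}
   {:person_id 4 :friend_id 1}
   {:person_id 4 :friend_id 2}])

;; Same construction as in the article (update-vals requires Clojure 1.11+).
(def ^:private person-map
  (-> (group-by :id persons)
      (update-vals first)))

(def ^:private friendship-map
  (-> (group-by :person_id person-friendships)
      (update-vals #(map (fn [{:keys [friend_id]}] (get person-map friend_id)) %))))

(defn- db-access! [] (swap! access-count inc))

(defn list-persons []
  (db-access!)
  persons)

(defn fetch-friends-by-person-id [id]
  (db-access!)
  (get friendship-map id))

;; What the naive resolvers do for persons { friends { ... } }:
;; one list call, then one friends call per person (mapv forces them eagerly).
(defn naive-persons-with-friends []
  (mapv #(assoc % :friends (vec (fetch-friends-by-person-id (:id %))))
        (list-persons)))
```

Resetting the atom to 0 and calling naive-persons-with-friends leaves it at 4 with this data: one list call plus one call for each of the three persons.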

Reproducing the N+1 problem

About Lacinia resolvers

Here is a brief explanation of Lacinia resolvers; see the official documentation for details.

In Lacinia, you define functions like the following and map each one to a field.

(def resolver-map
  {:Query/some (fn [context args value] resolved-value)
   :SomeType/some_field (fn [context args value] resolved-value)
   :SomeType/other_field ...})

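The arguments of each function mean the following:

context: a slot for application-wide state and per-request information, such as the GraphQL schema, the received query, and DB connection details
args: the values passed as field arguments to that field (not used here)
value: the already-resolved value of the parent that owns the field (for some_field, its parent SomeType)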

When some_field of SomeType is requested, its resolver runs, and the result becomes the value of that field. For example, consider the following query:

query Persons {
  persons {
    id
    name
  }
}

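First, the resolver for the top-level Query/persons runs. Once the persons value is resolved, Person/id and Person/name are resolved in turn. When a field has no resolver defined, Lacinia uses the value under that field's key in the parent's resolved value if it is present, and returns null otherwise.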
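To make this calling convention concrete, here is a toy resolution walk in plain Clojure. It is not how Lacinia works internally (no query parsing, type checking, or error handling), and resolve-selection, the vector-based selection format, and field-types are all hypothetical. It only illustrates the two rules above: an explicit resolver is called as (resolver context args value) with the parent's resolved value, and a field without a resolver is looked up by key in that value, yielding nil when absent.

```clojure
;; Object type returned by each object-valued field (hand-written, hypothetical).
(def field-types {:Query/persons  :Person
                  :Person/friends :Person})

(declare resolve-selection)

(defn- resolve-field
  "Resolves `field` of `type-name` against the parent's resolved `value`."
  [resolvers context type-name value field sub-selection]
  (let [k        (keyword (name type-name) (name field))
        resolver (get resolvers k)
        ;; Explicit resolver: called as (resolver context args value).
        ;; No resolver: look the field up by key in the parent value (nil if absent).
        resolved (if resolver
                   (resolver context {} value)
                   (get value field))]
    (if (and sub-selection (some? resolved))
      (let [child-type (get field-types k)]
        (if (sequential? resolved)
          (mapv #(resolve-selection resolvers context child-type % sub-selection) resolved)
          (resolve-selection resolvers context child-type resolved sub-selection)))
      resolved)))

(defn resolve-selection
  "Selection format (hypothetical): a vector of field keywords, where a map
  {field sub-selection} descends into an object-valued field."
  [resolvers context type-name value selection]
  (reduce (fn [acc item]
            (if (map? item)
              (reduce-kv (fn [m field sub]
                           (assoc m field (resolve-field resolvers context type-name value field sub)))
                         acc
                         item)
              (assoc acc item (resolve-field resolvers context type-name value item nil))))
          {}
          selection))

;; Tiny data set and resolvers shaped like the article's naive resolver map.
(def persons [{:id 1 :name "夜神 月"} {:id 2 :name "L"}])
(def friends-by-id {1 [{:id 2 :name "L"}] 2 [{:id 1 :name "夜神 月"}]})

(def resolvers
  {:Query/persons  (fn [_ _ _] persons)
   :Person/friends (fn [_ _ {:keys [id]}] (get friends-by-id id))})

(resolve-selection resolvers {} :Query nil
                   [{:persons [:id :name {:friends [:id :name]}]}])
;; => {:persons [{:id 1, :name "夜神 月", :friends [{:id 2, :name "L"}]}
;;               {:id 2, :name "L", :friends [{:id 1, :name "夜神 月"}]}]}
```

Note that :id and :name never hit a resolver; they come straight from the maps returned by the parent's resolver, which is why the naive resolver map only needs entries for Query/persons and Person/friends.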

A naive resolver

To start, let's write a simple resolver map.

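;; Query.persons lists everyone; Person.friends is fetched once per parent Person, by its :id.
(def resolver-map
  {:Query/persons (fn [_ _ _] (repository/list-persons))
   :Person/friends (fn [_ _ {:keys [id]}] (repository/fetch-friends-by-person-id id))})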

Once persons are fetched, id and name are already present, so they need no resolvers; friends is not, so it has to be fetched separately. For persons we call list-persons, and for friends we call fetch-friends-by-person-id with the parent Person's id.

Now let's send the following query as is.

query Persons {
  persons {
    id
    name
    friends {
      id
      name
    }
  }
}

The response looks like this:

{
  "data": {
    "persons": [
      {
        "id": 1,
        "name": "夜神 月",
        "friends": [
          {
            "id": 2,
            "name": "L"
          },
          {
            "id": 4,
            "name": "弥 海砂"
          }
        ]
      },
      {
        "id": 2,
        "name": "L",
        "friends": [
          {
            "id": 1,
            "name": "夜神 月"
          },
          {
            "id": 3,
            "name": "夜神 総一郎"
          },
          {
            "id": 5,
            "name": "ニア"
          }
        ]
      },
      {
        "id": 3,
        "name": "夜神 総一郎",
        "friends": [
          {
            "id": 1,
            "name": "夜神 月"
          },
          {
            "id": 2,
            "name": "L"
          }
        ]
      },
      {
        "id": 4,
        "name": "弥 海砂",
        "friends": [
          {
            "id": 1,
            "name": "夜神 月"
          },
          {
            "id": 2,
            "name": "L"
          },
          {
            "id": 3,
            "name": "夜神 総一郎"
          }
        ]
      },
      {
        "id": 5,
        "name": "ニア",
        "friends": [
          {
            "id": 2,
            "name": "L"
          }
        ]
      }
    ]
  }
}

The log at this point looks like this:

INFO  io.pedestal.http - {:msg "POST /graphql", :line 80}
INFO  io.pedestal.http.cors - {:msg "cors request processing", :origin "http://localhost:8888", :allowed true, :line 84}
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: list-persons args: []
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [1]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [3]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [4]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [5]

Friends are looked up once for each of the five Persons, so 5+1 queries are issued.

Why not fetch everything at once?

If we JOIN friends already in list-persons, it looks like we could fetch everything in one go. So let's change the repository to a JOIN-like function and fetch it all at once.

;; JOIN-like variant: attach each person's friends within a single access
(defn list-persons-with-friends
  []
  (db-access-log "list-persons-with-friends")
  (map #(assoc % :friends (get friendship-map (:id %))) persons))

On the resolver side, friends is now fetched together with persons, so the resolver map looks like this:

(def joined-resolver-map
  {:Query/persons (fn [_ _ _] (repository/list-persons-with-friends))})

Sending the same GraphQL query as before returns the same response. Checking the log,

INFO  io.pedestal.http - {:msg "POST /graphql", :line 80}
INFO  io.pedestal.http.cors - {:msg "cors request processing", :origin "http://localhost:8888", :allowed true, :line 84}
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: list-persons-with-friends args: []

we see that the number of queries is down to one. Problem solved? Unfortunately, no.

Next, let's try to fetch friends of friends with the following query.

query Persons {
  persons {
    id
    name
    friends {
      id
      name
      friends {
        id
        name
      }
    }
  }
}

With the resolver that assumes a single level of JOIN, the response is as follows.

{
  "data": null,
  "extensions": {
    "errors": [
      {
        "message": "Non-nullable field was null.",
        "locations": [
          {
            "line": 8,
            "column": 7
          }
        ],
        "path": [
          "persons",
          0,
          "friends",
          0,
          "friends"
        ]
      },
      ...
    ]
  }
}

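In other words, Person.friends is fetched, but the friend entries attached by the JOIN have no friends of their own, so Person.friends.friends resolves to null and violates its non-null [Person!]! type; fetching it would require a second JOIN. Solving the N+1 problem this way means fixing a maximum JOIN depth in advance and returning results joined that many times.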
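The failure can be reproduced in plain Clojure. The friend maps attached by list-persons-with-friends come straight from person-map, so they have :id and :name but no :friends key, and default key lookup on them yields nil for the second-level friends. A minimal sketch, with the article's data trimmed to two persons:

```clojure
;; Trimmed version of the article's data (two persons only).
(def persons [{:id 1 :name "夜神 月"} {:id 2 :name "L"}])
(def person-map (into {} (map (juxt :id identity)) persons))
(def friendship-map {1 [(person-map 2)] 2 [(person-map 1)]})

;; Same idea as list-persons-with-friends: JOIN exactly one level deep.
(defn list-persons-with-friends []
  (map #(assoc % :friends (get friendship-map (:id %))) persons))

;; persons[0].friends[0], the value that persons[0].friends[0].friends is resolved against
(def first-friend
  (-> (list-persons-with-friends) first :friends first))

(:friends first-friend) ;; => nil, hence "Non-nullable field was null."
```

Supporting k levels of nesting this way means nesting the JOIN k times, which is the fixed-maximum-depth approach described above.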

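That may be acceptable in practice in some cases, but one of GraphQL's strengths is that it can flexibly fetch data representing a network like this one. Restricting the shape of the queries clients may send undermines that strength.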

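The naive implementation, on the other hand, fetches this data without any problem.

The log looks like this. Here it takes 1 + 5 + 11 = 17 accesses (one list-persons call, one call per person, and one call per friend entry), so the count multiplies with each level of nesting. Still, the naive implementation has an advantage: it answers queries of any depth.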

INFO  io.pedestal.http - {:msg "POST /graphql", :line 80}
INFO  io.pedestal.http.cors - {:msg "cors request processing", :origin "http://localhost:8888", :allowed true, :line 84}
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: list-persons args: []
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [1]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [4]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [1]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [3]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [5]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [3]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [1]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [4]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [1]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [3]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [5]
INFO  lacinia-superlifter-sample.repository - Repository accessed. name: fetch-friends-by-id args: [2]
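The count in this log can be checked by arithmetic: one list-persons call, one fetch per person at the first level (5), and one fetch per friend entry at the second level, which equals the number of entries in person-friendships (11). The helper below is a hypothetical sketch that generalizes this to any nesting depth by walking the friendship edges.

```clojure
;; The article's friendship edges (person_id -> friend_id).
(def person-friendships
  [{:person_id 1 :friend_id 2} {:person_id 1 :friend_id 4}
   {:person_id 2 :friend_id 1} {:person_id 2 :friend_id 3} {:person_id 2 :friend_id 5}
   {:person_id 3 :friend_id 1} {:person_id 3 :friend_id 2}
   {:person_id 4 :friend_id 1} {:person_id 4 :friend_id 2} {:person_id 4 :friend_id 3}
   {:person_id 5 :friend_id 2}])

(def person-ids [1 2 3 4 5])

;; person id -> vector of friend ids (update-vals requires Clojure 1.11+)
(def friend-ids
  (update-vals (group-by :person_id person-friendships)
               #(mapv :friend_id %)))

(defn naive-access-count
  "Expected repository accesses for persons { friends { ... } } nested
  `depth` levels deep, when each Person.friends resolver makes one call."
  [depth]
  (loop [level 0
         ids   person-ids ; persons whose friends are fetched at this level
         total 1]         ; the single list-persons call
    (if (= level depth)
      total
      (recur (inc level)
             (mapcat friend-ids ids)   ; each friend entry is a parent at the next level
             (+ total (count ids)))))) ; one fetch-friends-by-person-id call per parent

(naive-access-count 1) ;; => 6, matching the first log
(naive-access-count 2) ;; => 17, matching the log above
```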

DataLoader is the approach that strikes a balance between the two: solving the N+1 problem while still supporting arbitrarily nested queries.
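DataLoader itself is covered in the second part, but the batching idea behind it can be sketched here: instead of one fetch per person, collect all the person ids needed at one level of the query and fetch them in a single call. fetch-friends-by-person-ids below is hypothetical (think SELECT ... WHERE person_id IN (...)) and is not superlifter's API; the point is only that the number of accesses then grows with query depth rather than with the number of rows.

```clojure
(def access-count (atom 0))

;; The article's friendships as person id -> friend ids.
(def friend-ids
  {1 [2 4], 2 [1 3 5], 3 [1 2], 4 [1 2 3], 5 [2]})

;; Hypothetical batched repository call: one access for many ids.
(defn fetch-friends-by-person-ids [ids]
  (swap! access-count inc)
  (select-keys friend-ids ids))

(defn batched-access-count
  "Accesses needed for persons { friends { ... } } nested `depth` levels
  deep when each level issues one batched call for all the ids it needs."
  [depth]
  (reset! access-count 1)              ; the single list-persons call
  (loop [level 0
         ids   (keys friend-ids)]      ; all person ids
    (if (or (= level depth) (empty? ids))
      @access-count
      (let [batch (fetch-friends-by-person-ids (distinct ids))]
        (recur (inc level) (mapcat batch ids))))))
```

On the same data, (batched-access-count 1) returns 2 and (batched-access-count 2) returns 3, compared with 6 and 17 for the naive resolvers.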

Summary

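So far, I've explained how the N+1 problem arises and why a simple JOIN has a hard time solving it. This post has already grown quite long, so I'll leave the actual explanation of DataLoader for next time.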

In the second part, I plan to cover the main topic, implementing DataLoader with superlifter, and what that approach means.

The usual

Opt Technologies is hiring engineers. Casual interviews are also available, so feel free to apply via the link below.

Opt Technologies Magazine

Opt Technologies' official web magazine