Graph Machine Learning using TensorFlow

Intro

We have shown in related articles how StellarGraph can be used for node and link prediction using diverse algorithms. All these algorithms effectively turn a graph structure into a flatter (tabular) structure so that one can use traditional machine learning algorithms. For example, a random graph walk can collect information about the topology of a graph, and this data can be added to the existing payload attached to a node or an edge. Using these intermediate ‘tricks’ one can in principle use any of the existing machine learning approaches and frameworks. Keras and TensorFlow are no exception. You only need to work your way towards appropriate input and output adapters to ingest graph data.
TensorFlow has a separate development branch dedicated to graph learning which they call Neural Structured Learning (NSL). Much like their TensorFlow Probability framework for probabilistic reasoning and other TensorFlow extensions, it’s a mixed bag: it lets you get things done, but it also feels unpolished and the API is impenetrable. On the other hand, if your pipeline relies on TensorFlow code then this can be a way to improve your models by including graph data (like knowledge graphs or ontologies).
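
To make the random-walk idea concrete, here is a minimal sketch in plain Python. The toy graph and the helper names `random_walk` and `walk_features` are hypothetical and only serve as illustration; this is not how StellarGraph or NSL implement it. Each node ends up with a flat vector of visit frequencies that could be appended to its existing features.

```python
import random
from collections import Counter

# Toy undirected graph as an adjacency dict (hypothetical data).
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}


def random_walk(graph, start, length, rng):
    """Returns a list of `length` nodes visited by a uniform random walk."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def walk_features(graph, node, num_walks=20, length=4, seed=0):
    """Flattens the neighborhood of `node` into one visit frequency per node."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        # Skip the start node itself; only count where the walk goes.
        visits.update(random_walk(graph, node, length, rng)[1:])
    total = sum(visits.values())
    return [visits[n] / total for n in sorted(graph)]


features = walk_features(GRAPH, "d")
print(dict(zip(sorted(GRAPH), features)))
```

The resulting list has one entry per node and sums to one, so it can be treated like any other tabular feature column.
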
In this article the NSL extension is used to approach our favorite Cora dataset. The takeaway from the explanation is this:

  • in comparison to frameworks like StellarGraph (and other frameworks specifically designed to apply machine learning on graphs) the amount of Keras/TensorFlow code needed to achieve similar results is considerable. You have a lot of flexibility, but it feels more like uncharted territory than flexibility.
  • the data transformation necessary to make things happen is way more complex as well.
  • if you wish to include graph learning in your (business) projects you are better off with StellarGraph
  • the increased accuracy obtained by including graph data in the ‘normal’ data is not spectacular but can be significant in some domains (say cancer research or predictive analytics).

Setup and imports

    !pip install tf-nightly==2.2.0.dev20200119
    !pip install neural-structured-learning
    from __future__ import absolute_import, division, print_function, unicode_literals
    import neural_structured_learning as nsl
    import tensorflow as tf

The Cora dataset

We have used this dataset over and over again in previous articles, and there is a separate article explaining in detail how to download it and how to interpret the data.
In contrast with StellarGraph, the Cora set needs to be converted into the TFRecord format:

  1. Generate neighbor features using the original node features and the graph.
  2. Generate train and test data splits containing tf.train.Example instances.
  3. Persist the resulting train and test data in the TFRecord format.
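
Step 1 can be pictured with plain dictionaries before looking at the real script. The sketch below is a simplification under stated assumptions: `merge_neighbors` is a hypothetical helper, records are ordinary dicts instead of tf.train.Example protos, and ties between equal weights are kept in input order. Only the `NL_nbr_<i>_` and `NL_num_nbrs` naming is taken from the script itself.

```python
def merge_neighbors(seed, neighbors, max_nbrs=None):
    """Copies `seed` and adds prefixed features for up to `max_nbrs` neighbors.

    `neighbors` is a list of (weight, features) pairs.
    """
    merged = dict(seed)
    # Highest weight first; a slice with None keeps all neighbors.
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:max_nbrs]
    merged["NL_num_nbrs"] = len(top)
    for index, (weight, features) in enumerate(top):
        prefix = "NL_nbr_{}_".format(index)
        merged[prefix + "weight"] = weight
        for name, value in features.items():
            merged[prefix + name] = value
    return merged


seed = {"words": [0, 1, 0], "label": 2}
neighbors = [(1.0, {"words": [1, 1, 0]}), (1.0, {"words": [0, 0, 1]})]
row = merge_neighbors(seed, neighbors, max_nbrs=1)
print(sorted(row))
```

Each training record thus becomes one wide row: its own features plus a copy of each selected neighbor's features and the corresponding edge weight.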

The code necessary to do this is straightforward and convoluted at the same time:

    """Tool that preprocesses Cora data for Graph Keras trainers.
    The Cora dataset can be downloaded from:
    https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz
    In particular, this tool does the following:
    (a) Converts Cora data (cora.content) into TF Examples,
    (b) Parses the Cora citation graph (cora.cites),
    (c) Merges/combines the TF Examples and the graph, and
    (d) Writes the training and test data in TF Record format.
    The 'cora.content' has the following TSV format:
      publication_id<TAB>word_1<TAB>word_2<TAB>...<TAB>publication_label
    Each line of cora.content is a publication that:
    - Has an integer 'publication_id'
    - Described by a 0/1-valued word vector indicating the absence/presence of the
      corresponding word from the dictionary. In other words, each 'word_k' is
      either 0 or 1.
    - Has a string 'publication_label' representing the publication category.
    The 'cora.cites' is a TSV file that specifies a graph as a set of edges
    representing citation relationships among publications. 'cora.cites' has the
    following TSV format:
      source_publication_id<TAB>target_publication_id
    Each line of cora.cites represents an edge that 'source_publication_id' cites
    'target_publication_id'.
    This tool first converts all the 'cora.content' into TF Examples. Then for
    training data, this tool merges into each labeled Example the features of that
    Example's neighbors according to that instance's edges in the graph. Finally,
    the merged training examples are written to a TF Record file. The test data
    will be written to a TF Record file w/o joining with the neighbors.
    Sample usage:
    $ python preprocess_cora_dataset.py --max_nbrs=5
    """
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    import collections
    import random
    import time
    from absl import app
    from absl import flags
    from absl import logging
    from neural_structured_learning.tools import graph_utils
    import six
    import tensorflow as tf
    FLAGS = flags.FLAGS
    FLAGS.showprefixforinfo = False
    flags.DEFINE_string(
        'input_cora_content', '/tmp/cora/cora.content',
        """Input file for Cora content that contains ID, words and labels.""")
    flags.DEFINE_string('input_cora_graph', '/tmp/cora/cora.cites',
                        """Input file for Cora citation graph in TSV format.""")
    flags.DEFINE_integer(
        'max_nbrs', None,
        'The maximum number of neighbors to merge into each labeled Example.')
    flags.DEFINE_float(
        'train_percentage', 0.8,
        """The percentage of examples to be created as training data. The rest
        are created as test data.""")
    flags.DEFINE_string(
        'output_train_data', '/tmp/cora/train_merged_examples.tfr',
        """Output file for training data merged with graph in TF Record format.""")
    flags.DEFINE_string('output_test_data', '/tmp/cora/test_examples.tfr',
                        """Output file for test data in TF Record format.""")
    def _int64_feature(*value):
      """Returns int64 tf.train.Feature from a bool / enum / int / uint."""
      return tf.train.Feature(int64_list=tf.train.Int64List(value=list(value)))
    def parse_cora_content(in_file, train_percentage):
      """Converts the Cora content (in TSV) to `tf.train.Example` instances.
      This function parses Cora content (in TSV), converts string labels to integer
      label IDs, randomly splits the data into training and test sets, and returns
      the training and test sets as outputs.
      Args:
        in_file: A string indicating the input file path.
        train_percentage: A float indicating the percentage of training examples
          over the dataset.
      Returns:
        train_examples: A dict with keys being example IDs (string) and values being
        `tf.train.Example` instances.
        test_examples: A dict with keys being example IDs (string) and values being
        `tf.train.Example` instances.
      """
      # Provides a mapping from string labels to integer indices.
      label_index = {
          'Case_Based': 0,
          'Genetic_Algorithms': 1,
          'Neural_Networks': 2,
          'Probabilistic_Methods': 3,
          'Reinforcement_Learning': 4,
          'Rule_Learning': 5,
          'Theory': 6,
      }
      # Fixes the random seed so the train/test split can be reproduced.
      random.seed(1)
      train_examples = {}
      test_examples = {}
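      # The 'U' flag was removed in Python 3.11; plain text mode already
      # handles universal newlines in Python 3.
      with open(in_file, 'r') as cora_content: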
        for line in cora_content:
          entries = line.rstrip('\n').split('\t')
          # entries contains [ID, Word1, Word2, ..., Label]; 'Words' are 0/1 values.
          words = map(int, entries[1:-1])
          features = {
              'words': _int64_feature(*words),
              'label': _int64_feature(label_index[entries[-1]]),
          }
          example_features = tf.train.Example(
              features=tf.train.Features(feature=features))
          example_id = entries[0]
          if random.uniform(0, 1) <= train_percentage:  # for train/test split.
            train_examples[example_id] = example_features
          else:
            test_examples[example_id] = example_features
      return train_examples, test_examples
    def _join_examples(seed_exs, nbr_exs, graph, max_nbrs):
      r"""Joins the `seeds` and `nbrs` Examples using the edges in `graph`.
      This generator joins and augments each labeled Example in `seed_exs` with the
      features of at most `max_nbrs` of the seed's neighbors according to the given
      `graph`, and yields each merged result.
      Args:
        seed_exs: A `dict` mapping node IDs to labeled Examples.
        nbr_exs: A `dict` mapping node IDs to unlabeled Examples.
        graph: A `dict`: source -> (target, weight).
        max_nbrs: The maximum number of neighbors to merge into each seed Example,
          or `None` if the number of neighbors per node is unlimited.
      Yields:
        The result of merging each seed Example with the features of its neighbors,
        as described by the module comment.
      """
      # A histogram of the out-degrees of all seed Examples. The keys of this dict
      # range from 0 to 'max_nbrs' (inclusive) if 'max_nbrs' is finite.
      out_degree_count = collections.Counter()
      def has_ex(node_id):
        """Returns true iff 'node_id' is in the 'seed_exs' or 'nbr_exs dict'."""
        result = (node_id in seed_exs) or (node_id in nbr_exs)
        if not result:
          logging.warning('No tf.train.Example found for edge target ID: "%s"',
                          node_id)
        return result
      def lookup_ex(node_id):
        """Returns the Example from `seed_exs` or `nbr_exs` with the given ID."""
        return seed_exs[node_id] if node_id in seed_exs else nbr_exs[node_id]
      def join_seed_to_nbrs(seed_id):
        """Joins the seed with ID `seed_id` to its out-edge graph neighbors.
        This also has the side-effect of maintaining the `out_degree_count`.
        Args:
          seed_id: The ID of the seed Example to start from.
        Returns:
          A list of (nbr_wt, nbr_id) pairs (in decreasing weight order) of the
          seed Example's top `max_nbrs` neighbors. So the resulting list will have
          size at most `max_nbrs`, but it may be less (or even empty if the seed
          Example has no out-edges).
        """
        nbr_dict = graph[seed_id] if seed_id in graph else {}
        nbr_wt_ex_list = [(nbr_wt, nbr_id)
                          for (nbr_id, nbr_wt) in six.iteritems(nbr_dict)
                          if has_ex(nbr_id)]
        result = sorted(nbr_wt_ex_list, reverse=True)[:max_nbrs]
        out_degree_count[len(result)] += 1
        return result
      def merge_examples(seed_ex, nbr_wt_ex_list):
        """Merges neighbor Examples into the given seed Example `seed_ex`.
        Args:
          seed_ex: A labeled Example.
          nbr_wt_ex_list: A list of (nbr_wt, nbr_id) pairs (in decreasing nbr_wt
            order) representing the neighbors of 'seed_ex'.
        Returns:
          The Example that results from merging the features of the neighbor
          Examples (as well as creating a feature for each neighbor's edge weight)
          into `seed_ex`. See the `join()` description above for how the neighbor
          features are named in the result.
        """
        # Make a deep copy of the seed Example to augment.
        merged_ex = tf.train.Example()
        merged_ex.CopyFrom(seed_ex)
        # Add a feature for the number of neighbors.
        merged_ex.features.feature['NL_num_nbrs'].int64_list.value.append(
            len(nbr_wt_ex_list))
        # Enumerate the neighbors, and merge in the features of each.
        for index, (nbr_wt, nbr_id) in enumerate(nbr_wt_ex_list):
          prefix = 'NL_nbr_{}_'.format(index)
          # Add the edge weight value as a new singleton float feature.
          weight_feature = prefix + 'weight'
          merged_ex.features.feature[weight_feature].float_list.value.append(nbr_wt)
          # Copy each of the neighbor Examples features, prefixed with 'prefix'.
          nbr_ex = lookup_ex(nbr_id)
          for (feature_name, feature_val) in six.iteritems(nbr_ex.features.feature):
            new_feature = merged_ex.features.feature[prefix + feature_name]
            new_feature.CopyFrom(feature_val)
        return merged_ex
      start_time = time.time()
      logging.info(
          'Joining seed and neighbor tf.train.Examples with graph edges...')
      for (seed_id, seed_ex) in six.iteritems(seed_exs):
        yield merge_examples(seed_ex, join_seed_to_nbrs(seed_id))
      logging.info(
          'Done creating and writing %d merged tf.train.Examples (%.2f seconds).',
          len(seed_exs), (time.time() - start_time))
      logging.info('Out-degree histogram: %s', sorted(out_degree_count.items()))
    def main(unused_argv):
      start_time = time.time()
      # Parses Cora content into TF Examples.
      train_examples, test_examples = parse_cora_content(FLAGS.input_cora_content,
                                                         FLAGS.train_percentage)
      graph = graph_utils.read_tsv_graph(FLAGS.input_cora_graph)
      graph_utils.add_undirected_edges(graph)
      # Joins 'train_examples' with 'graph'. 'test_examples' are used as *unlabeled*
      # neighbors for transductive learning purpose. In other words, the labels of
      # test_examples are not used.
      with tf.io.TFRecordWriter(FLAGS.output_train_data) as writer:
        for merged_example in _join_examples(train_examples, test_examples, graph,
                                             FLAGS.max_nbrs):
          writer.write(merged_example.SerializeToString())
      logging.info('Output training data written to TFRecord file: %s.',
                   FLAGS.output_train_data)
      # Writes 'test_examples' out w/o joining with the graph since graph
      # regularization is used only during training, not testing/serving.
      with tf.io.TFRecordWriter(FLAGS.output_test_data) as writer:
        for example in six.itervalues(test_examples):
          writer.write(example.SerializeToString())
      logging.info('Output test data written to TFRecord file: %s.',
                   FLAGS.output_test_data)
      logging.info('Total running time: %.2f minutes.',
                   (time.time() - start_time) / 60.0)
    if __name__ == '__main__':
      # Ensures TF 2.0 behavior even if TF 1.X is installed.
      tf.compat.v1.enable_v2_behavior()
      app.run(main)

With all of this in place you can run the script on the Cora data like so:

    !python preprocess_cora_dataset.py \
    --input_cora_content=/tmp/cora/cora.content \
    --input_cora_graph=/tmp/cora/cora.cites \
    --max_nbrs=5 \
    --output_train_data=/tmp/cora/train_merged_examples.tfr \
    --output_test_data=/tmp/cora/test_examples.tfr
Reading graph file: /tmp/cora/cora.cites...
Done reading 5429 edges from: /tmp/cora/cora.cites (0.01 seconds).
Making all edges bi-directional...
Done (0.00 seconds). Total graph nodes: 2708
Joining seed and neighbor tf.train.Examples with graph edges...
Done creating and writing 2155 merged tf.train.Examples (1.09 seconds).
Out-degree histogram: [(1, 386), (2, 468), (3, 452), (4, 309), (5, 540)]
Output training data written to TFRecord file: /tmp/cora/train_merged_examples.tfr.
Output test data written to TFRecord file: /tmp/cora/test_examples.tfr.
Total running time: 0.04 minutes.
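
The numbers in this log can be cross-checked with a few lines of Python. The values below are copied from the output above; the only assumption is that every one of the 2708 graph nodes also has a row in cora.content, so that the remaining nodes form the test set.

```python
# Values copied from the log output above.
out_degree_histogram = [(1, 386), (2, 468), (3, 452), (4, 309), (5, 540)]
merged_examples = 2155
total_nodes = 2708

# Every training example falls in exactly one histogram bucket.
assert sum(count for _, count in out_degree_histogram) == merged_examples

# Assumption: nodes that are not in the training set end up in the test set.
train_fraction = merged_examples / total_nodes
test_examples = total_nodes - merged_examples
print("train fraction: {:.3f}".format(train_fraction))
print("test examples:", test_examples)
```

The split is random per example, so the training fraction lands close to, but not exactly on, the requested 0.8. The last bucket also collects every node with five or more neighbors because of --max_nbrs=5.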

Variables and Hyperparameters

The file paths to the train and test data are based on the command line flag
values used to invoke the ‘preprocess_cora_dataset.py’ script above.

    ### Experiment dataset
    TRAIN_DATA_PATH = '/tmp/cora/train_merged_examples.tfr'
    TEST_DATA_PATH = '/tmp/cora/test_examples.tfr'
    ### Constants used to identify neighbor features in the input.
    NBR_FEATURE_PREFIX = 'NL_nbr_'
    NBR_WEIGHT_SUFFIX = '_weight'

Next, we’ll use a class defining the hyperparameters and constants used for training and evaluation:

  • dropout_rate: Controls the rate of dropout following each fully
    connected layer
  • num_fc_units: A list with the number of units in each fully connected
    layer of our neural network.
  • train_epochs: The number of training epochs.
  • batch_size: Batch size used for training and evaluation.
  • num_classes: There are a total of 7 different classes.
  • max_seq_length: This is the size of the vocabulary and all instances in
    the input have a dense multi-hot, bag-of-words representation. In other
    words, a value of 1 for a word indicates that the word is present in the
    input and a value of 0 indicates that it is not.
  • distance_type: This is the distance metric used to regularize the sample
    with its neighbors.
  • graph_regularization_multiplier: This controls the relative weight of
    the graph regularization term in the overall loss function.
  • num_neighbors: The number of neighbors used for graph regularization.
    This value has to be less than or equal to the max_nbrs command-line
    argument used above when running preprocess_cora_dataset.py.
  • eval_steps: The number of batches to process before evaluation is deemed
    complete. If set to None, all instances in the test set are evaluated.
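
To see how distance_type and graph_regularization_multiplier interact, the following sketch combines a supervised loss with a weighted neighbor distance. It is a simplified illustration, not NSL's implementation: the function names are hypothetical, a squared L2 distance is assumed, and NSL's exact reduction over neighbors and batches may differ.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(supervised_loss, sample_out, nbr_outs, nbr_weights,
                           multiplier):
    """Returns (total_loss, graph_loss) for one sample and its neighbors."""
    graph_loss = 0.0
    for weight, nbr_out in zip(nbr_weights, nbr_outs):
        # Neighbors with weight 0.0 contribute nothing to the penalty.
        graph_loss += weight * squared_l2(sample_out, nbr_out)
    return supervised_loss + multiplier * graph_loss, graph_loss


sample_out = [0.7, 0.2, 0.1]
nbr_out = [0.5, 0.3, 0.2]
with_nbr = graph_regularized_loss(1.0, sample_out, [nbr_out], [1.0], 0.1)
without_nbr = graph_regularized_loss(1.0, sample_out, [nbr_out], [0.0], 0.1)
print(with_nbr, without_nbr)
```

A neighbor weight of 0.0 switches the penalty off, which is consistent with the graph loss of 0.0 reported during evaluation further down, where the test examples carry no neighbor features.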

    class HParams(object):
      """Hyperparameters used for training."""
      def __init__(self):
        ### training parameters
        self.train_epochs = 100
        self.batch_size = 128
        self.dropout_rate = 0.5
        ### eval parameters
        self.eval_steps = None  # All instances in the test set are evaluated.
        ### dataset parameters
        self.num_classes = 7
        self.max_seq_length = 1433
        ### neural graph learning parameters
        self.distance_type = nsl.configs.DistanceType.L2
        self.graph_regularization_multiplier = 0.1
        self.num_neighbors = 1
        ### model architecture
        self.num_fc_units = [50, 50]
    HPARAMS = HParams()
    

Train and test data

The preprocessing already split the data into train and test sets. Now we only have to read it in and mold it into a TFRecordDataset.

    def parse_example(example_proto):
      """Extracts relevant fields from the `example_proto`.
      Args:
        example_proto: An instance of `tf.train.Example`.
      Returns:
        A pair whose first value is a dictionary containing relevant features
        and whose second value contains the ground truth labels.
      """
      # The 'words' feature is a multi-hot, bag-of-words representation of the
      # original raw text. A default value is required for examples that don't
      # have the feature.
      feature_spec = {
          'words':
              tf.io.FixedLenFeature([HPARAMS.max_seq_length],
                                    tf.int64,
                                    default_value=tf.constant(
                                        0,
                                        dtype=tf.int64,
                                        shape=[HPARAMS.max_seq_length])),
          'label':
              tf.io.FixedLenFeature((), tf.int64, default_value=-1),
      }
      # We also extract corresponding neighbor features in a similar manner to
      # the features above.
      for i in range(HPARAMS.num_neighbors):
        nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'words')
        nbr_weight_key = '{}{}{}'.format(NBR_FEATURE_PREFIX, i, NBR_WEIGHT_SUFFIX)
        feature_spec[nbr_feature_key] = tf.io.FixedLenFeature(
            [HPARAMS.max_seq_length],
            tf.int64,
            default_value=tf.constant(
                0, dtype=tf.int64, shape=[HPARAMS.max_seq_length]))
        # We assign a default value of 0.0 for the neighbor weight so that
        # graph regularization is done on samples based on their exact number
        # of neighbors. In other words, non-existent neighbors are discounted.
        feature_spec[nbr_weight_key] = tf.io.FixedLenFeature(
            [1], tf.float32, default_value=tf.constant([0.0]))
      features = tf.io.parse_single_example(example_proto, feature_spec)
      labels = features.pop('label')
      return features, labels
    def make_dataset(file_path, training=False):
      """Creates a `tf.data.TFRecordDataset`.
      Args:
        file_path: Name of the file in the `.tfrecord` format containing
          `tf.train.Example` objects.
        training: Boolean indicating if we are in training mode.
      Returns:
        An instance of `tf.data.TFRecordDataset` containing the `tf.train.Example`
        objects.
      """
      dataset = tf.data.TFRecordDataset([file_path])
      if training:
        dataset = dataset.shuffle(10000)
      dataset = dataset.map(parse_example)
      dataset = dataset.batch(HPARAMS.batch_size)
      return dataset
    train_dataset = make_dataset(TRAIN_DATA_PATH, training=True)
    test_dataset = make_dataset(TEST_DATA_PATH)
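
The role of the default values is easier to see without TensorFlow. The sketch below mimics the parsing on plain dicts; `neighbor_defaults` and `parse_record` are hypothetical helpers, and only the key naming is taken from the code above.

```python
NBR_FEATURE_PREFIX = "NL_nbr_"
NBR_WEIGHT_SUFFIX = "_weight"


def neighbor_defaults(num_neighbors, vocab_size):
    """Builds the default value for every neighbor feature key."""
    defaults = {}
    for i in range(num_neighbors):
        words_key = "{}{}_{}".format(NBR_FEATURE_PREFIX, i, "words")
        weight_key = "{}{}{}".format(NBR_FEATURE_PREFIX, i, NBR_WEIGHT_SUFFIX)
        defaults[words_key] = [0] * vocab_size
        defaults[weight_key] = [0.0]
    return defaults


def parse_record(record, num_neighbors, vocab_size):
    """Returns (features, label), filling in missing neighbor features."""
    features = {"words": record["words"]}
    for key, default in neighbor_defaults(num_neighbors, vocab_size).items():
        features[key] = record.get(key, default)
    return features, record.get("label", -1)


# A record without neighbor features, as in the test file.
features, label = parse_record({"words": [0, 1, 0], "label": 3}, 1, 3)
print(features, label)
```

A record without neighbors therefore still produces the same set of keys, with an all-zero neighbor vector and a weight of 0.0.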

To get an idea of what the tensors look like:

    for feature_batch, label_batch in train_dataset.take(1):
      print('Feature list:', list(feature_batch.keys()))
      print('Batch of inputs:', feature_batch['words'])
      nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, 0, 'words')
      nbr_weight_key = '{}{}{}'.format(NBR_FEATURE_PREFIX, 0, NBR_WEIGHT_SUFFIX)
      print('Batch of neighbor inputs:', feature_batch[nbr_feature_key])
      print('Batch of neighbor weights:',
            tf.reshape(feature_batch[nbr_weight_key], [-1]))
      print('Batch of labels:', label_batch)
Feature list: ['NL_nbr_0_weight', 'NL_nbr_0_words', 'words']
Batch of inputs: tf.Tensor(
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]], shape=(128, 1433), dtype=int64)
Batch of neighbor inputs: tf.Tensor(
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]], shape=(128, 1433), dtype=int64)
Batch of neighbor weights: tf.Tensor(
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1.], shape=(128,), dtype=float32)
Batch of labels: tf.Tensor(
[1 2 3 1 4 0 4 6 5 2 2 2 4 2 2 4 3 2 3 0 5 6 1 2 2 2 2 0 3 6 6 1 2 1 0 2 2
 4 6 3 6 2 1 2 6 2 5 2 6 1 3 1 0 2 4 1 5 2 2 6 0 2 2 6 5 1 2 0 2 2 6 5 2 2
 2 1 4 1 1 1 2 4 2 2 2 1 3 2 1 6 3 5 0 2 3 2 2 6 4 2 2 4 1 5 4 6 1 3 2 6 0
 3 1 2 2 2 3 2 1 2 4 2 5 0 0 5 6 1], shape=(128,), dtype=int64)

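and similarly for the test data, where the neighbor inputs and weights fall back to their default value of zero because the test examples were written without neighbor features: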

    for feature_batch, label_batch in test_dataset.take(1):
      print('Feature list:', list(feature_batch.keys()))
      print('Batch of inputs:', feature_batch['words'])
      nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, 0, 'words')
      nbr_weight_key = '{}{}{}'.format(NBR_FEATURE_PREFIX, 0, NBR_WEIGHT_SUFFIX)
      print('Batch of neighbor inputs:', feature_batch[nbr_feature_key])
      print('Batch of neighbor weights:',
            tf.reshape(feature_batch[nbr_weight_key], [-1]))
      print('Batch of labels:', label_batch)
Feature list: ['NL_nbr_0_weight', 'NL_nbr_0_words', 'words']
Batch of inputs: tf.Tensor(
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]], shape=(128, 1433), dtype=int64)
Batch of neighbor inputs: tf.Tensor(
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]], shape=(128, 1433), dtype=int64)
Batch of neighbor weights: tf.Tensor(
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.], shape=(128,), dtype=float32)
Batch of labels: tf.Tensor(
[5 2 2 2 1 2 6 3 2 3 6 1 3 6 4 4 2 3 3 0 2 0 5 2 1 0 6 3 6 4 2 2 3 0 4 2 2
 2 2 3 2 2 2 0 2 2 2 2 4 2 3 4 0 2 6 2 1 4 2 0 0 1 4 2 6 0 5 2 2 3 2 5 2 5
 2 3 2 2 2 2 2 6 6 3 2 4 2 6 3 2 2 6 2 4 2 2 1 3 4 6 0 0 2 4 2 1 3 6 6 2 6
 6 6 1 4 6 4 3 6 6 0 0 2 6 2 4 0 0], shape=(128,), dtype=int64)

The model

In order to show the difference between normal learning and graph learning we’ll use a base model and a graph-regularized model.
The sequential base model is a standard MLP with the number of layers specified in the hyperparameters above:

    def make_mlp_sequential_model(hparams):
      """Creates a sequential multi-layer perceptron model."""
      model = tf.keras.Sequential()
      model.add(
          tf.keras.layers.InputLayer(
              input_shape=(hparams.max_seq_length,), name='words'))
      # Input is already one-hot encoded in the integer format. We cast it to
      # floating point format here.
      model.add(
          tf.keras.layers.Lambda(lambda x: tf.keras.backend.cast(x, tf.float32)))
      for num_units in hparams.num_fc_units:
        model.add(tf.keras.layers.Dense(num_units, activation='relu'))
        # For sequential models, by default, Keras ensures that the 'dropout' layer
        # is invoked only during training.
        model.add(tf.keras.layers.Dropout(hparams.dropout_rate))
      model.add(tf.keras.layers.Dense(hparams.num_classes, activation='softmax'))
      return model

The functional base model looks like this:

    def make_mlp_functional_model(hparams):
      """Creates a functional API-based multi-layer perceptron model."""
      inputs = tf.keras.Input(
          shape=(hparams.max_seq_length,), dtype='int64', name='words')
      # Input is already one-hot encoded in the integer format. We cast it to
      # floating point format here.
      cur_layer = tf.keras.layers.Lambda(
          lambda x: tf.keras.backend.cast(x, tf.float32))(
              inputs)
      for num_units in hparams.num_fc_units:
        cur_layer = tf.keras.layers.Dense(num_units, activation='relu')(cur_layer)
        # For functional models, by default, Keras ensures that the 'dropout' layer
        # is invoked only during training.
        cur_layer = tf.keras.layers.Dropout(hparams.dropout_rate)(cur_layer)
      outputs = tf.keras.layers.Dense(
          hparams.num_classes, activation='softmax')(
              cur_layer)
      model = tf.keras.Model(inputs, outputs=outputs)
      return model

Finally, we define the same MLP as a subclass model:

    def make_mlp_subclass_model(hparams):
      """Creates a multi-layer perceptron subclass model in Keras."""
      class MLP(tf.keras.Model):
        """Subclass model defining a multi-layer perceptron."""
        def __init__(self):
          super(MLP, self).__init__()
          # Input is already one-hot encoded in the integer format. We create a
          # layer to cast it to floating point format here.
          self.cast_to_float_layer = tf.keras.layers.Lambda(
              lambda x: tf.keras.backend.cast(x, tf.float32))
          self.dense_layers = [
              tf.keras.layers.Dense(num_units, activation='relu')
              for num_units in hparams.num_fc_units
          ]
          self.dropout_layer = tf.keras.layers.Dropout(hparams.dropout_rate)
          self.output_layer = tf.keras.layers.Dense(
              hparams.num_classes, activation='softmax')
        def call(self, inputs, training=False):
          cur_layer = self.cast_to_float_layer(inputs['words'])
          for dense_layer in self.dense_layers:
            cur_layer = dense_layer(cur_layer)
            cur_layer = self.dropout_layer(cur_layer, training=training)
          outputs = self.output_layer(cur_layer)
          return outputs
      return MLP()

Using any of the above we can output the characteristics of our base model:

    base_model_tag, base_model = 'FUNCTIONAL', make_mlp_functional_model(HPARAMS)
    base_model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
words (InputLayer)           [(None, 1433)]            0
_________________________________________________________________
lambda (Lambda)              (None, 1433)              0
_________________________________________________________________
dense (Dense)                (None, 50)                71700
_________________________________________________________________
dropout (Dropout)            (None, 50)                0
_________________________________________________________________
dense_1 (Dense)              (None, 50)                2550
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0
_________________________________________________________________
dense_2 (Dense)              (None, 7)                 357
=================================================================
Total params: 74,607
Trainable params: 74,607
Non-trainable params: 0
_________________________________________________________________
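
The parameter counts in this summary follow directly from the layer sizes: a dense layer has one weight per input-output pair plus one bias per output. A quick check, where `dense_params` is a hypothetical helper:

```python
def dense_params(inputs, units):
    """Number of weights plus biases in a fully connected layer."""
    return inputs * units + units


# Vocabulary size, the two hidden layers and the number of classes.
layer_sizes = [1433, 50, 50, 7]
counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [71700, 2550, 357] 74607
```

Almost all of the parameters sit in the first layer, which maps the 1433-word vocabulary onto 50 units.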

Training

From here on everything is similar to any other TensorFlow training, and training our base model is simply:

    base_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    base_model.fit(train_dataset, epochs=HPARAMS.train_epochs, verbose=1)
Epoch 1/100
17/17 [==============================] - 0s 26ms/step - loss: 1.9442 - accuracy: 0.1675
Epoch 2/100
17/17 [==============================] - 0s 10ms/step - loss: 1.8739 - accuracy: 0.2770
Epoch 3/100
17/17 [==============================] - 0s 10ms/step - loss: 1.7915 - accuracy: 0.3374
Epoch 4/100
17/17 [==============================] - 0s 13ms/step - loss: 1.6789 - accuracy: 0.3698
...
Epoch 99/100
17/17 [==============================] - 0s 11ms/step - loss: 0.0542 - accuracy: 0.9842
Epoch 100/100
17/17 [==============================] - 0s 11ms/step - loss: 0.0405 - accuracy: 0.9898
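
The 17 steps per epoch in this log follow from the dataset size and the batch size. The example counts below come from the preprocessing log, with the assumption that the test set holds the remaining 2708 - 2155 nodes.

```python
import math

batch_size = 128
train_examples = 2155
test_examples = 2708 - 2155  # assumption: all remaining nodes are test data

# The last batch is smaller than batch_size but still counts as a step.
train_steps = math.ceil(train_examples / batch_size)
test_steps = math.ceil(test_examples / batch_size)
print(train_steps, test_steps)  # 17 5
```

The same reasoning gives 5 batches for the test set.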

Base model accuracy

Our base model achieves 78% accuracy on the test set. Not spectacular, but it only serves as a baseline for the next step.

    def print_metrics(model_desc, eval_metrics):
      """Prints evaluation metrics.
      Args:
        model_desc: A description of the model.
        eval_metrics: A dictionary mapping metric names to corresponding values. It
          must contain the loss and accuracy metrics.
      """
      print('\n')
      print('Eval accuracy for ', model_desc, ': ', eval_metrics['accuracy'])
      print('Eval loss for ', model_desc, ': ', eval_metrics['loss'])
      if 'graph_loss' in eval_metrics:
        print('Eval graph loss for ', model_desc, ': ', eval_metrics['graph_loss'])
    eval_results = dict(
        zip(base_model.metrics_names,
            base_model.evaluate(test_dataset, steps=HPARAMS.eval_steps)))
    print_metrics('Base MLP model', eval_results)
      5/Unknown - 0s 22ms/step - loss: 1.2329 - accuracy: 0.7830
Eval accuracy for  Base MLP model :  0.7830018
Eval loss for  Base MLP model :  1.2328713834285736

Graph regularization

Incorporating graph regularization into the loss term of an existing tf.keras.Model requires just a few lines of code. The base model is wrapped to create a new tf.keras subclass model, whose loss includes graph regularization.

    base_reg_model_tag, base_reg_model = 'FUNCTIONAL', make_mlp_functional_model(
        HPARAMS)
    # Wrap the base MLP model with graph regularization.
    graph_reg_config = nsl.configs.make_graph_reg_config(
        max_neighbors=HPARAMS.num_neighbors,
        multiplier=HPARAMS.graph_regularization_multiplier,
        distance_type=HPARAMS.distance_type,
        sum_over_axis=-1)
    graph_reg_model = nsl.keras.GraphRegularization(base_reg_model,
                                                    graph_reg_config)
    graph_reg_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    graph_reg_model.fit(train_dataset, epochs=HPARAMS.train_epochs, verbose=1)
Epoch 1/100
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
17/17 [==============================] - 1s 78ms/step - loss: 1.9388 - accuracy: 0.1791 - graph_loss: 0.0083
Epoch 2/100
17/17 [==============================] - 0s 12ms/step - loss: 1.8447 - accuracy: 0.3030 - graph_loss: 0.0106
Epoch 3/100
17/17 [==============================] - 0s 13ms/step - loss: 1.7526 - accuracy: 0.3346 - graph_loss: 0.0228
Epoch 4/100
17/17 [==============================] - 0s 11ms/step - loss: 1.6512 - accuracy: 0.3675 - graph_loss: 0.0456
...
Epoch 99/100
17/17 [==============================] - 0s 15ms/step - loss: 0.0825 - accuracy: 0.9870 - graph_loss: 0.3407
Epoch 100/100
17/17 [==============================] - 0s 14ms/step - loss: 0.0823 - accuracy: 0.9856 - graph_loss: 0.3328

Graph regularization accuracy

With graph information added we increase our accuracy to 81.6% compared to the 78.3% baseline from above. Again, nothing spectacular, but it shows that graph regularization leads to an improved model.

    eval_results = dict(
        zip(graph_reg_model.metrics_names,
            graph_reg_model.evaluate(test_dataset, steps= HPARAMS.eval_steps)))
    print_metrics('MLP + graph regularization', eval_results)
      5/Unknown - 0s 68ms/step - loss: 1.0859 - accuracy: 0.8156 - graph_loss: 0.0000e+00
Eval accuracy for  MLP + graph regularization :  0.8155515
Eval loss for  MLP + graph regularization :  1.0859381407499313
Eval graph loss for  MLP + graph regularization :  0.0
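
To put the two accuracies in perspective, they can be translated back into counts of test examples. This is a rough derivation from the reported numbers, assuming the test set holds the 2708 - 2155 = 553 nodes that were not used for training.

```python
# Accuracies copied from the evaluation output above.
base_accuracy = 0.7830018
graph_accuracy = 0.8155515
test_examples = 2708 - 2155  # assumption: all remaining nodes are test data

base_correct = round(base_accuracy * test_examples)
graph_correct = round(graph_accuracy * test_examples)
extra_correct = graph_correct - base_correct

# Share of the baseline's mistakes that the graph-regularized model avoids.
error_reduction = (graph_accuracy - base_accuracy) / (1 - base_accuracy)
print(base_correct, graph_correct, extra_correct)
print("relative error reduction: {:.2f}".format(error_reduction))
```

Under that assumption the graph-regularized model classifies 18 more test publications correctly, which removes about 15% of the baseline's errors.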