
Conversation

@gabrieldemarmiesse (Contributor)

Summary

@taehoonlee
Follow-up on #11037
Still trying to make those tests look younger and prettier.

Related Issues

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@gabrieldemarmiesse (Contributor, Author)

An unrelated test failed. It seems flaky, with a very low probability of failing. For the record, and in case we need it later (since it seems strange), here is the stack trace:

_________________________ test_stateful_metrics[list] __________________________
[gw1] linux -- Python 3.6.6 /home/travis/miniconda/envs/test-environment/bin/python
metrics_mode = 'list'
    @keras_test
    @pytest.mark.parametrize('metrics_mode', ['list', 'dict'])
    def test_stateful_metrics(metrics_mode):
        np.random.seed(1334)
    
        class BinaryTruePositives(keras.layers.Layer):
            """Stateful Metric to count the total true positives over all batches.
    
            Assumes predictions and targets of shape `(samples, 1)`.
    
            # Arguments
                name: String, name for the metric.
            """
    
            def __init__(self, name='true_positives', **kwargs):
                super(BinaryTruePositives, self).__init__(name=name, **kwargs)
                self.stateful = True
                self.true_positives = K.variable(value=0, dtype='int32')
    
            def reset_states(self):
                K.set_value(self.true_positives, 0)
    
            def __call__(self, y_true, y_pred):
                """Computes the number of true positives in a batch.
    
                # Arguments
                    y_true: Tensor, batch_wise labels
                    y_pred: Tensor, batch_wise predictions
    
                # Returns
                    The total number of true positives seen this epoch at the
                        completion of the batch.
                """
                y_true = K.cast(y_true, 'int32')
                y_pred = K.cast(K.round(y_pred), 'int32')
                correct_preds = K.cast(K.equal(y_pred, y_true), 'int32')
                true_pos = K.cast(K.sum(correct_preds * y_true), 'int32')
                current_true_pos = self.true_positives * 1
                self.add_update(K.update_add(self.true_positives,
                                             true_pos),
                                inputs=[y_true, y_pred])
                return current_true_pos + true_pos
    
        metric_fn = BinaryTruePositives()
        config = metrics.serialize(metric_fn)
        metric_fn = metrics.deserialize(
            config, custom_objects={'BinaryTruePositives': BinaryTruePositives})
    
        # Test on simple model
        inputs = keras.Input(shape=(2,))
        outputs = keras.layers.Dense(1, activation='sigmoid', name='out')(inputs)
        model = keras.Model(inputs, outputs)
    
        if metrics_mode == 'list':
            model.compile(optimizer='sgd',
                          loss='binary_crossentropy',
                          metrics=['acc', metric_fn])
        elif metrics_mode == 'dict':
            model.compile(optimizer='sgd',
                          loss='binary_crossentropy',
                          metrics={'out': ['acc', metric_fn]})
    
        samples = 1000
        x = np.random.random((samples, 2))
        y = np.random.randint(2, size=(samples, 1))
    
        val_samples = 10
        val_x = np.random.random((val_samples, 2))
        val_y = np.random.randint(2, size=(val_samples, 1))
    
        # Test fit and evaluate
        history = model.fit(x, y, validation_data=(val_x, val_y),
                            epochs=1, batch_size=10)
        outs = model.evaluate(x, y, batch_size=10)
        preds = model.predict(x)
    
        def ref_true_pos(y_true, y_pred):
            return np.sum(np.logical_and(y_pred > 0.5, y_true == 1))
    
        # Test correctness (e.g. updates should have been run)
        np.testing.assert_allclose(outs[2], ref_true_pos(y, preds), atol=1e-5)
    
        # Test correctness of the validation metric computation
        val_preds = model.predict(val_x)
        val_outs = model.evaluate(val_x, val_y, batch_size=10)
        assert_allclose(val_outs[2], ref_true_pos(val_y, val_preds), atol=1e-5)
        assert_allclose(val_outs[2], history.history['val_true_positives'][-1],
                        atol=1e-5)
    
        # Test with generators
        gen = [(np.array([x0]), np.array([y0])) for x0, y0 in zip(x, y)]
        val_gen = [(np.array([x0]), np.array([y0])) for x0, y0 in zip(val_x, val_y)]
        history = model.fit_generator(iter(gen), epochs=1, steps_per_epoch=samples,
                                      validation_data=iter(val_gen),
                                      validation_steps=val_samples)
        outs = model.evaluate_generator(iter(gen), steps=samples, workers=0)
        preds = model.predict_generator(iter(gen), steps=samples, workers=0)
    
        # Test correctness of the metric re ref_true_pos()
        np.testing.assert_allclose(outs[2], ref_true_pos(y, preds),
                                   atol=1e-5)
    
        # Test correctness of the validation metric computation
        val_preds = model.predict_generator(iter(val_gen), steps=val_samples, workers=0)
        val_outs = model.evaluate_generator(iter(val_gen), steps=val_samples, workers=0)
        np.testing.assert_allclose(val_outs[2], ref_true_pos(val_y, val_preds),
                                   atol=1e-5)
        np.testing.assert_allclose(val_outs[2],
                                   history.history['val_true_positives'][-1],
>                                  atol=1e-5)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=1e-05
E       
E       (mismatch 100.0%)
E        x: array(2.)
E        y: array(3.)
tests/keras/metrics_test.py:219: AssertionError
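For context, the quantity the failing assertion checks can be reproduced with plain NumPy. This is a minimal sketch of the `ref_true_pos` reference logic from the trace above (the example labels and predictions are made up for illustration); the batch loop mirrors how the stateful metric is expected to accumulate the same total across batches:

```python
import numpy as np

def ref_true_pos(y_true, y_pred):
    # A sample counts as a true positive when the prediction rounds
    # to 1 (i.e. > 0.5) and the label is 1.
    return np.sum(np.logical_and(y_pred > 0.5, y_true == 1))

# Toy data: three true positives (indices 0, 3 and 5).
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.6])

# Accumulating per batch of 2 should equal the single-pass count,
# which is the invariant the stateful metric test relies on.
total = sum(ref_true_pos(y_true[i:i + 2], y_pred[i:i + 2])
            for i in range(0, len(y_true), 2))
assert total == ref_true_pos(y_true, y_pred)  # both equal 3
```

The failure above (`x: array(2.)` vs `y: array(3.)`) shows the metric value returned by `evaluate_generator` disagreeing with the last `val_true_positives` recorded in `history`, i.e. exactly this invariant breaking by one count.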

@fchollet (Collaborator) left a comment

LGTM, thanks

@fchollet fchollet merged commit c7f4ad5 into keras-team:master Sep 1, 2018
jlherren added a commit to jlherren/keras that referenced this pull request Sep 3, 2018
* keras/master: (327 commits)
  Added in_train_phase and in_test_phase in the numpy backend. (keras-team#11061)
  Make sure the data_format argument defaults to ‘chanels_last’ for all 1D sequence layers.
  Speed up backend tests (keras-team#11051)
  Skipped some duplicated tests. (keras-team#11049)
  Used decorators and WITH_NP to avoid tests duplication. (keras-team#11050)
  Cached the theano compilation directory. (keras-team#11048)
  Removing duplicated backend tests. (keras-team#11037)
  [P, RELNOTES] Conv2DTranspose supports dilation (keras-team#11029)
  Doc Change: Change in shape for CIFAR Datasets (keras-team#11043)
  Fix line too long in mnist_acgan (keras-team#11040)
  Enable using last incomplete minibatch (keras-team#8344)
  Better UX (keras-team#11039)
  Update lstm text generation example (keras-team#11038)
  fix a bug, load_weights doesn't return anything (keras-team#11031)
  Speeding up the tests by reducing the number of K.eval(). (keras-team#11036)
  [P] Expose monitor value getter for easier subclass (keras-team#11002)
  [RELNOTES] Added the mode "bilinear" in the upscaling2D layer. (keras-team#10994)
  Separate pooling test from convolutional test and parameterize test case (keras-team#10975)
  Fix issue with non-canonical TF version name format.
  Allow TB callback to display float values.
  ...
@gabrieldemarmiesse gabrieldemarmiesse deleted the duplicated_tests_in_the_backend_again branch September 18, 2018 06:35