cntk.layers.sequence module
First / higher-order functions over sequences, like Recurrence().
-
Delay(T=1, initial_state=0, name='')
Layer factory function to create a layer that delays the input by a given number of time steps. Negative means future. This is provided as a layer that wraps delay() so that it can easily be used in a Sequential() expression.
Example
>>> import numpy as np
>>> import cntk as C
>>> # create example input: one sequence with 3 tensors of shape (2,)
>>> from cntk.layers import Sequential
>>> from cntk.layers.typing import Tensor, Sequence
>>> x = C.input_variable(**Sequence[Tensor[2]])
>>> x0 = np.reshape(np.arange(6,dtype=np.float32),(1,3,2))
>>> x0
array([[[ 0.,  1.],
        [ 2.,  3.],
        [ 4.,  5.]]], dtype=float32)
>>> # trigram expansion: augment each item of the sequence with its left and right neighbor
>>> make_trigram = Sequential([tuple(Delay(T) for T in (-1,0,1)),  # create 3 shifted versions
...                            splice])                            # concatenate them
>>> y = make_trigram(x)
>>> y(x0)
[array([[ 2.,  3.,  0.,  1.,  0.,  0.],
        [ 4.,  5.,  2.,  3.,  0.,  1.],
        [ 0.,  0.,  4.,  5.,  2.,  3.]], dtype=float32)]
>>> #    --(t+1)--  ---t---  --(t-1)--
Parameters:
- T (int) – the number of time steps to look into the past, where negative values mean to look into the future, and 0 means a no-op (default 1).
- initial_state – tensor or scalar representing the initial value to be used when the input tensor is shifted in time.
- name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts one argument (which must be a sequence) and returns it delayed by T steps
Return type:
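The shifting behaviour of Delay() can be mimicked on plain Python lists. The following is a minimal sketch, not CNTK code: `delay_list` is a hypothetical helper introduced only to illustrate how `T` and `initial_state` interact on a single sequence.

```python
def delay_list(seq, T=1, initial_state=0):
    """Shift a list by T steps; positive T looks into the past, negative into the future.

    Hypothetical stand-in for Delay() on one sequence. Positions that would
    read from outside the sequence are filled with initial_state.
    """
    n = len(seq)
    out = []
    for t in range(n):
        src = t - T  # index of the item that ends up at position t
        out.append(seq[src] if 0 <= src < n else initial_state)
    return out


x = [10, 20, 30]
past = delay_list(x, T=1)      # each position sees its left neighbor
same = delay_list(x, T=0)      # no-op
future = delay_list(x, T=-1)   # each position sees its right neighbor
```

Here `past` is `[0, 10, 20]` and `future` is `[20, 30, 0]`, which mirrors the zero padding visible in the trigram example.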
-
Fold(folder_function, go_backwards=False, initial_state=0, return_full_state=False, name='')
Layer factory function to create a function that runs a step function recurrently over an input sequence, and returns the final state. This is often used for embeddings of sequences, e.g. in a sequence-to-sequence model. Pseudo-code:
# pseudo-code for h = Fold(step_function)(x)
#  x: input sequence of tensors along the dynamic axis
#  h: resulting final step-function output tensor that no longer has a dynamic axis
h = initial_state              # h = output of previous step ("state"), and also the final result
for x_n in x:                  # pseudo-code for looping over all steps of input sequence along its dynamic axis
    h = step_function(h, x_n)  # pass previous state and new data to step_function -> new state
# now h is the result of the final invocation of the step function
Fold() is the same as Recurrence() except that only the final state is returned (whereas Recurrence() returns the entire state sequence). Hence, this documentation focuses only on the differences from Recurrence(); see Recurrence() for detailed information on parameters.
Commonly, the folder_function is a recurrent block such as an LSTM. However, one can pass any binary function. E.g. passing plus will sum up all items of a sequence, while element_max would perform a max-pooling over all items of the sequence.
Note: CNTK’s Fold() is similar to the fold() catamorphism known from functional programming. go_backwards=False corresponds to a fold-left, and True to a fold-right, except that the folder_function signature is always the one of fold-left.
Example
>>> from cntk.layers import *
>>> from cntk.layers.typing import *

>>> # sequence classifier. Maps a one-hot word sequence to a scalar probability value.
>>> # The recurrence is a Fold(), meaning only the final hidden state is produced.
>>> sequence_classifier = Sequential([ Embedding(300),
...                                    Fold(LSTM(500)),
...                                    Dense(1, activation=sigmoid) ])

>>> # element-wise max-pooling over an input sequence
>>> x = C.input_variable(**Sequence[Tensor[2]])
>>> x0 = np.array([[ 1, 2 ],
...                [ 6, 3 ],
...                [ 4, 2 ],
...                [ 8, 1 ],
...                [ 6, 0 ]])
>>> seq_max_pool = Fold(C.element_max)
>>> y = seq_max_pool(x)
>>> y(x0)
array([[ 8.,  3.]], dtype=float32)

>>> # element-wise sum over an input sequence
>>> seq_sum = Fold(C.plus)
>>> y = seq_sum(x)
>>> y(x0)
array([[ 25.,   8.]], dtype=float32)
Parameters:
- folder_function (Function or equivalent Python function) – This function must have N+1 inputs and N outputs, where N is the number of state variables (typically 1 for GRU and plain RNNs, and 2 for LSTMs).
- go_backwards (bool, defaults to False) – if True then run the recurrence from the end of the sequence to the start.
- initial_state (scalar or tensor without batch dimension; or a tuple thereof) – the initial value for the state. This can be a constant or a learnable parameter. In the latter case, if the step function has more than 1 state variable, this parameter must be a tuple providing one initial state for every state variable.
- return_full_state (bool, defaults to False) – if True and the step function has more than one state variable, then the layer returns the final value of all state variables (a tuple of sequences); whereas if not given or False, only the final value of the first of the state variables is returned to the caller.
- name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts one argument (which must be a sequence) and performs the fold operation on it
Return type:
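The relation between Fold() and the functional-programming fold can be sketched with `functools.reduce` on plain Python lists. This is an illustration under stated assumptions rather than CNTK code: `fold_list` is a hypothetical helper, and it only models a single state variable.

```python
from functools import reduce
import operator


def fold_list(folder_function, seq, initial_state=0, go_backwards=False):
    """Hypothetical list-based analogue of Fold(): return only the final state.

    The folder_function always has the fold-left signature (state, item),
    even when the sequence is traversed from the end.
    """
    items = reversed(seq) if go_backwards else seq
    return reduce(folder_function, items, initial_state)


x = [1, 6, 4, 8, 6]
total = fold_list(operator.add, x)                         # sum of all items
largest = fold_list(max, x, initial_state=float("-inf"))   # max-pooling over the sequence

# A non-commutative step function makes the traversal order visible.
collect = lambda state, item: state + [item]
forward = fold_list(collect, x, initial_state=[])
backward = fold_list(collect, x, initial_state=[], go_backwards=True)
```

With a commutative step function such as addition the direction does not matter, so `forward` and `backward` are included to show what go_backwards changes.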
-
PastValueWindow(window_size, axis, go_backwards=False, name='')
Layer factory function to create a function that returns a static, maskable view for N past steps over a sequence along the given ‘axis’. It returns two matrices: a value matrix, shape=(N,dim), and a valid window, shape=(N,1).
This is used for attention modeling. CNTK presently does not support nested dynamic axes. Since attention models require nested axes (encoder hidden state vs. decoder hidden state), this layer can be used to map the encoder’s dynamic axis to a static tensor axis. The static axis has a maximum length (window_size). To account for shorter input sequences, this function also returns a validity mask of the same axis dimension. Longer sequences will be truncated.
Example
>>> # create example input: one sequence with 3 tensors of shape (2,)
>>> from cntk.layers import Sequential
>>> from cntk.layers.typing import Tensor, Sequence
>>> x = C.input_variable(**Sequence[Tensor[2]])
>>> x0 = np.reshape(np.arange(6,dtype=np.float32),(1,3,2))
>>> x0
array([[[ 0.,  1.],
        [ 2.,  3.],
        [ 4.,  5.]]], dtype=float32)
>>> # convert dynamic-length sequence to a static-dimension tensor
>>> to_static_axis = PastValueWindow(4, axis=-2)  # axis=-2 means second last
>>> y = to_static_axis(x)
>>> value, valid = y(x0)
>>> # 'value' contains the items from the back, padded with 0
>>> value
array([[[ 4.,  5.],
        [ 2.,  3.],
        [ 0.,  1.],
        [ 0.,  0.]]], dtype=float32)
>>> # 'valid' contains a scalar 1 for each valid item, and 0 for the padded ones
>>> # E.g., when computing the attention softmax, only items with a 1 should be considered.
>>> valid
array([[[ 1.],
        [ 1.],
        [ 1.],
        [ 0.]]], dtype=float32)
Parameters:
- window_size (int) – maximum number of items in sequences. The axis will have this dimension.
- axis (int or Axis, optional, keyword only) – axis along which the concatenation will be performed
- name (str, optional, keyword only) – the name of the Function instance in the network
Returns: A function that accepts one argument, which must be a sequence. It returns a fixed-size window of the last window_size items, spliced along axis.
Return type:
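The value/valid pair returned by PastValueWindow() can be illustrated on a plain Python list. This is a rough sketch, not CNTK code: `past_value_window` and its `pad` argument are hypothetical names, and the axis handling of the real layer is left out.

```python
def past_value_window(seq, window_size, pad=0.0):
    """Hypothetical list-based analogue of PastValueWindow() for one sequence.

    Returns (value, valid): the last window_size items, newest first, padded
    with `pad`, plus a 0/1 mask marking which slots hold real items.
    """
    newest_first = list(reversed(seq))[:window_size]  # longer sequences are truncated
    n_pad = window_size - len(newest_first)
    value = newest_first + [pad] * n_pad
    valid = [1.0] * len(newest_first) + [0.0] * n_pad
    return value, valid


value, valid = past_value_window([[0., 1.], [2., 3.], [4., 5.]],
                                 window_size=4, pad=[0., 0.])
short_value, short_valid = past_value_window([1, 2, 3, 4, 5], window_size=2)
```

The mask is what a downstream attention softmax would use to ignore the padded slots.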
-
Recurrence(step_function, go_backwards=False, initial_state=0, return_full_state=False, name='')
Layer factory function that implements a recurrent model, including the common RNN, LSTM, and GRU recurrences. This factory function creates a function that runs a step function recurrently over an input sequence, where in each step, Recurrence() will pass to the step function a data input as well as the output of the previous step. The following pseudo-code represents what happens when you call a Recurrence() layer:
# pseudo-code for y = Recurrence(step_function)(x)
#  x: input sequence of tensors along the dynamic axis
#  y: resulting sequence of outputs along the same dynamic axis
y = []                         # result sequence goes here
s = initial_state              # s = output of previous step ("state")
for x_n in x:                  # pseudo-code for looping over all steps of input sequence along its dynamic axis
    s = step_function(s, x_n)  # pass previous state and new data to step_function -> new state
    y.append(s)
The common step functions are LSTM(), GRU(), and RNNStep(), but the step function can be any Function or Python function. The signature of a step function with a single state variable must be (h_prev, x) -> h, where h_prev is the previous state, x is the new data input, and the output is the new state. The step function will be called item by item, resulting in a sequence of the same length as the input.
Step functions can have more than one state output, e.g. LSTM(). In this case, the first N arguments are the previous state, followed by one more argument that is the data input; and its output must be a tuple of N values. In this case, the recurrence operation will, by default, return the first of the state variables (in the LSTM case, the h), while additional state variables are internal (like the LSTM’s c). If all state variables should be returned, pass return_full_state=True.
To provide your own step function, just use any Function (or equivalent Python function) that has a signature as described above. For example, a cumulative sum over a sequence can be computed as Recurrence(plus), where each step consists of plus(s,x_n), where s is the output of the previous call and hence the cumulative sum of all elements up to x_n. Another example is a GRU layer with projection, which could be realized as Recurrence(GRU(500) >> Dense(200)), where the projection is applied to the hidden state as fed back to the next step. F>>G is a short-hand for Sequential([F, G]).
Optionally, the recurrence can run backwards. This is useful for constructing bidirectional models.
initial_state must be a constant. To pass initial_state as a data input, e.g. for a sequence-to-sequence model, use RecurrenceFrom() instead.
Note: Recurrence() is the equivalent to what in functional programming is often called scanl().
Example
>>> from cntk.layers import Sequential
>>> from cntk.layers.typing import Tensor, Sequence

>>> # a recurrent LSTM layer
>>> lstm_layer = Recurrence(LSTM(500))

>>> # a bidirectional LSTM layer
>>> # using function tuples to implement a bidirectional LSTM
>>> bi_lstm_layer = Sequential([(Recurrence(LSTM(250)),                      # first tuple entry: forward pass
...                              Recurrence(LSTM(250), go_backwards=True)),  # second: backward pass
...                             splice])                                     # splice both on top of each other
>>> bi_lstm_layer.update_signature(Sequence[Tensor[13]])
>>> bi_lstm_layer.shape   # shape reflects concatenation of both output states
(500,)
>>> tuple(str(axis.name) for axis in bi_lstm_layer.dynamic_axes)  # (note: str() needed only for Python 2.7)
('defaultBatchAxis', 'defaultDynamicAxis')

>>> # custom step function example: using Recurrence() to
>>> # compute the cumulative sum over an input sequence
>>> x = C.input_variable(**Sequence[Tensor[2]])
>>> x0 = np.array([[   3,    2],
...                [  13,   42],
...                [-100, +100]])
>>> cum_sum = Recurrence(C.plus, initial_state=Constant([0, 0.5]))
>>> y = cum_sum(x)
>>> y(x0)
[array([[   3. ,    2.5],
        [  16. ,   44.5],
        [ -84. ,  144.5]], dtype=float32)]
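The scan-like behaviour of the cumulative-sum example can be reproduced on plain Python lists with `itertools.accumulate`. This is only a sketch of the semantics, not CNTK code: `recurrence_list` is a hypothetical helper, and it treats each column of the example separately as scalars.

```python
from itertools import accumulate


def recurrence_list(step_function, seq, initial_state=0):
    """Hypothetical list-based analogue of Recurrence(): return every state.

    itertools.accumulate(..., initial=...) (Python 3.8+) also yields the
    initial state itself, which the recurrence does not emit, so it is dropped.
    """
    states = accumulate(seq, step_function, initial=initial_state)
    return list(states)[1:]


plus = lambda s, x_n: s + x_n
first_column = recurrence_list(plus, [3, 13, -100], initial_state=0)
second_column = recurrence_list(plus, [2, 42, 100], initial_state=0.5)
```

The two results correspond to the two columns of the array printed in the doctest above.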
Parameters:
- step_function (Function or equivalent Python function) – This function must have N+1 inputs and N outputs, where N is the number of state variables (typically 1 for GRU and plain RNNs, and 2 for LSTMs).
- go_backwards (bool, defaults to False) – if True then run the recurrence from the end of the sequence to the start.
- initial_state (scalar or tensor without batch dimension; or a tuple thereof) – the initial value for the state. This can be a constant or a learnable parameter. In the latter case, if the step function has more than 1 state variable, this parameter must be a tuple providing one initial state for every state variable.
- return_full_state (bool, defaults to False) – if True and the step function has more than one state variable, then the layer returns all state variables (a tuple of sequences); whereas if not given or False, only the first state variable is returned to the caller.
- name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts one argument (which must be a sequence) and performs the recurrent operation on it
Return type:
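The N+1 inputs / N outputs convention and the effect of return_full_state can be sketched in plain Python. The names `recurrence_multi` and `toy_step` are hypothetical and introduced here for illustration only; `toy_step` is a toy two-state function, not an LSTM.

```python
def recurrence_multi(step_function, seq, initial_state, return_full_state=False):
    """Hypothetical analogue of Recurrence() for a step function with N state variables.

    step_function takes (*previous_state, x_n) and returns a tuple of N new states.
    """
    state = tuple(initial_state)
    history = []
    for x_n in seq:
        state = tuple(step_function(*state, x_n))
        history.append(state)
    if return_full_state:
        # one output sequence per state variable
        return tuple(list(column) for column in zip(*history))
    return [s[0] for s in history]  # only the first state variable


def toy_step(h_prev, c_prev, x):
    """Toy two-state step: c accumulates the input, h is twice c."""
    c = c_prev + x
    return 2 * c, c


h_only = recurrence_multi(toy_step, [1, 2, 3], initial_state=(0, 0))
h_seq, c_seq = recurrence_multi(toy_step, [1, 2, 3], initial_state=(0, 0),
                                return_full_state=True)
```

By default only the first state variable is visible to the caller, while the second one stays internal, which is the same split as h versus c for an LSTM step function.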
-
RecurrenceFrom(step_function, go_backwards=False, return_full_state=False, name='')
Layer factory function to create a function that runs a step function recurrently over an input sequence, with initial state. This layer is very similar to Recurrence(), except that the initial state is not a constant but computed from input data. Thus, in RecurrenceFrom(), the initial state is passed to the layer function as a data input rather than, like Recurrence(), as an initialization parameter to the factory function. This form is meant for use in sequence-to-sequence scenarios. This documentation only covers this case; for additional information on parameters, see Recurrence(). In pseudo-code:
# pseudo-code for y = RecurrenceFrom(step_function)(s,x)
#  x: input sequence of tensors along the dynamic axis
#  s: initial state for the recurrence (computed from input data elsewhere)
#  y: resulting sequence of outputs along the same dynamic axis
y = []                         # result sequence goes here
for x_n in x:                  # pseudo-code for looping over all steps of input sequence along its dynamic axis
    s = step_function(s, x_n)  # pass previous state and new data to step_function -> new state
    y.append(s)
The layer function returned by this factory function accepts the initial state as data argument(s). The initial state can be non-sequential data, as one would have for a plain sequence-to-sequence model, or sequential data. In the latter case, the last item is the initial state.
Example
>>> from cntk.layers import *
>>> from cntk.layers.typing import *

>>> # a plain sequence-to-sequence model in training (where label length is known)
>>> en = C.input_variable(**SequenceOver[Axis('m')][SparseTensor[20000]])  # English input sentence
>>> fr = C.input_variable(**SequenceOver[Axis('n')][SparseTensor[30000]])  # French target sentence

>>> embed = Embedding(300)
>>> encoder = Recurrence(LSTM(500), return_full_state=True)
>>> decoder = RecurrenceFrom(LSTM(500))       # decoder starts from a data-dependent initial state, hence -From()
>>> emit = Dense(30000)
>>> h, c = encoder(embed(en)).outputs         # LSTM encoder has two outputs (h, c)
>>> z = emit(decoder(h, c, sequence.past_value(fr)))   # decoder takes encoder outputs as initial state
>>> loss = C.cross_entropy_with_softmax(z, fr)
Parameters:
- step_function (Function or equivalent Python function) – This function must have N+1 inputs and N outputs, where N is the number of state variables (typically 1 for GRU and plain RNNs, and 2 for LSTMs).
- go_backwards (bool, defaults to False) – if True then run the recurrence from the end of the sequence to the start.
- initial_state (scalar or tensor without batch dimension; or a tuple thereof) – the initial value for the state. This can be a constant or a learnable parameter. In the latter case, if the step function has more than 1 state variable, this parameter must be a tuple providing one initial state for every state variable.
- return_full_state (bool, defaults to False) – if True and the step function has more than one state variable, then the layer returns all state variables (a tuple of sequences); whereas if not given or False, only the first state variable is returned to the caller.
- name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts arguments (initial_state_1, initial_state_2, ..., input_sequence), where the number of initial state variables must match the step function’s. The initial state can be a sequence, in which case its last (or first if go_backwards) item is used.
Return type:
-
UnfoldFrom(generator_function, until_predicate=None, length_increase=1, name='')
Layer factory function to create a function that implements a recurrent generator. Starting with a seed state, the UnfoldFrom() layer repeatedly applies generator_function and emits the sequence of results. It stops after a maximum number of steps, or earlier when, if provided, until_predicate evaluates to True for the current output. The maximum number of steps is based on a second input from which only the dynamic-axis information is used. This is best explained in pseudo-code:
# pseudo-code for y = UnfoldFrom(generator_function, until_predicate)(s,axis_like)
#  s: initial state for the recurrence (computed from input data elsewhere),
#  axis_like: any input with a dynamic axis, unfold will happen along the same dynamic axis,
#  y: resulting sequence of outputs along the same dynamic axis
y = []                         # result sequence goes here
for _ in axis_like:            # pseudo-code for looping over all steps of dynamic axis of axis_like
    s = generator_function(s)  # pass previous state to generator_function -> new state
    y.append(s)
    if until_predicate(s):
        break
# now y is the output of repeatedly applying generator_function()
A typical application is the decoder of a sequence-to-sequence model, where the generator function f accepts a two-valued state, with the first being an emitted word, and the second being an internal recurrent state. The initial state would be a tuple (w0, h0) where w0 represents the sentence-start symbol, and h0 is a thought vector that encodes the input sequence (as obtained from a Fold() operation).
A variant allows the state and the emitted sequence to be different. In that case, f returns a tuple (output value, new state), and UnfoldFrom(f)(s) would emit the sequence f(s)[0], f(f(s)[1])[0], f(f(f(s)[1])[1])[0], ... .
The maximum length of the output sequence is not unlimited, but determined by the length of the dynamic-axis argument passed to the layer function, multiplied by the optional length_increase factor.
Optionally, a function can be provided to denote that the end of the sequence has been reached.
Note: In the context of functional programming, the first form of this operation is known as the unfold() anamorphism.
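The two forms described above can be contrasted on plain Python values. This is a sketch under stated assumptions, not CNTK code: `unfold_states` and `unfold_outputs` are hypothetical helpers, and a fixed `steps` count stands in for the length limit.

```python
def unfold_states(f, s, steps):
    """First form: the state itself is emitted at every step."""
    out = []
    for _ in range(steps):
        s = f(s)
        out.append(s)
    return out


def unfold_outputs(f, s, steps):
    """Variant: f returns (output value, new state); only the output is emitted."""
    out = []
    for _ in range(steps):
        value, s = f(s)
        out.append(value)
    return out


doubled = unfold_states(lambda s: 2 * s, 1, steps=4)
squares = unfold_outputs(lambda s: (s * s, s + 1), 1, steps=4)
```

In the variant, the emitted values follow the pattern f(s)[0], f(f(s)[1])[0], and so on, while the second tuple element is what gets fed back.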
Example
TO BE PROVIDED after signature changes.
Parameters:
- generator_function (Function or equivalent Python function) – This function must have N inputs and a N-tuple-valued output, where N is the number of state variables. If the emitted value should be different from the state, then the function should have a tuple of N+1 outputs, where the first output is the value to emit, while the others are the state.
- until_predicate (Function or equivalent Python function) – A function that denotes when the last element of the unfold has been emitted. It takes the same number of arguments as the generator, and returns a scalar that must be 1 for the last element of the sequence, and 0 otherwise. This is subject to the maximum length as determined by the input sequence and length_increase. If this parameter is not provided, the output length will be equal to the specified maximum length.
- length_increase (float, defaults to 1) – the maximum number of output items is equal to the number of items of the dynamic_axis_like argument to the returned unfold() function, multiplied by this factor. For example, pass 1.5 here if the output sequence can be at most 50% longer than the input.
- name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts two arguments (initial state and dynamic_axis_like), and performs the unfold operation on it. The initial state argument is the initial state for the recurrence. The dynamic_axis_like must be a sequence and provides a reference for the maximum length of the output sequence.
Return type:
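How until_predicate and length_increase bound the output length can be sketched in plain Python. This is an illustration, not CNTK code: `unfold_from` is a hypothetical helper for a single state variable, and rounding the maximum length up is an assumption of this sketch rather than documented behaviour.

```python
import math


def unfold_from(generator_function, until_predicate=None, length_increase=1):
    """Hypothetical list-based analogue of the UnfoldFrom() factory (single state)."""
    def layer(s, axis_like):
        # Rounding up is an assumption of this sketch, not documented behaviour.
        max_steps = math.ceil(len(axis_like) * length_increase)
        out = []
        for _ in range(max_steps):
            s = generator_function(s)
            out.append(s)
            if until_predicate is not None and until_predicate(s):
                break  # the element satisfying the predicate is the last one emitted
        return out
    return layer


reference = ["a", "b", "c", "d"]  # only its length is used

count_up = unfold_from(lambda s: s + 1, length_increase=1.5)
capped = count_up(0, reference)   # no predicate: runs to the maximum length

stop_at_three = unfold_from(lambda s: s + 1, until_predicate=lambda s: s == 3)
early = stop_at_three(0, reference)
```

Without a predicate the output has exactly the maximum length; with one, generation stops at the first element for which the predicate holds, provided that happens before the limit.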