The following essay is excerpted from the book ‘Long Short Term Memory’ (2017), a collaboration between the author and a group of deep learning models. The book can be accessed here:
Long Short Term Memory (STD-IO 028)
What does it mean to hijack the latent space of a computation? Not just any computation, but a cognitive act about which we can say – a decision was made, an inference took place here. First there is the admission of a latent space itself, an interiority – to assert an algorithm could develop, in the language of Kant, an ‘inner sense’. Then there are the implications – that computation could maintain its own myriad languages of thought, as Fodor once proposed,1 that its acts are not performances for us, may not be performances at all, that they may ultimately be unintelligible to human reason.
By latent space we denote the tensorial data passed between inner layers of an Artificial Neural Network (ANN). Tensors are high-dimensional representations of input data, structures which are reshaped as they flow through the net. Reshaping here refers to dimensional plasticity, to moving fluidly between representations of an input space, learning higher level embeddings of data – in short, engaging in acts of multi-level abstraction.
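To fix ideas, the following is a minimal sketch of such a latent space – the intermediate tensors of a small, untrained network – written in Python with numpy; the layer sizes and random weights are illustrative assumptions, not drawn from any model discussed here.

```python
# A minimal sketch (numpy only, toy sizes, untrained random weights) of what
# "latent space" denotes here: the intermediate tensors produced between the
# layers of a network, reshaped as the input flows through it.
import numpy as np

rng = np.random.default_rng(0)

x  = rng.normal(size=(1, 784))            # input tensor, e.g. a flattened image
W1 = rng.normal(size=(784, 128)) * 0.01   # first layer weights
W2 = rng.normal(size=(128, 32)) * 0.01    # second layer weights

h1 = np.tanh(x @ W1)                      # first latent representation
h2 = np.tanh(h1 @ W2)                     # a lower-dimensional embedding of the same input

for name, t in [("input", x), ("latent 1", h1), ("latent 2", h2)]:
    print(name, t.shape)                  # 784 -> 128 -> 32: abstraction as reshaping
```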
The crisis of explication that characterises contemporary AI can be viewed either as a semantic limitation proper to the ANN model – its inability to attain the requisite level of concept formation – or instead a more fundamental limitation on the order of linguistic correspondence, a problem arising from the act of mapping human concepts to this latent space. A non-correspondence of vocabularies is merely the observation that no necessary bijection exists between language sets. The Inuit may create n words for snow, just as ‘umami’ may only be translatable through analogy, but this fact alone should not prompt a descent to relativism. An altogether stronger claim is at stake here, namely the incommensurability of cognitive acts mediated by diverse languages, a principle first proposed by Feyerabend in historical form,2 and subsequently critiqued by Putnam.3
A computational theory of mind can quickly converge on its own claim of incommensurability, with implications for the epistemic status of inferences made by AI. In this account, reason is modeled as a set of linguistic statements, a ‘canon’ of every sensical alethic statement in a language. The assumption here is that reason can be modeled as a formal grammar R, given by the tuple:
R = (V, T, P, S)
Where V is a set of variables representing the vocabulary (a set of strings), T is a finite set of terminals, namely the symbols that form sentences, P is a set of production rules that form the recursive definition of a language, and S is a start symbol from which induction can progress.
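The formalism can be made concrete with a toy instance of R; the variables, terminals and production rules below are invented purely for illustration, not a proposal for reason's actual grammar.

```python
# A toy instance of the tuple R = (V, T, P, S): invented variables, terminals
# and production rules, used only to make the formalism concrete.
import random

V = {"S", "NP", "VP"}                         # variables: the non-terminal vocabulary
T = {"the", "model", "a", "rule", "infers"}   # terminals: symbols that form sentences
P = {                                         # production rules: the recursive definition
    "S":  [["NP", "VP"]],
    "NP": [["the", "model"], ["a", "rule"]],
    "VP": [["infers", "NP"]],
}
S = "S"                                       # start symbol from which induction progresses

def derive(symbol):
    """Expand a symbol by recursively applying randomly chosen productions."""
    if symbol in T:
        return [symbol]
    return [tok for part in random.choice(P[symbol]) for tok in derive(part)]

print(" ".join(derive(S)))                    # e.g. "the model infers a rule"
```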
R is conceived as an adaptive grammar of inferential rules, learned via interaction with an environment, through a procedure we shall leave undefined for now. In this formalism, sets of statements are themselves sensical propositions – they can be combined and retain not only soundness but meaning. R is constrained by its environment at the level of rule production, but not in terms of its generative capacity. To propose an infinite domain of such statements is to impose no theoretical limit on the conceptual labour characteristic of reasoning.
Even within the invariant grammar of R, a power set of statements is irreducible to the canon, demonstrable through Cantor’s Theorem – it is always possible to diagonalize out of R. It follows that reason is not conceivable as a canon at all, but instead represents an organon of thought. The epistemic repercussions are considerable – if AI represents a bootstrapping of human rationality in the project of unpacking reasons, human reason cannot presume to circumscribe this domain. The aim of such a model is not merely to instigate a relativistic attitude to AI, but rather to highlight the intractability of interpretation in the face of runaway statistical inference, to get a sense of the epistemic limits of said labour.
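The diagonal construction invoked here is the standard Cantorian one; writing L(R) for the set of statements R generates, it can be put as follows:

```latex
\[
  \text{For any } f : L(R) \to \mathcal{P}(L(R)), \qquad
  D = \{\, s \in L(R) : s \notin f(s) \,\} \notin \operatorname{im}(f),
\]
\[
  \text{since } D = f(s_0) \text{ would force } s_0 \in D \iff s_0 \notin D;
  \text{ hence } |\mathcal{P}(L(R))| > |L(R)|.
\]
```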
The strictly formal conception of reason offered by such a theory is not without its problems. Brandom reminds us that an account of reason cannot dismiss either the pragmatic insistence on material rules of inference – the practice of concept use – or the rational emphasis on the expressive role of logical vocabulary.4 Likewise, the Sellarsian view of concept acquisition as “mastery of the use of a word” implies a linguistic act integral to material practice. Pragmatic objections as to the role of social speech acts in concept formation can be integrated into the model in the form of agent interactions, but material correctness remains an elusive concept in this formalism. A verificationist theory of meaning – the claim that statements cannot be rational premises without material validity – is irreconcilable with the unpacking of reasons sketched out by a purely set theoretic account. This semantic hurdle provides a challenge to the functional closure of reason, best exemplified by Putnam in developing the position known as semantic externalism, the notion that meaning is not in the mind.5
Such computationalist theories of mind can be accused of over-emphasizing the inferential role of formal logic in reasoning – of conflating logic with the act of justifying, of giving and asking for reasons. In Brandom’s expressivist account of reason, the role of logical vocabulary is not as metalanguage, but rather a formalization of ordinary vocabulary to the level of propositions – an epistemic mediator of truth statements, akin to an expressive toolkit which makes explicit our commitments.6 Logic neither undergirds nor circumscribes reason in Brandom’s scheme – it is neither foundational nor transcendent, but rather autonomous – one can reason without recourse to logic, simply grasping the how of causes without the formal expression of such. Reasoning can be, in a sense, informal, without descending into purely irrational belief, and this informs Brandom’s subtle rendering of reason, in which normative statements provide a bridge to logical vocabulary, a position he calls normative pragmatics.
Nevertheless, contemporary AI confronts us with modes of statistical inference that lack explication but exhibit validity within domains marked out by human epistemic practices. An open attitude to inhuman reasoning might instead view the semantic void at the heart of AI as a result of linguistic incommensurables arising from a process of abstraction. Abstraction as a practice of mapping between domains introduces a challenge to a theory of language in the realm of representation, namely of providing an account of how cognition transitions in and out of the properly linguistic conceptual domain.
One might consider such a conceptual domain emerging from the learned tensorial representations native to ANNs. A vector space may not seem a plausible basis for a language at first sight, but what exactly is required for a language? No more than a vocabulary and a syntax. ANNs’ ability to learn internal embeddings of various spoken languages is the basis for recent developments in natural language processing.7 Assuming a distributional semantics, relationships between words are captured in a lower dimensional vector space, allowing the model to exhibit an awareness of abstract concept classes, to construct analogies, and so on. Such abstractions emerge in response to the curse of dimensionality – the difficulty of working with sparse, high-dimensional vector representations – latent semantic structure is simply a side effect of this learned, compressed representation.
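A minimal sketch of this distributional picture follows, with hand-picked three-dimensional vectors standing in for learned embeddings; the values are invented purely to make the analogy arithmetic visible.

```python
# A toy sketch of distributional semantics in a low-dimensional vector space.
# The 3-d "embeddings" below are hand-picked values, not learned vectors; they
# only illustrate analogy as vector arithmetic (king - man + woman ~ queen).
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]     # the analogy, performed in the vector space
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)                                          # "queen" for these toy vectors
```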
Embeddings are not limited solely to natural language – any input domain can be collapsed into a common vector space, as evinced by models which combine image and word data into a single embedding, allowing for breakthroughs in image captioning.8 Autoencoders, another family of ANNs, also leverage an ability to reduce the dimensionality of an input domain, essentially achieving a distributed form of compression akin to an embedding. These acts alone are not evidence of language formation as such, but rather a capacity to abstract both the syntax and semantics of existing (visual, textual) vocabularies into a novel encoding. It is in transitioning out of embeddings back into the input domain that we see how they constitute internal languages – take for example, translation ANNs that are able to translate between language pairs they were never explicitly trained on, indicating an internal “interlingua” with a strong claim to the status of a language.9
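The compressive labour of the autoencoder can be sketched in a few lines – a linear encoder and decoder, toy data, and a plain gradient descent loop; all sizes, data and learning rate are assumptions made for illustration rather than a description of any model cited above.

```python
# A minimal sketch of an autoencoder as dimensionality reduction: a linear
# encoder collapses 8-dimensional inputs through a 2-dimensional bottleneck
# and a decoder reconstructs them. Data, sizes and learning rate are toy values.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 8))  # toy data lying in a 2-d subspace

W_enc = rng.normal(size=(8, 2)) * 0.1    # encoder: input -> embedding
W_dec = rng.normal(size=(2, 8)) * 0.1    # decoder: embedding -> reconstruction

lr = 5e-3
for step in range(3000):
    Z = X @ W_enc                        # the compressed, distributed code
    X_hat = Z @ W_dec                    # attempted reconstruction of the input
    err = X_hat - X
    W_dec -= lr * Z.T @ err / len(X)     # gradient steps on the reconstruction error
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print("reconstruction error:", round(float((err ** 2).mean()), 4))  # near zero: a 2-d code suffices here
```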
The era of feature engineering – spoon-feeding a model a vocabulary which could be used to reason about its inferences, to account for its decisions by shaping its representations of the input data at the outset – is quickly receding. This conceptual dictation of man over machine turned out to be counterproductive – sidelining human concepts improved the models’ ability to generalize. Even within supervised deep learning today, there is no shared means of reasoning about decisions – concepts are applied as labels attached to desired classificatory outcomes, which themselves cannot be the basis for inference.
If embeddings create the ground for internal language models, then agent interactions form the basis for the development of external languages proper to ANNs. Machine learning techniques based on interaction – namely adversarial modes of learning – have created a phase transition in the generative capacity of AI models. A predictive processing (PP) model of mind ascribes a central role to such generative models in its account of perception.10 In PP, perception is characterised as the output of a predictive model, and sense data no more than an error to be back-propagated. The refinement of such models, which is the continual process of perception itself, provides the means to learn a causal matrix that underpins our knowledge of the world. Such an account of perception resonates with contemporary deep learning, in which generative nets often train themselves against discriminative nets. The development of learning techniques which focus on agent interaction promises a fertile ground for agent languages, arising from acts of communication in service of just such a generative optimization process as that posited by PP.
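The error-driven core of that account can be sketched in a few lines. The toy loop below – invented numbers, a trivially simple generative model – only illustrates the PP claim that sense data enters as a prediction error to be propagated back; the adversarial setting differs in that the error signal is itself supplied by a learned discriminator.

```python
# A toy sketch of the predictive-processing loop: a generative model predicts
# incoming sense data, and only the prediction error is used to refine the
# model. The "world", noise level and learning rate are invented assumptions.
import numpy as np

rng = np.random.default_rng(2)
true_cause = 2.0                        # the hidden state of the world

def sense():
    return true_cause + 0.1 * rng.normal()   # noisy sense data arriving over time

mu = 0.0                                # the model's current best guess, i.e. the percept
lr = 0.1
for t in range(200):
    prediction = mu                     # output of the (here trivial) generative model
    error = sense() - prediction        # sense data figures only as an error signal
    mu += lr * error                    # refining the model is the ongoing act of perception

print("inferred cause:", round(mu, 2))  # settles near 2.0
```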
A key architectural development for dealing with language in deep learning models is Long Short Term Memory (LSTM). LSTMs constitute a family of ANNs composed of recurrent units housing tanh and sigmoid activation functions marshalled by input, output, and forget gates. The nets learn which information to retain, and which to discard, when performing tasks on sequences of data. This is an architecture for forgetting, imperfect recall as a means of managing attention at different scales. The labour of grasping patterns from a stream of symbols hinges on discrimination, compression, detecting redundancy, and judging relevance. Strategic forgetfulness, it turns out, is integral to the performance of such feats of learning. These acts, like the entire regime of deep learning, still exist firmly within the domain of inductive logic, framed entirely as the generalization of patterns in input data.
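A single LSTM step can be sketched directly from this description – sigmoid gates decide what to forget, what to write and what to expose, while tanh squashes the candidate values. The sizes and random weights below are toy assumptions; in a trained net the parameters W and b are learned.

```python
# A minimal sketch of one LSTM step in numpy, showing the gating described above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step: x is the current input, (h_prev, c_prev) the carried state."""
    z = np.concatenate([x, h_prev]) @ W + b          # all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)     # forget, input, output gates
    g = np.tanh(g)                                   # candidate cell values
    c = f * c_prev + i * g                           # keep some memory, write some new
    h = o * np.tanh(c)                               # expose a gated view of the cell
    return h, c

rng = np.random.default_rng(3)
n_in, n_hid = 4, 8
W = rng.normal(size=(n_in + n_hid, 4 * n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):                 # a short input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                              # (8,) (8,)
```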
Brandom’s inferentialist account of reason would locate contemporary AI at merely the first rung of inference – to climb the ladder of abstraction that affords the kind of self-reflection proper to those acts we consider rational is the challenge put to AI by such accounts.11 If machine learning is to develop beyond mere acts of labelling – classificatory feats of inductive inference – then it must develop language(s), be they internal mentalese or external modes of communication. As Brandom notes, to be counted as a concept user one must move beyond simple differential responsiveness to stimuli, as exhibited by a thermostat or a pigeon trained to respond to different colours – to acquire concepts means instead to deploy them within a web of inferences, to offer them as premises or conclusions in acts of reasoning.12 This marks the rationalist distinction between sentience and sapience.
In order for concepts to enter into rational roles, they must provide justification for each other on multiple levels of abstraction. A hierarchy of synthetic and analytic concept formation must be at play. It is this multi-level labour of abstraction which latent spaces and embeddings make theoretically possible. But even so, AI remains bound to an inductive regime of logic that precludes the kind of normative claims which play a central role in reasoning. The work of Judea Pearl is instructive here in suggesting a path forward for AI, not only as a critique of deep learning,13 but as an exploration of inferential webs in the form of Bayesian networks.14 Pearl develops computational modes of causal inference which are conspicuously absent from mainstream AI.
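A glimpse of the style of inference Pearl has in mind: a joint distribution factorized along a small causal graph and queried by enumeration. The rain–sprinkler–wet-grass graph below is a textbook-style example with invented probabilities, not drawn from the cited texts.

```python
# A minimal sketch of inference in a small Bayesian network: the joint
# distribution over Rain, Sprinkler and WetGrass is factorized along a causal
# graph (Rain -> Sprinkler; Rain, Sprinkler -> WetGrass) and queried by
# brute-force enumeration. The probabilities are invented for illustration.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(sprinkler | rain)
               False: {True: 0.4, False: 0.6}}    # P(sprinkler | no rain)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(wet | sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    p_w = P_wet[(sprinkler, rain)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_w if wet else 1 - p_w)

# Query P(rain | the grass is wet) by summing out the sprinkler variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print("P(rain | wet grass) =", round(num / den, 3))   # 0.358
```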
One way of describing ANNs’ capacity for abstraction might be through the concept of arity. ANNs can be formalized as a system of nested functional composition, a complex network of activation functions learned over time. The arity of a function denotes its promiscuity, its degree of interdependence within a network of relations. Neurons in this analogy are simple morphisms which take n functions and compose them with an activation curve, outputting a single function in their place – they exhibit n-arity, where n is bounded by the number of neurons in the preceding layer, in the limit case of a fully connected architecture. The arity of a connection in such a network is the dimensionality of that relation.
Arity can act as a short-hand for both connectivity and dimensionality in a functional model, akin to degree in graph models, an indicator of valence within a broader system. Learning is in effect a rewiring of the neural architecture that renders plasticity a variadic reconfiguring of the net. Learning turns neurons into variadic functions, as some functional relations are enhanced and others discarded by tuning their weights in an effort to minimize error – variadic plasticity within a latent space becomes the key to generalization.
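The functional framing can be spelled out in a short sketch – each neuron below is returned as an n-ary function whose arity is fixed by the width of its incoming connections; the sizes, weights and inputs are arbitrary illustrations.

```python
# A sketch of a neuron as an n-ary composition: it takes n incoming values and
# composes them with an activation curve, returning a single value in their place.
import numpy as np

def neuron(weights, bias):
    """Return an n-ary function, where n = len(weights)."""
    def fire(*inputs):                              # the neuron's arity in action
        return float(np.tanh(np.dot(weights, inputs) + bias))
    return fire

rng = np.random.default_rng(4)
layer = [neuron(rng.normal(size=3), float(rng.normal())) for _ in range(4)]

x = (0.2, -0.4, 0.9)                                # three incoming activations
print([round(n(*x), 3) for n in layer])             # four 3-ary compositions of the same inputs
```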
The work presented in the pages that follow is a visual and textual exploration of abstraction in ANNs and an inquiry into the generative capacity of LSTM architectures. The methodology presented is akin to administering an injection, a violation of sorts, one which questions the integrity of a body. Submitted to this procedure, a black box of cognition reveals a membrane with dispositions, a tangle of tensorial fibre, matter imbued with memory.
An LSTM is injected with random data drawn from a specified distribution (Pareto, Gaussian, etc.), effectively hijacking its latent space. This is fed through the untrained network, which outputs values as intensities of light. The output is an exposure of its internal structure – a nesting of activation curves, smooth gradients, bulbous growths, and fibrous strands. The exposed image passes through a second network in the form of an autoencoder which scales the output to arbitrary sizes, using a lossy representation stored in a distributed manner amongst its neurons. Dimensionality reduction here is a means of compression, collapsing representations in an input space into their characteristic features.
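One way such an injection might be sketched in code – the sizes, the choice of PyTorch, the Pareto parameters and the mapping of activations to greyscale intensities are all assumptions for illustration, not a description of the pipeline actually used for these works.

```python
# An illustrative reconstruction of the "injection": random data drawn from a
# chosen distribution is pushed through an untrained LSTM, and the raw hidden
# activations are rescaled to pixel intensities.
import torch

torch.manual_seed(0)
seq_len, batch, n_in, n_hid = 256, 1, 64, 256

# Random "sense data": a Pareto-distributed injection into the input space.
noise = torch.distributions.Pareto(scale=1.0, alpha=2.0).sample((seq_len, batch, n_in))

lstm = torch.nn.LSTM(input_size=n_in, hidden_size=n_hid)   # untrained: random weights
with torch.no_grad():
    out, _ = lstm(noise)                                   # the hijacked latent activity

# Expose the hidden states as intensities of light: normalise to [0, 1] greyscale.
img = (out.squeeze(1) - out.min()) / (out.max() - out.min())
print(img.shape)                                           # torch.Size([256, 256])
```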
Alongside these exposures a Recurrent Neural Net (RNN) learns to read a corpus of contemporary theory on computation, finitude, and mathematics. The RNN has read the work of Luciana Parisi, Gregory Chaitin, Quentin Meillassoux, and others. Language is but a stream of characters to the RNN. It has no a priori notion of the concept ‘word’, let alone grammar. It too determines which associations to forget and which to retain, a limited memory allowing it to compose synthetic philosophical claims. This is a regime entirely devoid of semantics but rich in syntax and sensicality. It exhibits a form of creativity redolent of Desjardin’s theory of knowledge, namely emergence through generative recombination.15 Its generative capacity – learnt through many hours of training – hints at the constructibility of grammars, at a generalized axiomatics of inhuman linguistic formalisms.
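How such a character stream reaches the model can be sketched simply – the snippet below (with a placeholder line of text, not a quotation from the corpus) shows the corpus reduced to indexed characters and next-character training pairs, which is all the RNN ever sees.

```python
# A minimal sketch of text as a character-level RNN receives it: no words,
# no grammar, only a stream of character indices and next-character targets.
corpus = "language is but a stream of characters to the machine"  # placeholder text

chars = sorted(set(corpus))                   # the model's entire a priori vocabulary
char_to_ix = {c: i for i, c in enumerate(chars)}

stream = [char_to_ix[c] for c in corpus]      # the corpus as a stream of symbols
pairs = list(zip(stream[:-1], stream[1:]))    # (current character, next character) targets

print(len(chars), "distinct symbols,", len(pairs), "training pairs")
print(pairs[:5])                              # the only structure available to the net
```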