> There is no dark side of the moon. Matter of fact, it's all dark. The only thing that makes it look light is the sun.
The typical conception of a resource like "Arxiv" or "Wikipedia" is summed up in the name: it is an archive or an encyclopedia that anyone can edit. "Stack Exchange" is somewhat less self-descriptive -- but nevertheless, the emphasis on "exchange" is apt, since this site is not quite a gift economy in Eric S. Raymond's sense, but rather a place where questions are exchanged for answers, and both are exchanged for reputation in the form of points and badges.
Nevertheless, in broad brushstrokes all three resources have something much more essential as a common basis: they all grow in the course of use. That said, they do not typically transform radically along the way. Arxiv remains an archive; Wikipedia remains a wiki and encyclopedia; Stack Exchange is and always will be a Q&A site focused on questions with specific, usually technical, answers.
But what if we get "outside" of these systems, stop thinking about them as objects, and start thinking about them as (collections of composable) processes? From this vantage point, a change in type is as likely as a change in magnitude. For example, we can imagine a next-generation computer system that combines data from these various sources and that can serve as a dialogue partner.
Dubious? Consider that from our new vantage point outside the system the people who use the resources are indistinguishable from computational agents that accomplish similar tasks. Look up something here, ask about it there, write about something related somewhere else.
Social scientists have studied interaction with Wikipedia, but a more computational flavour of research asks how the contributed material is organised, and attempts to anticipate what the system would need to do in order to achieve some new processing task. For example, how much would the system need to know in order to begin to ask and answer questions, or participate in a collaborative problem solving process?
We can get some clues about these speculative questions by considering the current state of such systems. A resource like Wikipedia or Stack Exchange can be used as a "prism" or "fractionator" that will allow a given input text to be parsed. Let's consider the mathematical domain, where there is the convenient epistemological gold standard of "proof."
We can use the structure of an encyclopedia-like resource to deduce that "circle" is a basic mathematical concept, whereas "holomorphic function" involves more information and requires more subject-specific background to understand and apply. This is not to say that "circle" would be an easy concept for an AI system. Rather, I'm suggesting that the process of solving a mathematical problem, participating in a dialogue, or proving a new theorem involves building up the domain of discourse in which a given term or query becomes meaningful.
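One way to make this concrete is to treat "depth" as the length of the longest prerequisite chain beneath a concept. The sketch below uses a toy, hand-made prerequisite graph (the links are illustrative, not extracted from any real resource):

```python
# Toy prerequisite graph: PREREQS[c] lists concepts one should grasp before c.
# These links are hypothetical, for illustration only.
PREREQS = {
    "set": [],
    "point": ["set"],
    "circle": ["point"],
    "function": ["set"],
    "limit": ["function"],
    "derivative": ["limit"],
    "complex number": ["point"],
    "differentiable function": ["derivative"],
    "holomorphic function": ["differentiable function", "complex number"],
}

def depth(concept, prereqs=PREREQS):
    """Length of the longest prerequisite chain below a concept."""
    ps = prereqs.get(concept, [])
    if not ps:
        return 0
    return 1 + max(depth(p, prereqs) for p in ps)
```

On this toy graph, "circle" sits near the roots while "holomorphic function" sits several layers up, matching the intuition that the latter requires more background to understand and apply.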
In principle, everything is related to everything else. In practice, some things are more related than others. Finding points of overlap or connection should allow users to recover the bridging concepts that are useful for answering questions (whether routine or novel).
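If the resource is viewed as a link graph, "bridging concepts" can be approximated as articles linked from both endpoints of interest. A minimal sketch, again over hypothetical link data:

```python
# Toy link graph: LINKS[a] is the set of articles that a links to.
# The entries are invented for illustration.
LINKS = {
    "circle": {"pi", "curve", "plane"},
    "holomorphic function": {"complex number", "curve", "power series"},
    "ellipse": {"curve", "plane", "conic section"},
}

def bridges(a, b, links=LINKS):
    """Concepts linked from both a and b -- candidate bridging terms."""
    return links.get(a, set()) & links.get(b, set())
```

Here "curve" would surface as the term connecting "circle" to "holomorphic function"; a real system would of course weight such overlaps rather than take raw intersections.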
I've previously done some pilot study work using the idea of "fractionating" texts according to the depth of their constituent terms, and I have also looked at timestamped data to try to determine the precipitating causes of a given action taken by a system. But more recently, I've been thinking about how to approach similar issues from a simulation point of view. Thus, e.g., generating texts, or programs, or critiques.
More specifically and technically: what I have in mind is developing generative testing approaches to programmatic interaction that will allow the computer to build out its own system in new directions with minimal guidance from the programmer. The intuition here is that what we do depends on what we perceive needs to be done in a given situation. This is relevant to thinking about how the computer could participate in dialogue.
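The core loop of such generative testing is simple: propose candidate inputs (or, in a fuller system, candidate programs), check them against a property, and keep whatever exposes a gap. A minimal sketch in the spirit of property-based testing, with a deliberately trivial property:

```python
import random

def generate_case(rng):
    """Propose a random test input; a fuller system could generate programs."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]

def property_holds(xs):
    """The property under test here: sorting is idempotent."""
    return sorted(sorted(xs)) == sorted(xs)

def probe(trials=200, seed=0):
    """Generate inputs and collect any that falsify the property."""
    rng = random.Random(seed)
    return [case for case in (generate_case(rng) for _ in range(trials))
            if not property_holds(case)]
```

Counterexamples returned by `probe` are exactly the "perceived needs" of the situation: each one marks a direction in which the system should extend or repair itself.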
But just now these proposals may seem quite abstract and in that respect rather hopeless. Moving from big collections of words to models of word cooccurrence (for example) is natural enough. Moving from there to text understanding and dialogue sounds quite a bit harder.
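To fix ideas on the "natural enough" first step, here is a minimal cooccurrence model: count unordered word pairs that appear within a small window of each other in a corpus.

```python
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count unordered word pairs within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        toks = sentence.lower().split()
        for i, w in enumerate(toks):
            for v in toks[i + 1:i + 1 + window]:
                counts[tuple(sorted((w, v)))] += 1
    return counts
```

The gap the paragraph points at is visible in the output: such counts capture which words travel together, but nothing about what any sentence asserts.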
Let me therefore break the proposal down a bit more, along three quite specific directions.
- [PEER] LEARNING - Organising learning pathways is quite similar to automatic programming, insofar as the pathways need to change and adapt depending on circumstances. If someone doesn't understand "holomorphic function," for example, they might want to review "differentiable function." More broadly, a peer learning experience can help surface and fill in gaps in understanding in an emergent and ad hoc manner.
- [MATHEMATICAL] COLLABORATION - Links between different resources like Wikipedia, Stack Exchange, and Arxiv are perhaps even more interesting than the resources themselves, from an AI point of view. Here we consider collaboration "in the large" and the mostly-technical challenge of integrating new knowledge into a large scale model, without disrupting contributors' workflows.
- [COMPUTATIONAL] CREATIVITY - If we want mathematical agents to help us solve problems (or indeed, to solve problems that we are not able to solve directly) then they need to know what the problems mean, i.e., they need to be able to parse and flesh out our imperfectly expressed ideas and queries.
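The [PEER] LEARNING direction above can be given a first approximation in code: if a learner is stuck on a concept, linearise its prerequisites into a review pathway. A sketch over hypothetical prerequisite links (an adaptive system would reorder this as it learned what the person already knows):

```python
def pathway(target, prereqs):
    """Linearise the prerequisites of `target` into a study order (post-order DFS)."""
    order, seen = [], set()

    def visit(concept):
        if concept in seen:
            return
        seen.add(concept)
        for p in prereqs.get(concept, []):
            visit(p)
        order.append(concept)

    visit(target)
    return order

# Hypothetical prerequisite links, for illustration only.
PREREQ_LINKS = {
    "holomorphic function": ["differentiable function", "complex number"],
    "differentiable function": ["limit"],
    "complex number": [],
    "limit": [],
}
```

So a request to review "holomorphic function" would first route through "limit" and "differentiable function", echoing the example in the bullet above.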
The gold standard of proof is likely to come in handy if we ever need to break down large problems into smaller ones. Thus, instead of trying to deduce A's from Q's or vice versa, we could try to deduce proof steps in a paper from the foregoing discussion, lemmas, and citations.
If the system gets stuck, which it inevitably will do often, we then ask: what does it need to LEARN? What COLLABORATIVE processes might it participate in to clarify some specific uncertainty? What new methods might it need to CREATE to answer the foregoing questions on its own?