Tim Berners-Lee

Strange, you might think: why would an SEO want to keep an eye on what the co-founder of the web is doing?

Well, quite simply, Tim had the insight and forethought that make it possible for me to be sat at my desk now. What he offers through his blog slowly becomes apparent the more you read it. Net neutrality is a hot topic right now, but read further into his archives and you will be pleasantly surprised where you end up. I recommend you either bookmark his blog or subscribe to his feed. The man is a genius, FACT.

The latest from Tim's blog:

Map and Territory in RDF APIs

RDF specs and APIs have made a bit of a mess out of a couple of pretty basic tools of math and computing: graphs and logic formulas. With the RDF next steps workshop coming up, Pat Hayes re-thinking RDF semantics, and Sandro thinking out loud about RDF2, I'd like us to think about RDF in more traditional terms. The Scala programming language seems to be an interesting framework to explore how graphs and logic formulas relate to RDF.

The Feb 1999 RDF spec wasn't very clear about the map and the territory. It said that statements are made out of parts in the territory, rather than features on the map, which doesn't make very much sense. RDF APIs seem to inherit this confusion; e.g. from an RDF::Value class for Ruby:

Examples:

Checking if a value is a resource (blank node or URI reference)

value.resource?

Blank nodes and URI references are parts of the map; resources are in the territory.

Likewise in Package org.jrdf.graph:

Resource A resource stands for either a Blank Node or a URI Reference.

The 2004 RDF specs take great pains to clarify these use/mention distinctions, but they also go on at great length.

Let's review Wikipedia on graphs:

In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. ...  The edges may be directed (asymmetric) or undirected (symmetric) ... and the edges are called directed edges or arcs; ... graphs which have labeled edges are called edge-labeled graphs.


With that in mind, in the swap-scala project, we summarize the RDF abstract syntax as an edge-labelled directed graph with just one or two wrinkles:

package org.w3.swap.rdf

trait RDFGraphParts {
  type Arc = (SubjectNode, Label, Node)

  type Node
  type Literal <: Node
  type SubjectNode <: Node
  type BlankNode <: SubjectNode
  type Label <: SubjectNode
}

The wrinkles are:

  • Arcs can only start from BlankNodes or Labels, i.e. SubjectNodes
  • Arc labels may also appear as Nodes
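To make the two wrinkles concrete, here is a minimal sketch of one way the abstract types could be filled in. The object SimpleParts and the case classes Iri, Blank, and Lit are hypothetical names invented for this illustration; they are not taken from swap-scala.

```scala
// Copy of the trait above, repeated so the sketch is self-contained.
trait RDFGraphParts {
  type Arc = (SubjectNode, Label, Node)

  type Node
  type Literal <: Node
  type SubjectNode <: Node
  type BlankNode <: SubjectNode
  type Label <: SubjectNode
}

// Hypothetical concrete instantiation, for illustration only.
object SimpleParts extends RDFGraphParts {
  sealed trait N                          // any node
  sealed trait Subj extends N             // nodes that may start an arc
  case class Iri(i: String) extends Subj  // plays the role of Label
  case class Blank(id: Int) extends Subj  // plays the role of BlankNode
  case class Lit(lex: String) extends N   // plays the role of Literal

  type Node = N
  type Literal = Lit
  type SubjectNode = Subj
  type BlankNode = Blank
  type Label = Iri

  val knows = Iri("http://xmlns.com/foaf/0.1/knows")
  val rdfsLabel = Iri("http://www.w3.org/2000/01/rdf-schema#label")

  val arcs: Set[Arc] = Set(
    (Blank(1), knows, Blank(2)),     // an arc starting from a blank node
    (knows, rdfsLabel, Lit("knows")) // wrinkle 2: an arc label used as a node
  )
}
```

With these definitions, an arc such as (Lit("x"), knows, Blank(1)) is rejected by the compiler, because Lit is not a SubjectNode; that is the first wrinkle expressed in the type system.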

We use another trait to relate concrete datatypes to these abstract types:

trait RDFNodeBuilder extends RDFGraphParts {
  def uri(i: String): Label
  type LanguageTag = Symbol
  def plain(s: String, lang: Option[LanguageTag]): Literal
  def typed(s: String, dt: String): Literal
  def xmllit(content: scala.xml.NodeSeq): Literal
}

This doesn't pin down what a Label is, but in any concrete implementation, you can build one from a String using the uri method. The RDFNodeBuilder trait is used to implement RDF/XML, RDFa, and Turtle parsers that are agnostic to the concrete implementation of an RDF graph.
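As a rough illustration of what "agnostic to the concrete implementation" can look like, the sketch below writes a helper purely against the builder interface and then runs it with a toy implementation. It uses a trimmed copy of the trait (xmllit is left out so the sketch needs no scala-xml dependency); labelArc and StringNodes are hypothetical names made up for this example.

```scala
object BuilderSketch {
  trait RDFGraphParts {
    type Arc = (SubjectNode, Label, Node)
    type Node
    type Literal <: Node
    type SubjectNode <: Node
    type BlankNode <: SubjectNode
    type Label <: SubjectNode
  }

  // Trimmed copy of RDFNodeBuilder (assumption: xmllit omitted on purpose).
  trait RDFNodeBuilder extends RDFGraphParts {
    def uri(i: String): Label
    type LanguageTag = Symbol
    def plain(s: String, lang: Option[LanguageTag]): Literal
    def typed(s: String, dt: String): Literal
  }

  // Hypothetical helper: builds one arc using only the abstract interface,
  // the way a parser might when it sees an rdfs:label property.
  def labelArc(b: RDFNodeBuilder)(subject: String, text: String): b.Arc =
    (b.uri(subject),
     b.uri("http://www.w3.org/2000/01/rdf-schema#label"),
     b.plain(text, Some(Symbol("en"))))

  // Toy implementation: every node is just its Turtle-like spelling.
  object StringNodes extends RDFNodeBuilder {
    type Node = String
    type Literal = String
    type SubjectNode = String
    type BlankNode = String
    type Label = String

    def uri(i: String): Label = "<" + i + ">"
    def plain(s: String, lang: Option[LanguageTag]): Literal =
      "\"" + s + "\"" + lang.map("@" + _.name).getOrElse("")
    def typed(s: String, dt: String): Literal =
      "\"" + s + "\"^^<" + dt + ">"
  }
}
```

Calling BuilderSketch.labelArc(BuilderSketch.StringNodes)("http://example.org/a", "A") yields a triple of strings; passing a different builder would yield that builder's own node types without any change to labelArc.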

Now let's look at terms of first order logic:

 The set of terms is inductively defined by the following rules:

  1. Variables. Any variable is a term.
  2. Functions. Any expression f(t1,...,tn) of n arguments (where each argument ti is a term and f is a function symbol of valence n) is a term.

This is represented straightforwardly in Scala a la:

package org.w3.swap.logic1

/**
 * A Term is either a Variable or a FunctionTerm.
 */
sealed abstract class Term { ... }

class Variable extends Term { ...}

abstract class FunctionTerm() extends Term {
  def fun: Any
  def args: List[Term]
}
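The inductive definition pays off when writing functions over terms: one case per rule. The following is a self-contained sketch, not the swap-scala code; it uses hypothetical case classes (Variable with a name, and Apply standing in for a concrete FunctionTerm) so that it can be compiled on its own.

```scala
object TermSketch {
  sealed abstract class Term
  // Rule 1: any variable is a term.
  case class Variable(name: String) extends Term
  // Rule 2: a function symbol applied to argument terms is a term.
  // (Hypothetical stand-in for a concrete FunctionTerm.)
  case class Apply(fun: String, args: List[Term]) extends Term

  // Collect the variables of a term by structural recursion.
  def variables(t: Term): Set[Variable] = t match {
    case v: Variable    => Set(v)
    case Apply(_, args) =>
      args.foldLeft(Set.empty[Variable])((acc, a) => acc ++ variables(a))
  }
}
```

For example, the variables of f(x, g(y, x)) come out as the set containing x and y; a 0-ary function term such as Apply("c", Nil) has none, which is the sense in which such terms are ground.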

The core RDF doesn't cover all of first order logic; it corresponds fairly closely to the conjunctive query fragment:

The conjunctive queries are simply the fragment of first-order logic given by the set of formulae that can be constructed from atomic formulae using conjunction ∧ and existential quantification ∃, but not using disjunction ∨, negation ¬, or universal quantification ∀.

We can then excerpt just the relevant parts of the definition of formulas:

The set of formulas is inductively defined by the following rules:

  1. Predicate symbols. If P is an n-ary predicate symbol and t1, ..., tn are terms then P(t1,...,tn) is a formula.
  2. Binary connectives. If φ and ψ are formulas, then (φ → ψ) is a formula. Similar rules apply to other binary logical connectives.
  3. Quantifiers. If φ is a formula and x is a variable, then ∀x φ and ∃x φ are formulas.

Our Scala representation follows straightforwardly:

package org.w3.swap.logic1ec

sealed abstract class ECFormula
case class Exists(vars: Set[Variable], g: And) extends ECFormula
sealed abstract class Ground extends ECFormula
case class And(fmlas: Seq[Atomic]) extends Ground
case class Atomic(rel: Symbol, args: List[Term]) extends Ground

Now that we have Scala representations for RDF graphs and conjunctive query formulas, how do we relate them? This is the fun part:

package org.w3.swap.rdflogic

import swap.rdf.RDFNodeBuilder
import swap.logic1.{Term, FunctionTerm, Variable}
import swap.logic1ec.{Exists, And, Atomic, ECProver, ECFormula}

/**
 * RDF has only ground, 0-ary function terms.
 */
abstract class Ground extends FunctionTerm {
  override def fun = this
  override def args = Nil
}

case class Name(n: String) extends Ground
case class Plain(s: String, lang: Option[Symbol]) extends Ground
case class Data(lex: String, dt: Name) extends Ground
case class XMLLit(content: scala.xml.NodeSeq) extends Ground


/**
 * Implement RDF Nodes (except BlankNode) using FOL function terms
 */
trait TermNode extends RDFNodeBuilder {
  type Node = Term
  type SubjectNode = Term
  type Label = Name

  def uri(i: String) = Name(i)

  type Literal = Term
  def plain(s: String, lang: Option[Symbol]) = Plain(s, lang)
  def typed(s: String, dt: String): Literal = Data(s, Name(dt))
  def xmllit(e: scala.xml.NodeSeq): Literal = XMLLit(e)
}

The abstract RDFGraphBuilder node types are implemented as first order logic terms. For formulas, we use a "holds" predicate:

object RDFLogic extends ... {
  def atom(s: Term, p: Term, o: Term): Atomic = {
    Atomic('holds, List(s, p, o))
  }
  def atom(arc: (Term, Term, Term)): Atomic = {
    Atomic('holds, List(arc._1, arc._2, arc._3))
  }
}

Then all the semantic machinery up to simple entailment between RDF graphs just falls out of conjunctive query.
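To give a feel for how entailment can "fall out", here is a small self-contained sketch of simple entailment as conjunctive query answering: graph G simply entails graph H when the blank nodes of H can be mapped to terms of G so that every holds-atom of H becomes an atom of G. This is an illustration under simplifying assumptions (only names and blank nodes, no literals), with hypothetical names throughout; it is not the ECProver from swap-scala.

```scala
object EntailmentSketch {
  sealed trait Term
  case class Name(n: String) extends Term // ground term, e.g. a URI
  case class Var(n: String) extends Term  // existential variable (blank node)

  type Atom = (Term, Term, Term)          // stands for holds(s, p, o)
  type Subst = Map[Var, Term]

  // Extend s so that query term q matches fact term t, if possible.
  // Terms on the fact side are treated as opaque constants.
  def matchTerm(q: Term, t: Term, s: Subst): Option[Subst] = q match {
    case v: Var => s.get(v) match {
      case Some(bound) => if (bound == t) Some(s) else None
      case None        => Some(s + (v -> t))
    }
    case _ => if (q == t) Some(s) else None
  }

  def matchAtom(q: Atom, a: Atom, s: Subst): Option[Subst] =
    for {
      s1 <- matchTerm(q._1, a._1, s)
      s2 <- matchTerm(q._2, a._2, s1)
      s3 <- matchTerm(q._3, a._3, s2)
    } yield s3

  // All substitutions sending every query atom onto some fact
  // (plain backtracking search over the facts).
  def solve(query: List[Atom], facts: Set[Atom], s: Subst): Iterator[Subst] =
    query match {
      case Nil => Iterator(s)
      case q :: rest =>
        facts.iterator
          .flatMap(f => matchAtom(q, f, s).iterator)
          .flatMap(s1 => solve(rest, facts, s1))
    }

  // G simply entails H iff H, read as a conjunctive query, has an answer in G.
  def entails(g: Set[Atom], h: Set[Atom]): Boolean =
    solve(h.toList, g, Map.empty).nonEmpty
}
```

With G containing holds(alice, knows, _:b1) and holds(_:b1, name, bob), the query graph { holds(_:x, knows, _:y) } is entailed, while { holds(_:x, knows, alice) } is not, since nothing in G is known to know alice.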

I haven't done RDFS Entailment yet; the plan is to do basic rules first (N3rules or RIF BLD) and then use that for RDFS, OWL2-RL, and the like.

Existentials in ACL2 and Milawa make sense; how about level breakers?

Since my Sep 2006 visit to the ACL2 seminar, I've been trying to get my head around existentials in ACL2. The lightbulb finally went off this week while reading Jared's Dec 2009 Milawa thesis.

3.7 Provability

Now that we have a proof checker, we can use existential quantification to decide whether a particular formula is provable. Recall from page 61 the notion of a witnessing (Skolem) function. We begin by introducing a witnessing function, logic.provable-witness, whose defining axiom is as follows.


Definition 92: logic.provable-witness
(por* (pequal* ...))

Intuitively, this axiom can be understood as: if there exists an appeal which is a valid proof of x, then (logic.provable-witness x axioms thms atbl) is such an appeal.

Ding! Now I get it.
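The idea can be pictured with a toy sketch. Everything here is an assumption made for illustration: the universe is a small finite list standing in for the set of all candidate appeals, so the witness can actually be computed by search, whereas in ACL2 or Milawa the witnessing function is only constrained by its axiom and need not be computable.

```scala
object WitnessSketch {
  // Finite stand-in for the universe of candidates (assumption).
  val universe: List[Int] = (0 to 20).toList

  // A total witnessing function: if anything satisfies p, return such a
  // thing; otherwise return an arbitrary default.
  def witness(p: Int => Boolean): Int = universe.find(p).getOrElse(0)

  // "There exists x with p(x)" is then just p applied to the witness.
  def exists(p: Int => Boolean): Boolean = p(witness(p))
}
```

The defining property mirrors the axiom quoted above: whenever some element of the universe satisfies p, witness(p) is such an element, so a quantifier-free logic can talk about existence by talking about the witness.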

This witnessing stuff is published in other ACL2 publications, notably:

  • Structured Theory Development for a Mechanized Logic, M. Kaufmann and J Moore, Journal of Automated Reasoning 26, no. 2 (2001), pp. 161-203.

But I can't fit those in my tiny brain.

Thanks, Jared, for explaining it at my speed!

Here's hoping I can turn this new knowledge into code that converts N3Rules to ACL2 and/or Milawa's format. N3Rules covers RDF, RDFS, and, I think, OWL2-RL and some parts of RIF. Roughly the stuff that FuXi covers.

I'm somewhat hopeful that the rest of N3 is just quoting. That's the intuition that got me looking into ACL2 and Milawa again after working on some TAG stuff using N3Logic to encode ABLP logic. The last time I tried turning N3 {} terms into Lisp quote expressions was when looking at IKL as a semantic framework for N3. I didn't like the results that time; I'm not sure why I expect it to be different this time, but somehow I do...

Another question that's keeping me up at night lately: is there a way to fit level-breakers such as log:uri (or name and denotation, if not wtr from KIF) in the Milawa architecture somehow?

DIG losing the battle with spammers again

Blog spam went out of control again; the only remedy I could find was a very big hammer: turn off the Drupal comments module altogether and, in doing so, unpublish all comments ever posted to this site. I suppose they're still in the database and could be published again, if we could separate them from the spam.

The Drupal expertise in our group seems to have gone on to greener pastures. That prompted me to move my family business off its Drupal installation and onto a hosted WordPress site, and it makes me wonder how safe the stuff that I write here is...

Any MIT students want to help this research group manage a community presence? Please get in touch.

No such thing as bad publicity for Facebook

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Anecdotal evidence suggests that there’s no such thing as bad publicity (at least for Facebook). In the wake of the recent flap about Facebook’s change in its terms of service, I seem to be experiencing a spike in new friend requests on Facebook. Of course, there may be no causal relationship whatsoever but I don’t think I’ve become any nicer or more popular. :-) I have a feeling people just have Facebook on the brain.

Obama’s Tech Stimulus plan - Health IT, Broadband, and smart grid

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Steve Lohr has a nice piece in the New York Times (‘Technology Gets a Piece of Stimulus,’ 26 Jan 2009, p. C1) this morning about the role that technology and innovation will play in the economic recovery (aka stimulus) bill supported by the Obama Administration.

In the past, health IT deployment has been approached as an engineering problem: what computers have to be part of which networks exchanging which types of data? This loses sight of the purpose of electronic medical records: helping doctors to provide better care to their patients and transforming the system at a macro scale so that it enables data-driven, evidence-based research on how to provide effective, cost-efficient care. Today, because most doctors are paid based on how many procedures they perform, as opposed to how good they are at keeping patients healthy, they will actually lose money if new information systems help them to deliver care more efficiently and keep people healthier. So, the key challenge for electronic medical record deployment is to marry up overall changes in healthcare policy with the right innovation environment to produce the health information infrastructure we need to support safer, more efficient health care.

A quick infusion of stimulus spending, combined with a long term commitment to spend much of this money in a way that rewards doctors for delivering better care and data needed to measure effectiveness and efficiency (as opposed to just subsidizing them to put expensive hardware and software on their desks), can help lay the groundwork for the systems needed for health care reform. As Lohr explains:

The time-tested way for governments to create jobs in a hurry is to pour money into old-fashioned public works projects like roads and bridges. President Obama’s economic recovery plan will do that, but it also has some ambitious 21st century twists.

The $825 billion stimulus plan presented this month by House Democrats called for $37 billion in spending in three high-tech areas: $20 billion to computerize medical records, $11 billion to create smarter electrical grids and $6 billion to expand high-speed Internet access in rural and underserved communities.
[..]
The technology industry is not typically viewed as a prolific job producer. Much of its manufacturing is highly automated. But bringing technology to services fields like health care, telecommunications and energy can be labor intensive and thus generate jobs.

The issues surrounding electronic health records illustrate the policy challenges of targeted programs. Mr. Obama has advocated spending $50 billion over five years to accelerate the use of such records and the sharing of health information across a national network.
[..]
The computerized records, when used properly, are an indispensable tool for measuring, tracking and improving patient care — yet only about 17 percent of the nation’s doctors are using them. They are commonplace at large medical groups, but 75 percent of doctors practice in small offices of 10 physicians or fewer.

Doctors often benefit from inefficiency, because the dominant fee-for-service payment system means they are paid for doing more — more doctor visits, tests, surgical procedures, pills.

“Paying to put computer hardware and software in physicians’ offices isn’t going to do anything unless you change the incentives in the system,” said Dr. David J. Brailer, former national health information technology coordinator in the Bush administration.
[..]
“You want to pay for achievement — better health quality and efficiency,” said Dr. David Blumenthal, director of the Institute for Health Policy at the Harvard Medical School, who advised the Obama campaign. “But in the transition period, before financial incentives are reformed, you need to provide incentives or grants to use electronic health records because this technology is sort of the opening wedge to reform.”

He also summarizes the current contents of the health IT stimulus proposals developed by the transition team and currently being considered by Congress:

Those eligible for grants to buy technology, a member of the Obama transition team said, will include inner-city and rural hospitals and small doctor practices. But most money, he said, will go to incentive payments to improve quality and safety of care.

The big leverage that the Federal Government has is the over $700 billion that it spends on Medicare and Medicaid each year. Altogether, the Federal Government pays for over 40% of all healthcare in the US, so directing that spending in a way that encourages a more data-driven health care system is the key to success. The stimulus spending will be the first step toward creating a system in which that money can be used to encourage smart, data-driven health care.

Find out more about Tim and what he does through w3.org or, in case the feed above is broken for any reason, view his blog directly.
