Visualisation Lab - early prototype

Here’s an update on what’s happening when I go quiet, although the last week was also spent getting a replacement car for one that literally went bang, and restoring my OS which decided that was a good time to fail in the middle of an upgrade.

This is something I put together to help me understand what I’m thinking about. I don’t expect it will make much sense to others but if you have questions I’ll be happy to respond.

Data Pipeline Summary

The diagram shows the stages from data loading to visualisation, moving from SourceResult to ViewModel, and finally to a VisualSchema:

  • The SourceResults on the left hold arbitrary serialisations along with metadata about their origin (creation date, file path, SPARQL query etc.) and may be cached locally.
  • In the middle are ViewModels suitable for visualisation when loaded with an appropriate VisualSchema.
  • The VisualSchemas on the right are reactive components which can provide an interactive visual presentation of a suitable ViewModel.
  • F - Filter is a means of identifying a subset of the contained data, which may have interactivity through the UI.
  • T - Transform changes the serialisation and/or structure of a SourceResult or ViewModel and may have interactivity through the UI.

See below the diagram for an explanation of what happens in different parts of the pipeline, and the application features this is designed to support.


Interaction
User interaction is not shown in the diagram, but may be supported in any of the processing (circles) between stages (boxes). So the UI will present a unified set of controls that allow selection, querying of the source, choice of a transform to produce a visual model, and choice of a visualisation component to present the data.

Actions
For clarity, the flow in the diagram only shows data moving from source to visualisation, whereas the design is intended to cater for actions which flow in the other direction, all the way from a VisualComponent to an external data source. Actions will be optional, declared on the ViewModel, and can support interactive features such as those just mentioned, as well as the ability to make changes to data held locally or to the data source itself (local or remote storage, such as a file or database). To provide the UI for invoking actions, a VisualComponent will check which actions are available according to the ViewModel.

Data Pipeline Explained

ViewUI Components and Views
The boxes with graphics on the right of the diagram are examples of ViewUI Components, which provide the presentation and UI for a view of data held in a ViewModel. Each ViewUI Component is implemented as a Svelte component, using a View such as a ViewForceGraph to provide the underlying functionality. These present information about the data held in the ViewModel to which they are connected. Some Views may be capable of combining data from more than one ViewModel in a single presentation.

Note: for every kind of ViewModel there must be at least one VisualComponent capable of presenting it, even if this is just a text summary or metadata related to the data and its source. So not all Views are graphical or particularly complex. Some can be very simple. For example, a visual component might provide a textual summary of SourceResults via a simple metadata ViewModel of the form { records: Number, nodes: Number, source: Object, created: String }.
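As a concrete illustration, here is a minimal sketch of how such a metadata ViewModel could be derived. The function name and the SourceResult shape (an `origin` object and a `data` array of records with `id`s) are assumptions for illustration, not VisLab's actual API:

```javascript
// Hypothetical sketch: derive the simple metadata ViewModel described above
// from a SourceResult. The SourceResult shape here is assumed.
function metadataViewModel(sourceResult) {
  return {
    records: sourceResult.data.length,                       // total records loaded
    nodes: new Set(sourceResult.data.map(r => r.id)).size,   // distinct entities
    source: sourceResult.origin,                             // provenance metadata
    created: new Date().toISOString()
  };
}

const sample = {
  origin: { path: 'example.csv' },
  data: [{ id: 'a' }, { id: 'b' }, { id: 'a' }]
};
const meta = metadataViewModel(sample);
console.log(meta.records, meta.nodes); // 3 2
```

Even a trivial model like this gives every SourceResult at least one presentable view.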

A VisualComponent embodies a VisualSchema, which is the combination of what data it can present, and how it will be presented.

The component may have interactive capabilities where the Actions available on a ViewModel permit, for example:

  • to reveal information on clicks or hover
  • to highlight information through selection
  • to explore a data source by issuing query instructions to a stage in the pipeline, including the original data source, the SourceResults, or the ViewModel to which a datum is connected
  • to issue a new query to add to or replace the data in the pipeline
  • to mutate a data source (e.g. remote database, or local store) to add to, modify or delete data in the source
  • selection, copy and paste between VisualComponents and/or ViewModels
  • cloning to create multiple coordinated views of the same ViewModel

The VisualSchema identifies what kind of ViewModels a VisualComponent can present and how these will be represented, and also whether or not the component supports more than one model of each supported type in the same presentation. For example, a graph component may expect a visual model in the form of { nodes: [], links: [] } with particular characteristics, but this might also be suitable for other components such as a tree or table. So a tabular component could be designed to handle models such as { rows: [], columns: [] } as well as the one which the graph component handles.
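The compatibility idea above can be sketched as a predicate on model shape. The property names (`accepts`, `multiModel`) are assumptions for illustration, not the actual VisualSchema interface:

```javascript
// Sketch: each schema declares which model shapes its component accepts,
// so the UI can match ViewModels to compatible components.
const graphSchema = {
  accepts: model => Array.isArray(model.nodes) && Array.isArray(model.links),
  multiModel: false  // one model per presentation
};

const tableSchema = {
  // A table could accept rows/columns, or the graph shape as suggested above.
  accepts: model =>
    (Array.isArray(model.rows) && Array.isArray(model.columns)) ||
    (Array.isArray(model.nodes) && Array.isArray(model.links)),
  multiModel: true
};

const graphModel = { nodes: [], links: [] };
console.log(graphSchema.accepts(graphModel)); // true
console.log(tableSchema.accepts(graphModel)); // true
```

A UI could then offer the user exactly the set of components whose schema accepts the current ViewModel.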

Coordinating Views
Where different components can present the data from the same type of visual model, they can be combined in the user interface to show different presentations of the same data alongside each other, and for selection in one component to be reflected in any component presenting the same data from the same ViewModel.

This also allows multiple instances of the same component to be included alongside each other to provide multiple presentations in the same style (e.g. at different scales, or as subsets of the data through panning and zoom).
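In a Svelte app the natural mechanism for this coordination is a shared writable store on the ViewModel's selection. The sketch below uses a minimal stand-in store so the idea runs outside Svelte; the store contract mirrors Svelte's, but the names and wiring are illustrative assumptions:

```javascript
// Minimal observable store following the Svelte store contract
// (subscribe calls back immediately, then again on every set).
function writable(value) {
  const subscribers = new Set();
  return {
    subscribe(fn) { subscribers.add(fn); fn(value); return () => subscribers.delete(fn); },
    set(v) { value = v; for (const fn of subscribers) fn(v); }
  };
}

// One selection store shared by all views of the same ViewModel.
const selection = writable(new Set());

let graphHighlights, tableHighlights;
selection.subscribe(ids => { graphHighlights = [...ids]; }); // graph view
selection.subscribe(ids => { tableHighlights = [...ids]; }); // table view

selection.set(new Set(['node-1'])); // a click in either view...
console.log(graphHighlights, tableHighlights); // ...updates both
```

Because both views subscribe to the same store, selection in one is reflected in the other without the components knowing about each other.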

ViewModels
A set of ViewModels needs to be defined so that suitable transforms can be made to generate them from SourceResults. The approach here is to define them based on the common conventions found in existing visualisations including Vega and Vega Lite, D3 examples and so on, with slight adaptations to keep transforms simple and representations concise and intuitive.

A ViewModel acts as a bridge between data loaded from a source, and the VisualComponent which provides a view of data to the user. On the one hand it knows quite a lot about the raw data (held in a SourceResult) from which its model is derived. On the other it makes this and other capabilities available in a standardised form which can be understood by Views. The latter includes optional capabilities (Actions) which can be invoked through the UI of the VisualComponent, such as expand (e.g. issue a new query to the source), editing of properties, addition or deletion of data elements (entities, links) etc.

I envisage the ViewModel will comprise a number of aspects, some of which will be optional, as outlined below.

  • Schema: identification of its visual model (schema), used to determine the set of compatible Views
  • Transforms: either take the data of a SourceResult and use it to create a new model, or take data from another ViewModel and create its model from that.
  • Filters: apply a ‘selected’ flag to the elements of the model according to values provided for the filter’s criteria. Criteria can include matching of properties (e.g. nodes with age > 20) and lists of type or identity (e.g. nodes with ids in a given list). This allows selection by search and match on the one hand, or selection of arbitrary sets through user interaction. Some Actions and Transforms would have the option to only operate on items with a selection.
  • Model: the core of the model available to a VisualComponent. This is an object with properties according to the implemented schema, and methods for supported Actions.
  • Actions: functions on both the model as a whole, and on the elements (nodes, links, rows, columns) of the model. For example, we define a set of ViewModel functions which are undefined if not implemented on a particular ViewModel or model elements. Actions on the whole model might include delete all, re-load/discard changes, add new model element. Actions on a model element can include expand, reload, modify, delete. In addition, actions may be a way to invoke Transforms which operate on the model itself, or which produce a new ViewModel derived from this model. Alternatively derived models could be implemented by creating them with a ViewModel as their source rather than a SourceResult. This raises the possibility that a SourceResult has a base implementation in common with a ViewModel, although my gut says that this will confuse rather than clarify the design. ← Needs thinking about.
  • Sets: are subsets of the elements in a model, and may overlap with each other. A Set can be created from a selection (i.e. using Filters). Membership of a set would be useful input to both Filters and Transforms to allow sets, selections and new data models to be built from combinations of sets in the current model.
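Pulling the aspects above together, here is a minimal sketch of a ViewModel with a schema id, a filter that sets the ‘selected’ flag, and optional Actions that are simply undefined when unsupported. All names are assumptions for illustration, not the real VisLab interface:

```javascript
// Hypothetical ViewModel factory combining the aspects listed above.
function makeViewModel(schema, elements) {
  const vm = {
    schema,               // identifies the set of compatible Views
    model: { elements },  // core model presented to a VisualComponent
    filter(predicate) {   // Filters: mark a selection on the elements
      for (const el of vm.model.elements) el.selected = predicate(el);
    },
    actions: {            // Actions: undefined members mean "unsupported"
      deleteSelected() {
        vm.model.elements = vm.model.elements.filter(el => !el.selected);
      }
      // expand, reload, modify are deliberately left undefined in this sketch
    }
  };
  return vm;
}

const vm = makeViewModel('graph', [{ id: 1, age: 25 }, { id: 2, age: 15 }]);
vm.filter(el => el.age > 20);   // select nodes with age > 20
vm.actions.deleteSelected?.();  // invoke only if the action is defined
console.log(vm.model.elements.length); // 1
```

The `?.()` call illustrates how a VisualComponent can check which actions are available simply by testing whether they are defined on the ViewModel.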

Filters
This is in effect a way of identifying a “selection” within a set of data, and so can be applied either through a query-style matching interface and/or interactively through a visualisation (e.g. using mouse clicks, or drawing a boundary etc.). In combination with a Transform, changing the filter applied to a SourceResult can change a downstream ViewModel and be reflected immediately in the presentation (of any View connected to the particular ViewModel).

Transforms
Two kinds of transform are needed:

  • transforms which map SourceResults (JSON or non-JSON serialisations such as RDF, csv etc.) to a particular ViewModel for which a VisualComponent exists.
  • transforms which create one kind of ViewModel from another.
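A sketch of the first kind: mapping a tabular SourceResult (parsed CSV rows, an assumed shape) to the { nodes, links } visual model used by the graph component. The function and column names are illustrative assumptions:

```javascript
// Hypothetical transform: tabular rows -> { nodes, links } graph model.
// Each row contributes one link; nodes are deduplicated by id.
function tableToGraph(rows, sourceKey, targetKey) {
  const nodes = new Map();
  const links = [];
  for (const row of rows) {
    const s = row[sourceKey], t = row[targetKey];
    if (!nodes.has(s)) nodes.set(s, { id: s });
    if (!nodes.has(t)) nodes.set(t, { id: t });
    links.push({ source: s, target: t });
  }
  return { nodes: [...nodes.values()], links };
}

const rows = [
  { from: 'alice', to: 'bob' },
  { from: 'bob', to: 'carol' }
];
console.log(tableToGraph(rows, 'from', 'to').nodes.length); // 3
```

The second kind of transform (ViewModel to ViewModel) would have the same shape, just consuming a model object instead of raw rows.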

Notes:

  • perhaps a SourceResult can contain more than one serialisation, or would transformation create a new SourceResult?

Division of Responsibilities

SourceInterface and SourceResults

SourceInterface and SourceResults should remain limited to the details of loading data from different sources, and providing that in a small number of easy-to-generate forms. They should not be involved in creating the mappings to a visual model. This is intended to make it easy to retain the links between a set of SourceResults and the source, so this can easily be referenced and if necessary re-generated. Also, to make it possible to maintain the relationship between each result in a SourceResults and the corresponding element in the source, and so to enable fine-grained interaction with the source, including modification of properties of a result, deletion etc. (e.g. using CRUD, or file editing).

VisualModel and its VM Subclasses

The VisualModel and its VM subclasses consume SourceResults in a small number of standard forms (typically RDF/JS Dataset or JSON objects) and use a Transform to generate their internal visual model as a JSON object which can be accessed by a View class to present the data in a visual UI.

Where the data needs to be mapped, this can be achieved through a VM whose Transform consumes one JSON ViewModel in order to create a new ViewModel, and so on.

A Transform is implemented as a VM class which consumes one model (e.g. SourceResult or ViewModel) in order to generate its own view model.

A pipeline can have a single VM, or a chain of VM objects between a SourceResult and a final ViewModel intended for use by a particular View. Each model in a pipeline will for now include a reference to the source (i.e. the SourceResult at the head of the pipeline) as well as the preceding VM (if it doesn’t consume the SourceResult directly).

Each model element (e.g. entity type, column ref etc.) along a pipeline will have a sourceResultId to maintain its correspondence with its origin in the source (e.g. a record id, RDF URI etc.)
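The chaining just described can be sketched as follows. The stage shape and `makeStage` helper are assumptions for illustration: each stage keeps a reference to the head SourceResult and the preceding stage, and elements carry their `sourceResultId` through every transform:

```javascript
// Hypothetical pipeline stage: keeps a reference to the head SourceResult
// and to the preceding stage, per the description above.
function makeStage(input, transform) {
  const sourceResult = input.sourceResult ?? input; // head of the pipeline
  return {
    sourceResult,
    previous: input === sourceResult ? null : input,
    model: transform(input.model ?? input.data)
  };
}

const sourceResult = {
  data: [{ sourceResultId: 'rec-1', name: 'Alice' }]
};

// Stage 1 consumes the SourceResult directly; stage 2 consumes stage 1.
// Both preserve each element's sourceResultId.
const stage1 = makeStage(sourceResult, data =>
  data.map(d => ({ sourceResultId: d.sourceResultId, label: d.name }))
);
const stage2 = makeStage(stage1, model =>
  model.map(m => ({ ...m, label: m.label.toUpperCase() }))
);

console.log(stage2.sourceResult === sourceResult); // true
console.log(stage2.model[0].sourceResultId);       // 'rec-1'
```

Because the id survives every stage, fine-grained actions at the View end (modify, delete, expand) can be routed back to the right element in the original source.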

Want to play?

The latest version is deployed fairly regularly at vlab.happybeing.com

10 Likes

Oh, and a bit more from last night. This shows the area between SourceResults and ViewModel, specifically the intention that entities from different sources can be made into composite model elements where they are deemed to represent the same thing (e.g. organisation):

4 Likes

UPDATE:
I just published an early version of the JSON ViewModel Specification which is being developed along with Visualisation Lab.

Feedback welcome, see: JSON ViewModel Specification · happybeing/visualisation-lab Wiki · GitHub

8 Likes

Still working on this but it’s now a useful tool, so it would be great if any of you would like to have a play and offer feedback or suggestions. Questions too of course.

It’s a SPARQL endpoint interrogator which I’ve been adding to VisLab. Wut?

A SPARQL endpoint is like a public database of semantic information you can query. Which is like Wikipedia but for computers, in a language which conveys the meaning not just the data. In fact one of the sources is very like Wikipedia - dbPedia, and there’s also Wikidata. Both are in the table so you can find them from there if you want to follow up.

What this does is scan these ‘endpoints’ and tell you if they work (many have disappeared over the years) and roughly what you can do with them. There’s a lot more I hope to do - such as indicate what kind of data they hold and how to ask questions about it. It is pretty basic for now so I’m interested in how much sense people can make of what is there so far, how it looks, the UI etc. So not just for SPARQL and Semantic Web folks.

All levels of feedback are welcome. Don’t be afraid, just polite! :smile:

Try it here: http://vlab.happybeing.com

7 Likes

Awwww - and I was so looking forward to taking part in this…

3 Likes

First problem - it loads real slow and may initially give you an error - just give it 10-15 seconds

1 Like

Could you (kindly) provide some sample queries? It’s hard to know what to do.

3 Likes

I just click wildly and see where it takes me…

This was one of my better clicks: Europeana REST API | Europeana Pro

1 Like

John, as you delve around please make a note of anything you think would help make things more obvious. In this case, that you don’t need to provide any queries yourself. I won’t say more than that at this stage if you don’t mind, happy if you want me to explain though.

You are not required to be polite Willie. It just wouldn’t feel right.

Useful to know. It can be made into a separate app and slimmed down a bit, but for now it’s useful for people to know. Thanks

4 Likes

One day I might get the hang of it. One day…

4 Likes

cough Wikidata:Lists/SPARQL endpoints - Wikidata

What is WQS? “federated at WQS?”

1 Like

Nice find! I’m not sure but WQS might be Wikidata Query Service, just a guess.

Does SPARQL give access to the whole dataset or are those limiting to single queries?

Would be interested to see any rdf sources that go beyond DataSetRDFDumps - W3C Wiki

I’m not sure what you mean here. The SPARQL endpoints are exposing queries that run over their whole datasets, which are held in a triple store like Virtuoso (a very common one), but for very large datasets such as dbPedia or Wikidata it is easy to write a query that takes too long or would return too much data, and the server will timeout or abandon the query in such cases.

If you have access to a ‘dump’ you would need something to load that in order to query it. SPARQL endpoints are very useful because the query language is very powerful and can give fast responses over very large datasets. This relies on powerful server side computing and software though, which means things would need to be handled differently on SAFE. It is possible to do SPARQL querying on the client using a library such as Comunica, but handling very large datasets locally is going to be impractical unless the load can be handled by SAFE Network.

As an aside, one of the hard things I’ve found is figuring out what is in a dataset that I might be interested in, which is what got me thinking about adding this feature (SPARQL endpoint interrogation) to VisLab. But as soon as I started looking for endpoints to try ideas out on, it was hard to even discover usable endpoints or the basics of what they supported (e.g. formats they provide content responses in), so the feature has started with that.

The interrogator now attempts to answer questions like:

  • is there a SPARQL endpoint there
  • what version does it implement
  • do these basic SPARQL queries work
  • can it provide responses in Turtle, CSV, XML, JSON
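The content-type check in that last point boils down to issuing the same trivial query with different Accept headers. Here is a sketch of how such probes might be built (the query strings, helper name, and "match the Accept header" success criterion are generic SPARQL conventions assumed for illustration, not VisLab's actual code). Note that Turtle is a CONSTRUCT/DESCRIBE result format, so that probe uses a CONSTRUCT query rather than a SELECT:

```javascript
// One probe per result format, varying the Accept header.
const SELECT_Q = 'SELECT * WHERE { ?s ?p ?o } LIMIT 1';
const CONSTRUCT_Q = 'CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 1';

const FORMATS = [
  { name: 'turtle', accept: 'text/turtle', query: CONSTRUCT_Q },
  { name: 'csv', accept: 'text/csv', query: SELECT_Q },
  { name: 'xml', accept: 'application/sparql-results+xml', query: SELECT_Q },
  { name: 'json', accept: 'application/sparql-results+json', query: SELECT_Q }
];

// Build one GET request description per format. Each would then be issued
// with fetch(); a 200 response whose Content-Type matches the Accept header
// counts as "supported".
function buildProbes(endpointUrl) {
  return FORMATS.map(({ name, accept, query }) => ({
    name,
    url: endpointUrl + '?query=' + encodeURIComponent(query),
    headers: { Accept: accept }
  }));
}

const probes = buildProbes('https://dbpedia.org/sparql');
console.log(probes.map(p => p.name)); // [ 'turtle', 'csv', 'xml', 'json' ]
```

An endpoint that answers none of these probes is most likely one of the dead ones the scanner is designed to flag.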

Next will be to start probing the nature of the data it holds to help with my original question, but I’m still working on the above at the moment.

1 Like

Yes, I don’t mind volume - those lists in the link above are not a problem: taking the whole and looking at it to know what is there. I prefer that to asking a black box questions, which SPARQL perhaps is… though very useful for some purposes, it seems talked of as an endpoint rather than the raw substance; and looking at RDF sources, it seems quality is a big issue. So I wonder whether it would be good practice to make the raw data available, alongside tools like SPARQL.

1 Like

OK I think I get it, sort of. You’re trying to provide a window into the opaque world of Sparql endpoints to show at a glance which are live and what their capabilities are. Nice idea. From my very limited foray into RDF I’ve found it very hard to know what these repositories contain. They are definitely not user friendly.

One thing though, I found a sample Sparql query on the web and cut and pasted it into the box, and seemed to get a meaningful answer in that some endpoints said they could handle it, others not. But some of those that said they could are dead links, e.g. http://environment.data.gov.uk/sparql/bwq/query

2 Likes

Yep, you’ve got it. Thanks for looking around and after finishing off a few things in that first stage my next task is to add some features to help a bit with this:

If you want to explore SPARQL a bit more there are a couple of options in that first drop down menu that will be of interest. Each option selects a different way of getting data into VisLab, and then into one of the different kinds of visualisation I’m experimenting with.

One option has a selection of example SPARQL queries I stole from an app which used them to create different visualisations. In VisLab the results are blithely piped into the graph visualisation underneath. Not very useful except for testing, but I’m chuffed with it. :wink:

Another of the options gives you a simple search form that lets you search dbPedia for people. You type in a name etc. and it creates the SPARQL query, and if there are any results they are dumped into the graph view underneath along with immediate relatives, and any pictures. Very crude, just experimental at the moment, but may give an idea of where I’m headed.

Thanks again for having a play.

1 Like

Yes I tried those before - Les Miserables wasn’t it?

Anyone making RDF etc more understandable to the average Joe is doing God’s work! Keep it up. :slightly_smiling_face:

2 Likes

Yes that’s another one - loading from a JavaScript file.

The person search is new though. The SPARQL queries came in-between so you may have tried those.

Looking at it again now, yes I did try it. Who could forget old Cnut, Sigrid the Haughty, Sweyn Forkbeard and the rest of the family?

A bit of feedback on the layout of the Tabulation page, maybe hide the query box unless the user selects Custom Query. As the box is at the top of the page, my inclination is to start by typing something in, like a search engine.

2 Likes