Timelines through YQL


OpenHackNYC gave me an excuse to start playing with the Yahoo Query Language. With YQL, you build a binding to a datasource (an XML table or other web service) and use an “expressive SQL-like language” to manipulate the data. Instant database functionality with minimal overhead.

The Heilbrunn Timeline of Art History has these great individual timelines of key art/world events. The timelines are classified by time period and by geography, with a free-text “encompasses” string, e.g. Eastern Europe and Scandinavia, 1600–1800 A.D. encompasses “Belarus, Denmark, Estonia, Finland, Iceland, Latvia, Lithuania, Norway, western Russia, Sweden, and Ukraine.”

I’d like to point to all of the related timelines for an arbitrary work of art – provided it has an associated geographical term and date range. I scraped all of the timeline links, titles, dates, and encompasses strings, formatted all the data in an XML table, made a simple binding, and rigged a pipe front-end.

USE "https://netfiles.uiuc.edu/pdadamcz/www/museumpipes/yql/Timelines.xml" AS Timelines; SELECT * FROM Timelines
Try it in the YQL console.
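
Outside the console, the same statement runs through YQL’s public REST endpoint, so a pipe (or anything else) can consume it directly. A rough Python sketch; the endpoint and response envelope are as I understand Yahoo’s public YQL service, so treat the details as assumptions:

```python
import json
import urllib.parse
import urllib.request

# The same statement shown above, with the open table binding inline.
yql = ('USE "https://netfiles.uiuc.edu/pdadamcz/www/museumpipes/yql/Timelines.xml" '
       'AS Timelines; SELECT * FROM Timelines')

# YQL's public REST endpoint; format=json asks for a JSON envelope.
url = ('https://query.yahooapis.com/v1/public/yql?'
       + urllib.parse.urlencode({'q': yql, 'format': 'json'}))

with urllib.request.urlopen(url) as response:
    print(json.load(response)['query']['results'])
```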

YQL / Timelines Pipe

USE "https://netfiles.uiuc.edu/pdadamcz/www/museumpipes/yql/Timelines.xml" AS Timelines; SELECT title, link FROM Timelines WHERE encompasses LIKE "%Poland%" AND datebegin > 1000 AND dateend < 2000
Try it in the YQL console.
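
Building that filtered statement from a pipe’s inputs is mechanical. A small sketch; the helper and its naive quote-stripping are mine, not part of the pipe:

```python
TABLE = 'https://netfiles.uiuc.edu/pdadamcz/www/museumpipes/yql/Timelines.xml'

def timelines_query(term, begin, end):
    """Build the filtered YQL statement for a geographic term and date range."""
    safe = term.replace('"', '')  # naive guard for the LIKE pattern's quotes
    return (f'USE "{TABLE}" AS Timelines; '
            f'SELECT title, link FROM Timelines '
            f'WHERE encompasses LIKE "%{safe}%" '
            f'AND datebegin > {int(begin)} AND dateend < {int(end)}')

print(timelines_query('Poland', 1000, 2000))
```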

//TODO: The vocabulary of the encompasses strings isn’t exhaustive – understandably. But maybe I can pass the query from the pipe through a geo service to find neighboring terms and send them all through the YQL statement?

Relative size


Another feature that could be helpful is a display of relative size. ArtsConnectEd has a solid implementation (for example – scroll down to the details, and click the Scale tab). The Met provides dimensions (in a number of formats, grumble) but does not present any cues as to the relative size of the work. So when comparing works with similar aspect ratios like Kensett’s Lake George and Homer’s Prisoners from the Front, it would be easy to assume that the works are similar in size.

Parsing the tombstone gave me width and height in centimeters. Note: I’m only handling dimensions formatted as (ABC x XYZ cm), like those in American Paintings and Modern. WolframAlpha gave me an average human height, 162 cm. And AIGA has all of the classic Symbol Signs available… I set 1 pixel = 1 cm so things wouldn’t get too large. The only sneaky bit was using Google Charts to draw the scaled rectangle for the work of art: just a bar chart with width and height set to the dimensions of the work.
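
Roughly, the parsing and chart building look like this. A sketch that assumes the (ABC x XYZ cm) format above, treats the first number as height (a guess on my part), and leans on the classic Google Chart Image API with styling parameters of my own choosing:

```python
import re

# Matches tombstone dimensions like "(63.5 x 91.4 cm)"; only that one
# format, as noted above.
DIMS = re.compile(r'\(([\d.]+)\s*x\s*([\d.]+)\s*cm\)')

def scale_chart_url(tombstone, px_per_cm=1):
    """Build a Google Chart URL for a bar that fills the whole chart area,
    i.e. a rectangle scaled to the work at 1 px = 1 cm."""
    match = DIMS.search(tombstone)
    if match is None:
        return None
    height = round(float(match.group(1)) * px_per_cm)
    width = round(float(match.group(2)) * px_per_cm)
    return ('http://chart.apis.google.com/chart?cht=bvs'
            f'&chs={width}x{height}'   # chart size = scaled dimensions
            '&chd=t:100&chds=0,100'    # one bar at 100% of the scale
            '&chbh=a&chco=76A4FB')     # bar auto-sized to fill, solid color

# A made-up tombstone string for illustration:
print(scale_chart_url('Oil on canvas (63.5 x 91.4 cm)'))
```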

//TODO: Extend to other dimension formats. Handle 3D works. And for smaller works, coffee or martini?

Google Maps for Browsing Collections

Browsing with Google Maps

Modern Art Highlights Browser (detail)

The V&A just released a great beta of their collection search. I really like the jQuery Wall they’ve provided for browsing the collection – it’s an (apparently) infinite canvas of objects from the V&A collection with a movable viewport. The SFMOMA ArtScope does something similar, but I imagine the Wall could be a bit more flexible and, once some of the interaction lag gets ironed out, a bit quicker to load (maybe with some added visual interaction cues as well?). Looking at these reminded me of two tools that help in making tilesets of large images for use with Google Maps.

The UCL Centre for Advanced Spatial Analysis has the Java Image Cutter and the Google Maps resource Mapki provides the Photoshop script Automatic Tile Cutter.

I collected the enlarged images of the 100 Modern department highlights (these are width-limited to 500px with a variable height), made a 10 x 10 contact sheet in Photoshop, and had the Image Cutter break down the resulting 5,000 x 8,000 pixel image. Allowing for zooming down to level 7 on the Google Map required 5,461 tiles of 256 x 256 pixels.
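
That tile count is just the full pyramid: a Google Map has 4^z tiles at zoom level z, so seven levels padded out to complete power-of-two pyramids come to exactly 5,461. A quick check, assuming Image Cutter fills out every level:

```python
# 4**z tiles at zoom z; seven zoom levels (0 through 6) padded to
# full pyramids: 1 + 4 + 16 + ... + 4096 = 5461.
print(sum(4 ** z for z in range(7)))  # 5461
```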

I combined the automatically generated Image Cutter Google Maps code with a few functions to load markers from an XML file. I collected the URL for each highlight with a quick Processing sketch. It took some trial and error to figure out how to place the first few markers, but the uniform spacing helped once reference points were set.
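
Once the reference points are set, generating the marker file is the easy part. A sketch with placeholder numbers where the trial-and-error values would go; the XML attribute names are mine:

```python
import xml.etree.ElementTree as ET

def marker_xml(urls, origin=(85.0, -180.0), step=(-17.0, 36.0)):
    """Emit marker positions for a 10 x 10 contact sheet.

    origin is the (lat, lng) of the first cell and step the per-row /
    per-column spacing; both were found by trial and error, and the
    numbers here are placeholders, not the real values."""
    markers = ET.Element('markers')
    for i, url in enumerate(urls):
        row, col = divmod(i, 10)
        ET.SubElement(markers, 'marker', {
            'lat': str(origin[0] + row * step[0]),
            'lng': str(origin[1] + col * step[1]),
            'href': url,
        })
    return ET.tostring(markers, encoding='unicode')
```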

This simple Google Maps solution isn’t nearly as dynamic as either the V&A Wall or the SFMOMA ArtScope, but I like how it has the potential to move quickly from a broad overview of the collection, to details-on-demand, to multiple detail or high-resolution images, all in a single interface.

//TODO: Polygon overlays could be used instead of markers to make an entire collection object clickable. More information could be added to the tooltips – I was having trouble with special characters in XML CDATA.

Network analysis, first steps

Modern Art / OCLC Network in TouchGraph

Since putting together the New York Times identity network, I’ve wanted to look more closely at a larger network of art identities and subjects. I reworked some of the OCLC pipes that pull related identities and associated subjects from an Identity page to output something a bit closer to a TouchGraph data file, wrapped the whole business in a Processing sketch, and had it crawl 100 objects from the Met’s Modern Art department.
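
The output step boils down to flattening the crawl into node and edge lists. A sketch of that step in Python; the tab-delimited column layout is my own shorthand, not TouchGraph’s exact file format:

```python
import csv

def write_network(edges, node_path='nodes.tab', edge_path='edges.tab'):
    """edges: iterable of (source, target, kind, weight) tuples, where
    kind is 'identity' or 'subject' and source is a Met object record."""
    nodes = {}
    for source, target, kind, weight in edges:
        nodes.setdefault(source, 'record')
        nodes.setdefault(target, kind)
    with open(node_path, 'w', newline='') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(['id', 'kind'])
        out.writerows(sorted(nodes.items()))
    with open(edge_path, 'w', newline='') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(['source', 'target', 'kind', 'weight'])
        out.writerows(edges)
```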

Modern Art / OCLC Network TouchGraph (detail)

After some data cleaning, the network contains:
~3,500 nodes = 1,200 related identities + 2,200 associated subjects + 100 Modern Art records, and
~7,200 edges = 1,700 -> related identities + 5,500 -> associated subjects.

Data files – Nodes (tab-delimited), Edges (XLS, tab-delimited), Identities-only X3D model

The same terms can appear as both related identities and associated subjects. In the image above, the associated subject Jasper Johns is selected, while the identity is in the upper left. I’ve color-coded the nodes in the graph (blue for identities, gray for subjects), and they are distinct in the data.

For a network this large, TouchGraph works well one node at a time, but extending the locality stressed out my machine, and I still wanted to see the whole network. Pajek to the rescue. Below is the 3D force-based layout of only the identities.

Modern Art / OCLC Network - Identities Only

Better, but I still wanted a closer look. Pajek exports to X3D; I used Octaga to render and took a quick screencast…

Directionality is missing from the images, but the edges only run from the numbered nodes (the starting set of Met Modern Art records) to Identity records from OCLC.

The major nodes are those you might expect. Each associated subject is presented in a tag cloud on the Identity page with a variable font size. I’ve used those sizes as edge weights where appropriate and summed them across the network below.

Associated Subjects (sum of weights):
Exhibition catalogs: 412
Criticism, interpretation, etc.: 377
Catalogs: 326
Biography: 318
Art: 298
United States: 295
Artists: 246
History: 244
Painters: 236
Art, Modern: 165

Related Identities (by occurrences):
Museum of Modern Art (New York, N.Y.)
De Kooning, Willem, 1904-1997
Picasso, Pablo, 1881-1973
Pollock, Jackson, 1912-1956
Rothko, Mark, 1903-1970
Marin, John, 1870-1953
Matisse, Henri, 1869-1954
Weber, Max, 1881-1961
Braque, Georges, 1882-1963
Stieglitz, Alfred, 1864-1946

(Ahem, Metropolitan Museum of Art appears only 3 times in the network.)
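
The summation behind the subjects table is a plain per-subject accumulation of those font-size weights. Over the same (source, target, kind, weight) edge tuples sketched earlier:

```python
from collections import Counter

def subject_weights(edges, top=10):
    """Sum the tag-cloud font sizes (used as edge weights) per
    associated subject across all collected Identity pages."""
    totals = Counter()
    for source, target, kind, weight in edges:
        if kind == 'subject':
            totals[target] += weight
    return totals.most_common(top)
```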

I’ve started looking at the network metrics in UCINET and Pajek, but I think something has to be said about validity at this point. What we have is a two-mode network, i.e. a bipartite data set. Not a problem; there are plenty of ways to look at the data. But this is more an artifact of the data collection method than of reality. Object records don’t point to one another and, since I didn’t iterate, there are no connections between the collected nodes.

Of course, the validity of the whole data set is pretty dubious. My selection criteria were intentionally broad and uninformed – picking the top 3 identities from OCLC and then pulling in everything, ignoring rank and weight in the data collection phase. The initial goal of the pipes was to find out more about the quality of the results from OCLC – to see if a simple query would suffice. So the pipe structure will need to change if we want validity. I don’t know nearly enough about how associated subjects are mapped to identities, or how an identity is “related” to any others, or for that matter how complete the coverage is for Modern Art in OCLC. Ultimately, any analysis will say more about the OCLC data surrounding books than about the Met’s holdings. I’ll be sure to present the network analysis metrics on a more “complete” dataset.

With all of that criticism about lack of rigor out of the way: Wow. With a large enough starting set, the resulting network gets rid of the noise pretty well. I think this network is a good place to start, with a clear direction for improvement.

Internet Archive


I received a special request for an Internet Archive pipe. Starting from the advanced search page, there was plenty to work with. The Advanced XML Search form returns whichever record fields you might want as XML, JSON, CSV, or an HTML table. The form exposes all of the passed parameters in the search response URL, making it straightforward to rig a pipe that creates a well-formatted query.

Internet Archive Search Pipe

A few convenient details in the pipe: adding long strings in the Pipes interface is annoying due to the short textbox lengths, so having each of the record fields added to an array, fl[], makes it easy to see all the parameters. And the Pipes team has added a Create RSS module, which makes converting the returned fields to an RSS feed much cleaner.
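
For anyone who would rather skip Pipes, the same query works as a direct HTTP call. A sketch; the endpoint path and parameter names are what I read off the search form, so consider them assumptions:

```python
import json
import urllib.parse
import urllib.request

def ia_search(query, fields=('identifier', 'title', 'mediatype'), rows=25):
    """Query the Internet Archive advanced search endpoint, with one
    fl[] entry per requested record field, mirroring the pipe."""
    params = [('q', query), ('rows', str(rows)), ('output', 'json')]
    params += [('fl[]', f) for f in fields]
    url = ('https://archive.org/advancedsearch.php?'
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as response:
        return json.load(response)['response']['docs']

for doc in ia_search('John Singer Sargent'):
    print(doc.get('identifier'), doc.get('title'))
```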

One quibble with the data format from the Internet Archive: the record fields are returned in XML as repeated elements, which makes them just a little harder to manipulate. The JSON response is great, with every field placed in a distinct element.

I tried tuning the quality of results. By default the search string John Singer Sargent gets translated into this baroque query:

(title:john^100 OR description:john^15 OR collection:john^10 OR language:john^10 OR text:john^1) (title:singer^100 OR description:singer^15 OR collection:singer^10 OR language:singer^10 OR text:singer^1) (title:sargent^100 OR description:sargent^15 OR collection:sargent^10 OR language:sargent^10 OR text:sargent^1)

The ^s are boosting operators in Lucene, with the numbers setting the relevancy weights. Exact phrase matching did not work as well, since the artist name could be formatted differently – compare John Singer Sargent to “John Singer Sargent”. Lucene’s proximity search operator could do the trick, though it can cast the net too wide.
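
The expansion itself is easy to reproduce for reference. A few lines that rebuild the default boosted query from a search string, with the field weights copied from the example above:

```python
def boosted_query(search_string):
    """Expand a search string into the per-term boosted Lucene query
    that the Internet Archive builds by default."""
    boosts = [('title', 100), ('description', 15), ('collection', 10),
              ('language', 10), ('text', 1)]
    clauses = []
    for term in search_string.lower().split():
        clauses.append('(' + ' OR '.join(
            f'{field}:{term}^{weight}' for field, weight in boosts) + ')')
    return ' '.join(clauses)

print(boosted_query('John Singer Sargent'))  # matches the query above
```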

I’ve added the pipe to the Met object information aggregator.

Internet Archive results for John Singer Sargent

//TODO: Format the results to include the media type icon that the Internet Archive provides – book, audio, video.

Parallel Sets

Parallel Sets is a tool and visualization method for exploring categorical data. Multidimensional data is hard to present without significant design work, and hard for most information seekers to interpret. There is a learning curve with these graphs, but once you get used to them they really are rich and easy to query.

I think the datasets may have stretched the tool a bit. Labels and scaling got a little wonky, but once the data was filtered to a more reasonable set of values along a dimension: brilliant. The order in which the variables are added to the visualization can change the presentation dramatically, which really helps in answering different sets of questions.

This could be getting closer to a chart that would be useful for at-a-glance comparisons across collections.

//TODO: There are similar presentations that might suggest how to augment the design for museum data.

Speculative visualizations, continued

SunBurst, IORing

I’ve been surprised by how many hierarchies can be extracted from aggregated museum object data. I’ve always liked the look of John Stasko’s SunBursts – like the treemap, another space-filling hierarchical display, but radial and a bit mesmerizing when the interaction is done just right. I repurposed some of the arc diagram code and made a quick SunBurst Processing sketch, and mashed up / pared down some other visualizations into another really basic sketch. I’d like to think I can combine a few more visualizations in interesting ways (e.g. the Bloom Diagram) – still just sketches rather than full applications; I’m not done exploring.
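
The heart of a SunBurst is small: give each node an angular span proportional to its leaf count and map depth to ring radius. A bare-bones version of that layout step, in Python rather than Processing:

```python
import math

def leaves(node):
    """node is a (label, children) tuple; count its leaf descendants."""
    _, children = node
    return sum(leaves(c) for c in children) if children else 1

def sunburst(node, start=0.0, end=2 * math.pi, depth=0, arcs=None):
    """Collect (label, depth, start_angle, end_angle) for every node;
    depth becomes the ring radius when the arcs are drawn."""
    if arcs is None:
        arcs = []
    label, children = node
    arcs.append((label, depth, start, end))
    total = sum(leaves(c) for c in children)
    angle = start
    for child in children:
        span = (end - start) * leaves(child) / total
        sunburst(child, angle, angle + span, depth + 1, arcs)
        angle += span
    return arcs

# A toy hierarchy for illustration:
tree = ('root', [('a', []), ('b', [('c', []), ('d', [])])])
for arc in sunburst(tree):
    print(arc)
```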

All of the visualizations, the practical and the speculative, are meant to augment collections navigation and search in some way – more for browsing than for directed search tasks, but maybe a bit of both… I’ve been reading Marti Hearst’s great survey, Search User Interfaces (ch. 10 in particular).

//TODO: Refine one of the sketches to include some basic interface widgets. And some event triggers – though the visualizations are starting to look OK, interaction is going to take a while to get right.