Network analysis, first steps

Modern Art / OCLC Network in TouchGraph

Since putting together the New York Times identity network, I’ve wanted to look more closely at a larger network of art identities and subjects. I reworked some of the OCLC pipes that pull related identities and associated subjects from an Identity page so that they output something a bit closer to a TouchGraph data file, wrapped the whole business in a Processing sketch, and had it crawl 100 objects from the Met’s Modern Art department.

Modern Art / OCLC Network TouchGraph (detail)

After some data cleaning, the network contains:
~3,500 nodes = 1,200 related identities + 2,200 associated subjects + 100 Modern Art records
~7,200 edges = 1,700 -> related identities + 5,500 -> associated subjects

Data files – Nodes (Tab delimited), Edges (XLS, Tab), Identities only X3D model

The same term can appear as both a related identity and an associated subject. In the image above, Jasper Johns the associated subject is selected, while Jasper Johns the identity sits in the upper left. I’ve color coded the nodes in the graph (blue for identities, gray for subjects), and the two are distinct in the data.

For a network this large, TouchGraph works well a single node at a time, but extending the locality stressed out my machine and I still wanted to see the whole network. Pajek to the rescue. Below is the 3D force-based layout of only the identities.

OCLC Network - Identities Only

Modern Art / OCLC Network - Identities Only

Better, but I still wanted a closer look. Pajek exports to X3D.
Used Octaga to render; took a quick screencast…

Directionality is missing from the images but the edges only go from the numbered nodes (the starting set of Met Modern Art records) to Identity records from OCLC.
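Since the images drop the direction, here is a minimal sketch of how directed edges like these could be written out in Pajek's .net format, where `*Arcs` marks directed ties and `*Edges` undirected ones. The function name `to_pajek` and the sample labels are hypothetical and not taken from my actual sketch or data files.

```python
def to_pajek(nodes, arcs):
    """Return Pajek .net text for a directed network.

    nodes: list of unique labels.
    arcs: list of (source_label, target_label, weight) tuples.
    Labels are assumed not to contain double quotes.
    """
    # Pajek vertex ids are 1-based.
    index = {label: i for i, label in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    for label, i in index.items():
        lines.append(f'{i} "{label}"')
    # *Arcs keeps the record -> identity direction.
    lines.append("*Arcs")
    for src, dst, weight in arcs:
        lines.append(f"{index[src]} {index[dst]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: one Met record pointing at two OCLC identities.
nodes = ["record 1", "Johns, Jasper", "Picasso, Pablo"]
arcs = [("record 1", "Johns, Jasper", 1), ("record 1", "Picasso, Pablo", 1)]
net_text = to_pajek(nodes, arcs)
print(net_text)
```

Every arc here starts at a record and ends at an identity, which matches the one-way structure described above.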

The major nodes are those you might expect. Each associated subject is presented in a tag cloud on the Identity page with a variable font size. I’ve used those sizes as edge weights where appropriate and summed them across the network here.

Associated Subjects / Sum of Weights
Exhibition catalogs 412
Criticism, interpretation, etc. 377
Catalogs 326
Biography 318
Art 298
United States 295
Artists 246
History 244
Painters 236
Art, Modern 165

Related Identities / Occurrences
Museum of Modern Art (New York, N.Y.)
De Kooning, Willem 1904-1997
Picasso, Pablo 1881-1973
Pollock, Jackson 1912-1956
Rothko, Mark 1903-1970
Marin, John 1870-1953
Matisse, Henri 1869-1954
Weber, Max 1881-1961
Braque, Georges 1882-1963
Stieglitz, Alfred 1864-1946

(Ahem, Metropolitan Museum of Art appears only 3 times in the network.)
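The summing itself is simple. As a rough sketch, assuming a tab-delimited edge list with `source`, `target`, and `weight` columns (the column names and the sample rows below are made up for illustration; the real Edges file may be laid out differently), the per-subject totals could be computed like this:

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, not rows from the real Edges file.
SAMPLE = (
    "source\ttarget\tweight\n"
    "record 1\tExhibition catalogs\t4\n"
    "record 1\tBiography\t2\n"
    "record 2\tExhibition catalogs\t3\n"
    "record 2\tArt, Modern\t1\n"
)


def sum_weights(handle):
    """Sum edge weights per target node from a tab-delimited edge list."""
    totals = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row["target"]] += int(row["weight"])
    return totals


totals = sum_weights(io.StringIO(SAMPLE))
# most_common() lists the heaviest subjects first.
for subject, total in totals.most_common():
    print(subject, total)
```

Counting occurrences of related identities works the same way with every weight set to 1.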

I’ve started looking at the network metrics in UCINET and Pajek, but something has to be said about validity at this point. What we have is a two-mode network, i.e. a bipartite data set. That is not a problem in itself; there are plenty of ways to look at such data. But the structure is more an artifact of the data collection method than a reflection of reality. Object records don’t point to one another and, since I didn’t iterate, there are no connections between the collected nodes.

The validity of the whole data set is also pretty dubious. My selection criteria were intentionally broad and uninformed: picking the top 3 identities from OCLC and then pulling in everything, ignoring rank and weight in the data collection phase. The initial goal of the pipes was to find out more about the quality of the results from OCLC, to see if a simple query would suffice. So the pipe structure will need to change if we want validity.

I don’t know nearly enough about how associated subjects are mapped to identities, or how an identity is “related” to any others, or for that matter how complete the coverage is for Modern Art in OCLC. Ultimately, any analysis will say more about the OCLC data surrounding books than about the Met’s holdings. I’ll be sure to present the network analysis metrics on a more “complete” dataset.
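One common way to look at a two-mode network is to project it onto a single mode, linking two identities whenever they are attached to the same object record. The sketch below is only an illustration of that idea, not something I have run on this data; the function name `project` and the sample edges are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project(edges):
    """One-mode projection of a two-mode (record, identity) edge list.

    Returns a Counter mapping each identity pair to the number of
    records the two identities share.
    """
    by_record = defaultdict(set)
    for record, identity in edges:
        by_record[record].add(identity)
    pairs = Counter()
    for identities in by_record.values():
        # sorted() gives each pair a stable (a, b) ordering.
        for a, b in combinations(sorted(identities), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical sample edges, not taken from the crawl.
edges = [
    ("record 1", "Picasso, Pablo"),
    ("record 1", "Braque, Georges"),
    ("record 2", "Picasso, Pablo"),
    ("record 2", "Braque, Georges"),
    ("record 2", "Matisse, Henri"),
]
shared = project(edges)
for pair, count in shared.most_common():
    print(pair, count)
```

A projection like this inherits every bias of the collection method, so the caveats above apply to it just as much as to the original two-mode data.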

With all of that criticism about lack of rigor out of the way: wow. With a large enough starting set, the resulting network gets rid of the noise pretty well. I think this network is a good place to start, with a clear direction for improvement.
