Tuesday, May 12, 2009

Reflections on the research paper alerts project
Because an attempt to create or support a large-scale crawler would be madness, I figure we'll use an existing search service to find new research papers based on users' queries. I'm not sure, however, which services would be accessible to us.

We might qualify for access to Google research, but it would tightly limit what we could do with the project at the end, possibly making our results useless unless they're adopted by a research paper search company. The Google search API is probably largely useless, as results are limited to 64 entries and, moreover, the terms require that the search component not comprise the core of your app or webpage.

Scraping search results from a free or pay service is almost certainly out of the question. I'm pretty excited about this project as a practical one that's within my abilities once the search source is figured out, though. There are a few services out there that seem to be using Google Scholar results, so maybe it's easier than it looks; see Publish or Perish - I don't know how these guys are licensed - and Pubfeed - Maria reports insufficient results on this one, but it's a local project, so I'll ask around. This 'touchgraph' does it an interesting way: it's a bookmarklet, so they don't need to return Google search results elsewhere. Not quite applicable, but it's getting me thinking about alternate ways of doing this.

More readings

Configuration Management for Large-Scale Scientific Computing at the UK Met Office
A description of developing and deploying a new configuration management system for the research group. I have a slightly better handle on their current processes and on the information that'll be available to us. For instance, much of the old version history was imported when they moved to Subversion a few years ago. The key takeaway for me was how much support and customization was required to get them to adopt a new system. Any tools we build will have to be extremely easy to use, with obvious and immediate benefit, if they're to be useful. Simplicity will be the byword.

Where’s the Real Bottleneck in Scientific Computing? and Software Carpentry
Quick reads on the basics computational scientists should be taught. They basically cover the material in CSC108 and CSC148 from a slightly different perspective.

Software Development Environments for Scientific and Engineering Software: A Series of Case Studies
Gives some insight as to how researchers come to conclusions about software engineering, but not really worth the read. Skip to section 5 for conclusions about how large scientific computing teams work.

Who should fix this bug?
An extremely interesting look at a project to cull information from bug reports and CVS repositories for Eclipse and Mozilla in order to automatically recommend who should be assigned new bugs. It looks to me like what they worked on is way out of scope for the time and expertise our team has available. It's from a few years back, though, so there may now be further projects and tools on which we could model our own attempts at building social network models from repository information. Even if we don't use anything like this, it's an illuminating look at the complexities involved in developing and testing an aggregator from this sort of data.

Internet Groupware for Scientific Collaboration
An overview of group collaboration software as of 2000. I found this really useful as an introduction to the culture of the discourse; some of the comments made by Steve and Greg make more sense in the context of the goals and challenges of online group collaboration described here. The much more recent post Now that’s what I call social networking… kind of helped tie it in to current technology trends for me.

The Django Book
I'm coming around. It feels like slower going than learning Rails because they focus heavily on making explicit things that just kind of happened in Rails. I really do appreciate that level of control, however, and I think I'm going to enjoy working in it.

1 comment:

  1. You might take a look at Connotea or Bibsonomy.