Wednesday, January 26, 2011

Sprint Wrap Up for POSIT, UCOSP 2011

This past weekend, about fifty students from all over Canada gathered at the University of Alberta campus for a weekend of coding on our new open source projects. We spent three days getting to know each other and the codebase, and had a blast doing it.

POSIT: Portable Open Source Search and Identification Tool

POSIT is an Android app and web project designed to help track finds in the field. Humanitarian aid workers can plan searches by looking at what's already been covered and tag finds with photos and their exact location as they survey the area.

The Team

Edward Bassett of the University of Alberta worked on POSIT for UCOSP last semester. He was there all weekend to help the new team get a handle on the project.

Shawn Gryschuk of the University of Saskatchewan has already jumped into user interface improvements.

Eran Henig of the University of Toronto couldn't make it out to Alberta but was an ever-present collaborator over Skype and IRC.

Stas Kalashnikov of Simon Fraser University fixed our first big bad bug of the season, stopping images on the server from multiplying like bunnies.

Ralph Morelli is the POSIT project mentor at Trinity College. He made himself available all weekend over voice and video chat to help us get to know the project and get a feel for where he'd like the project to go this semester.

Dustin Morrill of the University of Alberta fixed our second big bug, allowing POSIT to fall back gracefully to network location data when GPS is unavailable.

Jon VanAlten of the University of Alberta hopes to allow finds to be synced over SMS when data networks are unavailable.

Sarah Strong of the University of Toronto hopes to offer features to support real time collaboration on searches.

The Sprint: wrestling with Android and hunting bugs

We spent most of Friday getting our development environments set up and making sure we could debug code on the phones and emulators. We also ran up against the permission control limitations of Mercurial hosting on Google Code and had to rethink our collaboration processes. By the end of the day, most of us were up and running and could start exploring the project. We even started to document bugs we found as we poked around.

On Saturday, Ralph ran us through a whirlwind tour of the codebase and we started to get set up to work on bug fixes. We wound up settling on each using a clone of the experimental branch as our personal repo, and asking Ralph to pull in changes when we have something worth incorporating into main. Jon started to put together a development POSIT server for us to tinker with.

Our last few hours on Sunday were spent tying up loose ends and planning for the rest of the semester. We're hoping to get bug fixes and some code cleanup done in the next couple of weeks and to make a preliminary release within the month. The last release was in June, so this first release will bring in the new fixes and features that have reached a stable point. Over the next few weeks we'll work on pitches for features or projects of our own, and each of us will put together a final project. We're bursting with ideas, but I'll save the details until we have a better idea of what's feasible. By the end of the semester we hope to release a second version that incorporates the work done by UCOSP students last semester.

Tuesday, June 8, 2010

Clipboard managers for Ubuntu

Patching is hard, let's go shopping! ...for clipboard managers


I had hoped to have at least one patch finished by now, along with general guidelines for implementing the fix in other applications. GTK+'s text buffer (GtkTextBuffer) already respects the ClipboardManager specification, so applications that use it are fixed by default. Custom-built text areas in other GTK+ applications vary so wildly from one application to the next that a single reference fix makes little sense. I'm holding out hope that I can factor the commonalities in fixes for GTK+ applications out into a library, but I haven't had much luck yet.

I've begun trying to patch Vim, OpenOffice.org, and Empathy to conform to the spec, with no success so far. As a change of pace, here's a survey of the clipboard management field as it stands. If we decided to fix this problem by bringing a more fully featured clipboard management application up to standard so it could be released as part of Ubuntu main, it would free us (me!) from having to patch each application individually. A girl can dream, right?


Klipper


Clipboard management isn't a problem in Kubuntu thanks to the tight integration of Klipper into KDE. It sits in the taskbar intercepting any and all copies performed in the system, making them available both after an application quits and as a history in a panel app. Installing Klipper in Gnome-based Ubuntu pulls in a whole lot of KDE libraries. On Ubuntu Lucid, I found it threw a warning, "QClipboard::setData: Cannot set X11 selection owner for PRIMARY," and failed to preserve clipboard contents after quit, although the copied text was available in Klipper's history, accessible by clicking on the panel icon.

GSD-clipboard-manager


Gnome Settings Daemon's clipboard manager runs in the background in Gnome systems including Ubuntu, taking care of preserving clipboard contents after quit by implementing the ClipboardManager specification. This only works for those applications kind enough to implement that same spec themselves.

One option to consider in this project is to integrate some of the functionality from the panel-based clipboard managers into GSD, keeping a record of each copy as it's performed, without keeping a history or adding a panel applet. I've looked through the source and that looks quite doable, but the major consideration is whether it can be done without an inordinate impact on GSD's speed or reliability. Right now, GSD only registers copies when applications request it. Acting as a full clipboard manager would make it record many selections and copies that are never actually needed. We could perhaps reduce the load by supporting persistence only on the default (Ctrl-V) register in conforming apps, at the expense of consistency.
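To make the trade-off concrete, here's a toy Python model of GSD snapshotting every copy as it happens instead of waiting for an application to ask. It is not GSD code: the class name, the `persist_primary` switch, and the idea of an owner as a callable that produces data on demand are all hypothetical simplifications of X11's selection mechanism. The `fetches` counter stands in for the extra work the daemon would take on.

```python
from typing import Callable, Dict, Optional


class SnapshottingManager:
    """Hypothetical sketch: copy the data out of each new selection owner
    immediately, keeping only the latest copy per selection (no history)."""

    def __init__(self, persist_primary: bool = False) -> None:
        self.persist_primary = persist_primary
        self.saved: Dict[str, str] = {}  # selection name -> latest snapshot
        self.fetches = 0                 # transfers paid for up front

    def on_owner_change(self, selection: str, fetch: Callable[[], str]) -> None:
        # PRIMARY changes on every mouse highlight; skipping it is the
        # load-reducing option of persisting only the Ctrl-V clipboard.
        if selection == "PRIMARY" and not self.persist_primary:
            return
        self.saved[selection] = fetch()  # eager transfer: this is the cost
        self.fetches += 1

    def paste(self, selection: str) -> Optional[str]:
        return self.saved.get(selection)


manager = SnapshottingManager()
manager.on_owner_change("PRIMARY", lambda: "highlighted text")
manager.on_owner_change("CLIPBOARD", lambda: "copied text")
print(manager.paste("CLIPBOARD"), manager.fetches)  # copied text 1
```

Every snapshot is a transfer that X's lazy model would normally skip, which is exactly where the speed and reliability worries come from.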


Clipman

Clipman is a component of Xfce and depends on many Xfce libraries. It also behaves the same way as Klipper when installed in regular old Gnome-based Ubuntu. What makes it interesting is that its source includes gsd-clipboard-manager.c, the clipboard management plugin provided with gnome-settings-daemon. If we were to extend the functionality of gsd-clipboard-manager, this could be a good place to start.

Glipper

Glipper was written as a Gnome-based alternative to Klipper. It installed oddly on my system, providing no executable in my path and no entry in the Gnome menu. Running the binary installed at /usr/lib/glipper/glipper provided no panel applet but did preserve clipboard contents after quit. It seems it's meant to run as a panel applet, so if I go down the route of bringing it in as a default part of Ubuntu, I'll have to fix the install process. Users complain that it uses too much memory and is buggy. It's a Python-based application that was last updated in 2007.

Parcellite

Parcellite is another Gnome clipboard panel applet, this one officially abandoned in April 2010. It's written in C, and I prefer its source to Glipper's, mostly because of its excellent commenting. It installed cleanly on my system, giving me a "Clipboard Manager" entry in "Add to panel". While sitting in my panel collecting a history, it preserved the clipboard after quit, and I wasn't able to reproduce the one bug I found reported against it. However, its no-panel-applet daemon mode failed to preserve clipboard contents.

Conclusions

Building an alternate gsd-clipboard-manager that adds persistence seems worth doing if it can be done without noticeably hurting performance or reliability. GSD, however, is an essential system service that has to be so fast and reliable that I'm not sure it's feasible, at least for me.

It would be a good idea, I think, to make the Clipboard Manager panel applet (Parcellite) available on a default install, without actually adding it to the default panel. That way we give users an easily discoverable way to fix this bug without potentially bogging down slow systems or adding to panel clutter. I'll bring it up with my GSoC mentor, Ted, and ask whether it would be appropriate to suggest it and spend time during my employment testing, fixing, and readying it for inclusion. That seems unlikely to happen at this late date, so I'll probably take up the maintenance and the cause once my time as a GSoC student wraps up.

Friday, May 21, 2010

Gearing up for GSoC: Clipboard Persistence for Ubuntu

GSoC & Ubuntu Clipboard Improvements

This summer I'll be tackling a Google Summer of Code assignment with Ubuntu to keep clipboard contents from being lost when an application quits. You can check out my application here, developed with lots of input from my mentor Ted Gould and Ubuntu developers James Westby and David Bensimmon on IRC and the ubuntu-soc mailing list.

The problem: data loss on quit

Say you're writing an email in your word processor. If you copy it, paste it into your email client, and then close the word processor, you're golden. If you copy it, close the word processor, and try to paste, you're probably out of luck. It's an odd case, but when it happens it means loss of user data.

The problem happens because Xorg takes a lazy approach to copying. When the user selects or copies something, the application only claims ownership of the selection; the X server records which program owns it, not the data itself. The actual data isn't retrieved from the source program until someone requests a paste. This saves a lot of unneeded data transfer, at the expense of having no way to retrieve data from a program that has closed without handing its clipboard contents off to someone else.
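Here's a toy Python model of that lazy behaviour, just to show where the data goes missing. It is a simplification, not Xlib: the `Selections` and `Program` classes are hypothetical, and a real paste involves X events rather than direct method calls.

```python
from typing import Dict, Optional


class Program:
    """A hypothetical application that can hand over its copied text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def provide(self) -> str:
        return self.text


class Selections:
    """Toy X server: it remembers who owns each selection, not the data."""

    def __init__(self) -> None:
        self.owners: Dict[str, Program] = {}

    def claim(self, selection: str, owner: Program) -> None:
        # "Copy" only records the owner; no data moves yet.
        self.owners[selection] = owner

    def program_quit(self, program: Program) -> None:
        # When the owner exits, its ownership disappears with it.
        for name in [n for n, o in self.owners.items() if o is program]:
            del self.owners[name]

    def paste(self, selection: str) -> Optional[str]:
        # Data is fetched from the owner only at paste time.
        owner = self.owners.get(selection)
        return owner.provide() if owner is not None else None


x = Selections()
word_processor = Program("Dear Alice, ...")
x.claim("CLIPBOARD", word_processor)
print(x.paste("CLIPBOARD"))  # Dear Alice, ...
x.program_quit(word_processor)
print(x.paste("CLIPBOARD"))  # None
```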

The solution: save on exit

Freedesktop's ClipboardManager specification comes to the rescue. Gnome Settings Daemon, which provides Ubuntu's default clipboard manager, conforms by letting applications explicitly ask it to save their clipboard contents in a safe place. Applications conform by requesting a save before they exit. Everything gets squared away before a quit and we don't lose any data. Unfortunately, very few applications conform to this standard, and we believe that few developers are even aware of the problem. I hope to put together an online guide to fixing the problem while patching several popular Ubuntu programs myself.
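Sketched as a toy Python model, the save-on-exit handshake looks roughly like this. In the real spec the manager owns the `CLIPBOARD_MANAGER` selection, an exiting application asks it to convert that selection to the `SAVE_TARGETS` target, and the manager fetches the data and takes over ownership of `CLIPBOARD`; GTK+ applications can opt in with `gtk_clipboard_set_can_store()` and `gtk_clipboard_store()`. The classes and the `exit_app` function below are hypothetical stand-ins that skip all of the X event plumbing.

```python
from typing import Dict, List, Optional


class Program:
    """Hypothetical clipboard owner offering its copy in several formats."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self.data = data

    def targets(self) -> List[str]:
        return list(self.data)

    def convert(self, target: str) -> bytes:
        return self.data[target]


class ClipboardManager:
    """Toy stand-in for whoever owns the CLIPBOARD_MANAGER selection."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}

    def save_targets(self, client: Program) -> None:
        # Fetch every target while the client is still alive; afterwards
        # the manager, not the client, answers pastes.
        self.stored = {t: client.convert(t) for t in client.targets()}

    def paste(self, target: str) -> Optional[bytes]:
        return self.stored.get(target)


def exit_app(client: Program, manager: Optional[ClipboardManager],
             conforming: bool) -> None:
    # A conforming app asks for a save before exiting; a non-conforming
    # one simply exits, and its clipboard contents go with it.
    if conforming and manager is not None:
        manager.save_targets(client)


manager = ClipboardManager()
exit_app(Program({"UTF8_STRING": b"Dear Alice"}), manager, conforming=True)
print(manager.paste("UTF8_STRING"))  # b'Dear Alice'
```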

A batch fix?

The ad hoc approach of fixing a series of apps one at a time seems like it'll work, but we're looking for a more systematic way to knock out the problem in many places at once. Right now, we have a variety of clipboard history applications that provide clipboard persistence by keeping track of each copy performed. These panel apps get the job done, but they're probably too heavyweight for default inclusion in Ubuntu. A lighter-weight solution might be to create a library that GTK+ application developers can use to easily fix this problem. I'll be comparing existing fixes to figure out whether this is a feasible approach.


Things to watch out for: performance problems, format support, upcoming GTK+ improvements

There are several things to keep in mind as I start investigating the problem.
  • I've seen reports that saving clipboard data can have a performance hit. I'll need to make sure I'm not imposing an unacceptable performance hit before I push out changes to a bunch of programs.
  • An application can offer the same copy in several formats: a copied picture might be provided as a JPEG to an image program and as a text link to an editor. I'll need to make sure my changes don't cause any regressions in multiple-format support, and report on support status as I go.
  • There are some changes that might be coming down the pipeline to GTK+ this summer with the addition of a base application class. Ted mentioned that the sort of changes I'm proposing might be made easier then. All the more reason, then, to keep what I'm doing well documented so it can be easily reimplemented when there's a better place to put it.
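For the format-support and performance points, one simple check is to compare the formats (X11 "targets") a copy advertised before the app quit with the formats a clipboard manager actually saved, and to total the bytes saved as a rough proxy for the cost. This is a hypothetical Python sketch; the MIME-style target names and the `skip` option are illustrative, not taken from any real clipboard manager.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Each advertised format maps to a function that produces the data on demand.
Providers = Dict[str, Callable[[], bytes]]


def save_formats(providers: Providers, skip: Iterable[str] = ()) -> Dict[str, bytes]:
    """Fetch every advertised format except those in `skip`
    (for example, formats judged too expensive to store)."""
    skipped = set(skip)
    return {t: produce() for t, produce in providers.items() if t not in skipped}


def check_save(providers: Providers, saved: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Return (formats lost on quit, total bytes stored)."""
    lost = sorted(set(providers) - set(saved))
    return lost, sum(len(data) for data in saved.values())


copied_picture = {
    "image/jpeg": lambda: b"<jpeg bytes>",
    "text/uri-list": lambda: b"file:///home/me/photo.jpg",
}
saved = save_formats(copied_picture, skip=["image/jpeg"])
print(check_save(copied_picture, saved))  # (['image/jpeg'], 25)
```

Skipping a heavy format saves work but shows up immediately as a regression, which is the tension between the first two points above.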
Timeline (so far...)

Weeks one and two, May 24 - June 6, 2010

  1. Create an example program that exhibits the problem
  2. Fix the problem in the example program
  3. Put up a website that describes the problem and shows how to fix it using the example program
  4. Add a page of extra links and explanatory material for anyone who's researching the problem; have a section at the top for users who might just be googling the behaviour.
Weeks three and four, June 7 - 21, 2010
  1. Compile a list of existing patches that fix this problem and add them to the link page
  2. Read them to get an idea of how it's being fixed and where there are commonalities that could be factored out
  3. Fix the problem in one real application
  4. If it seems like a common library to make future fixes easier makes sense, put together a proposal for it
  5. Update documentation page so that it would have been useful to me in fixing the real app I tackled

Friday, July 31, 2009

Wrapping up for the summer

I just spoke to Steve about plans for wrapping up our projects for the summer, and he asked me to write up a summary of the steps we'll need to take before the fall.

  1. Prepare for the poster session

  2. Make screencasts
    We're planning on releasing demos as screencasts for each project. Maybe we should make a quick and dirty one right away to get feedback and throw light on those features that need work before the end of the summer, and then make a nicer one in a few weeks?

  3. Move the source to a public site such as sourceforge
    If we do this right away, we can ticket code cleanup tasks to give us a nice roadmap for the end of the summer, and for any future development on the projects.

  4. Provide documentation
    Make sure code is commented nicely and provide an overview for future developers.
    Also include help/about/how-to information in the app, where appropriate, for end users.
    We can also do light refactoring and clean up code as we comment.

  5. Fix bugs and remove hacks
    It's easy to leave buggy features in a single-developer project if you know the workaround. We'll need to fix that up. A better testing suite would be nice to have, but might be of lower priority than fixing the bugs we know we have.

  6. Add one or two killer features
    We'll need to brutally triage so we don't ignore the boring but necessary cleanup and documentation tasks, but we got great feedback this week from people who watched our presentations, and we've all got a couple of features we'll want to be able to leave the project with.


By way of example, here's how I think we could apply this to our project, TracSNAP.

  1. Prepare for the poster session
    The screenshots and explanation on the poster could be reused as documentation and info for the front page of our TracHacks page.

  2. Make screencasts
    I already have an idea of what I want fixed for our screencast from the demo session we just did. A quick one soon would be good, though.

  3. Move the source to a public site such as sourceforge
    TracSNAP belongs on TracHacks. I'll throw our source up there and start ticketing changes as soon as I get the go-ahead from Ainsley.

  4. Provide documentation
    I can go through the code and check for any egregious omissions in commenting this weekend. On Tuesday, I'll add barebones user-accessible help on each feature. We'll put together an overview of features for the poster session and our TracHacks plugin page next week. Beyond that, I think fuller documentation should wait until we've fixed a few bugs and decided which features we'd like to add by the end of the summer.

  5. Fix bugs and remove hacks
    We have several known bugs and ugly workarounds in our project, and moving to a real project management system and ticketing them is the first step. Then we'll prioritize and work on fixing them.
    Most pressing: Update repository data on every commit. Remove extra tabs only used for development. Grab real emails and work on mapping Trac logins to subversion logins in a sane, if not perfect, way. Decide on saner algorithms for determining relatedness and expertise.

  6. Add one or two killer features
    Since JSViz seems to be pretty broken for parent nodes with >18 children, moving to Flare is probably a key feature. This is Ainsley's department and I'll defer to her judgement on whether it'll be doable in less than a month. Jon Pipitone suggested we get together with Brent and a few grad students and work together on a code sprint to get Flare up and running on both TracSNAP and Breadcrumbs, if possible.
    Other possible features:
    • Improved UI - request suggestions, and make everything scale better with screen size.

    • Import social network data from existing products that generate it from email logs and the like.

    • Adapt Anita Sarma's algorithms from the Tesseract project for determining relatedness to this project.

    • Do you have a suggestion? Leave a comment - thanks!

Monday, May 25, 2009

Pitching to the Tesseract folks

We had a group meeting today to present our feasibility findings and project plans, and it went quite well. It seems that Steve thinks that Anita Sarma and the other developers for Tesseract would be amenable to letting us use vast swaths of their code as a backend for our social network project, which would simplify our project immensely.

Tesseract does all the data harvesting and analysis we want to do, but presents the data in a complex, freestanding web app. We'd work on tweaking the analysis portion to work well with the data the Hadley Centre keeps (if necessary), getting it to run as an unattended part of the project management/repository back end, and pushing the congruence data it generates to extremely simple views within the project's Trac site.

Thursday, May 21, 2009

Mostly minutiae

Social networking in Trac thoughts
  • I set up toy local Trac and Subversion servers to see what information is available out of the box. It turns out that Trac doesn't really track anything that could be useful for building a graph of straight-up social interactions. That suggests a structure for the project: the repository authorship graph maker is a totally separate module from the social network graph maker, both export to a common network representation format, the recommendation engine combines them and spits out information, and the Trac plugin serves pretty views of that info. This is probably the best way to set it up regardless of the social network information source (especially if we want to be able to adapt it to different VCSs and viewers), but it's good to start thinking about more concrete choices.
  • It's my understanding that at the Hadley Centre, they would likely be able to feed all work email history into the social graph maker, and that guided my description of how to create a social graph from yesterday. I'd really like to make a suite of tools that could potentially be useful to other projects, though, so it's worth thinking about what resources others might have available. Many open source projects use mailing lists to communicate, and it makes sense to base a social graph of mailing list participants on who has replied to whom. More on this as I consider it.
  • How should we track LOC edited? I don't know whether Hadley uses BDB or FSFS for their subversion backend. FSFS introspection looks pretty straightforward: each revision has an author, each revision file has a list of deltas, followed by a list of information about the files revised. It'd probably be better to use existing parsers, even if all we want is linecount/filename.
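Whatever the backend, `svn diff -c REV` produces a unified diff for a single revision (and `svn log` names its author), so here's a rough Python sketch of pulling per-file line counts out of one. It's a hand-rolled parser that assumes standard unified-diff hunks; as noted above, an existing parser is probably the better choice in practice. It tracks hunk lengths so that a deleted line that happened to start with `--` isn't mistaken for a file header.

```python
import re
from collections import Counter
from typing import Dict, Optional, Tuple

# "@@ -old_start[,old_count] +new_start[,new_count] @@"; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def loc_per_file(diff_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each file in a unified diff to (lines added, lines deleted)."""
    added: Counter = Counter()
    deleted: Counter = Counter()
    current: Optional[str] = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                added[current] += 1
                new_left -= 1
            elif line.startswith("-"):
                deleted[current] += 1
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_left -= 1  # a context line counts on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            current = line[4:].split("\t")[0]
            continue
        match = HUNK.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
    return {f: (added[f], deleted[f]) for f in set(added) | set(deleted)}


diff = "\n".join([
    "Index: trunk/ocean.py",
    "===================================================================",
    "--- trunk/ocean.py\t(revision 41)",
    "+++ trunk/ocean.py\t(revision 42)",
    "@@ -1,3 +1,4 @@",
    " import os",
    "--- old comment",
    "+x = 2",
    "+y = 3",
    " print(x)",
])
print(loc_per_file(diff))  # {'trunk/ocean.py': (2, 1)}
```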

Tuesday, May 19, 2009

Social networking on Trac: pinning down the specifics

I'm working with Ainsley Lawson now, and I'll defer to her excellent post for a summary of the purpose and evaluation issues she's been researching. Speaking with her about the project gave me a much clearer idea of the specifics of the network we'll be building, so here goes the clearest summary I can make so far:

To build the graphs:

Create a social relationship graph.

Look at email to: and from: fields in the tracked communications and give each pair of people a relationship point for each time one emails the other. Use the relationship points to determine strength of connection in a relationship graph.
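A minimal Python sketch of that scoring using the standard library's `email` package, assuming the tracked communications are available as raw RFC 2822 messages. Points are undirected (Alice emailing Bob and Bob emailing Alice count toward the same pair), addresses are lower-cased, and only the To: field is read, as described; whether Cc: should count too is left open.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr
from typing import Iterable


def relationship_points(raw_messages: Iterable[str]) -> Counter:
    """Count one point per (sender, recipient) pair per email."""
    points: Counter = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if not sender:
            continue
        recipients = {addr.lower()
                      for _, addr in getaddresses(msg.get_all("To", []))
                      if addr}
        for rcpt in recipients - {sender}:
            # frozenset makes the pair unordered
            points[frozenset((sender, rcpt))] += 1
    return points


mail = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\n\nHi Bob",
    "From: bob@example.org\nTo: Alice <ALICE@example.org>, carol@example.org\n\nRe: Hi",
]
points = relationship_points(mail)
print(points[frozenset({"alice@example.org", "bob@example.org"})])  # 2
```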

Create a code relatedness graph.

For each pair of code modules, give them a relatedness point for each time they've been checked in at the same time. This code relatedness thing could get much more complex, but I understand there's a lot of source visualization software out there that's already solved these problems, so we could look at them.
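The co-commit count is simple enough to sketch in a few lines of Python. It assumes each commit has already been reduced to the set of modules it touched; how file paths map to modules (top-level directory, package, and so on) is a separate decision this sketch doesn't make, and the module names are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def relatedness(commits: Iterable[Set[str]]) -> Counter:
    """One relatedness point per pair of modules per commit touching both."""
    points: Counter = Counter()
    for modules in commits:
        # sorted() gives each unordered pair a single canonical key
        for pair in combinations(sorted(modules), 2):
            points[pair] += 1
    return points


history = [{"ocean", "atmosphere"}, {"ocean", "atmosphere", "ice"}, {"ice"}]
print(relatedness(history)[("atmosphere", "ocean")])  # 2
```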

Create a module-by-module expertise listing.

For each code module, look at the subversion history and record the number of lines of code each distinct author has added, changed, and deleted over the life of the module (LOC edited).
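A minimal Python sketch of the expertise listing, assuming per-revision line counts have already been extracted from the Subversion history. Summing added, changed, and deleted lines with equal weight into "LOC edited" is an assumption, as are the sample names; the `experts` helper shows how a module lookup could rank authors.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per revision and module: (author, module, added, changed, deleted)
Edit = Tuple[str, str, int, int, int]


def expertise(edits: Iterable[Edit]) -> Dict[str, Dict[str, int]]:
    """module -> {author: LOC edited over the module's life}."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for author, module, added, changed, deleted in edits:
        table[module][author] += added + changed + deleted
    return {module: dict(authors) for module, authors in table.items()}


def experts(table: Dict[str, Dict[str, int]], module: str, top: int = 3) -> List[str]:
    """Authors ranked by LOC edited in `module`, ties broken by name."""
    ranked = sorted(table.get(module, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [author for author, _ in ranked[:top]]


log = [("alice", "ocean", 100, 10, 5), ("bob", "ocean", 20, 0, 0),
       ("alice", "ocean", 5, 0, 0), ("carol", "ice", 50, 0, 0)]
table = expertise(log)
print(table["ocean"], experts(table, "ocean"))  # {'alice': 120, 'bob': 20} ['alice', 'bob']
```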

Create a shared authorship graph. This one's still very rough.
  • For each pair of people, for each code module, give them min(A's LOC edited, B's LOC edited) shared authorship points.
  • For each pair of people, for each pair of related code modules, give them (min(A's LOC edited in both, B's LOC edited in both)*relatedness/something) shared authorship points.
  • Rationale: two heavy editors should get a higher rating than one heavy editor and one light editor, hence the min() construction.
  • Edits in related modules should count for less than edits in the same module, hence the "/something" denominator, probably to be determined by dumb tweaking until it lines up with the results of surveying the coders about their network, or something like that.
  • Total shared authorship points between each pair of people is the strength of connection in the graph.
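A rough Python sketch of that scoring. Two pieces are assumptions: `SCALE` is a hypothetical placeholder for the "/something" denominator, and "LOC edited in both" is read here as A's edits in one module weighed against B's edits in the related module (in both directions), which avoids recounting same-module overlap. Inputs are a per-author LOC table and co-commit relatedness counts keyed by sorted module pairs.

```python
from itertools import combinations
from typing import Dict, Mapping, Tuple

SCALE = 10.0  # hypothetical stand-in for "/something", to be tuned


def shared_authorship(loc: Mapping[str, Mapping[str, int]],
                      related: Mapping[Tuple[str, str], int],
                      scale: float = SCALE) -> Dict[Tuple[str, str], float]:
    """loc[author][module] = LOC edited; related[(m1, m2)] = relatedness points."""
    scores: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(loc), 2):
        la, lb = loc[a], loc[b]
        # Same module: min() rewards pairs where both are heavy editors.
        score = float(sum(min(la[m], lb[m]) for m in set(la) & set(lb)))
        # Related modules count for less, scaled by relatedness / scale.
        for (m1, m2), rel in related.items():
            cross = min(la.get(m1, 0), lb.get(m2, 0)) + min(la.get(m2, 0), lb.get(m1, 0))
            score += cross * rel / scale
        scores[(a, b)] = score
    return scores


loc = {"alice": {"ocean": 100, "ice": 10}, "bob": {"ocean": 40, "atmosphere": 30}}
related = {("atmosphere", "ocean"): 2}
print(shared_authorship(loc, related))  # {('alice', 'bob'): 46.0}
```

Here Alice and Bob get 40 points for their overlap in `ocean`, plus 30 × 2 / 10 = 6 for Alice's `ocean` work next to Bob's related `atmosphere` work.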
So what do we do with them?

The primary purpose would be to decide on a threshold difference between relationship points and shared authorship points at which we'd consider a pair of people not to be communicating effectively. If Alice and Bob have 2000 authorship points but only 500 relationship points, we would add them to each other's recommended collaborators feed, available as a widget down the side of the Trac project home page with a link to one another's emails or something.
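A minimal Python sketch of that check: flag each pair whose shared authorship points exceed their relationship points by more than a threshold, and turn the flagged pairs into per-person feeds. The threshold value is hypothetical and would need tuning, and both score tables are assumed to key pairs the same way (names in sorted order).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
THRESHOLD = 1000.0  # hypothetical cut-off, to be tuned


def recommended_feeds(authorship: Dict[Pair, float],
                      relationship: Dict[Pair, float],
                      threshold: float = THRESHOLD) -> Dict[str, List[str]]:
    """Map each person to the collaborators they should probably be talking to."""
    feeds: Dict[str, List[str]] = defaultdict(list)
    for (a, b), shared in sorted(authorship.items()):
        if shared - relationship.get((a, b), 0.0) > threshold:
            feeds[a].append(b)
            feeds[b].append(a)
    return dict(feeds)


authorship = {("alice", "bob"): 2000.0, ("bob", "carol"): 300.0}
relationship = {("alice", "bob"): 500.0, ("bob", "carol"): 250.0}
print(recommended_feeds(authorship, relationship))  # {'alice': ['bob'], 'bob': ['alice']}
```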

Other possibilities:
  • People can input the name of a module and get back a list of the experts on that module (determined by LOC edited), and maybe a list of related module expertise search links.
  • To really reach, the above could perhaps be smarter. If I'm writing an in-Trac email or bug report that mentions modules by name, it could automatically suggest additional people to copy the ticket to.
  • You could have a list of experts in modules you've recently checked in as a quick-contact box (with manual add and stickying people allowed).
  • Managers can see a visualization of discrepancies between the social and shared authorship graphs to help diagnose organizational inefficiencies.
  • When Bob shows up on Alice's collaborators feed, she can click "Who's Bob?" and see a graph of the social network with paths between her and Bob highlighted.
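The "Who's Bob?" view needs a path between two people in the social graph, and breadth-first search finds one shortest chain of acquaintances. A minimal Python sketch with made-up names; highlighting several paths at once would take more work than this.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from (person, person) edges."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_between(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: one shortest path from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            # Walk back through predecessors to rebuild the path.
            path: List[str] = []
            node: Optional[str] = person
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for friend in sorted(graph[person]):  # sorted for deterministic output
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None


graph = build_graph([("alice", "carol"), ("carol", "bob"),
                     ("alice", "dave"), ("dave", "erin"), ("erin", "bob")])
print(path_between(graph, "alice", "bob"))  # ['alice', 'carol', 'bob']
```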
Things to consider
  • Should expertise slowly expire? It could make sense for experience within the last year to count more than experience from several years back. This would mean counting expertise points as LOC edited as a function of time - not hard to do since we'll be getting our info from diffs anyways, but it stinks of unnecessary complexity.
  • Should we allow for diff-by-diff updates of the graphs, or assume it'll just be fully rebuilt once a week or whatever? Probably the latter to start off, until we have an idea of just how big the organization is.
  • Must make sure to keep in mind that we're doing all this fancy footwork in order to deliver a final product that's extremely simple, so people might actually use it. Other social network graphing solutions exist; we need to focus on making ours simple and directed. The recommended collaborators feature fits, but not all of the others do.
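If we did decide expertise should expire, one simple option is exponential decay: weight each edit by 0.5 raised to its age in half-lives. A Python sketch, where the one-year half-life is a hypothetical choice and each edit is assumed to arrive as a (date, LOC edited) pair pulled from the diffs.

```python
from datetime import date
from typing import Iterable, Tuple

HALF_LIFE_DAYS = 365.0  # hypothetical: a year-old edit counts half as much


def decayed_loc(edits: Iterable[Tuple[date, int]], today: date,
                half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Sum LOC edited, discounting each edit by its age."""
    total = 0.0
    for when, loc in edits:
        age_days = (today - when).days
        total += loc * 0.5 ** (age_days / half_life_days)
    return total


edits = [(date(2011, 1, 1), 100), (date(2010, 1, 1), 100)]
print(decayed_loc(edits, today=date(2011, 1, 1)))  # 150.0
```

It is only a few lines, but it does mean storing a date with every count, which is part of the extra complexity to weigh.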



Note: Thanks to Ainsley for terminology correction, and please see her similar post for more information on these ideas.