Wikipedia SF Hackathon notes

From Worden

I'm at the Jan. 2012 Wikipedia San Francisco Hackathon this weekend. It's got workshops on the MediaWiki API and JavaScript programming, so I'm hoping to come away better prepared to write API features and Ajax interactions for WorkingWiki.

API/Ajax notes

I went to Roan Kattouw's session on using the API and Brion Vibber's session on JavaScript gadgets, but they didn't answer all my questions, so I got Roan to sit down with me and give me the quick rundown on how to implement API actions and Ajax features in a MediaWiki extension. Here are my notes.

  • To create a new API action, write a class that extends ApiBase. [This is for my needs: if you're writing a custom kind of query, write a subclass of ApiQueryBase, etc.]
    • Use ApiCongressLookup as an example. See also the RecentChanges module in MW core for another example of how to implement lots of things in this class.
    • Use getAllowedParams() to declare the parameters that can be passed to your action.
    • The framework provides a lot of validation. If you need an edit token, you can include a parameter called 'token' and the framework will do the checking for you.
    • Return your results by calling $this->getResult()->addValue() one or more times. The framework takes care of translating them into the various output formats (XML, JSON, etc.). The XML format may need some extra hand-holding, for instance if you output a PHP array with implicit integer indices, because XML elements need names (see ApiResult::setIndexedTagName()).
      • Read the docs on the ApiResult class to understand this better. See also the ApiFormatBase class and its subclasses for how the results get formatted.
  • Declare your class to the autoloader so MW can load it as needed.
  • Declare your action by writing a line like $wgAPIModules['my-action'] = 'MyApiClass'; (note the capitalization: PHP variable names are case-sensitive).
    • Given that, when api.php receives a query with 'action=my-action', it will create and use an object of class MyApiClass.
    • If you're writing an ApiQueryBase subclass for a prop module, you would use $wgAPIPropModules['my-prop'] = 'MyQueryClass'; instead.
  • To implement Ajax calls on the client side of the wiki, use jQuery.ajax(). Roan recommends using that rather than shorter forms such as $.getJSON(), because it has better error handling.
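    • See the jQuery documentation for jQuery.ajax(), as it is not specific to MediaWiki. This function can be called many ways, but one good way is with a single settings object that gives the URL (url), a plain object describing the query terms (data), the expected response type (dataType: 'json'), and callback functions to be invoked when the result comes back (success) or the request fails (error). The three-argument form (URL, data, callback) belongs to the shorthand functions such as $.getJSON(). Use mw.util.wikiScript('api') to get the api.php URL.
      • You need to use the ResourceLoader to make sure mw.util is available before you do that, by declaring the mediawiki.util module as a dependency of your script.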
  • Use the Firebug add-on in Firefox to eavesdrop on your Ajax transactions. It's very powerful, and you can use it to intervene in the JavaScript in lots of ways as well.
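Here is a minimal sketch of the client-side call described above. The action name 'my-action', its 'title' parameter, and the helper buildApiRequest() are made up for illustration; only jQuery.ajax() and mw.util.wikiScript() come from the notes. It assumes the API reports its own failures under an 'error' key in an otherwise successful response, so the success callback checks for that as well.

```javascript
// Hypothetical helper: builds the settings object passed to jQuery.ajax().
// 'action' and 'params' describe the query terms sent to api.php.
function buildApiRequest(apiUrl, action, params, onResult, onFailure) {
    var data = { action: action, format: 'json' };
    Object.keys(params).forEach(function (key) {
        data[key] = params[key];
    });
    return {
        url: apiUrl,
        data: data,
        dataType: 'json',
        success: function (response) {
            // An API-level failure arrives as a normal response that
            // carries an 'error' object, so check for it explicitly.
            if (response && response.error) {
                onFailure(response.error.code, response.error.info);
            } else {
                onResult(response);
            }
        },
        error: function (jqXHR, textStatus, errorThrown) {
            // Transport-level failure: timeout, HTTP error, unparsable JSON.
            onFailure(textStatus, String(errorThrown));
        }
    };
}

// On a wiki page, with jQuery and the mediawiki.util module loaded:
if (typeof jQuery !== 'undefined' && typeof mw !== 'undefined') {
    jQuery.ajax(buildApiRequest(
        mw.util.wikiScript('api'),
        'my-action',                 // hypothetical action name
        { title: 'Main Page' },      // hypothetical parameter
        function (response) { console.log(response); },
        function (code, info) { console.error(code, info); }
    ));
}
```

Keeping the settings object in a separate function makes the error handling easy to exercise without a browser, which is the main reason for splitting it out here.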

Update: Neil K. adds, on Twitter: Also check out mediawiki.api.js, which was added to core recently. Makes it easy to handle errors, get tokens, etc.

CSS notes

I also have had some discussions about good ways to programmatically transform CSS files and sanitize user-generated HTML content.

The CSS issue is that WW projects can generate HTML output that comes with CSS stylesheets. LaTeXML does this. The CSS describes how the HTML content should be formatted. But typically it's written so it formats the entire web page it's included in, and we need it to format only the document it goes with, not the wiki page that contains it. So just outputting the CSS would be disastrous - it would completely garble and distort the appearance of the wiki.

For instance, LaTeXML produces a full document with a title, section headers, etc., and along with it a stylesheet called core.css that includes rules like

<syntaxhighlight lang=css>
h2 { border: none; }
</syntaxhighlight>

(this is a fake example, but representative). If I were to pass that along to the output, it would make the lines disappear from below the section headings in the wiki page. I need to wrap the HTML document created by LaTeXML in an element like <div class="latexml">, and I need the CSS to be written in a way that limits its scope to that element:

<syntaxhighlight lang=css>
.latexml h2 { border: none; }
</syntaxhighlight>

At present I don't have a general way to do that. I have a handmade CSS file for LaTeXML output, and I use that instead of the core.css that comes with each output file. That has a serious drawback: it may break whenever an update to LaTeXML changes the CSS it outputs.

Some approaches that Roan and Trevor Parscal suggest:

  • Trevor says it's not that hard to transform CSS using regular expressions, because it's a pretty simple language.
  • Sass, or another framework for generating CSS, could be useful.
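As a rough illustration of the regular-expression approach, here is a sketch that prefixes every selector with a wrapper class. The function name scopeCss() is made up, and this is not a complete solution: it assumes flat rule sets only, so at-rules such as @font-face, braces inside strings, and commas inside selectors like :not(a, b) are not handled correctly.

```javascript
// Hypothetical helper: limit every rule in a stylesheet to elements
// inside 'scope' (for example '.latexml').
function scopeCss(css, scope) {
    // Strip comments first so braces inside them cannot confuse the match.
    var withoutComments = css.replace(/\/\*[\s\S]*?\*\//g, '');
    // Match "selectors { declarations }" where neither part contains a brace.
    return withoutComments.replace(/([^{}]+)\{([^{}]*)\}/g, function (match, selectors, body) {
        var scoped = selectors.split(',').map(function (selector) {
            selector = selector.trim();
            // Rules aimed at the whole page should style the wrapper itself.
            if (selector === 'html' || selector === 'body') {
                return scope;
            }
            return scope + ' ' + selector;
        }).join(', ');
        return scoped + ' { ' + body.trim() + ' }\n';
    });
}

var scoped = scopeCss('h2 { border: none; }', '.latexml');
console.log(scoped);
```

For the fake rule used earlier, this yields the rule .latexml h2 { border: none; }. Rules for html or body are mapped onto the wrapper element itself, since the wrapper stands in for the page that the stylesheet expected to own.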

HTML Sanitizing notes

What should be done with user-generated HTML to protect against privacy-violating JavaScript calls, image loads, etc.? This is a nontrivial issue for two reasons. First, there is a staggering range of things people can do that reveal private data about website visitors, and it's very hard to filter all of them out. Second, WorkingWiki is designed to (a) allow users to do powerful computing operations by defining makefile rules and (b) generate HTML output in order to allow the powerful LaTeXML display of documents. Together, these mean that users' projects can output arbitrary HTML code.

Some suggestions:

  • Use a whitelist of allowed HTML constructions, not a blacklist (I knew that already).
  • Consider BeautifulSoup. Sumana's husband worked on it.
  • Google has a project called Caja that intends to do some very powerful sandboxing to protect parts of a web page from each other. It has a JavaScript widgets focus, but likely also addresses the smaller task of providing containment for untrusted HTML code.
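
To make the whitelist idea concrete, here is a deliberately strict sketch: it escapes everything, then restores only bare tags from an allowed list. The tag list and the function names are made up for illustration. It is lossy by design (any tag with attributes, and every entity, ends up displayed as literal text), and it does not balance tags, so a real sanitizer would want a proper HTML parser instead.

```javascript
// Hypothetical whitelist: only these tags, with no attributes, get through.
var ALLOWED_TAGS = ['p', 'em', 'strong', 'h2', 'h3', 'ul', 'ol', 'li', 'code', 'pre'];

// Turn every character with special meaning in HTML into an entity.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Escape the whole input, then un-escape only exact matches for
// whitelisted opening and closing tags. Anything not recognized,
// including scripts, images, and event-handler attributes, stays inert.
function sanitizeHtml(html) {
    var pattern = new RegExp('&lt;(/?)(' + ALLOWED_TAGS.join('|') + ')&gt;', 'gi');
    return escapeHtml(html).replace(pattern, function (match, slash, tag) {
        return '<' + slash + tag.toLowerCase() + '>';
    });
}

console.log(sanitizeHtml('<p>Hi <em>there</em></p><script>alert(1)</script>'));
```

The point of the design is that the default is rejection: a construction passes only if it was explicitly listed, so new tricks that nobody anticipated fail closed rather than open.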