Thursday, June 14, 2012

Adding information to libosinfo

Some weeks back, Marc-Andre told me that it would probably be helpful for potential contributors if I wrote a blog post explaining how to add new information to libosinfo (the library Boxes relies on for information about various operating systems and their installer media), so here I am doing just that. Currently there are two types of information you can add: devices and operating systems. Usually it'll be the latter that you want to add (e.g. your favorite OS just made an awesome new release and you want libosinfo to know about it), but for the sake of completeness I'll describe both.

Libosinfo keeps its information database in a bunch of XML files. In theory there could be just one XML file, but it would be huge and therefore very hard to edit and maintain, so we keep each OS distro and device class in its own XML file.

Libosinfo recursively traverses the following locations, assuming the application lets libosinfo load its own default DB (which Boxes, at least, does):
  • ${pkgdatadir}/libosinfo/db, where pkgdatadir is typically ${prefix}/share. This can be changed at runtime by setting the OSINFO_DATA_DIR environment variable to the path where you keep your custom DB.
  • ${sysconfdir}/libosinfo/db, where sysconfdir typically is ${prefix}/etc or /etc.
  • ${HOME}/.config/libosinfo/db
So if you just want to quickly add some information to libosinfo, the easiest way is to put a file in the ${HOME}/.config/libosinfo/db folder (you'll have to create it yourself). The file can have any name, but its extension must be 'xml'.
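As a rough illustration of the locations above, here is a small Python sketch that lists the three directories and prepares a file path in the per-user one. The helper names, the default prefix of /usr and the default sysconfdir of /etc are my assumptions for the example. It does not model OSINFO_DATA_DIR, and it is not how libosinfo itself is implemented.

```python
from pathlib import Path


def default_db_dirs(prefix="/usr", sysconfdir="/etc", home=None):
    """Return the three DB directories described above, in the order listed.

    prefix and sysconfdir are assumed defaults; your build may differ.
    """
    home = Path.home() if home is None else Path(home)
    return [
        Path(prefix) / "share" / "libosinfo" / "db",
        Path(sysconfdir) / "libosinfo" / "db",
        home / ".config" / "libosinfo" / "db",
    ]


def user_db_file(name, home=None):
    """Create the per-user DB directory if needed and return a path in it.

    The 'xml' extension is appended when missing, since files without it
    would not be picked up.
    """
    if not name.endswith(".xml"):
        name += ".xml"
    user_dir = default_db_dirs(home=home)[-1]
    user_dir.mkdir(parents=True, exist_ok=True)
    return user_dir / name
```

Calling user_db_file("my-os") would give you a path such as ~/.config/libosinfo/db/my-os.xml, ready for you to write your XML into.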

The schema of these XML files is pretty straightforward, so just looking at the existing XML files under data/devices and data/oses in the libosinfo source tree will tell you almost everything you need to know about it.

Adding a new device


Before you do that, you'll need to gather the following data about the device in question:
  • Type: Qemu or virtio. If it's not the latter, it's the former.
  • Bus type: usually USB or PCI.
  • Class: video, audio, block, input, net, watchdog, filesystem and memory.balloon are the currently recognised values.
  • Vendor name and ID
  • Device name and ID
You can find the last two in http://www.linux-usb.org/usb.ids or http://pciids.sourceforge.net/v2.2/pci.ids, depending on which bus type the device uses. Once you have all this information, you simply add an entry to either your custom XML file or the appropriate file under data/devices/ in the libosinfo repository, like this (I failed to find a way to embed raw XML here, so I converted it into an approximation of what the XML should look like):

  (device id="http://pciids.sourceforge.net/v2.2/pci.ids/10ec/8029")
    (name)ne2k_pci(/name)
    (bus-type)pci(/bus-type)
    (class)net(/class)
    (vendor)Realtek Semiconductor Co., Ltd.(/vendor)
    (vendor-id)10ec(/vendor-id)
    (device)RTL-8029(AS)(/device)
    (device-id)8029(/device-id)
  (/device)

The 'id' is created simply by combining the URL of the appropriate ID database (one of the URLs I mentioned above) with the vendor and device IDs.
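If you would rather generate such an entry than type it by hand, something like the following Python sketch could do it. The function names and the ID_DATABASES table are made up for this example. The USB form of the 'id' is my assumption by analogy with the PCI example above.

```python
import xml.etree.ElementTree as ET

# URLs of the ID databases mentioned above, keyed by bus type.
ID_DATABASES = {
    "pci": "http://pciids.sourceforge.net/v2.2/pci.ids",
    "usb": "http://www.linux-usb.org/usb.ids",
}


def make_device_id(bus_type, vendor_id, device_id):
    """Combine the ID database URL with the vendor and device IDs."""
    return f"{ID_DATABASES[bus_type]}/{vendor_id}/{device_id}"


def make_device_element(name, bus_type, dev_class, vendor, vendor_id,
                        device, device_id):
    """Build a 'device' element with the same children as the entry above."""
    elem = ET.Element("device", id=make_device_id(bus_type, vendor_id, device_id))
    for tag, text in [
        ("name", name),
        ("bus-type", bus_type),
        ("class", dev_class),
        ("vendor", vendor),
        ("vendor-id", vendor_id),
        ("device", device),
        ("device-id", device_id),
    ]:
        ET.SubElement(elem, tag).text = text
    return elem


if __name__ == "__main__":
    ne2k = make_device_element(
        "ne2k_pci", "pci", "net",
        "Realtek Semiconductor Co., Ltd.", "10ec",
        "RTL-8029(AS)", "8029",
    )
    # Prints the entry as real XML, with angle brackets.
    print(ET.tostring(ne2k, encoding="unicode"))
```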

Adding a new OS


This one is better explained by showing you some real examples:

  (os id="http://fedoraproject.org/fedora/17")
    (short-id)fedora17(/short-id)
    (name)Fedora 17(/name)
    (version)17(/version)
    (vendor)Fedora Project(/vendor)
    (family)linux(/family)
    (distro)fedora(/distro)
    (codename)Beefy Miracle(/codename)
    (upgrades id="http://fedoraproject.org/fedora/16"/)
    (derives-from id="http://fedoraproject.org/fedora/16"/)
  (/os)

The 'id' here could really be anything you like, but if you are adding a new variant or version of an OS to an existing file of the appropriate family, it's good to follow the conventions already used in that file. The same goes for 'short-id'. The 'upgrades' and 'derives-from' entries are optional. While the former is not really used for anything useful yet, the latter is meant to avoid some duplication.

The most common example of such duplication is the list of devices supported out of the box by the OS in question. Notice that we didn't list any devices in the example above. The reason is not that Fedora 17 doesn't support any devices, but rather that it inherits all device support from its parent and grandparents. To list devices supported by the OS, you add simple entries like this:

  (os id="http://fedoraproject.org/fedora/17")
    ..
    (devices)
      (device id="http://pciids.sourceforge.net/v2.2/pci.ids/1b36/0100"/) (!-- QXL --)
      (device id="http://pciids.sourceforge.net/v2.2/pci.ids/8086/2415"/) (!-- AC97 --)
    (/devices)
  (/os)

In this case, each 'id' attribute must match the ID of either an existing device in libosinfo's default database or a device you have added to your custom database. If your OS supports the devices listed above, for example, and you don't list them here (or under any parent OS), applications like Boxes might not add these devices to the virtual machines they create, and you'll end up with very crappy graphics and no sound in VMs created for the OS in question.
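To make the inheritance idea a bit more concrete, here is a toy Python model of how an application could collect devices by walking the 'derives-from' chain. The dictionary layout and the function name are invented for this sketch, and placing the two devices on Fedora 16 is purely hypothetical data. libosinfo's real lookup may work differently.

```python
def supported_devices(os_id, db):
    """Collect device IDs for os_id and all OSes it derives from."""
    devices = []
    seen = set()  # guards against accidental derives-from loops
    while os_id is not None and os_id not in seen:
        seen.add(os_id)
        entry = db[os_id]
        for dev in entry.get("devices", []):
            if dev not in devices:
                devices.append(dev)
        os_id = entry.get("derives-from")
    return devices


# Hypothetical data: Fedora 17 lists no devices of its own.
DB = {
    "http://fedoraproject.org/fedora/16": {
        "devices": [
            "http://pciids.sourceforge.net/v2.2/pci.ids/1b36/0100",  # QXL
            "http://pciids.sourceforge.net/v2.2/pci.ids/8086/2415",  # AC97
        ],
    },
    "http://fedoraproject.org/fedora/17": {
        "derives-from": "http://fedoraproject.org/fedora/16",
    },
}

if __name__ == "__main__":
    for dev in supported_devices("http://fedoraproject.org/fedora/17", DB):
        print(dev)
```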

Another important piece of information is resource requirements and recommendations. It's rather straightforward as well:

  (os id="http://fedoraproject.org/fedora/17")
    ..
    (resources arch="all")
      (minimum)
        (n-cpus)1(/n-cpus)
        (ram)671088640(/ram)
        (storage)94371840(/storage)
      (/minimum)

      (recommended)
        (cpu)4000000000(/cpu)
        (ram)1207959552(/ram)
        (storage)9663676416(/storage)
      (/recommended)
    (/resources)
  (/os)

The 'arch' attribute is usually just 'all', unless the OS in question has different requirements or recommendations for different architectures. The unit for cpu is Hz; the unit for both ram and storage is bytes.
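Since raw byte counts are not easy to read, here is a small Python helper (my own, not part of libosinfo) that converts between bytes and binary units, applied to the values from the example above.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def mib(n):
    """Return n MiB expressed in bytes, for writing into the XML."""
    return n * MIB


def human_bytes(n):
    """Format a byte count as whole GiB when possible, otherwise as MiB."""
    if n % GIB == 0:
        return f"{n // GIB} GiB"
    return f"{n / MIB:g} MiB"


if __name__ == "__main__":
    print("minimum ram:        ", human_bytes(671088640))
    print("minimum storage:    ", human_bytes(94371840))
    print("recommended ram:    ", human_bytes(1207959552))
    print("recommended storage:", human_bytes(9663676416))
    print("recommended cpu:    ", 4000000000 / 1e9, "GHz")
```

So the example asks for at least 640 MiB of RAM and recommends 1152 MiB of RAM and 9 GiB of storage.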

One last piece of information you will really want to add is about the installation and live media. While in the future we might use it for things like presenting downloadable OSs in Boxes (and other apps), for now we use this information mainly to detect the OS (along with other properties) from a given medium (ISO, USB stick or CD-ROM). Here is what that looks like:

  (os id="http://fedoraproject.org/fedora/17")
    ..
    (media arch="x86_64")
      (url)http://download.fedoraproject.org/pub/fedora/linux/releases/16/Fedora/x86_64/iso/Fedora-16-x86_64-DVD.iso(/url)
      (iso)
        (volume-id)Fedora 16 x86_64 (DVD|Disc)(/volume-id)
        (system-id)LINUX(/system-id)
      (/iso)
      (kernel)isolinux/vmlinuz(/kernel)
      (initrd)isolinux/initrd.img(/initrd)
    (/media)

    (media arch="i686" live="true")
      (url)http://download.fedoraproject.org/pub/fedora/linux/releases/16/Live/i686/Fedora-16-i686-Live-Desktop.iso(/url)
      (iso)
        (volume-id)Fedora-16-i686-Live(-KDE)?(/volume-id)
        (system-id)LINUX(/system-id)
      (/iso)
      (kernel)isolinux/vmlinuz0(/kernel)
      (initrd)isolinux/initrd0.img(/initrd)
    (/media)
  (/os)

The 'live' attribute marks (as you guessed) media that can simply be booted so the user can try the OS without having to install it first. If the media in question does not provide an installer at all, you should explicitly set the 'installer' attribute to 'false'.

The data under the 'iso' element is what enables us to detect the media. You can get this information from a medium using the `isoinfo -d -i /path/to/iso/or/devicenode` command. I should make it clear at this point that the values of the 'volume-id' and 'system-id' nodes are not exact copies of the actual volume and system IDs but rather regular expressions that match them.
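Before adding a pattern, you can check that it matches the IDs of your media. The Python sketch below pulls the two IDs out of isoinfo output and tries the patterns against them. Python's re module is only a stand-in here: libosinfo uses its own regular expression engine, and matching from the start of the string is my assumption. The sample text is an abbreviated, hand-written illustration rather than real captured output.

```python
import re


def parse_isoinfo(text):
    """Extract the 'Volume id' and 'System id' fields from isoinfo -d output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Volume id", "System id"):
            fields[key.strip()] = value.strip()
    return fields


def matches_media(volume_pattern, system_pattern, fields):
    """Return True if both patterns match the corresponding IDs."""
    return (
        re.match(volume_pattern, fields.get("Volume id", "")) is not None
        and re.match(system_pattern, fields.get("System id", "")) is not None
    )


# Abbreviated, hand-written sample for illustration only.
SAMPLE = """\
CD-ROM is in ISO 9660 format
System id: LINUX
Volume id: Fedora 16 x86_64 DVD
"""

if __name__ == "__main__":
    fields = parse_isoinfo(SAMPLE)
    print(matches_media("Fedora 16 x86_64 (DVD|Disc)", "LINUX", fields))
```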

If you are adding this information to libosinfo's default database and hope to contribute it upstream, we'd very much like you to add it to our tests as well (you don't want us to break support for your favourite OS at some point, do you?). It's very easy: you just put the output of the isoinfo command I mentioned into a file named $FILENAME_OF_YOUR_ISO.txt under test/isodata/$DISTRO/$SHORT_ID_OF_OS/ in the source directory.
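If you have several ISOs to record, a short script can save some typing. This Python sketch assumes isoinfo is installed and on your PATH, and it reads $FILENAME_OF_YOUR_ISO as the full file name including the '.iso' part, which is my interpretation. The function names are made up for the example.

```python
import subprocess
from pathlib import Path


def isodata_path(source_dir, distro, short_id, iso_path):
    """Return where the test data for this ISO should go in the source tree."""
    filename = Path(iso_path).name + ".txt"
    return Path(source_dir) / "test" / "isodata" / distro / short_id / filename


def record_isoinfo(source_dir, distro, short_id, iso_path):
    """Run isoinfo on the ISO and save its output as test data."""
    result = subprocess.run(
        ["isoinfo", "-d", "-i", str(iso_path)],
        check=True, capture_output=True, text=True,
    )
    dest = isodata_path(source_dir, distro, short_id, iso_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(result.stdout)
    return dest
```

For example, record_isoinfo(".", "fedora", "fedora17", "/path/to/your.iso") run from the top of the libosinfo source tree would write test/isodata/fedora/fedora17/your.iso.txt.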

As you probably guessed, 'kernel' and 'initrd' are completely optional and you only need to specify them for Linux-based operating systems. If you are adding information about a proprietary OS, you will probably also need to leave out the 'url' element.

That's it! Happy hacking!

6 comments:

cesar said...

Hi:

The syntax looks very similar to S-expressions but with the verbosity of XML. Is there a reason why you do:

(os) ... (/os)

instead of

(os
...
)
?

zeenix said...

cesar,

I knew someone would point that out. :) The reason is that I was already frustrated with the inability to put verbatim XML here and didn't want to waste more time on it, so I mostly just filtered all the XML I pasted through `sed -e 's/</(/g' -e 's/>/)/g'`.

Anonymous said...

Hmm, so, one question: what's the point of maintaining all of this data in some centralized database? I mean, I see why this is done for Windows and other proprietary OSes, but for the Linuxes at least it should be possible to get everybody onboard to include descriptive information about the OS in the ISO images themselves? Did you guys try to define a spec for this and convince the various distros to implement it in their ISOs?

zeenix said...

Anonymous,

Since this is not going to work for proprietary OSs (as you said yourself) and given the amount of work (not to mention fights) involved in the alternative you are asking for, I must ask what is wrong with maintaining a centralized database?

Anyways, we are more than happy to drop most of our data and help you in any way we can if you could kickstart this ambitious (IMO) project.

jeremy said...

i think this may be usefull, i dont have time to add any myself. probably best someone scripts something for this anyway.
but here is some really usefull information:

http://dcos.net/projects/FOSS-TREE--ISO-PVDs--dcos.net-private-archive-nov-2013.text

ps. i dont use boxes. i use VMM.

jeremy anderson said...

here is some really usefull information, it is probably best that someone does some scripting to automate any future use of this file. but here is a project i have been working on:


http://dcos.net/projects/FOSS-TREE--ISO-PVDs--dcos.net-private-archive-nov-2013.text

ps. i dont use boxes. i use VMM.