-=( In Between )=-

Scholarly Online Publishing, Open Access and Library Related Technology

In between has moved

 
By ellermann at Mon, 2009-11-16 00:50 | general

Because I will move again, and will get a new job at Tilburg University as Head of the unit Academic Services, this weblog will move too. I have reserved a new domain (who knows when I'll move again) for the weblog. For all further posts see:

http://henkellermann.nl/inbetween

Hope to see you back there.

 

What do we want to find?

 
By ellermann at Tue, 2009-05-12 14:06 | general

With the recent news, if not hype, about WolframAlpha - the system that will answer questions - comes the question of whether the search "paradigm" that permeates the library world is valid.

Search, for us librarians, is more often than not about the retrieval of relevant documents. But search is a result of a question that needs an answer. If I want to know when Dewey was born, do I really want documents that contain this answer? First of all I just want the answer and only after that I may need one document for reference purposes.

Question answering might be a better paradigm. The sort of answer one gets may depend on the question and the circumstances. I could get a factoid answer, a list of answers, or something else (a list of documents say that are somewhat relevant). So there is a lot to be done to identify questions, to analyze content, etc.

Search in the library field may have become too narrow a concept. Question answering might give a better perspective on what we need to do. And if we take that perspective, boy, we sure need to work differently. We need to know about language parsing, about formalizing relations between concepts (not simply terms), about the pragmatics of language, about ontologies and the semantic web. We need to learn a lot of things we know very little about, yet.

 

Open access or cheaper access?

 
By ellermann at Fri, 2009-04-03 15:25 | open access

Ewing fulminates against open access. Access has never been better, he says. The real problem with journals is their price, not their limited access. And open access will lead to worse articles, he claims, and not without reason. This article is surely worth a read.

One argument he makes is that in an open access (author pays) model it is the publishers and the authors that determine what is published. In the current model libraries (and their clients) determine which journals have "intellectual value". The latter prevents vanity publishing, while the open access model doesn't. As long as the author is willing to pay, he will get published.

I am a bit surprised that Ewing seems to trust only financial punishments to prevent bad science from getting published. However, the status of the journal is, for many years already, a very important element in any evaluation of the quality of the authors. Rankings dominate the field. In fact, I think, it is their rankings that determine the price of a journal, and not the other way around, as Ewing seems to suggest.

So, whenever the day is born on which all journals are open access, there will still be lots of factors that make vanity publishing a futile exercise. Those journals, whether freely available or not, that maintain strict quality norms, will increase an author's reputation more than those journals that "just publish".

Money does not need to enter the picture here. Reputation is still the name of the game: as it should be.

Ewing raises more issues, be sure to read him and reflect on what he says. But all in all I dare say that he misses the point, the point that not money, but reputation (and ranking) really dominates the field. And that will not change in an open access publishing mode.

Ewing wants us to focus on the economics of publishing (re journals). Nah, we should simply increase their access, which is not optimal at all at the moment. We should also focus on good evaluations of journals, but universities can be trusted to take up that challenge, simply because they already do.

 

New Dutch Magazine about Digital Libraries

 
By ellermann at Tue, 2009-03-17 12:59 | general

There is a new magazine called "De digitale bibliotheek". They have a web presence too.

It's a magazine, with a forum, and stuff... A magazine you have to pay for. If you subscribe to enough magazines (that is, when you subscribe to more magazines), you will become a member of the Essentials Community and may expect to pay a little less when attending their workshops and masterclasses.

So, why this magazine? Well, the usual: to bring people together, to inspire and motivate them. The magazine wants conversation, because (quoting someone) "conversation is the platform today".

This is not a magazine with a concrete mission and it has no focus in terms of content (conversation is not a focus!); it just wants to give digital librarians a platform to write for and read in. What about? Well, about whatever is new, I guess. News we can all find on the web already, thanks to weblogs, mail, magazines that are open access (like D-Lib), twitter, etc.

Will it get us working on a national digital library? Will it give us in depth detail on interoperability? Will it share code? Will it focus on the architecture of digital library systems? Will it help us think on the uses and functions of digital libraries? I guess not.

My prediction for this magazine therefore: A glossy with lots of little blocks, lots of pictures of people who have said this or that on God knows what, as long as the term digital library is part of it, interviews, awards, CV's, opinions, workshop announcements and reports... Well, stuff like that. This is social networking implemented in an old fashioned technology: the magazine that costs you money.

No, we really don't need another general magazine. We may need more magazines with a strong focus, a bit like D-Lib and Ariadne, we may need more technical journals, more theoretical journals, more practical journals: journals by and for people working with and thinking about digital libraries and magazines with some form of quality control (peer reviews for instance).

I'll keep an eye on this magazine and I might change my opinion some time in the future, but, having said that, I am not going to spend too much of my time on reading it, and certainly no money.

 

Open Access Again

 
By ellermann at Mon, 2009-02-23 21:10 | open access



Evans and Reimer published a paper in Science on the effects of Open Access entitled Open Access and Participation in Science. They analyse citations to articles in journals indexed by Thomson Scientific’s Science, Social Science, and Humanities Citation Indexes (CI). These include articles and associated citations from the 8253 most highly cited journals (going back to 1945).

The most important feature of this article is, the authors claim, the use of more extensive citation data than previous research did. I think they are right.

Before I present a few of their main results, I would like to make some comments that, IMHO, limit the generality of their results.

First:

It is citations they analyse and so the phrase "participation in science" is operationalized as "citing a paper". Nothing wrong with that of course, but we should note that citations are not a perfect measure of the use of articles. Citations in journals, papers, proceedings, books and so on that are not indexed in CI (and many, many scholarly and scientific outlets are not indexed in CI) are simply not counted. The real impact of an article is broader than citations in a limited number of journals, however important those journals are, but no dean will compliment me for this observation. I am still looking for a study that operationalizes impact in a more satisfactory manner.

Second:

The analysis is based on 8253 journals. According to UlrichsWeb there are about 29000 online journals out there, 17000 of which are peer reviewed. I can't figure out whether the journals analysed by Evans and Reimer are all peer reviewed, but I tend to assume they all are. In any case, the percentage of journals analysed in this paper is between 28 and 48 percent of all relevant online journals. The journals analysed are the most cited ones, I will grant that, but still, we do have to recognize the fact that many citations are simply not counted.
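The 28 to 48 percent range follows directly from the figures quoted above. A quick check in Python, using only the numbers from this post (the UlrichsWeb counts are the approximate ones given here, and the variable names are just for illustration):

```python
# Figures quoted in the post; the UlrichsWeb counts are approximate.
journals_analysed = 8253
online_journals = 29000
peer_reviewed_online_journals = 17000


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Lower bound: compare against all online journals.
low = share(journals_analysed, online_journals)
# Upper bound: compare against peer reviewed online journals only.
high = share(journals_analysed, peer_reviewed_online_journals)

print(f"{low:.1f}% of all online journals")
print(f"{high:.1f}% of peer reviewed online journals")
```

Which of the two denominators is the right one depends on whether all 8253 journals are indeed peer reviewed, which is exactly the point I could not settle.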

Nevertheless, accepting the limitation of the analysis to papers in the CI index that are cited in papers in the CI index, Evans and Reimer make a few important observations.

First, there is no OA effect (no rise in citation frequencies because of being available in Open Access) for three disciplines: chemistry, physical sciences and social sciences. This is hardly surprising because, as Evans and Reimer point out too, Open Access is more or less the norm here. Indeed, for most papers a version can be obtained for free. It would be reasonable to exclude these zero effects from a real OA effect in disciplines where it still matters (that is, where it is not the norm), thereby, probably, increasing the overall 8 percent OA effect reported. (It is a necessary consequence of the fight for Open Access that the OA effect will disappear, obviously.)

Second, high OA effects are found for multidisciplinary journals (around 20 %). This accords with many previous findings. For a librarian, who wants to tailor his collection to the local interests, this is no surprise at all. With the growing importance of multidisciplinary research, this part of the overall OA effect should be taken as a very important argument for Open Access. Alas, an analysis of cross-discipline citations has not been performed by Evans and Reimer, and this I do consider an omission. My conjecture here is that the OA effect would be high for cross-discipline citations; well, let's be wild, and claim that it will be even higher than it is for multidisciplinary journals.

Third, absolutely marvellous is the attention Evans and Reimer give to the OA effect for authors in developing countries. For poor countries the OA effect may, in general, reach heights of 30 percent. They present a world map of OA effects and the results are almost painful: only Europe, North America and Australia show a relatively (sic!) low OA effect.

So, despite a few omissions and a few criticisms, I think this paper demonstrates the importance of OA quite forcefully. Multidisciplinary research benefits greatly from Open Access. For developing countries Open Access is nothing but a blessing. And for the rest it is very nice at the very least.

Given all this, it is saddening to see that a national Dutch newspaper, De Volkskrant, presents these results under the header: "Free journals only lead to a few extra citations" (from: Volkskrant science supplement, Saturday 21 February 2009: "Gratis tijdschrift geeft weinig meer citaten").

Yuck.

 

One reason why libraries will become obsolete

 
By ellermann at Thu, 2009-02-12 14:45 | general

There is a nice post by Will Sherman: 33 reasons why libraries and librarians will be important, even in the digital age. A large number of these reasons concern the relevance and quality of the materials hosted by a library and the expertise librarians have, for instance for describing books (etc.) adequately.

That is all fine. Will Sherman, however, seems to assume that all these different functions (re the management of information) are performed by libraries and librarians and that they will keep performing them. I tend to disagree.

No one doubts the importance of information management, but the tools and the workflows and the organisations that (should) exist to manage information require competences that are not easily found within library walls. Text mining, information retrieval, the management of peer review, publishing, the semantic web, ontologies, the architecture of the web, theories of classification, automatic term extraction, web archiving, the identification of authors, documents and institutes, protocols for information exchange, digital rights management, etc., etc., will be of growing importance, and the skills needed for these tasks are at the moment not sufficiently available within libraries.

So the conclusion is:

Libraries "as we knew them" will become obsolete if they want to stay libraries. The same goes for librarians.

QED

 

Collections in the Digital Age?

 
By ellermann at Tue, 2009-02-03 07:38 | amazon

Mary Frances Casserly is one of the authors who has thought about the meaning of a collection in the digital age (Casserly, M.F. (2002). Developing a Concept of Collection for the Digital Age. Libraries and the Academy 2.4, 577-587). The article is relatively old, but that's ok.

One of the problems one faces is finding a metaphor to describe a collection that for a large part consists of resources available on the internet. She mentions a few (citing others), like interface, logical gateway, information commons, gateway library or even information population.

The main idea, rather obviously, seems to be that there is a huge collection of information on the internet but that the collection (the one deemed relevant for... well whatever) is a subset that needs to be picked from the total set of available online resources.

I find it quite remarkable that the new collection is seen as the result of a process of picking elements, a process similar to finding shells on a beach. The delivery of new resources is, as a process, set apart from setting up a collection. It is the sea that brings us new shells, and the sea is a mystery.

What if we expand the notion of a collection in such a way that the sea becomes part of it? The main issue with any sensible collection is quality control. We don't want ugly things in our collections. But documents become fluid, and this surely is the case in the digital age: there are many versions of one document, and documents show up as movies, datasets and the like. When it becomes hard to judge such a huge variety of documents with respect to their quality, it might be a good idea to refocus quality control: away from the documents, towards the people that add documents. Qualified people can add documents.

Then a collection is not a simple store of documents anymore, but a rather complex system of interrelated documents, controlled by a selected group of people.

Librarians "just" need to make the system searchable.

Well, I don't know, really...

 

Rankings and Repositories

 
By ellermann at Tue, 2009-01-27 21:25 | repository

Our dissertations repository (here or here) made a huge jump on the webometrics rankings. Yesterday our position on the list was near 300. Today we are on position 14 or 23, depending on where you look (here and here).

All in all something to rejoice in.

Or not? Well yes, of course, but there are some issues too.

The reason we made the jump is that one of my colleagues, Wim Braakman, contacted webometrics to ask for details about their harvesting procedures. I am not going into the details here; let us just say that the webometrics procedures are a little strange: they cannot handle all regular harvest addresses. Redirects, for instance, are a problem. Anyway, Wim Braakman read the specifications, contacted them quite often (he was persistent), and in the end webometrics was provided with a URL they could handle, but which does not capture all our content.

The current ranking is only based on our dissertations repository. Without considerable re-engineering of the domain names in our repositories and aggregators all the other documents we have collected are not counted by webometrics. Indeed, not even half of our total number of documents are counted, so our real position should be considerably higher.

Yes, we are glad we have moved up in this ranking. No, we are not happy with the rules and regulations that webometrics uses. If we only had one repository, as most institutions seem to have, it would be no problem. But we work with a large number of specialized repositories, adapted to the needs of the users, with the sad consequence that we are not fully harvestable, at least not by webometrics. No, not even our aggregators will do the job. We would need to build a special one for webometrics, and that we flatly refuse.

In short: webometrics should re-engineer their harvesting procedures because this is not entirely fair, not even to us.

 

A Paradox?

 
By ellermann at Mon, 2009-01-26 22:42 | general

Is this a paradox, or something I just don't understand, or something that simply isn't true?

The many digital libraries that have been developed by librarians can be characterized as closed systems. Making metadata and documents re-usable by third parties is often not even considered, and when it is considered, it is limited to a set of partner libraries, tightly controlled by contracts and lawyers. General re-use of software and data does not seem to be a primary concern. Yet, tons of standards have been developed to describe documents. In other words: the organization of knowledge (contained in documents) is heavily standardized (gazetteers, controlled vocabularies, metadata formats like MARC, MODS, perhaps even FRBR). There is some re-use of data, of course. Union catalogues are an example, but despite the immense efforts spent on standardization, the software built by institutional digital libraries seems to ignore re-use.

Yes, I am aware of OAI-PMH, of SRU/SRW, but these are relatively recent inventions, and I simply cannot find that much software developed within institutional libraries that freely offers such interfaces, nor data that can be transmitted through such interfaces.
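To give an idea of how little such an interface asks of a client, here is a minimal sketch of an OAI-PMH harvesting request built in Python. The endpoint is a made-up placeholder and the helper function is only an illustration; the `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; replace with a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Ask for all records in unqualified Dublin Core.
url = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A library that exposes its records this way lets anyone with a few lines of code re-use its metadata, which is precisely what I find so rarely.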

Software developed in the context of non-institutional digital libraries (LibraryThing comes to mind, but also bookmarking sites like del.icio.us and perhaps OAIster) does offer such interfaces, but often does not use the best of the standardized knowledge organisation schemes that library science has to offer.

I find this puzzling. Is it simply because institutional digital libraries are focussed on the members of their own libraries and ignore the rest of the world? Is it just a matter of ownership and money?

 

5S

 
By ellermann at Wed, 2009-01-14 16:13 | general

A common complaint against people like me is that we just create our own localized digital libraries. A digital library, at least for those working within the confines (I did not say coffins) of a library, is an addendum to a normal library.

Rarely is software, however small, that was built by one group re-used by others. If, as the past few years have shown, it is almost impossible to cooperate on developing any piece of digital library software, we at least would need a framework that can guide software development to make it possible that the software is re-used. Very few attempts have been made to define such frameworks. There are a few however. One seems to be the DELOS framework, which I haven't studied yet, another is the 5S model, one that I am studying now.

The 5S model decomposes the problem of making a digital library into 5 components: Streams, Structures, Spaces, Scenarios, and Societies, all words starting with an S, in case you failed to notice.

Streams are basically information resources on the Internet. Text files, streaming video, you name it, they are all streams. Structures are structures within the streams. Perhaps the prime example is an XML document, where the XML explicates the structure of the stream. Other structuring principles are possible of course. Spaces are the operations one can perform on the (structured) streams. Operations can vary from indexing to defining an ontology. Scenarios are where the user enters the scene. A scenario is a set of operations (state-transitions) that a user performs, or can perform, while using a digital library. Societies, finally, are groups of users with differing information needs. Societies and scenarios can be seen as an explication of user centered design (I surmise).

What is really good about 5S is that it does not stop at inherently vague descriptions such as given in the previous paragraph. Streams, structures, spaces, scenarios and societies are defined in terms of set theory and relational algebra. All definitions are formal. The advantage of this is that such definitions can guide implementation. For example, descriptive metadata is defined in graph theoretical terms, clearly suggesting ways to formalize metadata. If these basic structures are defined formally, exchange of both data and implementations should become easier. They could even hint, well more than hint, at how to define protocols for information exchange.
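To illustrate why a graph-theoretical definition helps implementers, here is a small Python sketch that stores descriptive metadata as a labelled directed graph and answers a question by walking its edges. This only illustrates the general idea; it does not reproduce the formal 5S definitions, and all identifiers, labels and function names are invented.

```python
# A labelled directed graph stored as a set of (node, label, node) edges.
# The identifiers and labels below are invented for illustration only.
metadata_graph = {
    ("doc:1", "title", "An Example Thesis"),
    ("doc:1", "creator", "person:1"),
    ("person:1", "name", "A. Author"),
}


def values(graph, node, label):
    """Return all nodes reachable from `node` over an edge with `label`."""
    return sorted(
        target
        for source, edge_label, target in graph
        if source == node and edge_label == label
    )


def creator_names(graph, document):
    """Follow creator edges, then name edges: a two-step walk in the graph."""
    names = []
    for person in values(graph, document, "creator"):
        names.extend(values(graph, person, "name"))
    return names


print(creator_names(metadata_graph, "doc:1"))
```

Once two systems agree that metadata is such a graph, exchanging it comes down to exchanging the edge set.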

I wonder if there are environments in which 5S is already used. I am still exploring the model. It seems hard to get information on the practicalities of the 5S model. Nevertheless, theoretically it seems not only sound, but very attractive too. I like pretty... uh... work.

 

PurpleSearch launched

 
By ellermann at Mon, 2009-01-12 21:47 | amazon

Today we officially released the first beta version of PurpleSearch, software for federated search developed by people from my department, the digital library department of the University of Groningen. André Keyzer in particular is responsible for the design; Bart Alewijns has been the main programmer of the system. It is a beta version. Also, a number of features that were present in its predecessor Livetrix were dropped, or are given a less prominent place, because the user interface had to be as simple as possible. PurpleSearch also offers a few web services, in particular a recommendation function that returns an "educated" guess consisting of a number of databases that might be relevant for a query. PurpleSearch is a learning system in that it stores all queries ever entered by users and determines which databases return a significant number of hits given the query.
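How the learning part works internally is not described here, but the general idea can be sketched. The following toy model in Python is an illustration under assumptions, not the PurpleSearch implementation: the class name, the `min_hits` threshold and the per-term scoring are all hypothetical.

```python
from collections import defaultdict


class DatabaseRecommender:
    """Toy model: remember hit counts per query term and database."""

    def __init__(self, min_hits=10):
        # Assumed threshold for what counts as a "significant" number of hits.
        self.min_hits = min_hits
        # term -> database -> total hits seen so far
        self.hits = defaultdict(lambda: defaultdict(int))

    def record(self, query, database, hit_count):
        """Store the outcome of one search: a query, a database, its hits."""
        for term in query.lower().split():
            self.hits[term][database] += hit_count

    def recommend(self, query, limit=3):
        """Return databases that scored well for the terms in this query."""
        scores = defaultdict(int)
        for term in query.lower().split():
            for database, count in self.hits.get(term, {}).items():
                scores[database] += count
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [db for db, score in ranked if score >= self.min_hits][:limit]


rec = DatabaseRecommender(min_hits=10)
rec.record("open access citations", "Database A", 120)
rec.record("open access", "Database B", 40)
rec.record("citations", "Database C", 3)
print(rec.recommend("open access"))
print(rec.recommend("citations"))
```

Database C never makes it into a recommendation here because its three hits stay below the assumed threshold.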

The following text, taken from the PurpleSearch helpfile, describes the system.


PurpleSearch enables simultaneous search in the most important scientific and scholarly databases. It is an interface that eases and enriches federated search.

It eases this method of searching by not requiring manual selection of the databases to search in. PurpleSearch learns, over time, what each database contains and will give good results for any given search query. PurpleSearch combines smart search techniques, local indexing, and using that index for each new search. As such, presented results are those from a search in the best scoring databases for a query. It is also possible to do targeted searches within different databases.

Among other things it chooses databases that are likely to give results for any given query. As this does not always pick the most important databases for the intended subject area, you may use the subject guide to start searching in the most important databases, or choose the databases you want to search in manually.

It allows catalogue searching for books and other physical resources, and will lead researchers to electronic full-text articles when we have a relevant subscription.

A number of festivities are organized to promote the use of PurpleSearch within the University of Groningen.

It is a nice day. :)

 

Documents of the Future

 
By ellermann at Wed, 2009-01-07 15:17 | digitization | general | metadata

When we, librarians, deal with documents we add metadata. The metadata are used in a separately developed interface to give people a search and/or browsing interface to find and locate documents.

In this mode of thinking the documents on the Internet are passive objects. The metadata associated with those documents are passive too. Software is written to use the data and present it. All the action is in the extra software.

Another approach that has been suggested is to turn documents into code. Or, perhaps better, to encapsulate documents in code that, when it is sent a proper request, will return a view on the document. Documents become active. So instead of just sending a URL to a server that will return a PDF document, a number of different requests can be sent, and different results will be returned.

Examples of requests that could be sent are (this is just to give an idea):

getDefaultPresentation: returns the document in a default format (which can be PDF, PPT, Word, TeX, what have you).

getPresentation PDF: returns a PDF version of the document (if available).

getIndex: returns an Index of words used (many parameters are possible, for instance a stemming method can be selected).

getAuthor: returns the list of authors.

getKeywords: returns a list of keywords. The request can be parametrized, for instance when a controlled vocabulary is used.

getReferences: returns a list of citations used in the document.

getType: is it a book? an article? etc.

getRights: may return a creative commons license indicating if, and how, the document can be re-used.

getDC: returns metadata in DC format.

getMarc: returns metadata in MARC format.

Problems may arise when information is added to the document after its first publication. Social tags for instance should also be associated with the document and should be retrievable. The code therefore should also accept requests that add data to it. The question of how this has to be implemented is not a trivial one. It might make sense to encapsulate the document in different code sets, not all of which have to be maintained (developed) by the original publisher. So company X will add, as an added value, social tagging information to a document published by company Y. Encapsulations can be layered.
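To make the idea a bit more concrete, here is a minimal sketch in Python of a document that answers requests, plus a second layer that adds social tags. The class names, the `handle` method and the `addTag` and `getTags` requests are invented for illustration; nothing here is an existing protocol, and only a few of the requests listed above are covered.

```python
class ActiveDocument:
    """A document wrapped in code that answers requests about itself."""

    def __init__(self, presentations, default_format, authors, doc_type):
        self._presentations = presentations  # format name -> content
        self._default_format = default_format
        self._authors = list(authors)
        self._doc_type = doc_type

    def handle(self, request, *args):
        """Dispatch a named request to the code that can answer it."""
        handlers = {
            "getDefaultPresentation":
                lambda: self._presentations[self._default_format],
            # Returns None when the requested format is not available.
            "getPresentation": lambda fmt: self._presentations.get(fmt),
            "getAuthor": lambda: list(self._authors),
            "getType": lambda: self._doc_type,
        }
        if request not in handlers:
            raise ValueError(f"unsupported request: {request}")
        return handlers[request](*args)


class TaggingLayer:
    """A second encapsulation, possibly run by another party, adding tags."""

    def __init__(self, inner):
        self._inner = inner
        self._tags = []

    def handle(self, request, *args):
        if request == "addTag":
            self._tags.append(args[0])
            return None
        if request == "getTags":
            return list(self._tags)
        # Everything else is delegated to the wrapped layer.
        return self._inner.handle(request, *args)


doc = ActiveDocument(
    {"PDF": b"pdf bytes", "TeX": b"tex source"}, "PDF", ["A. Author"], "article"
)
tagged = TaggingLayer(doc)
tagged.handle("addTag", "digital libraries")
print(tagged.handle("getAuthor"))
print(tagged.handle("getTags"))
print(tagged.handle("getPresentation", "PPT"))
```

The layering is the interesting part: the tagging layer knows nothing about the document's internals, it only speaks the same request interface.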

There are many problems to be solved when the latter approach is followed. The main ones are the definition of the interface to the document (in the form of a protocol, or document API), and the findability of the different encapsulations.

But would it not be great if all documents became active documents?

 

2009: The Year of...?

 
By ellermann at Thu, 2009-01-01 11:38 | general

2008 was probably the year of web 2.0/library 2.0. It has brought us lots of new tools and toys to play with, and lots of new evangelists (to toy with). It has also brought us a vision of the information ecology of the future. In this vision, open computation, open access and collective intelligence play an important role.

And 2009?

I hope I am wrong, but.....

..... 2009 will be the year when all the major university libraries will retreat from web 2.0 and will not give their employees the time to toy with new tools. The smell of "being fed up with 2.0 and their evangelists" permeates our buildings. There is a financial crisis, there will be budget cuts. So: there will be a massive generalized skepticism about all new information technologies, including open computation, including collective intelligence. If new technologies are introduced, they will be based on notions librarians are comfortable with, like the extremely silly developments around FRBR. A retreat therefore from the unknown to the safe and comfortable.

2009 will be the year for conservatives, for those who see the library as a service unit that needs a tight management. Technology should be outsourced, systems should be maintained, books should be bought, journals licensed, employees will be clocked; there will be less and less money. Lawyers will help the conservatives and invent new applications of copyright laws to hamper innovation.

2009 therefore will be a year in which libraries lose a decisive battle against companies that do have a vision on search and information-behavior and do have the talent and the time to develop new services, that do know about (text-driven) computing, about collective intelligence, about statistics. These companies will not include most of the current publishers and it will most likely not be Google either. The new companies will come, financial crisis or not - and they'll earn good money, for instance doing business with universities, which could actually be earnt by the libraries.

It will be a tough year for us innovative workers within the library, and I am a bit tired already.

We really should unite, and not let library walls determine what is united. And unite does not mean: set up a new platform to "chat a little".

 

Projects

 
By ellermann at Thu, 2008-10-30 08:22 | general

I have spent the last few days of my fine life on the horrible, horrible task of (co-)writing a project proposal to get some work financed which we'd like to do with OAI-ORE. I will talk about the contents of that project later. For now I'm just wondering whether there isn't a better way to get innovative work in the digital library funded. The proposals one has to write these days... yes, it has gone from bad to plain awful; the whole thing just shows organized distrust. Each activity has to be described meticulously, every hour spent has to be accounted for, as well as each person involved. I hope no one reads this before the proposal is judged, but this juggling with hours and names is somehow pure fiction. I know, people can't be trusted, and if public money is spent on work that my colleagues and I do, we definitely have to make explicit how it was spent, or will be spent. But is THIS really necessary?

I have no ready answer here, but please, please let some creative individual address this problem and save us from the mindless game of writing a project proposal, of filling in these dreadful forms. Why can't those who have the money simply watch what is going on, take note of measurable outputs a certain group has produced, both in terms of software developed or articles written, and just say: "hey, good work, here is some money, use it well"? This is asking too much, I know, but isn't there some middle ground between the bureaucracy of project-writing and the laissez faire attitude that I would love so much and that seems to be gone completely these days?

Big sigh... the work is not over yet, one more day to go...

 

The Boundary Problem

 
By ellermann at Sun, 2008-10-26 16:47 | library systems | OAI | persistence

It took me some time to realize, but what OAI-ORE seems to be really about is the boundary problem: The fact that groups of objects cannot be properly identified in the basic web architecture.

OAI-ORE allows one to identify aggregates. On top of that it offers means to describe the relations between the aggregated objects. It allows one to define boundaries between groups of objects.

Of course, any web page can contain a number of links to other pages and documents, but those links are not typed, meaning that it is hard to distinguish, say, navigational links from links to objects. OAI-ORE may be a way to solve not only the grouping problem (enhanced publications) but may give web archiving a great boost too. Now relatively complex software like httrack or heritrix is used to heuristically define relevant groups, but OAI-ORE's resource maps can give a good hint at what should be archived.
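A small sketch in Python may show what such a hint looks like. The resource map is written as plain (subject, predicate, object) triples; the `ore:describes` and `ore:aggregates` terms are from the OAI-ORE vocabulary and `dcterms:references` is a Dublin Core term, but all example.org addresses and the helper function are invented for illustration.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

# All example.org URIs are invented; only the vocabulary terms are real.
REM = "http://example.org/rem/1"          # the resource map
AGG = "http://example.org/aggregation/1"  # the aggregation it describes

triples = [
    (REM, ORE + "describes", AGG),
    (AGG, ORE + "aggregates", "http://example.org/thesis.pdf"),
    (AGG, ORE + "aggregates", "http://example.org/dataset.csv"),
    # A typed relation between two aggregated objects.
    ("http://example.org/thesis.pdf", DCTERMS + "references",
     "http://example.org/dataset.csv"),
]


def aggregated_resources(triples, resource_map):
    """Return the resources inside the aggregation a resource map describes."""
    aggregations = {
        obj for subj, pred, obj in triples
        if subj == resource_map and pred == ORE + "describes"
    }
    return sorted(
        obj for subj, pred, obj in triples
        if subj in aggregations and pred == ORE + "aggregates"
    )


# Untyped links as a crawler would find them on a web page.
links_on_page = [
    "http://example.org/home",
    "http://example.org/thesis.pdf",
    "http://example.org/contact",
    "http://example.org/dataset.csv",
]

aggregated = aggregated_resources(triples, REM)
# Keep only the links that the resource map marks as part of the group.
to_archive = [link for link in links_on_page if link in set(aggregated)]
print(to_archive)
```

The navigational links fall away without any heuristics, because the boundary of the group is stated explicitly.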

OAI-ORE also highlights an often underestimated problem. The transition from the normal library to a digital one needs to be based on descriptions of individual items (and not, as is common in the library field, on the expression or manifestation level) and these items need to be grouped.

Whether OAI-ORE solves all problems remains to be seen. One of the things that may need reworking is the flexibility of OAI-ORE resource maps. As far as I can see now, all the possible relations between documents in an aggregate need to be predefined. But I am not sure if this is flexible enough.

 

xISSN and WorldCat Search API

 
By ellermann at Thu, 2008-08-21 10:19 | search | Worldcat

OCLC has released two new web services for WorldCat: xISSN and an API to WorldCat. The xISSN service allows one to retrieve information about a journal. Alternative ISSNs are shown for the print, CD-ROM and microfilm versions of a journal, including information about the periods in which the different versions are or were available. A nice service that complements their xISBN service. The latter service was already put to use in our Livetrix software. The WorldCat API, in turn, lets you:


* Send searches in OpenSearch or SRU CQL syntax


* Receive OpenSearch responses in RSS or Atom format


* Receive SRU responses in MARC XML or Dublin Core


* Receive MARC XML content for a single OCLC record


* Receive geographically-sorted library holdings information
(each including the institution's name, location and a catalog link) within requests for single records


* Receive records in standard bibliographic citation formats (APA, Chicago, Harvard, MLA, and Turabian)

This indeed makes integration of WorldCat functionalities and information in your own website or application possible.

WorldCat can now rightfully be called an open system.

 

Web 2.0 is a Mindthing

 
By ellermann at Thu, 2008-08-21 08:26 | general

It has happened a few times now, and each time it annoys me. It may be nothing, but I am not sure. The thing is: I like brainstorming sessions. Not everyone does. That's fine. But a few of those who do not like that way of getting things moving rationalize their feelings by saying something like: "We first need an official policy statement from, say, the management, before we start thinking about topic X." Policy first, then brainstorming. Some people need a straitjacket before they will start to think. If there is any value in the Web 2.0 movement, it is the willingness to change and experiment and think! It is the willingness to hand over a vision to the management, instead of waiting for a policy.

 

Versions, again

 
By ellermann at Wed, 2008-07-30 09:54 | general

T Scott, in a comment to my previous post on versions of papers, drew my attention to a NISO report with the title Journal Article Versions (JAV): Recommendations of the NISO/ALPSP JAV Technical Working Group. I indeed had missed that one, so thanks to T Scott! :)

The Technical Committee responsible for the report has defined a vocabulary that can be used to describe the many versions of journal articles. They define things like Submitted Manuscript Under Review, Accepted Manuscript, Proof (being a version of an article that is still under scrutiny). The group focused solely on journal articles. The general idea seems to be to incorporate such terms in (a) metadataset associated with the document.

I have nothing to add to their work, although it will always be possible to make other, perhaps more refined, distinctions and to disagree on how to classify a given document. It seems to me they have done a fine job.

My comments don't concern the vocabulary itself, but the (implied?) procedures needed to create the relevant metadata. In fact, the comments just supplement the work done by this Technical Committee. It seems to follow a rather classical 'library paradigm', namely: define a metadataset that will cater for all possible variations, now and in the future, instruct a group of professionals to associate the proper metadata-elements with a given document, and leave it at that. I often called this one-shot metadatadization (because I love ugly words, but it sounds better in Dutch, anyway).

Predefined categories that describe the world as it is now, and as it will be in the future. That is ambitious. Such approaches often fail in practice, for rather obvious reasons. Let us reconsider the problem a bit.

Complex problems can often be addressed by decomposing them into a set of smaller problems. In fact, the problem addressed by the committee can be decomposed into a few smaller, still complex, problems. Let me propose one such decomposition:

First, each document that is put online is tagged with a really, really minimal metadataset. In fact, all that is needed is an identifier (I once suggested that such identifiers can even be computed from the document itself), a date (computable too, obviously), and a publisher (that can be a person, say the author, or a company, or a university, or... well, etc. Of course, identifying the publisher is the biggest problem here). In an ideal world, these three metadata-elements would be just numbers. Publishing means putting the document online and making sure the three numbers are valid and cannot be modified after the fact, ever. (Access to the document itself can, of course, be restricted.)

Second, and here we start to address the real issue, namely defining an infrastructure for publication: set up services that can, say, relate the numbers to names (of authors), to titles, to all the other metadata-elements that someone fancies (say, statements about quality), for whatever reason. One service could be to make known which documents are (still) available, and to describe the access policies. All this happens, in principle, after the fact of publication, but of course, it is not forbidden for one institute (a library?) to take care of both steps. The point is that both steps are logically separate, and that what is accomplished in step 1 is free for all to use as a foundation for 'real' services (except, perhaps, access to the document itself).

Third, set up services that relate the documents published in step one to each other. The work done by this Technical Committee is a third-step activity, but only halfway. Saying that a document is a proof does not in itself relate it to the version of record (i.e. the published paper). Of course, the Technical Committee has addressed the issue of relations, but solves it in a traditional manner, namely by incorporating in the metadataset a (typed) reference to other documents. These relations can, perhaps, partly be computed from all the data provided in step 2, but new relations can be defined as well.

This might be called a layered approach (which of course is not a new concept at all).
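
Step one of this decomposition is small enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the identifier is taken to be a SHA-256 hash of the document's bytes (one possible way of computing it from the document itself), and the publisher number 42 stands in for an entry in some publisher registry that does not exist yet.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class PublicationRecord:
    identifier: str    # computed from the document bytes
    published: date
    publisher_id: int  # hypothetical number in a publisher registry


def publish(document: bytes, published: date, publisher_id: int) -> PublicationRecord:
    """Create the minimal three-element record for a document."""
    digest = hashlib.sha256(document).hexdigest()
    return PublicationRecord(digest, published, publisher_id)


record = publish(b"Draft of my paper, version 1", date(2008, 7, 30), 42)
same = publish(b"Draft of my paper, version 1", date(2008, 7, 30), 42)
changed = publish(b"Draft of my paper, version 2", date(2008, 7, 30), 42)

print(record.identifier == same.identifier)     # identical bytes, identical identifier
print(record.identifier == changed.identifier)  # any change yields another identifier
```

Everything else (names, titles, quality statements, relations between versions) would be attached to such a record afterwards by the services of steps two and three.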

There can be competition now between service providers; and alternative ways to associate documents with metadata can be implemented, as well as alternative ways to define relations between documents.

What perhaps distinguishes the layered approach from the first, one-shot approach is the emphasis on setting up a relational structure (much like the one provided by RDF), on setting up distinct services, and, of course, on defining protocols for information exchange to make those services possible. Not all protocols need to be open, not all need to provide free access to information: everything is possible here. The layered approach focuses on procedure and opens up a future where different parties can focus on a part of the overall problem and come up with different solutions, and compete.

 

Versions, Versions

 
By ellermann at Mon, 2008-07-28 13:54 | general

There are many complaints about the appearance of articles online in different versions. There are complaints too about the publication of the same articles on different sites, sometimes with different access policies.

Why complain?

There seem to be a number of reasons, most of which have to do with counting. If an author publishes a preliminary version of a paper in a repository and a final version in a good journal, then the preliminary version might receive citations that should really go to the official article. Such citations are not counted by our impact administrators, who, as we all know, are often good acquaintances of the people deciding on tenure tracks, or those deciding on grants. Another reason to complain relates to the lack of transparency for researchers: multiple versions of the same or similar content may, and most likely will, confuse scholarly communication.

Or, in other words, it is not just skill that determines the survival of the fittest scholar; a certain amount of randomness is involved too.

Registration and certification are two of the most important values added to scholarly communication by the decent publishers. They certify a contribution by publishing it, they register a contribution by tagging it with dates, names and metadata, so as to make clear who discovered what, and when. Registration and certification are needed to both make the competition among scholars fair and to make scholarly communication transparent.

But the current (relative) chaos of versions and multiple copies undermines these important characteristics. The repository movement, with all its charms, is also undermining it.

There are, as always, two ways to counteract the negative consequences of innovations. One is to "re-mediate" the old modes of working. That is, constrain the innovations in such a way that old habits don't have to die. The other one is more passive, at least in the beginning. Accept chaos, accept the wildness and try to establish new modes of working that still allow one to establish reasonably objective criteria of scholarly and scientific excellence.

The first approach implies that both registration and certification are to be implemented prior to publishing. Also, some restraint by authors, and institutes, is needed on publishing preliminary work. The second approach implies the implementation of such processes after 'publishing'. One way to implement such systems is by identifying individual contributions, tagging them with quality statements and, and this is very important, establishing certain relations between versions of documents. Such relations would have to include obsoletes, supersedes, alternative form, and the like. That is, an a priori infrastructure is needed that can 'accept' new publications and allows for the incorporation of relations as just mentioned, statements about quality, and much much more (but quite often after publication).

Somehow, scholarly and scientific publication requires prestructuring, in both approaches mentioned. The second approach requires less preliminary work by authors, allows for rapid communication and does not require much restraint from authors. The first approach needs more work in the initial phases, but it has been well-established.

If we ever want to establish the second approach, some work is needed to set up the required infrastructure. A minimal set of metadata is needed that identifies individual contributions and allows for a (minimal, I hope) set of relations between documents. Registration, certification and a "semantics of versions" are the main issues to be considered.
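
What such a semantics of versions could look like in practice can be sketched in Python. The sketch is an assumption of mine, not an existing system: relations are plain statements of the form (newer, relation, older), the document names are invented, and only the relation "supersedes" is actually interpreted.

```python
# (newer, relation, older) statements, added after publication by any service.
# The document names are hypothetical.
relations = [
    ("accepted-manuscript", "supersedes", "preprint"),
    ("published-article", "supersedes", "accepted-manuscript"),
    ("published-article", "alternative-form", "publisher-html"),
]


def latest_version(doc, relations):
    """Follow 'supersedes' statements until nothing supersedes the document."""
    superseded_by = {old: new for new, rel, old in relations if rel == "supersedes"}
    seen = {doc}
    while doc in superseded_by:
        doc = superseded_by[doc]
        if doc in seen:  # guard against a cycle of contradictory statements
            raise ValueError("circular 'supersedes' chain")
        seen.add(doc)
    return doc


print(latest_version("preprint", relations))
```

A citation counter that has such statements at its disposal could credit a citation of the preprint to the published article, which addresses the counting complaint mentioned earlier.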

The repository movement, if I may call it a movement, needs to focus on these issues. Very few do, however. Most are concerned with increasing the number of documents in repositories, with using new varieties of metadata, or otherwise tend to occupy themselves with describing complex documents (you know, the article and the odd dataset). It simply is too early for that. If the basic infrastructure cannot cater for registration, or for certification, we are building a plane without wings. Perhaps it is also too late already. Many voices can be heard that detest the inherent chaos of the second approach, at least in its initial phases, and forcefully try to re-mediate old -print-world- habits in a new -internet- environment. Now, who really wants that?

 

The Power of Convenience

 
By ellermann at Mon, 2008-07-28 11:04 | general | open access

James Evans discovered something that surprised me, quite a lot, actually.

What I used to think was that improving the accessibility of an article would increase the number of times it was read, that less accessible articles would be read less (of course, the time spent on reading has not increased, not that much, at least). The accessible articles would receive more citations. Accessible here simply means, I think, available online. This includes all electronically available articles, open access as well as those offered to paying subscribers.

Well forget it, is what Evans seems to indicate, it simply is not true. Online accessibility actually concentrates the citations on a smaller group of articles. The cited articles also tend to be more recent when articles are easily accessible than when they are not. This seems to contradict the long-tail theory of information, which implies that old articles would receive more attention when it becomes more convenient to access them. How to model these effects?

Let me quote James Evans.


With five additional years of free and commercial online availability, the number of distinct articles cited within journal would drop from 600 to 200; the number of articles cited within subfields would drop from 25,000 to 15,000; and the number of journals cited within subfields would drop from 19 to 16. This suggests that online availability may have reduced the number of distinct articles and journals cited below what it would have been had journals not gone online. Provision of one additional year of issues online for free associates with 14% fewer distinct articles cited.

This is no small effect. Apparently, as James Evans also suggests, it becomes easier to find relevant articles. The proportion of relevant articles does not change much over time. What disappears are the citations to suboptimal articles, that is, articles that are relevant although some other article would have been more relevant.

This one might call the power of convenience. By increasing accessibility, the really important literature becomes more visible. Convenience increases the transparency of scientific quality. It also tends to decrease the time difference between an article and the articles it cites. The citation window narrows.

I am still a bit puzzled by this finding. It is well known that open access articles receive more citations. How does this relate to Evans's findings? Do all open access articles receive more citations? Is there a constant of increase, constant for all articles? Is the constant additive, multiplicative? Or are more complex models needed? Can this be researched?

 

A Metric for Repository Success

 
By ellermann at Mon, 2007-09-03 10:15 | general | library systems | metadata | repository

In a nice article called Size isn't everything Leslie Carr and Tim Brody tried to measure the success of a repository.

What is refreshing about their approach is that they do not focus solely on repository size. A repository is successful when it is used. And use here refers not to the number of downloads or views, but to the number of uploads. When the staff uses the repository on a regular basis to disseminate their work, the repository is successful, irrespective of the size. Irregular deposits arise, for instance, when on only a few days a large number of items have been uploaded (batches). This probably means that the maintainers of a repository found a large set of documents. That is fine of course, but it is not in itself a sign of a healthy repository. Besides regularity of uploads the scope of the repository is an important variable too. If it is an institutional repository, the content should reflect contributions from all the disciplines.

Measures for "health", "scope" and "size" can be combined, and this leads to a list of successful repositories, a list that can be extracted from this paper. On a personal note, it is good to see that our Groningen repository is considered to be a healthy repository and therefore belongs to the top 20 of successful repositories.
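
The difference between a steady and a batch-filled repository is easy to illustrate in Python. Mind that this is my own rough sketch and not the measure Carr and Brody use: the function name, the two indicators and the deposit dates are all made up for the example.

```python
from collections import Counter
from datetime import date


def deposit_profile(deposit_dates):
    """Summarize how evenly uploads are spread over days (needs at least one date)."""
    if not deposit_dates:
        raise ValueError("no deposits to summarize")
    per_day = Counter(deposit_dates)
    total = sum(per_day.values())
    return {
        "items": total,                                        # size
        "active_days": len(per_day),                           # days with at least one upload
        "largest_batch_share": max(per_day.values()) / total,  # 1.0 means one single batch
    }


# Hypothetical data: two repositories of the same size.
steady = [date(2007, 1, d) for d in range(1, 11)]     # one item on each of ten days
batchy = [date(2007, 1, 1)] * 9 + [date(2007, 1, 2)]  # nine items uploaded in one batch

print(deposit_profile(steady))
print(deposit_profile(batchy))
```

Both repositories hold ten items, but only the first one shows the regular use that is taken here as a sign of health.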

Comment

Obviously, it is very difficult to "design" objective and adequate measures. Carr and Brody make no hard claims here. More research is needed and further refinements have to be incorporated. What does surprise me a bit though is that repository usage (by reading users) is not taken into account. Even a large and healthy repository with a wide scope can be totally useless if no one comes to look at it. Uploads and downloads should both be considered.

Also, in this paper, all items are equal. Whether the items in the repository consist of both metadata and documents, or just metadata is not taken into account. The question of whether there is an added value of having an item in a repository is not considered either. If the item is already accessible elsewhere, or if an item consists only of metadata that is already available from other (re-)sources, does it have any real value?

Incorporating these aspects was clearly not the goal Carr and Brody set for themselves. But an adequate measure of success should, in the end, take these things into account.

 

Web 2.0 for libraries: revisited

 
By ellermann at Sat, 2007-08-25 11:55 | Library 2.0 | open access | semantic web

As some might have noticed, I am having second thoughts on Web 2.0. What I don't like about it is its focus on communication and collaborative work. Well, I like it, but I don't think digital libraries should get too involved in online communication. It is NOT the core business of us library people to make others communicate. Our task, in my humble opinion, is that WHEN people want to communicate (about) scientific or scholarly ideas, whether in the form of an article or a discussion, the library should have the information available which makes writing and talking worthwhile. Worthwhile here means: fully aware of the contributions of others.

In my latest bi-monthly contribution to livre, I have argued that the main task is, after detecting and/or selecting relevant scholarly and scientific information, to describe it properly and make it findable and available to all who need it. From this perspective, Open Access is very important, but Web 2.0 is not. That does not imply a conservative attitude to the goals of the digital library. On the contrary, "opening up" content requires us to have recourse to rather advanced technologies that are developing around metadata, the description of complex objects, the semantic web, and the like. A focus on Web 2.0 distracts us.

My claims also do not by necessity imply, as some seem to think, that digital libraries should focus on static content only. I will leave that open, if only because I have no clear idea what static content is.

Discussion

I am very pleased to see that my claims have provoked a debate. On his weblog Edwin Mijnsbergen asked people to comment on these claims. In short: the comments are great! Perhaps the most important ones state that IF I am right, I may be only right for university libraries, and not for public libraries. That may be a valid point indeed and I am not decided on the issue yet. Communication MIGHT very well be more important for public libraries.

At least I hope the discussion will continue.

Thanks Edwin!

 

Librarything goes Dutch

 
By ellermann at Mon, 2007-08-13 14:42 | Library 2.0 | library systems

LibraryThing, the collective and virtual bookshelf, has recently been connected to a number of Dutch libraries and online booksellers (including Bol, Bruna, the KB, the KBR and the Rijksmuseum Research Library). Dutch users can now find Dutch titles much more easily than they used to. This increases the number of Dutch titles considerably and makes it easier to fill your online catalog.

Needless to say, all the other features of LibraryThing become more useful for Dutch users too.

 

The Communication Paradigm

 
By ellermann at Fri, 2007-06-29 11:21 | Library 2.0

A recent brainstorm session at our library actually did get me thinking.

The theme of it was library 2.0, web 2.0, or whatever you want to call it. Wouter Gerrits gave a wonderful overview (in 45 mins) of all that library 2.0 has to offer these days. He clearly advocates us going "2.0". He urged us, for instance, to go on Hyves, because around 9000 of our students seem to have an account there. I had a presentation there too (3 minutes and 14 seconds), telling the audience a different story, namely that the library should focus on content more than on library 2.0 issues.

In a recent comment on this meeting, Jacob van der Sluis contrasted a focus on communication with a focus on content. He argues that our mission should be to make content available, especially content that is not available (anymore) elsewhere. Presenting the library on Hyves, SecondLife, or via SMS is at best of secondary and often dubious importance. Did the phone revolutionize the library?

Let the content do the communicating?

There seems to be an unhappy relationship between the existing tools for communication and library content. Most modern communication tools connect people to people, or users to users. Rarely is the inter-user communication explicitly connected to what people talk about. MSN to MSN contact can be about the garden, a sick uncle, as well as about science. The content is irrelevant, or, in other words, does not shape the mode or channel of communication. The same is valid for almost all virtual environments. They too can be used for basically anything. Even social tagging is a communication tool between users about content, but again the content itself can be anything.

Clearly, in the triangle user 1 - user 2 - content, a formative role is not played by content. A librarian should find this strange. If we care to remember, the form in which information traditionally has been cast (books, articles, journals) has been optimized to connect users to content. The same applies to book shelves and metadata. The long history of the book clearly shows how it has gradually been adapted to meet the needs of the readers. The long history of the library clearly shows how access to information has been improved by a long process taking the content of the information into account. There is a connection, therefore, between a single user (user 1) and the content. Note that the connection between the users is ignored. In other words, the second user is ignored.

There is great wisdom in the neglect of the second user in designing information carriers (like books). Because the primary problem is, and should very obviously be, to make the content knowable. Knowable! Not necessarily known! When that hurdle has been taken, the rest CAN follow, if users want to.

Does this mean that we should just select content? My answer is no. One of the disadvantages of the traditional information carriers is that they are "itemized". It is a universe of unconnected items. The maintenance of this universe has always been the main task of libraries. But, and here is the news, with modern tools we can structure this universe. Above the universe of items we can place a web of relations that makes it easier for a user (ONE user) to walk from relevant item to relevant item. Precisely this is what is attempted in the semantic web and in related, but older, disciplines focusing on knowledge representation.

There is a long way to go before we have adapted the universe of items to one user as well as we have adapted one item to the needs of one user. But that is the way to go.

And communication? Sure, that is important and we should advertise ourselves and our skills. We should instruct too. But for that we can use the tools that are already there, and no doubt the many to come. But never is the medium the message. There is therefore no need for us to become experts in that area, although some familiarity with it would be nice.

There is however a distinct need for professionals that can select and interrelate content, because if we don't do that, no one will. Yes, the development of this connected universe of items is the main task for us digital librarians.

 

SecondLife, and the library?

 
By ellermann at Wed, 2007-05-02 13:15 | Library 2.0 | library systems

The latest addition to social virtual worlds is SecondLife. In their least modest moments, its creators and users have described it as the sequel to the World Wide Web: Web 3.0 as it were. It is only natural then for a librarian to ask whether SecondLife (SL) has the potential to become ThirdLibrary.

I have used SL for some time now and tried to pinpoint the features that could make it useful for the library. There are a few. It is, compared to the previous generations of social virtual environments, a rich environment. It is possible to create your own objects and surroundings, from chairs via buildings to complete cities or islands. In these environments one can, well.... chat.

But there is a bit more. One can create objects of all kinds, including art, music streams or interactive objects using the scripting language LSL. These objects can be sold "in world" or used as advertisements for outside goods. There is a solid foundation for an online economy and online copyrights, because a creator can determine whether an object can be copied or modified by others; also the creator can set a price before an object is transferred from one owner to the other. The copyright enforcement is still strict and rigid and does not allow for the flexibility of, say, Creative Commons License, which would be beneficial for the (re-)use of scholarly works in SL.

Assuming for a moment that this world became so popular that most academics and students would join SL, these features are, to say the least, promising. Building an online library and populating it with books, reference librarians, search engines, links to sources in and outside of SL (that is the web) and what do you have? A library that can be accessed by anyone, everywhere, and all the time. A dream?

I am sorry to say that such dreams cannot be dreamed yet. First of all, SL is not as popular as some would have it. At any time there are, currently, around 30,000 users online (reaching 40,000 at peak hours these days). It goes without saying that that is not enough for a thriving academic community, not even if all users were academics (which God forbid, but that is another issue).

But the misery doesn't stop there.

SL is, to put it mildly, extremely bad in handling text. In SL itself the only useful way to present text on some self-made surface is by using pictures. That is like faxing a book, but without the convenience of a faxing machine!! Sure, there are note cards to present text to others, but the layout in those is basically the layout of an ASCII text, and note cards do not integrate in the environment. It is also not possible to create a text in SL by any other means than typing it. So much for self designed interactive forms, and the like! A solution would be to allow certain objects to act like web clients, but there is no sign that this will happen soon.

The interfacing with the web is also rather bad. There is one command in the scripting language that allows you to read web documents into SL, but besides being rather slow it has a mere 2K limit on the amount of data that can be transferred. There is also a function to start up websites in your browser and there are a few rather clumsy commands that allow you to work with XML-RPC, but only, yes only, in SL! And even there it has its quirks. Taken together these functions are simply too limited.

And transferring pictures from the web to SL? Forget it, can't be done without great, great efforts. You can upload pictures from your disk, but for that you have to pay for each pic (ok, only a very small amount, but still!).

The scripting language itself needs a redesign too. Although not without power, it is a very clumsy language. It has no support for multidimensional arrays and data storage is limited to only a few KB per script. It is extremely awkward, too, to maintain a library of functions. There are ways to circumvent these limitations, but you had better not have a day job when you do that.

In short: SL shows great promise, but its promise has NOT been realized yet. Even worse, there is so much focus on performance, and perhaps on letting the user pay for the storage and CPU time, that essential functionality will probably not be implemented in the near future. And essential means here a good interfacing with the web and a decent handling of larger texts. Sometimes nice results are obtained, there are for example RSS readers in SL, but a library needs more and far better text handling capabilities.

Libraries and SL are not a happy marriage yet and I think that the ball lies in the court of the developers and creators of SL now. They have done a great job, but have yet to take the necessary steps to make SL really useful for librarians and academics. "Chat only" is not enough.

(Mind you, it is rather nice to sit by a crackling virtual campfire with people from all over the world, talking philosophy or "what great books have you read lately"...) :)

 

Critique of Google: Private Sector Book Digitization and Digital Library Policy

 
By ellermann at Wed, 2007-01-03 16:39 | digitization

Bearman summarizes what Jean-Noël Jeanneney, President of the Bibliothèque Nationale de France, had to say about Google's plans to digitize "the world's knowledge".

  • Google cannot digitize everything so has to make selections. These selections will no doubt be biased towards literature available in English. The magnificent novels written in Swahili are not likely to be digitized soon.
  • The presentation of the books is not good. Quite a number of scans are of poor, if not of very poor, quality.
  • The ranking of results after a search may not be what scholars want. The ranking method itself is a secret. No academic should accept that.
  • Archiving is a real issue. When Google is gone, and it will be gone one day, who will take care of the digitized material? And why?
  • Google seems to have digitized work without asking permission from authors. A number of lawsuits resulted.

These are all very serious issues. Especially serious is the uncertainty about what will happen to the scans after Google drops out. That Google's goals may not, or will not, match the goals academics (should) have for their publications also deserves attention from academics and librarians.

Bearman summarizes the issue quite nicely with a plan for action as follows: "In the two years since Google first announced its ambitions, I think the D-Lib community has largely given Google the benefit of the doubt; now that some results are visible and the implications are more clear, I think it's time to publicly endorse open access to rights-cleared, high quality, scanned page images and reconsider the appropriate roles for academic and public institutions participating in commercial analogue heritage conversion efforts that don't contribute to this end."

Hear hear...

 

Very few journals matter, really

 
By ellermann at Thu, 2006-12-28 12:41 | journal | open access | statistics

John P. A. Ioannidis has written a nice article entitled: Concentration of the Most-Cited Papers in the Scientific Literature: Analysis of Journal Ecosystems in PLoS. Basically it shows that in most, if not in all, scientific fields, very few journals ever publish a high impact paper and, once they do, you can get rich by betting that it will never happen again. A bit more precisely: in the vast majority of sciences 53 to 94 of the 100 most-cited papers can be found in no more than six journals (which typically is less than 10% of the available journals). The number of journals falls off steeply with the number of high impact articles they publish, following Lotka's law (a power law) with an exponent of 1.6. Engineering and the Social Sciences are among the few fields where this degree of centralization (in citations) cannot be found.

A nice metaphor guides the work of Ioannidis. Scientific (and scholarly) fields are seen as ecosystems and the journals are the equivalent of species. The number of species, properly weighted by the number of "individuals" (which are here, I presume, the high impact papers), says something about the ecosystem. The scientific ecosystems turn out to be severe and admit only a low degree of diversification of species. It is not a "mild" ecosystem that can sustain a large number of species (high impact journals).

This diversification can be quantified. The most familiar measure, to me at least, is the Shannon measure of uncertainty (minus the sum of p.log(p), where each p represents the proportion of the 100 most cited papers that were published in a given journal). We call it H in the sequel.
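
To make the measure concrete, here is a minimal sketch in Python of how H could be computed, taking H as minus the sum of p.log(p) with the natural logarithm. The function name and the two example "fields" are made up for illustration; they are not data from the Ioannidis paper.

```python
import math
from collections import Counter


def shannon_diversity(journal_of_each_paper):
    """Return H = -sum(p * log(p)) over the journals in the list.

    The argument holds one journal name per top-cited paper, so a
    journal that published 70 of the papers appears 70 times.
    """
    counts = Counter(journal_of_each_paper)
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        p = n / total  # proportion of the papers in this journal
        h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Hypothetical field where three journals hold all 100 papers.
    concentrated = ["Journal A"] * 70 + ["Journal B"] * 20 + ["Journal C"] * 10
    # Hypothetical field where every paper sits in a different journal.
    spread_out = ["Journal %d" % i for i in range(100)]

    print("H (concentrated):", shannon_diversity(concentrated))
    print("H (spread out):  ", shannon_diversity(spread_out))
```

H is zero when a single journal holds all the papers and reaches its maximum, log(100), when each of the 100 papers appears in a different journal. A "severe" ecosystem in the sense used above is one with a low H.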

A very interesting observation made by Ioannidis concerns the negative correlation between H and the average number of citations received by each article in a field, while there is no (significant) relation between H and, say, the total number of papers, or with the total number of citations received by journals in the field. What this means is that when papers receive more citations in general, the preference for citing a high impact paper becomes even more prominent. Simply increasing citations or the number of journals has no effect on diversification.

Comment

This paper offers some nice empirical findings. The ecology-metaphor has led to interesting applications of certain measures of variation. Of course, the data used are somewhat arbitrary. The boundaries between disciplines are taken from ISI definitions and why would the top 100 papers be more interesting than the top 1000? But these are minor criticisms.

What this paper really lacks, however, is a model that could explain the results. A first step towards such a model could be the formulation of a simple statistical model with a few open (to-be-estimated) parameters. For instance, each paper could be seen as a citation "magnet" with two factors determining the probability that it will be cited. One factor, an intrinsic one, could be based on estimates of the reputation of the author or the institute he/she works for; the other factor could be extrinsic: each time a paper gets cited, the chances of being cited again increase by a certain factor. As a first approximation the reputation factors can be assumed to be constant. It might even pay off to incorporate an Open Access parameter... Anyway, it is not too hard to set up a simple model and to see where it fails. It is a pity that this has not been done.
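
Just to show how little is needed to get started, here is a rough sketch of such a "magnet" model in Python. Everything in it is an assumption of mine, not something taken from the paper: the function names, the choice to multiply a paper's weight by a fixed `boost` each time it is cited, and the parameter values in the example run.

```python
import random


def simulate_citations(reputations, n_citations, boost, seed=0):
    """Hand out n_citations one by one over a set of papers.

    reputations: one constant, intrinsic weight per paper (assumed).
    boost: factor by which a paper's weight grows each time it is
           cited (the extrinsic, rich-get-richer part; assumed).
    Returns the number of citations each paper received.
    """
    rng = random.Random(seed)
    weights = list(reputations)
    citations = [0] * len(weights)
    for _ in range(n_citations):
        # Pick a paper with probability proportional to its weight.
        paper = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        citations[paper] += 1
        weights[paper] *= boost
    return citations


def top_share(citations, k):
    """Fraction of all citations held by the k most cited papers."""
    return sum(sorted(citations, reverse=True)[:k]) / sum(citations)


if __name__ == "__main__":
    # Hypothetical field: 100 papers with equal reputation, 1000 citations.
    equal_reputation = [1.0] * 100
    for boost in (1.0, 1.2, 1.5):
        result = simulate_citations(equal_reputation, 1000, boost)
        print("boost", boost, "top-6 share:", round(top_share(result, 6), 2))
```

With `boost` set to 1.0 the citations are spread evenly apart from chance; as the boost grows, a handful of papers attract most of them. Comparing the concentration such a toy model produces with the observed figures would be one way to see where it fails.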

 

ELPUB 2007: Openness in Digital Publishing: Awareness, Discovery and Access

 
By ellermann at Mon, 2006-12-11 16:24 | conference

11th International Conference on Electronic Publishing
13 to 15 June 2007, Vienna (Austria)

Submission deadline: January 10th 2007

http://www.elpub.net

"Openness" is a broad philosophical as well as technical tenet that underlies much of the innovation in the creation and consumption of Internet technologies, which are in turn transforming scholarly communications, practices and publishing across the disciplines and around the world.

ELPUB 2007 is devoted to examining the full spectrum of "openness" in digital publishing, from open source applications for content creation to open distribution of content, and open standards to facilitate sharing and open access. We welcome papers with theoretical analysis, description of models and services, or new and innovative technical results on:

* Publishing models, tools, services and roles
* Digital publication value chain
* Multilingual and multimodal interfaces
* Services and technology for specific user communities, media, and content
* Interoperability and scalability
* Middleware infrastructure to facilitate awareness and discovery
* Personalisation technologies (e.g. social tagging, folksonomies, RSS, microformats)
* Metadata creation, usage and interoperability
* Semantic web issues
* Security, privacy and copyright issues
* Digital preservation, contents authentication
* Recommendations, guidelines, standards

AUTHOR GUIDELINES
Contributions are invited for the following categories:

- Single papers (abstract minimum of 1000 and maximum of 1500 words)
- Tutorial (abstract minimum of 500 and maximum of 1500 words)
- Workshop (abstract max of 1000 words)
- Poster (abstract max of 500 words)
- Demonstration (abstract max of 500 words)
Abstracts must be submitted following the instructions on the conference website http://www.elpub.net

IMPORTANT DATES

January 10th 2007: Deadline for submission of abstracts (in all categories).

February 28, 2007: Authors will be notified of the acceptance of submitted papers and workshop proposals.

April 11th, 2007: Final papers must be received. See website for detailed author instructions.

Posters (A1-format) and demonstration materials should be brought by their authors to the conference. Only abstracts of these contributions will be published in the conference proceedings. Information on requirements for workshop and tutorial proposals will be posted shortly on the website.
Accepted full papers will be published in the conference proceedings. Electronic versions of the contributions will also be archived at: http://elpub.scix.net

ABOUT ELPUB

The ELPUB 2007 conference will keep the tradition of the ten previous international conferences on electronic publishing, held in the United Kingdom (in 1997 and 2001), Hungary (1998), Sweden (1999), Russia (2000), the Czech Republic (2002), Portugal (2003), Brazil (2004), Belgium (2005) and Bulgaria (2006), which is to bring together researchers, lecturers, librarians, developers, businessmen, entrepreneurs, managers, users and all those interested in issues regarding electronic publishing in widely differing contexts. These include the human, cultural, economic, social, technological, legal, commercial and other relevant aspects that such an exciting theme encompasses.

Three distinguishing features of this conference are: a broad scope of topics, which creates a unique atmosphere of active exchange and learning about various aspects of electronic publishing; a combination of general and technical issues; and a condensed procedure of submission, revision and publication of proceedings, which guarantees presentation of the most recent work.

CONFERENCE LOCATION

Vienna, the capital of Austria, is one of Europe's most fascinating cities, with a rich history, various cultural attractions and reasonable living costs. The campus of Vienna University of Technology is located near the historic downtown of Vienna.

Conference Host: Vienna University of Technology, Vienna, Austria

General Chair: Bob Martens, Vienna University of Technology, Vienna, Austria

Programme Chair: Leslie Chan, University of Toronto at Scarborough, Toronto, Canada

 

Holland digitizes a large amount of audio visual material

 
By ellermann at Mon, 2006-09-25 10:24 | audio visual | digitization

The Dutch government has decided to finance a project called "Beelden voor de Toekomst" (Images for the future). The project is a large scale digitization effort to preserve around 285,000 hours of movies, videos and photographs for future generations. Around 154 million Euros have been made available to prevent the loss of audio-visual material with significant cultural, educational and economic value.

Besides digitization the project will also develop an infrastructure for the digital distribution of this material, as long as it serves some 'educational' or 'creative' application.

Six organizations participate in this project: het Nederlands Instituut voor Beeld en Geluid, het Filmmuseum, Nederland Kennisland, het Nationaal Archief, de Centrale Discotheek Rotterdam and de Vereniging van Openbare Bibliotheken.

First seen on: http://www.beeldenvoordetoekomst.nl/home.html

 

DFG and the future of Scientific Library services and systems

 
By ellermann at Sun, 2006-09-24 11:01 | library systems

It may be old news to most of you, but I am still catching up. While doing that I found a position paper by the DFG (Deutsche Forschungsgemeinschaft) on 'Scientific Library Services and Information Systems: Funding priorities through to 2015'. The document is available in both German and English.

It is an important document, I believe. It seems to aim at giving libraries a 'cornerstone' position in the e-science of the near future. It makes it quite clear that an immense effort is needed to change the library. Cataloguing needs to change; connecting library systems to other systems is a new challenge; and the handling of metadata needs careful consideration.

The document contains an 'action plan'. The objective is the implementation, in Germany, of an integrated '... digital environment for the provision of scientific information in all disciplines and subjects by 2015'.

So far so good, or even great...

The action plan itself is, I think, a bit disappointing. Sure, immense efforts are needed to digitize materials, repositories have to be networked, better metadata standards are needed, a German 'Cream of Science' would be nice, as well as toolboxes for electronic publishing - who could object?

The reason to be just a bit disappointed, however, is that very little thought is given to 'architectural' issues. What is needed for the integration of information from a variety of systems? Also, the document (and I hope I am wrong here) breathes a 'top down' mentality: a national structure is needed, portals will be set up. Isn't it better to think deeply about how to formalize (and standardize) the notions of integration and then develop tools with which others (scientists, librarians) can make services that are of real interest to scholars and scientists?

I am not saying that national efforts are useless, but they should, I think, be aimed at creating conditions for other people to develop services. In the action plan, however, I see no provisions to set up boards that define and maintain protocols for information exchange, and no intention to define, say, a minimal metadata standard for interoperability of library systems. I also see no plan to develop tools that can be used, by others, to build new services.

Well, perhaps if they find out that we are all very interested?

  