A Crisis of Capacity: How can Museums use Machine Learning, the Gig Economy and the Power of the Crowd to Tackle Our Backlogs
Adam Moriarty, Auckland Museum, New Zealand
Abstract
At the Auckland Museum, we are looking at how we can harness the power of a global workforce, free software, and social media to embrace the changes made by the digital revolution. Can we use the "gig economy," machine learning, and the power of the crowd to solve our backlog problems head-on? Can these new ways of working help us to free our time for the more creative and innovative aspects of our roles? Is it better to have an AI-created record online than no record at all? What are the ethical implications of automated, computer-generated content for museums?
Keywords: machine learning, crowdsourcing, gig economy
At the Auckland Museum, we have a truly amazing collection, estimated to be around 7 million artefacts, specimens and documents. The collections span art to archives, cultural collections to natural sciences, an impressive war history collection, and one of New Zealand’s four major research libraries. In its 2012 Future Museum strategy, Auckland Museum committed to increasing access online, and we have been making good progress. Like many other museums, we want to provide open access, but also ensure that audiences can find meaning in the collections each time they access them. Today, the impact of uncatalogued collections has shifted from what it was a generation ago. Realistically, access to collections in the past was through in-house staff with internal knowledge and expertise in collections. As collection records are made open online, there is an increasing expectation from the public that all of our collections should be immediately digitally available, discoverable and “complete.” Consequently, new audiences are aware of the collections and their potential thus generating new demands as well. In answer to this, our largest ever digitisation and cataloguing project is well underway; we have established a temporary project team of more than 25 cataloguers, photographers, and subject experts for a 3–4 year period, and we are uploading 2,000 new records and images every month. But even so, we estimate that it would take decades to sufficiently catalogue all the collections, far longer than our 3–4 year funding supports. As such, our current projects will only scratch the surface of our backlog, and they don’t take into account the new acquisitions or born-digital content, which is coming in at a volume that couldn’t be comprehended a generation ago. For instance, the museum recently acquired a born-digital collection of one million images and documents. 
This work may never be completed, and I am always conscious that it is there, a collection waiting, unseen, ever-growing and full of potential. Auckland Museum is not alone in this. These are the problems the museum community faces worldwide. We need to experiment with new methods and technologies to evolve in line with changing museum content collections.
In 2018, a small team of staff started a series of pilot projects to look at new, affordable solutions available online. We decided to experiment with applying AI and the “gig economy” to our collections. This is an interesting time to initiate these pilots as the tools are increasingly under scrutiny in popular media. Indeed, during the development of our pilot projects, several high-profile breaches of trust of large data sets were made public in the media. These raised serious concerns around whether it is ethical that the Museum should be considering using and supporting such technology. Due to this, we kept the ethical and legal issues front of mind. Daugherty and Wilson (2018) summarise these concerns into three key questions that should be asked before deploying technical solutions:
- If we use technology in a new process, how can we do it in compliance with laws and regulations?
- How can we ensure that we have thought through the possible unintended consequences that can create brand and public relations issues for the company?
- What obligations do we have to society to ensure we deploy artificial intelligence for good and not harm?
With this in mind, the projects undertaken included computer vision for cataloguing and the use of the gig economy for image tagging. There was no formal funding or project team, merely a few interested staff members from the Museum’s library and information team who were keen to trial things to discover what might warrant more formal adoption. We had only rudimentary Python skills and no developers on site to assist. As such, everything we used needed to be self-taught, simple and well-documented.
For this paper, I will outline how the technology was used, the issues each posed, and the likelihood of adoption as part of our Museum “business as usual” practice.
For the purpose of this document, I am using “Artificial intelligence” (AI) to describe any system that uses computer logic rules or statistical analysis to make programmatic decisions that are, in some ways, “intelligent.” For the case of this pilot, we are using computer vision (a subclass of AI) to recognise common shapes and patterns in an image and describe them. The AI is making intelligent decisions that mimic those of a museum cataloguer.
AI is a contentious topic for a traditional museum, but we must recognise that we are already using it across the business. Most of our staff use the technology every day when assisting visitors with Google Translate, using social media systems that decide when to post content feeds, or even relying on our email spam filters. It’s easy to forget that this is all powered by machine learning algorithms (Bradley, 2018). So, perhaps we need not be so afraid of this type of technology. Even using AI (or machine learning) to auto-tag collection images isn’t necessarily a new idea for museums. The Powerhouse Museum in Sydney started doing this a decade ago when they used OpenCalais (http://www.opencalais.com/) to add social tags to images from the collection (Chan, 2008). Most recently, Google Arts has led the way with Google Arts experiments (https://experiments.withgoogle.com/collection/arts-culture), a series of projects that allow for the serendipitous exploration of collections indexed by the Arts and Culture project, harnessing the power provided by AI. These are exciting developments for the wider museum community, as we can see tangible results of AI technology enriching the user experience and enabling users to explore the collections in new ways. They all demonstrate AI’s use to increase discovery and improve the online user experience at the front face of the data (information already catalogued and exposed online). What we are interested in now is how we can start enhancing collection data at its root source in our collection management systems with the help of AI. This would be the cultural shift to utilising technology for the very core of traditional museum and registration business.
But can AI realistically support cataloguing processes before we build the overlaying user experience? Could we use these technologies right from the start of the cataloguing process to create basic records? Would evolving our processes in this way open up more of our collections and free up staff time to tackle the more creative and complex tasks?
We decided to see at what stage we could bring in AI technology to help with our backlog. To do so, we fed the computer vision systems previously unseen photographic collection images to see if it was feasible to retrieve basic records. We soon discovered that this posed many more questions for us as museum professionals, such as whether or not it’s better to have a potentially inaccurate AI-generated record in our source systems and online than no record at all? What would be the proportional “success/accuracy rate” of the AI-generated content that might support the argument for utilising this system?
IBM vs Microsoft vs Google vs Clarifai
When looking at computer vision systems, there are four big contenders, each offering a similar service. Clarifai and Microsoft instantly drew our attention, as they both allowed a high volume of images to be processed for free each month.
As part of the pilot, we tested all four systems in some format, using the strength of each to enhance our metadata, but we predominantly used the Microsoft service. Microsoft provided the ability to add captions alongside the tags, and this enriched the experience and provided a better level of basic record context. Alongside the tags and caption, the service also provided a confidence score that indicated how accurate the AI believed the caption and tags to be. This function provided us with the basis for a risk management system, a programmatic way to select which images could be published with the AI-generated content, and which should go back to the cataloguing staff for review.
AI Computer Vision in Practice
We ran 2,000 collection images through the Microsoft system to see how it would perform in practice. We discovered that when the confidence score was high, the caption appeared to be on par with a basic catalogued record, and when the score was low (<50%), the captions were misleading, often to the point of being comical. As such, the confidence score proved to be a good measure and an important feature. It would allow us to automatically reject any record that was AI-catalogued with a lower than 60% confidence score, and reduce the chance of us publishing many records that could be misleading or embarrassing.
During the initial phase of the pilot, the images selected were from a collection of architecture and landscape photographs, with very few portraits or scenes including people—the systems worked well on this content. These 2,000 images were processed with an average confidence score of around 60%. We were able to take all those with a high score (i.e. >60%) and import them directly into our source system. We added a classification identifying these as auto-created records for administrative transparency. These 2,000 collection photographs now had records that staff and online users could access where they had had none before.
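The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the pilot: the `CaptionResult` structure, the field names, and the `AUTO_CREATED_LABEL` value are all hypothetical, and the 0.60 cut-off simply mirrors the 60% figure in the text.

```python
from dataclasses import dataclass

# Hypothetical classification value marking a record as machine-generated.
AUTO_CREATED_LABEL = "auto-created record"


@dataclass
class CaptionResult:
    image_id: str
    caption: str
    confidence: float  # 0.0 to 1.0, as reported by the vision service


def triage(results, threshold=0.60):
    """Split results into (publishable, needs_review) by confidence score."""
    publishable, needs_review = [], []
    for r in results:
        if r.confidence > threshold:
            # High-confidence captions are imported, labelled for transparency.
            publishable.append({
                "image_id": r.image_id,
                "caption": r.caption,
                "confidence": r.confidence,
                "classification": AUTO_CREATED_LABEL,
            })
        else:
            # Everything else goes back to cataloguing staff.
            needs_review.append(r)
    return publishable, needs_review


# Illustrative data only; these are not records from the collection.
sample = [
    CaptionResult("img-001", "a large stone building", 0.91),
    CaptionResult("img-002", "a group of people on a field", 0.42),
    CaptionResult("img-003", "a view of a harbour", 0.60),
]
accepted, review = triage(sample)
```

A record sitting exactly on the threshold is sent for review here, which is the more cautious reading of "higher than 60%".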
We also hold large war photography collections and images of Māori and Moana Pacific subjects and content, which would be considered high risk for AI captioning because of the negative consequences of publishing misleading information. Although the system worked well with buildings and landscapes, we needed to test it on a series of more complex images if it was going to be used beyond the landscape collections. We selected a series of WWI images and a handful of images depicting Māori subjects. The results were less satisfactory but revealed the inherent bias of the system. For instance, photographs of nurses in white uniforms were often classified as “Karate” or “Judo,” and almost every group shot of a military unit was tagged as “Sports Team,” most commonly as “Baseball team” (although the tag “Military” or “Army” would appear, it was further down the list of tags and had a lower accuracy score). The Māori content either received generic tags such as “Male standing” or was tagged as “Native American” or “Chinese.”
Here, we unsurprisingly see the reputational risks and the limitations of the system. Although part of the captions were often correct in describing, literally, what the image was depicting, it was misidentifying the vital cultural context which a human cataloguer would describe in the very first instance and which would pose a huge risk to the reputation of the institution. We have worked extensively to create processes at the museum to protect cultural content—we have implemented a Cultural Permissions Process to ensure that Māori and Pacific images are used appropriately. As such, using the AI captioning for this content seems to directly contradict these values and is simply not something that we would be able to implement for this content, under these current conditions.
We have to acknowledge the bias of the system. It is only as good as the training data, which for the most part has a North American focus. Baseball, for example, while having been played in New Zealand since the 1880s, still only has 18 clubs nationwide and most high schools do not have a competitive team, so it is highly unlikely that a given image of uniformed males can be assumed to be a baseball team. Bias is well-documented, and when looking at the tagging of portraits, the systems are known to perform best with images of lighter-skinned males. A study conducted by the MIT Media Lab in 2018 shows a 34.4% difference in the error rate when comparing lighter-skinned men with darker-skinned women (Buolamwini and Gebru, 2018). Whilst we acknowledge this bias, it also raises questions about what role museums have in working to correct this, and how they might negotiate with cultural communities for the release of cultural images to improve AI recognition of cultural content. With the dramatic increase in open-collection content through the OpenGLAM movement, perhaps the museum of the future may need to recognise the role of the sector in providing some content with existing, rich, verified metadata to help correct or reduce the current bias in the systems, which will ultimately help the wider museum community in the long run.
The outcome of this pilot for our museum was that we can use computer vision only for certain, less sensitive collections and that there are definitely additional measures we would need to employ to use it with confidence. For instance, we recognised the need to create a list of the terminology used by the systems that would be problematic if published online. Any image records with generated tag terms such as “deathcamp,” for example, would certainly need to be flagged and reviewed before being published. Additionally, as mentioned above, there is a need to discuss and develop a sector best-practice guideline for how we might show these records online. For instance, ethically, do we need to flag these records as “auto-generated” so the public is aware that they are looking at a record which was not created or verified by a museum staff member? This measure would certainly help the staff feel more secure that the professionalism of their work wouldn’t be undermined by these auto-generated records. Finally, at Auckland Museum, we soon discovered that when dealing with indigenous content, there is a need to ensure that we can uphold the principles of our Cultural Permissions Process. That meant we could not use automated systems to categorise the images under the AI’s current capabilities. This decision is based both on the lower levels of accuracy (identification biases) and the sensitivity and reputational consequences of poor data being made public. Although not under the team’s remit, we have also tested the system on the record images of three-dimensional objects, and the results did prove positive. It would be interesting to investigate this further, to see how the AI system could impact the workflows of a wider variety of collection content.
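The list of problematic terminology mentioned above could be applied as a simple check before publication. The sketch below is only an illustration under stated assumptions: `SENSITIVE_TERMS` is a hypothetical list (seeded with the “deathcamp” example and one of the mis-applied tags reported earlier), and a working list would need to be agreed with curatorial and cultural advisory staff.

```python
import re

# Hypothetical review list; not the Museum's actual terminology list.
SENSITIVE_TERMS = {"deathcamp", "native american"}


def normalise(term):
    """Lower-case and collapse whitespace so tag variants compare equal."""
    return re.sub(r"\s+", " ", term.strip().lower())


def flagged_terms(tags, blocklist=SENSITIVE_TERMS):
    """Return the sorted tags from one record that appear on the review list."""
    return sorted({normalise(t) for t in tags} & blocklist)


# Illustrative tags for a single machine-captioned record.
record_tags = ["outdoor", "Deathcamp", "building"]
hits = flagged_terms(record_tags)
needs_review = bool(hits)  # any hit sends the record to staff before publishing
```

Matching on whole normalised tags keeps the check predictable; substring matching would catch more variants but would also produce more false alarms.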
To answer the questions of Daugherty and Wilson then, having gone through this process, we believe that in order to use the AI captioning on our photographic collections, we must apply it only to less sensitive collections such as the landscapes, if we are to uphold the values and regulations of our own cultural permissions and ethical processes. At this stage, AI is not yet intelligent enough to handle the delicate nuances of cultural content. As a museum community, we need to deepen and widen our discussions on how we can, in good faith, approach the inherent bias of AI as it stands. We need to have these questions at the forefront of our minds when embarking on physical co-collecting and co-development projects. Once again, we are reminded that the development of our digital practice now needs to be integrated to this extent, to support the creation of a strong foundation for our modernising methods.
As it stands, however, our pilot shows that the confidence score function can be useful as a risk assessment tool to the degree we deem appropriate—whether that is >60% or >80%—when used in conjunction with a mindfulness of that inherent bias discussed above.
Having gone through this pilot, we also believe that any AI-generated records should be marked for us to align with the values of contingent ethics and museum best practices. By doing this, we both protect staff reputation and remain open and honest with the consumer. This transparency is also consistent with good research practice of attributing the sources of evidence or references.
Finally, we recognise now that by strategically utilising computer vision with caution and openness, museums would ultimately be deploying it for good, on two counts. Firstly, that these collections would be given increased accessibility in-house and online, faster. Secondly, that by reducing staff time spent on such content that the AI systems can work well with, staff are able to focus more fully on the more intricate and nuanced collections cataloguing that would otherwise remain in the backlog.
The Gig Economy
During our initial pilot working with the various machine learning platforms, we soon discovered our own technical limitations and budget constraints, which prevented us from hiring local vendors for minor code creation. This led us to consider whether the gig economy would provide a solution. We decided to make this our second pilot project because it looked like it could tick a few strategic boxes and also because of the close link with the initial pilot (Machine Vision).
The gig economy is a growing trend where individuals can sell their services to others via a shared platform. This had promise for the museum in that it would allow us to test technical solutions to our specific problems without a huge investment. It would provide access to a global pool of expertise and enable us to experiment with new, modern ways of working. If this pilot proved successful, we would be seen to be paving the way for the “future museum” by embracing the changes brought about by the digital revolution: keeping pace with the new work environment and making the most of ratepayer funding by paying only for the actual work required, rather than a complex programme of work. Morgan Stanley’s 2018 research suggested that half of the U.S. workforce will be made up of freelancers working in the gig economy by 2027. The appeal is easy to see: this system of work empowers people to use their unique skills on their own terms, working from where they want, when they want.
Within the gig economy, the rates are flexible and often charged by the minute, by the hour, or as a single set price for a one-off job. Although mostly associated with Uber or Airbnb, for our purposes, we became interested in Gig Economy sites such as Fiverr (https://www.fiverr.com/), Upwork (https://www.upwork.com/), and Freelancer (http://www.freelancer.com/). These are platforms that allow users to quickly contact and hire specialists for one-off jobs that range from data-cleanup to graphic design, coding to copy editing and much more.
We needed our collection images to be requested from our API, run through the Microsoft Vision API described above (which would provide tags, captions and confidence scores) and be converted from JSON to a downloadable CSV, ready for manual importing into our source systems. We also wanted error handling and logging. As our team didn’t have the technical expertise (Python coding) to do this, we decided to investigate whether we could use these gig platforms to connect us to a developer who could.
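The JSON-to-CSV step, with the error handling and logging we asked for, might look something like the sketch below. It is not the contractor’s script. The response shape it reads (`description.captions` entries with `text` and `confidence`, plus a `tags` list with `name`) is an assumption about what the vision service returns, the column names in `FIELDS` are hypothetical, and the requests to the collections API and the vision service are left out entirely.

```python
import csv
import io
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("caption_export")

# Hypothetical column names for the import file.
FIELDS = ["image_id", "caption", "caption_confidence", "tags"]


def flatten(image_id, response):
    """Turn one parsed JSON response into a flat row; fail if no caption."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        raise ValueError(f"no caption returned for {image_id}")
    best = max(captions, key=lambda c: c["confidence"])
    tags = [t["name"] for t in response.get("tags", [])]
    return {
        "image_id": image_id,
        "caption": best["text"],
        "caption_confidence": round(best["confidence"], 3),
        "tags": "; ".join(tags),
    }


def to_csv(responses):
    """Write usable responses to CSV text, logging and skipping failures."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for image_id, raw in responses.items():
        try:
            writer.writerow(flatten(image_id, json.loads(raw)))
        except (ValueError, KeyError) as exc:
            # Malformed JSON also lands here (JSONDecodeError is a ValueError).
            log.warning("skipped %s: %s", image_id, exc)
    return buffer.getvalue()


# Illustrative responses only, keyed by a made-up image identifier.
sample = {
    "img-001": json.dumps({
        "description": {"captions": [
            {"text": "a large stone building", "confidence": 0.91}]},
        "tags": [{"name": "building", "confidence": 0.99},
                 {"name": "outdoor", "confidence": 0.95}],
    }),
    "img-002": json.dumps({"description": {"captions": []}, "tags": []}),
}
csv_text = to_csv(sample)
```

Skipping and logging a bad response, rather than stopping the run, suits a batch of several thousand images where a handful of failures should not block the rest.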
As an affordability test for a sector that often operates on a tight budget, we agreed on a budget of $10USD. It took under an hour for us to find someone. They charged $10 to write the required script which would run over the first 5,000 images requested and make them ready to load into our database. This “job” proved to be a simple task for the Fiverr contractor where it would have taken us much longer in-house, costing a lot more in staff time, or we would have had to wait for it to get wrapped into a larger parcel of work for a local contractor.
At this stage, we contacted our people and organisation team to ensure that this was all above board with respect to employment and contracting policies. There were some concerns from the team that we were outsourcing the work to an unknown entity, over whom we had no control or knowledge of working conditions and the like. For the sake of this pilot, we were allowed to go ahead to continue the trial whilst acknowledging the inherent risk.
Once the data was ready to import as a CSV, we could start using those confidence scores to decide which records could be uploaded directly and which required some form of quality control (as they sat around the 60% threshold of that confidence score). This raised the question: could we use the new, digital workforce to help here too? The Fiverr and Upwork sites didn’t seem like the right place for this type of outsourced quality control, as it would be more of an ongoing requirement and a repetitive and simple task that needed human reasoning rather than a piece of code.
Amazon Mechanical Turk appeared to be an answer to this new problem as it provides access to a huge pool of people interested in completing simple tasks that require human intelligence. The site works as part of the gig economy; however, the jobs pay on average between $.01USD and $.10USD each and can be picked up by multiple people on an ongoing basis. The site has been around since 2005 and works on the principle of users repeating thousands of discrete, simple tasks to make money. Amazon manages the platform and marketplace, and collects a commission on the work completed. On the face of it, this seemed ideal for our purpose.
The task we needed completed was for someone to confirm whether the machine-created caption matched the image with a simple “yes” or “no” response. In theory, we would take all the “yes” responses and upload those records into our system. The “no” responses would go back into the cataloguing backlog for future review by a museum staff member.
Before the work started, we created an account and decided to complete some tasks ourselves to trial how the process would work so that we could efficiently design and describe our project. In the example we selected, the task was to complete 20 lines of OCR transcription work for $.08USD. This test quickly raised a red flag for us. At that rate of pay, we would need to complete around 4,000 lines an hour to meet the NZ minimum wage. This seemed a disproportionate effort for minimal reward, and no one in our pilot team came close to making that target. On further investigation, we found that the unsavoury side of the micro gig economy is well-documented and is something that we as a sector need to be aware of. The people competing for these jobs aren’t employees of either Amazon or the company requesting and providing the tasks. Therefore, they don’t receive any of the benefits or protection that would be standard for an in-house employee. For instance, there is no guaranteed minimum wage or paid leave. The work is often labour intensive, and the workers are operating in a competitive and volatile work environment with many tasks taking longer than expected. The requester can even decline the work which has been completed by a worker if they don’t think it meets their requirements, and in doing so they don’t have to pay the worker. A study by the Pew Research Center showed that 29% of users had not received payment for a job at some point in their time working on Mechanical Turk (Hitlin, 2016).
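The pay-rate arithmetic in the transcription test above can be checked with a short calculation. The task price and line count come from the text; the hourly target of 16.00 is an assumption, chosen only because it reproduces the figure of around 4,000 lines, since the text does not state the wage or the exchange rate used.

```python
def lines_per_hour_needed(target_hourly, pay_per_task, lines_per_task):
    """Lines a worker must transcribe each hour to earn target_hourly."""
    pay_per_line = pay_per_task / lines_per_task
    return target_hourly / pay_per_line


PAY_PER_TASK = 0.08    # USD per task, from the text
LINES_PER_TASK = 20    # lines per task, from the text
TARGET_HOURLY = 16.00  # assumed hourly target; not stated in the text

required = lines_per_hour_needed(TARGET_HOURLY, PAY_PER_TASK, LINES_PER_TASK)
seconds_per_line = 3600 / required  # time available for each line

print(f"{required:.0f} lines per hour, {seconds_per_line:.2f} s per line")
```

Under that assumption, a worker would have less than one second to transcribe each line, which is why no one on the pilot team came close.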
If museums choose to use the platform, we could pay a higher rate for work, ensuring that people can make a living wage on the jobs that we post. The bigger question, though, is whether we should support this volatile work environment at all. Unlike gig economy sites such as Fiverr and Upwork, the power and security are all in the favour of the requester, and the often low rate of pay is indicative of the exploitation of an unprotected workforce.
We can remind ourselves of the role museums play in society and, in particular, the code of ethics that we abide by:
Members of the museum profession should observe accepted standards and laws and uphold the dignity and honour of their profession. They should safeguard the public against illegal or unethical professional conduct. (ICOM Code of Ethics 8).
Looking at the three original questions posed by Daugherty and Wilson and the code of ethics above, the use of the micro-gig economy seems problematic to say the least. The ICOM Code is most easily considered and applied in face-to-face situations and transactions. Working with a digital global workforce in this way means that we can’t necessarily maintain the ethical standards that we uphold as part of the museum sector or in our own local contexts. By using micro-gigs, we could be crossing the line of our code of ethics by supporting a system that often does not pay a fair wage, offers no employee protections, and appears to facilitate the exploitation of its online workforce. On the other hand, the approach potentially offers work and payment to those who might otherwise have none. An attribute of the micro-gig arrangement is that the parties usually never meet each other, so it is impossible to verify each other’s context or motives, making it difficult to apply ethical codes. But this situation applies to the whole online and social media world, and debating that is beyond the scope of this paper. At best, we can say that this is a system that hasn’t yet established a strong foundation of employment best-practice. Though as a requester we could pay more for our posted jobs, we would still be using a system that actively promotes $.10USD as a starting wage. Ultimately, as a sector, we are here to do good and to support a better and sustainable society, and as such, our project chose not to continue the pilot to use Amazon Mechanical Turk. We may reconsider as legislation catches up with this new way of working.
As the way in which we communicate, collect, and share our human story changes in this digital age, the rate and range of collecting and digital archiving increases exponentially for museums. In many ways, the ambition of the contemporary museum has grown. Not only are records of our world far more numerous, we challenge ourselves to do more to share them, give access to them, and use them. If we are to keep in step with this shift from keepers to sharers (Clare, 2014), we need to embrace the possibilities of new tools for us to adapt and expand our working processes. Machine learning and the gig economy both offer opportunities for museums to tackle our cataloguing backlogs. We should not be afraid to explore, to trial and test new solutions to old problems. We have a strong code of ethics to guide us as we make decisions and implement solutions. When trialling new technology, we should keep our eyes open to the inherent bias and the bigger picture for those unseen workers.
At this stage, we have decided to continue working with sites such as Fiverr, as the user can set their own price and terms. We are happy with the way machine learning is helping with image tagging and will continue to investigate how we can improve and expand its use. Our next pilot is looking at how we could use crowdsourcing and chatbots to help work through the quality control issues. We made the decision to not use micro gigs.
Solving the backlog and opening the collections is important, but more so is paying a fair wage and abiding by international museum ethics and national employment regulations.
- Auckland Museum. (2012). “Future Museum Strategy.” Published 2012. Consulted December 20, 2018. Available at: https://www.aucklandmuseum.com/getmedia/453249c8-73a5-44a8-a4ba-055cf737465d/auckland-museum-future-museum-master-plan
- Auckland Museum. (2016). “He aratohu mō te tono i ngā whakaahua Māori—Guide to requesting Māori images.” Available at: http://www.aucklandmuseum.com/getmedia/b55badf0-5d18-40e3-99f8-290760563444/awmm-library-guide-to-requesting-Maori-images-download
- Bradley, R. (2018). “16 Examples of Artificial Intelligence (AI) in Your Everyday Life.” Published September 2018. Consulted January 15, 2019. Available at: https://medium.com/@the_manifest/16-examples-of-artificial-intelligence-ai-in-your-everyday-life-655b2e6a49de
- Buolamwini, J. and Gebru, T. (2018). “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Conference on Fairness, Accountability, and Transparency, New York, NY, February 2018. Consulted January 8, 2019. Available at: http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
- Chan, S. (2008). “OpenCalais meets our museum collection / auto-tagging and semantic parsing of collection data.” Published March 2008. Consulted December 21, 2018. Available at: http://www.freshandnew.org/2008/03/opac20-opencalais-meets-our-museum-collection-auto-tagging-and-semantic-parsing-of-collection-data/
- Clare, R. (2014). “Museum Movement—From Keepers to Sharers: Evolution or Revolution?” Museum ID Magazine. Consulted February 7, 2019. Available at: https://museum-id.com/museum-movement-from-keepers-to-sharers-evolution-or-revolution-by-roy-clare/
- Daugherty, P. and Wilson, H. J. (2018). “Human + Machine: Reimagining Work in the Age of AI.” Harvard Business Review Press.
- Hitlin, P. (2016). “Research in the Crowdsourcing Age: A Case Study.” Published 2016. Consulted January 5, 2019. Available at: http://www.pewinternet.org/2016/07/11/research-in-the-crowdsourcing-age-a-case-study/
- “ICOM Code of Ethics for Museums” (2004), Consulted December 20, 2018. Available at: http://icom-oesterreich.at/sites/icom-oesterreich.at/files/attachments/icom-code-en-web_1.pdf
- Morgan Stanley. (2018). “The Gig Economy Goes Global.” Published June 2018. Consulted January 5, 2019. Available at: https://www.morganstanley.com/ideas/freelance-economy
Websites Used in Projects
Watson Visual Recognition. https://www.ibm.com/watson/services/visual-recognition/
Google Cloud Vision. https://cloud.google.com/vision/
Clarifai Computer Vision AI. https://clarifai.com/
Amazon Mechanical Turk https://www.mturk.com/
Moriarty, Adam. "A Crisis of Capacity: How can Museums use Machine Learning, the Gig Economy and the Power of the Crowd to Tackle Our Backlogs." MW19: MW 2019. Published January 15, 2019.