JaniumEnki 2023

Published in Janium, Aug 30, 2023

Where does JaniumEnki stand today?

The 2023 release of JaniumEnki includes more enhancements to the product than any previous release. We expect it to please current customers and to open the way for adopting the system in new markets with a wider range of use cases.

Let’s recap how JaniumEnki got to this point.

Where are we coming from

JaniumEnki’s original design involved a digital repository that could handle local record formats or Dublin Core records, allowing search by different access points to provide access to the available digital content. The first installed systems provided information from photographic repositories, multimedia archives, and historical records.

Since its first release, JaniumEnki has had a robust indexing and search system that lets users find information quickly.

Evolution

The need to support archival records in ISAD-(G) arose very soon, and we added support for hierarchical documents to address the basic principles of ISAD-(G):

Description proceeds from the general to the specific.
The information must be relevant to the level of description.
Descriptions must be linked between levels, making clear the level of each unit of description.
Information is not repeated between different levels.

These enhancements allowed us to serve archives that required a tool to handle archival records and, at the same time, offer digital content linked to those records.
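
As an illustration of the four principles above, here is a minimal sketch of a hierarchical description in Python. It is not JaniumEnki's data model: the DescriptionUnit class, its field names, and the sample archive are all hypothetical. Each unit stores only the information relevant to its own level, is linked to its parent, and looks up shared values in its ancestors instead of repeating them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class DescriptionUnit:
    """One unit of description in a hierarchy (hypothetical model)."""
    level: str                                   # e.g. "fonds", "series", "item"
    title: str
    fields: dict = field(default_factory=dict)   # only data relevant to this level
    parent: Optional["DescriptionUnit"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, child: "DescriptionUnit") -> "DescriptionUnit":
        # Link the two levels explicitly so the position of each unit is clear.
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list:
        """Labels from the most general unit down to this one."""
        node, labels = self, []
        while node is not None:
            labels.append(f"{node.level}: {node.title}")
            node = node.parent
        return list(reversed(labels))

    def resolve(self, name: str):
        """Find a field here or in an ancestor, so lower levels need not repeat it."""
        node = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        return None


# Hypothetical sample data: general to specific.
fonds = DescriptionUnit("fonds", "Company archive", {"creator": "Example Corp"})
series = fonds.add_child(DescriptionUnit("series", "Event photographs"))
item = series.add_child(
    DescriptionUnit("item", "Annual meeting, group photo", {"date": "1998"})
)

print(item.path())
print(item.resolve("creator"))  # stored once on the fonds, inherited by the item
```

Storing the creator only at the fonds level and resolving it upwards is one simple way to honour non-repetition while still displaying complete information for an item.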

Current Status

JaniumEnki has evolved into a powerful platform for publishing digital content linked to records in an information repository or to documents that describe an archival fonds. The system is a tool for knowledge mining in the organisation. Knowledge mining takes the data collected from the machine learning services, organises and indexes it, and uses it to uncover relationships and patterns within the documents.

Record formats

JaniumEnki offers several configurations for records in the database. Different formats can coexist in the database per the specific requirements of each installation.

System administrators can add formats to the system configuration.

The formats delivered as part of the base system are:

MARC21
UNIMARC
Dublin Core
Dublin Core (extended)
ISAD-(G)

The system provides authority control and supports multiple thesauri. The formats for controlled vocabulary are:

MARC21
UNIMARC
ISAAR
Thesaurus ISO 25964 (multilingual)
Person
Corporate
Geographical
Subject
Others (configured as required)

The following formats are supplied to support ISAD-(G) and ISAAR records:

ISDF
ISDIAH

Content enrichment with machine learning

This process adds information obtained from alternate sources to the records stored in the system. The aggregated information complements the records, helps identify their properties, and adds access points to the documents.

The information is aggregated as an external source to preserve the integrity of the original record.
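
One way to picture this separation is the following minimal Python sketch. It is an assumption about how such a layer could look, not a description of JaniumEnki's internals: the EnrichedRecord class, the "subject" field, and the source name "image-tagging" are hypothetical. The catalogued record is held as a read-only mapping, and enrichment data is kept alongside it, grouped by source.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class EnrichedRecord:
    """A catalogued record plus enrichment kept in a separate layer (hypothetical)."""
    original: MappingProxyType                        # read-only view of the record
    enrichments: dict = field(default_factory=dict)   # source name -> list of terms

    def add_enrichment(self, source: str, terms: list) -> None:
        # Enrichment is stored per source and never written into the original.
        self.enrichments.setdefault(source, []).extend(terms)

    def access_points(self) -> set:
        """Searchable terms: the original subjects plus every enrichment source."""
        points = set(self.original.get("subject", ()))
        for terms in self.enrichments.values():
            points.update(terms)
        return points


# Hypothetical sample record and enrichment.
record = EnrichedRecord(
    MappingProxyType({"title": "Harbour at dawn", "subject": ("harbour",)})
)
record.add_enrichment("image-tagging", ["ship", "crane"])

print(sorted(record.access_points()))
print(dict(record.original))  # unchanged by the enrichment
```

Because the enrichment is keyed by source, data from one service can be refreshed or discarded without touching the record or the output of other services.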

Currently, machine learning services from Amazon, Google, and Microsoft Azure can process information sent from our systems. Available services include:

Image recognition
Text analysis of documents
Audio-to-text transcriptions
Video analysis

Data Sources

Additional data is collected from specialised services. These range from artificial intelligence systems for image recognition and facial recognition to text analysis and the tagging of people, places, companies, etc.

Data already available in the system can also enrich records: it widens the search access points and embeds thesaurus information to enhance the hierarchical display of documents.

The enrichment data helps to surface relationships between documents that would otherwise be lost to researchers.

Use cases

Visual recognition

Photos stored in the system as digital content can be sent to external machine learning services to be tagged by the content or subject of the photograph, for example, places, text on billboards or advertisements, people, etc.

Facial recognition

People can be identified in photographs taken at, for example, company events, congresses, political rallies, or sports events.

Image tagging

Image tagging is the process of obtaining tags that describe the image. The tags are indexed to improve the access to the records.

Thesaurus

Document enrichment using thesaurus information is an example of using information already available in the system to improve access to the records.

Thesauri are part of the authority control system and hold information that can be used to enrich the records. For example, a thesaurus entry can have tags that store its knowledge area and subarea. Furthermore, the terms for the entry can be available in different languages.
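
The idea can be sketched in a few lines of Python. This is an illustrative assumption only: the ThesaurusEntry class, the expand_terms function, and the sample entry are hypothetical and do not reflect JaniumEnki's actual thesaurus format. For each controlled subject on a record, the sketch adds the entry's knowledge area, subarea, and translated labels as extra index terms.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """A simplified thesaurus entry (hypothetical structure)."""
    term: str
    area: str                                     # knowledge area
    subarea: str                                  # knowledge subarea
    labels: dict = field(default_factory=dict)    # language code -> translated term


def expand_terms(subjects: list, thesaurus: dict) -> set:
    """Return the extra index terms contributed by the thesaurus."""
    extra = set()
    for subject in subjects:
        entry = thesaurus.get(subject)
        if entry is None:
            continue  # uncontrolled term: the thesaurus has nothing to add
        extra.update(entry.labels.values())
        extra.update({entry.area, entry.subarea})
    # Do not report terms the record already carries.
    return extra - set(subjects)


# Hypothetical sample thesaurus with one multilingual entry.
thesaurus = {
    "photography": ThesaurusEntry(
        "photography", "Arts", "Visual arts",
        {"es": "fotografía", "fr": "photographie"},
    ),
}

extra = expand_terms(["photography", "harbour"], thesaurus)
print(sorted(extra))
```

With terms like these indexed, a search in Spanish or a browse by knowledge area could reach a record that was catalogued only with the English subject.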

Document analysis

Systems that store documents can connect to machine learning services to perform an analysis of the content of the document and retrieve tags for places, personal, corporate, and government names.

Thesaurus entries can be assigned depending on the value of those tags.

These tags and thesaurus entries enhance the access points to the records without needing a person to read each document and catalogue it according to its contents.
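
A minimal sketch of that assignment step, in Python, might look like the following. Everything in it is assumed for illustration: the (type, text) shape of the tags, the assign_entries function, the matching by normalised name, and the entry identifiers such as "GEO-001" are hypothetical, and real machine learning services return richer results.

```python
def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.casefold().split())


def assign_entries(entities: list, authority: dict) -> tuple:
    """Match extracted tags against authority entries.

    entities:  list of (type, text) pairs, e.g. ("place", "Mexico City")
    authority: dict mapping (type, normalised name) -> entry identifier
    Returns (assigned entry identifiers, tags with no matching entry).
    """
    assigned, unmatched = [], []
    for kind, text in entities:
        key = (kind, normalise(text))
        if key in authority:
            assigned.append(authority[key])
        else:
            unmatched.append((kind, text))  # left for a cataloguer to review
    return assigned, unmatched


# Hypothetical authority file and tags returned by a document analysis service.
authority = {
    ("place", "mexico city"): "GEO-001",
    ("corporate", "example corp"): "CORP-007",
}
entities = [
    ("place", "Mexico  City"),
    ("person", "Jane Doe"),
    ("corporate", "EXAMPLE CORP"),
]

assigned, unmatched = assign_entries(entities, authority)
print(assigned)
print(unmatched)
```

Keeping the unmatched tags separate leaves room for a person to review only the names the authority file does not yet cover.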