Previously, we gave a quick tour of the KvasirA API and how to use it. In this post, we’ll show how to use the API to build a simple command-line client for KvasirA in Python 3.
Before we begin, you may need to install the requests library, like so:
pip3 install requests
As a quick reminder, to get the list of available public document libraries, we need to perform an HTTP GET request to the API at the address
https://demo.kvasira.com/api/libraries/
In Python, this request is easy to make with the requests library:
import json
import requests

api_url = 'https://demo.kvasira.com/api/libraries'
r = requests.get(api_url, headers={
    'Content-Type': 'application/json',
})
response = json.loads(r.text)
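The same request can be written a little more defensively. The sketch below is one possible variant, not part of the KvasirA documentation: it adds a timeout, lets requests decode the JSON body, and returns an empty list on any failure. The function name fetch_libraries, the injectable get parameter, and the 10-second timeout are assumptions made for illustration.

```python
import requests

API_URL = 'https://demo.kvasira.com/api/libraries'


def fetch_libraries(get=requests.get, api_url=API_URL, timeout=10):
    """Return the list under 'data', or [] if the request fails.

    `get` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    try:
        # Without a timeout, requests may wait indefinitely for a reply.
        r = get(api_url, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions.
        r.raise_for_status()
        # r.json() decodes the response body for us.
        return r.json().get('data', [])
    except (requests.RequestException, ValueError):
        # Network errors, HTTP errors, or a body that is not valid JSON.
        return []
```

Calling fetch_libraries() with no arguments performs the real request; passing a stand-in for get makes the function easy to test offline.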
A successful request returns a status code of 200. In that case, printing the available document libraries in alphabetical order takes only a few lines:
from operator import itemgetter

libraries = response['data'] if r.status_code == 200 else []
for library in sorted(libraries, key=itemgetter('title')):
    title, library_id, running, description = \
        library['title'], library['id'], library['running'], library['description']
    # Only list libraries that are currently running
    if running:
        print(f'{title} ({library_id}) - {description}')
The output should look as follows:
ArXiv (arxiv) - ArXiv papers
Hansard (hansard) - UK House of Commons speeches 1979-2018
Patents (patents) - USPTO patents - data provided by PatentsView
RFC (rfc) - Internet RFCs and drafts
Twitter (twitter) - A random sample of tweets from 2017-2018
Wiki (AR) (arwiki) - Arabic Wikipedia articles
Wiki (EN) (enwiki) - English Wikipedia articles
Wiki (ES) (eswiki) - Spanish Wikipedia articles
Wiki (HI) (hiwiki) - Hindi Wikipedia articles
YouTube (youtube) - A sample library of 90 000 YouTube videos
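To reuse the listing elsewhere (for example in tests or a different front end), the formatting can be separated from the printing. The following is a minimal sketch using made-up sample data that mirrors the fields shown above; the function name format_libraries and the only_running parameter are hypothetical.

```python
from operator import itemgetter


def format_libraries(libraries, only_running=True):
    """Return one 'Title (id) - description' line per library, sorted by title."""
    lines = []
    for lib in sorted(libraries, key=itemgetter('title')):
        if only_running and not lib['running']:
            continue  # skip libraries that are not currently running
        lines.append(f"{lib['title']} ({lib['id']}) - {lib['description']}")
    return lines


# Sample data for illustration only; real entries come from response['data'].
sample = [
    {'title': 'RFC', 'id': 'rfc', 'running': True,
     'description': 'Internet RFCs and drafts'},
    {'title': 'ArXiv', 'id': 'arxiv', 'running': True,
     'description': 'ArXiv papers'},
    {'title': 'Offline', 'id': 'offline', 'running': False,
     'description': 'A stopped library'},
]

for line in format_libraries(sample):
    print(line)
# ArXiv (arxiv) - ArXiv papers
# RFC (rfc) - Internet RFCs and drafts
```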
To query a document library, we need to issue a POST request to
https://demo.kvasira.com/api/library/LIBRARY_ID/query?query_type=[url|text]&k=N
where LIBRARY_ID is the id of the document library we want to query, N is the desired number of results, and the query_type parameter specifies the query type (url or text). In Python, this request can be made as follows:
target_url = 'https://en.wikipedia.org/wiki/Merge_sort'
library_id = 'enwiki'
k = 6
call_url = f'https://demo.kvasira.com/api/library/{library_id}/query?query_type=url&k={k}'
# The document to query with is sent as JSON in the request body
r = requests.post(call_url, data=json.dumps({'doc': target_url}), headers={
    'Content-Type': 'application/json',
})
response = json.loads(r.text)
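Formatting the query string by hand works for simple values, but it is easy to get wrong once values need escaping. As a sketch, the URL can instead be assembled with the standard library; build_query_url is a hypothetical helper, and restricting query_type to url and text is an assumption based on the endpoint description above.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://demo.kvasira.com/api/'


def build_query_url(library_id, k, query_type='url'):
    """Build the query endpoint URL for a library."""
    # Assumption: only the two query types named in the endpoint are valid.
    if query_type not in ('url', 'text'):
        raise ValueError(f'unsupported query_type: {query_type!r}')
    params = urlencode({'query_type': query_type, 'k': k})
    # quote() escapes characters that are not safe inside a path segment.
    return f"{BASE_URL}library/{quote(library_id, safe='')}/query?{params}"


print(build_query_url('enwiki', 6))
# https://demo.kvasira.com/api/library/enwiki/query?query_type=url&k=6
```

The requests library can also build the query string itself when the parameters are passed through its params argument.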
Looping over the results and printing them is easy:
if r.status_code == 200:
    for i, result in enumerate(response['response']['results'], start=1):
        title, url = result['title'], result['uri']
        print(f'{i}. {title} - {url}')
This will give us neatly formatted query results, for example:
1. Merge sort - https://en.wikipedia.org/wiki/Merge_sort
2. Sorting algorithm - https://en.wikipedia.org/wiki/Sorting_algorithm
3. Merge algorithm - https://en.wikipedia.org/wiki/Merge_algorithm
4. Insertion sort - https://en.wikipedia.org/wiki/Insertion_sort
5. Quicksort - https://en.wikipedia.org/wiki/Quicksort
6. External sorting - https://en.wikipedia.org/wiki/External_sorting
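The loop above prints nothing at all when the request fails. A small helper can make both outcomes explicit. This is a sketch under assumptions: it presumes, as the complete script further down does, that a failed query carries an error message under the response key. The name extract_results and the sample error text are made up for illustration.

```python
def extract_results(status_code, payload):
    """Return (results, error); exactly one of the two is None."""
    if payload is None:
        return None, 'no response received'
    if status_code != 200:
        # Assumption: on failure, 'response' holds an error message.
        return None, str(payload.get('response', f'HTTP {status_code}'))
    return payload['response']['results'], None


# Sample payloads for illustration only.
ok = {'response': {'results': [
    {'title': 'Merge sort', 'uri': 'https://en.wikipedia.org/wiki/Merge_sort'},
]}}
results, error = extract_results(200, ok)
for i, result in enumerate(results, start=1):
    print(f"{i}. {result['title']} - {result['uri']}")
# 1. Merge sort - https://en.wikipedia.org/wiki/Merge_sort

print(extract_results(404, {'response': 'Library not found'}))
# (None, 'Library not found')
```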
A complete command line application with neater output, error handling and argument parsing is given below. It requires the requests and colorama libraries, which can be installed using the Python package manager pip with pip3 install requests colorama.
import argparse
import json
import requests
import sys
from operator import itemgetter
from colorama import Fore, Style

BASE_URL = 'https://demo.kvasira.com/api/'

def print_query_error(message):
    print(f'{Fore.RED}Query failed: {message}{Style.RESET_ALL}',
          file=sys.stderr)

def print_query_success(response, print_summary):
    results = response['results']
    for i, result in enumerate(results, start=1):
        title, url = result['title'], result['uri']
        # Highlight the title line when a summary follows it
        if print_summary:
            print(Fore.GREEN, end='')
        print(f'{i}. {title} - {url}{Style.RESET_ALL}')
        if print_summary:
            print(result['summary'])
            # Blank line between results, but not after the last one
            if i != len(results):
                print()

def print_libraries(libraries):
    for lib in sorted(libraries, key=itemgetter('title')):
        title, library_id, running, description = \
            lib['title'], lib['id'], lib['running'], lib['description']
        # Green for running libraries, red for stopped ones
        color = Fore.GREEN if running else Fore.RED
        print(f'{color}{title} ({library_id}) - {description}{Style.RESET_ALL}')

def get_libraries():
    call_url = BASE_URL + 'libraries'
    try:
        response = requests.get(call_url, headers={
            'Content-Type': 'application/json',
        })
        return json.loads(response.text), response.status_code
    except (requests.RequestException, ValueError):
        # Connection problem, or a body that is not valid JSON
        return None, None

def query(library_id, url, k):
    call_url = BASE_URL + f'library/{library_id}/query?query_type=url&k={k}'
    try:
        response = requests.post(
            call_url, data=json.dumps({'doc': url}),
            headers={
                'Content-Type': 'application/json',
            })
        return json.loads(response.text), response.status_code
    except (requests.RequestException, ValueError):
        # Connection problem, or a body that is not valid JSON
        return None, None

def check_valid_n(value):
    # argparse reports a ValueError from int() as an invalid argument
    ivalue = int(value)
    if not 1 <= ivalue <= 20:
        raise argparse.ArgumentTypeError(
            f'{value} is invalid -- must be within 1..20')
    return ivalue

def parse_arguments():
    parser = argparse.ArgumentParser(description='KvasirA query tool.')
    parser.add_argument('library', help='the document collection to query')
    parser.add_argument('-u', '--url', help='the URL to query', default='')
    parser.add_argument('-n', '--nresults', type=check_valid_n, default=10,
                        help='number of results')
    # Both flags write to the same destination; summaries are on by default
    parser.add_argument('-s', '--summary', dest='summary', action='store_true',
                        help='display summaries (default)')
    parser.add_argument('--no-summary', dest='summary', action='store_false',
                        help='do not display summaries')
    parser.set_defaults(summary=True)
    return parser.parse_args()

def main():
    args = parse_arguments()
    libraries_response, libraries_status = get_libraries()
    if libraries_response is None or libraries_status != 200:
        print('Unable to fetch document libraries', file=sys.stderr)
        return
    libraries = libraries_response['data']
    # The special name 'libraries' lists the libraries instead of querying
    if args.library == 'libraries':
        print_libraries(libraries)
        return
    match = next((lib for lib in libraries if lib['id'] == args.library), None)
    if match is None:
        print(f'Library {args.library} not found', file=sys.stderr)
        return
    query_response, query_status = query(match['id'], args.url, args.nresults)
    if query_response is None:
        print_query_error('Connection failed')
    elif query_status != 200:
        print_query_error(query_response['response'])
    else:
        print_query_success(query_response['response'], args.summary)

if __name__ == '__main__':
    main()
Let’s save our script as kquery.py. To list the available libraries, run python3 ./kquery.py libraries.
Querying is easy: python3 ./kquery.py enwiki -u https://en.wikipedia.org/wiki/Merge_sort -n 3 prints three results for the Merge sort article.
If you don’t want to see the summaries, you can suppress them with the --no-summary flag.
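The --summary and --no-summary options work as a pair because both write to the same destination, with set_defaults supplying the initial value. On Python 3.9 and later, argparse can generate such a pair from a single declaration. The sketch below is an alternative, not what the script above uses; build_parser is a hypothetical name, and the -s short option is left out for brevity.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='KvasirA query tool.')
    parser.add_argument('library', help='the document collection to query')
    # BooleanOptionalAction (Python 3.9+) registers both --summary and
    # --no-summary for this one argument.
    parser.add_argument('--summary', action=argparse.BooleanOptionalAction,
                        default=True, help='display summaries')
    return parser


parser = build_parser()
print(parser.parse_args(['enwiki']).summary)                  # True
print(parser.parse_args(['enwiki', '--no-summary']).summary)  # False
```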
If you have a great use case for KvasirA in mind and need help integrating it into your own app, contact us at contact@kvasira.com!