Game Library Service / new module library demo

A demo version of the Game Library Service, along with a frontend, is available now:

This is a demo. It is not current with the existing Module Library. (The demo uses a dump of the Module Library from a few weeks ago.)

I’m looking for feedback on design and systematic import problems. Please comment here.

Displaying only 10 at a time is cumbersome. Even if I select all under letter “C” I have to click through something like 20 next pages to find the module I’m looking for…

Box covers aren’t working

Not sure why, but there appear to be ‘gaps’ in the project numbers. Also, what value do they provide? It appears you can’t use that number for anything…

Only some packages appear to be available…? (very few actually)

There is a spacing issue in projects, where Owners are butted right up against the entry, whereas packages seem to space info properly (one whitespace).

Are we dumping screenshots and comments?

How is this controlled?

More to come…

You can add a limit=n query parameter to try different quantities of results. I set the default to ten not because I thought it was the ideal amount, but because I wanted to prompt this conversation. What do you think would be better?
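For anyone who wants to try a few page sizes quickly, here is a minimal sketch that builds such URLs. Only the limit parameter is confirmed above; the base URL is an assumption, taken from the project link that appears elsewhere in this thread, and listing_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed listing URL; only the `limit` query parameter is confirmed in the thread.
BASE_URL = "https://vassalengine.org/test/gl/projects"

def listing_url(limit, base=BASE_URL):
    """Return the listing URL asking for `limit` results per page."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"{base}?{urlencode({'limit': limit})}"

# Print a few candidate page sizes to paste into a browser.
for n in (10, 25, 50, 100):
    print(listing_url(n))
```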

That’s because they also aren’t working in the wiki right now, which I hadn’t known. No idea why not, yet.

Try now. The problem was that the SSL certificate for object storage expired in the past few minutes. Apparently the cron script which should have updated it in May didn’t do so. I ran the script manually to upload the new certificate; images should work again.

The gaps are due to some pages from the wiki which don’t import successfully yet.

The project numbers are really project names, which at present I’m generating by counting through the projects because I need to generate them somehow and didn’t want to mess with anything more complicated to start with. They don’t need to be numbers, but I don’t have a good way of generating project names from the data right now.
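One possible way to derive names from the data, sketched under the assumption that each project has a usable game title: turn the title into a slug and add a numeric suffix on collisions. The function names are hypothetical and this is not what the service currently does.

```python
import re
import unicodedata

def slugify(title):
    """Lowercase ASCII slug, e.g. 'D-Day (Smithsonian)' -> 'd-day-smithsonian'."""
    # Drop accents and any other non-ASCII characters.
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "project"

def unique_name(title, taken):
    """Return a slug not yet in `taken`, appending -2, -3, ... on collision."""
    base = slugify(title)
    name, n = base, 1
    while name in taken:
        n += 1
        name = f"{base}-{n}"
    taken.add(name)
    return name

taken = set()
print(unique_name("Afrika Korps", taken))
print(unique_name("Afrika Korps", taken))
```

The second call gets a suffix because the first name is already taken, so two wiki pages with the same title would not clash.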

Would you give an example?

Images other than box images are imported but not displayed by the frontend yet.

What do you mean by “comments”?

Not sure what you mean here.

I added some spacing. Does it look better now?

You said -

“You can add a limit=n query parameter to try different quantities of results. I set the default to ten not because I thought it was the ideal amount, but because I wanted to prompt this conversation. What do you think would be better?”

So this is a case where 10 would be an unsatisfactory limit. Think of a job search: it defaults to 10, but allows the user to show 20, or “all”, or whatever. That option needs to be available as a user dropdown interaction or they will be clicking next page for eternity and getting ticked off :)

Box covers do seem to be working now. I would add explicit limits.

OK - makes sense

It does look good now

Currently the library has a “Comments” section and a “Change Log” section where the user has basically provided update, reason, whatever for the latest change etc…

This does not seem to be present

Go to any project page with multiple modules/ extensions etc… and it seems only certain files are available - the rest are not. https://vassalengine.org/test/gl/projects/3333

Here is a good example. Where are the spanish versions? They are available on the current game page…

How are the pages created? How are they moderated? This might be a good time to reach out to Scott Alden and create a relationship that links to his db. Page creation could become as simple as a click with a question “BGG Link?”. That prevents page duplication if the page already exists. We can expand metadata for multiple publishers, categories, genre descriptions etc. instead of being restricted to one entry.

A new page could become a very simple point and click fill gaps from a template and is live. no moderation ever needed if it doesn’t exist on the input question and so on

It is easily arguable to Scott that Vassal houses the most games, bar none: more than BGA, Tabletopia, TTS, or whatever combined. Its sheer volume probably makes it one of the biggest platforms available, so we could maybe make this a benefit for all.

What’s an example that’s missing? (I’m going to ask for an example for everything, btw.)

Tim’s “Comments” texts appear to be the same as the new “Readme” texts.
I’m not sure what Tim’s “Change Log” refers to, but maybe it’s the “Packages” text in the new version?

Hi,

Some observations

  • It seems to me that the README sections are to be formatted as Markdown, which is great - somewhat easier to deal with than the MediaWiki format.

I took a look at the entry for Napoleon at Waterloo.

It seems to me that not all MediaWiki formatting has survived the translation to Markdown. For example, headers (=== fubar ===) are put in verbatim, and tables ({|…|}) are messed up too. It looks like you have used

pandoc -f mediawiki -t markdown input.mediawiki -o README.md 

Perhaps do

pandoc -f mediawiki -t markdown-simple_tables input.mediawiki -o README.md 

to salvage tables and headers. Of course, the {{GameInfo|...}} and {{ModuleFilesTable2}}...|} templates need to be parsed out separately, as does the <gallery>...</gallery> tag, since none of these have survived either. For example

#!/usr/bin/env python
import mwparserfromhell as mw

with open('NaW.mediawiki', 'r') as file:
    text = file.read()

code = mw.parse(text)

# Remove the templates that are parsed out separately
for t in code.filter_templates(recursive=False):
    if str(t.name).strip() in ['ModuleFilesTable2', 'ModuleVersion2', 'ModuleFile2', 'email', 'GameInfo']:
        code.replace(t, '')

# Turn each <gallery> entry ("Image:file|caption") into a Markdown image
for t in code.filter_tags():
    if t.tag != 'gallery':
        continue

    tt = []
    for e in str(t.contents).split('\n'):
        if e == '':
            continue
        fields = e.split('|')
        img = fields[0].replace('Image:', '')
        alt = '' if len(fields) < 2 else fields[1]
        tt.append(f'![{alt}]({img})')

    code.replace(t, '\n'.join(tt))

with open('NaW.mediawiki.cleaned', 'w') as file:
    file.write(str(code))

and then run pandoc on that parsed file (NaW.mediawiki.cleaned).

In other module pages, some modules have disappeared (see e.g. Afrika Korps or Strike Force One), or the different versions have been messed up (see e.g. D-Day (Smithsonian)).

Searching for D-Day gives the error

ke: 500 Internal Server Error: error returned from database: (code: 1) no such column: Day 
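The wording of that error resembles what SQLite's FTS5 reports when an unquoted hyphen in a MATCH query is parsed as query syntax rather than as part of the search term. Whether the service actually uses FTS5 is a guess on my part. If it does, a minimal sketch of a fix is to quote each user-supplied term as an FTS5 string before binding it; the helper names below are hypothetical.

```python
def fts5_quote(term):
    """Wrap `term` as an FTS5 string literal, doubling embedded double quotes."""
    return '"' + term.replace('"', '""') + '"'

def fts5_query(user_input):
    """Quote every whitespace-separated term so punctuation is not read as syntax."""
    return " ".join(fts5_quote(t) for t in user_input.split())

# The result would then be bound as a parameter, for example
#   conn.execute("SELECT ... WHERE some_fts_table MATCH ?", (fts5_query(q),))
# where some_fts_table is a placeholder, not the service's real table name.
print(fts5_query("D-Day"))
```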

Just my 2¢

Yours,
Christian

I’ve made some progress on repairing the headers.

Hi again,

Again looking at Napoleon at Waterloo I see you fixed up the header thing. However, the tables are formatted as HTML rather than Markdown.

Are you “rolling your own” parsing? If so, why? It seems much more reasonable to me to use existing tools such as pandoc, possibly with some pre-parsing in Python (as in my previous message) or the like. I guess you are splitting the Wiki pages into several database tables - one for “packages” (shouldn’t it be “modules”?), one for README.md, and so on.

Perhaps something like the below would do most of what you need:

#!/usr/bin/env python
from mwparserfromhell import parse
from json import dumps

def do_gameinfo(code):
    gi = code.filter_templates(matches=lambda n : n.name=='GameInfo')
    if not gi or len(gi) < 1:
        raise RuntimeError('No GameInfo')

    gi = gi[0]

    # Use .value: str() of a named parameter would yield "key=value"
    ret = {k: str(gi.get(k).value).strip() for k in
           ['image',
            'publisher',
            'year',
            'era',
            'topic',
            'series',
            'scale',
            'players',
            'length']
           if gi.has(k)}

    code.replace(gi,'')

    return ret

def do_emails(text):
    main = parse(text)

    eml = main.filter_templates()
    return [{'name': str(e.params[1]),
             'address': str(e.params[0])}
            for e in eml]
    

def do_modules(code):
    names = ['ModuleFilesTable2',
             'ModuleVersion2',
             'ModuleFile2']
    tmpl = code.filter_templates(matches=lambda n : n.name in names,
                                 recursive=False)

    tab = None
    cur = None
    for tm in tmpl:
        if tm.name == names[0]:
            tab = {}
            continue

        if tab is None:
            raise RuntimeError(f'{tm.name} seen before {names[0]}')


        if tm.name == names[1]:
            cur  = {}
            # .value avoids the "version=" prefix in the key
            tab[str(tm.get('version').value).strip()] = cur
            continue

        if cur is None:
            raise RuntimeError('No current version')

        cur.update({k: str(tm.get(k).value).strip() for k in
                    ['filename',
                     'description',
                     'date',
                     'size',
                     'compatibility']
                    if tm.has(k)})

        cur['maintainers'] = do_emails(str(tm.get('maintainer').value)
                                       if tm.has('maintainer') else '')
        cur['contributors'] = do_emails(str(tm.get('contributors').value)
                                        if tm.has('contributors') else '')

    for tm in tmpl:
        code.replace(tm, '')

    return tab

def do_gallery(code):
    tags = code.filter_tags(matches = lambda n: n.tag == 'gallery')

    if not tags:
        return []

    def extract(e):
        fields = e.split('|')
        img    = fields[0].replace('Image:','')
        alt    = '' if len(fields) < 2 else fields[1]

        return {'img': img, 'alt': alt}
            
    ret = [
        extract(e)
        for tag in tags
        for e in tag.contents.split('\n')
        if e != ''
    ]
    for tag in tags:
        code.replace(tag, '')

    return ret

def do_players(code):
    tags = code.filter_tags(matches = lambda n: n.tag == 'div')

    if not tags:
        return []

    ret = [
        do_emails(tag.contents) for tag in tags
        if tag.contents != ''
    ]

    for tag in tags:
        code.replace(tag, '')

    return ret

def do_readme(code):
    from tempfile import mkstemp
    from subprocess import Popen, PIPE
    from os import unlink
    

    tmp, tmpnam = mkstemp(text=True)
    with open(tmp,'w') as tmpfile:
        tmpfile.write(str(code))

    cmd = ['pandoc',
           '--from', 'mediawiki',
           '--to', 'markdown-simple_tables',
           tmpnam]
    out,err = Popen(cmd, stdout=PIPE,stderr=PIPE).communicate()

    unlink(tmpnam)

    return out.decode().replace(r'\|}','')

def convert(inp,md,js):
    
    text = inp.read()

    code     = parse(text)
    gameinfo = do_gameinfo(code)
    modules  = do_modules(code)
    gallery  = do_gallery(code)
    players  = do_players(code)
    readme   = do_readme(code)
    
    game     = {'info': gameinfo,
                'modules': modules,
                'gallery': gallery,
                'players': players }

    js.write(dumps(game,indent=2))
    md.write(readme)

    

if __name__ == '__main__':
    from argparse import ArgumentParser, FileType

    ap = ArgumentParser(description='Convert')
    ap.add_argument('input',type=FileType('r'),
                    help='Input media wiki')
    ap.add_argument('readme',type=FileType('w'),
                    help='Output Markdown')
    ap.add_argument('json',type=FileType('w'),
                    help='Output JSON')

    args = ap.parse_args()

    convert(args.input,args.readme,args.json)

Give an input MediaWiki file and two output files - the Markdown README and a JSON for the game data.

Yours,
Christian

That is what I’m doing.

Converting entire pages to markdown isn’t desirable because many sections (e.g., what are in the wiki now as the GameInfo, ModuleContactInfo, and ModuleFile templates) aren’t needed as markdown and they’re already in a regular format in the input.

The code is here.

I guess you didn’t really look at what the code I posted actually does, because it does not convert the entire MediaWiki page to Markdown - only the text part.

Suppose you have a wiki page in say NaW.mediawiki, then the code

  • parses that using mwparserfromhell
  • Extracts game information from the {{GameInfo}} template
  • Extracts module information from the {{ModuleFilesTable2}} (and {{ModuleVersion2}} and {{ModuleFile2}}) template(s) (this should of course also handle the older {{ModuleFilesTable}} template).
  • Extracts the gallery information from the <gallery>...</gallery> tag
  • Extracts the player information from the <div>...</div> tag
  • These pieces of information (game info, module info, gallery, players) are then written to a JSON file, and the corresponding content is removed from the MediaWiki text.
  • The trimmed-down MediaWiki text is then passed to pandoc to produce a Markdown document which is written to disk.

The point is that mwparserfromhell knows how to handle templates, and pandoc knows how to turn the rest into Markdown, without us needing to code a whole lot ourselves.

A lot of your code could be replaced by what I posted in the previous message, i.e., by using Python library functions (mwparserfromhell) and passing the massaged text to pandoc. Of course, the code posted assumes you have an individual file for each MediaWiki page, at least if it is to be run as a standalone program, but it is easy enough to use the various do_... functions from some other script.

I think it could be beneficial for you to do the conversion from MediaWiki to SQLite DB in a few steps rather than in a single app. For example,

  • download all MediaWiki pages locally
  • Parse out game, module, etc. info as JSON, and make README.md from each page
  • Populate SQLite database from the written JSON files and README.mds
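The last step could be sketched roughly as below. It assumes the JSON layout written by convert() in my previous message (keys info and modules); the table and column names are invented for illustration and are not the service's actual schema, and the sample values are placeholders.

```python
import json
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    info TEXT NOT NULL,      -- GameInfo fields stored as a JSON string
    readme TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS packages (
    project_id INTEGER NOT NULL REFERENCES projects(id),
    version TEXT NOT NULL,
    filename TEXT
);
"""

def load_project(conn, name, game, readme):
    """Insert one parsed page (game dict plus README text) into the database."""
    cur = conn.execute(
        "INSERT INTO projects (name, info, readme) VALUES (?, ?, ?)",
        (name, json.dumps(game.get("info", {})), readme))
    # `modules` maps a version string to that version's file data (may be None).
    rows = [(cur.lastrowid, version, mod.get("filename"))
            for version, mod in (game.get("modules") or {}).items()]
    conn.executemany(
        "INSERT INTO packages (project_id, version, filename) VALUES (?, ?, ?)",
        rows)
    conn.commit()

# Placeholder data standing in for one JSON file and its README.md.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
game = {"info": {"publisher": "Example Publisher"},
        "modules": {"1.0": {"filename": "example.vmod"}}}
load_project(conn, "example-project", game, "# Example\n")
print(conn.execute("SELECT version, filename FROM packages").fetchall())
```

Keeping this as its own step means the database can be rebuilt from the JSON and README files without downloading or parsing the wiki again.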

BTW

foo  = 'foo'
Foo = foo.capitalize()
assert Foo == 'Foo'

I see you use pypandoc. Perhaps consider disabling simple_tables with that, because they rarely come out well.

Yours,
Christian

Would you explain why?

>>> 'aBC'.capitalize()
'Abc'

That’s why I want only the first character.
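To make the difference concrete, a minimal sketch; upper_first is a name made up for this example, not a function from the importer.

```python
def upper_first(s):
    """Uppercase only the first character, leaving the rest untouched."""
    # Slicing with s[:1] also keeps the empty string safe.
    return s[:1].upper() + s[1:]

print("aBC".capitalize())   # lowercases everything after the first character
print(upper_first("aBC"))   # leaves the remaining characters as they were
```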