Game Library Service / new module library demo

uckelman · June 9, 2024, 7:42pm

A demo version of the Game Library Service, along with a frontend, is available now:

This is a demo. It is not current with the existing Module Library. (The demo uses a dump of the Module Library from a few weeks ago.)

I’m looking for feedback on design and systematic import problems. Please comment here.

Tim_M · June 9, 2024, 8:54pm

Displaying only 10 at a time is cumbersome. Even if I select everything under the letter “C”, I have to click through about 20 next pages to find the module I’m looking for…

Box covers aren’t working

Not sure why, but there appear to be ‘gaps’ in the project numbers. Also, what value do they provide? It appears you can’t use that number for anything…

Only some packages appear to be available…? (very few actually)

There is a spacing issue in projects, where Owners are butted right up against the entry, whereas packages seem to space the info properly (one whitespace).

Are we dumping screenshots and comments?

How is this controlled?

More to come…

uckelman · June 9, 2024, 9:06pm

You can add a limit=n query parameter to try different quantities of results. I set the default to ten not because I thought it was the ideal amount, but because I wanted to prompt this conversation. What do you think would be better?
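
As a rough illustration of the limit=n parameter, the sketch below sets it on a listing URL using only Python's standard library. The URL is a placeholder rather than the demo's actual endpoint, and `with_limit` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(url: str, n: int) -> str:
    """Return url with its limit query parameter set to n (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing query parameter except a previous limit.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "limit"]
    query.append(("limit", str(n)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute the address of the demo's project listing.
print(with_limit("https://example.org/projects?q=C", 50))
```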

That’s because they also aren’t working in the wiki right now, which I hadn’t known. ~~No idea why not, yet.~~

Try now. The problem was that the SSL certificate for object storage expired in the past few minutes. Apparently the cron script which should have updated it in May didn’t do so. I ran the script manually to upload the new certificate; images should work again.

uckelman · June 9, 2024, 9:31pm

The gaps are due to some pages from the wiki which don’t import successfully yet.

The project numbers are really project names, which at present I’m generating by counting through the projects because I need to generate them somehow and didn’t want to mess with anything more complicated to start with. They don’t need to be numbers, but I don’t have a good way of generating project names from the data right now.

Would you give an example?

Images other than box images are imported but not displayed by the frontend yet.

What do you mean by “comments”?

Not sure what you mean here.

uckelman · June 9, 2024, 10:19pm

I added some spacing. Does it look better now?

Tim_M · June 10, 2024, 1:26am

You said -

“You can add a limit=n query parameter to try different quantities of results. I set the default to ten not because I thought it was the ideal amount, but because I wanted to prompt this conversation. What do you think would be better?”

So this is a case where 10 would be an unsatisfactory limit. Think of a job search: it defaults to 10, but allows the user to show 20, or “all”, or whatever. That option needs to be available as a user dropdown interaction, or they will be clicking next page for eternity and getting ticked off.

Box covers do seem to be working now. I would add explicit limits.
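
One way to read "explicit limits" is a fixed set of page sizes that a dropdown offers and that the server enforces. The sketch below assumes that reading. Only the default of 10 comes from the thread. The other sizes and the name `resolve_limit` are made up for illustration.

```python
DEFAULT_LIMIT = 10                   # the demo's current default, per the thread
ALLOWED_LIMITS = (10, 20, 50, 100)   # assumed dropdown choices, not from the demo


def resolve_limit(raw):
    """Map a raw limit query value onto one of the allowed page sizes."""
    try:
        n = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_LIMIT         # missing or non-numeric value
    if n <= 0:
        return DEFAULT_LIMIT
    # Round up to the nearest allowed size, capping at the largest one.
    for size in ALLOWED_LIMITS:
        if n <= size:
            return size
    return ALLOWED_LIMITS[-1]
```

An “all” option could be added as one more choice, at the cost of unbounded response sizes.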

Tim_M · June 10, 2024, 1:30am

OK - makes sense

Tim_M · June 10, 2024, 1:31am

It does look good now

Tim_M · June 10, 2024, 1:33am

Currently the library has a “Comments” section and a “Change Log” section where the user has basically provided update, reason, whatever for the latest change etc…

This does not seem to be present

Tim_M · June 10, 2024, 1:38am

Go to any project page with multiple modules/ extensions etc… and it seems only certain files are available - the rest are not. https://vassalengine.org/test/gl/projects/3333

Here is a good example. Where are the spanish versions? They are available on the current game page…

Tim_M · June 10, 2024, 1:51am

How are the pages created? How are they moderated? This might be a good time to reach out to Scott Alden and create a relationship that links to his db. Page creation could become as simple as a click with a question “BGG Link?”. That prevents page duplication if the page already exists. We can expand metadata for multiple publishers, categories, genre descriptions etc. instead of being restricted to one entry.

A new page could become a very simple point-and-click: fill the gaps from a template and it is live. No moderation is ever needed if the page doesn’t already exist for the input question, and so on.

It is easily arguable to Scott that Vassal houses the largest number of games, bar none: more than BGA, Tabletopia, TTS, whatever, combined. Its sheer volume probably makes it one of the biggest platforms available. We could maybe make a benefit here for all.

uckelman · June 10, 2024, 1:33pm

What’s an example that’s missing? (I’m going to ask for an example for everything, btw.)

RobS · June 10, 2024, 2:04pm

Tim’s “Comments” texts appear to be the same as the new “Readme” texts.
I’m not sure what Tim’s “Change Log” refers to, but maybe it’s the “Packages” text in the new version?

cholmcc · June 10, 2024, 10:31pm

Hi,

Some observations

It seems to me that the README sections are to be formatted as Markdown, which is great - somewhat easier to deal with than the MediaWiki format.

I took a look at the entry for Napoleon at Waterloo.

It seems to me that not all MediaWiki formatting has survived the translation to Markdown. For example, headers (=== fubar ===) are put in verbatim, and tables ({|…|}) are messed up too. It looks like you have used

pandoc -f mediawiki -t markdown input.mediawiki -o README.md

Perhaps do

pandoc -f mediawiki -t markdown-simple_tables input.mediawiki -o README.md

to salvage tables and headers. Of course, the {{GameInfo|...}} and {{ModuleFilesTable2}}...|} templates need to be parsed out separately, as well as the <gallery>...</gallery> tag, which haven’t survived either. For example

#!/usr/bin/env python
import mwparserfromhell as mw

with open('NaW.mediawiki', 'r') as file:
    text = file.read()

code = mw.parse(text)

# Remove the templates that are handled separately
names = ['ModuleFilesTable2', 'ModuleVersion2', 'ModuleFile2', 'email', 'GameInfo']
for t in code.filter_templates(recursive=False):
    if str(t.name).strip() in names:
        code.replace(t, '')

# Turn each <gallery> tag into a list of Markdown images
for t in code.filter_tags():
    if str(t.tag) != 'gallery':
        continue

    tt = []
    for e in str(t.contents).split('\n'):
        if e == '':
            continue
        fields = e.split('|')
        img = fields[0].replace('Image:', '')
        alt = '' if len(fields) < 2 else fields[1]
        tt.append(f'![{alt}]({img})')

    code.replace(t, '\n'.join(tt))

with open('NaW.mediawiki.cleaned', 'w') as file:
    file.write(str(code))

and then run pandoc on that parsed file (NaW.mediawiki.cleaned).

In other module pages, some modules have disappeared (see e.g. Afrika Korps or Strike Force One), or the different versions have been messed up (see e.g. D-Day (Smithsonian)).

Searching for D-Day gives the error

ke: 500 Internal Server Error: error returned from database: (code: 1) no such column: Day
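
The message above would be consistent with the search text being handed unquoted to an SQLite full-text MATCH expression, where characters such as “-” are query syntax. That is only a guess, since the thread does not say how search is implemented. Assuming an FTS5 table, a minimal sketch of quoting each word as an FTS5 string could look like this (`fts5_quote`, `search` and the table name `projects_fts` are hypothetical):

```python
import sqlite3


def fts5_quote(term: str) -> str:
    """Turn free text into an FTS5 query made of quoted string tokens."""
    # Each whitespace-separated word becomes a double-quoted FTS5 string;
    # embedded double quotes are escaped by doubling them.
    return " ".join('"' + word.replace('"', '""') + '"' for word in term.split())


def search(conn: sqlite3.Connection, term: str):
    # projects_fts is a hypothetical FTS5 table name, not taken from the demo.
    return conn.execute(
        "SELECT rowid FROM projects_fts WHERE projects_fts MATCH ?",
        (fts5_quote(term),),
    ).fetchall()
```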

Just my 2¢

Yours,
Christian

uckelman · June 10, 2024, 10:55pm

I’ve made some progress on repairing the headers.

cholmcc · June 11, 2024, 7:43am

Hi again,

Again looking at Napoleon at Waterloo I see you fixed up the header thing. However, the tables are formatted as HTML rather than Markdown.

Are you “rolling your own” parsing? If so, why? It seems much more reasonable to me to use existing tools such as pandoc, possibly with some pre-parsing in Python (as in my previous message) or the like. I guess you are splitting the Wiki pages into several database tables - one for “packages” (shouldn’t it be “modules”?), one for README.md, and so on.

Perhaps something like the below would do most of what you need:

#!/usr/bin/env python
from mwparserfromhell import parse
from json import dumps

def do_gameinfo(code):
    gi = code.filter_templates(matches=lambda n : n.name=='GameInfo')
    if not gi or len(gi) < 1:
        raise RuntimeError('No GameInfo')

    gi = gi[0]

    # .value drops the "key=" prefix that str() of a named parameter includes
    ret = {k: str(gi.get(k).value).strip() for k in
           ['image',
            'publisher',
            'year',
            'era',
            'topic',
            'series',
            'scale',
            'players',
            'length']
           if gi.has(k)}

    code.replace(gi,'')

    return ret

def do_emails(text):
    main = parse(text)

    eml = main.filter_templates()
    return [{'name': str(e.params[1]),
             'address': str(e.params[0])}
            for e in eml]
    

def do_modules(code):
    names = ['ModuleFilesTable2',
             'ModuleVersion2',
             'ModuleFile2']
    tmpl = code.filter_templates(matches=lambda n : n.name in names,
                                 recursive=False)

    tab = None
    cur = None
    for tm in tmpl:
        if tm.name == names[0]:
            tab = {}
            continue

        if tab is None:
            raise RuntimeError(f'{tm.name} seen before {names[0]}')


        if tm.name == names[1]:
            cur  = {}
            tab[str(tm.get('version').value).strip()] = cur
            continue

        if cur is None:
            raise RuntimeError(f'No current version')

        # .value drops the "key=" prefix that str() of a named parameter includes
        cur.update({k: str(tm.get(k).value).strip() for k in
                    ['filename',
                     'description',
                     'date',
                     'size',
                     'compatibility']
                    if tm.has(k)})

        cur['maintainers'] = do_emails(str(tm.get('maintainer'))
                                       if tm.has('maintainer') else '')
        cur['contributors'] = do_emails(str(tm.get('contributors')
                                            if tm.has('contributors') else ''))

    for tm in tmpl:
        code.replace(tm, '')

    return tab

def do_gallery(code):
    tags = code.filter_tags(matches = lambda n: n.tag == 'gallery')

    if not tags:
        return []

    def extract(e):
        fields = e.split('|')
        img    = fields[0].replace('Image:','')
        alt    = '' if len(fields) < 2 else fields[1]

        return {'img': img, 'alt': alt}
            
    ret = [
        extract(e)
        for tag in tags
        for e in tag.contents.split('\n')
        if e != ''
    ]
    for tag in tags:
        code.replace(tag, '')

    return ret

def do_players(code):
    tags = code.filter_tags(matches = lambda n: n.tag == 'div')

    if not tags:
        return []

    ret = [
        do_emails(tag.contents) for tag in tags
        if tag.contents != ''
    ]

    for tag in tags:
        code.replace(tag, '')

    return ret

def do_readme(code):
    from tempfile import mkstemp
    from subprocess import Popen, PIPE
    from os import unlink
    

    tmp, tmpnam = mkstemp(text=True)
    with open(tmp,'w') as tmpfile:
        tmpfile.write(str(code))

    cmd = ['pandoc',
           '--from', 'mediawiki',
           '--to', 'markdown-simple_tables',
           tmpnam]
    out,err = Popen(cmd, stdout=PIPE,stderr=PIPE).communicate()

    unlink(tmpnam)

    return out.decode().replace(r'\|}','')

def convert(inp,md,js):
    
    text = inp.read()

    code     = parse(text)
    gameinfo = do_gameinfo(code)
    modules  = do_modules(code)
    gallery  = do_gallery(code)
    players  = do_players(code)
    readme   = do_readme(code)
    
    game     = {'info': gameinfo,
                'modules': modules,
                'gallery': gallery,
                'players': players }

    js.write(dumps(game,indent=2))
    md.write(readme)

    

if __name__ == '__main__':
    from argparse import ArgumentParser, FileType

    ap = ArgumentParser(description='Convert')
    ap.add_argument('input',type=FileType('r'),
                    help='Input media wiki')
    ap.add_argument('readme',type=FileType('w'),
                    help='Output markdown')
    ap.add_argument('json',type=FileType('w'),
                    help='Output JSON')

    args = ap.parse_args()

    convert(args.input,args.readme,args.json)

Give it an input MediaWiki file and two output files: the Markdown README and a JSON file for the game data.

Yours,
Christian

uckelman · June 11, 2024, 11:03am

That is what I’m doing.

Converting entire pages to markdown isn’t desirable because many sections (e.g., what are in the wiki now as the GameInfo, ModuleContactInfo, and ModuleFile templates) aren’t needed as markdown and they’re already in a regular format in the input.

The code is here.

cholmcc · June 11, 2024, 12:04pm

I guess you didn’t really look at what the code I posted actually does, because it does not convert the entire MediaWiki page to Markdown - only the text part.

Suppose you have a wiki page in say NaW.mediawiki, then the code

parses that using mwparserfromhell
Extracts game information from the {{GameInfo}} template
Extracts module information from the {{ModuleFilesTable2}} (and {{ModuleVersion2}} and {{ModuleFile2}}) template(s) (this should of course also handle the older {{ModuleFilesTable}} template).
Extracts the gallery information from the <gallery>...</gallery> tag
Extracts the player information from the <div>...</div> tag
These pieces of information (game info, module info, gallery, players) are then written to a JSON file, and the corresponding content is removed from the MediaWiki text.
The trimmed-down MediaWiki text is then passed to pandoc to produce a Markdown document which is written to disk.

The point is that mwparserfromhell knows how to handle templates, and pandoc knows how to turn the rest into Markdown, without us needing to code a whole lot ourselves.

A lot of your code could be replaced by what I posted in the previous message, i.e., by using Python library functions (mwparserfromhell) and passing the massaged text to pandoc. Of course, the code posted assumes you have individual files for each MediaWiki page, at least if it were to be run as a standalone program, but it is easy enough to use the various do_... functions from some other script.

I think it could be beneficial for you to do the conversion from MediaWiki to SQLite DB in a few steps rather than in a single app. For example,

download all MediaWiki pages locally
Parse out game, module, etc. info as JSON, and make README.md from each page
Populate SQLite database from the written JSON files and README.mds
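
The last of these steps could be sketched roughly as follows. The layout is an assumption: one <name>.json and one <name>.md per page in a single directory, with the JSON shaped like the output of the script in the previous message. The table schema and the name `load_projects` are hypothetical and not taken from the actual service.

```python
import json
import sqlite3
from pathlib import Path


def load_projects(src_dir: Path, conn: sqlite3.Connection) -> int:
    """Load <name>.json + <name>.md pairs from src_dir into SQLite."""
    # Hypothetical single-table schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        " name TEXT PRIMARY KEY, info TEXT NOT NULL, readme TEXT NOT NULL)"
    )
    count = 0
    for js_path in sorted(src_dir.glob("*.json")):
        md_path = js_path.with_suffix(".md")
        if not md_path.exists():
            continue  # skip pages for which no README was produced
        game = json.loads(js_path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT OR REPLACE INTO projects (name, info, readme) VALUES (?, ?, ?)",
            (js_path.stem,
             json.dumps(game.get("info", {})),
             md_path.read_text(encoding="utf-8")),
        )
        count += 1
    conn.commit()
    return count
```

Keeping the intermediate files on disk means a failed import can be rerun from the second or third step without downloading the wiki again.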

BTW

foo  = 'foo'
Foo = foo.capitalize()
assert Foo == 'Foo'

I see you use pypandoc. Perhaps consider disabling simple_tables with that, because they rarely come out well.
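
For reference, pandoc extensions are switched off by appending -name to the format string, and the same string can be passed to pypandoc. A minimal sketch, where `pandoc_format` and `mediawiki_to_markdown` are hypothetical helper names:

```python
def pandoc_format(base: str, disabled=(), enabled=()) -> str:
    """Build a pandoc format string such as 'markdown-simple_tables'."""
    return (base
            + "".join(f"-{ext}" for ext in disabled)
            + "".join(f"+{ext}" for ext in enabled))


def mediawiki_to_markdown(text: str) -> str:
    # Imported lazily so the helper above works without pypandoc installed.
    import pypandoc
    return pypandoc.convert_text(
        text,
        to=pandoc_format("markdown", disabled=["simple_tables"]),
        format="mediawiki",
    )
```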

Yours,
Christian

uckelman · June 11, 2024, 12:13pm

Would you explain why?

uckelman · June 11, 2024, 12:16pm

>>> 'aBC'.capitalize()
'Abc'

That’s why I want only the first character.
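
A small sketch of the difference, with `upper_first` as a hypothetical helper that changes only the first character:

```python
def upper_first(s: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return s[:1].upper() + s[1:]


print(upper_first("aBC"))    # ABC
print("aBC".capitalize())    # Abc
```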