Screenshot scraping

11 replies 1650 views

klh `56`

4 years ago (pre-Resurrected)

Hi,

reading the trading rules about screenshots, it would be nice if you could upload a fullscreen screenshot with item stats and have the site process it:

simple version: autocrop the screenshot
advanced version: autocrop + read item properties

I think it would also be good if you could select the item you want to sell from a searchable list and have inputs for all the variable properties (the advanced option would then fill those in).

I can help with implementation. I hate PHP, but the autocrop/read-properties functionality should be possible on the front end (which would also mean lower server load than doing it on the back end). To be honest, I'll probably make an open-source Web Component for it, and could then help with integration.

Noemard `92`

4 years ago (pre-Resurrected)

We have a thread related to this (second half of forums/screenshots-of-drops-t3550.html). @Sabcoll and @mengstrom have each trained a Tesseract OCR model to read D2 screenshots and extract the text. I started as well but didn't have time to finish; I did make a demo web app using tesseract.js and the generic English model.

It remains to be determined how well this performs on a full uncropped screenshot, and how a model trained on the Diablo font performs. Depending on the results, it might be necessary to crop beforehand, at least roughly.

Once the text is read, Tesseract includes the coordinates of the bounding box around the text, so it should be pretty easy to get a properly cropped version of the original within a couple of px of what a human would do.
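A minimal sketch of that cropping idea, written in Python only for illustration (the same arithmetic ports directly to JS). It assumes the OCR step yields each text box as `(x0, y0, x1, y1)` pixel corners; the `Box` type, `crop_rect` name, and the default padding are hypothetical.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (x1/y1 are the far edges)."""
    x0: int
    y0: int
    x1: int
    y1: int


def crop_rect(boxes: Iterable[Box], width: int, height: int, pad: int = 4) -> Box:
    """Return the union of all text boxes, padded and clamped to the image."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("no text boxes to crop around")
    # Smallest rectangle that contains every detected text box.
    x0 = min(b.x0 for b in boxes) - pad
    y0 = min(b.y0 for b in boxes) - pad
    x1 = max(b.x1 for b in boxes) + pad
    y1 = max(b.y1 for b in boxes) + pad
    # Clamp so the padding never reaches outside the screenshot.
    return Box(max(x0, 0), max(y0, 0), min(x1, width), min(y1, height))
```

The padding is a guess at the "couple of px" margin a human would leave; it would need tuning against real tooltips.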

Sabcoll `111`

4 years ago (pre-Resurrected)

I was really busy at work and couldn't finish my work yet, but I'm still planning to work on it.

If I can trade with pictures uploaded into the trade database, that is more or less enough for me at the moment.
But everyone developing nice applications is highly appreciated!

D2 & LoD Veteran
Twitch: chr_isso
Youtube: chr_isso
If you like my posts & content, I would really appreciate a follow on Youtube & Twitch!
Ex InDiablo.de Staff
Greatest Find:

Arkaine's Valor 1.08

klh `56`

4 years ago (pre-Resurrected)

Oh, I just found the forum search - I just browsed through the suggestions earlier to see if someone suggested that.

I wanted to start with OpenCV for cropping and use simple template matching to detect text, because:

they are screenshots, so there won't be much perspective variability / smudges / missing pixels
if it detects the item type I can better predict attribute values (e.g. if it detects a Spirit runeword and for some reason gives me 96% confidence for +23 FCR and 95% for +25 FCR, I know the second one is correct) - not sure if it will be necessary, but in my experience Tesseract always had problems with non-static backgrounds
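The Spirit example could be sketched roughly like this, again in Python purely for illustration. The `VALID_RANGES` table, the `pick_value` helper, and the 25-35 FCR range are assumptions made for the example, not data from the site.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical lookup table: (item, property) -> inclusive range of valid rolls.
# The Spirit FCR range below is an assumption used for illustration only.
VALID_RANGES: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Spirit", "Faster Cast Rate"): (25, 35),
}


def pick_value(item: str, prop: str,
               candidates: Sequence[Tuple[int, float]]) -> Optional[int]:
    """Pick the most confident OCR reading that is a legal roll for the item.

    candidates holds (value, confidence) pairs. Returns None when no
    candidate is legal for a known item/property pair.
    """
    valid_range = VALID_RANGES.get((item, prop))
    if valid_range is not None:
        low, high = valid_range
        # Discard readings the item cannot roll, however confident they are.
        candidates = [(v, c) for v, c in candidates if low <= v <= high]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]
```

With the readings from the example, `pick_value("Spirit", "Faster Cast Rate", [(23, 0.96), (25, 0.95)])` picks 25 even though 23 scored higher, because 23 falls outside the assumed range.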

I'll probably do it in Rust since it has much better performance compared to JS.

Of course if we got proper modding support in D2R then you could simply Ctrl+C the item in-game and paste here ;p (I think PD2 does that?)
Come to think of it, it may be possible without modding since you will be able to post items to chat - maybe they have proper clipboard support for that.

EDIT:
Never mind the Tesseract problems, a simple threshold removes any trace of the items underneath.

Noemard `92`

4 years ago (pre-Resurrected)

klh wrote: 4 years ago
they are screenshots, so there won't be much perspective variability / smudges / missing pixels

Hopefully, but I imagine that would be the best-case scenario. Teebling made a good point that many people use a program for screenshots, and some of those automatically compress the image for manual/automatic upload to cloud destinations, so it might not always be as crisp as rendered in the game.

klh wrote: 4 years ago
I'll probably do it in Rust since it has much better performance compared to JS.

It definitely would, but the reason for using JavaScript was to be able to do it client-side so the user doesn't have to upload anything. I'm not sure Teebling wants to maintain additional backend processes; he seemed to suggest in the other thread that if it's going to happen, it needs to use the existing stack. If everything can be properly parsed client-side, there isn't really a need to store the game image anyway: it can all be regenerated to look relatively close, using the item images that may already be cached in the user's browser.

klh `56`

4 years ago (pre-Resurrected)

Noemard wrote: 4 years ago
but the reason for using javascript was to be able to do it client side so the user doesn't have to upload anything

Rust compiles to WASM and runs in the browser too.

It looks like OpenCV is problematic to use with Rust in WASM though, so I'll probably stick with JS (the performance improvement wouldn't be that great anyway, since OpenCV already runs in WASM; I just like to use Rust whenever possible).

Noemard wrote: 4 years ago
Hopefully, but I imagine that would be the best-case scenario. Teebling made a good point that many people use a program for screenshots, and some of those automatically compress the image for manual/automatic upload to cloud destinations, so it might not always be as crisp as rendered in the game.

I'll have to check that. Usually compression noise disappears after thresholding, and it's going to work with a dictionary, so a few errors shouldn't break the algorithm.
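One way a dictionary can absorb a few misread characters is fuzzy matching against the list of known property names. The sketch below uses Python's standard `difflib` as a stand-in technique; it is not the approach from the thread, and the `KNOWN_PROPERTIES` list, the `correct` helper, and the 0.8 cutoff are hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical dictionary of property names the OCR output is matched against.
KNOWN_PROPERTIES = [
    "Faster Cast Rate",
    "Faster Hit Recovery",
    "Increased Attack Speed",
    "Enhanced Defense",
]


def correct(text: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a noisy OCR string to the closest known property name.

    Returns None when nothing in the dictionary is similar enough,
    so garbage input is rejected instead of being forced to a match.
    """
    matches = difflib.get_close_matches(text, KNOWN_PROPERTIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `correct("Faster Cast Rale")` resolves to `"Faster Cast Rate"`, while a string with no resemblance to any entry returns `None`.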

I mainly want to avoid Tesseract since it's very slow.

Noemard `92`

4 years ago (pre-Resurrected)

Cool, I hadn't considered that you meant Rust compiled to WASM. I just wanted to make sure you didn't implement something purely in Rust that reads from an image file, since (as I understand it, at least) Teeb wouldn't want to use it that way.

Yeah, I haven't tested tesseract.js with a Diablo font, but it was relatively slow with the default English training data. I assume performance would increase with a better-trained model, but wasn't sure. It would definitely be interesting to see the results of a different approach using OpenCV. Either way, assuming it only takes a few seconds, I imagine people would still use it rather than doing everything manually.

klh `56`

4 years ago (pre-Resurrected)

Noemard wrote: 4 years ago
I assume performance would increase with a better trained model, but wasn't sure.

Sadly, training primarily increases accuracy; I never noticed a speed increase. Maybe training with a limited set of allowed words would help, but I never tried that.

EDIT 09/16/21:
Tried a few things, nothing looks good enough to me. I'll try with clipboard data after release before spending more time on OCR.

Here is where I stopped: basic text detection without using the 60 MB EAST model.
This is the image after processing (basically threshold, blur slightly, erode with a 2x2 kernel, dilate with a 2x3 kernel to join letters, threshold again - this screenshot is before the final threshold):

Final result:

Red outline is the contours found on the processed image, green rectangles are the possible text locations after applying some rules (minimum area, aspect ratio).
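The rule-based filtering of contour rectangles could look roughly like the sketch below (Python for illustration only). The post does not give the actual limits, so `min_area`, `min_aspect`, and `max_aspect` are placeholder values, and `Rect` and `likely_text` are hypothetical names.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Bounding rectangle of one contour, in pixels."""
    x: int
    y: int
    w: int
    h: int


def likely_text(rects: List[Rect], min_area: int = 150,
                min_aspect: float = 1.5, max_aspect: float = 40.0) -> List[Rect]:
    """Keep rectangles whose size and shape look like a line of text."""
    keep = []
    for r in rects:
        if r.w <= 0 or r.h <= 0:
            continue  # degenerate rectangle, nothing to test
        area = r.w * r.h
        aspect = r.w / r.h  # text lines are much wider than they are tall
        if area >= min_area and min_aspect <= aspect <= max_aspect:
            keep.append(r)
    return keep
```

Tiny specks fail the area rule and tall shapes such as item art or UI borders fail the aspect rule, which leaves the wide, short blobs produced by joined letters.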

It's neither faster nor better than EAST, but I'd rather avoid making users download too much data. This was tested on a variety of images with different compression artifacts etc. and worked on all of them (of course, a set of 10 is not enough, so it would need further testing). Detection on a 1080p screenshot takes ~450 ms.

Noemard `92`

4 years ago (pre-Resurrected)

Very cool. Nice work @klh. Is the 450 ms just for detecting potential text, or does it also include an attempt to read the text?

Sabcoll `111`

4 years ago (pre-Resurrected)

klh wrote: 4 years ago
EDIT:
Never mind the Tesseract problems, a simple threshold removes any trace of the items underneath.

Can you post your threshold values please?
I'm also working with OpenCV and I'm curious which methods you use for thresholding.

klh `56`

4 years ago (pre-Resurrected)

@Noemard detection only, and there is room for optimization: there's no need to work on full-size images, so they could be scaled down to speed it up.

@Sabcoll here is the sandbox with code - it's a big mess though

(It also includes attempts at template matching for OCR and at Tesseract with a scheduler. That has to be moved to workers to be of any use, though; the idea is to combine the detected regions into X images, where X is the number of workers, so there would be one job per worker instead of a few hundred.)
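The one-job-per-worker idea amounts to splitting the detected regions into X groups of similar size before stitching each group into a single image. Below is a rough Python sketch of the grouping step only; `batch_regions` is a hypothetical helper, and balancing by pixel area is an assumption about what makes the jobs roughly equal.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h) of one detected text region


def batch_regions(regions: List[Region], workers: int) -> List[List[Region]]:
    """Split regions into one batch per worker, balancing total pixel area."""
    if workers < 1:
        raise ValueError("need at least one worker")
    batches: List[List[Region]] = [[] for _ in range(workers)]
    loads = [0] * workers
    # Greedy balancing: place the largest regions first, always into the
    # batch that currently holds the least area.
    for region in sorted(regions, key=lambda r: r[2] * r[3], reverse=True):
        lightest = loads.index(min(loads))
        batches[lightest].append(region)
        loads[lightest] += region[2] * region[3]
    return batches
```

Each batch would then be composited into one image and handed to a single worker, so the OCR engine runs X times instead of once per region.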

The best value I found for threshold is 75 with THRESH_BINARY.
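For reference, `THRESH_BINARY` sets every pixel above the threshold to the maximum value and everything else to 0. The pure-Python sketch below mirrors that rule on a grayscale image stored as nested lists; `threshold_binary` is a hypothetical stand-in, and with OpenCV available the equivalent call would be `cv2.threshold(gray, 75, 255, cv2.THRESH_BINARY)`.

```python
from typing import List


def threshold_binary(gray: List[List[int]], thresh: int = 75,
                     maxval: int = 255) -> List[List[int]]:
    """Binary threshold: pixels strictly above thresh become maxval, others 0."""
    return [[maxval if px > thresh else 0 for px in row] for row in gray]
```

Bright tooltip text survives as white, while the darker background pixels drop to black, which matches the earlier observation that a simple threshold removes the items underneath.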

klh `56`

4 years ago (pre-Resurrected)

Sadly, when you copy an item shared in chat you only get the name, and it looks like Blizz learned from WoW and D3: you can't edit it to copy the attributes.