Automatically extract information from http://minecraft.gamepedia.com/ ? #229
Comments
An alternative to scraping a wiki is to install debug statements into the game itself. That would be guaranteed to be 100% correct and complete (at least for mechanical data like id numbers), but it relies on the Minecraft Coder Pack (MCP) project keeping up with the latest version of Minecraft. I can't find any authoritative information on MCP anymore; I wonder whether that project is still alive.
I think the official MCP site is here: http://www.modcoderpack.com/website/releases . Whatever way we can extract this info automatically is fine by me.
Relying on MCP is a bad idea; the project seems very volatile, sadly. A Bukkit or Forge plugin could also extract this information and would seem more stable.
I thought Forge was built on MCP. Maybe it used to be? If Forge works with 1.8.3, then that seems like the way to go. What makes a mod/plugin attractive is that all the heavy data comes straight from Mojang; the only thing the community provides in this case is the scraping tool. The wiki is community maintained and might be wrong. Bukkit is community maintained and might be wrong. The downside of scraping the Minecraft binary itself is that you don't always get good string names and descriptions. Perhaps scraping would only be appropriate for recipes and a sanity-check list of id numbers.
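The sanity check on id numbers could be as small as comparing two id-to-name maps. A minimal sketch in Python, assuming each source (a plugin dump, items.json, a wiki scrape) has already been loaded into a plain dict of id → name; `compare_ids` is a hypothetical helper, not an existing script:

```python
def compare_ids(reference: dict[int, str], candidate: dict[int, str]) -> dict[str, list]:
    """Report ids missing from, extra in, or named differently in `candidate`.

    `reference` is whichever source you trust most (e.g. data dumped from the game).
    """
    missing = sorted(set(reference) - set(candidate))
    extra = sorted(set(candidate) - set(reference))
    renamed = sorted(
        (i, reference[i], candidate[i])
        for i in set(reference) & set(candidate)
        if reference[i] != candidate[i]
    )
    return {"missing": missing, "extra": extra, "renamed": renamed}


if __name__ == "__main__":
    # Illustrative data only: the "game" side knows about an id the JSON lacks.
    from_game = {1: "stone", 2: "grass", 3: "dirt"}
    from_json = {1: "stone", 2: "grass_block"}
    print(compare_ids(from_game, from_json))
```

The same check works for any pair of sources, so it could also be run against the wiki extraction once that exists.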
Forge is built on MCP, but public builds of MCP take longer and longer to be released. Bukkit is based on Mojang's Minecraft server, so it can hardly be wrong. It uses a technique similar to MCP's, but does the work itself.
Bukkit currently doesn't know about Granite: https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=granite (contrast with https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=acacia ). Bukkit, like the wiki, is supposed to be kept up to date by the community. This makes it inherently less trustworthy than the actual data in the Notchian game itself, which by definition must always be right. A Forge plugin still seems like the most reliable solution to me at this point.
This is the wrong repo: the Bukkit repo's last commit was in August 2014. Spigot is still up to date and does know about granite, prismarine, etc.
Oh OK. Where do we get the current source? Or are you proposing we write a Bukkit plugin to dump the data from the Bukkit runtime binary?
Yes, that's what I was proposing. A Forge plugin works too, though. The current source is closed due to the DMCA stuff.
I started fixing the recipes extractor. I think there are many other errors like this, which is why some kind of automatic extractor is needed. I will still update the recipes, but they won't be perfect until we have an extractor for the blocks and items (the recipes extractor depends on having correct items.json and blocks.json).
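One way to see why the recipes extractor depends on items.json and blocks.json: the wiki refers to ingredients by name, so each name has to be looked up to get an id, and any name that isn't found points at a missing or mislabeled entry. A rough sketch, assuming entries are dicts with `id` and `displayName` keys; `build_name_index` and `resolve_names` are hypothetical helpers:

```python
def build_name_index(entries):
    """Map lower-cased displayName -> id (assumes {"id": ..., "displayName": ...} dicts)."""
    return {entry["displayName"].lower(): entry["id"] for entry in entries}


def resolve_names(names, index):
    """Split wiki ingredient names into resolved (name -> id) and unresolved names."""
    resolved, unresolved = {}, []
    for name in names:
        key = name.strip().lower()
        if key in index:
            resolved[name] = index[key]
        else:
            unresolved.append(name)
    return resolved, unresolved


if __name__ == "__main__":
    # Illustrative data only: a blocks list that doesn't know about Granite yet.
    blocks = [{"id": 1, "displayName": "Stone"}, {"id": 2, "displayName": "Stone Bricks"}]
    index = build_name_index(blocks)
    found, missing = resolve_names(["Stone", "stone bricks", "Granite"], index)
    print(found)
    print("missing from blocks.json:", missing)
```

The unresolved list doubles as a to-do list of blocks/items that still need fixing.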
…ecipes with that. Also take the output file as an argument instead of printing to stdout. I used merge_recipes.js so existing recipes aren't changed, only new ones added. blocks.json and items.json aren't fully updated (see #229), so some recipes are probably still missing.
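A guess at what "not changed, just added" means for a merge, sketched in Python rather than taken from the actual merge_recipes.js: assuming recipes are stored as a dict from a result key to a list of recipe dicts, existing entries are kept as-is and an incoming recipe is appended only if an identical one isn't already there:

```python
import json


def merge_recipes(existing, incoming):
    """Return a merged mapping; existing recipes are never modified, only added to."""
    # Copy the lists so the caller's `existing` mapping is left untouched.
    merged = {key: list(recipes) for key, recipes in existing.items()}
    for key, recipes in incoming.items():
        bucket = merged.setdefault(key, [])
        # Compare recipes by a canonical JSON form so dict key order doesn't matter.
        seen = {json.dumps(r, sort_keys=True) for r in bucket}
        for recipe in recipes:
            canon = json.dumps(recipe, sort_keys=True)
            if canon not in seen:
                bucket.append(recipe)
                seen.add(canon)
    return merged


if __name__ == "__main__":
    # Hypothetical recipe shape, for illustration only.
    old = {"planks": [{"shape": [["log"]], "count": 4}]}
    new = {
        "planks": [{"count": 4, "shape": [["log"]]}, {"shape": [["log2"]], "count": 4}],
        "stick": [{"shape": [["planks"], ["planks"]], "count": 4}],
    }
    print(json.dumps(merge_recipes(old, new), indent=2))
```

Deduplicating on a canonical serialization keeps the merge idempotent: running it twice with the same input adds nothing the second time.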
I'm currently extracting from the HTML of http://minecraft.gamepedia.com/Crafting#Complete_recipe_list but it's neither very reliable nor easy. The wiki source is generally much easier to parse than the HTML, and it might be possible to parse the item and block information from it too (see the source of the infobox here: http://minecraft.gamepedia.com/index.php?title=Andesite&action=edit).
Edit: apparently the complete list is generated by a script, http://minecraft.gamepedia.com/Module:Recipe_list ; this might be useful.
Edit2: some recipes carry a "Pocket Edition only" or "Console Edition only" note; check for that in the script (and remove any recipes that were added but shouldn't have been).
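Parsing the wiki source of a single crafting grid can be fairly small. A minimal sketch, assuming the page uses a `{{Crafting ...}}` template with slot parameters `A1` through `C3` (columns A to C, rows 1 to 3) and an `Output` parameter; those names are assumptions to check against the real template on the wiki:

```python
import re

GRID_COLS = "ABC"
GRID_ROWS = "123"


def parse_crafting_template(wikitext: str):
    """Extract the 3x3 grid and output from the first {{Crafting ...}} call, or None.

    Limitation: the non-greedy match stops at the first "}}", so parameters
    containing nested templates are not handled by this sketch.
    """
    match = re.search(r"\{\{\s*Crafting\s*\|(.*?)\}\}", wikitext, re.DOTALL)
    if match is None:
        return None
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    # Rows 1..3 top to bottom, columns A..C left to right; empty slots become None.
    grid = [[params.get(col + row) or None for col in GRID_COLS] for row in GRID_ROWS]
    return {"grid": grid, "output": params.get("Output")}


if __name__ == "__main__":
    source = "{{Crafting\n|A1=Stone |B1=Stone\n|A2=Stone |B2=Stone\n|Output=Stone Bricks\n}}"
    print(parse_crafting_template(source))
```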
"trapdoor" is the unlocalized name from the Notchian client. I have checked all blocks that could have changed.
@Kupferhirn "name": "trapdoor" is OK; the problem is "displayName": "Trapdoor". I don't have it at hand right now, but I'll post a list of blocks/items with problems here tonight if that would be useful.
So I found out a bit more about these recipe-related scripts:
Furnace recipes are here: http://minecraft.gamepedia.com/Smelting . For the brewing stand: http://minecraft.gamepedia.com/Brewing . See http://minecraft.gamepedia.com/Template:Grid#Other_templates for various grid-related pages.
This should somehow go in https://github.com/PrismarineJS/minecraft-data
I think I might start by writing a script to fetch the wiki source of every page, because there is a lot of information on the wiki, not just recipes.
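Fetching the source of every page can go through the standard MediaWiki API: `list=allpages` to enumerate titles (following the `continue` parameters it returns) and `action=raw` on index.php to get each page's wikitext. A sketch, assuming the api.php/index.php locations below and a MediaWiki version that returns a `continue` block; the HTTP call is passed in as `fetch_json` so the paging logic can be tested offline:

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode

# Assumed endpoint locations for the wiki; adjust if they differ.
API_URL = "http://minecraft.gamepedia.com/api.php"
INDEX_URL = "http://minecraft.gamepedia.com/index.php"


def all_page_titles(fetch_json: Callable[[str], Dict], api_url: str = API_URL) -> Iterator[str]:
    """Yield every page title in the main namespace, following API continuation."""
    params = {"action": "query", "list": "allpages", "aplimit": "500", "format": "json"}
    while True:
        data = fetch_json(api_url + "?" + urlencode(params))
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        # The API returns the extra parameters (e.g. apcontinue) for the next batch.
        params.update(data["continue"])


def raw_source_url(title: str, index_url: str = INDEX_URL) -> str:
    """URL that returns the raw wikitext of a page."""
    return index_url + "?" + urlencode({"title": title, "action": "raw"})
```

With a real connection, `fetch_json` could be `lambda url: json.load(urllib.request.urlopen(url))`, and each title's `raw_source_url` fetched and saved to disk. `allpages` only covers the main namespace unless `apnamespace` is set, so templates and modules would need a separate pass.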
Or host that info in a new repo and have it draw the info from there.
@pokeball99 that's already done there, but we still need to extract the Minecraft info to put it in minecraft-data ;)
OK, issue PrismarineJS/minecraft-data#8 tracks the progress of the wiki extraction. Closing this issue.
http://minecraft.gamepedia.com/ is a very complete reference on many aspects of Minecraft.
There are already some scripts (https://github.com/andrewrk/mineflayer/blob/master/bin/transform1_recipes.js for example, currently broken though) to extract the recipes from that wiki.
And I think we could extract more things, for example everything in the infobox (compare http://minecraft.gamepedia.com/Rabbit%27s_Foot with https://github.com/andrewrk/mineflayer/blob/master/lib/enums/items.json#L950 ).
I'm not sure this can really be applied here, but http://dbpedia.org/ has a really good framework for extracting information from Wikipedia infoboxes, and the infoboxes on http://minecraft.gamepedia.com/ look just like the ones on Wikipedia, so that might be worth looking into.
@Kupferhirn has extracted the items manually from that wiki (see #227), and that's nice, but doing the same thing automatically would be even better.
Edit: well, I think the DBpedia extraction framework is probably way too big for this; some simple scripts would be easier.
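A "simple script" for infoboxes mostly comes down to splitting a template call on top-level `|` characters, taking care not to split inside nested `{{...}}` templates or `[[...|...]]` links. A minimal sketch; `parse_template_params` is a hypothetical helper, and the template name `Block` used in the example is an assumption, not necessarily what the wiki uses:

```python
from typing import Dict, Optional


def parse_template_params(wikitext: str, name: str) -> Optional[Dict[str, str]]:
    """Return the named parameters of the first {{name ...}} call in wikitext.

    Note: find() also matches templates whose name merely starts with `name`.
    """
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    i = start + 2
    depth = 0  # nesting depth of inner {{...}} templates and [[...]] links
    parts, current = [], []
    while i < len(wikitext):
        two = wikitext[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            if two == "}}" and depth == 0:
                parts.append("".join(current))  # end of the outer template
                break
            depth = max(depth - 1, 0)
            current.append(two)
            i += 2
        elif wikitext[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level parameter separator
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    else:
        return None  # unterminated template
    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    # Made-up infobox source, for illustration only.
    src = ("{{Block\n| image = Andesite.png\n| type = [[Natural blocks|Natural]]\n"
           "| tool = {{Tool|pickaxe}}\n| stackable = Yes (64)\n}}")
    print(parse_template_params(src, "Block"))
```

Values are returned as raw wikitext; stripping links and inner templates into plain strings would be a separate, per-field step.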