ardunn (Contributor) commented Jan 13, 2022

@sgbaird could you answer the following for me?

  1. Is it OK to upload a .json.gz version of this dataset to our Figshare so there is a persistent resource for download?
  2. Would it be possible to include any or all of the following as additional columns:
  • Original source reference (DOI and/or BibTeX preferred)
  • Space group number or Hermann-Mauguin (HM) symbol
  • Any other pertinent sample descriptors that might have been recorded in earlier versions of the dataset
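For reference, question 1 only needs the table serialized as gzip-compressed JSON. Below is a minimal sketch using only the Python standard library, assuming a list-of-records layout. The filename `vickers_hardness.json.gz` and the column names `reference` and `spacegroup` are hypothetical placeholders for the columns requested in question 2, not the dataset's actual schema. pandas users could instead call `DataFrame.to_json` with `compression="gzip"`.

```python
import gzip
import json
import os
import tempfile

# Two rows copied from the table in this thread (values as printed).
rows = [
    {"composition": "Ag0.05Gd0.048Pd0.902", "hardness": 1.810, "load": 0.49},
    {"composition": "OsB2", "hardness": 34.800, "load": 0.25},
]

# Hypothetical extra columns (question 2); their values are unknown, so they stay None.
EXTRA_COLUMNS = ("reference", "spacegroup")
for row in rows:
    for col in EXTRA_COLUMNS:
        row.setdefault(col, None)


def write_json_gz(records, path):
    """Write a list of dicts as gzip-compressed, UTF-8 encoded JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(records, f)


def read_json_gz(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical output location; a temp dir keeps the sketch side-effect free.
path = os.path.join(tempfile.mkdtemp(), "vickers_hardness.json.gz")
write_json_gz(rows, path)
restored = read_json_gz(path)
```

Filling the placeholder columns would require the original source data, so the sketch leaves them as `None` rather than guessing.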

Also go ahead and look over the metadata addition and make sure the name/description are correct and complete (and feel free to make any needed revisions).

Current data looks like this:

                composition  hardness  load
0     Ag0.05Gd0.048Pd0.902      1.810  0.49
1      Ag0.05Y0.048Pd0.902      1.640  0.49
2       Ag0.25Pb0.5Sb0.25Te     0.578  2.94
3       Al1.5Si1.5N2.5O1.5     15.030  0.98
4                 Al1.67B22    23.800  2.00
                     ...       ...   ...
1057                  ZrV2      3.600  0.98
1058                   ZrW2    11.100  0.98
1059                   OsB2    34.800  0.25
1060                   OsB2    27.000  0.49
1061                   OsB2    17.800  1.96
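The last rows of the table show that a single composition can appear several times with different loads (OsB2 at 0.25, 0.49, and 1.96), with hardness dropping as the load increases. Here is a small standard-library sketch that groups rows by composition to surface such repeats. It uses only the five rows printed above, and the units of `hardness` and `load` are not stated in this thread.

```python
from collections import defaultdict

# (composition, hardness, load) rows copied from the end of the table above.
rows = [
    ("ZrV2", 3.600, 0.98),
    ("ZrW2", 11.100, 0.98),
    ("OsB2", 34.800, 0.25),
    ("OsB2", 27.000, 0.49),
    ("OsB2", 17.800, 1.96),
]


def hardness_by_load(rows):
    """Map each composition to its (load, hardness) pairs, sorted by load."""
    grouped = defaultdict(list)
    for composition, hardness, load in rows:
        grouped[composition].append((load, hardness))
    return {comp: sorted(pairs) for comp, pairs in grouped.items()}


table = hardness_by_load(rows)

# Print only compositions measured at more than one load.
for comp, pairs in table.items():
    if len(pairs) > 1:
        print(comp, pairs)
```

With these rows the loop prints only the OsB2 entry, which makes the load dependence easy to spot before deciding how to split or benchmark the data.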

Once we have answers to the questions above, I'll upload the final version to Figshare, and the tests should then pass.

sgbaird commented Jan 13, 2022

@ardunn I appreciate you taking the initiative on this! I'm actually not affiliated with the group and had a similar question about the LICENSE.

@ziyan1996 @BrgochGroup would you mind taking a look at these questions? As background, I suggested the dataset in https://github.com/ziyan1996/VickersHardnessPrediction as an addition to the 13 materials benchmarking tasks in the next version of matbench. I think it would be a useful dataset for benchmarking materials informatics models, and it would also give additional visibility to the large data-mining effort behind it.

ardunn (Contributor, Author) commented Jan 25, 2022

@sgbaird it might be worth emailing them; perhaps they are not as active on GitHub as we are.

sgbaird commented Feb 5, 2022

@ziyan1996 mentioned being in the middle of job interviews, so it might take some time. Also worth mentioning is the hardness dataset available through MPDS; unfortunately, it doesn't seem to include applied-load information (mpds-io/mpds-api#38). See also the matsci post.

EDIT: Based on some initial tests, without the applied load the MPDS dataset doesn't seem to have much predictive potential (even when predicting the mean hardness for a given composition).
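To illustrate why the missing load matters, here is a hedged sketch of a composition-mean baseline on the three OsB2 rows from the table earlier in this thread. It is not the MPDS test described above (that data and code are not shown here); it only shows that a model seeing composition alone must predict one value for all three loads, which leaves a sizeable error even in-sample.

```python
from collections import defaultdict
from statistics import mean

# OsB2 rows copied from the table earlier in this thread: (composition, hardness, load).
rows = [
    ("OsB2", 34.800, 0.25),
    ("OsB2", 27.000, 0.49),
    ("OsB2", 17.800, 1.96),
]


def composition_mean_baseline(rows):
    """Predict each row's hardness as its composition's mean hardness, ignoring load."""
    by_comp = defaultdict(list)
    for composition, hardness, _load in rows:
        by_comp[composition].append(hardness)
    means = {comp: mean(vals) for comp, vals in by_comp.items()}
    return [means[composition] for composition, _h, _l in rows]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


preds = composition_mean_baseline(rows)
mae = mean_absolute_error([h for _c, h, _l in rows], preds)
print(f"MAE of composition-mean baseline: {mae:.2f}")
```

On these three rows the printed MAE is 5.82, in the same units as the hardness column; adding load as a feature is what would let a model separate them.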

ardunn (Contributor, Author) commented Feb 8, 2022

OK, that's fine; I'm sure they are busy. Let's wait a month or two and re-evaluate then.

ardunn merged commit 63d9b5f into hackingmaterials:main on Mar 4, 2022