A Materials Project time-split dataset, sorted by earliest year of reference and limited to experimental entries with fewer than 52 sites: https://figshare.com/articles/dataset/Materials_Project_Time_Split_Data/19991516
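The filtering and sorting described above can be sketched with pandas. This is a minimal sketch, not the code used to build the figshare dataset. The column names `theoretical`, `nsites`, and `year` are hypothetical stand-ins for whichever fields hold the experimental flag, the number of sites, and the earliest year of reference.

```python
import pandas as pd


def filter_and_sort(df: pd.DataFrame, max_sites: int = 52) -> pd.DataFrame:
    """Keep experimental entries with fewer than `max_sites` sites, oldest first.

    Assumes hypothetical columns: "theoretical" (bool), "nsites" (int),
    and "year" (earliest year of reference).
    """
    mask = (~df["theoretical"]) & (df["nsites"] < max_sites)
    # mergesort is stable, so entries from the same year keep their original order
    return df[mask].sort_values("year", kind="mergesort").reset_index(drop=True)


df = pd.DataFrame(
    {
        "material_id": ["mp-1", "mp-2", "mp-3", "mp-4"],
        "theoretical": [False, True, False, False],
        "nsites": [2, 8, 60, 4],
        "year": [2015, 1999, 2001, 1980],
    }
)
print(filter_and_sort(df)["material_id"].tolist())  # ['mp-4', 'mp-1']
```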
How does this look as a matminer dataset contribution? For additional context, see "How do I do a time-split of Materials Project entries? e.g. pre-2018 vs. post-2018" and sparks-baird/xtal2png#12 (comment). I'm starting to feel like I'm reinventing the wheel by trying to host it myself.
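For the pre-2018 vs. post-2018 example, a time split reduces to partitioning on a year column. A minimal sketch, assuming a hypothetical `year` column holding the earliest year of reference; `time_split` is an illustrative helper, not part of matminer or mp-time-split.

```python
import pandas as pd


def time_split(df: pd.DataFrame, cutoff_year: int = 2018, year_col: str = "year"):
    """Split entries into (before cutoff, cutoff and later) using a year column."""
    before = df[df[year_col] < cutoff_year]
    after = df[df[year_col] >= cutoff_year]
    return before, after


df = pd.DataFrame({"material_id": ["mp-1", "mp-2", "mp-3"], "year": [2010, 2018, 2020]})
train, test = time_split(df)
print(train["material_id"].tolist(), test["material_id"].tolist())
# ['mp-1'] ['mp-2', 'mp-3']
```

Which side the cutoff year itself lands on is a design choice; here entries from 2018 go to the later split.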
In my own code, I've been running into a strange issue when I use `load_dataframe_from_json` (lines 89 to 177 in 76a529b):
```python
import json
import sys

import pandas
from monty.io import zopen
from monty.json import MontyDecoder
from tqdm import tqdm


def load_dataframe_from_json(filename, pbar=True, decode=True):
    """Load pandas dataframe from a json file.

    Automatically decodes and instantiates pymatgen objects in the dataframe.

    Args:
        filename (str): Path to json file. Can be a compressed file (gz and bz2
            are supported).
        pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
        decode (bool): If true, will automatically decode objects (slow, convenient).
            If false, will return json representations of the objects (fast, inconvenient).

    Returns:
        (Pandas.DataFrame): A pandas dataframe.
    """
    # Progress bar for reading file with hook
    pbar1 = tqdm(desc=f"Reading file {filename}", position=0, leave=True, ascii=True, disable=not pbar)

    def is_monty_object(o):
        """
        Determine if an object can be decoded into json by monty.

        Args:
            o (object): An object in dict-form.

        Returns:
            (bool)
        """
        if isinstance(o, dict) and "@class" in o:
            return True
        else:
            return False

    def pbar_hook(obj):
        """
        A hook for a pbar reading the raw data from json, not
        using monty decoding to decode the object.

        Args:
            obj (object): A dict-like

        Returns:
            obj (object)
        """
        if is_monty_object(obj):
            pbar1.update(1)
            sys.stderr.flush()
        return obj

    # Progress bar for decoding objects
    pbar2 = tqdm(desc=f"Decoding objects from {filename}", position=0, leave=True, ascii=True, disable=not pbar)

    class MontyDecoderPbar(MontyDecoder):
        """
        A pbar-friendly version of MontyDecoder.
        """

        def process_decoded(self, d):
            if isinstance(d, dict) and "data" in d and "index" in d and "columns" in d:
                # total number of objects to decode
                # is the number of @class mentions
                pbar2.total = str(d).count("@class")
            elif is_monty_object(d):
                pbar2.update(1)
                sys.stderr.flush()
            return super().process_decoded(d)

    if decode:
        decoder = MontyDecoderPbar if pbar else MontyDecoder
    else:
        decoder = None

    hook = pbar_hook if pbar else lambda x: x

    with zopen(filename, "rb") as f:
        dataframe_data = json.load(f, cls=decoder, object_hook=hook)

    pbar1.close()
    pbar2.close()

    # if only keys are data, columns, index then orient=split
    if isinstance(dataframe_data, dict):
        if set(dataframe_data.keys()) == {"data", "columns", "index"}:
            return pandas.DataFrame(**dataframe_data)
        # NOTE: a dict with any other set of keys falls through here,
        # so the function implicitly returns None
    else:
        return pandas.DataFrame(dataframe_data)
```
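One thing worth ruling out: in the final branch, a `dict` whose keys are not exactly `data`, `columns`, `index` matches neither `return`, so the function implicitly returns `None`. The sketch below isolates that branch in a hypothetical helper `to_dataframe` (not part of matminer) and contrasts it with a hypothetical variant that falls back to `pandas.DataFrame(...)` for any other input. It only shows that a `None` return is possible; it does not by itself explain why the result differs under the debugger.

```python
import pandas


def to_dataframe(dataframe_data):
    """Same branching as the end of load_dataframe_from_json."""
    if isinstance(dataframe_data, dict):
        if set(dataframe_data.keys()) == {"data", "columns", "index"}:
            return pandas.DataFrame(**dataframe_data)
        # any other dict falls through: implicit None
    else:
        return pandas.DataFrame(dataframe_data)


def to_dataframe_fixed(dataframe_data):
    """Hypothetical variant: use split orient if possible, otherwise let pandas infer."""
    if isinstance(dataframe_data, dict) and set(dataframe_data.keys()) == {"data", "columns", "index"}:
        return pandas.DataFrame(**dataframe_data)
    return pandas.DataFrame(dataframe_data)


split = {"data": [[1, 2]], "columns": ["a", "b"], "index": [0]}
columns_dict = {"a": [1], "b": [2]}
print(to_dataframe(split).shape)               # (1, 2)
print(to_dataframe(columns_dict))              # None
print(to_dataframe_fixed(columns_dict).shape)  # (1, 2)
```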
It returns `None` during an uninterrupted debugging run, but if I set a breakpoint and run the same line manually in the VS Code debug console, it returns the expected DataFrame.
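To check which path the final branch takes, it can help to inspect the raw file without any monty decoding. A minimal sketch using only the standard library, assuming the file is plain or gzip-compressed JSON (bz2, which `zopen` also handles, is left out for brevity); the helper `top_level_keys` and the demo file are hypothetical.

```python
import gzip
import json
import os
import tempfile


def top_level_keys(path):
    """Return the sorted top-level keys of a (possibly gzipped) JSON object, or None if it is not a dict."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    return sorted(data) if isinstance(data, dict) else None


# Demo on a throwaway split-orient file
path = os.path.join(tempfile.mkdtemp(), "demo.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump({"data": [[1]], "columns": ["a"], "index": [0]}, f)
print(top_level_keys(path))  # ['columns', 'data', 'index']
```

If the real file reports exactly these three keys, the `None` is likely not coming from the final branch.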
See https://github.com/sparks-baird/mp-time-split/runs/6739787243?check_suite_focus=true/#step:5:1