GlassPy: loading data

Introduction

GlassPy loads experimental data through its glasspy.data subpackage. Currently, the SciGlass database is the available data source.

Basic usage

Below is a minimal example that loads SciGlass data into a pandas DataFrame using the default configuration, which loads most of the available data and metadata.

[1]:
from glasspy.data import SciGlass

source = SciGlass()
df = source.data

This cell takes a while to run. Once all the data is loaded, we can check what we have.

[2]:
df
[2]:
elements ... property metadata
H Li Be B C N O F Na Mg ... SurfaceTensionAboveTg SurfaceTension1173K SurfaceTension1473K SurfaceTension1573K SurfaceTension1673K ChemicalAnalysis Author Year NumberElements NumberCompounds
ID
20400020000 0.0 0.0 0.0 0.000000 0.0 0.0 0.666667 0.0 0.000000 0.000000 ... NaN NaN NaN NaN NaN False Volarovich M.P. 1936 2 1
20500020001 0.0 0.0 0.0 0.000000 0.0 0.0 0.579213 0.0 0.196815 0.000000 ... NaN NaN NaN NaN NaN False Hoj J.W. 1992 5 4
20500020002 0.0 0.0 0.0 0.000000 0.0 0.0 0.580869 0.0 0.193449 0.000000 ... NaN NaN NaN NaN NaN False Hoj J.W. 1992 5 4
20500020003 0.0 0.0 0.0 0.000000 0.0 0.0 0.581986 0.0 0.187167 0.000000 ... NaN NaN NaN NaN NaN False Hoj J.W. 1992 5 4
20500020004 0.0 0.0 0.0 0.000000 0.0 0.0 0.583672 0.0 0.183080 0.000000 ... NaN NaN NaN NaN NaN False Hoj J.W. 1992 5 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4493300611694 0.0 0.0 0.0 0.000000 0.0 0.0 0.625485 0.0 0.000000 0.049125 ... NaN NaN NaN NaN NaN False Murata T. 2019 7 6
4493300611695 0.0 0.0 0.0 0.001948 0.0 0.0 0.637540 0.0 0.000000 0.009932 ... NaN NaN NaN NaN NaN False Murata T. 2019 10 9
4493300611696 0.0 0.0 0.0 0.000000 0.0 0.0 0.635921 0.0 0.000000 0.000000 ... NaN NaN NaN NaN NaN False Murata T. 2019 8 7
4493300611697 0.0 0.0 0.0 0.014544 0.0 0.0 0.622226 0.0 0.035890 0.000000 ... NaN NaN NaN NaN NaN False Murata T. 2019 9 8
4493300611698 0.0 0.0 0.0 0.041532 0.0 0.0 0.634462 0.0 0.000000 0.000487 ... NaN NaN NaN NaN NaN False Murata T. 2019 7 6

283102 rows × 793 columns

To avoid naming conflicts and to make the DataFrame easier to navigate, the columns are organized in two levels. The first level groups the information by composition (elements and compounds), property, or metadata.

[3]:
print(df.columns.levels[0])
Index(['elements', 'compounds', 'property', 'metadata'], dtype='object')

So if you want to explore the chemical elements of the data, you can select just that part of the DataFrame.

[4]:
els = df["elements"]
els
[4]:
H Li Be B C N O F Na Mg ... W Re Pt Au Hg Tl Pb Bi Th U
ID
20400020000 0.0 0.0 0.0 0.000000 0.0 0.0 0.666667 0.0 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20500020001 0.0 0.0 0.0 0.000000 0.0 0.0 0.579213 0.0 0.196815 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20500020002 0.0 0.0 0.0 0.000000 0.0 0.0 0.580869 0.0 0.193449 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20500020003 0.0 0.0 0.0 0.000000 0.0 0.0 0.581986 0.0 0.187167 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20500020004 0.0 0.0 0.0 0.000000 0.0 0.0 0.583672 0.0 0.183080 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4493300611694 0.0 0.0 0.0 0.000000 0.0 0.0 0.625485 0.0 0.000000 0.049125 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4493300611695 0.0 0.0 0.0 0.001948 0.0 0.0 0.637540 0.0 0.000000 0.009932 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4493300611696 0.0 0.0 0.0 0.000000 0.0 0.0 0.635921 0.0 0.000000 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4493300611697 0.0 0.0 0.0 0.014544 0.0 0.0 0.622226 0.0 0.035890 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4493300611698 0.0 0.0 0.0 0.041532 0.0 0.0 0.634462 0.0 0.000000 0.000487 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

283102 rows × 76 columns
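The two-level column layout can be tried out without loading SciGlass. The sketch below builds a tiny stand-in DataFrame with made-up values (the name `toy` and all numbers are illustrative assumptions, not SciGlass data) and shows that selecting a first-level key returns a sub-DataFrame with ordinary single-level columns.

```python
import pandas as pd

# Toy stand-in for the SciGlass DataFrame: two-level columns, made-up values.
columns = pd.MultiIndex.from_tuples(
    [
        ("elements", "O"),
        ("elements", "Na"),
        ("property", "Tg"),
        ("metadata", "Year"),
    ]
)
toy = pd.DataFrame(
    [
        [0.67, 0.00, float("nan"), 1936],
        [0.58, 0.20, 1017.15, 1992],
    ],
    index=pd.Index([1, 2], name="ID"),
    columns=columns,
)

# Selecting a first-level key drops that level from the result.
toy_elements = toy["elements"]
print(list(toy_elements.columns))  # ['O', 'Na']

# First-level groups, in the order they appear in the columns.
groups = list(toy.columns.get_level_values(0).unique())
print(groups)  # ['elements', 'property', 'metadata']
```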

The same is true if you want to explore a particular column of the DataFrame. Suppose you want to explore the glass transition temperature:

[5]:
Tg = df["property"]["Tg"]
Tg
[5]:
ID
20400020000          NaN
20500020001      1017.15
20500020002      1096.15
20500020003      1013.15
20500020004      1013.15
                  ...
4493300611694        NaN
4493300611695        NaN
4493300611696        NaN
4493300611697        NaN
4493300611698        NaN
Name: Tg, Length: 283102, dtype: float64

As you can see, not all entries have a value for Tg.
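Since many entries have no Tg value, a common next step is to count the entries that do and keep only those rows. Here is a minimal sketch using plain pandas on a small hand-made DataFrame that mimics the two-level layout (the IDs and values are made up for illustration):

```python
import pandas as pd

# Toy DataFrame with the same two-level column layout; values are made up.
columns = pd.MultiIndex.from_tuples(
    [("elements", "O"), ("property", "Tg"), ("metadata", "Year")]
)
toy = pd.DataFrame(
    [
        [0.67, float("nan"), 1936],
        [0.58, 1017.15, 1992],
        [0.58, 1096.15, 1992],
    ],
    index=pd.Index([10, 11, 12], name="ID"),
    columns=columns,
)

# Same selection pattern as above: first level, then the column name.
tg = toy["property"]["Tg"]

# Count the entries that have a Tg value.
n_with_tg = int(tg.notna().sum())
print(n_with_tg)  # 2

# Keep only the rows where Tg is present; all column groups are preserved.
with_tg = toy[tg.notna()]
print(list(with_tg.index))  # [11, 12]
```

The same boolean mask should work on the full DataFrame, since it only relies on standard pandas indexing.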

To list all the properties available in GlassPy, run:

[6]:
print(SciGlass.available_properties())
['T0', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9', 'T10', 'T11', 'T12', 'Viscosity773K', 'Viscosity873K', 'Viscosity973K', 'Viscosity1073K', 'Viscosity1173K', 'Viscosity1273K', 'Viscosity1373K', 'Viscosity1473K', 'Viscosity1573K', 'Viscosity1673K', 'Viscosity1773K', 'Viscosity1873K', 'Viscosity2073K', 'Viscosity2273K', 'Viscosity2473K', 'Tg', 'Tmelt', 'Tliquidus', 'TLittletons', 'TAnnealing', 'Tstrain', 'Tsoft', 'TdilatometricSoftening', 'AbbeNum', 'RefractiveIndex', 'RefractiveIndexLow', 'RefractiveIndexHigh', 'MeanDispersion', 'Permittivity', 'TangentOfLossAngle', 'TresistivityIs1MOhm.m', 'Resistivity273K', 'Resistivity373K', 'Resistivity423K', 'Resistivity573K', 'Resistivity1073K', 'Resistivity1273K', 'Resistivity1473K', 'Resistivity1673K', 'YoungModulus', 'ShearModulus', 'Microhardness', 'PoissonRatio', 'Density293K', 'Density1073K', 'Density1273K', 'Density1473K', 'Density1673K', 'ThermalConductivity', 'ThermalShockRes', 'CTEbelowTg', 'CTE328K', 'CTE373K', 'CTE433K', 'CTE483K', 'CTE623K', 'Cp293K', 'Cp473K', 'Cp673K', 'Cp1073K', 'Cp1273K', 'Cp1473K', 'Cp1673K', 'NucleationTemperature', 'NucleationRate', 'TMaxGrowthVelocity', 'MaxGrowthVelocity', 'CrystallizationPeak', 'CrystallizationOnset', 'SurfaceTensionAboveTg', 'SurfaceTension1173K', 'SurfaceTension1473K', 'SurfaceTension1573K', 'SurfaceTension1673K']
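The output above is a plain Python list of strings, so it can be filtered with ordinary list operations. As a small sketch, the snippet below selects names by prefix from a hand-copied excerpt of that list. In a real session the list would come from `SciGlass.available_properties()`.

```python
# Hand-copied excerpt of the list printed above, used here so the
# example runs without loading GlassPy.
properties = [
    "Tg",
    "Tmelt",
    "Tliquidus",
    "Viscosity773K",
    "Viscosity873K",
    "Density293K",
    "Density1073K",
]

# Keep only the viscosity properties by matching on the name prefix.
viscosity_props = [name for name in properties if name.startswith("Viscosity")]
print(viscosity_props)  # ['Viscosity773K', 'Viscosity873K']
```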

See the pandas documentation if you are not familiar with how to use a pandas DataFrame.

Controlling the initial data collection

Loading all the SciGlass data takes a while, so it is wise to load only what you will actually use. You control what is loaded by passing configuration dictionaries to the SciGlass class.

For example, say you don’t want glasses with silver or gold in their composition, you are only interested in the glass transition temperature, and you don’t want information about the compounds that make up the glass. You can express this query as follows:

[7]:
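# Start from the full property list and remove Tg, leaving every other property.
all_properties_except_Tg = SciGlass.available_properties()
all_properties_except_Tg.remove("Tg")

# Elements: drop glasses containing silver or gold.
config_el = {
    "drop": ["Ag", "Au"],
}

# Properties: keep Tg and drop all the others.
config_prop = {
    "keep": ["Tg"],
    "drop": all_properties_except_Tg,
}

# Compounds: empty configuration, since no compound information is wanted.
config_comp = {}

source = SciGlass(
    elements_cfg=config_el,
    properties_cfg=config_prop,
    compounds_cfg=config_comp,
)

df = source.data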

[8]:
df
[8]:
elements property metadata
H Li Be B C N O F Na Mg ... Tl Pb Bi Th U Tg ChemicalAnalysis Author Year NumberElements
ID
20500020001 0.0 0.000000 0.0 0.000000 0.0 0.0 57.921249 0.0 19.681530 0.0 ... 0.000000 0.0 0.000000 0.0 0.0 1017.15 False Hoj J.W. 1992 5
20500020002 0.0 0.000000 0.0 0.000000 0.0 0.0 58.086941 0.0 19.344940 0.0 ... 0.000000 0.0 0.000000 0.0 0.0 1096.15 False Hoj J.W. 1992 5
20500020003 0.0 0.000000 0.0 0.000000 0.0 0.0 58.198601 0.0 18.716690 0.0 ... 0.000000 0.0 0.000000 0.0 0.0 1013.15 False Hoj J.W. 1992 5
20500020004 0.0 0.000000 0.0 0.000000 0.0 0.0 58.367241 0.0 18.308001 0.0 ... 0.000000 0.0 0.000000 0.0 0.0 1013.15 False Hoj J.W. 1992 5
20500020005 0.0 0.000000 0.0 0.000000 0.0 0.0 58.282768 0.0 18.264561 0.0 ... 0.000000 0.0 0.000000 0.0 0.0 978.15 False Hoj J.W. 1992 5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4493200611415 0.0 7.250638 0.0 2.368801 0.0 0.0 59.389221 0.0 0.000000 0.0 ... 8.964828 0.0 5.536447 0.0 0.0 543.15 False Jung Woo Man 2019 9
4493200611416 0.0 7.445931 0.0 2.358826 0.0 0.0 59.595871 0.0 0.000000 0.0 ... 6.650183 0.0 5.808963 0.0 0.0 545.15 False Jung Woo Man 2019 9
4493200611417 0.0 6.593068 0.0 10.288480 0.0 0.0 59.600090 0.0 0.000000 0.0 ... 10.782570 0.0 0.000000 0.0 0.0 532.15 False Jung Woo Man 2019 9
4493200611418 0.0 5.919064 0.0 1.936039 0.0 0.0 64.014076 0.0 0.000000 0.0 ... 7.322553 0.0 0.000000 0.0 0.0 506.15 False Jung Woo Man 2019 9
4493200611419 0.0 6.371798 0.0 2.019926 0.0 0.0 63.761761 0.0 0.000000 0.0 ... 7.882636 0.0 0.000000 0.0 0.0 522.15 False Jung Woo Man 2019 9

91738 rows × 78 columns

See the documentation for the SciGlass class for more information on how to control your initial data collection.
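If you want to keep more than one property, the "drop" list can be built with a list comprehension instead of repeated `.remove()` calls. The sketch below is only an illustration: `all_except` is a hypothetical helper (not part of GlassPy), and the short `available` list stands in for the output of `SciGlass.available_properties()`. The "keep" and "drop" keys follow the configuration example above.

```python
def all_except(available, keep):
    """Return the names in `available` that are not in `keep`, preserving order."""
    keep_set = set(keep)
    return [name for name in available if name not in keep_set]


# Stand-in for SciGlass.available_properties(); excerpt only.
available = ["Tg", "Tmelt", "Tliquidus", "Density293K"]
wanted = ["Tg", "Tliquidus"]

config_prop = {
    "keep": wanted,
    "drop": all_except(available, wanted),
}
print(config_prop["drop"])  # ['Tmelt', 'Density293K']
```

Unlike `.remove()`, this leaves the original list untouched, so it can be reused to build other configurations.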