GlassPy: loading data
Introduction
GlassPy can load experimental data through its glasspy.data subpackage. Currently, the SciGlass database is available as a data source.
Basic usage
Below is a minimal example of loading SciGlass data into a pandas DataFrame. It uses the default configuration, which loads most of the available data and metadata.
[1]:
from glasspy.data import SciGlass
source = SciGlass()
df = source.data
It takes a while to run this cell, but after it loads all the data, we can check what we have.
[2]:
df
[2]:
| | elements | | | | | | | | | | ... | property | | | | | metadata | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | H | Li | Be | B | C | N | O | F | Na | Mg | ... | SurfaceTensionAboveTg | SurfaceTension1173K | SurfaceTension1473K | SurfaceTension1573K | SurfaceTension1673K | ChemicalAnalysis | Author | Year | NumberElements | NumberCompounds |
20400020000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.666667 | 0.0 | 0.000000 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Volarovich M.P. | 1936 | 2 | 1 |
20500020001 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.579213 | 0.0 | 0.196815 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Hoj J.W. | 1992 | 5 | 4 |
20500020002 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.580869 | 0.0 | 0.193449 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Hoj J.W. | 1992 | 5 | 4 |
20500020003 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.581986 | 0.0 | 0.187167 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Hoj J.W. | 1992 | 5 | 4 |
20500020004 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.583672 | 0.0 | 0.183080 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Hoj J.W. | 1992 | 5 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4493300611694 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.625485 | 0.0 | 0.000000 | 0.049125 | ... | NaN | NaN | NaN | NaN | NaN | False | Murata T. | 2019 | 7 | 6 |
4493300611695 | 0.0 | 0.0 | 0.0 | 0.001948 | 0.0 | 0.0 | 0.637540 | 0.0 | 0.000000 | 0.009932 | ... | NaN | NaN | NaN | NaN | NaN | False | Murata T. | 2019 | 10 | 9 |
4493300611696 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.635921 | 0.0 | 0.000000 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Murata T. | 2019 | 8 | 7 |
4493300611697 | 0.0 | 0.0 | 0.0 | 0.014544 | 0.0 | 0.0 | 0.622226 | 0.0 | 0.035890 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | False | Murata T. | 2019 | 9 | 8 |
4493300611698 | 0.0 | 0.0 | 0.0 | 0.041532 | 0.0 | 0.0 | 0.634462 | 0.0 | 0.000000 | 0.000487 | ... | NaN | NaN | NaN | NaN | NaN | False | Murata T. | 2019 | 7 | 6 |
283102 rows × 793 columns
To avoid naming conflicts and to make the DataFrame easier to navigate, the columns are structured in two levels. The first level groups the information by composition (elements and compounds), property, or metadata.
[3]:
print(df.columns.levels[0])
Index(['elements', 'compounds', 'property', 'metadata'], dtype='object')
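The two-level layout is an ordinary pandas MultiIndex on the columns. The sketch below builds a tiny stand-in frame to show how the first level can be listed and how a group or a single column is selected. The frame `toy` is an assumption for illustration: it holds only four columns and two rows, with values copied from the output above, and is not the real SciGlass table.

```python
import pandas as pd

# Toy stand-in for source.data with just a few columns (illustrative only).
columns = pd.MultiIndex.from_tuples(
    [
        ("elements", "O"),
        ("elements", "Na"),
        ("property", "Tg"),
        ("metadata", "Year"),
    ]
)
toy = pd.DataFrame(
    [
        [0.666667, 0.000000, float("nan"), 1936],
        [0.579213, 0.196815, 1017.15, 1992],
    ],
    index=pd.Index([20400020000, 20500020001], name="ID"),
    columns=columns,
)

# First-level labels, in order of appearance.
groups = list(toy.columns.get_level_values(0).unique())
print(groups)

# Selecting a first-level label returns a DataFrame holding only that group.
elements = toy["elements"]
print(list(elements.columns))

# A (group, column) tuple selects a single column as a Series.
tg = toy[("property", "Tg")]
print(tg.loc[20500020001])
```

The same selections should carry over to the loaded data, since they rely only on standard pandas indexing.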
To explore only the chemical elements of the data, select that part of the DataFrame.
[4]:
els = df["elements"]
els
[4]:
ID | H | Li | Be | B | C | N | O | F | Na | Mg | ... | W | Re | Pt | Au | Hg | Tl | Pb | Bi | Th | U |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20400020000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.666667 | 0.0 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
20500020001 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.579213 | 0.0 | 0.196815 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
20500020002 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.580869 | 0.0 | 0.193449 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
20500020003 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.581986 | 0.0 | 0.187167 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
20500020004 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.583672 | 0.0 | 0.183080 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4493300611694 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.625485 | 0.0 | 0.000000 | 0.049125 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4493300611695 | 0.0 | 0.0 | 0.0 | 0.001948 | 0.0 | 0.0 | 0.637540 | 0.0 | 0.000000 | 0.009932 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4493300611696 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.635921 | 0.0 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4493300611697 | 0.0 | 0.0 | 0.0 | 0.014544 | 0.0 | 0.0 | 0.622226 | 0.0 | 0.035890 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4493300611698 | 0.0 | 0.0 | 0.0 | 0.041532 | 0.0 | 0.0 | 0.634462 | 0.0 | 0.000000 | 0.000487 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
283102 rows × 76 columns
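Once the element block is isolated, ordinary boolean indexing can narrow it down to glasses that contain a given element. The sketch below uses a small stand-in frame, `els_toy`, with three columns and three rows copied from the output above; both the frame and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Small stand-in for df["elements"] (values copied from the output above).
els_toy = pd.DataFrame(
    {
        "O": [0.666667, 0.579213, 0.625485],
        "Na": [0.000000, 0.196815, 0.000000],
        "Mg": [0.000000, 0.000000, 0.049125],
    },
    index=pd.Index([20400020000, 20500020001, 4493300611694], name="ID"),
)

# Keep only the glasses that contain sodium.
with_na = els_toy[els_toy["Na"] > 0]
print(with_na.index.tolist())

# For each glass, list the elements that have a non-zero entry.
present = {
    glass_id: [el for el, value in row.items() if value > 0]
    for glass_id, row in els_toy.iterrows()
}
print(present[20500020001])
```

Looping with `iterrows` is fine for a handful of rows, but on the full table a vectorized comparison such as `els > 0` is likely the better choice.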
The same is true if you want to explore a particular column of the DataFrame. Suppose you want to explore the glass transition temperature:
[5]:
Tg = df["property"]["Tg"]
Tg
[5]:
ID
20400020000 NaN
20500020001 1017.15
20500020002 1096.15
20500020003 1013.15
20500020004 1013.15
...
4493300611694 NaN
4493300611695 NaN
4493300611696 NaN
4493300611697 NaN
4493300611698 NaN
Name: Tg, Length: 283102, dtype: float64
As you can see, not all entries have a value for Tg.
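Missing values can be counted and removed with standard pandas methods. The sketch below works on a short stand-in Series, `tg_toy`, whose four entries are copied from the output above; it is an illustration, not the full Tg column.

```python
import pandas as pd

# Short stand-in for df["property"]["Tg"] (values copied from the output above).
tg_toy = pd.Series(
    [float("nan"), 1017.15, 1096.15, float("nan")],
    index=pd.Index(
        [20400020000, 20500020001, 20500020002, 4493300611694], name="ID"
    ),
    name="Tg",
)

# Count the entries without a reported value.
n_missing = int(tg_toy.isna().sum())

# Keep only the entries that do have a value.
tg_known = tg_toy.dropna()
print(n_missing, len(tg_known))
```

On the loaded data, the same `isna` and `dropna` calls would apply to `df["property"]["Tg"]`.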
To check for all available properties in GlassPy, run:
[6]:
print(SciGlass.available_properties())
['T0', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9', 'T10', 'T11', 'T12', 'Viscosity773K', 'Viscosity873K', 'Viscosity973K', 'Viscosity1073K', 'Viscosity1173K', 'Viscosity1273K', 'Viscosity1373K', 'Viscosity1473K', 'Viscosity1573K', 'Viscosity1673K', 'Viscosity1773K', 'Viscosity1873K', 'Viscosity2073K', 'Viscosity2273K', 'Viscosity2473K', 'Tg', 'Tmelt', 'Tliquidus', 'TLittletons', 'TAnnealing', 'Tstrain', 'Tsoft', 'TdilatometricSoftening', 'AbbeNum', 'RefractiveIndex', 'RefractiveIndexLow', 'RefractiveIndexHigh', 'MeanDispersion', 'Permittivity', 'TangentOfLossAngle', 'TresistivityIs1MOhm.m', 'Resistivity273K', 'Resistivity373K', 'Resistivity423K', 'Resistivity573K', 'Resistivity1073K', 'Resistivity1273K', 'Resistivity1473K', 'Resistivity1673K', 'YoungModulus', 'ShearModulus', 'Microhardness', 'PoissonRatio', 'Density293K', 'Density1073K', 'Density1273K', 'Density1473K', 'Density1673K', 'ThermalConductivity', 'ThermalShockRes', 'CTEbelowTg', 'CTE328K', 'CTE373K', 'CTE433K', 'CTE483K', 'CTE623K', 'Cp293K', 'Cp473K', 'Cp673K', 'Cp1073K', 'Cp1273K', 'Cp1473K', 'Cp1673K', 'NucleationTemperature', 'NucleationRate', 'TMaxGrowthVelocity', 'MaxGrowthVelocity', 'CrystallizationPeak', 'CrystallizationOnset', 'SurfaceTensionAboveTg', 'SurfaceTension1173K', 'SurfaceTension1473K', 'SurfaceTension1573K', 'SurfaceTension1673K']
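Many of the names in this list end in a number followed by "K". The sketch below groups such names by their leading part. It uses only a handful of names copied from the output above, and it assumes that the trailing number is a temperature in kelvin, which is an inference from the names rather than something stated here.

```python
import re

# A few names copied from the output above (not the full list).
names = [
    "Tg",
    "Viscosity773K",
    "Viscosity1073K",
    "Density293K",
    "Cp473K",
    "YoungModulus",
]

# Letters, then digits, then a final "K" (assumed to mean kelvin).
pattern = re.compile(r"^(?P<quantity>[A-Za-z]+?)(?P<kelvin>\d+)K$")

# Map each quantity to the temperatures at which it is listed.
by_quantity = {}
for name in names:
    match = pattern.match(name)
    if match:
        by_quantity.setdefault(match["quantity"], []).append(int(match["kelvin"]))

print(by_quantity)
```

Names without a trailing temperature, such as "Tg" or "YoungModulus", are simply skipped by the pattern.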
See the pandas documentation if you are not familiar with how to use a pandas DataFrame.
Controlling the initial data collection
Loading all the SciGlass data takes a while, so it is wise to load only what you will actually use. You can control what is loaded by passing configuration dictionaries to the SciGlass class.
For example, say you don’t want glasses with silver or gold in their composition, you are only interested in the glass transition temperature, and you don’t want information about the compounds that make up the glass. You can run this query like this:
[7]:
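# Start from every available property, then remove Tg so it is not dropped.
all_properties_except_Tg = SciGlass.available_properties()
all_properties_except_Tg.remove("Tg")

# Exclude glasses that contain silver or gold.
config_el = {
    "drop": ["Ag", "Au"],
}

# Keep only the glass transition temperature.
config_prop = {
    "keep": ["Tg"],
    "drop": all_properties_except_Tg,
}

# Left empty because no compound information is wanted.
config_comp = {}

source = SciGlass(
    elements_cfg=config_el,
    properties_cfg=config_prop,
    compounds_cfg=config_comp,
)
df = source.data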
[8]:
df
[8]:
| | elements | | | | | | | | | | | | | | | | property | metadata | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | H | Li | Be | B | C | N | O | F | Na | Mg | ... | Tl | Pb | Bi | Th | U | Tg | ChemicalAnalysis | Author | Year | NumberElements |
20500020001 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 57.921249 | 0.0 | 19.681530 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1017.15 | False | Hoj J.W. | 1992 | 5 |
20500020002 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 58.086941 | 0.0 | 19.344940 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1096.15 | False | Hoj J.W. | 1992 | 5 |
20500020003 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 58.198601 | 0.0 | 18.716690 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1013.15 | False | Hoj J.W. | 1992 | 5 |
20500020004 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 58.367241 | 0.0 | 18.308001 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1013.15 | False | Hoj J.W. | 1992 | 5 |
20500020005 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 58.282768 | 0.0 | 18.264561 | 0.0 | ... | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 978.15 | False | Hoj J.W. | 1992 | 5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4493200611415 | 0.0 | 7.250638 | 0.0 | 2.368801 | 0.0 | 0.0 | 59.389221 | 0.0 | 0.000000 | 0.0 | ... | 8.964828 | 0.0 | 5.536447 | 0.0 | 0.0 | 543.15 | False | Jung Woo Man | 2019 | 9 |
4493200611416 | 0.0 | 7.445931 | 0.0 | 2.358826 | 0.0 | 0.0 | 59.595871 | 0.0 | 0.000000 | 0.0 | ... | 6.650183 | 0.0 | 5.808963 | 0.0 | 0.0 | 545.15 | False | Jung Woo Man | 2019 | 9 |
4493200611417 | 0.0 | 6.593068 | 0.0 | 10.288480 | 0.0 | 0.0 | 59.600090 | 0.0 | 0.000000 | 0.0 | ... | 10.782570 | 0.0 | 0.000000 | 0.0 | 0.0 | 532.15 | False | Jung Woo Man | 2019 | 9 |
4493200611418 | 0.0 | 5.919064 | 0.0 | 1.936039 | 0.0 | 0.0 | 64.014076 | 0.0 | 0.000000 | 0.0 | ... | 7.322553 | 0.0 | 0.000000 | 0.0 | 0.0 | 506.15 | False | Jung Woo Man | 2019 | 9 |
4493200611419 | 0.0 | 6.371798 | 0.0 | 2.019926 | 0.0 | 0.0 | 63.761761 | 0.0 | 0.000000 | 0.0 | ... | 7.882636 | 0.0 | 0.000000 | 0.0 | 0.0 | 522.15 | False | Jung Woo Man | 2019 | 9 |
91738 rows × 78 columns
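Building the "drop" list by hand becomes tedious when more than one property is wanted. The sketch below wraps that step in a small helper. Note that `make_properties_cfg` is a hypothetical function written for this example and is not part of GlassPy; it only mirrors the "keep" and "drop" keys used in the configuration above, and the `available` list is a shortened stand-in for the output of `SciGlass.available_properties()`.

```python
def make_properties_cfg(available, keep):
    """Build a {"keep": ..., "drop": ...} dict like config_prop above.

    Hypothetical helper for illustration; not part of GlassPy.
    """
    # Fail early on names that are not in the list of available properties.
    unknown = [name for name in keep if name not in available]
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    return {
        "keep": list(keep),
        # Everything that is available but not kept goes into "drop".
        "drop": [name for name in available if name not in keep],
    }


# Shortened stand-in for SciGlass.available_properties().
available = ["Tg", "Tmelt", "Tliquidus", "Density293K"]
cfg = make_properties_cfg(available, keep=["Tg", "Density293K"])
print(cfg)
```

Under these assumptions, the resulting dictionary could be passed as `properties_cfg` in the same way as `config_prop`.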
See the documentation for the SciGlass class for more information on how to control your initial data collection.