Datatypes and Datashapes

Every value in Ibis has two important properties: a type and a shape.

The type is probably familiar to you. It is something like Integer, Floating, String, or Array.

The shape is one of Scalar (a single value) or Column (a collection of values).

Datatype Flavors

For some datatypes, further options define them. For instance, Integer values can be signed or unsigned, and they have a precision (bit width), giving flavors such as “uint8” and “int64”. These flavors don’t affect a value’s capabilities (e.g., both signed and unsigned ints have a .abs() method), but the flavor does affect how the underlying backend performs the computation.
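To make the two flavor dimensions concrete, the sketch below derives the representable range of an integer flavor from its name. It is plain Python arithmetic rather than an Ibis API, and the helper name `integer_range` is invented for this illustration.

```python
import re


def integer_range(flavor: str) -> tuple[int, int]:
    """Return (min, max) for a flavor name such as "uint8" or "int64".

    Hypothetical helper for illustration only; it is not part of Ibis.
    """
    match = re.fullmatch(r"(u?)int(8|16|32|64)", flavor)
    if match is None:
        raise ValueError(f"not an integer flavor: {flavor!r}")
    unsigned = match.group(1) == "u"
    bits = int(match.group(2))
    if unsigned:
        # Unsigned flavors spend every bit on magnitude.
        return 0, 2**bits - 1
    # Signed flavors reserve one bit for the sign (two's complement).
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(integer_range("uint8"))  # (0, 255)
print(integer_range("int8"))   # (-128, 127)
```

Both flavors describe integers and so share the same capabilities; only the set of values the backend can store differs.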

Capabilities

Depending on the combination of datatype and datashape, a value has different capabilities. For example:

  • All String values (both StringScalars and StringColumns) have the method .upper() that transforms the string to uppercase. Floating and Array values don’t have this method, of course.
  • IntegerColumn and FloatingColumn values have .mean(), .max(), and similar aggregation methods, because they are collections of values that you can aggregate over. On the other hand, IntegerScalar and FloatingScalar values do not have these methods, because it doesn’t make sense to take the mean or max of a single value.
  • If you call .to_pandas() on these values, you get different results. Scalar shapes result in scalar objects:
    • IntegerScalar: NumPy int64 object (or whatever specific flavor).
    • FloatingScalar: NumPy float64 object (or whatever specific flavor).
    • StringScalar: plain Python str object.
    • ArrayScalar: plain Python list object.
  • On the other hand, Column shapes result in pandas.Series:
    • IntegerColumn: pd.Series of integers, with the same flavor. For example, if the IntegerColumn was specifically “uint16”, then the pandas Series will hold a NumPy array of type “uint16”.
    • FloatingColumn: pd.Series of NumPy floats with the same flavor.
    • etc.

Broadcasting and Alignment

There are rules for how different datashapes are combined. This is similar to how SQL and NumPy handle merging datashapes, if you are familiar with them.

import ibis

ibis.options.interactive = True
t1 = ibis.examples.penguins.fetch().head(100)
t1
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

We can look at the datatype of the year Column:

t1.year.type()
Int64(nullable=True)

Combining two Scalars results in a Scalar:

t1.year.mean() + t1.year.std()

┌─────────────┐
│ 2008.002519 │
└─────────────┘

Combining a Column and Scalar results in a Column:

t1.year + 1000
┏━━━━━━━━━━━━━━━━━┓
┃ Add(year, 1000) ┃
┡━━━━━━━━━━━━━━━━━┩
│ int64           │
├─────────────────┤
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│               … │
└─────────────────┘

Combining two Columns results in a Column:

t1.year + t1.bill_length_mm
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Add(year, bill_length_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                   │
├───────────────────────────┤
│                    2046.1 │
│                    2046.5 │
│                    2047.3 │
│                      NULL │
│                    2043.7 │
│                    2046.3 │
│                    2045.9 │
│                    2046.2 │
│                    2041.1 │
│                    2049.0 │
│                         … │
└───────────────────────────┘
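The three cases above follow a single rule: a Scalar is broadcast against a Column, so the result is a Column whenever either operand is one. The function below is a toy model of that rule, with shapes represented as plain strings; `combine_shapes` is a made-up name and does not reflect how Ibis implements this internally.

```python
def combine_shapes(left: str, right: str) -> str:
    """Return the shape of a binary operation such as `left + right`.

    Toy model for illustration; shapes are plain strings, not Ibis objects.
    """
    shapes = {"Scalar", "Column"}
    if left not in shapes or right not in shapes:
        raise ValueError(f"unknown shape: {left!r}, {right!r}")
    # Any Column operand makes the result a Column; otherwise it stays Scalar.
    return "Column" if "Column" in (left, right) else "Scalar"


for pair in [("Scalar", "Scalar"), ("Column", "Scalar"), ("Column", "Column")]:
    print(pair, "->", combine_shapes(*pair))
```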

One requirement that might surprise you if you are coming from NumPy is how Ibis aligns Columns. In NumPy, you can add any two arrays of length 100 together, because the elements are “lined up” by position. Ibis is different. Because it is built around SQL, and SQL has no notion of inherent row ordering, you cannot “line up” two arbitrary Columns in Ibis: both must be derived from the same Table expression. For example:

t2 = ibis.examples.population.fetch().head(100)
t2
┏━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ country     ┃ year  ┃ population ┃
┡━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string      │ int64 │ int64      │
├─────────────┼───────┼────────────┤
│ Afghanistan │  1995 │   17586073 │
│ Afghanistan │  1996 │   18415307 │
│ Afghanistan │  1997 │   19021226 │
│ Afghanistan │  1998 │   19496836 │
│ Afghanistan │  1999 │   19987071 │
│ Afghanistan │  2000 │   20595360 │
│ Afghanistan │  2001 │   21347782 │
│ Afghanistan │  2002 │   22202806 │
│ Afghanistan │  2003 │   23116142 │
│ Afghanistan │  2004 │   24018682 │
│ …           │     … │          … │
└─────────────┴───────┴────────────┘
t1.bill_depth_mm + t2.population

╭─────────────────────── Traceback (most recent call last) ────────────────────────╮
 /nix/store/pfzmxw8jv7sb8mvh5cmx3aiz4w6xr9jr-ibis-3.11/lib/python3.11/site-packag 
 es/IPython/core/formatters.py:226 in catch_format_error                          
                                                                                  
 /nix/store/pfzmxw8jv7sb8mvh5cmx3aiz4w6xr9jr-ibis-3.11/lib/python3.11/site-packag 
 es/IPython/core/formatters.py:711 in __call__                                    
                                                                                  
                             ... 8 frames hidden ...                              
                                                                                  
 /home/runner/work/ibis/ibis/ibis/expr/types/pretty.py:318 in _to_rich_table      
                                                                                  
   315 max_string = max_string or ibis.options.repr.interactive.max_string    
   316 show_types = ibis.options.repr.interactive.show_types                  
   317                                                                        
 318 table = tablish.as_table()                                             
   319 orig_ncols = len(table.columns)                                        
   320                                                                        
   321 if console_width == float("inf"):                                      
                                                                                  
 /home/runner/work/ibis/ibis/ibis/expr/types/generic.py:1525 in as_table          
                                                                                  
   1522 │   │   │   (parent,) = parents                                           
   1523 │   │   │   return parent.to_expr().select(self)                          
   1524 │   │   else:                                                             
 1525 │   │   │   raise com.RelationError(                                      
   1526 │   │   │   │   f"Cannot convert {type(self)} expression involving multip 
   1527 │   │   │   │   "base table references to a projection"                   
   1528 │   │   │   )                                                             
╰──────────────────────────────────────────────────────────────────────────────────╯
RelationError: Cannot convert <class 'ibis.expr.types.numeric.FloatingColumn'> expression involving multiple base table references to a projection

To use these two columns together, you first need to join the tables:

j = ibis.join(t1, t2, "year")
j
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ country ┃ population ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │ string  │ int64      │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼─────────┼────────────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │ Andorra │      81292 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │ Andorra │      81292 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │ …       │          … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴─────────┴────────────┘
j.bill_depth_mm + j.population
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Add(bill_depth_mm, population) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                        │
├────────────────────────────────┤
│                   2.634926e+07 │
│                   2.703222e+07 │
│                   3.166243e+06 │
│                   3.156626e+06 │
│                   3.509706e+07 │
│                   3.572540e+07 │
│                   5.794020e+04 │
│                   5.707150e+04 │
│                   8.131320e+04 │
│                   7.998750e+04 │
│                              … │
└────────────────────────────────┘
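The alignment requirement can be pictured with a small pure-Python model in which every Column remembers the table it derives from. Everything here is a simplified stand-in: the `Column` dataclass, the `add` and `joined` helpers, and the locally defined `RelationError` are assumptions made for this sketch. Ibis tracks relations through its expression graph rather than by name, and this model raises eagerly for simplicity.

```python
from dataclasses import dataclass


class RelationError(Exception):
    """Local stand-in for the Ibis error, so the sketch is self-contained."""


@dataclass(frozen=True)
class Column:
    name: str
    table: str  # name of the table expression this column derives from


def add(left: Column, right: Column) -> Column:
    # Rows have no inherent order, so columns from different tables
    # cannot be lined up by position.
    if left.table != right.table:
        raise RelationError(
            f"{left.name!r} and {right.name!r} derive from different tables"
        )
    return Column(f"Add({left.name}, {right.name})", left.table)


def joined(table: str, *columns: Column) -> list[Column]:
    # After a join, every column derives from the single joined table.
    return [Column(c.name, table) for c in columns]


depth = Column("bill_depth_mm", "t1")
population = Column("population", "t2")

try:
    add(depth, population)
except RelationError as exc:
    print("error:", exc)

j_depth, j_population = joined("j", depth, population)
print(add(j_depth, j_population).name)  # Add(bill_depth_mm, population)
```

The join is what gives the two columns a common parent, which is why the same addition succeeds afterwards.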