Table expressions

Tables are one of the core data structures in Ibis.

Table

Table(self, arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

See the user guide for more info.

Attributes

Name Description
columns Return a tuple of column names in this table.

Methods

Name Description
aggregate Aggregate a table with a given set of reductions grouping by by.
alias Create a table expression with a specific name alias.
as_scalar Inform ibis that the table expression should be treated as a scalar.
as_table Promote the expression to a table.
asof_join Perform an “as-of” join between left and right.
bind Bind column values to a table expression.
cache Cache the provided expression.
cast Cast the columns of a table.
compile Compile to an execution target.
count Compute the number of rows in the table.
cross_join Compute the cross join of a sequence of tables.
describe Return summary information about a table.
difference Compute the set difference of multiple table expressions.
distinct Return a Table with duplicate rows removed.
drop Remove fields from a table.
drop_null Remove rows with null values from the table.
dropna Deprecated - use drop_null instead.
equals Return whether this expression is structurally equivalent to other.
execute Execute an expression against its backend if one exists.
fill_null Fill null values in a table expression.
fillna Deprecated - use fill_null instead.
filter Select rows from table based on predicates.
get_backend Get the current Ibis backend of the expression.
get_name Return the fully qualified name of the table.
group_by Create a grouped table expression.
head Select the first n rows of a table.
info Return summary information about a table.
intersect Compute the set intersection of multiple table expressions.
join Perform a join between two tables.
limit Select n rows from self starting at offset.
mutate Add columns to a table expression.
nunique Compute the number of unique rows in the table.
order_by Sort a table by one or more expressions.
pipe Compose f with self.
pivot_longer Transform a table from wider to longer.
pivot_wider Pivot a table to a wider format.
preview Return a subset as a Rich Table.
relabel Deprecated in favor of Table.rename.
relocate Relocate columns before or after other specified columns.
rename Rename columns in the table.
rowid A unique integer per row.
sample Sample a fraction of rows from a table.
schema Return the Schema for this table.
select Compute a new table expression using exprs and named_exprs.
sql Run a SQL query against a table expression.
to_array View a single column table as an array.
to_csv Write the results of executing the given expression to a CSV file.
to_delta Write the results of executing the given expression to a Delta Lake table.
to_pandas Convert a table expression to a pandas DataFrame.
to_pandas_batches Execute expression and return an iterator of pandas DataFrames.
to_parquet Write the results of executing the given expression to a parquet file.
to_parquet_dir Write the results of executing the given expression to a parquet file in a directory.
to_polars Execute expression and return results as a polars dataframe.
to_pyarrow Execute expression and return results in as a pyarrow table.
to_pyarrow_batches Execute expression and return a RecordBatchReader.
to_torch Execute an expression and return results as a dictionary of torch tensors.
try_cast Cast the columns of a table.
unbind Return an expression built on UnboundTable instead of backend-specific objects.
union Compute the set union of multiple table expressions.
unnest Unnest an array column from a table.
unpack Project the struct fields of each of columns into self.
value_counts Compute a frequency table of this table’s values.
view Create a new table expression distinct from the current one.
visualize Visualize an expression as a GraphViz graph in the browser.

aggregate

aggregate(metrics=(), by=(), having=(), **kwargs)

Aggregate a table with a given set of reductions grouping by by.

Parameters

Name Type Description Default
metrics Sequence[ir.Scalar] | None Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like sum or literal values like ibis.literal(1). ()
by Sequence[ir.Value] | None Grouping expressions. ()
having Sequence[ir.BooleanValue] | None Post-aggregation filters. The shape requirements are the same metrics, but the output type for having is boolean. ::: {.callout-warning} ## Expressions like x is None return bool and will not generate a SQL comparison to NULL ::: ()
kwargs ir.Value Named aggregate expressions {}

Returns

Name Type Description
Table An aggregate table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit   price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64 │
├────────┼─────────┤
│ apple 0.50 │
│ apple 0.50 │
│ banana0.25 │
│ orange0.33 │
└────────┴─────────┘
>>> t.aggregate(
...     by=["fruit"],
...     total_cost=_.price.sum(),
...     avg_cost=_.price.mean(),
...     having=_.price.sum() < 0.5,
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit   total_cost  avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├────────┼────────────┼──────────┤
│ banana0.250.25 │
│ orange0.330.33 │
└────────┴────────────┴──────────┘

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters

Name Type Description Default
alias str Name of the child expression required

Returns

Name Type Description
Table An table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

Name Type Description
Scalar A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species  species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ stringint64         │
├─────────┼───────────────┤
│ Adelie 44 │
│ Gentoo 124 │
└─────────┴───────────────┘

as_table

as_table()

Promote the expression to a table.

This method is a no-op for table expressions.

Returns

Name Type Description
Table A table expression

Examples

>>> t = ibis.table(dict(a="int"), name="t")
>>> s = t.as_table()
>>> t is s
True

asof_join

asof_join(
    left,
    right,
    on,
    predicates=(),
    tolerance=None,
    *,
    lname='',
    rname='{name}_right',
)

Perform an “as-of” join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Parameters

Name Type Description Default
left Table Table expression required
right Table Table expression required
on str | ir.BooleanColumn Closest match inequality condition required
predicates str | ir.Column | Sequence[str | ir.Column] Additional join predicates ()
tolerance str | ir.IntervalScalar | None Amount of time to look behind when joining None
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Returns

Name Type Description
Table Table expression

bind

bind(*args, **kwargs)

Bind column values to a table expression.

This method handles the binding of every kind of column-like value that Ibis handles, including strings, integers, deferred expressions and selectors, to a table expression.

Parameters

Name Type Description Default
args Any Column-like values to bind. ()
kwargs Any Column-like values to bind, with names. {}

Returns

Name Type Description
tuple[Value, …] A tuple of bound values

cache

cache()

Cache the provided expression.

All subsequent operations on the returned expression will be performed on the cached data. The lifetime of the cached table is tied to its python references (ie. it is released once the last reference to it is garbage collected). Alternatively, use the with statement or call the .release() method for more control.

This method is idempotent: calling it multiple times in succession will return the same value as the first call.

This method eagerly evaluates the expression prior to caching

Subsequent evaluations will not recompute the expression so method chaining will not incur the overhead of caching more than once.

Returns

Name Type Description
Table Cached table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_computation = ibis.literal("Heavy Computation")
>>> cached_penguins = t.mutate(computation=heavy_computation).cache()
>>> cached_penguins
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year   computation       ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64string            │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼───────────────────┤
│ Adelie Torgersen39.118.71813750male  2007Heavy Computation │
│ Adelie Torgersen39.517.41863800female2007Heavy Computation │
│ Adelie Torgersen40.318.01953250female2007Heavy Computation │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007Heavy Computation │
│ Adelie Torgersen36.719.31933450female2007Heavy Computation │
│ Adelie Torgersen39.320.61903650male  2007Heavy Computation │
│ Adelie Torgersen38.917.81813625female2007Heavy Computation │
│ Adelie Torgersen39.219.61954675male  2007Heavy Computation │
│ Adelie Torgersen34.118.11933475NULL2007Heavy Computation │
│ Adelie Torgersen42.020.21904250NULL2007Heavy Computation │
│                  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴───────────────────┘

Explicit cache cleanup

>>> with t.mutate(computation=heavy_computation).cache() as cached_penguins:
...     cached_penguins

cast

cast(schema)

Cast the columns of a table.

Similar to pandas.DataFrame.astype.

If you need to cast columns to a single type, use selectors.

Parameters

Name Type Description Default
schema SchemaLike Mapping, schema or iterable of pairs to use for casting required

Returns

Name Type Description
Table Casted table

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  int64
  body_mass_g        int64
  sex                string
  year               int64
}
>>> cols = ["body_mass_g", "bill_length_mm"]
>>> t[cols].head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g  bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ int64float64        │
├─────────────┼────────────────┤
│        375039.1 │
│        380039.5 │
│        325040.3 │
│        NULLNULL │
│        345036.7 │
└─────────────┴────────────────┘

Columns not present in the input schema will be passed through unchanged

>>> t.columns
('species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year')
>>> expr = t.cast({"body_mass_g": "float64", "bill_length_mm": "int"})
>>> expr.select(*cols).head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g  bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64int64          │
├─────────────┼────────────────┤
│      3750.039 │
│      3800.040 │
│      3250.040 │
│        NULLNULL │
│      3450.037 │
└─────────────┴────────────────┘

Columns that are in the input schema but not in the table raise an error

>>> t.cast({"foo": "string"})  
---------------------------------------------------------------------------
IbisError                                 Traceback (most recent call last)
Cell In[35], line 1
----> 1 t.cast({"foo": "string"})  

File ~/work/ibis/ibis/ibis/expr/types/relations.py:447, in Table.cast(self, schema)
    372 def cast(self, schema: SchemaLike) -> Table:
    373     """Cast the columns of a table.
    374 
    375     Similar to `pandas.DataFrame.astype`.
   (...)
    445     ibis.common.exceptions.IbisError: Cast schema has fields that are not in the table: ['foo']
    446     """
--> 447     return self._cast(schema, cast_method="cast")

File ~/work/ibis/ibis/ibis/expr/types/relations.py:490, in Table._cast(self, schema, cast_method)
    488 columns = self.columns
    489 if missing_fields := frozenset(schema.names).difference(columns):
--> 490     raise com.IbisError(
    491         f"Cast schema has fields that are not in the table: {sorted(missing_fields)}"
    492     )
    494 for col in columns:
    495     if (new_type := schema.get(col)) is not None:

IbisError: Cast schema has fields that are not in the table: ['foo']

compile

compile(limit=None, params=None, pretty=False)

Compile to an execution target.

Parameters

Name Type Description Default
limit int | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
params Mapping[ir.Value, Any] | None Mapping of scalar parameter expressions to value None
pretty bool In case of SQL backends, return a pretty formatted SQL query. False

count

count(where=None)

Compute the number of rows in the table.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression to filter rows when counting. None

Returns

Name Type Description
IntegerScalar Number of rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "baz"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘
>>> t.count()

┌───┐
│ 3 │
└───┘
>>> t.count(t.a != "foo")

┌───┐
│ 2 │
└───┘
>>> type(t.count())
ibis.expr.types.numeric.IntegerScalar

cross_join

cross_join(left, right, *rest, lname='', rname='{name}_right')

Compute the cross join of a sequence of tables.

Parameters

Name Type Description Default
left Table Left table required
right Table Right table required
rest Table Additional tables to cross join ()
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Returns

Name Type Description
Table Cross join of left, right and rest

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.count()

┌─────┐
│ 344 │
└─────┘
>>> agg = t.drop("year").agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year   bill_length_mm_right  bill_depth_mm_right  flipper_length_mm_right  body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64float64float64float64float64           │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie Torgersen39.118.71813750male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.517.41863800female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen40.318.01953250female200743.9219317.15117200.9152054201.754386 │
│ Adelie TorgersenNULLNULLNULLNULLNULL200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen36.719.31933450female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.320.61903650male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen38.917.81813625female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.219.61954675male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen34.118.11933475NULL200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen42.020.21904250NULL200743.9219317.15117200.9152054201.754386 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘
>>> expr.columns
('species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year',
 'bill_length_mm_right',
 'bill_depth_mm_right',
 'flipper_length_mm_right',
 'body_mass_g_right')
>>> expr.count()

┌─────┐
│ 344 │
└─────┘

describe

describe(quantile=(0.25, 0.5, 0.75))

Return summary information about a table.

Parameters

Name Type Description Default
quantile Sequence[ir.NumericValue | float] The quantiles to compute for numerical columns. Defaults to (0.25, 0.5, 0.75). (0.25, 0.5, 0.75)

Returns

Name Type Description
Table A table containing summary information about the columns of self.

Notes

This function computes summary statistics for each column in the table. For numerical columns, it computes statistics such as minimum, maximum, mean, standard deviation, and quantiles. For string columns, it computes the mode and the number of unique values.

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> p = ibis.examples.penguins.fetch()
>>> p.describe()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name               pos    type     count  nulls  unique  mode    mean         std         min      p25       p50      p75      max     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ stringint16stringint64int64int64stringfloat64float64float64float64float64float64float64 │
├───────────────────┼───────┼─────────┼───────┼───────┼────────┼────────┼─────────────┼────────────┼─────────┼──────────┼─────────┼─────────┼─────────┤
│ species          0string 34403AdelieNULLNULLNULLNULLNULLNULLNULL │
│ island           1string 34403BiscoeNULLNULLNULLNULLNULLNULLNULL │
│ bill_length_mm   2float643442164NULL43.9219305.45958432.139.22544.4548.559.6 │
│ bill_depth_mm    3float64344280NULL17.1511701.97479313.115.60017.3018.721.5 │
│ flipper_length_mm4int64  344255NULL200.91520514.061714172.0190.000197.00213.0231.0 │
│ body_mass_g      5int64  344294NULL4201.754386801.9545362700.03550.0004050.004750.06300.0 │
│ sex              6string 344112male  NULLNULLNULLNULLNULLNULLNULL │
│ year             7int64  34403NULL2008.0290700.8183562007.02007.0002008.002009.02009.0 │
└───────────────────┴───────┴─────────┴───────┴───────┴────────┴────────┴─────────────┴────────────┴─────────┴──────────┴─────────┴─────────┴─────────┘
>>> p.select(s.of_type("numeric")).describe()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name               pos    type     count  nulls  unique  mean         std         min      p25       p50      p75      max     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ stringint16stringint64int64int64float64float64float64float64float64float64float64 │
├───────────────────┼───────┼─────────┼───────┼───────┼────────┼─────────────┼────────────┼─────────┼──────────┼─────────┼─────────┼─────────┤
│ flipper_length_mm2int64  344255200.91520514.061714172.0190.000197.00213.0231.0 │
│ body_mass_g      3int64  3442944201.754386801.9545362700.03550.0004050.004750.06300.0 │
│ year             4int64  344032008.0290700.8183562007.02007.0002008.002009.02009.0 │
│ bill_length_mm   0float64344216443.9219305.45958432.139.22544.4548.559.6 │
│ bill_depth_mm    1float6434428017.1511701.97479313.115.60017.3018.721.5 │
└───────────────────┴───────┴─────────┴───────┴───────┴────────┴─────────────┴────────────┴─────────┴──────────┴─────────┴─────────┴─────────┘
>>> p.select(s.of_type("string")).describe()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ name     pos    type    count  nulls  unique  mode   ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringint16stringint64int64int64string │
├─────────┼───────┼────────┼───────┼───────┼────────┼────────┤
│ sex    2string344112male   │
│ species0string34403Adelie │
│ island 1string34403Biscoe │
└─────────┴───────┴────────┴───────┴───────┴────────┴────────┘

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only diff distinct rows not occurring in the calling table True

See Also

ibis.difference

Returns

Name Type Description
Table The rows present in self that are not present in tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.difference(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

Name Type Description Default
on str | Iterable[str] | s.Selector | None Only consider certain columns for identifying duplicates. By default deduplicate all of the columns. None
keep Literal['first', 'last'] | None Determines which duplicates to keep. - "first": Drop duplicates except for the first occurrence. - "last": Drop duplicates except for the last occurrence. - None: Drop all duplicates 'first'

Examples

>>> import ibis
>>> import ibis.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species    island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringstring    │
├───────────┼───────────┤
│ Adelie   Biscoe    │
│ Adelie   Dream     │
│ Adelie   Torgersen │
│ ChinstrapDream     │
│ Gentoo   Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   37.818.31743400female2007 │
│ Adelie   Dream    39.516.71783250female2007 │
│ Adelie   Torgersen39.118.71813750male  2007 │
│ ChinstrapDream    46.517.91923500female2007 │
│ Gentoo   Biscoe   46.113.22114500female2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   42.718.31964075male  2009 │
│ Adelie   Dream    41.518.52014000male  2009 │
│ Adelie   Torgersen43.119.21973500male  2009 │
│ ChinstrapDream    50.218.71983775female2009 │
│ Gentoo   Biscoe   49.916.12135400male  2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()

┌─────┐
│ 273 │
└─────┘
>>> t.count()

┌─────┐
│ 344 │
└─────┘

You can pass selectors to on

>>> t.distinct(on=~s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   37.718.71803600male  2007 │
│ Gentoo   Biscoe   50.016.32305700male  2007 │
│ Gentoo   Biscoe   44.514.32164100NULL2007 │
│ Adelie   Torgersen39.118.71813750male  2007 │
│ Adelie   Torgersen34.118.11933475NULL2007 │
│ Adelie   Dream    37.218.11783900male  2007 │
│ Adelie   Dream    37.518.91792975NULL2007 │
│ ChinstrapDream    46.517.91923500female2007 │
│ Adelie   Torgersen39.517.41863800female2007 │
│ Adelie   Dream    39.516.71783250female2007 │
│  │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

The only valid values of keep are "first", "last" and None.

>>> t.distinct(on="species", keep="second")  
---------------------------------------------------------------------------
IbisError                                 Traceback (most recent call last)
Cell In[101], line 1
----> 1 t.distinct(on="species", keep="second")  

File ~/work/ibis/ibis/ibis/expr/types/relations.py:1199, in Table.distinct(self, on, keep)
   1197     method = keep
   1198 else:
-> 1199     raise com.IbisError(
   1200         f"Invalid value for `keep`: {keep!r}, must be 'first', 'last' or None"
   1201     )
   1203 aggs = {col.get_name(): getattr(col, method)() for col in (~on).expand(self)}
   1204 res = self.aggregate(aggs, by=on, having=having)

IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

drop

drop(*fields)

Remove fields from a table.

Parameters

Name Type Description Default
fields str | Selector Fields to drop. Strings and selectors are accepted. ()

Returns

Name Type Description
Table A table with all columns matching fields removed.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop one or more columns

>>> t.drop("species").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringfloat64float64int64int64stringint64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen39.118.71813750male  2007 │
│ Torgersen39.517.41863800female2007 │
│ Torgersen40.318.01953250female2007 │
│ TorgersenNULLNULLNULLNULLNULL2007 │
│ Torgersen36.719.31933450female2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop("species", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringfloat64int64int64stringint64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen18.71813750male  2007 │
│ Torgersen17.41863800female2007 │
│ Torgersen18.01953250female2007 │
│ TorgersenNULLNULLNULLNULL2007 │
│ Torgersen19.31933450female2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop with selectors, mix and match

>>> import ibis.selectors as s
>>> t.drop("species", s.startswith("bill_")).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringint64int64stringint64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen1813750male  2007 │
│ Torgersen1863800female2007 │
│ Torgersen1953250female2007 │
│ TorgersenNULLNULLNULL2007 │
│ Torgersen1933450female2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘

drop_null

drop_null(subset=None, how='any')

Remove rows with null values from the table.

Parameters

Name Type Description Default
subset Sequence[str] | str | None Columns names to consider when dropping nulls. By default all columns are considered. None
how Literal['any', 'all'] Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). 'any'

Returns

Name Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.count()

┌─────┐
│ 344 │
└─────┘
>>> t.drop_null(["bill_length_mm", "body_mass_g"]).count()

┌─────┐
│ 342 │
└─────┘
>>> t.drop_null(how="all").count()  # no rows where all columns are null

┌─────┐
│ 344 │
└─────┘

dropna

dropna(subset=None, how='any')

Deprecated - use drop_null instead.

equals

equals(other)

Return whether this expression is structurally equivalent to other.

If you want to produce an equality expression, use == syntax.

Parameters

Name Type Description Default
other Another expression required

Examples

>>> import ibis
>>> t1 = ibis.table(dict(a="int"), name="t")
>>> t2 = ibis.table(dict(a="int"), name="t")
>>> t1.equals(t2)
True
>>> v = ibis.table(dict(a="string"), name="v")
>>> t1.equals(v)
False

execute

execute(limit='default', params=None, **kwargs)

Execute an expression against its backend if one exists.

Parameters

Name Type Description Default
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. 'default'
params Mapping[ir.Value, Any] | None Mapping of scalar parameter expressions to value None
kwargs Any Keyword arguments {}

fill_null

fill_null(replacements)

Fill null values in a table expression.

There is potential lack of type stability with the fill_null API

For example, different library versions may impact whether a given backend promotes integer replacement values to floats.

Parameters

Name Type Description Default
replacements ir.Scalar | Mapping[str, ir.Scalar] Value with which to fill nulls. If replacements is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value. required

Returns

Name Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.sex
┏━━━━━━━━┓
┃ sex    ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male   │
│ female │
│ female │
│ NULL   │
│ female │
│ male   │
│ female │
│ male   │
│ NULL   │
│ NULL   │
│       │
└────────┘
>>> t.fill_null({"sex": "unrecorded"}).sex
┏━━━━━━━━━━━━┓
┃ sex        ┃
┡━━━━━━━━━━━━┩
│ string     │
├────────────┤
│ male       │
│ female     │
│ female     │
│ unrecorded │
│ female     │
│ male       │
│ female     │
│ male       │
│ unrecorded │
│ unrecorded │
│           │
└────────────┘

fillna

fillna(replacements)

Deprecated - use fill_null instead.

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

Name Type Description Default
predicates ir.BooleanValue | Sequence[ir.BooleanValue] | IfAnyAll Boolean value expressions used to select rows in table. ()

Returns

Name Type Description
Table Filtered table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex     sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ stringint64     │
├────────┼───────────┤
│ female22 │
│ male  68 │
└────────┴───────────┘

get_backend

get_backend()

Get the current Ibis backend of the expression.

Returns

Name Type Description
BaseBackend The Ibis backend.

Examples

>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("t", {"id": [1, 2, 3]})
>>> backend = t.get_backend()
>>> backend.name
'duckdb'
>>> type(backend)
ibis.backends.duckdb.Backend

get_name

get_name()

Return the fully qualified name of the table.

Examples

>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("t", {"id": [1, 2, 3]})
>>> t.get_name()
'memory.main.t'

group_by

group_by(*by, **key_exprs)

Create a grouped table expression.

Similar to SQL’s GROUP BY statement, or pandas .groupby() method.

Parameters

Name Type Description Default
by str | ir.Value | Iterable[str] | Iterable[ir.Value] | None Grouping expressions ()
key_exprs str | ir.Value | Iterable[str] | Iterable[ir.Value] Named grouping expressions {}

Returns

Name Type Description
GroupedTable A grouped table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit   price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64 │
├────────┼─────────┤
│ apple 0.50 │
│ apple 0.50 │
│ banana0.25 │
│ orange0.33 │
└────────┴─────────┘
>>> t.group_by("fruit").agg(total_cost=_.price.sum(), avg_cost=_.price.mean()).order_by(
...     "fruit"
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit   total_cost  avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├────────┼────────────┼──────────┤
│ apple 1.000.50 │
│ banana0.250.25 │
│ orange0.330.33 │
└────────┴────────────┴──────────┘

head

head(n=5)

Select the first n rows of a table.

The result set is not deterministic without a call to order_by.

Parameters

Name Type Description Default
n int Number of rows to include 5

Returns

Name Type Description
Table self limited to n rows

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
│     2a      │
└───────┴────────┘
>>> t.head(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
└───────┴────────┘

See Also

Table.limit Table.order_by

info

info()

Return summary information about a table.

Returns

Name Type Description
Table Summary of self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.info()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ name               type     nullable  nulls  non_nulls  null_frac  pos   ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ stringstringbooleanint64int64float64int16 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼───────┤
│ species          string  │ True     │     03440.0000000 │
│ island           string  │ True     │     03440.0000001 │
│ bill_length_mm   float64 │ True     │     23420.0058142 │
│ bill_depth_mm    float64 │ True     │     23420.0058143 │
│ flipper_length_mmint64   │ True     │     23420.0058144 │
│ body_mass_g      int64   │ True     │     23420.0058145 │
│ sex              string  │ True     │    113330.0319776 │
│ year             int64   │ True     │     03440.0000007 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴───────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows True

Returns

Name Type Description
Table A new table containing the intersection of all input tables.

See Also

ibis.intersect

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.intersect(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

join

join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')

Perform a join between two tables.

Parameters

Name Type Description Default
left Table Left table to join required
right Table Right table to join required
predicates str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]] Condition(s) to join on. See examples for details. ()
how JoinKind Join method, e.g. "inner" or "left". 'inner'
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId  title                               genres                                      ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64stringstring                                      │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│       1Toy Story (1995)                  Adventure|Animation|Children|Comedy|Fantasy │
│       2Jumanji (1995)                    Adventure|Children|Fantasy                  │
│       3Grumpier Old Men (1995)           Comedy|Romance                              │
│       4Waiting to Exhale (1995)          Comedy|Drama|Romance                        │
│       5Father of the Bride Part II (1995)Comedy                                      │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64int64float64 │
├────────┼─────────┼─────────┤
│      114.0 │
│      134.0 │
│      164.0 │
│      1475.0 │
│      1505.0 │
└────────┴─────────┴─────────┘

Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).

>>> ratings.join(movies, "movieId", how="left").head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   movieId_right  title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64int64stringstring                                      │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.01Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.03Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.06Heat (1995)                Action|Crime|Thriller                       │
│      1475.047Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.050Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.

>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64stringstring                                      │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.0Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.0Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.0Heat (1995)                Action|Crime|Thriller                       │
│      1475.0Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.0Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId  movieId  tag              timestamp  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64int64stringint64      │
├────────┼─────────┼─────────────────┼────────────┤
│      260756funny          1445714994 │
│      260756Highly quotable1445714996 │
│      260756will ferrell   1445714992 │
│      289774Boxing story   1445715207 │
│      289774MMA            1445715200 │
└────────┴─────────┴─────────────────┴────────────┘

You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:

>>> tags.join(ratings, ["userId", "movieId"]).head(5).order_by("userId")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  tag             timestamp   rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64int64stringint64float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│     622Robin Williams15288439074.0 │
│     62110sword fight   15281525354.5 │
│     62410gothic        15256366094.5 │
│     622023mafia         15256367335.0 │
│     622124quirky        15256368465.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘

To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.

For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be

  • a Column
  • Deferred expression
  • lambda of the form (Table) -> Column

For example, to find all movies pairings that received the same (ignoring case) tags:

>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_.tag.lower(), lambda t: t.tag.lower()),
...     ],
... ).head().order_by(("movieId", "movieId_right"))
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId  tag                movieId_right  tag_right         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64string            │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│   60756funny            1732funny             │
│   60756Highly quotable  1732Highly quotable   │
│   89774Tom Hardy        139385tom hardy         │
│  106782drugs            1732drugs             │
│  106782Leonardo DiCaprio5989Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.

Parameters

Name Type Description Default
n int | None Number of rows to include. If None, the entire table is selected starting from offset. required
offset int Number of rows to skip first 0

Returns

Name Type Description
Table The first n rows of self starting at offset

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
│     2a      │
└───────┴────────┘
>>> t.limit(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2a      │
└───────┴────────┘

See Also

Table.order_by

mutate

mutate(*exprs, **mutations)

Add columns to a table expression.

Parameters

Name Type Description Default
exprs Sequence[ir.Expr] | None List of named expressions to add as columns ()
mutations ir.Value Named expressions using keyword arguments {}

Returns

Name Type Description
Table Table expression with additional columns

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().select("species", "year", "bill_length_mm")
>>> t
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species  year   bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringint64float64        │
├─────────┼───────┼────────────────┤
│ Adelie 200739.1 │
│ Adelie 200739.5 │
│ Adelie 200740.3 │
│ Adelie 2007NULL │
│ Adelie 200736.7 │
│ Adelie 200739.3 │
│ Adelie 200738.9 │
│ Adelie 200739.2 │
│ Adelie 200734.1 │
│ Adelie 200742.0 │
│  │
└─────────┴───────┴────────────────┘

Add a new column from a per-element expression

>>> t.mutate(next_year=_.year + 1).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species  year   bill_length_mm  next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringint64float64int64     │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie 200739.12008 │
│ Adelie 200739.52008 │
│ Adelie 200740.32008 │
│ Adelie 2007NULL2008 │
│ Adelie 200736.72008 │
└─────────┴───────┴────────────────┴───────────┘

Add a new column based on an aggregation. Note the automatic broadcasting.

>>> t.select("species", bill_demean=_.bill_length_mm - _.bill_length_mm.mean()).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species  bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringfloat64     │
├─────────┼─────────────┤
│ Adelie -4.82193 │
│ Adelie -4.42193 │
│ Adelie -3.62193 │
│ Adelie NULL │
│ Adelie -7.22193 │
└─────────┴─────────────┘

Mutate across multiple columns

>>> t.mutate(s.across(s.numeric() & ~s.cols("year"), _ - _.mean())).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species  year   bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringint64float64        │
├─────────┼───────┼────────────────┤
│ Adelie 2007-4.82193 │
│ Adelie 2007-4.42193 │
│ Adelie 2007-3.62193 │
│ Adelie 2007NULL │
│ Adelie 2007-7.22193 │
└─────────┴───────┴────────────────┘

nunique

nunique(where=None)

Compute the number of unique rows in the table.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression to filter rows when counting. None

Returns

Name Type Description
IntegerScalar Number of unique rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "bar"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ bar    │
└────────┘
>>> t.nunique()

┌───┐
│ 2 │
└───┘
>>> t.nunique(t.a != "foo")

┌───┐
│ 1 │
└───┘

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

Name Type Description Default
by str | ir.Column | s.Selector | Sequence[str] | Sequence[ir.Column] | Sequence[s.Selector] | None Expressions to sort the table by. ()

Returns

Name Type Description
Table Sorted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     2B     6 │
│     3D     7 │
│     3a     4 │
│     1c     5 │
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(ibis.desc("b"))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     2B     6 │
│     3D     7 │
│     3a     4 │
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     2B     6 │
│     3a     4 │
│     3D     7 │
└───────┴────────┴───────┘

Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria

>>> import ibis.selectors as s
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year   island     year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64             │
├───────┼───────────┼───────────────────┤
│  2007Torgersen20 │
│  2007Biscoe   44 │
│  2007Dream    46 │
│  2008Torgersen16 │
│  2008Dream    34 │
│  2008Biscoe   64 │
│  2009Torgersen16 │
│  2009Dream    44 │
│  2009Biscoe   60 │
└───────┴───────────┴───────────────────┘

Use the across selector to apply a specific order to multiple columns

>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year   island     year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64             │
├───────┼───────────┼───────────────────┤
│  2009Biscoe   60 │
│  2009Dream    44 │
│  2009Torgersen16 │
│  2008Biscoe   64 │
│  2008Dream    34 │
│  2008Torgersen16 │
│  2007Dream    46 │
│  2007Biscoe   44 │
│  2007Torgersen20 │
└───────┴───────────┴───────────────────┘

pipe

pipe(f, *args, **kwargs)

Compose f with self.

Parameters

Name Type Description Default
f If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, ‘data’) if the function f expects a ‘data’ keyword required
args Any Positional arguments to f ()
kwargs Any Keyword arguments to f {}

Examples

>>> import ibis
>>> t = ibis.memtable(
...     {
...         "a": [5, 10, 15],
...         "b": ["a", "b", "c"],
...     }
... )
>>> f = lambda a: (a + 1).name("a")
>>> g = lambda a: (a * 2).name("a")
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│    12 │
│    22 │
│    32 │
└───────┘
>>> result2 = g(f(t.a))  # equivalent to the above
>>> result1.equals(result2)
True

Returns

Name Type Description
Expr Result type of passed function

pivot_longer

pivot_longer(
    col,
    *,
    names_to='name',
    names_pattern='(.+)',
    names_transform=None,
    values_to='value',
    values_transform=None,
)

Transform a table from wider to longer.

Parameters

Name Type Description Default
col str | s.Selector String column name or selector. required
names_to str | Iterable[str] A string or iterable of strings indicating how to name the new pivoted columns. 'name'
names_pattern str | re.Pattern Pattern to use to extract column names from the input. By default the entire column name is extracted. '(.+)'
names_transform Callable[[str], ir.Value] | Mapping[str, Callable[[str], ir.Value]] | None Function or mapping of a name in names_to to a function to transform a column name to a value. None
values_to str Name of the pivoted value column. 'value'
values_transform Callable[[ir.Value], ir.Value] | Deferred | None Apply a function to the value column. This can be a lambda or deferred expression. None

Returns

Name Type Description
Table Pivoted table

Examples

Basic usage

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> relig_income = ibis.examples.relig_income_raw.fetch()
>>> relig_income
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion                 <$10k  $10-20k  $20-30k  $30-40k  $40-50k  $50-75k  $75-100k  $100-150k  >150k  Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ stringint64int64int64int64int64int64int64int64int64int64              │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic               27346081761371221098496 │
│ Atheist                12273752357073597476 │
│ Buddhist               27213034335862395354 │
│ Catholic               41861773267063811169497926331489 │
│ Don’t know/refused     151415111035211718116 │
│ Evangelical Prot       575869106498288114869497234141529 │
│ Hindu                  1979113447485437 │
│ Historically Black Prot2282442362381972231318178339 │
│ Jehovah's Witness      2027242421301511637 │
│ Jewish                 1919252530956987151162 │
│  │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘

Here we convert column names not matching the selector for the religion column and convert those names into values

>>> relig_income.pivot_longer(~s.cols("religion"), names_to="income", values_to="count")
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion  income              count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ stringstringint64 │
├──────────┼────────────────────┼───────┤
│ Agnostic<$10k             27 │
│ Agnostic$10-20k           34 │
│ Agnostic$20-30k           60 │
│ Agnostic$30-40k           81 │
│ Agnostic$40-50k           76 │
│ Agnostic$50-75k           137 │
│ Agnostic$75-100k          122 │
│ Agnostic$100-150k         109 │
│ Agnostic>150k             84 │
│ AgnosticDon't know/refused96 │
│  │
└──────────┴────────────────────┴───────┘

Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to value.

>>> world_bank_pop = ibis.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country  indicator    2000          2001          2002          2003          2004          2005          2006          2007          2008          2009          2010          2011          2012          2013          2014          2015          2016          2017         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ stringstringfloat64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64      │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW    SP.URB.TOTL4.162500e+044.202500e+044.219400e+044.227700e+044.231700e+044.239900e+044.255500e+044.272900e+044.290600e+044.307900e+044.320600e+044.349300e+044.386400e+044.422800e+044.458800e+044.494300e+044.529700e+044.564800e+04 │
│ ABW    SP.URB.GROW1.664222e+009.563731e-014.013352e-011.965172e-019.456936e-021.935880e-013.672580e-014.080490e-014.133830e-014.023963e-012.943735e-016.620631e-018.493932e-018.264135e-018.106692e-017.930256e-017.845785e-017.718989e-01 │
│ ABW    SP.POP.TOTL8.910100e+049.069100e+049.178100e+049.270100e+049.354000e+049.448300e+049.560600e+049.678700e+049.799600e+049.921200e+041.003410e+051.012880e+051.021120e+051.028800e+051.035940e+051.042570e+051.048740e+051.054390e+05 │
│ ABW    SP.POP.GROW2.539234e+001.768757e+001.194718e+009.973955e-019.009892e-011.003077e+001.181566e+001.227711e+001.241397e+001.233231e+001.131541e+009.393559e-018.102306e-017.493010e-016.916153e-016.379592e-015.900625e-015.372957e-01 │
│ AFE    SP.URB.TOTL1.155517e+081.197755e+081.242275e+081.288340e+081.336475e+081.387456e+081.440267e+081.492313e+081.553838e+081.617762e+081.684561e+081.754157e+081.825587e+081.901087e+081.980733e+082.065563e+082.150833e+082.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
>>> world_bank_pop.pivot_longer(s.matches(r"\d{4}"), names_to="year").head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country  indicator    year    value   ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ stringstringstringfloat64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW    SP.URB.TOTL2000  41625.0 │
│ ABW    SP.URB.TOTL2001  42025.0 │
│ ABW    SP.URB.TOTL2002  42194.0 │
│ ABW    SP.URB.TOTL2003  42277.0 │
│ ABW    SP.URB.TOTL2004  42317.0 │
└─────────┴─────────────┴────────┴─────────┘

pivot_longer has some preprocessing capabilities like stripping a prefix and applying a function to column names

>>> billboard = ibis.examples.billboard.fetch()
>>> billboard
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist          track                    date_entered  wk1    wk2    wk3    wk4    wk5    wk6    wk7    wk8    wk9    wk10   wk11   wk12   wk13   wk14   wk15   wk16   wk17   wk18   wk19   wk20   wk21   wk22   wk23   wk24   wk25   wk26   wk27   wk28   wk29   wk30   wk31   wk32   wk33   wk34   wk35   wk36   wk37   wk38   wk39   wk40   wk41   wk42   wk43   wk44   wk45   wk46   wk47   wk48   wk49   wk50   wk51   wk52   wk53   wk54   wk55   wk56   wk57   wk58   wk59   wk60   wk61   wk62   wk63   wk64   wk65   wk66    wk67    wk68    wk69    wk70    wk71    wk72    wk73    wk74    wk75    wk76   ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringstringdateint64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64stringstringstringstringstringstringstringstringstringstringstring │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac         Baby Don't Cry (Keep...2000-02-2687827277879499NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 2Ge+her       The Hardest Part Of ...2000-09-02918792NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 3 Doors Down  Kryptonite             2000-04-088170686766575453515151514744382822181814127666554444333455991514131416172122242833424249NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 3 Doors Down  Loser                  2000-10-217676726967655559626161596166727675677370NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 504 Boyz      Wobble Wobble          2000-04-15573425171731364953576470757678859296NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 98^0          Give Me Just One Nig...2000-08-1951393426261922367222936476766849394NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ A*Teens       Dancing Queen          2000-07-0897979695100NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Aaliyah       I Don't Wanna          2000-01-298462514138353538383637373849616362678386NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Aaliyah       Try Again              2000-03-1859533828211816141210986122223455691314162322333643NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Adams, YolandaOpen My Heart          2000-08-267676746968676158575966686167596367717989NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│       │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
>>> billboard.pivot_longer(
...     s.startswith("wk"),
...     names_to="week",
...     names_pattern=r"wk(.+)",
...     names_transform=int,
...     values_to="rank",
...     values_transform=_.cast("int"),
... ).drop_null("rank")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist   track                    date_entered  week  rank  ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ stringstringdateint8int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26187 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26282 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26372 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26477 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26587 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26694 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26799 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02191 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02287 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02392 │
│  │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘

You can use regular expression capture groups to extract multiple variables stored in column names

>>> who = ibis.examples.who.fetch()
>>> who
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country      iso2    iso3    year   new_sp_m014  new_sp_m1524  new_sp_m2534  new_sp_m3544  new_sp_m4554  new_sp_m5564  new_sp_m65  new_sp_f014  new_sp_f1524  new_sp_f2534  new_sp_f3544  new_sp_f4554  new_sp_f5564  new_sp_f65  new_sn_m014  new_sn_m1524  new_sn_m2534  new_sn_m3544  new_sn_m4554  new_sn_m5564  new_sn_m65  new_sn_f014  new_sn_f1524  new_sn_f2534  new_sn_f3544  new_sn_f4554  new_sn_f5564  new_sn_f65  new_ep_m014  new_ep_m1524  new_ep_m2534  new_ep_m3544  new_ep_m4554  new_ep_m5564  new_ep_m65  new_ep_f014  new_ep_f1524  new_ep_f2534  new_ep_f3544  new_ep_f4554  new_ep_f5564  new_ep_f65  newrel_m014  newrel_m1524  newrel_m2534  newrel_m3544  newrel_m4554  newrel_m5564  newrel_m65  newrel_f014  newrel_f1524  newrel_f2534  newrel_f3544  newrel_f4554  newrel_f5564  newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ stringstringstringint64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64      │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ AfghanistanAF    AFG   1980NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1981NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1982NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1983NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1984NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1985NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1986NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1987NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1988NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1989NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│  │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘
>>> len(who.columns)
60
>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country      iso2    iso3    year   diagnosis  gender  age     count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringstringint64stringstringstringint64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ AfghanistanAF    AFG   1980sp       m     014   NULL │
│ AfghanistanAF    AFG   1980sp       m     1524  NULL │
│ AfghanistanAF    AFG   1980sp       m     2534  NULL │
│ AfghanistanAF    AFG   1980sp       m     3544  NULL │
│ AfghanistanAF    AFG   1980sp       m     4554  NULL │
│ AfghanistanAF    AFG   1980sp       m     5564  NULL │
│ AfghanistanAF    AFG   1980sp       m     65    NULL │
│ AfghanistanAF    AFG   1980sp       f     014   NULL │
│ AfghanistanAF    AFG   1980sp       f     1524  NULL │
│ AfghanistanAF    AFG   1980sp       f     2534  NULL │
│  │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘

names_transform is flexible, and can be:

1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name

Let’s recode gender and age to numeric values using a mapping

>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     names_transform=dict(
...         gender={"m": 1, "f": 2}.get,
...         age=dict(
...             zip(
...                 ["014", "1524", "2534", "3544", "4554", "5564", "65"],
...                 range(7),
...             )
...         ).get,
...     ),
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country      iso2    iso3    year   diagnosis  gender  age   count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ stringstringstringint64stringint8int8int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ AfghanistanAF    AFG   1980sp       10NULL │
│ AfghanistanAF    AFG   1980sp       11NULL │
│ AfghanistanAF    AFG   1980sp       12NULL │
│ AfghanistanAF    AFG   1980sp       13NULL │
│ AfghanistanAF    AFG   1980sp       14NULL │
│ AfghanistanAF    AFG   1980sp       15NULL │
│ AfghanistanAF    AFG   1980sp       16NULL │
│ AfghanistanAF    AFG   1980sp       20NULL │
│ AfghanistanAF    AFG   1980sp       21NULL │
│ AfghanistanAF    AFG   1980sp       22NULL │
│  │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘

The number of match groups in names_pattern must match the length of names_to

>>> who.pivot_longer(  
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_.(.*)",
... )
---------------------------------------------------------------------------
IbisInputError                            Traceback (most recent call last)
Cell In[293], line 1
----> 1 who.pivot_longer(  
      2     s.index["new_sp_m014":"newrel_f65"],
      3     names_to=["diagnosis", "gender", "age"],
      4     names_pattern="new_?(.*)_.(.*)",
      5 )

File ~/work/ibis/ibis/ibis/expr/types/relations.py:3838, in Table.pivot_longer(self, col, names_to, names_pattern, names_transform, values_to, values_transform)
   3836 names_pattern = re.compile(names_pattern)
   3837 if (ngroups := names_pattern.groups) != (nnames := len(names_to)):
-> 3838     raise com.IbisInputError(
   3839         f"Number of match groups in `names_pattern`"
   3840         f"{names_pattern.pattern!r} ({ngroups:d} groups) doesn't "
   3841         f"match the length of `names_to` {names_to} (length {nnames:d})"
   3842     )
   3844 if names_transform is None:
   3845     names_transform = dict.fromkeys(names_to, toolz.identity)

IbisInputError: Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)

names_transform must be a mapping or callable

>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
... )  # quartodoc: +EXPECTED_FAILURE
---------------------------------------------------------------------------
IbisTypeError                             Traceback (most recent call last)
Cell In[296], line 1
----> 1 who.pivot_longer(
      2     s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
      3 )  # quartodoc: +EXPECTED_FAILURE

File ~/work/ibis/ibis/ibis/expr/types/relations.py:3850, in Table.pivot_longer(self, col, names_to, names_pattern, names_transform, values_to, values_transform)
   3848         names_transform = dict.fromkeys(names_to, names_transform)
   3849     else:
-> 3850         raise com.IbisTypeError(
   3851             f"`names_transform` must be a mapping or callable. Got {type(names_transform)}"
   3852         )
   3854 for name in names_to:
   3855     names_transform.setdefault(name, toolz.identity)

IbisTypeError: `names_transform` must be a mapping or callable. Got <class 'str'>

pivot_wider

pivot_wider(
    id_cols=None,
    names_from='name',
    names_prefix='',
    names_sep='_',
    names_sort=False,
    names=None,
    values_from='value',
    values_fill=None,
    values_agg='arbitrary',
)

Pivot a table to a wider format.

Parameters

Name Type Description Default
id_cols s.Selector | None A set of columns that uniquely identify each observation. None
names_from str | Iterable[str] | s.Selector An argument describing which column or columns to use to get the name of the output columns. 'name'
names_prefix str String added to the start of every column name. ''
names_sep str If names_from or values_from contains multiple columns, this argument will be used to join their values together into a single string to use as a column name. '_'
names_sort bool If True columns are sorted. If False column names are ordered by appearance. False
names Iterable[str] | None An explicit sequence of values to look for in columns matching names_from. * When this value is None, the values will be computed from names_from. * When this value is not None, each element’s length must match the length of names_from. See examples below for more detail. None
values_from str | Iterable[str] | s.Selector An argument describing which column or columns to get the cell values from. 'value'
values_fill int | float | str | ir.Scalar | None A scalar value that specifies what each value should be filled with when missing. None
values_agg str | Callable[[ir.Value], ir.Scalar] | Deferred A function applied to the value in each cell in the output. 'arbitrary'

Returns

Name Type Description
Table Wider pivoted table

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True

Basic usage

>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish   station  seen  ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼─────────┼───────┤
│  4842Release1 │
│  4842I80_1  1 │
│  4842Lisbon 1 │
│  4842Rstr   1 │
│  4842Base_TD1 │
│  4842BCE    1 │
│  4842BCW    1 │
│  4842BCE2   1 │
│  4842BCW2   1 │
│  4842MAE    1 │
│      │
└───────┴─────────┴───────┘
>>> fish_encounters.pivot_wider(names_from="station", values_from="seen")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish   Lisbon  Base_TD  MAE    Release  I80_1  BCE    BCE2   Rstr   BCW    BCW2   MAW   ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64int64int64int64int64int64int64int64int64int64 │
├───────┼────────┼─────────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│  4851NULLNULLNULL11NULLNULLNULLNULLNULLNULL │
│  485711NULL1111111NULL │
│  485811111111111 │
│  486211NULL1111111NULL │
│  4863NULLNULLNULL11NULLNULLNULLNULLNULLNULL │
│  484211111111111 │
│  48471NULLNULL11NULLNULLNULLNULLNULLNULL │
│  4850NULL1NULL111NULL11NULLNULL │
│  4854NULLNULLNULL11NULLNULLNULLNULLNULLNULL │
│  485511NULL11NULLNULL1NULLNULLNULL │
│      │
└───────┴────────┴─────────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘

You can do simple transpose-like operations using pivot_wider

>>> t = ibis.memtable(dict(outcome=["yes", "no"], counted=[3, 4]))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓
┃ outcome  counted ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ stringint64   │
├─────────┼─────────┤
│ yes    3 │
│ no     4 │
└─────────┴─────────┘
>>> t.pivot_wider(names_from="outcome", values_from="counted", names_sort=True)
┏━━━━━━━┳━━━━━━━┓
┃ no     yes   ┃
┡━━━━━━━╇━━━━━━━┩
│ int64int64 │
├───────┼───────┤
│     43 │
└───────┴───────┘

Fill missing pivoted values using values_fill

>>> fish_encounters.pivot_wider(
...     names_from="station", values_from="seen", values_fill=0
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish   Rstr   BCW    BCW2   Lisbon  Base_TD  MAE    Release  I80_1  BCE    BCE2   MAW   ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64int64int64int64int64int64int64int64int64int64 │
├───────┼───────┼───────┼───────┼────────┼─────────┼───────┼─────────┼───────┼───────┼───────┼───────┤
│  484411111111111 │
│  484510011011000 │
│  484900000011000 │
│  485910011011000 │
│  486111111111111 │
│  484311111111111 │
│  484810010011000 │
│  486500010011000 │
│  485100000011000 │
│  485711111011110 │
│      │
└───────┴───────┴───────┴───────┴────────┴─────────┴───────┴─────────┴───────┴───────┴───────┴───────┘

Compute multiple values columns

>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid   name        variable  estimate  moe   ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01    Alabama   income  24476136 │
│ 01    Alabama   rent    7473 │
│ 02    Alaska    income  32940508 │
│ 02    Alaska    rent    120013 │
│ 04    Arizona   income  27517148 │
│ 04    Arizona   rent    9724 │
│ 05    Arkansas  income  23789165 │
│ 05    Arkansas  rent    7095 │
│ 06    Californiaincome  29454109 │
│ 06    Californiarent    13583 │
│  │
└────────┴────────────┴──────────┴──────────┴───────┘
>>> us_rent_income.pivot_wider(
...     names_from="variable", values_from=["estimate", "moe"]
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid   name          estimate_income  moe_income  estimate_rent  moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringint64int64int64int64    │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05    Arkansas    237891657095 │
│ 06    California  2945410913583 │
│ 13    Georgia     270241069273 │
│ 15    Hawaii      32453218150718 │
│ 16    Idaho       252982087927 │
│ 30    Montana     262492067519 │
│ 38    North Dakota323362457759 │
│ 39    Ohio        27435947642 │
│ 40    Oklahoma    262071017663 │
│ 47    Tennessee   254531028084 │
│  │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘

The column name separator can be changed using the names_sep parameter

>>> us_rent_income.pivot_wider(
...     names_from="variable",
...     names_sep=".",
...     values_from=("estimate", "moe"),
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid   name          estimate.income  moe.income  estimate.rent  moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringint64int64int64int64    │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05    Arkansas    237891657095 │
│ 06    California  2945410913583 │
│ 13    Georgia     270241069273 │
│ 15    Hawaii      32453218150718 │
│ 16    Idaho       252982087927 │
│ 30    Montana     262492067519 │
│ 38    North Dakota323362457759 │
│ 39    Ohio        27435947642 │
│ 40    Oklahoma    262071017663 │
│ 47    Tennessee   254531028084 │
│  │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘

Supply an alternative function to summarize values

>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool", "tension", "breaks")
>>> warpbreaks
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool    tension  breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ stringstringint64  │
├────────┼─────────┼────────┤
│ A     L      26 │
│ A     L      30 │
│ A     L      54 │
│ A     L      25 │
│ A     L      70 │
│ A     L      52 │
│ A     L      51 │
│ A     L      26 │
│ A     L      67 │
│ A     M      18 │
│  │
└────────┴─────────┴────────┘
>>> warpbreaks.pivot_wider(
...     names_from="wool", values_from="breaks", values_agg="mean"
... ).select("tension", "A", "B").order_by("tension")
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension  A          B         ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64float64   │
├─────────┼───────────┼───────────┤
│ H      24.55555618.777778 │
│ L      44.55555628.222222 │
│ M      24.00000028.777778 │
└─────────┴───────────┴───────────┘

Passing Deferred objects to values_agg is supported

>>> warpbreaks.pivot_wider(
...     names_from="tension",
...     values_from="breaks",
...     values_agg=_.sum(),
... ).select("wool", "H", "L", "M").order_by(s.all())
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool    H      L      M     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringint64int64int64 │
├────────┼───────┼───────┼───────┤
│ A     221401216 │
│ B     169254259 │
└────────┴───────┴───────┴───────┘

Use a custom aggregate function

>>> warpbreaks.pivot_wider(
...     names_from="wool",
...     values_from="breaks",
...     values_agg=lambda col: col.std() / col.mean(),
... ).select("tension", "A", "B").order_by("tension")
┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension  A         B        ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├─────────┼──────────┼──────────┤
│ H      0.4183440.260590 │
│ L      0.4061830.349325 │
│ M      0.3608440.327719 │
└─────────┴──────────┴──────────┘

Generate some random data, setting the random seed for reproducibility

>>> import random
>>> random.seed(0)
>>> raw = ibis.memtable(
...     [
...         dict(
...             product=product,
...             country=country,
...             year=year,
...             production=random.random(),
...         )
...         for product in "AB"
...         for country in ["AI", "EI"]
...         for year in range(2000, 2015)
...     ]
... )
>>> production = raw.filter(((_.product == "A") & (_.country == "AI")) | (_.product == "B"))
>>> production.order_by(s.all())
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product  country  year   production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ stringstringint64float64    │
├─────────┼─────────┼───────┼────────────┤
│ A      AI     20000.844422 │
│ A      AI     20010.757954 │
│ A      AI     20020.420572 │
│ A      AI     20030.258917 │
│ A      AI     20040.511275 │
│ A      AI     20050.404934 │
│ A      AI     20060.783799 │
│ A      AI     20070.303313 │
│ A      AI     20080.476597 │
│ A      AI     20090.583382 │
│  │
└─────────┴─────────┴───────┴────────────┘

Pivoting with multiple name columns

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   A_AI      B_AI      B_EI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64float64  │
├───────┼──────────┼──────────┼──────────┤
│  20040.5112750.5486990.967540 │
│  20060.7837990.7197050.447970 │
│  20070.3033130.3988240.080446 │
│  20080.4765970.8248450.320055 │
│  20110.5046870.4935780.109058 │
│  20010.7579540.8653100.191067 │
│  20030.2589170.8050280.238616 │
│  20090.5833820.6681530.507941 │
│  20120.2818380.8676030.551267 │
│  20000.8444220.4770100.870471 │
│      │
└───────┴──────────┴──────────┴──────────┘

Select a subset of names. This call incurs no computation when constructing the expression.

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     names=[("A", "AI"), ("B", "AI")],
...     values_from="production",
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   A_AI      B_AI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64  │
├───────┼──────────┼──────────┤
│  20020.4205720.260492 │
│  20050.4049340.014042 │
│  20130.7558040.243911 │
│  20000.8444220.477010 │
│  20100.9081130.001143 │
│  20140.6183690.325204 │
│  20040.5112750.548699 │
│  20060.7837990.719705 │
│  20070.3033130.398824 │
│  20080.4765970.824845 │
│      │
└───────┴──────────┴──────────┘

Sort the new columns’ names

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
...     names_sort=True,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   A_AI      B_AI      B_EI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64float64  │
├───────┼──────────┼──────────┼──────────┤
│  20040.5112750.5486990.967540 │
│  20060.7837990.7197050.447970 │
│  20070.3033130.3988240.080446 │
│  20080.4765970.8248450.320055 │
│  20110.5046870.4935780.109058 │
│  20020.4205720.2604920.567511 │
│  20050.4049340.0140420.803179 │
│  20130.7558040.2439110.706561 │
│  20010.7579540.8653100.191067 │
│  20030.2589170.8050280.238616 │
│      │
└───────┴──────────┴──────────┴──────────┘

preview

preview(
    max_rows=None,
    max_columns=None,
    max_length=None,
    max_string=None,
    max_depth=None,
    console_width=None,
)

Return a subset as a Rich Table.

This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.

Parameters

Name Type Description Default
max_rows int | None Maximum number of rows to display None
max_columns int | None Maximum number of columns to display None
max_length int | None Maximum length for pretty-printed arrays and maps None
max_string int | None Maximum length for pretty-printed strings None
max_depth int | None Maximum depth for nested data types None
console_width int | float | None Width of the console in characters. If not specified, the width will be inferred from the console. None

Examples

>>> import ibis
>>> t = ibis.examples.penguins.fetch()

Because the console_width is too small, only 2 columns are shown even though we specified up to 3.

>>> t.preview(
...     max_rows=3,
...     max_columns=3,
...     max_string=8,
...     console_width=30,
... )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━┓
┃ species  island   ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━┩
│ stringstring │
├─────────┼──────────┼───┤
│ Adelie Torgers… │
│ Adelie Torgers… │
│ Adelie Torgers… │
│  │
└─────────┴──────────┴───┘

relabel

relabel(substitutions)

Deprecated in favor of Table.rename.

relocate

relocate(*columns, before=None, after=None, **kwargs)

Relocate columns before or after other specified columns.

Parameters

Name Type Description Default
columns str | s.Selector Columns to relocate. Selectors are accepted. ()
before str | s.Selector | None A column name or selector to insert the new columns before. None
after str | s.Selector | None A column name or selector. Columns in columns are relocated after the last column selected in after. None
kwargs str Additional column names to relocate, renaming argument values to keyword argument names. {}

Returns

Name Type Description
Table A table with the columns relocated.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> import ibis.selectors as s
>>> t = ibis.memtable(dict(a=[1], b=[1], c=[1], d=["a"], e=["a"], f=["a"]))
>>> t
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      b      c      d       e       f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64int64stringstringstring │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     111a     a     a      │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f       a      b      c      d       e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringint64int64int64stringstring │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a     111a     a      │
└────────┴───────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a", after="c")
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b      c      a      d       e       f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64int64stringstringstring │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     111a     a     a      │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f", before="b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      f       b      c      d       e      ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64stringint64int64stringstring │
├───────┼────────┼───────┼───────┼────────┼────────┤
│     1a     11a     a      │
└───────┴────────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a", after=s.last())
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b      c      d       e       f       a     ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64int64stringstringstringint64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│     11a     a     a     1 │
└───────┴───────┴────────┴────────┴────────┴───────┘

Relocate allows renaming

>>> t.relocate(ff="f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff      a      b      c      d       e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringint64int64int64stringstring │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a     111a     a      │
└────────┴───────┴───────┴───────┴────────┴────────┘

You can relocate based on any predicate selector, such as of_type

>>> t.relocate(s.of_type("string"))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d       e       f       a      b      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a     a     a     111 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.numeric(), after=s.last())
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d       e       f       a      b      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a     a     a     111 │
└────────┴────────┴────────┴───────┴───────┴───────┘

When multiple columns are selected with before or after, those selected columns are moved before and after the selectors input

>>> t = ibis.memtable(dict(a=[1], b=["a"], c=[1], d=["a"]))
>>> t.relocate(s.numeric(), after=s.of_type("string"))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b       d       a      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringint64int64 │
├────────┼────────┼───────┼───────┤
│ a     a     11 │
└────────┴────────┴───────┴───────┘
>>> t.relocate(s.numeric(), before=s.of_type("string"))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      c      b       d      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64stringstring │
├───────┼───────┼────────┼────────┤
│     11a     a      │
└───────┴───────┴────────┴────────┘

When there are duplicate renames in a call to relocate, the last one is preserved

>>> t.relocate(e="d", f="d")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f       a      b       c     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringint64stringint64 │
├────────┼───────┼────────┼───────┤
│ a     1a     1 │
└────────┴───────┴────────┴───────┘

However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved

>>> t.relocate(
...     "b",
...     s.of_type("string"),  # "b" is a string column, so the selector matches
... )
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b       d       a      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringint64int64 │
├────────┼────────┼───────┼───────┤
│ a     a     11 │
└────────┴────────┴───────┴───────┘

rename

rename(method=None, /, **substitutions)

Rename columns in the table.

Parameters

Name Type Description Default
method str | Callable[[str], str | None] | Literal['snake_case', 'ALL_CAPS'] | Mapping[str, str] | None An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like "prefix_{name}". - A function from old name to new name. If the function returns None the old name is used. - The literal strings "snake_case" or "ALL_CAPS" to rename all columns using a snake_case or "ALL_CAPS"`` naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name. |None| | substitutions | [str](str) | Columns to be explicitly renamed, expressed asnew_name=old_name`keyword arguments. |`

Returns

Name Type Description
Table A renamed table expression

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> first3 = s.index[:3]  # first 3 columns
>>> t = ibis.examples.penguins_raw_raw.fetch().select(first3)
>>> t
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ studyName  Sample Number  Species                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708  1Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  2Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  3Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  4Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  5Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  6Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  7Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  8Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  9Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  10Adelie Penguin (Pygoscelis adeliae) │
│                                    │
└───────────┴───────────────┴─────────────────────────────────────┘

Rename specific columns by passing keyword arguments like `new_name=“old_name”``

>>> t.rename(study_name="studyName").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name  Sample Number  Species                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using a format string

>>> t.rename("p_{name}").head(1)
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ p_studyName  p_Sample Number  p_Species                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├─────────────┼─────────────────┼─────────────────────────────────────┤
│ PAL0708    1Adelie Penguin (Pygoscelis adeliae) │
└─────────────┴─────────────────┴─────────────────────────────────────┘

Rename all columns using a snake_case convention

>>> t.rename("snake_case").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name  sample_number  species                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using an ALL_CAPS convention

>>> t.rename("ALL_CAPS").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDY_NAME  SAMPLE_NUMBER  SPECIES                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using a callable

>>> t.rename(str.upper).head(1)
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDYNAME  SAMPLE NUMBER  SPECIES                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708  1Adelie Penguin (Pygoscelis adeliae) │
└───────────┴───────────────┴─────────────────────────────────────┘

rowid

rowid()

A unique integer per row.

This operation is only valid on physical tables

Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid).

For a monotonically increasing row number, see ibis.row_number.

Returns

Name Type Description
IntegerColumn An integer column

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Results may be non-repeatable

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

Name Type Description Default
fraction float The percentage of rows to include in the sample, expressed as a float between 0 and 1. required
method Literal['row', 'block'] The sampling method to use. The default is “row”, which includes each row with a probability of fraction. If method is “block”, some backends may instead sample a fraction of blocks of rows (where “block” is a backend dependent definition), which may be significantly more efficient (at the cost of a less statistically random sample). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause. 'row'
seed int | None An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases. None

Returns

Name Type Description
Table The input table, with fraction of rows selected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2b      │
│     3c      │
│     4d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     2b      │
│     3c      │
└───────┴────────┘

schema

schema()

Return the Schema for this table.

Returns

Name Type Description
Schema The table’s schema.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  int64
  body_mass_g        int64
  sex                string
  year               int64
}

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str] Column expression, string, or list of column expressions and strings. ()
named_exprs ir.Value | str Column expressions {}

Returns

Name Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Simple projection

>>> t.select("island", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(t[0], t[4]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringint64             │
├─────────┼───────────────────┤
│ Adelie 181 │
│ Adelie 186 │
│ Adelie 195 │
│ Adelie NULL │
│ Adelie 193 │
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from ibis import _
>>> t.select((_.year + 1).name("next_year")).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island     bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64   │
├───────────┼───────────┤
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
└───────────┴───────────┘

Projection with a selector

>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64int64int64       │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│           39.118.71813750 │
│           39.517.41863800 │
│           40.318.01953250 │
│           NULLNULLNULLNULL │
│           36.719.31933450 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64float64float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

Name Type Description Default
query str Query string required
dialect str | None Optional string indicating the dialect of query. Defaults to the backend’s native dialect. None

Returns

Name Type Description
Table An opaque table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ Biscoe   45.257485 │
│ Dream    44.167742 │
│ Torgersen38.950980 │
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = ibis.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ biscoe   45.3 │
│ dream    44.2 │
│ torgersen39.0 │
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island  avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├────────┼─────────────────┤
│ biscoe45.3 │
│ dream 44.2 │
└────────┴─────────────────┘

See Also

Table.alias

to_array

to_array()

View a single column table as an array.

Returns

Name Type Description
Value A single column view of a table

to_csv

to_csv(path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

See https://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVWriter.html for details.

Parameters

Name Type Description Default
path str | Path The data source. A string or Path to the CSV file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any Additional keyword arguments passed to pyarrow.csv.CSVWriter {}

to_delta

to_delta(path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
path str | Path The data source. A string or Path to the Delta Lake table directory. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any Additional keyword arguments passed to deltalake.writer.write_deltalake method {}

to_pandas

to_pandas(**kwargs)

Convert a table expression to a pandas DataFrame.

Parameters

Name Type Description Default
kwargs Same as keyword arguments to execute {}

to_pandas_batches

to_pandas_batches(limit=None, params=None, chunk_size=1000000, **kwargs)

Execute expression and return an iterator of pandas DataFrames.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
params Mapping[ir.Value, Any] | None Mapping of scalar parameter expressions to value. None
chunk_size int Maximum number of rows in each returned DataFrame. 1000000
kwargs Any Keyword arguments {}

Returns

Name Type Description
Iterator[pd.DataFrame]

to_parquet

to_parquet(path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.

Parameters

Name Type Description Default
path str | Path The data source. A string or Path to the parquet file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any Additional keyword arguments passed to pyarrow.parquet.ParquetWriter {}

Examples

Write out an expression to a single parquet file.

>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())

Partition on a single column.

>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by="year")

Partition on multiple columns.

>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by=("year", "island"))
Hive-partitioned output is currently only supported when using DuckDB

to_parquet_dir

to_parquet_dir(directory, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file in a directory.

This method is eager and will execute the associated expression immediately.

See https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html for details.

Parameters

Name Type Description Default
directory str | Path The data target. A string or Path to the directory where the parquet file will be written. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any Additional keyword arguments passed to pyarrow.dataset.write_dataset {}

to_polars

to_polars(params=None, limit=None, **kwargs)

Execute expression and return results as a polars dataframe.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

Returns

Name Type Description
DataFrame A polars dataframe holding the results of the executed expression.

to_pyarrow

to_pyarrow(params=None, limit=None, **kwargs)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

Returns

Name Type Description
Table A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(limit=None, params=None, chunk_size=1000000, **kwargs)

Execute expression and return a RecordBatchReader.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
params Mapping[ir.Value, Any] | None Mapping of scalar parameter expressions to value. None
chunk_size int Maximum number of rows in each returned record batch. 1000000
kwargs Any Keyword arguments {}

Returns

Name Type Description
results RecordBatchReader

to_torch

to_torch(params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters

Name Type Description Default
params Mapping[ir.Scalar, Any] | None Parameters to substitute into the expression. None
limit int | str | None An integer to effect a specific row limit. A value of None means no limit. None
kwargs Any Keyword arguments passed into the backend’s to_torch implementation. {}

Returns

Name Type Description
dict[str, torch.Tensor] A dictionary of torch tensors, keyed by column name.

try_cast

try_cast(schema)

Cast the columns of a table.

If the cast fails for a row, the value is returned as NULL or NaN depending on backend behavior.

Parameters

Name Type Description Default
schema SchemaLike Mapping, schema or iterable of pairs to use for casting required

Returns

Name Type Description
Table Casted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["1", "2", "3"], "b": ["2.2", "3.3", "book"]})
>>> t.try_cast({"a": "int", "b": "float"})
┏━━━━━━━┳━━━━━━━━━┓
┃ a      b       ┃
┡━━━━━━━╇━━━━━━━━━┩
│ int64float64 │
├───────┼─────────┤
│     12.2 │
│     23.3 │
│     3NULL │
└───────┴─────────┘

unbind

unbind()

Return an expression built on UnboundTable instead of backend-specific objects.

Examples

>>> import ibis
>>> import pandas as pd
>>> duckdb_con = ibis.duckdb.connect()
>>> polars_con = ibis.polars.connect()
>>> for backend in (duckdb_con, polars_con):
...     t = backend.create_table("t", pd.DataFrame({"a": [1, 2, 3]}))
>>> bound_table = duckdb_con.table("t")
>>> bound_table.get_backend().name
'duckdb'
>>> unbound_table = bound_table.unbind()
>>> polars_con.execute(unbound_table)
a
0 1
1 2
2 3

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows False

Returns

Name Type Description
Table A new table containing the union of all input tables.

See Also

ibis.union

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.union(t2)  # union all by default
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘
>>> t1.union(t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

unnest

unnest(column, offset=None, keep_empty=False)

Unnest an array column from a table.

When unnesting an existing column the newly unnested column replaces the existing column.

Parameters

Name Type Description Default
column Array column to unnest. required
offset str | None Name of the resulting index column. None
keep_empty bool Keep empty array values as NULL in the output table, as well as existing NULL values. False

Returns

Name Type Description
Table Table with the array column column unnested.

See Also

ArrayValue.unnest

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True

Construct a table expression with an array column.

>>> t = ibis.memtable({"x": [[1, 2], [], None, [3, 4, 5]], "y": [1, 2, 3, 4]})
>>> t
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ x               y     ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ array<int64>int64 │
├────────────────┼───────┤
│ [1, 2]1 │
│ []2 │
│ NULL3 │
│ [3, 4, ... +1]4 │
└────────────────┴───────┘

Unnest the array column x, replacing the existing x column.

>>> t.unnest("x")
┏━━━━━━━┳━━━━━━━┓
┃ x      y     ┃
┡━━━━━━━╇━━━━━━━┩
│ int64int64 │
├───────┼───────┤
│     11 │
│     21 │
│     34 │
│     44 │
│     54 │
└───────┴───────┘

Unnest the array column x with an offset. The offset parameter is the name of the resulting index column.

>>> t.unnest(t.x, offset="idx")
┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ x      y      idx   ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64 │
├───────┼───────┼───────┤
│     110 │
│     211 │
│     340 │
│     441 │
│     542 │
└───────┴───────┴───────┘

Unnest the array column x keep empty array values as NULL in the output table.

>>> t.unnest(_.x, offset="idx", keep_empty=True)
┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ x      y      idx   ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64 │
├───────┼───────┼───────┤
│     110 │
│     211 │
│     340 │
│     441 │
│     542 │
│  NULL2NULL │
│  NULL3NULL │
└───────┴───────┴───────┘

If you need to preserve the row order of the preserved empty arrays or null values use row_number to create an index column before calling unnest.

>>> (
...     t.mutate(original_row=ibis.row_number())
...     .unnest("x", offset="idx", keep_empty=True)
...     .relocate("original_row")
...     .order_by("original_row")
... )
┏━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ original_row  x      y      idx   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64int64 │
├──────────────┼───────┼───────┼───────┤
│            0110 │
│            0211 │
│            1NULL2NULL │
│            2NULL3NULL │
│            3340 │
│            3441 │
│            3542 │
└──────────────┴───────┴───────┴───────┘

You can also unnest more complex expressions, and the resulting column will be projected as the last expression in the result.

>>> t.unnest(_.x.map(lambda v: v + 1).name("plus_one"))
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ x               y      plus_one ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ array<int64>int64int64    │
├────────────────┼───────┼──────────┤
│ [1, 2]12 │
│ [1, 2]13 │
│ [3, 4, ... +1]44 │
│ [3, 4, ... +1]45 │
│ [3, 4, ... +1]46 │
└────────────────┴───────┴──────────┘

unpack

unpack(*columns)

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters

Name Type Description Default
columns str String column names to project into self. ()

Returns

Name Type Description
Table The child table with struct fields of each of columns projected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
...     {"name": "a", "pos": {"lat": 10.1, "lon": 30.3}}
...     {"name": "b", "pos": {"lat": 10.2, "lon": 30.2}}
...     {"name": "c", "pos": {"lat": 10.3, "lon": 30.1}}
... '''
>>> with open("/tmp/lines.json", "w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json")
>>> t
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name    pos                                ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringstruct<lat: float64, lon: float64> │
├────────┼────────────────────────────────────┤
│ a     {'lat': 10.1, 'lon': 30.3}         │
│ b     {'lat': 10.2, 'lon': 30.2}         │
│ c     {'lat': 10.3, 'lon': 30.1}         │
└────────┴────────────────────────────────────┘
>>> t.unpack("pos")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name    lat      lon     ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64float64 │
├────────┼─────────┼─────────┤
│ a     10.130.3 │
│ b     10.230.2 │
│ c     10.330.1 │
└────────┴─────────┴─────────┘

See Also

StructValue.lift

value_counts

value_counts(name=None)

Compute a frequency table of this table’s values.

Parameters

Name Type Description Default
name str | None The name to use for the frequency column. A suitable name will be automatically generated if not provided. None

Returns

Name Type Description
Table Frequency table of this table’s values.

Examples

>>> from ibis import examples
>>> ibis.options.interactive = True
>>> t = examples.penguins.fetch()
>>> t.head()
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.year.value_counts(name="n").order_by("year")
┏━━━━━━━┳━━━━━━━┓
┃ year   n     ┃
┡━━━━━━━╇━━━━━━━┩
│ int64int64 │
├───────┼───────┤
│  2007110 │
│  2008114 │
│  2009120 │
└───────┴───────┘
>>> t[["year", "island"]].value_counts().order_by("year", "island")
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year   island     year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64             │
├───────┼───────────┼───────────────────┤
│  2007Biscoe   44 │
│  2007Dream    46 │
│  2007Torgersen20 │
│  2008Biscoe   64 │
│  2008Dream    34 │
│  2008Torgersen16 │
│  2009Biscoe   60 │
│  2009Dream    44 │
│  2009Torgersen16 │
└───────┴───────────┴───────────────────┘

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

Name Type Description
Table Table expression

visualize

visualize(
    format='svg',
    *,
    label_edges=False,
    verbose=False,
    node_attr=None,
    node_attr_getter=None,
    edge_attr=None,
    edge_attr_getter=None,
)

Visualize an expression as a GraphViz graph in the browser.

Parameters

Name Type Description Default
format str Image output format. These are specified by the graphviz Python library. 'svg'
label_edges bool Show operation input names as edge labels False
verbose bool Print the graphviz DOT code to stderr if True False
node_attr Mapping[str, str] | None Mapping of (attribute, value) pairs set for all nodes. Options are specified by the graphviz Python library. None
node_attr_getter NodeAttributeGetter | None Callback taking a node and returning a mapping of (attribute, value) pairs for that node. Options are specified by the graphviz Python library. None
edge_attr Mapping[str, str] | None Mapping of (attribute, value) pairs set for all edges. Options are specified by the graphviz Python library. None
edge_attr_getter EdgeAttributeGetter | None Callback taking two adjacent nodes and returning a mapping of (attribute, value) pairs for the edge between those nodes. Options are specified by the graphviz Python library. None

Examples

Open the visualization of an expression in default browser:

>>> import ibis
>>> import ibis.expr.operations as ops
>>> left = ibis.table(dict(a="int64", b="string"), name="left")
>>> right = ibis.table(dict(b="string", c="int64", d="string"), name="right")
>>> expr = left.inner_join(right, "b").select(left.a, b=right.c, c=right.d)
>>> expr.visualize(
...     format="svg",
...     label_edges=True,
...     node_attr={"fontname": "Roboto Mono", "fontsize": "10"},
...     node_attr_getter=lambda node: isinstance(node, ops.Field) and {"shape": "oval"},
...     edge_attr={"fontsize": "8"},
...     edge_attr_getter=lambda u, v: isinstance(u, ops.Field) and {"color": "red"},
... )  # quartodoc: +SKIP

Raises

Name Type Description
ImportError If graphviz is not installed.

GroupedTable

GroupedTable(self, **kwargs)

An intermediate table expression to hold grouping information.

Methods

Name Description
aggregate Compute aggregates over a group by.
count Computing the number of rows per group.
having Add a post-aggregation result filter expr.
mutate Return a table projection with window functions applied.
order_by Sort a grouped table expression by expr.
over Apply a window over the input expressions.
select Project new columns out of the grouped table.

aggregate

aggregate(*metrics, **kwds)

Compute aggregates over a group by.

count

count()

Computing the number of rows per group.

Returns

Name Type Description
Table The aggregated table

having

having(*predicates)

Add a post-aggregation result filter expr.

Expressions like x is None return bool and will not generate a SQL comparison to NULL

Parameters

Name Type Description Default
predicates ir.BooleanScalar Expressions that filters based on an aggregate value. ()

Returns

Name Type Description
GroupedTable A grouped table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {"grouper": ["a", "a", "a", "b", "b", "c"], "values": [1, 2, 3, 1, 2, 1]}
... )
>>> expr = (
...     t.group_by(t.grouper)
...     .having(t.count() < 3)
...     .aggregate(values_count=t.count(), values_sum=t.values.sum())
...     .order_by(t.grouper)
... )
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ grouper  values_count  values_sum ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ stringint64int64      │
├─────────┼──────────────┼────────────┤
│ b      23 │
│ c      11 │
└─────────┴──────────────┴────────────┘

mutate

mutate(*exprs, **kwexprs)

Return a table projection with window functions applied.

Any arguments can be functions.

Parameters

Name Type Description Default
exprs ir.Value | Sequence[ir.Value] List of expressions ()
kwexprs ir.Value Expressions {}

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> (
...     t.select("species", "bill_length_mm")
...     .group_by("species")
...     .mutate(centered_bill_len=ibis._.bill_length_mm - ibis._.bill_length_mm.mean())
...     .order_by(s.all())
... )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  bill_length_mm  centered_bill_len ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringfloat64float64           │
├─────────┼────────────────┼───────────────────┤
│ Adelie 32.1-6.691391 │
│ Adelie 33.1-5.691391 │
│ Adelie 33.5-5.291391 │
│ Adelie 34.0-4.791391 │
│ Adelie 34.1-4.691391 │
│ Adelie 34.4-4.391391 │
│ Adelie 34.5-4.291391 │
│ Adelie 34.6-4.191391 │
│ Adelie 34.6-4.191391 │
│ Adelie 35.0-3.791391 │
│  │
└─────────┴────────────────┴───────────────────┘

Returns

Name Type Description
Table A table expression with window functions applied

order_by

order_by(*by)

Sort a grouped table expression by expr.

Notes

This API call is ignored in aggregations.

Parameters

Name Type Description Default
by ir.Value Expressions to order the results by ()

Returns

Name Type Description
GroupedTable A sorted grouped GroupedTable

over

over(window=None, *, rows=None, range=None, group_by=None, order_by=None)

Apply a window over the input expressions.

Parameters

Name Type Description Default
window Window to add to the input None
rows Whether to use the ROWS window clause None
range Whether to use the RANGE window clause None
group_by Grouping key None
order_by Ordering key None

Returns

Name Type Description
GroupedTable A new grouped table expression

select

select(*exprs, **kwexprs)

Project new columns out of the grouped table.

See Also

GroupedTable.mutate

read_csv

ibis.read_csv(sources, table_name=None, **kwargs)

Lazily load a CSV or set of CSVs.

This function delegates to the read_csv method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. Supports CSV and TSV files. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * CSV/TSV: https://duckdb.org/docs/data/csv/overview.html#parameters. {}

Returns

Name Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''a,b
... 1,d
... 2,
... ,f
... '''
>>> with open("/tmp/lines.csv", mode="w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_csv("/tmp/lines.csv")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1d      │
│     2NULL   │
│  NULLf      │
└───────┴────────┘

read_delta

ibis.read_delta(source, table_name=None, **kwargs)

Lazily load a Delta Lake table.

Parameters

Name Type Description Default
source str | Path A filesystem path or URL. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. {}

Returns

Name Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": list("ghi")})
>>> df
a b
0 1 g
1 2 h
2 3 i
>>> import deltalake as dl
>>> dl.write_deltalake("/tmp/data.delta", df, mode="overwrite")
>>> t = ibis.read_delta("/tmp/data.delta")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1g      │
│     2h      │
│     3i      │
└───────┴────────┘

read_json

ibis.read_json(sources, table_name=None, **kwargs)

Lazily load newline-delimited JSON data.

This API is experimental and subject to change.

This function delegates to the read_json method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. See https://duckdb.org/docs/extensions/json.html for details. {}

Returns

Name Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
... {"a": 1, "b": "d"}
... {"a": 2, "b": null}
... {"a": null, "b": "f"}
... '''
>>> with open("/tmp/lines.json", mode="w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1d      │
│     2NULL   │
│  NULLf      │
└───────┴────────┘

read_parquet

ibis.read_parquet(sources, table_name=None, **kwargs)

Lazily load a parquet file or set of parquet files.

This function delegates to the read_parquet method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * Parquet: https://duckdb.org/docs/data/parquet {}

Returns

Name Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": list("ghi")})
>>> df
a b
0 1 g
1 2 h
2 3 i
>>> df.to_parquet("/tmp/data.parquet")
>>> t = ibis.read_parquet("/tmp/data.parquet")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1g      │
│     2h      │
│     3i      │
└───────┴────────┘

memtable

ibis.memtable(data, *, columns=None, schema=None, name=None)

Construct an ibis table expression from in-memory data.

Parameters

Name Type Description Default
data A table-like object (pandas.DataFrame, pyarrow.Table, or polars.DataFrame), or any data accepted by the pandas.DataFrame constructor (e.g. a list of dicts). Note that ibis objects (e.g. MapValue) may not be passed in as part of data and will result in an error. Do not depend on the underlying storage type (e.g., pyarrow.Table), it’s subject to change across non-major releases. required
columns Iterable[str] | None Optional typing.Iterable of str column names. If provided, must match the number of columns in data. None
schema SchemaLike | None Optional Schema. The functions use data to infer a schema if not passed. None
name str | None Optional name of the table. None

Returns

Name Type Description
Table A table expression backed by in-memory data.

Examples

>>> import ibis
>>> t = ibis.memtable([{"a": 1}, {"a": 2}])
>>> t
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t = ibis.memtable([{"a": 1, "b": "foo"}, {"a": 2, "b": "baz"}])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

Create a table literal without column names embedded in the data and pass columns

>>> t = ibis.memtable([(1, "foo"), (2, "baz")], columns=["a", "b"])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

Create a table literal without column names embedded in the data. Ibis generates column names if none are provided.

>>> t = ibis.memtable([(1, "foo"), (2, "baz")])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ col0   col1   ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

table

ibis.table(schema=None, name=None, catalog=None, database=None)

Create a table literal or an abstract table without data.

Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of catalog and database to specify a hierarchical location for table.

Parameters

Name Type Description Default
schema SchemaLike | None A schema for the table None
name str | None Name for the table. One is generated if this value is None. None
catalog str | None A collection of database. None
database str | None A collection of tables. Required if catalog is not None. None

Returns

Name Type Description
Table A table expression

Examples

Create a table with no data backing it

>>> import ibis
>>> ibis.options.interactive = False
>>> t = ibis.table(schema=dict(a="int", b="string"), name="t")
>>> t
UnboundTable: t
  a int64
  b string

Create a table with no data backing it in a specific location

>>> import ibis
>>> ibis.options.interactive = False
>>> t = ibis.table(schema=dict(a="int"), name="t", catalog="cat", database="db")
>>> t
UnboundTable: cat.db.t
  a int64

difference

ibis.difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only diff distinct rows not occurring in the calling table True

Returns

Name Type Description
Table The rows present in self that are not present in tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.difference(t1, t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

intersect

ibis.intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only return distinct rows True

Returns

Name Type Description
Table A new table containing the intersection of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.intersect(t1, t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

union

ibis.union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only return distinct rows False

Returns

Name Type Description
Table A new table containing the union of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.union(t1, t2)  # union all by default
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘
>>> ibis.union(t1, t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

join

ibis.join(
    left,
    right,
    predicates=(),
    how='inner',
    *,
    lname='',
    rname='{name}_right',
)

Perform a join between two tables.

Parameters

Name Type Description Default
left Table Left table to join required
right Table Right table to join required
predicates str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]] Condition(s) to join on. See examples for details. ()
how JoinKind Join method, e.g. "inner" or "left". 'inner'
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId  title                               genres                                      ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64stringstring                                      │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│       1Toy Story (1995)                  Adventure|Animation|Children|Comedy|Fantasy │
│       2Jumanji (1995)                    Adventure|Children|Fantasy                  │
│       3Grumpier Old Men (1995)           Comedy|Romance                              │
│       4Waiting to Exhale (1995)          Comedy|Drama|Romance                        │
│       5Father of the Bride Part II (1995)Comedy                                      │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64int64float64 │
├────────┼─────────┼─────────┤
│      114.0 │
│      134.0 │
│      164.0 │
│      1475.0 │
│      1505.0 │
└────────┴─────────┴─────────┘

Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).

>>> ratings.join(movies, "movieId", how="left").head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   movieId_right  title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64int64stringstring                                      │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.01Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.03Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.06Heat (1995)                Action|Crime|Thriller                       │
│      1475.047Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.050Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.

>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64stringstring                                      │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.0Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.0Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.0Heat (1995)                Action|Crime|Thriller                       │
│      1475.0Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.0Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId  movieId  tag              timestamp  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64int64stringint64      │
├────────┼─────────┼─────────────────┼────────────┤
│      260756funny          1445714994 │
│      260756Highly quotable1445714996 │
│      260756will ferrell   1445714992 │
│      289774Boxing story   1445715207 │
│      289774MMA            1445715200 │
└────────┴─────────┴─────────────────┴────────────┘

You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:

>>> tags.join(ratings, ["userId", "movieId"]).head(5).order_by("userId")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  tag             timestamp   rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64int64stringint64float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│     622Robin Williams15288439074.0 │
│     62110sword fight   15281525354.5 │
│     62410gothic        15256366094.5 │
│     622023mafia         15256367335.0 │
│     622124quirky        15256368465.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘

To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.

For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be

  • a Column
  • Deferred expression
  • lambda of the form (Table) -> Column

For example, to find all movies pairings that received the same (ignoring case) tags:

>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_.tag.lower(), lambda t: t.tag.lower()),
...     ],
... ).head().order_by(("movieId", "movieId_right"))
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId  tag                movieId_right  tag_right         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64string            │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│   60756funny            1732funny             │
│   60756Highly quotable  1732Highly quotable   │
│   89774Tom Hardy        139385tom hardy         │
│  106782drugs            1732drugs             │
│  106782Leonardo DiCaprio5989Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘

row_number

ibis.row_number()

Return an analytic function expression for the current row number.

Note

row_number is normalized across backends to start at 0

Returns

Name Type Description
IntegerColumn A column expression enumerating rows

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rownum=ibis.row_number())
┏━━━━━━━━┳━━━━━━━━┓
┃ values  rownum ┃
┡━━━━━━━━╇━━━━━━━━┩
│ int64int64  │
├────────┼────────┤
│      10 │
│      21 │
│      12 │
│      23 │
│      34 │
│      25 │
└────────┴────────┘

rank

ibis.rank()

Compute position of first element within each equal-value group in sorted order.

Equivalent to SQL’s RANK() window function.

Returns

Name Type Description
Int64Column The min rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=ibis.rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      22 │
│      22 │
│      22 │
│      35 │
└────────┴───────┘

dense_rank

ibis.dense_rank()

Position of first element within each group of equal values.

Values are returned in sorted order and duplicate values are ignored.

Equivalent to SQL’s DENSE_RANK().

Returns

Name Type Description
IntegerColumn The rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=ibis.dense_rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      21 │
│      21 │
│      21 │
│      32 │
└────────┴───────┘

percent_rank

ibis.percent_rank()

Return the relative rank of the values in the column.

Returns

Name Type Description
FloatingColumn The percent rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(pct_rank=ibis.percent_rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values  pct_rank ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64float64  │
├────────┼──────────┤
│      10.0 │
│      10.0 │
│      20.4 │
│      20.4 │
│      20.4 │
│      31.0 │
└────────┴──────────┘

cume_dist

ibis.cume_dist()

Return the cumulative distribution over a window.

Returns

Name Type Description
FloatingColumn The cumulative distribution

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(dist=ibis.cume_dist().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values  dist     ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64float64  │
├────────┼──────────┤
│      10.333333 │
│      10.333333 │
│      20.833333 │
│      20.833333 │
│      20.833333 │
│      31.000000 │
└────────┴──────────┘

ntile

ibis.ntile(buckets)

Return the integer number of a partitioning of the column values.

Parameters

Name Type Description Default
buckets int | ir.IntegerValue Number of buckets to partition into required

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(ntile=ibis.ntile(2).over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  ntile ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      20 │
│      21 │
│      21 │
│      31 │
└────────┴───────┘

window

ibis.window(
    preceding=None,
    following=None,
    order_by=None,
    group_by=None,
    *,
    rows=None,
    range=None,
    between=None,
)

Create a window clause for use with window functions.

The ROWS window clause includes peer rows based on differences in row number whereas RANGE includes rows based on the differences in row value of a single order_by expression.

All window frame bounds are inclusive.

Parameters

Name Type Description Default
preceding Number of preceding rows in the window None
following Number of following rows in the window None
group_by Grouping key None
order_by Ordering key None
rows Whether to use the ROWS window clause None
range Whether to use the RANGE window clause None
between Automatically infer the window kind based on the boundaries None

Returns

Name Type Description
Window A window frame

cumulative_window

ibis.cumulative_window(group_by=None, order_by=None)

Create a cumulative window for use with window functions.

All window frames / ranges are inclusive.

Parameters

Name Type Description Default
group_by Grouping key None
order_by Ordering key None

Returns

Name Type Description
Window A window frame

range_window

ibis.range_window(preceding=None, following=None, group_by=None, order_by=None)

Create a range-based window clause for use with window functions.

This RANGE window clause aggregates rows based upon differences in the value of the order-by expression.

All window frames / ranges are inclusive.

Parameters

Name Type Description Default
preceding Number of preceding rows in the window None
following Number of following rows in the window None
group_by Grouping key None
order_by Ordering key None

Returns

Name Type Description
Window A window frame

trailing_range_window

ibis.trailing_range_window(preceding, order_by, group_by=None)

Create a trailing range window for use with window functions.

Parameters

Name Type Description Default
preceding A value expression required
order_by Ordering key required
group_by Grouping key None

Returns

Name Type Description
Window A window frame

trailing_window

ibis.trailing_window(preceding, group_by=None, order_by=None)

Create a trailing window for use with window functions.

Parameters

Name Type Description Default
preceding The number of preceding rows required
group_by Grouping key None
order_by Ordering key None

Returns

Name Type Description
Window A window frame
Back to top