Ok. Digital Humanities. Everybody is talking about it, and so am I right now. But all this talking is not Digital Humanities (DH). The discussion about DH is not DH. The definition of DH is not DH. This post is not DH. Yep, it doesn’t matter whether it is on Twitter, Facebook or blogs. The problem is that everybody is giving an opinion, and opinions on opinions, and then I don’t really know what people are talking about.

So, what is this DH about? Does it make sense? It seems to be something related to humanities and digital. Humanities includes a lot of other areas, and the term digital is already corrupted. Originally the term “digital” referred to discrete values, in contrast with “analog” (continuous values). Binary is one example of a discrete system. Another discrete system is the alphabet, which is used to write novels, which are studied by the area of literature, which belongs to the humanities. So wasn’t literature already digital?

I guess they mean something else by digital here. It seems to be more related to either the Internet or computing. In the first case, the Internet, I agree that there are new forms of expression (Orsai or Balada/Track) that deserve to be studied separately because they add new elements to “traditional” literature. Before the Internet, those elements didn’t exist. Other examples are studying the behaviour of people in social networks or analysing the work of graphic designers. I agree that these are new forms of expression that need theories that have yet to be developed. But not everything out there is new. Most of the change we are seeing nowadays happens because there is much more information that can be studied, from independent movies or books to interactions between individuals.

This takes me to the second case, computing, and especially processing lots of information. Here, the possibilities are huge. Not just to study the new forms of expression but also the old ones. For example, processing a whole collection from the Spanish Golden Age. However, things here are not necessarily that new either. For example, computational linguistics has been around for a while. Note the use of a more specific name. It requires a great deal of linguistics, statistics and computer science. And yep, it is claimed to be part of DH.

There are clearly two big areas here. And each one also contains big areas. A critique of digital graphic design should be way different from the work of that girl who is trying to understand Orsai. What about computational linguistics and artificial societies (simulation of social interaction)? Is digital humanities all of this? Seems a bit too much. I think it would make more sense to start by (1) listing all the new forms of expression and (2) studying which new tools are being applied to the already existing areas of the humanities. DH is just an umbrella for all this, but I cannot think of a curriculum for all of that…

Anyway, this is just me increasing the bubble of Digital Humanities… talking talking…

Last week I was at the Latin American Conference in Informatics. Three to five sessions were running at the same time. It was an overwhelming schedule that worked more like a restaurant menu. Papers ranged from the history of computer science to biological computing. However, the best part of the event was listening to John E. Hopcroft in his tutorial “Mathematics for the Information Era” and his lecture “New directions in computer science”.

Both of them were related to the constantly increasing and, for a long time already, intractable amount of information available to us. We generate an amount of information equivalent to a couple of hundred newspapers per day per person. Of course, that includes junk and data generated by computers themselves. Still, information should be useful to us, unless we are just trying to create the biggest museum in the world (called the Internet), receiving visits just to take pictures.

But there is nothing new in what I just said. What is new is the summary that Hopcroft managed to put together. He gave a sort of set of the “new” mathematical tools that we have to attack the problem of Big Information. Here is the list that I can extract from my notebook, with a very brief explanation (forgive me if I am not as precise as him or if I misunderstood any of the concepts):

High dimensionality: things in many dimensions don’t behave like things in 1, 2, 3 or even 4 dimensions. A new set of theories is being developed in this area, increasing our understanding of how complex problems (with many dimensions, such as texts) may work. Some of the next topics are related to high dimensionality. This is more like a whole new area of study.

Volumes: in many dimensions, volumes do not behave as we expect. For example, as the number of dimensions tends to infinity, the volume of the unit sphere (or hypersphere) goes to zero. For the Gaussian, things are even more unexpected: almost all of its mass sits in a thin shell, so it looks like a ring. The consequences are totally nuts. Let’s say you have two Gaussians that generated points in a high-dimensional space. Then you can deduce which of the Gaussians generated each point (with extremely good accuracy).
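
A minimal sketch of both effects, using only Python’s standard library. The closed formula for the volume of the radius-1 ball is the standard one; the function names, the seed and the choice of 1000 dimensions are just illustrative assumptions, not something taken from the tutorial.

```python
import math
import random


def unit_ball_volume(d):
    """Volume of the radius-1 ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def gaussian_norms(d, n_points, seed=0):
    """Distances from the origin of n_points draws from a standard d-dimensional Gaussian."""
    rng = random.Random(seed)
    return [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        for _ in range(n_points)
    ]


# The volume grows for a few dimensions and then collapses towards zero.
for d in (2, 3, 10, 50, 100):
    print(d, unit_ball_volume(d))

# Every sampled point lands at roughly the same distance, sqrt(d), from the centre.
norms = gaussian_norms(d=1000, n_points=20)
print(min(norms), max(norms), math.sqrt(1000))
```

The second print is the “ring”: the samples do not pile up near the centre, they all sit close to radius sqrt(1000).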

Dimension reduction: if you have some vectors in a (very) high dimensional space, you can decrease the number of dimensions and still distinguish between them (very different from what happens when you reduce from 3 dimensions to 2 dimensions).
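
One well-known way to do this is a random projection. Whether this is exactly the technique covered in the tutorial is an assumption on my side; the sketch below only shows that distances between points survive the reduction reasonably well. The data, the sizes (500 down to 200 dimensions) and the function names are made up for illustration.

```python
import math
import random


def random_projection(vectors, k, seed=0):
    """Project d-dimensional vectors onto k random Gaussian directions, scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(w * x for w, x in zip(row, v)) for row in directions]
        for v in vectors
    ]


def distance(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(1)
points = [[rng.random() for _ in range(500)] for _ in range(3)]  # made-up data
reduced = random_projection(points, k=200)

# Ratio of each pairwise distance after and before the projection.
ratios = [
    distance(reduced[i], reduced[j]) / distance(points[i], points[j])
    for i in range(3)
    for j in range(i + 1, 3)
]
print(ratios)
```

The ratios should stay close to 1, which is the sense in which the points can still be told apart after dropping more than half of the dimensions.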

SVM (support vector machines): one of the breakthroughs in machine learning (and Artificial Intelligence; unfortunately he didn’t have more time to continue on this topic). This could be seen as the opposite of the previous one: the possibility of increasing the number of dimensions of the vectors so that the solutions of problems can be classified (learned) quicker. For example, try to draw a straight line that separates the “x” from the “+”.

x + x x  ++++  x x + x

Now the mathematical magic comes in and increases the dimensions. Try again!

+         ++++         +

x     x x              x x     x

Of course, it is not as simple as that; there is a lot hidden in the mathematical magic part. It is important to notice that computers are better at learning straight lines than curves.
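
A toy version of the “magic”, as a sketch only: this is not an SVM, just the idea of lifting points into an extra dimension. The feature map `lift` (send x to the pair (x, x²)), the data and the helper names are all hypothetical choices made for the illustration.

```python
def lift(x):
    """Hypothetical feature map: send a point on the line to a point in the plane."""
    return (x, x * x)


def threshold_separates(values, labels):
    """True if a single cut leaves every '+' on one side and every 'x' on the other."""
    plus = [v for v, lab in zip(values, labels) if lab == "+"]
    cross = [v for v, lab in zip(values, labels) if lab == "x"]
    return max(plus) < min(cross) or max(cross) < min(plus)


# Made-up points on a line: the '+' sit in the middle, the 'x' on both sides.
xs = [-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]
labels = ["x", "x", "x", "+", "+", "+", "x", "x", "x"]

on_the_line = threshold_separates(xs, labels)

# After lifting, look only at the new coordinate (the height x * x).
heights = [lift(x)[1] for x in xs]
after_lifting = threshold_separates(heights, labels)

print(on_the_line, after_lifting)
```

On the line no single cut works, but in the plane a horizontal straight line (any height between 0.25 and 4 for this data) leaves all the “+” below and all the “x” above.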

Sparse vectors: in nature, if you have a huge vector representing, say, a genome, most of the values (genes) are 0. Even more, the sparse solution is unique. This is why geneticists are able to crack DNA. This also reminds me that in nature, interactions of 3 factors (or more) are rare. This is great for statistics, in which we look for low interactions or no interactions (independent variables).
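
A small sketch of why sparsity is convenient in practice: if most entries are 0, a program can store and combine only the non-zero ones. This says nothing about the uniqueness result mentioned above; the dictionary representation, the sizes and the names are illustrative assumptions.

```python
def to_sparse(dense):
    """Keep only the non-zero entries, as a {index: value} dictionary."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product that only visits the indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


# Two made-up vectors with 100000 entries each, almost all of them 0.
dense_a = [0] * 100_000
dense_b = [0] * 100_000
dense_a[10], dense_a[50_000] = 2, 3
dense_b[10], dense_b[99_999] = 5, 7

a, b = to_sparse(dense_a), to_sparse(dense_b)
print(len(a), len(b), sparse_dot(a, b))
```

Only two numbers per vector are kept, and the dot product touches just the one index they share.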

Probability and statistics: no need to explain what this is. But most of the big advances in the areas that are producing big information are going to require more than means and standard deviations. It took almost a year to generate enough collisions and, only then, be able to confirm the Higgs boson at 5 sigma, a statistical measure of significance.
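
To get a feeling for what 5 sigma means, here is a short calculation. It assumes the usual reading of “n sigma” as the one-sided tail of a normal distribution, which I believe is the convention in particle physics; the function name is made up.

```python
import math


def one_sided_p_value(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


p = one_sided_p_value(5)
print(p, 1 / p)
```

Under that assumption, a fluctuation this large would show up by pure chance roughly once in 3.5 million tries.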

To infinity and beyond: well, basically calculus. A good understanding of certain concepts such as limits, derivatives and singularities.

Others: Markov chains (random walks), generative models for producing graphs (the rich-get-richer concept), giant components, ranking and voting (and its problems), boosting, zero-knowledge proofs, sampling (accepting that you cannot store or process all the information, how can you sample to get the right answer?).
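
For the sampling question, one classic answer is reservoir sampling, which keeps a uniform random sample from a stream without ever storing the stream. It is offered here as an example of the idea, not as what was presented in the tutorial; the function name, the seed and the stream are illustrative.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a stored one with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


picked = reservoir_sample(iter(range(100_000)), k=5)
print(picked)
```

Memory use stays at k items no matter how long the stream is, which is the point when you accept that you cannot keep everything.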

In general, the good news is that if you have a huge amount of information and you lose some, you are still able to find all the answers because the structure and properties don’t change. This reminds me of “The Library of Babel” by Jorge Luis Borges.

Hopcroft points out seven problems in the new directions of computer science:

  1. Track ideas in scientific literature (a machine that tells you the key articles on a particular topic)
  2. Evolution of communities in social networks
  3. Extract information from unstructured data sources
  4. Processing massive data sets and streams
  5. Detecting signals from noise
  6. Dealing with high dimensional data
  7. Much more application oriented…

It sounds a bit like Cultureplex, doesn’t it?

“The information age is a fundamental revolution that is changing all aspects of our lives. Those individuals, institutions and nations who recognize this change and position themselves for the future will benefit enormously” John E. Hopcroft.