La esquina del cibernauta: July 2010

iRevolution

Shared by luishernando via Google Reader:

Demystifying Crowdsourcing: An Introduction to Non-Probability Sampling

via iRevolution by Patrick Meier on 27/06/10

The use of crowdsourcing may be relatively new to the technology, business and humanitarian sectors, but when it comes to statistics, crowdsourcing is a well-known and established sampling method. Crowdsourcing is simply non-probability sampling, and the crowdsourcing of crisis information is just one application of it.

Let's first review probability sampling, in which every unit in the population being sampled has a known probability (greater than zero) of being selected. This approach makes it possible to "produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection."
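
The weighting idea can be made concrete with the Horvitz-Thompson estimator, a standard way to estimate a population total by dividing each sampled value by its inclusion probability. The sketch below is only an illustration, not Meier's method: it assumes simple random sampling without replacement (so every unit's probability of selection is n/N), and the population of household sizes and the helper names are hypothetical.

```python
import random


def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value by 1 / pi_i."""
    return sum(y / p for y, p in zip(values, inclusion_probs))


def simple_random_sample(population, n, rng):
    """Draw n units without replacement; under this design every unit's
    inclusion probability pi_i is n / N."""
    sample = rng.sample(population, n)
    return sample, [n / len(population)] * n


rng = random.Random(42)
# Hypothetical population: household sizes (1 to 6 people) for 1,000 households.
population = [rng.randint(1, 6) for _ in range(1000)]
true_total = sum(population)

sample, probs = simple_random_sample(population, 100, rng)
estimate = horvitz_thompson_total(sample, probs)

# Repeating the draw many times: the average estimate settles on the true total,
# which is what "unbiased" means here.
estimates = [horvitz_thompson_total(*simple_random_sample(population, 100, rng))
             for _ in range(2000)]

print(f"true total: {true_total}")
print(f"one estimate: {estimate:.0f}")
print(f"mean of 2000 estimates: {sum(estimates) / len(estimates):.0f}")
```

Any single estimate can miss, but because every unit's selection probability is known, the weighted estimates are correct on average.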

Non-probability sampling, on the other hand, describes an approach in which some units of the population have no chance of being selected, or in which the probability of selection cannot be accurately determined. An example is convenience sampling. The main drawback of non-probability sampling techniques is that "information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population."
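
To see why extrapolating from a convenience sample is risky, the following sketch compares a simple random sample with a sample of the easiest-to-reach units. Everything in it is a hypothetical assumption chosen for illustration: the simulated population, the "accessibility" score, and the choice to make the outcome rise with accessibility.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: each unit has an accessibility score and an outcome y.
# Assumption for illustration only: more accessible units tend to have higher y.
population = []
for _ in range(10_000):
    access = rng.random()                   # 0 = hard to reach, 1 = easy to reach
    y = 10 + 5 * access + rng.gauss(0, 1)   # outcome correlated with accessibility
    population.append((access, y))

true_mean = statistics.fmean(y for _, y in population)

# Probability sample: every unit has the same known chance (n / N) of selection.
random_sample = rng.sample(population, 500)
random_mean = statistics.fmean(y for _, y in random_sample)

# Convenience sample: take the 500 easiest-to-reach units. Hard-to-reach units
# have zero chance of being selected.
convenience_sample = sorted(population, key=lambda u: u[0], reverse=True)[:500]
convenience_mean = statistics.fmean(y for _, y in convenience_sample)

for label, value in [("population mean", true_mean),
                     ("random sample mean", random_mean),
                     ("convenience sample mean", convenience_mean)]:
    print(f"{label:<25}{value:.2f}")
```

Because the hard-to-reach units never enter the convenience sample, no weighting scheme can bring them back; the bias comes from who can be selected, not from the sample size.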

There are several advantages, however. First, non-probability sampling is a quick way to collect and analyze data in a range of settings with diverse populations. The approach is also a "cost-efficient means of greatly increasing the sample, thus enabling more frequent measurement." In some cases, non-probability sampling may actually be the only approach available (a common constraint in a lot of research, including many medical studies, not to mention Ushahidi Haiti). The method is also used in exploratory research, e.g., for hypothesis generation, especially when attempting to determine whether a problem exists or not.

The point is that non-probability sampling can save lives, many lives. Much of the data used for medical research is the product of convenience sampling. The patients who see a doctor or are hospitalized are not a representative sample of the population. Should the medical field throw away all this data because it comes from non-probability sampling? Of course not; that would be ludicrous.

The notion of bounded crowdsourcing, which I have blogged about before, corresponds to a known sampling technique called purposive sampling. This approach involves targeting experts or key informants. Snowball sampling is another type of non-probability sampling, which may also be applied to the crowdsourcing of crisis information.

In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who also meet the criteria. Although this method rarely leads to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find.
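
The referral process can be sketched as a breadth-first walk over a contact network. This is a minimal illustration under assumptions: the function `snowball_sample`, the toy network `contacts`, the eligibility set and the cap on referrals per person are all hypothetical, and real studies recruit through people rather than through a known graph.

```python
from collections import deque


def snowball_sample(contacts, meets_criteria, seeds, max_referrals, target_size):
    """Recruit by referral: each eligible participant names up to `max_referrals`
    eligible contacts not yet approached, who are recruited in turn (breadth-first)."""
    recruited = []
    seen = set(seeds)
    queue = deque(s for s in seeds if meets_criteria(s))
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        referrals = [c for c in contacts.get(person, [])
                     if c not in seen and meets_criteria(c)]
        for contact in referrals[:max_referrals]:
            seen.add(contact)
            queue.append(contact)
    return recruited


# Hypothetical contact network: who knows whom.
contacts = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["F", "G"],
    "D": ["H"],
    "E": [],
    "F": ["C"],
    "G": [],
    "H": [],
}
# Hypothetical inclusion criterion, e.g. "knows of people trapped under rubble".
eligible = {"A", "B", "C", "E", "F", "G", "H"} - {"D"}

sample = snowball_sample(contacts, lambda p: p in eligible, seeds=["A"],
                         max_referrals=2, target_size=10)
print(sample)  # ['A', 'B', 'C', 'E', 'F', 'G']
```

In this toy network H meets the criteria but is never recruited, because the only person who knows H (D) does not, which illustrates why snowball samples are rarely representative. With `max_referrals=1` the same call returns only A, B and E.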

Projects like Mission 4636 and Ushahidi-Haiti could take advantage of this approach by using two-way SMS communication to ask respondents to spread the word. Individuals who sent in text messages about persons trapped under the rubble could (later) be sent an SMS asking them to share the 4636 short code with people who might know of other trapped individuals. When the humanitarian response began to scale up during the search and rescue operations, purposive sampling using UN personnel could also have been implemented.

In contrast to non-probability sampling techniques, probability sampling often requires considerable time and extensive resources. Furthermore, non-response effects can easily turn any probability design into non-probability sampling if the "characteristics of non-response are not well understood", since non-response modifies each unit's probability of being sampled.
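
The effect of non-response on selection probabilities can be simulated. In this hypothetical sketch, units are drawn by simple random sampling, so the design probability n/N is known, but whether a selected unit responds depends on its outcome; the effective inclusion probability then becomes (n/N) times the unit's response probability. The response model `response_prob` is an assumption made up for illustration. In practice it is unknown and has to be estimated, which is exactly the situation in which the "characteristics of non-response" matter.

```python
import random
import statistics

rng = random.Random(3)
N, n = 20_000, 2_000

# Hypothetical population of outcomes we want to total.
population = [rng.uniform(0, 10) for _ in range(N)]


def response_prob(y):
    """Assumed response model (illustration only): units with higher y respond
    less often, from 0.9 at y = 0 down to 0.2 at y = 10."""
    return 0.9 - 0.07 * y


def one_survey():
    pi = n / N                                   # known design inclusion probability
    sample = rng.sample(population, n)
    respondents = [y for y in sample if rng.random() < response_prob(y)]
    naive = sum(y / pi for y in respondents)     # ignores non-response
    # Effective inclusion probability is pi * r(y); weight by its inverse.
    adjusted = sum(y / (pi * response_prob(y)) for y in respondents)
    return naive, adjusted


results = [one_survey() for _ in range(300)]
true_total = sum(population)
naive_mean = statistics.fmean(r[0] for r in results)
adjusted_mean = statistics.fmean(r[1] for r in results)

print(f"true total:             {true_total:.0f}")
print(f"naive estimate (avg):   {naive_mean:.0f}")
print(f"adjusted estimate (avg): {adjusted_mean:.0f}")
```

The naive estimate, which treats respondents as if they were the full design sample, falls well short of the true total. Reweighting by the effective probability removes the bias here only because the sketch knows the true response probabilities.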

This is not to suggest that one approach is better than the other; that depends entirely on the context and the research question.

Patrick Philippe Meier

Caparazon

Video with Spanish subtitles: Metaweb and others on the semantic web

via El caparazon by Dolors Reig on 26/07/10

It has been one of the key stories of the week: Google (yes, the same company that a few months ago said, literally, that the semantic web had nothing to do with it and was an unattainable utopia; we already said it was predictable that it would change its mind) bought Metaweb, the company behind Freebase, which was already making Google News smarter and which now looks set to bring its "entities", ontologies and database to the whole web.

Microsoft got there first some time ago, when it acquired Powerset. Back then we said there was only one path forward (we called it the "real" start of the semantic web), and that path now seems clearer with this explicit, acknowledged union: the largest company on the web buying one of the most important ones in the semweb field.

Many analyses are possible, but above all I think that the semantization of web content, in order to give end users more organized, intelligent and useful results, is once again front-page news and back at the center of Web 3.0.

Besides the Metaweb video that has been circulating in English in many other places, here with Spanish subtitles (thanks to Verónica, of Factor Humano, who took the trouble to add them a couple of days ago), I am sharing the material from the semantic web module I finished a few days ago for the Escuela de Verano Espiral, similar to the one I will soon teach for the Universidad de Panamá. One section is devoted specifically to myths and realities about the semantic web / Web 3.0, where you can compare Google's current position with the one it held some time ago.

Finally, for a nearly complete immersion in this important concept, I recommend the video I subtitled myself a few years ago.

Let the evolution continue: