Wednesday, April 27, 2011

Reinforcement learning


Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error search and delayed reward--are the two most important distinguishing features of reinforcement learning.
Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. Any method that is well suited to solving that problem, we consider to be a reinforcement learning method. The basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal. Clearly, such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. The formulation is intended to include just these three aspects--sensation, action, and goal--in their simplest possible forms without trivializing any of them.
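To see those three ingredients in code, here is a minimal sketch of the agent-environment loop. The `Environment` and `Agent` classes, their methods, and the toy dynamics are placeholder assumptions for illustration, not any particular library's API.

```python
# A minimal sketch of the sensation-action-goal loop described above.
# `Environment` and `Agent` are hypothetical stand-ins, not a real library API.

class Environment:
    def __init__(self):
        self.state = 0

    def reset(self):
        """Return the initial state (what the agent senses)."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward)."""
        self.state = (self.state + action) % 2      # toy dynamics
        reward = 1.0 if self.state == 1 else 0.0    # toy goal signal
        return self.state, reward

class Agent:
    def act(self, state):
        """Choose an action given the sensed state (a fixed toy policy)."""
        return 1 if state == 0 else 0

env, agent = Environment(), Agent()
state, total_reward = env.reset(), 0.0
for t in range(10):
    action = agent.act(state)            # act on the environment
    state, reward = env.step(action)     # sense the new situation and the reward
    total_reward += reward               # the signal the agent tries to maximize
print("total reward:", total_reward)
```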
Reinforcement learning is different from supervised learning, the kind of learning studied in most current research in machine learning, statistical pattern recognition, and artificial neural networks. Supervised learning is learning from examples provided by a knowledgeable external supervisor. This is an important kind of learning, but alone it is not adequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory--where one would expect learning to be most beneficial--an agent must be able to learn from its own experience.
One of the challenges that arises in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. The exploration-exploitation dilemma has been intensively studied by mathematicians for many decades. The entire issue of balancing exploration and exploitation does not even arise in supervised learning as it is usually defined.
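The trade-off is easiest to see in a simple bandit-style task. Below is a rough sketch of epsilon-greedy action selection with sample-average estimates; the arm reward probabilities and the value of epsilon are arbitrary numbers chosen only for illustration.

```python
import random

# Epsilon-greedy action selection on a toy stochastic task.
# The arm reward probabilities and epsilon are illustrative assumptions.
true_reward_prob = [0.2, 0.5, 0.8]   # unknown to the agent
n_actions = len(true_reward_prob)
estimates = [0.0] * n_actions        # sample-average estimate of each action's reward
counts = [0] * n_actions
epsilon = 0.1                        # fraction of steps spent exploring

for step in range(10_000):
    if random.random() < epsilon:
        a = random.randrange(n_actions)                         # explore
    else:
        a = max(range(n_actions), key=lambda i: estimates[i])   # exploit
    reward = 1.0 if random.random() < true_reward_prob[a] else 0.0
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]         # incremental average

print("estimated values:", [round(v, 2) for v in estimates])
```

With epsilon at 0.1 the agent keeps exploring a tenth of the time, so its estimates of all three actions keep improving even while it mostly exploits the one that currently looks best.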
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast with many approaches that consider subproblems without addressing how they might fit into a larger picture. For example, we have mentioned that much of machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful. Other researchers have developed theories of planning with general goals, but without considering planning's role in real-time decision-making, or the question of where the predictive models necessary for planning would come from. Although these approaches have yielded many useful results, their focus on isolated subproblems is a significant limitation.
Reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces. When reinforcement learning involves planning, it has to address the interplay between planning and real-time action selection, as well as the question of how environmental models are acquired and improved. When reinforcement learning involves supervised learning, it does so for specific reasons that determine which capabilities are critical and which are not. For learning research to make progress, important subproblems have to be isolated and studied, but they should be subproblems that play clear roles in complete, interactive, goal-seeking agents, even if all the details of the complete agent cannot yet be filled in.
One of the larger trends of which reinforcement learning is a part is that toward greater contact between artificial intelligence and other engineering disciplines. Not all that long ago, artificial intelligence was viewed as almost entirely separate from control theory and statistics. It had to do with logic and symbols, not numbers. Artificial intelligence was large LISP programs, not linear algebra, differential equations, or statistics. Over the last decades this view has gradually eroded. Modern artificial intelligence researchers accept statistical and control algorithms, for example, as relevant competing methods or simply as tools of their trade. The previously ignored areas lying between artificial intelligence and conventional engineering are now among the most active, including new fields such as neural networks, intelligent control, and our topic, reinforcement learning. In reinforcement learning we extend ideas from optimal control theory and stochastic approximation to address the broader and more ambitious goals of artificial intelligence.



Feed Forward Control

We are all familiar with the feedback control used in a closed-loop controller. This one was something new to me; I came across it today in a lecture at MESCE by Prof. C. Chandrsekhar of IITM. Combined feedforward plus feedback control can significantly improve performance over simple feedback control whenever there is a major disturbance that can be measured before it affects the process output. In the ideal situation, feedforward control can entirely eliminate the effect of the measured disturbance on the process output. Even when there are modeling errors, feedforward control can often reduce the effect of the measured disturbance on the output better than what is achievable by feedback control alone. However, the decision as to whether or not to use feedforward control depends on whether the degree of improvement in the response to the measured disturbance justifies the added costs of implementation and maintenance. The economic benefits of feedforward control can come from lower operating costs and/or increased salability of the product due to its more consistent quality. Feedforward control is always used along with feedback control, because a feedback control system is required to track setpoint changes and to suppress the unmeasured disturbances that are always present in any real process. As an example, a continuously stirred tank reactor can be kept under feedback temperature control, with feedforward control used to rapidly suppress feed flow rate disturbances.
[Figure: Control Systems.png, showing a) open-loop, b) feed-forward, c) closed-loop configurations]

Feedforward control is a technique used when a disturbance can be measured but not controlled. The disturbance is measured and fed forward to an earlier part of the control loop so that corrective action can be initiated before the disturbance has an adverse effect on the system response.
Feedback control, in turn, is typically used to regulate a variable (or variables) in a control system that has time-varying disturbances and/or operating parameters. It is also used when the accuracy afforded by feedforward control alone is not adequate to meet the application's performance specifications.
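To get a feel for how the combination works, here is a rough simulation sketch of a first-order process with a measured step disturbance, comparing feedback-only PI control against feedback plus a feedforward term. The process gains, time constant, and controller settings are made-up values for illustration, not anything from the lecture.

```python
# Toy simulation: first-order process  y' = (-y + Kp_proc*u + Kd_proc*d) / tau,
# regulated to a setpoint with a PI feedback controller, optionally adding a
# feedforward term  u_ff = -(Kd_proc / Kp_proc) * d  from the measured disturbance.
# All numbers here are illustrative assumptions.

def simulate(use_feedforward, steps=500, dt=0.01):
    Kp_proc, Kd_proc, tau = 2.0, 1.5, 1.0   # process gains and time constant (assumed)
    kp, ki = 2.0, 1.0                       # PI feedback gains (assumed)
    y, integral, setpoint = 0.0, 0.0, 1.0
    worst_error = 0.0
    for k in range(steps):
        d = 0.5 if k > steps // 2 else 0.0          # measured step disturbance
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral              # feedback part
        if use_feedforward:
            u += -(Kd_proc / Kp_proc) * d           # cancel the measured disturbance
        y += dt * (-y + Kp_proc * u + Kd_proc * d) / tau
        if k > steps // 2:
            worst_error = max(worst_error, abs(setpoint - y))
    return worst_error

print("worst error, feedback only       :", round(simulate(False), 3))
print("worst error, feedback+feedforward:", round(simulate(True), 3))
```

In this toy model the feedforward term cancels the measured disturbance exactly, so the remaining error after the disturbance hits is much smaller than with feedback alone; with modeling errors the cancellation would only be partial, as noted above.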







Sunday, April 24, 2011

Wireless Energy Transmission

The team at MESCE is about to start work on the long-awaited dream of WIRELESS ENERGY TRANSMISSION. The idea was put forward by Mr. Nithin Kamal. It would be the technology of the century once we can use it viably: it would save a lot of energy and copper, and prevent electrocution caused by malfunctioning cables. We look forward to bidding farewell to cables.

NOT A MERE DREAM
  Do you think this is impossible? The wonderful properties of electromagnetic induction and ionization of the air give us a small hope. If we can transmit radio waves through modulation, can we dream of transmitting a 60 Hz, 220 V wave over a viable distance?

The electrodynamic induction technique transmits power in the near field, over distances up to about one-sixth of the wavelength used. Near field energy itself is non-radiative, but some radiative losses do occur. In addition there are usually resistive losses. With electrodynamic induction, electric current flowing through a primary coil creates a magnetic field that acts on a secondary coil, producing a current within it. Coupling must be tight in order to achieve high efficiency. As the distance from the primary is increased, more and more of the magnetic field misses the secondary. Even over a relatively short range the inductive coupling is grossly inefficient, wasting much of the transmitted energy.
This action of an electrical transformer is the simplest form of wireless power transmission. The primary and secondary circuits of a transformer are not directly connected. Energy transfer takes place through a process known as mutual induction. Principal functions are stepping the primary voltage either up or down and electrical isolation. Mobile phone and electric toothbrush battery chargers, and electrical power distribution transformers are examples of how this principle is used. Induction cookers use this method. The main drawback to this basic form of wireless transmission is short range. The receiver must be directly adjacent to the transmitter or induction unit in order to efficiently couple with it.
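For the ideal transformer case, the voltage relation is just the turns ratio. A tiny sketch, with made-up numbers:

```python
# Ideal transformer relation for the mutual-induction case described above.
# The component values are illustrative assumptions.
def secondary_voltage(v_primary, n_primary, n_secondary):
    """Ideal transformer: Vs = Vp * (Ns / Np)."""
    return v_primary * (n_secondary / n_primary)

print(secondary_voltage(230.0, 1000, 50))   # stepping 230 V down to 11.5 V
```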
The application of resonance improves the situation somewhat. When resonant coupling is used the transmitter and receiver inductors are tuned to a mutual frequency and the drive current is modified from a sinusoidal to a nonsinusoidal transient waveform. Pulse power transfer occurs over multiple cycles. In this way significant power may be transmitted over a distance of up to a few times the size of the primary coil. Transmitting and receiving coils are usually single layer solenoids or flat spirals with series capacitors, which, in combination, allow the receiving element to be tuned to the transmitter frequency.
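As a quick back-of-the-envelope check on this tuning, the resonant frequency of a coil with a series capacitor is f = 1/(2*pi*sqrt(LC)); the component values below are arbitrary assumptions, only meant to show the calculation.

```python
import math

# Resonant frequency of an LC tank: f = 1 / (2*pi*sqrt(L*C)).
# The component values below are arbitrary illustrative choices.
def resonant_frequency(L_henry, C_farad):
    return 1.0 / (2.0 * math.pi * math.sqrt(L_henry * C_farad))

L = 24e-6     # 24 uH transmitter coil (assumed)
C = 100e-9    # 100 nF series capacitor (assumed)
f = resonant_frequency(L, C)
print(f"resonant frequency = {f/1e3:.1f} kHz")
# The receiving coil and its capacitor would be tuned to roughly the same
# frequency so that both sides resonate together and energy transfer is maximized.
```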
Common uses of resonance-enhanced electrodynamic induction are charging the batteries of portable devices such as laptop computers and cell phones, medical implants and electric vehicles. A localized charging technique selects the appropriate transmitting coil in a multilayer winding array structure. Resonance is used in both the wireless charging pad (the transmitter circuit) and the receiver module (embedded in the load) to maximize energy transfer efficiency. This approach is suitable for universal wireless charging pads for portable electronics such as mobile phones. It has been adopted as part of the Qi wireless charging standard.
It is also used for powering devices having no batteries, such as RFID patches and contactless smartcards, and to couple electrical energy from the primary inductor to the helical resonator of Tesla coil wireless power transmitters.

 If this is available over such a small distance, can we try for maybe a metre, then 10, 100, 1000 metres.......

Kindly give your opinion.......

 

Thursday, April 7, 2011

BELBIC - Brain Emotional Learning based Intelligent Controller



BELBIC is a controller that adopts the network model proposed by Moren and Balkenius. Certain parts of the brain are responsible for our emotions.
The limbic system comprises four main structures: the amygdala, the hippocampus, regions of the limbic cortex, and the septal area. These structures form connections between the limbic system and the hypothalamus, thalamus, and cerebral cortex. The hippocampus is important in memory and learning, while the limbic system itself is central in the control of emotional responses.

  • Amygdala - An almond-shaped mass of nuclei. It plays a primary role in the formation and storage of memories associated with emotional events.
  • Orbitofrontal Cortex - A prefrontal cortex region in the frontal lobes of the brain involved in the cognitive processing of decision-making. The human OFC is among the least-understood regions of the human brain, but it has been proposed that the OFC is involved in sensory integration, in representing the affective value of reinforcers, and in decision-making and expectation.
  • Thalamus - Situated between the cerebral cortex and the midbrain, it is responsible for processing and relaying movement and sensory information.
 Emotional Learning

Traditionally, the study of learning in biological systems was conducted at the expense of overlooking its lesser known counterparts: motivation and emotion. Motivation is the drive that causes any system to do anything – without it, there is no reason to act. Emotions indicate how successful a course of action has been and whether another set of actions should have been taken instead – they are constant feedback to the learning system. Learning, on the other hand, guarantees that the motivation and emotional subsystems are able to adapt to constantly changing conditions. Every creature has innate abilities that accommodate its survival in the world. It can identify food, shelter, partners, and danger. But these “simple mappings between stimuli and reactions will not be enough to keep the organisms from encountering problems.”
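For the curious, the amygdala/orbitofrontal learning rules that BELBIC borrows from the Moren and Balkenius model can be sketched roughly as below. The exact equations and signals differ between publications, so treat the learning rates, the reward signal, and even the form of the updates here as assumptions rather than the definitive algorithm.

```python
import numpy as np

# Rough sketch of a brain-emotional-learning (BEL) style update, loosely following
# the Moren-Balkenius structure used in BELBIC. Exact equations differ between
# publications; the learning rates and signals here are assumptions.

def bel_step(S, REW, Ga, Go, alpha=0.1, beta=0.05):
    """One update of the amygdala (Ga) and orbitofrontal (Go) gains.

    S   : sensory input vector
    REW : emotional cue / reward signal (scalar)
    """
    A = Ga * S                     # amygdala node outputs
    O = Go * S                     # orbitofrontal node outputs
    E = A.sum() - O.sum()          # model output (the controller signal in BELBIC)
    # Amygdala learning only grows toward the reward (it does not "unlearn").
    Ga = Ga + alpha * S * max(0.0, REW - A.sum())
    # The orbitofrontal part inhibits the output when it overshoots the reward.
    Go = Go + beta * S * (E - REW)
    return E, Ga, Go

S = np.array([0.4, 0.7])           # example sensory inputs (assumed)
Ga, Go = np.zeros(2), np.zeros(2)
for t in range(50):
    E, Ga, Go = bel_step(S, REW=1.0, Ga=Ga, Go=Go)
print("model output after learning:", round(float(E), 3))
```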


Saturday, April 2, 2011

TIME DILATION

 Time dilation is an observed difference in elapsed time between two observers who are moving relative to each other, or who are situated differently with respect to nearby gravitational masses. An observer will see the other observer's clock ticking at a slower rate than his or her own. This effect does not arise from technical aspects of the clocks or from the fact that signals need time to propagate, but from the nature of space-time described by the theory of relativity.
 There are two types:
    1. Relative velocity time dilation
When two observers are in relative uniform motion and uninfluenced by any gravitational mass, the point of view of each will be that the other's clock is ticking at a slower rate than the local clock. The faster the relative velocity, the greater the magnitude of time dilation. This case is sometimes called special relativistic time dilation. It is often interpreted as time "slowing down" for the other clock. But that is only true from the physical point of view of the local observer, and of others at relative rest. The point of view of the other observer will be that, again, the local clock is correct and it is the distant moving one that is slow. From a local perspective, time registered by clocks that are at rest with respect to the local frame of reference always appears to pass at the same rate.
    2. Gravitational time dilation
There is another case of time dilation, where the two observers are situated at different distances from a significant gravitational mass, such as the Earth or the Sun. For simplicity, one may suppose that the observers are at relative rest. In this simplified case, the general theory of relativity describes how, for both observers, the clock that is closer to the gravitational mass, i.e. deeper in its "gravity well", appears to run slower than the clock that is more distant from the mass. Unlike the first case, both observers agree that the clock nearer the mass is the slower one, and they agree on the ratio of the difference. A numerical sketch of both effects follows below.
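Both effects can be put into numbers with the standard formulas: for relative velocity, the elapsed time in the "stationary" frame is the moving clock's proper time multiplied by 1/sqrt(1 - v^2/c^2), and a clock at radius r from a mass M ticks off the far-away time multiplied by sqrt(1 - 2GM/(r c^2)). The speed and the choice of the Earth in the sketch below are just example values.

```python
import math

C = 299_792_458.0          # speed of light, m/s
G = 6.674e-11              # gravitational constant, m^3 kg^-1 s^-2

def velocity_dilation(dt, v):
    """Time elapsed in the 'stationary' frame while a clock moving at speed v ticks off dt."""
    return dt / math.sqrt(1.0 - (v / C) ** 2)

def gravitational_dilation(dt, M, r):
    """Time elapsed on a clock at radius r from mass M while a far-away clock ticks off dt."""
    return dt * math.sqrt(1.0 - 2.0 * G * M / (r * C ** 2))

# Example values (assumed): a clock moving at 10% of c for one second of its own time,
# and a clock on the Earth's surface compared with one very far away.
print(velocity_dilation(1.0, 0.1 * C))                 # about 1.005 s in the stationary frame
M_earth, R_earth = 5.972e24, 6.371e6                   # Earth's mass (kg) and radius (m)
print(gravitational_dilation(1.0, M_earth, R_earth))   # very slightly less than 1 s
```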